MIPS and choosing a microcontroller
Thursday, 20 May, 2010
Measuring CPU performance in real-time embedded-control applications has always been a widely debated and subjective topic. The market is composed of 8-, 16- and 32-bit microcontroller and digital-signal-controller products with a wide variety of device pin counts, memory sizes and types, and integrated system-support and peripheral functions.
One of the key decision criteria for real-time embedded-control designers is processor performance. A common measure of processor performance is millions of instructions per second (MIPS).
On the surface, this is a straightforward metric. But the picture of code executing in a real application can be quite different across several products, all stating the same MIPS. In addition, interrupt responsiveness, code execution predictability and the ability to easily and quickly manipulate I/O pins and register bits are also important considerations.
Standardised benchmarks such as Dhrystone and the EEMBC suites provide product comparisons against a known set of software. Each standardised benchmark has its strengths. However, the best benchmark is always based on the real-time performance requirements set by the development team for each application.
Two key factors that impact processor throughput are the instruction-set architecture (RISC or CISC) and the data-path architecture (von Neumann or Harvard).
Before discussing their system impacts, let’s define these terms:
CISC - a CPU where each instruction can execute multiple operations, such as loading operand(s) from memory, performing an arithmetic or logical operation and storing the result to memory.
RISC - a CPU where each instruction executes a simple task, such as loading a register from memory. Other instructions perform arithmetic or logical functions or store a register to memory.
Von Neumann - a data-path architecture where the instructions and operands both use the same address and data bus.
Harvard - a data-path architecture where the instructions and operands use separate address and data-bus structures.
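As a sketch of the CISC/RISC distinction just defined, consider how a single C statement decomposes on each style of machine. The mnemonics in the comments are generic illustrations, not any particular vendor's instruction set:

```c
/* The same source statement maps very differently onto the two ISA styles.
 * On a CISC machine, a memory-to-memory add may be one (multi-cycle)
 * instruction; on a load/store RISC it becomes a sequence of
 * single-cycle instructions. Mnemonics below are generic illustrations. */
void add_store(const int *a, const int *b, int *c) {
    /* RISC-style decomposition:          CISC-style equivalent:   */
    int ra  = *a;       /* LOAD  r1, [a]    ADD [c], [a], [b]      */
    int rb  = *b;       /* LOAD  r2, [b]    (one instruction,      */
    int sum = ra + rb;  /* ADD   r3, r1, r2  several clock cycles) */
    *c = sum;           /* STORE [c], r3                           */
}
```

Either way, four logical operations happen; the architectures differ only in how they are packaged into instructions and cycles.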
The combined instruction-set and data-path architectures of a controller together determine the peak MIPS of its CPU and, more importantly, the sustainable MIPS across the instruction-set and addressing modes.
Clearly, processors that execute every instruction, across all addressing modes, in a single cycle can sustain their peak MIPS. However, it is just as important to consider how much is accomplished with each instruction.
Many microcontroller families implement the combination of a CISC instruction set and a von Neumann data-path architecture. This yields a processor with a complex instruction set executing multiple tasks per instruction, such as: fetch two operands, add them together and store them in a destination address.
The addressing modes for these operands can be quite complex. The von Neumann data path means that a single address and data path are used for all instruction fetches, operand address derivation, operand fetches, execution and results store operations.
Since the data path becomes a bottleneck, instructions can take four, six or more clock cycles to execute. These multiple-cycle instructions reduce the peak MIPS performance by 15 to 50%, depending on the instruction mix. Most manufacturers indicate the number of machine cycles required per addressing mode and per instruction, which can be used to calculate the total number of clock cycles per instruction.
The end result is that the real application throughput can be below the expectations of a processor’s quoted peak MIPS rate.
When RISC architectures were defined, CPU speeds and memory access times were at near parity. So, an architecture that executed small instructions at the same rate as memory access would be optimal. And, since RISC processors typically have operands in CPU registers, the bottleneck to fetch operands is not severe.
RISC architectures certainly execute at high peak MIPS rates. On the other hand, several RISC instructions are required to match a single CISC multi-operand add-and-store. With RISC, the peak MIPS rate is high but the work accomplished per instruction is relatively low. To support the high clock rates of a RISC processor, the memory needs assistance to keep pace.
This problem has been addressed by instruction prefetch pipelines and cache memory accelerators. These are adequate solutions in data-processing or fixed-algorithm applications, but in real-time control applications where the interrupt response and software execution are unpredictable, these solutions fall short.
The Harvard architecture is prevalent in DSP products but has not been widely adopted in microcontroller families (see Figure 1). Two primary advantages of the Harvard architecture are the multiple data paths for instruction streams and operands and the separation of the instruction word width from the native data-type width.
For example, Microchip’s PIC24 16-bit microcontroller and dsPIC33 digital-signal-controller families implement a Harvard architecture with 24-bit instruction words and 16-bit data.
First, since the instruction stream and operands use separate data paths, the instruction width is decoupled from the data-path width. This enables a wider instruction word and richer CISC-like instructions, while maintaining the single-cycle fetch and execution of a RISC machine.
The separate data paths and wide instruction word reduce the number of cycles spent fetching instructions. The wider instruction word also improves instruction-set encoding, thereby decreasing compiled C code size. The Harvard architecture facilitates this blending of the best attributes of RISC and CISC.
Second, the parallel data paths of the Harvard architecture support simultaneous fetching of multiple operands, instruction operation execution and the storing of results - all in a single CPU clock cycle. This combination results in a CPU where peak MIPS and actual application MIPS are nearly identical.
The only exceptions to single-cycle execution are typically branch instructions when the branch is taken, and iterative instructions such as divide.
These Harvard-architecture devices can accomplish the same or more processing power at lower clock speeds, due to their internal parallel data paths. Additionally, lower clock speeds typically result in lower power dissipation and reduced EMI.
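The power benefit follows from the usual CMOS dynamic-power relation, P ≈ a·C·V²·f. The values below are purely illustrative, not measurements of any device:

```c
/* CMOS dynamic power scales roughly as P = a * C * V^2 * f.
 * With activity factor a, switched capacitance C (nF) and supply V held
 * constant, power tracks clock frequency f (MHz) linearly, so a part that
 * sustains the same throughput at a lower clock dissipates proportionally
 * less dynamic power. All values here are hypothetical. */
double dynamic_power_mw(double a, double cap_nf, double v, double f_mhz) {
    return a * cap_nf * v * v * f_mhz;   /* nF * V^2 * MHz -> mW */
}
```

For example, delivering the same work at 16 MHz instead of 40 MHz cuts dynamic power by the same 2.5x ratio, other factors being equal.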
Another important consideration in embedded real-time systems is interrupt latency. Interrupt response time can be critical - for example, when shutting off a motor. The process of transferring from one stream of code to servicing an interrupt requires several steps:
- Completion of the currently executing instruction;
- Saving the CPU registers, such as PC and status, and stack pointer;
- Saving the working registers for the currently executing software routine;
- Loading the CPU’s PC and stack pointer for the new interrupt service routine.
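The steps above sum into a simple worst-case bound. The cycle counts below are invented for illustration; real parts publish their own figures:

```c
/* Worst-case interrupt latency, following the four steps above:
 * finish the longest instruction, save CPU registers, save working
 * registers, then load the PC and stack pointer for the ISR.
 * All cycle counts used here are hypothetical. */
unsigned worst_case_latency_cycles(unsigned longest_instr,
                                   unsigned save_cpu_regs,
                                   unsigned save_working_regs,
                                   unsigned load_isr_context) {
    return longest_instr + save_cpu_regs + save_working_regs
         + load_isr_context;
}

/* A 19-cycle non-interruptible divide dominates the bound:
 *   worst_case_latency_cycles(19, 3, 8, 2) -> 32 cycles
 * With an interruptible divide, the first term shrinks to the single
 * cycle needed to abandon it:
 *   worst_case_latency_cycles(1, 3, 8, 2)  -> 14 cycles */
```

The first term is why a single long, non-interruptible instruction can dominate the whole latency budget.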
Calculating the worst-case interrupt latency can be heavily weighted by the time to execute the worst-case instruction. For example, if a divide or multi-bit shift instruction takes 15, 20 or more cycles, the worst-case interrupt latency grows accordingly.
Some advanced 16-bit MCUs and most DSC products have hardware-assisted multi-bit shifts and some even provide interruptible divide instructions. These CPU attributes significantly decrease the worst-case interrupt latency.
The data types in real-time control applications vary widely among 32-bit words, 16-bit words, bytes and bit manipulation. Processors designed for embedded control typically have dedicated instructions that quickly and predictably execute bit manipulation in a single cycle.
Since RISC architectures do not typically include these instructions, a multiple-instruction, multiple-cycle sequence is required to set or clear a bit in a peripheral control register or to modify the state of an I/O pin - which is often time-critical. This type of bit-manipulation operation is typically not included in standardised benchmarks.
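The C-level pattern in question looks like this; `PORTA_REG` is a stand-in variable, not a real register map. On an MCU with single-cycle bit-set and bit-clear instructions, each function below can compile to one instruction; on a load/store RISC without them, each becomes a load, a logic operation and a store:

```c
#include <stdint.h>

/* Stand-in for a memory-mapped I/O register (hypothetical, not a
 * real device's register map). */
static volatile uint16_t PORTA_REG = 0;

#define LED_PIN (1u << 3)   /* bit 3 drives an LED in this sketch */

void led_on(void)     { PORTA_REG |=  LED_PIN; }  /* set bit 3   */
void led_off(void)    { PORTA_REG &= ~LED_PIN; }  /* clear bit 3 */
void led_toggle(void) { PORTA_REG ^=  LED_PIN; }  /* flip bit 3  */
int  led_is_on(void)  { return (PORTA_REG & LED_PIN) != 0; }
```

Whether each of these read-modify-write patterns costs one cycle or several is exactly the kind of difference that standard benchmarks tend not to expose.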
The combination of instruction-set and data-path architectures has a significant impact on the continuous throughput and real-time performance in an application. More important than the peak MIPS a processor can achieve on a few simple instructions, is the measure of continuous MIPS and the work done per instruction.
Architectural decisions are impacted by the cost and relative performance of logic and memory semiconductor technology at the time a product family is defined. Legacy and compatibility have carried those decisions forward, even though semiconductor technology advances have changed the equation.
For today’s dense semiconductor technology, the parallelism of the Harvard architecture offers many advantages. These implementations accomplish several operations per instruction at lower clock speeds, offering advantages in power dissipation and EMI performance.
Microchip Technology
www.microchip.com