Intel's Centrino CPU (Pentium-M): Revolutionizing the Mobile World
by Anand Lal Shimpi on March 12, 2003 6:10 AM EST
Longer than a Pentium III, Shorter than a Pentium 4
The first thing that you'll notice about our coverage of the Banias architecture is that the amount of detail we can provide is sketchy at best. Intel is guarding a great deal of what went into Banias very carefully; so carefully, in fact, that there are technologies in Banias that Intel is afraid to patent, because of the danger of the competition picking up on them through the patent filing.
You get your first dose of Intel's closely guarded nature with regard to Banias in the talk of its integer/floating point pipelines. The chip itself has a longer pipeline than the Pentium III, but a shorter pipeline than the Pentium 4. The reason for this is simple: the Pentium III's architecture ended up topping out at just above 1.20GHz on a 0.13-micron process, but in order for Banias to fulfill Intel's desires for a high performing mobile CPU, it would need a higher clock speed. At the same time, remember our earlier discussion about the Pentium 4's pipeline being too long for the good of a mobile CPU. The end result? Something in between the Pentium III's 10-stage pipeline and the Pentium 4's 20-stage pipeline.
Intel wouldn't reveal the exact number of stages, nor what the individual stages are responsible for, but over time we will probably come across this information. For the purposes of this article, just know that the pipeline is longer than the P6 and shorter than NetBurst.
Remember that one of the downsides to having a long pipeline is the penalty incurred for a mispredicted branch. As we've discussed in our articles on the Pentium 4's NetBurst architecture, one of the approaches to improving superscalar microprocessor performance is to predict the path taken in a branch in the code being executed (e.g. choosing the outcome of an if-then statement without knowing whether the 'if' condition can be fulfilled).
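To make the idea of branch prediction concrete, here is a minimal sketch of a textbook 2-bit saturating counter predictor. This is a generic teaching example and not the predictor used in Banias, the Pentium III or the Pentium 4, whose designs are not described here; the class name, table size and branch address are all hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter predictor (illustrative only)."""

    def __init__(self, table_size=1024):
        # table_size is an arbitrary assumption, not a real CPU parameter.
        self.table_size = table_size
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        # Every counter starts at 1 ("weakly not taken").
        self.counters = [1] * table_size

    def predict(self, branch_address):
        """Return True if the branch is predicted taken."""
        return self.counters[branch_address % self.table_size] >= 2

    def update(self, branch_address, taken):
        """Nudge the counter toward the real outcome, saturating at 0 and 3."""
        slot = branch_address % self.table_size
        if taken:
            self.counters[slot] = min(3, self.counters[slot] + 1)
        else:
            self.counters[slot] = max(0, self.counters[slot] - 1)


def count_mispredicts(predictor, branch_address, outcomes):
    """Run a sequence of real branch outcomes through the predictor."""
    mispredicts = 0
    for taken in outcomes:
        if predictor.predict(branch_address) != taken:
            mispredicts += 1
        predictor.update(branch_address, taken)
    return mispredicts


# A loop branch that is taken 9 times and then falls through once,
# with the whole loop executed 10 times (100 branches in total).
loop_outcomes = ([True] * 9 + [False]) * 10
mispredicts = count_mispredicts(TwoBitPredictor(), 0x400, loop_outcomes)
print(mispredicts, "mispredicts out of", len(loop_outcomes))
```

Even this very simple scheme guesses the loop branch correctly most of the time; it mainly misses on each loop exit, which illustrates why most branches can be predicted and why the remaining misses are the ones that matter.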
Generally speaking, most branches can be correctly predicted, but when a branch is incorrectly predicted as taken (or not taken), performance suffers tremendously. In the case of a desktop CPU like the Pentium 4, a branch mispredict means that the entire pipeline must be flushed and execution will start over again, which means we've just wasted a good number of precious clock cycles. For a mobile CPU, the process is the same but now we're not only wasting clock cycles, we're also wasting battery power, which is a limited resource in the mobile world. Now can you begin to understand why a longer pipeline is undesirable in a mobile CPU?
By going with a pipeline that's longer than the Pentium III's, the design team took it upon itself to make sure that Banias doesn't suffer as much from a mispredicted branch. One way of reducing the penalty of a mispredicted branch is to use a trace cache, just like in the Pentium 4. A trace cache stores decoded micro-ops in their sequence of execution, meaning that in the event of a branch mispredict, the CPU can start later in the pipeline instead of having to go back to square one. The problem with a trace cache is that it eats up quite a few gates and is very power hungry, two things that kept it out of the Banias design.
Without a trace cache, the design team was forced to develop a more accurate branch predictor unit for the Banias core. Although the details are beyond the scope of this article, Banias was outfitted with a branch predictor significantly superior to the one in the Pentium III. The end result was a reduction in mispredicted branches of around 20%.
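A rough back-of-the-envelope model shows why pipeline length matters here. The sketch below assumes, as a deliberate simplification, that each mispredict costs one cycle per pipeline stage; real penalties depend on where in the pipeline the branch is resolved. The branch count and the mispredict rate are made-up illustrative values, and only the 10-stage and 20-stage figures come from the discussion above.

```python
def wasted_cycles(branches, mispredicts_per_1000, flush_penalty_cycles):
    """Cycles thrown away refilling the pipeline after mispredicted branches."""
    mispredicted = branches * mispredicts_per_1000 // 1000
    return mispredicted * flush_penalty_cycles


BRANCHES = 1_000_000        # assumed workload size, for illustration only
MISPREDICTS_PER_1000 = 50   # assumed 5% mispredict rate, not a measured figure

# Simplifying assumption: flush penalty in cycles equals pipeline depth.
p3_waste = wasted_cycles(BRANCHES, MISPREDICTS_PER_1000, 10)  # Pentium III
p4_waste = wasted_cycles(BRANCHES, MISPREDICTS_PER_1000, 20)  # Pentium 4

print("10-stage pipeline:", p3_waste, "wasted cycles")
print("20-stage pipeline:", p4_waste, "wasted cycles")
```

Under these assumptions, doubling the pipeline depth doubles the cycles (and, on a notebook, the battery energy) spent on work that is simply thrown away.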
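To get a feel for what a reduction of around 20% buys, the sketch below asks how much deeper than 10 stages a pipeline could be before a 20% drop in mispredicts stops covering the extra flush cost. It is a crude illustration only: the baseline mispredict rate is invented, the flush penalty is assumed to equal the pipeline depth, the 20% figure is assumed to be relative to the Pentium III, and the result says nothing about the actual Banias stage count, which Intel has not disclosed.

```python
def flush_cycles_per_1000_branches(mispredicts_per_1000, pipeline_depth):
    """Flush cost per 1,000 branches, assuming penalty == pipeline depth."""
    return mispredicts_per_1000 * pipeline_depth


BASELINE_MISPREDICTS = 50   # assumed mispredicts per 1,000 branches
BASELINE_DEPTH = 10         # Pentium III pipeline depth quoted in the article

# Roughly 20% fewer mispredicted branches than the baseline.
improved_mispredicts = BASELINE_MISPREDICTS * 80 // 100

baseline_cost = flush_cycles_per_1000_branches(BASELINE_MISPREDICTS, BASELINE_DEPTH)

# Depths between 10 and 20 stages where the better predictor still keeps
# the total flush cost at or below the baseline.
break_even_depths = [
    depth
    for depth in range(BASELINE_DEPTH, 21)
    if flush_cycles_per_1000_branches(improved_mispredicts, depth) <= baseline_cost
]
deepest_break_even = max(break_even_depths)

print("Baseline flush cost per 1,000 branches:", baseline_cost)
print("Deepest pipeline that breaks even:", deepest_break_even, "stages")
```

In this simplified model the better predictor pays for only a couple of extra stages, which suggests why prediction accuracy and pipeline length have to be balanced against each other in a power-constrained design.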