Longer than a Pentium III, Shorter than a Pentium 4

The first thing you'll notice about our coverage of Banias' architecture is that the detail we can provide is sketchy at best. Intel is guarding much of what went into Banias very carefully; so carefully, in fact, that there are technologies in Banias that Intel is afraid to patent, for fear that the competition would pick up on them through the patent filings.

You get your first dose of Intel's closely guarded approach to Banias with the talk of its integer/floating point pipelines. The chip has a longer pipeline than the Pentium III but a shorter one than the Pentium 4. The reason is simple: the Pentium III's architecture topped out at just above 1.20GHz on a 0.13-micron process, and for Banias to fulfill Intel's desire for a high-performing mobile CPU, it would need a higher clock speed. At the same time, remember our earlier discussion about the Pentium 4's pipeline being too long for the good of a mobile CPU. The end result? Something in between the Pentium III's 10-stage pipeline and the Pentium 4's 20-stage pipeline.

Intel wouldn't reveal the exact number of stages, nor what the individual stages are responsible for, but over time we will probably come across this information. For the purposes of this article, just know that the pipeline is longer than the P6's and shorter than NetBurst's.

Remember that one of the downsides to having a long pipeline is the penalty incurred for a mispredicted branch. As we've discussed in our articles on the Pentium 4's NetBurst architecture, one of the approaches to improving superscalar microprocessor performance is to predict the path taken in a branch in the code being executed (e.g. choosing the outcome of an if-then statement without knowing whether the 'if' condition can be fulfilled).

Generally speaking, most branches can be correctly predicted, but when a branch is mispredicted (predicted taken when it isn't, or the other way around), performance suffers tremendously. In the case of a desktop CPU like the Pentium 4, a branch mispredict means that the entire pipeline must be flushed and execution must start over down the correct path, which means we've just wasted a good number of precious clock cycles. For a mobile CPU, the process is the same, but now we're not only wasting clock cycles, we're also wasting battery power, which is a limited resource in the mobile world. Now can you begin to understand why a longer pipeline is undesirable in a mobile CPU?
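To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is a toy model, not a description of any real chip: it assumes a flush costs about one cycle per pipeline stage, and the branch frequency and mispredict rate are made-up illustrative values. Only the 10-stage and 20-stage figures come from the discussion above.

```python
def wasted_cycles_per_instruction(pipeline_stages, branch_fraction, mispredict_rate):
    """Average cycles lost to pipeline flushes per instruction (toy model)."""
    # Assumption: a full flush costs roughly one cycle per pipeline stage.
    flush_penalty = pipeline_stages
    return branch_fraction * mispredict_rate * flush_penalty


# Illustrative assumptions, not measured values.
BRANCH_FRACTION = 0.20   # share of instructions that are branches
MISPREDICT_RATE = 0.05   # share of branches that are mispredicted

ten_stage = wasted_cycles_per_instruction(10, BRANCH_FRACTION, MISPREDICT_RATE)
twenty_stage = wasted_cycles_per_instruction(20, BRANCH_FRACTION, MISPREDICT_RATE)

print(f"10-stage pipeline: {ten_stage:.2f} wasted cycles per instruction")
print(f"20-stage pipeline: {twenty_stage:.2f} wasted cycles per instruction")
```

Under these assumptions, doubling the pipeline depth doubles the cycles thrown away on flushes, and on a notebook every one of those cycles also costs battery power. The only ways to claw that back are a shorter pipeline or fewer mispredictions.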

By going with a pipeline that's longer than the Pentium III's, the design team took on the burden of making sure that Banias doesn't suffer as much from a mispredicted branch. One way of reducing the penalty of a mispredicted branch is to use a trace cache, just like in the Pentium 4. A trace cache stores decoded micro-ops in their sequence of execution, meaning that in the event of a branch mispredict, the CPU can start later in the pipeline instead of having to go back to square one. The problem with a trace cache is that it eats up quite a few gates and is very power hungry, two things that kept it out of the Banias design.

Without a trace cache, the design team was forced to develop a more accurate branch predictor unit for the Banias core. Although the details are beyond the scope of this article, Banias was outfitted with a branch predictor significantly superior to the one in the Pentium III. The end result was a reduction in mispredicted branches of around 20%.
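Intel hasn't said how the Banias predictor works, so the following is only a sketch of the general idea that a smarter predictor mispredicts less. It uses the textbook saturating-counter scheme, which is not Banias' actual design, and a hypothetical loop branch that is taken nine times and then falls through once. The function name and the test pattern are our own inventions for illustration.

```python
def count_mispredictions(outcomes, max_state):
    """Simulate a saturating-counter predictor for one branch.

    max_state=1 models a 1-bit predictor (repeat the last outcome);
    max_state=3 models the classic 2-bit saturating counter.
    Textbook scheme only; NOT the undisclosed Banias predictor.
    """
    state = max_state                  # start out strongly "taken"
    threshold = (max_state + 1) // 2   # predict taken when state >= threshold
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            misses += 1
        # Nudge the counter toward the actual outcome, saturating at the ends.
        if taken:
            state = min(max_state, state + 1)
        else:
            state = max(0, state - 1)
    return misses


# Hypothetical loop branch: taken 9 times, then not taken, repeated 100 times.
loop_branch = ([True] * 9 + [False]) * 100

one_bit = count_mispredictions(loop_branch, max_state=1)
two_bit = count_mispredictions(loop_branch, max_state=3)

print(f"1-bit predictor: {one_bit} mispredictions out of {len(loop_branch)}")
print(f"2-bit predictor: {two_bit} mispredictions out of {len(loop_branch)}")
```

On this pattern the 2-bit counter mispredicts about half as often as the 1-bit one, because a single loop exit isn't enough to flip its prediction. Real predictors are far more elaborate, but the principle is the same: every misprediction avoided is a pipeline flush, and the power that goes with it, saved.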


8 Comments


  • zigCorsair - Wednesday, July 14, 2004

    I thought it was a very informative article. Of course, I'll be upset if it's biased, but being a master's student in CS, many of the exact details I was looking for were in here, and for that I say thank you.
  • Zebo - Monday, May 10, 2004

    I don't see whats so impressive. An athlon mobile 2600/2800 xp 35W version, which runs ~2000Mhz will kill these. To little to late.
  • Anonymous User - Wednesday, September 10, 2003

    how the hell could this be a balanced and informative article when in their own analysis they ignored their own data?

    There is no mention of the anamolous nature of the BAPCO test..absolutely NOTHING...

    Its enough for me to question the competency of this site...and even to the point where I suspect that certain unethical compromises have been made.
  • Anonymous User - Wednesday, September 10, 2003

    Yeah, I agree with Sprockkets... same reason Athlon XP loses to the P4 in this benchmark... someone was trying to make the P4 look better, and everything else look worse. Now all the sudden, this new great CPU is getting it's but kicked because of all the P4 optimizations (and probably non-P4 deoptomizations).
  • sprockkets - Tuesday, September 9, 2003

    I wonder why the P4 trashes the PM on Content Creation Performance and nothing else? Maybe it's the stupid skewing toward the P4. Why else would it lose here and kick butt everywhere else? www.theinquirer.net has an article which brought this to readers attention.
  • Anonymous User - Thursday, August 21, 2003

    "Without a trace cache, the design team was forced to develop a more accurate branch predictor unit for the Banias core. Although beyond the scope of this article, Banias was outfitted with a branch predictor significantly superior to what was in the Pentium III. The end result was a reduction of mispredicted branches by around 20%."

    Wouldn't he mean that the branch predictor was superior to the P4?
  • Anonymous User - Tuesday, August 19, 2003

    looks good
  • Anonymous User - Friday, August 8, 2003

    An outstanding well balanced article, after this read I feel I really know about Centrino. Thanks
