A quick look back at Banias

The core technologies of the Pentium M remain unchanged in Dothan. We've already explained them in great detail but here's a quick recap for those of you who haven't read or don't remember the original article.

The Pentium M is characterized by the following 7 design features and principles:

Mid-Length pipeline

The Pentium M has a pipeline that's shorter than that of the Pentium 4 (much shorter than that of Prescott), but longer than that of the Pentium III. Intel needed a longer pipeline to ensure that higher clock speeds would be possible, but shunned the Pentium 4's extremely long pipeline as it is quite a power hog. Although extremely high clock speeds can be wonderful for performance and marketing, they are a nightmare when it comes to power consumption. The longer your pipeline, the harder you have to work to keep that pipeline filled at all times and the bigger the penalty that you pay if the pipeline is ever left idle or has to be flushed (thanks to a mispredicted branch, for example).
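To put rough numbers on that flush penalty, here is a back-of-the-envelope sketch in Python. Every figure in it is an illustrative assumption rather than an Intel number (the Pentium M's pipeline depth is undisclosed): the model simply charges a fixed number of lost cycles for each mispredicted branch, with the penalty growing as the pipeline gets longer.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once misprediction flushes are charged.

    All inputs are hypothetical workload/pipeline parameters, not measured data.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
# The only thing that differs between the two calls is the assumed flush penalty.
short_pipe = effective_cpi(1.0, 0.20, 0.05, flush_penalty_cycles=10)
long_pipe = effective_cpi(1.0, 0.20, 0.05, flush_penalty_cycles=30)
print(f"short pipeline: {short_pipe:.2f} CPI, long pipeline: {long_pipe:.2f} CPI")
```

With these made-up inputs, tripling the flush penalty takes the average cost per instruction from 1.10 to 1.30 cycles, which is the kind of loss a longer pipeline has to win back through clock speed.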

To this day, Intel has still not disclosed the number of stages in the Pentium M pipeline out of an extreme desire to protect the processor's underlying architecture. The only thing we know is that Dothan's pipeline remains unchanged from Banias; a very good thing considering the surprise we all got with Prescott.

Much of Banias (and also Dothan) remains unpatented and protected using trade secret law in order to prevent the underlying ideas behind the CPUs' design from being picked up by competitors.

Micro Ops Fusion

The Pentium M, like all of Intel's modern day microprocessors, decodes regular x86 instructions into smaller micro-ops that are the actual operations sent down the pipeline for execution. Micro Ops Fusion takes certain micro-ops and "fuses" them together so that they are sent down the pipeline together and are either executed in parallel or serially without being reordered (or separated from one another). Micro Ops Fusion can only apply to certain types of instructions, which Intel has not officially disclosed.

The benefits of Micro Ops Fusion are multi-faceted; first, you have the obvious performance improvements, but alongside them, you also have reduced power consumption, thanks to not wasting any cycles waiting for dependent micro ops to retire before working on others.

Dedicated Stack Manager

Banias' dedicated stack manager is another power saving tool integrated into the Banias architecture that is designed to manage stack pointers and other stack-related data. Remember that stacks are used to store information about the current state of the CPU, including data that cannot be kept in registers due to limits in the number of available registers; thus, a dedicated manager can help performance considerably. As usual, whenever efficiency is improved, power consumption is optimized, which is the case with Banias here as well.

High Performance Branch Predictor

Banias' branch predictor reduced mispredicted branches by around 20% compared to the Pentium III. That figure comes from SPEC CPU 2000 runs, but the improvement carries over to real-world code. The gains come from a larger branch history table (for storing data used to predict branches) and better handling of branching in loops, the latter of which is improved further in Dothan.

Pentium 4 FSB, Pentium III Execution Units

The execution back end of Banias is identical to that of the Pentium III, making the Pentium M a relatively narrow microprocessor when compared to AMD's Athlon 64 and Intel's Pentium 4. Given the low power target for Banias, this decision makes a lot of sense as it reduces power consumption and die size; but keep in mind that the lack of extreme width in the pipeline means that technologies like Hyper Threading will be kept away from the Pentium M. Instead, we can look forward to multi-core Pentium M designs, which are made somewhat easier to implement thanks to a relatively small die.

In order to keep the processor fed, however, Intel implemented the Pentium 4's 64-bit quad-pumped front side bus. Currently, the FSB clock on all Banias (and Dothan) parts is 100MHz quad-pumped (effectively, 400MHz for 3.2GB/s of bandwidth), but by the end of this year, it will move to 133MHz (effectively 533MHz).
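The bandwidth figure follows directly from the bus parameters quoted above. As a quick check, here is the arithmetic as a small Python sketch; the function name and its defaults are ours, and it uses the nominal clock values from the text with 1GB taken as 10^9 bytes.

```python
def fsb_bandwidth_gb_per_s(base_clock_mhz, transfers_per_clock=4, bus_width_bits=64):
    """Peak front side bus bandwidth in GB/s (1 GB = 10^9 bytes)."""
    transfers_per_second = base_clock_mhz * 1e6 * transfers_per_clock
    bytes_per_transfer = bus_width_bits // 8
    return transfers_per_second * bytes_per_transfer / 1e9


# 100MHz quad-pumped, 64 bits wide: the current Banias/Dothan bus
print(fsb_bandwidth_gb_per_s(100))
# 133MHz quad-pumped: the upcoming bus speed
print(fsb_bandwidth_gb_per_s(133))
```

The 100MHz case works out to the 3.2GB/s quoted above, and the move to a 133MHz clock raises the peak to roughly 4.26GB/s.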

Power Saving Cache

Banias (and Dothan) implement an 8-way set associative L2 cache, which is not uncommon amongst modern day microprocessors. Compared to a direct-mapped cache of the same size, a set associative cache increases hit rate (the likelihood that something you want will be found in cache) at the expense of increased cache latency. Latency goes up because once the set that could hold the data is located, the cache must still determine which "way" of that set actually contains it and select that way - an incorrect determination will further increase cache latency.
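For readers who want to see the hit-rate benefit of associativity in action, here is a toy model in Python. It is a minimal sketch and not a description of Intel's implementation: the sizes are tiny, the class and helper names are ours, and the least-recently-used replacement policy is an assumption made purely for illustration.

```python
class SetAssociativeCache:
    """Toy cache model: tracks only which tags are resident, not the data."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # Each set holds up to `ways` tags, ordered least to most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is filled on a miss)."""
        line_number = address // self.line_size
        set_index = line_number % self.num_sets
        tag = line_number // self.num_sets
        lines = self.sets[set_index]
        if tag in lines:                 # the tag is compared against every way
            lines.remove(tag)
            lines.append(tag)            # mark as most recently used
            return True
        if len(lines) == self.ways:      # set is full: evict the LRU way (assumed policy)
            lines.pop(0)
        lines.append(tag)
        return False


def count_hits(cache, addresses):
    return sum(cache.access(a) for a in addresses)


# Two addresses that collide in a direct-mapped cache of eight 64-byte lines.
pattern = [0, 512, 0, 512, 0, 512]
direct_mapped = SetAssociativeCache(num_sets=8, ways=1, line_size=64)
eight_way = SetAssociativeCache(num_sets=1, ways=8, line_size=64)
print(count_hits(direct_mapped, pattern), count_hits(eight_way, pattern))  # prints "0 4"
```

Both caches have the same capacity, but the direct-mapped one keeps evicting one line to make room for the other, while the 8-way cache holds both. The price is the `tag in lines` step: every way in the set has to be checked before the right one can be selected.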

In order to optimize the 8-way set associative cache for low power consumption, each "way" is further divided into quadrants. Once a "way" is selected, the L2 controller determines which quadrant holds the needed data and activates only that part of the cache. With such a large cache, it is important to save as much power here as possible.
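To get a feel for how much of the cache stays dark on each access, here is a small accounting sketch in Python. Intel has not published how a quadrant is chosen, so the mapping below (each way's sets split into four contiguous blocks) and the 1024-sets-per-way figure are purely hypothetical; only the 8 ways and 4 quadrants per way come from the description above.

```python
WAYS = 8                 # from the article: 8-way set associative L2
QUADRANTS_PER_WAY = 4    # from the article: each way is split into quadrants


def quadrant_for(set_index, sets_per_way):
    """Hypothetical mapping: a way's sets are split into four contiguous blocks."""
    sets_per_quadrant = sets_per_way // QUADRANTS_PER_WAY
    return set_index // sets_per_quadrant


def active_fraction(selected_ways=1, quadrants_enabled=1):
    """Fraction of the whole data array that is powered up for one access."""
    return (selected_ways * quadrants_enabled) / (WAYS * QUADRANTS_PER_WAY)


# sets_per_way=1024 is an illustrative value, not a Pentium M specification.
print(quadrant_for(set_index=700, sets_per_way=1024))   # quadrant 2 of 0..3
print(active_fraction())                                # one quadrant of one way
print(active_fraction(quadrants_enabled=4))             # a whole way, no quadrant gating
```

Under these assumptions, a single access lights up 1/32 of the data array instead of the 1/8 that activating a full way would need. This counts array area only and says nothing about the actual power figures, which depend on circuit details Intel has not disclosed.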

Artificially Limited Clock Speed Design

Generally speaking, when you design a microprocessor, you want it to run as fast as possible. Normally, there's an initial idea of target clock speed and once the chip is actually back from the plant, it's not uncommon to find parts of the chip that run slower than your clock target, while others run faster (sometimes much faster). In desktop microprocessor design, the goal is to speed up the slowest parts of the chip (or critical paths as they are known among chip designers) and tweak the chip and the manufacturing process to run as fast as the fastest parts.

With Banias, Intel took a different approach. The design team set a clock speed target, and if any part of the chip exceeded that clock speed target, then that part of the chip had to be slowed down. The idea was that if a chip can run faster than its target, then you're wasting power - a luxury that isn't present in mobile chip design. The upside to this design methodology is that power consumption is further reduced, and when coupled with the other power-saving advancements that we've talked about, we're dealing with a fairly low power chip. The downside is that each generation of the Pentium M has a very well defined clock speed wall, and the only way over that wall is to use a smaller, cooler and faster manufacturing process. This is why you will see Pentium M ramp much slower in clock speed than any other Intel chip and why you will see clock speed bumps coincide with new manufacturing processes. It also means that if Intel ever has yield problems with a new manufacturing process (which isn't uncommon), the Pentium M will suffer. It's a risky move, but it's the type of move that is necessary to truly build a good mobile CPU.

28 Comments

  • nserra - Wednesday, July 21, 2004

    #3 I agree. Banias is a better chip. It would be nice to see Banias at 0.09 with 1MB cache, would be smaller, cheaper and a lot more chips per wafer, but Intel isn't interested in these yet, at least maybe a Celeron line when Banias is phased out.

    Isn't Ati 9100 chipset compatible with Banias and P4 compatible? A bios change or something more wouldn’t do the trick?
  • Matthew Daws - Wednesday, July 21, 2004

    Interesting read. Some comments though: the Dothan has a HUGE L2 cache, which people, in a thread over at Ace's, suggest gives it a large edge in many applications (there were complaints that it excels in SpecInt simply because of this, and with very large datasets, performance rapidly tails off). Nothing wrong with that, but it might explain why the Dothan has issues with media-encoding and the like, where the volume of data is so large that the size of the L2 cache becomes less important.

    Also, the test was a little bit of comparing apples to oranges. I see why this was done: to try and give a laptop-like playing field. But Dothan is almost certainly highly optimised to run with, say, single channel, slow RAM. By forcing this on Athlon64 and Pentium 4 desktops, which are optimised for fatter memory channels, you are slightly crippling performance. As such, it's probably a fair test for laptop performance, but probably doesn't indicate how a Dothan-like desktop chip would hold up. This might explain how well it holds its own against the Athlon64 and beats the P4 in many tests.

    Anyhow, good to see a great test of Dothan! Cheers, --Matt
  • xsilver - Wednesday, July 21, 2004

    Just a question... I thought the new successor to the prescott was going to be the derivative of the dothan -- eg merging back the mobile and desktop solutions? I'm wrong right? So what exactly are they going to replace prescott with?
  • morcegovermelho - Wednesday, July 21, 2004

    Where are the Athlon 64 3000+ scores in Sysmark 2004? (page 8)
  • DigitalDivine - Wednesday, July 21, 2004

    interesting to see that we are going back to the old days when intel and amd matches each other clock for clock. a 1.8ghz centrino about the same as a 1.8ghz athlon64.

    still another note that the p4 is still king in media encoding.

    overall a nice review.

  • adntaylor - Wednesday, July 21, 2004

    Excellent chip. However, it's bloody expensive. At $637 it is exactly the same price as a 3.6GHz Prescott 560 or right between Athlon 64 3500+ and 3700+, so it's not a good choice for the desktop.

    Also Anand's comment "...it's faster and uses less power than Banias" is not quite accurate.

    Under full CPU load, yes this is certainly true but, as you'd expect from 90nm, the leakage power has shot right up, meaning that in its low power states, the CPU is draining a great deal more power than Banias. How much time does a laptop spend idling relative to flat out? My guess: quite a bit. I'd still choose a Banias in my laptop for that reason alone.

    Still good article, and I'd love (from a purely academic point of view) to see what this baby could do when coupled up with a dual-channel memory interface and a good desktop chipset!
  • sprockkets - Wednesday, July 21, 2004

    Probably the best heat vs. performance processor out there, at least for x86. Why Intel is dumb to shove Prescotts which use 5x more power for the same performance is beyond me; I would get this for a desktop quicklike.

    Of course, we have Intel's TDP instead of what the processor may actually put out on worst case conditions. That and we don't know what the Athlon 64 at 90nm will put out, at least at 2.0ghz, since all they are doing is a few tweaks to the core (isn't it smaller than 100mm?) That and I guess if you really meant unpatented, that was what to make sure no one really knows why it's so great?
  • mkruer - Wednesday, July 21, 2004

    I’m glad that Intel seems to be moving in the right direction with the Dothan, but I do have a question. Why on half the benchmarks are the Athlon benchmarks missing?
