Intel's 90nm Pentium M 755: Dothan Investigated
by Anand Lal Shimpi on July 21, 2004 12:05 AM EST
A quick look back at Banias
The core technologies of the Pentium M remain unchanged in Dothan. We've already explained them in great detail, but here's a quick recap for those of you who haven't read or don't remember the original article.
The Pentium M is characterized by the following 7 design features and principles:
Mid-Length pipeline
The Pentium M has a pipeline that's shorter than that of the Pentium 4 (much shorter than that of Prescott), but longer than that of the Pentium III. Intel needed a longer pipeline to ensure that higher clock speeds would be possible, but shunned the Pentium 4's extremely long pipeline as it is quite a power hog. Although extremely high clock speeds can be wonderful for performance and marketing, they are a nightmare when it comes to power consumption. The longer your pipeline, the harder you have to work to keep that pipeline filled at all times and the bigger the penalty that you pay if the pipeline is ever left idle or has to be flushed (thanks to a mispredicted branch, for example).
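To put a rough number on that flush penalty, here is a minimal back-of-the-envelope sketch. The function name `effective_cpi` and every figure in it are our own illustrative assumptions, not Intel data; the flush penalty is simply assumed to grow with pipeline depth, and no stage count for the Pentium M is implied.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once misprediction flushes are counted.

    Every mispredicted branch costs roughly one pipeline refill, so the
    extra cost scales with how many cycles it takes to refill the pipeline.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
# The penalties below are made-up refill costs for a shorter and a longer pipeline.
shorter = effective_cpi(1.0, 0.20, 0.05, flush_penalty_cycles=10)
longer = effective_cpi(1.0, 0.20, 0.05, flush_penalty_cycles=30)
print(f"shorter pipeline: {shorter:.2f} CPI, longer pipeline: {longer:.2f} CPI")
```

With the same workload and the same predictor, the longer pipeline in this toy model loses more cycles per instruction, which is the cost a higher clock speed then has to win back.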
To this day, Intel has still not disclosed the number of stages in the Pentium M pipeline out of an extreme desire to protect the processor's underlying architecture. The only thing we know is that Dothan's pipeline remains unchanged from Banias; a very good thing considering the surprise we all got with Prescott.
Much of Banias (and also Dothan) remains unpatented and protected using trade secret law in order to prevent the underlying ideas behind the CPUs' design from being picked up by competitors.
Micro Ops Fusion
The Pentium M, like all of Intel's modern day microprocessors, decodes regular x86 instructions into smaller micro-ops that are the actual operations sent down the pipeline for execution. Micro Ops Fusion takes certain micro-ops and "fuses" them together so that they are sent down the pipeline together and are either executed in parallel or serially without being reordered (or separated from one another). Micro Ops Fusion can only apply to certain types of instructions, which Intel has not officially disclosed.
The benefits of Micro Ops Fusion are multi-faceted; first, you have the obvious performance improvements, but alongside them, you also have reduced power consumption, thanks to not wasting any cycles waiting for dependent micro ops to retire before working on others.
Dedicated Stack Manager
Banias' dedicated stack manager is another power-saving feature, designed to manage stack pointers and other stack-related data. Remember that stacks are used to store information about the current state of the CPU, including data that cannot be kept in registers due to limits in the number of available registers; thus, a dedicated manager can help performance considerably. As usual, whenever efficiency is improved, power consumption is optimized, which is the case with Banias here as well.
High Performance Branch Predictor
Banias' branch predictor reduced mispredicted branches by around 20% when compared to the Pentium III (measured on SPEC CPU 2000, although the improvements carry over to real world code). The improvements are thanks to a larger branch history table (for storing data used to predict branches) and better handling of branching in loops, the latter of which is further improved in Dothan.
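To see why loops deserve special treatment, consider a minimal sketch that compares a textbook 2-bit saturating counter with a simple trip-count loop predictor. Neither is Intel's actual design, which has not been disclosed; the function names and the branch trace are our own assumptions for illustration.

```python
def run_two_bit(outcomes):
    """Count mispredictions of one 2-bit saturating counter.

    States 0..3; states 2 and 3 predict "taken".
    """
    state, misses = 2, 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


def run_loop_predictor(outcomes):
    """Count mispredictions of a predictor that learns the loop trip count.

    It predicts "taken" until the learned number of taken iterations has
    been seen, then predicts the loop exit.
    """
    learned, count, misses = None, 0, 0
    for taken in outcomes:
        predict = True if learned is None else count < learned
        if predict != taken:
            misses += 1
        if taken:
            count += 1
        else:
            learned, count = count, 0  # remember trip count, restart for next run
    return misses


# Hypothetical trace: a loop branch taken 7 times, then falling through,
# with the whole loop executed 100 times.
trace = ([True] * 7 + [False]) * 100
print(run_two_bit(trace), run_loop_predictor(trace))
```

On this trace, the plain counter gets the exit wrong every time the loop finishes, while the trip-count predictor misses only the first exit. Real code is messier than this, but it shows where a loop-aware predictor finds its savings.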
Pentium 4 FSB, Pentium III Execution Units
The execution back end of Banias is identical to that of the Pentium III, making the Pentium M a relatively narrow microprocessor when compared to AMD's Athlon 64 and Intel's Pentium 4. Given the low power target for Banias, this decision makes a lot of sense as it reduces power consumption and die size; but keep in mind that the lack of extreme width in the pipeline means that technologies like Hyper Threading will be kept away from the Pentium M. Instead, we can look forward to having multi-core Pentium M designs, which are made somewhat easier to implement, thanks to a relatively small die.
In order to keep the processor fed, however, Intel implemented the Pentium 4's 64-bit quad-pumped front side bus. Currently, the FSB clock on all Banias (and Dothan) parts is 100MHz quad-pumped (effectively, 400MHz for 3.2GB/s of bandwidth), but by the end of this year, it will move to 133MHz (effectively 533MHz).
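The bandwidth figure follows directly from the bus parameters. Here is a quick sketch of the arithmetic; the function name `fsb_bandwidth_bytes_per_s` is our own, and the 133MHz result uses the rounded clock quoted above, so treat it as approximate.

```python
def fsb_bandwidth_bytes_per_s(base_clock_hz, transfers_per_clock=4, bus_width_bits=64):
    """Peak front side bus bandwidth: clock x transfers per clock x bytes per transfer."""
    return base_clock_hz * transfers_per_clock * bus_width_bits // 8


current = fsb_bandwidth_bytes_per_s(100_000_000)   # 100MHz quad-pumped
upcoming = fsb_bandwidth_bytes_per_s(133_000_000)  # 133MHz quad-pumped (rounded clock)
print(current / 1e9, "GB/s now,", upcoming / 1e9, "GB/s after the move to 533MHz")
```

So the move to the 533MHz effective FSB raises peak bandwidth from 3.2GB/s to roughly 4.3GB/s.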
Power Saving Cache
Banias (and Dothan) implement an 8-way set associative L2 cache, which is not uncommon amongst modern day microprocessors. A set associative cache increases hit rate (likelihood that something you want will be found in cache) at the expense of increased cache latency. Cache latency is increased because once the set that may hold the data is located, the "way" in which the data resides must still be determined and selected - an incorrect determination will further increase cache latency.
In order to optimize the 8-way set associative cache for low power consumption, each "way" is further divided into quadrants. Once a "way" is selected, the L2 controller determines in which quadrant the needed data resides and activates only that part of the cache. With such a large cache, it is important to save power here as much as possible.
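The lookup described above can be sketched as a toy model. Only the 8 ways and the 4 quadrants per way come from the description; the line size, the set count, the round-robin replacement and the way a quadrant is picked from the set index are all our own assumptions, since Intel has not published those details.

```python
LINE_BYTES = 64   # assumed cache line size
NUM_SETS = 4096   # assumed: a 2MB cache / (64 bytes x 8 ways)
NUM_WAYS = 8      # 8-way set associative, as described
QUADRANTS = 4     # each way is split into quadrants, as described


def split_address(addr):
    """Break an address into (tag, set index, byte offset within the line)."""
    offset = addr % LINE_BYTES
    set_index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, set_index, offset


class ToyL2:
    """Tag-only model of the cache: tracks hits, ways and quadrants, no data."""

    def __init__(self):
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.next_victim = [0] * NUM_SETS  # round-robin replacement (assumption)

    def access(self, addr):
        """Return (hit, way, quadrant) for one access, filling the line on a miss."""
        tag, set_index, _ = split_address(addr)
        ways = self.tags[set_index]
        hit = tag in ways
        if hit:
            way = ways.index(tag)
        else:
            way = self.next_victim[set_index]
            ways[way] = tag
            self.next_victim[set_index] = (way + 1) % NUM_WAYS
        # Assumed mapping: the top bits of the set index choose the quadrant.
        quadrant = set_index * QUADRANTS // NUM_SETS
        return hit, way, quadrant


# Only one quadrant of one way needs to be powered for the data access.
ACTIVE_FRACTION = 1 / (NUM_WAYS * QUADRANTS)

cache = ToyL2()
print(cache.access(0x1000), cache.access(0x1000), ACTIVE_FRACTION)
```

In this model, only 1/32 of the data array is active for any given access, which gives a feel for how much power the quadrant scheme can save on a large L2.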
Artificially Limited Clock Speed Design
Generally speaking, when you design a microprocessor, you want it to run as fast as possible. Normally, there's an initial idea of target clock speed and once the chip is actually back from the plant, it's not uncommon to find parts of the chip that run slower than your clock target, while others run faster (sometimes much faster). In desktop microprocessor design, the goal is to speed up the slowest parts of the chip (or critical paths as they are known among chip designers) and tweak the chip and the manufacturing process to run as fast as the fastest parts.
With Banias, Intel took a different approach. The design team set a clock speed target, and if any part of the chip exceeded that clock speed target, then that part of the chip had to be slowed down. The idea was that if a chip can run faster than its target, then you're wasting power - a luxury that isn't present in mobile chip design. The upside to this design methodology is that power consumption is further reduced, and when coupled with the other power-saving advancements that we've talked about, we're dealing with a fairly low power chip. The downside is that each generation of the Pentium M has a very well defined clock speed wall, and the only way over that wall is to use a smaller, cooler and faster manufacturing process. This is why you will see Pentium M ramp much slower in clock speed than any other Intel chip and why you will see clock speed bumps coincide with new manufacturing processes. It also means that if Intel ever has yield problems with a new manufacturing process (which isn't uncommon), the Pentium M will suffer. It's a risky move, but it's the type of move that is necessary to truly build a good mobile CPU.