Historically, mobile CPUs were designed as derivatives of their desktop counterparts. You'd usually cut down on the cache, lower the clock speed and voltage, and maybe tweak the package a bit, and you'd have your mobile CPU. For years, this process of trimming the fat off of desktop (and sometimes server) CPUs to make mobile versions was the industry norm - but then Timna came along.

Timna was supposed to be Intel's highly integrated CPU for sub-$600 PCs, which were unheard of at the time. Timna featured an on-die memory controller (albeit one built for RDRAM), an integrated North Bridge and an integrated graphics core. The Timna design was very power-optimized and very cost-optimized. In fact, a lot of the advancements developed by the Timna team were later put into use in other Intel CPUs simply because they were better and cheaper ways of doing things (e.g. some CPU packaging enhancements used in the Pentium 4 were originally developed for Timna). What set Timna apart from Intel's other processors was that it was designed in Israel by a team completely separate from those who handled the desktop Pentium 4 designs. Intel wanted a fresh approach for Timna, and that's exactly what they got. Unfortunately, after the chip was completed, the market looked bleak for a sub-$600 computer, so the chip was scrapped and the team was reassigned to a new project a month later.

The new project was yet another "out-of-the-box" project called Banias. The idea behind Banias was to design a mobile processor from the ground up; instead of taking a higher end CPU and doing your best to cut down its power usage, you started with a low power consumption target and then built the best CPU that you could from there. With a chip on their shoulder (no pun intended) and a bone to pick with Intel management, the former Timna team did the best that they could on this new chip - and the results were impressive.

Banias, later called the Pentium M, not only proved to be an extremely powerful mobile CPU, but was also one of Intel's most on-time projects - missing the team's target deadline by less than 5 days. For a multi-year project, being off by 5 days is nothing short of impressive - and so was the CPU's architecture. While many will call the Pentium M a hybrid of the Pentium III and the Pentium 4, it is far from it. Intel knew that the Pentium 4 wasn't a low-power architecture. The Pentium 4's trace cache, double-pumped ALUs, extremely long pipeline and resulting high frequency operation were horrendous for low power mobile systems. So, as a basis for a mobile chip, the Pentium 4 was out of the question. Instead, Intel borrowed the execution core of the Pentium III; far from the most powerful execution core, but a good starting point for the Pentium M. Remember that the Pentium III's execution core was partly at fault for AMD's early successes with the Athlon, so performance-wise, Intel would have their work cut out for them.

Taking the Pentium III's execution units, Intel went to town on the Pentium M architecture. They implemented an extremely low power, but very large L2 cache - initially at 1MB and later growing to 2MB in the 90nm Pentium M. The large L2 cache plays a very important role in the Pentium M architecture, as it highlights a very bold design decision - to keep the Pentium M pipeline filled at all costs. In order to reach higher frequencies, Intel had to lengthen the pipeline of the Pentium M from that of the Pentium III. The problem with a lengthened pipeline is that any bubbles in the pipe (wasted cycles) are wasted power, and the more of them you have, the more power you're wasting. So Intel outfitted the Pentium M with a very large, very low latency L2 cache to keep that pipeline full. Think of it like placing a really big supermarket right next to your home instead of having a smaller one next to your home or a large one 10 miles away - there are obvious tradeoffs, but if your goal is to remain efficient, the choice is clear.
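
To put rough numbers on the supermarket analogy, here is a minimal sketch using the standard average-access-time formula (hit time plus miss rate times miss penalty). The latencies and miss rates below are hypothetical values picked purely for illustration; they are not measured Pentium M figures.

```python
def average_access_cycles(l2_hit_cycles, l2_miss_rate, memory_penalty_cycles):
    """Average cycles to service a request that has already missed the L1 cache."""
    return l2_hit_cycles + l2_miss_rate * memory_penalty_cycles


# Hypothetical configurations (assumed numbers, not Intel's):
# a bigger cache misses less often, a closer cache answers faster.
configs = {
    "small, fast L2": dict(l2_hit_cycles=10, l2_miss_rate=0.20, memory_penalty_cycles=200),
    "large, slow L2": dict(l2_hit_cycles=25, l2_miss_rate=0.05, memory_penalty_cycles=200),
    "large, fast L2": dict(l2_hit_cycles=10, l2_miss_rate=0.05, memory_penalty_cycles=200),
}

results = {name: average_access_cycles(**cfg) for name, cfg in configs.items()}

for name, cycles in results.items():
    print(f"{name}: {cycles:.1f} cycles per L1 miss")
```

Under these assumed numbers, the large, fast cache spends the fewest cycles waiting on each L1 miss, and every cycle not spent waiting is a cycle in which the pipeline isn't burning power on bubbles.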

A large and low latency L2 cache isn't enough, however. Intel also equipped the Pentium M with a fairly sophisticated (at the time) branch prediction unit. With each mispredicted branch, you end up with a large number of wasted clock cycles and that translates into wasted power - so beef up the branch predictor and make sure that you hardly ever mispredict anything in the name of power.
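
The cost of a misprediction can be sketched with a simple cycle-accounting model. Everything here is an assumption for illustration: the branch frequency, the flush penalty and the base CPI of 1.0 are made-up inputs, not Pentium M specifications.

```python
def wasted_cycle_fraction(branch_fraction, mispredict_rate, flush_penalty_cycles, base_cpi=1.0):
    """Fraction of all cycles spent recovering from mispredicted branches.

    branch_fraction: share of instructions that are branches
    mispredict_rate: share of those branches that are mispredicted
    flush_penalty_cycles: cycles lost each time the pipeline is flushed
    base_cpi: cycles per instruction with perfect prediction (assumed)
    """
    wasted_per_instruction = branch_fraction * mispredict_rate * flush_penalty_cycles
    return wasted_per_instruction / (base_cpi + wasted_per_instruction)


# Hypothetical workload: 20% branches and a 12-cycle flush penalty.
fractions = {rate: wasted_cycle_fraction(0.20, rate, 12) for rate in (0.10, 0.05, 0.02)}

for rate, frac in fractions.items():
    print(f"mispredict rate {rate:.0%}: {frac:.1%} of cycles wasted")
```

In this toy model, the share of cycles thrown away shrinks as the mispredict rate falls, and the flush penalty grows with pipeline length, which is why a longer pipeline makes a good predictor more valuable.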

The next thing to tackle was chip layout. Normally, CPUs are designed to exploit the fastest possible circuits within the microprocessor, but in the eyes of the power conscious, any circuit that could run faster than it needed to was wasting power. So, the Pentium M became the first Intel CPU designed with a clock speed wall in mind. Intel would have to rely on their manufacturing to ramp up clock speed from one generation to the next. This is why it took the move from 130nm down to 90nm for the Pentium M to hit 2.0GHz even though it launched at 1.6GHz.

Other advancements were made to the core to improve performance as well, such as micro-ops fusion and a dedicated stack manager. We've talked in detail about all of the features that went into the first Pentium M and its later 90nm revision (Dothan), but the end result is a CPU that is highly competitive with the Athlon 64 and the Pentium 4 in notebooks.

Take the first Pentium Ms, for example: at 1.6GHz, they were faster than 2.66GHz Pentium 4 notebooks in business and content creation applications. More recently, the first 2.0GHz Pentium Ms based on the Dothan core managed to outperform the Pentium 4 3.2GHz and the Athlon 64 3000+. Pretty impressive for a notebook platform, but what happens when you make the move to the desktop world?
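
Those matchups imply a sizable per-clock advantage, which a little arithmetic makes concrete. The sketch below assumes, as a simplification, that performance scales linearly with clock speed; since the lower-clocked Pentium M came out ahead in those tests, each ratio is a lower bound for those workloads only.

```python
def implied_per_clock_advantage(slower_clock_ghz, faster_clock_ghz):
    """Minimum per-clock speedup needed for the lower-clocked chip to keep pace.

    Assumes performance scales linearly with clock speed (a simplification).
    """
    return faster_clock_ghz / slower_clock_ghz


# Clock speeds taken from the comparisons in the text.
comparisons = [
    ("Pentium M 1.6GHz vs Pentium 4 2.66GHz", 1.6, 2.66),
    ("Pentium M 2.0GHz vs Pentium 4 3.2GHz", 2.0, 3.2),
]

ratios = {label: implied_per_clock_advantage(pm, p4) for label, pm, p4 in comparisons}

for label, ratio in ratios.items():
    print(f"{label}: at least {ratio:.2f}x the work per clock")
```

Both pairings work out to the Pentium M doing roughly 1.6 times as much work per clock as the Pentium 4 in those particular tests.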

On the desktop, the Pentium 4 runs at higher clock speeds, as does the Athlon 64. Both the Pentium 4 and Athlon 64 have dual channel DDR platforms on the desktop, unlike the majority of notebooks out there. Does the Pentium M have what it takes to be as competitive on the desktop as it is in the mobile sector? Now that the first desktop Pentium M motherboards are shipping, this review is here to find out.


  • bobsmith1492 - Monday, February 07, 2005

    Granted, T8000 here is an Intel fanboy, but please notice Anand was comparing clock-for-clock.
  • T8000 - Monday, February 07, 2005

    There is one big difference between this review and the reviews where the Pentium M did very well: CLOCKSPEED!

    While others were able to get over 2.8 GHz with air cooling, Anand got just 2.4 GHz. This may be a coincidence, but it is the difference between surprisingly good performance and a few % below others.

    As most of the benchmarks were based on the stock 2 GHz, the difference became even greater.

    So this review just shows that the stock speed Pentium M performs about 30% less with about 30% less clockspeed than overclocked versions.

    A slightly redesigned version with higher voltages is not extremely unlikely to hit at least 3 GHz. Combining that with a desktop chipset will result in stellar performance, as the benchmark scores in this review (x1.5) indicate.

    But since there is no slightly redesigned version and Intel has no good reason to make one, the current Pentium M desktops will only appeal to overclockers and silent computing people.

    Also, for some reason, Anand found the 90W TDP of the 2.4 GHz A64 closer to the 20W of the P-M than to the 110W of the 3.8 GHz P4.
  • CSMR - Monday, February 07, 2005

    That's a very good option Zebo, thanks for posting it.
  • teutonicknight - Monday, February 07, 2005

    One suggestion: Why don't you start using a newer version of Premiere for testing? I personally don't use it, but everyone that I know who does says that before Premiere Pro, the program sucked. I'm sure the render results would be much more realistic and accurate if you used a more up-to-date version of the program.
  • Regs - Monday, February 07, 2005

    I was wondering the same thing too Jeff. If you feed it more bandwidth, it would eliminate the pipeline stalls and maybe give it a chance to reach higher clock speeds. Right? Or is it still prohibited by the shorter pipeline to reach higher clock speeds?

    Longer pipeline = wasted clock cycles. But to me that sounds like the PM should actually scale a lot better with a speed boost. Why exactly does it scale badly compared to a P4? Could it be remedied in any way with a dual channel memory bus?
  • ozzimark - Monday, February 07, 2005

    there's something wrong with the 3400+ in the spec tests. why is the 3000+ beating it consistently?
  • Warder45 - Monday, February 07, 2005

    Maybe I missed something but I don't see the reason for all the negativity in the final words. The 2.4GHz P-M was very close to the A64 2.4GHz in many of the tests; 3D rendering seemed to slow it down, but that looked like it. With better boards and memory the P-M might best the A64 in a clock-for-clock matchup.

    I do agree the prices are way too high. I think Intel really needs to wake up and smell what they have cooking here. With more support and more aggressive pricing they could easily have a winner in the HTPC and SFF markets.
  • plewis00 - Monday, February 07, 2005

    Surely when someone builds a mainboard with the Sonoma (i915) platform using PCI-E and DDR2-533 then it will change. And I wouldn't have thought that's that far off assuming they don't charge rip-off prices for the technology. It would also be perfect for Shuttle systems where the emphasis is on quietness and coolness rather than so much on performance.
  • Zebo - Monday, February 07, 2005

    So's this one very soon..

    ...more than excellent performance wise if Dothan is excellent...power differential hopefully for AMD will be nominal.
  • Sokaku - Monday, February 07, 2005

    While it is true that the A64 has way more bandwidth, I doubt that is the reason why it crushed the P-M in the Professional Applications. I think the real cause is to be found in the P-M's ability to do FP divisions. The P-III had a pipelined FP unit, however div operations were extremely expensive. My guess would be that Intel haven't thrown much effort into improving on this.
