Introduction

Historically, mobile CPUs were designed as derivatives of their desktop counterparts. You'd usually cut down on the cache, lower the clock speed and voltage, and maybe tweak the package a bit, and you'd have your mobile CPU. For years, this process of trimming the fat off desktop (and sometimes server) CPUs to make mobile versions was the industry norm - but then Timna came along.

Timna was supposed to be Intel's highly integrated CPU for sub-$600 PCs, which were unheard of at the time. Timna featured an on-die memory controller (albeit one built for RDRAM), an integrated North Bridge and an integrated graphics core. The design was heavily optimized for both power and cost. In fact, a lot of the advancements developed by the Timna team were later put to use in other Intel CPUs simply because they were better and cheaper ways of doing things (e.g. some CPU packaging enhancements used in the Pentium 4 were originally developed for Timna). What set Timna apart from Intel's other processors was that it was designed in Israel by a team completely separate from those who handled the desktop Pentium 4 designs. Intel wanted a fresh approach for Timna, and that's exactly what they got. Unfortunately, after the chip was completed, the market for a sub-$600 computer looked bleak, so the chip was scrapped and the team was reassigned to a new project a month later.

The new project was yet another "out-of-the-box" project called Banias. The idea behind Banias was to design a mobile processor from the ground up; instead of taking a higher end CPU and doing your best to cut down its power usage, you started with a low power consumption target and then built the best CPU that you could from there. With a chip on their shoulder (no pun intended) and a bone to pick with Intel management, the former Timna team did the best that they could on this new chip - and the results were impressive.

Banias, later called the Pentium M, not only proved to be an extremely powerful mobile CPU, but was also one of Intel's most on-time projects - missing the team's target deadline by less than 5 days. For a multi-year project, being off by 5 days is nothing short of impressive - and so was the CPU's architecture. While many will call the Pentium M a hybrid of the Pentium III and the Pentium 4, it is far from it. Intel knew that the Pentium 4 wasn't a low-power architecture. The Pentium 4's trace cache, double-pumped ALUs, extremely long pipeline and resulting high frequency operation were horrendous for low power mobile systems. So, as a basis for a mobile chip, the Pentium 4 was out of the question. Instead, Intel borrowed the execution core of the Pentium III; far from the most powerful execution core, but a good starting point for the Pentium M. Remember that the Pentium III's execution core was partly at fault for AMD's early successes with the Athlon, so performance-wise, Intel would have their work cut out for them.

Taking the Pentium III's execution units, Intel went to town on the Pentium M architecture. They implemented an extremely low power, but very large L2 cache - initially at 1MB and later growing to 2MB in the 90nm Pentium M. The large L2 cache plays a very important role in the Pentium M architecture, as it highlights a very bold design decision - to keep the Pentium M pipeline filled at all costs. In order to reach higher frequencies, Intel had to lengthen the pipeline of the Pentium M from that of the Pentium III. The problem with a lengthened pipeline is that any bubbles in the pipe (wasted cycles) are wasted power, and the more of them you have, the more power you're wasting. So Intel outfitted the Pentium M with a very large, very low latency L2 cache to keep that pipeline full. Think of it like placing a really big supermarket right next to your home instead of having a smaller one next to your home or a large one 10 miles away - there are obvious tradeoffs, but if your goal is to remain efficient, the choice is clear.
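To put rough numbers on the supermarket analogy, here is a minimal sketch using the textbook average memory access time formula. Every latency and hit rate in it is a made-up placeholder chosen for illustration, not a Pentium M measurement, and the function and configuration names are our own.

```python
def average_access_cycles(l1_latency, l1_hit_rate, l2_latency, l2_hit_rate, memory_latency):
    """Textbook average memory access time (AMAT), in CPU cycles."""
    # Cost added when an access misses the L2 and has to go to main memory.
    l2_miss_cost = (1.0 - l2_hit_rate) * memory_latency
    # Cost added when an access misses the L1 and has to go to the L2.
    l1_miss_cost = (1.0 - l1_hit_rate) * (l2_latency + l2_miss_cost)
    return l1_latency + l1_miss_cost


# Hypothetical configurations (placeholder numbers, not measured values).
configs = {
    "large, close L2": dict(l2_latency=10, l2_hit_rate=0.95),
    "small, close L2": dict(l2_latency=10, l2_hit_rate=0.80),
    "large, distant L2": dict(l2_latency=25, l2_hit_rate=0.95),
}

for name, l2 in configs.items():
    cycles = average_access_cycles(
        l1_latency=3, l1_hit_rate=0.90, memory_latency=200, **l2
    )
    print(f"{name}: {cycles:.2f} cycles per access")
```

With these placeholder inputs, the large, close cache has the lowest average access time of the three; fewer cycles spent waiting on memory means fewer bubbles in the pipeline.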

A large and low latency L2 cache isn't enough, however. Intel also equipped the Pentium M with a fairly sophisticated (at the time) branch prediction unit. With each mispredicted branch, you end up with a large number of wasted clock cycles, and that translates into wasted power - so, in the name of power, Intel beefed up the branch predictor to make sure that it hardly ever mispredicts.
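The cost of mispredictions can be sketched with a simple back-of-the-envelope model. The instruction mix, misprediction rates and flush penalty below are hypothetical values picked for illustration, not figures for any real CPU; in general, the flush penalty grows with pipeline depth.

```python
def wasted_cycles(instructions, branch_fraction, mispredict_rate, flush_penalty):
    """Cycles thrown away flushing the pipeline after mispredicted branches."""
    mispredicted_branches = instructions * branch_fraction * mispredict_rate
    return mispredicted_branches * flush_penalty


def wasted_share(instructions, base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Fraction of all cycles that are spent on misprediction flushes."""
    useful = instructions * base_cpi
    wasted = wasted_cycles(instructions, branch_fraction, mispredict_rate, flush_penalty)
    return wasted / (useful + wasted)


# Hypothetical workload: 1M instructions, 20% of them branches,
# a 12-cycle flush penalty, and three assumed predictor accuracies.
for rate in (0.10, 0.05, 0.02):
    share = wasted_share(1_000_000, 1.0, 0.2, rate, 12)
    print(f"mispredict rate {rate:.0%}: {share:.1%} of cycles wasted")
```

In this model, the wasted share falls as the misprediction rate falls, which is why a better predictor matters more as the pipeline gets longer.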

The next thing to tackle was chip layout. Normally, CPUs are designed to exploit the fastest possible circuits within the microprocessor, but in the eyes of the power conscious, any circuit that could run faster than it needed to was wasting power. So, the Pentium M became the first Intel CPU designed with a clock speed wall in mind, and Intel would have to rely on their manufacturing to ramp up clock speed from one generation to the next. This is why it took the move from 130nm down to 90nm for the Pentium M to hit 2.0GHz even though it launched at 1.6GHz.

Other advancements were made to the core to improve performance as well; things like micro-ops fusion and a dedicated stack manager are also at play. We've talked in detail about all of the features that went into the first Pentium M and its later 90nm revision (Dothan), but the end result is a CPU that is highly competitive with the Athlon 64 and the Pentium 4 in notebooks.

Take the first Pentium Ms, for example: at 1.6GHz, they were faster than 2.66GHz Pentium 4s in notebooks in business and content creation applications. More recently, the first 2.0GHz Pentium Ms based on the Dothan core managed to outperform the Pentium 4 3.2GHz and the Athlon 64 3000+. Pretty impressive for a notebook platform, but what happens when you make the move to the desktop world?

On the desktop, the Pentium 4 runs at higher clock speeds, as does the Athlon 64. Both the Pentium 4 and Athlon 64 have dual channel DDR platforms on the desktop, unlike the majority of notebooks out there. Does the Pentium M have what it takes to be as competitive on the desktop as it is in the mobile sector? Now that the first desktop Pentium M motherboards are shipping, this review is here to find out.

77 Comments

  • fitten - Tuesday, February 8, 2005 - link

    Also, it's interesting that there are many benchmarks chosen which are known to stress the weaknesses of the Pentium-M... not that it isn't interesting information. For example, there seems to be a whole lot of FPU intensive benchmarks (around 15 or so, all of which the Pentium-M should lose handily - known before they are even run) so kind of just hammering the point home I guess.

    Anyway, the Dothans held up pretty well from what I can see... Most of the time (except for the notable FPU intensive and memory bandwidth intensive benchmarks), the Dothan compares quite well with Athlon64s of the same clock speed that have the advantage of dual channel memory.
  • fitten - Tuesday, February 8, 2005 - link

    The other interesting thing about the Athlon64 vs. Dothan comparison is that even with dual channel memory bandwidth on the Athlon64's side, the single channel memory bandwidth of the Dothan still keeps it very close in many of the benchmarks and can even beat the dual channel Athlon64s at 400MHz higher clock in some.

    Anyway, the Pentium-M family is a good start. Some tweaking here and there (improved FPU with better FPU performance and maybe another FPU execution unit, improved memory subsystem to make good use of dual channel) and it will be at least as good as the Athlon64s across the board.

    I own three Athlon64 desktops, two AthlonXP desktops, and two Pentium-M laptops and the laptops are by no means "slow" at doing work.
  • KristopherKubicki - Tuesday, February 8, 2005 - link

    teutonicknight: We purposely don't change our test platform too often. Even though we are using a slightly older version of Premiere, it is the same version we have used in our other processor analyses.

    Hope that helps,

    Kristopher
  • kmmatney - Tuesday, February 8, 2005 - link

    There's also a Celeron version that would have been interesting to review. The small L2 cache should hurt the performance, though. I think the Celeron version uses something like 7 Watts. It would make no sense to put a Celeron-M in such an expensive motherboard, though.
  • Slaimus - Tuesday, February 8, 2005 - link

    I think this indirectly shows how AMD needs to update its caching architecture on the K8. They basically carried over the K7 caches, which are just too slow when paired with its memory controller. Instead of being as large as possible (as evidenced by the exclusive caches) at the expense of latency, the K8 needs faster caches. The memory bandwidth of L2 vs system memory is only about 2 to 1 on the K8, which is to say the L2 cache is not helping the system memory much.
  • sandorski - Monday, February 7, 2005 - link

    I think the Pentium M mythos can now be laid to rest.
  • mjz5 - Monday, February 7, 2005 - link

    to #29:

    your 2800 is the 754 pin.

    the 3000+ reviewed is the 939 pin which is 1.8. the 3000+ for the 754 is 2.0 ghz
  • kristof007 - Monday, February 7, 2005 - link

    I don't know if anyone else noticed but the charts are a bit off. My A64 2800+ is running at a stock 1.8 ghz .. while in the review the A64 3000+ is running at 1.8 ... weird!
  • knitecrow - Monday, February 7, 2005 - link

    #25

    1) Intel and AMD measure TDP differently... and TDP is not the same as actual power dissipation. The actual dissipation of 90nm A64 is pretty darn good.

    2) A microprocessor is not made of Lego... you can't rearrange/tweak parts to make it faster. It takes a lot of time, energy and talent to make changes -- even then it may not work for the best. Prescott anyone?


    Frankly I’ve been waiting for a good review of P-M's actual performance. I really don't trust those "other" sites.