Memory Latency and Bandwidth

Currently, only two manufacturers of desktop Pentium M motherboards are selling into the channel - AOpen and DFI. Both companies' boards came about not because of widespread consumer demand, but because each had one customer that needed a Pentium M motherboard for a specific application. Once the boards were designed and built, they were repackaged and made available to the public as an afterthought.

The major issue with both of these motherboards is that they are based on the 855GME chipset. The 855GME supports only AGP 4X, but the real killer is its single-channel DDR333 memory controller. Without DDR400 support, the 855GME starves the Pentium M for bandwidth: it can deliver only 2.7GB/s to main memory, while the Pentium M at 2.0GHz needs 3.2GB/s to run at its most efficient. Overclocking the memory bus is something of an option, but not exactly the most desirable one for reasons that we will get to later.
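The bandwidth figures above follow from simple arithmetic: peak theoretical bandwidth is the transfer rate multiplied by the bus width and the number of channels. Here is a minimal sketch, assuming a 64-bit (8-byte) DDR channel and nominal rates of 333 and 400 million transfers per second. The function name is ours, chosen for illustration only, and real-world sustained bandwidth will be lower than these peaks.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_bytes=8, channels=1):
    """Peak theoretical bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes a 64-bit (8-byte) wide channel by default.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


ddr333 = peak_bandwidth_gb_s(333.33)  # single-channel DDR333, as on the 855GME
ddr400 = peak_bandwidth_gb_s(400)     # single-channel DDR400, for comparison

print(f"DDR333: {ddr333:.1f} GB/s")   # prints "DDR333: 2.7 GB/s"
print(f"DDR400: {ddr400:.1f} GB/s")   # prints "DDR400: 3.2 GB/s"

# How far DDR333 falls short of the 3.2GB/s the Pentium M wants
shortfall_pct = (1 - ddr333 / 3.2) * 100
print(f"Shortfall: {shortfall_pct:.0f}%")  # prints "Shortfall: 17%"
```

In other words, a single channel of DDR400 would match the 3.2GB/s target exactly, while DDR333 leaves the CPU roughly one sixth short.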

One solution is Intel's recently released mobile 915 chipset, which features a dual-channel DDR1/DDR2 memory controller. The dual-channel controller is more than capable of supplying the memory bandwidth that the Pentium M needs (if anything, it is a bit overboard), but right now mobile 915 isn't an option on the desktop.

With insufficient memory bandwidth, the Pentium M will undoubtedly be held back in applications where memory bandwidth matters most. Memory bandwidth and latency are interdependent, so let's see how the latency to main memory compares.

For our memory latency tests, we once again turn to ScienceMark 2.0:

CPU                              Memory Latency (ns)
AMD Athlon 64 3200+ (2.0GHz)     50
Intel Pentium 4E 560 (3.6GHz)    80
Intel Pentium M 755 (2.0GHz)     80

With an on-die memory controller, the Athlon 64 obviously offers the lowest latency memory access of the group. We used a 2.0GHz Athlon 64 for this comparison to show the memory latency seen by a CPU clocked identically to the Pentium M. As strong as the Pentium M's branch predictor may be, its trip to main memory will always be longer than the Athlon 64's - increasing the penalty from having a longer pipeline.

When you compare the Pentium M to the Pentium 4, you see the real harm in having only a single-channel DDR333 memory controller - the time for the Pentium M to get to main memory is very similar to that of the Pentium 4, even though the latter is using higher latency DDR2 memory. High memory latency will send the performance of the Pentium M tumbling as soon as it leaves the sanctuary of its low latency L2 cache.
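One way to put these latency numbers in perspective is to convert them into core clock cycles, since a nanosecond costs more cycles on a faster-clocked CPU. The sketch below is a rough illustration only: it assumes the core simply waits out the full measured latency, ignoring prefetching and out-of-order execution, and the function name is hypothetical.

```python
def stall_cycles(latency_ns, clock_ghz):
    """Core clock cycles that elapse during one main-memory access.

    A clock of N GHz ticks N times per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


# (latency in ns, clock in GHz), taken from the ScienceMark 2.0 results
results = {
    "AMD Athlon 64 3200+": (50, 2.0),
    "Intel Pentium 4E 560": (80, 3.6),
    "Intel Pentium M 755": (80, 2.0),
}

for name, (latency_ns, clock_ghz) in results.items():
    print(f"{name}: {stall_cycles(latency_ns, clock_ghz):.0f} cycles")
```

At the same 2.0GHz clock, the Pentium M spends 160 cycles on a trip to main memory against 100 for the Athlon 64, while the Pentium 4's higher clock turns the same 80ns into 288 cycles.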

77 Comments

  • fitten - Tuesday, February 8, 2005

    Also, it's interesting that there are many benchmarks chosen which are known to stress the weaknesses of the Pentium-M... not that it isn't interesting information. For example, there seems to be a whole lot of FPU intensive benchmarks (around 15 or so, all of which the Pentium-M should lose handily - known before they are even run) so kind of just hammering the point home I guess.

    Anyway, the Dothans held up pretty well from what I can see... Most of the time (except for the notable FPU intensive and memory bandwidth intensive benchmarks), the Dothan compares quite well with Athlon64s of the same clock speed that have the advantage of dual channel memory.
  • fitten - Tuesday, February 8, 2005

    The other interesting thing about the Athlon64 vs. Dothan comparison is that even with dual channel memory bandwidth on the Athlon64's side, the single channel memory bandwidth of the Dothan still keeps it very close in many of the benchmarks and can even beat the dual channel Athlon64s at 400MHz higher clock in some.

    Anyway, the Pentium-M family is a good start. Some tweaking here and there (improved FPU with better FPU performance and maybe another FPU execution unit, improved memory subsystem to make good use of dual channel) and it will be at least as good as the Athlon64s across the board.

    I own three Athlon64 desktops, two AthlonXP desktops, and two Pentium-M laptops and the laptops are by no means "slow" at doing work.
  • KristopherKubicki - Tuesday, February 8, 2005

    teutonicknight: We purposely don't change our test platform too often. Even though we are using a slightly older version of Premiere, it is the same version we have used in our other processor analyses.

    Hope that helps,

    Kristopher
  • kmmatney - Tuesday, February 8, 2005

    There's also a Celeron version that would have been interesting to review. The small L2 cache should hurt the performance, though. I think the Celeron version uses something like 7 Watts. It would make no sense to put a Celeron-M in such an expensive motherboard, though.
  • Slaimus - Tuesday, February 8, 2005

    I think this indirectly shows how AMD needs to update its caching architecture on the K8. They basically carried over the K7 caches, which are just too slow when paired with its memory controller. Instead of being as large as possible (as evidenced by the exclusive caches) at the expense of latency, the K8 needs faster caches. The memory bandwidth of L2 vs system memory is only about 2 to 1 on the K8, which is to say the L2 cache is not helping the system memory much.
  • sandorski - Monday, February 7, 2005

    I think the Pentium M mythos can now be laid to rest.
  • mjz5 - Monday, February 7, 2005

    to #29:

    your 2800 is the 754 pin.

    the 3000+ reviewed is the 939 pin which is 1.8. the 3000+ for the 754 is 2.0 ghz
  • kristof007 - Monday, February 7, 2005

    I don't know if anyone else noticed but the charts are a bit off. My A64 2800+ is running at a stock 1.8 ghz .. while in the review the A64 3000+ is running at 1.8 ... weird!
  • knitecrow - Monday, February 7, 2005

    #25

    1) Intel and AMD measure TDP differently... and TDP is not the same as actual power dissipation. The actual dissipation of 90nm A64 is pretty darn good.

    2) A microprocessor is not made of Lego... you can't rearrange/tweak parts to make it faster. It takes a lot of time, energy and talent to make changes -- even then it may not work for the best. Prescott anyone?


    Frankly I’ve been waiting for a good review of P-M's actual performance. I really don't trust those "other" sites.
  • k00kie - Monday, February 7, 2005
