Floating Point Performance

Just about a year ago, our own Johan De Gelas made an extremely interesting point about one of the weaknesses of the Pentium M - floating point performance. The theory is this - the Pentium 4, Athlon 64 and Pentium M all have very different platforms, with equally different characteristics. Unfortunately, as we've already shown, the Pentium M is quite possibly the worst off with only a single channel 333MHz DDR memory bus. It's also widely known that most floating point intensive applications are highly memory bandwidth limited, meaning that the Pentium M already has an excuse for poor floating point performance - it doesn't have enough memory bandwidth.

But what if we are able to take memory bandwidth out of the equation? This is where a little benchmark called "flops" comes into play. The beauty of flops is that it executes entirely within the L1 cache of the Pentium M, meaning that the benchmark is limited by two things: the performance of the Pentium M's L1 cache, and more importantly, the performance of the Pentium M's floating point and SSE units.

The actual tests that flops runs are a mixture of floating point add, subtract, multiply and divide operations. The mix of ADD/SUB, MUL and DIV operations is listed next to each test in the table below.
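To make the idea of an operation mix concrete, here is a minimal sketch of how an MFLOPS figure can be derived from a kernel with a known mix. It is not the flops source code: the function names `mixed_kernel` and `mflops` and the constants are hypothetical, and because Python's interpreter overhead dominates the timing, the number that it prints is not comparable to the compiled results in the table.

```python
import time

def mixed_kernel(iterations):
    """Hypothetical kernel: one ADD, SUB, MUL and DIV per pass (a 25/25/25/25 mix)."""
    x = 1.0
    for _ in range(iterations):
        x = x + 0.5       # ADD
        x = x - 0.25      # SUB
        x = x * 1.0001    # MUL
        x = x / 1.0002    # DIV
    return x

def mflops(op_count, seconds):
    """Millions of floating point operations per second."""
    return op_count / seconds / 1e6

iterations = 200_000
start = time.perf_counter()
result = mixed_kernel(iterations)
elapsed = time.perf_counter() - start

# Four floating point operations are counted per loop pass.
rate = mflops(4 * iterations, elapsed)
print(f"{rate:.1f} MFLOPS (interpreter-bound, for illustration only)")
```

The working set here is a single variable, which mirrors the point of the benchmark: nothing in the loop depends on main memory bandwidth.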

We compiled flops using the latest Intel C compilers under Visual Studio .NET, with /O3 and architecture-specific flags, to give the Pentium M as solid a foundation as possible. All of the results are expressed in MFLOPS, higher scores being better:

Test (% ADD, SUB, MUL, DIV)  AMD Athlon 64 3200+ (2.0GHz)  AMD Athlon 64 FX-55 (2.6GHz)  Intel Pentium 4 3.2GHz  Intel Pentium M 755 (2.0GHz)
1 (50,0,43,7)                1576                          2057                          1274                    899
2 (43,29,14,14)              856                           1118                          790                     492
3 (35,12,53,0)               1388                          1802                          2476                    1470
4 (47,0,53,0)                1244                          1622                          2792                    1601
5 (45,0,52,3)                1477                          1923                          2351                    1019
6 (45,0,55,0)                1466                          1908                          2762                    1607
7 (25,25,25,25)              458                           595                           365                     252
8 (43,0,57,0)                1585                          2065                          2566                    1572
Average                      1256                          1636                          1922                    1114

The first comparison to look at is the Athlon 64 3200+ vs. the Pentium M 755, since both CPUs run at the same 2.0GHz clock speed. Despite the Pentium M's improvements to enhance IPC, the Athlon 64 is still able to outperform it at a core level (without the aid of its memory controller) by almost 13% on average. But here's where the next Athlon 64 score comes into play - while the Pentium M will hit 2.26GHz by the end of this year, the Athlon 64 will be at or above 3.0GHz. So, the headroom of the Athlon 64's architecture gives it a huge performance advantage here in flops, as you can see from the Athlon 64 FX-55 results (remember that the larger L2 cache of the FX-55 has no effect on the flops results, as the program runs entirely out of L1).
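The clock-for-clock gap can be checked directly against the table. The sketch below copies the per-test scores of the two 2.0GHz parts and computes the Athlon 64's lead for each test and on average; the helper `lead_percent` and the variable names are introduced here for illustration only.

```python
# Per-test scores in MFLOPS, copied from the table (tests 1-8).
ATHLON_64_2GHZ = [1576, 856, 1388, 1244, 1477, 1466, 458, 1585]
PENTIUM_M_755 = [899, 492, 1470, 1601, 1019, 1607, 252, 1572]
# DIV share of each test's operation mix, in percent, from the same table.
DIV_SHARE = [7, 14, 0, 0, 3, 0, 25, 0]

def lead_percent(score, baseline):
    """How far `score` is ahead of `baseline`, in percent (negative = behind)."""
    return (score / baseline - 1.0) * 100.0

avg_a64 = sum(ATHLON_64_2GHZ) / len(ATHLON_64_2GHZ)
avg_pm = sum(PENTIUM_M_755) / len(PENTIUM_M_755)
overall_lead = lead_percent(avg_a64, avg_pm)
print(f"Average: Athlon 64 lead {overall_lead:+.1f}%")

pentium_m_wins = []
for test_no, (a64, pm, div) in enumerate(
        zip(ATHLON_64_2GHZ, PENTIUM_M_755, DIV_SHARE), start=1):
    if pm > a64:
        pentium_m_wins.append(test_no)
    print(f"Test {test_no} (DIV {div:>2}%): Athlon 64 lead {lead_percent(a64, pm):+6.1f}%")
```

On these numbers, the Pentium M is ahead only in tests 3, 4 and 6, none of which contains any divides, and test 8 (also divide-free) is close to a tie. The Athlon 64's largest leads come in the tests with the highest DIV share.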

Next, we have one of the slower Pentium 4s vs. the Pentium M 755. Why not compare to a 3.6GHz or the new 3.8GHz Pentium 4? Well, look at how much the Pentium 4 3.2GHz outperforms the Pentium M 755 - 72% on average using Intel's 8.1 C++ compiler. When running optimized SSE2/3 code, the Pentium 4 is a much stronger FP performer than the Pentium M could ever be, which is very important for the following reason: the future of desktop applications is in very floating-point intensive media transcoding tasks, and for those applications, the Pentium M just won't cut it. So, to those who feel that Intel will soon ditch NetBurst in favor of the Pentium M's architecture, the results speak for themselves. While elements of the Pentium M architecture will undoubtedly make an appearance in the Pentium 4's successor, its dated P6 execution core will not.
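As a rough cross-check of the 72% figure, the sketch below recomputes it from the Average row and also divides each average by the CPU's clock speed. The per-GHz normalization is our own simplification, not something flops reports; it assumes that throughput in an L1-resident benchmark scales linearly with clock speed.

```python
# (clock in GHz, average MFLOPS) taken from the table's header and Average row.
CPUS = {
    "Athlon 64 (2.0GHz)": (2.0, 1256),
    "Athlon 64 FX-55": (2.6, 1636),
    "Pentium 4 3.2GHz": (3.2, 1922),
    "Pentium M 755": (2.0, 1114),
}

# Assumption: linear scaling with clock, so MFLOPS per GHz is a fair per-clock proxy.
per_ghz = {name: avg / clock for name, (clock, avg) in CPUS.items()}

p4_avg = CPUS["Pentium 4 3.2GHz"][1]
pm_avg = CPUS["Pentium M 755"][1]
p4_lead = (p4_avg / pm_avg - 1.0) * 100.0
p4_per_clock_lead = (per_ghz["Pentium 4 3.2GHz"] / per_ghz["Pentium M 755"] - 1.0) * 100.0

for name, value in per_ghz.items():
    print(f"{name:<20} {value:6.1f} MFLOPS per GHz")
print(f"Pentium 4 lead over Pentium M: {p4_lead:.1f}% raw, {p4_per_clock_lead:.1f}% per clock")
```

Under that assumption, most of the Pentium 4's raw lead comes from its 60% higher clock speed, but it stays ahead of the Pentium M by roughly 8% per clock as well.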

77 Comments

  • bobsmith1492 - Monday, February 7, 2005 - link

    Granted the T8000 here is an Intel fanboy, but please notice Anand was comparing clock-for-clock.
  • T8000 - Monday, February 7, 2005 - link

    There is one big difference between this review and the reviews where the Pentium M did very well: CLOCKSPEED!

    While others were able to get over 2.8 GHz with air cooling, Anand got just 2.4 GHz. This may be a coincidence, but it is the difference between surprisingly good performance and a few % below others.

    As most of the benchmarks were based on the stock 2 GHz, the difference became even greater.

    So this review just shows that the stock speed Pentium M performs about 30% less with about 30% less clockspeed than overclocked versions.

    A slightly redesigned version with higher voltages is not extremely unlikely to hit at least 3 Ghz. Combining that with a desktop chipset will result in stellar performance, as the benchmark scores in this review (x1.5) indicate.

    But since there is no slightly redesigned version and Intel has no good reason to make one, the current Pentium M desktops will only appeal to overclockers and silent computing people.

    Also, for some reason, Anand found the 90W TDP of the 2.4 Ghz A64 closer to the 20W of the P-M than to the 110W of the 3.8 Ghz P4.
  • CSMR - Monday, February 7, 2005 - link

    That's a very good option Zebo, thanks for posting it.
  • teutonicknight - Monday, February 7, 2005 - link

    One suggestion: Why don't you start using a newer version of Premiere for testing? I personally don't use it, but everyone that I know who does says before Premiere Pro, the program sucked. I'm sure the render results would be much more realistic and accurate if you used a more up to date version of the program.
  • Regs - Monday, February 7, 2005 - link

    I was wondering the same thing too Jeff. If you feed it more bandwidth, it would eliminate the pipeline stalls and maybe give it a chance to reach higher clock speeds. Right? Or is it still prohibited by the shorter pipeline to reach higher clock speeds?

    Longer pipeline = wasted clock cycles. But to me that sounds like the PM should actually scale a lot better with a speed boost. Why exactly does it scale badly compared to a P4? Could it be remedied in any way with a dual channel memory bus?
  • ozzimark - Monday, February 7, 2005 - link

    there's something wrong with the 3400+ in the spec tests. why is the 3000+ beating it consistently?
  • Warder45 - Monday, February 7, 2005 - link

    Maybe I missed something but I don't see the reason for all the negativity in the final words. The 2.4GHz P-M was very close to the A64 2.4GHz in many of the tests, 3D rendering seemed to slow it down but that looked like it. With better boards and memory the P-M might best the A64 in a clock for clock match up.

    I do agree the prices are way too high. I think Intel really needs to wake up and smell what they have cooking here. With more support and more aggressive pricing they could easily have a winner in the HTPC and SFF markets.
  • plewis00 - Monday, February 7, 2005 - link

    Surely when someone builds a mainboard with the Sonoma (i915) platform using PCI-E and DDR2-533 then it will change. And I wouldn't have thought that's that far off assuming they don't charge rip-off prices for the technology. It would also be perfect for Shuttle systems where the emphasis is on quietness and coolness rather than so much on performance.
  • Zebo - Monday, February 7, 2005 - link

    CSMR
    So's this one very soon..
    http://www.xtremesystems.org/forums/showthread.php...

    ...more than excellent performance wise if Dothan is excellent...power differential hopefully for AMD will be nominal.
  • Sokaku - Monday, February 7, 2005 - link


    While it is true that the A64 has way more bandwidth, I doubt that is the reason why it crushed the P-M in the Professional Applications. I think the real cause is to be found in the P-M's ability to do FP divisions. The P-III had a pipelined FP unit, however div operations were extremely expensive. My guess would be that Intel haven't thrown much effort into improving on this.
