Floating Point Performance

Just about a year ago, our own Johan De Gelas made an extremely interesting point about one of the weaknesses of the Pentium M: floating point performance. The theory is this: the Pentium 4, Athlon 64 and Pentium M all sit on very different platforms, with equally different characteristics. As we've already shown, the Pentium M's platform is quite possibly the worst off of the three, with only a single-channel 333MHz DDR memory bus. Most floating point intensive applications are also heavily limited by memory bandwidth, meaning that the Pentium M already has an excuse for poor floating point performance: it doesn't have enough memory bandwidth.

But what if we are able to take memory bandwidth out of the equation? This is where a little benchmark called "flops" comes into play. The beauty of flops is that it executes entirely within the L1 cache of the Pentium M, meaning that the benchmark is limited by two things: the performance of the Pentium M's L1 cache, and more importantly, the performance of the Pentium M's floating point and SSE units.

The actual tests that flops runs are a mixture of floating point add, subtract, multiply and divide operations. The mix of ADD/SUB, MUL and DIV operations is listed next to each test in the table below.
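To make the MFLOPS metric concrete, here is a minimal sketch of how such a figure is derived: count the floating point operations in a kernel, time it, and divide. This is not the flops source code. The kernel, its 50/25/25 ADD/MUL/DIV mix, and the function names are assumptions made for illustration, and because Python's interpreter overhead dominates the timing, the number it reports is not comparable to the results in this article.

```python
import time


def mflops(op_count, seconds):
    """Millions of floating point operations per second."""
    return op_count / seconds / 1e6


def run_kernel(iterations):
    """Hypothetical kernel: 2 ADD, 1 MUL and 1 DIV per pass (50/25/25 mix)."""
    x = 1.0
    acc = 0.0
    for _ in range(iterations):
        acc = acc + x * 0.5    # 1 MUL, 1 ADD
        x = 1.0 / (x + 1.0)    # 1 ADD, 1 DIV
    # Return the result (so the work is used) and the FP operation count.
    return acc, 4 * iterations


def measure(iterations=200_000):
    """Time the kernel once and convert the operation count to MFLOPS."""
    start = time.perf_counter()
    acc, ops = run_kernel(iterations)
    elapsed = time.perf_counter() - start
    return acc, mflops(ops, elapsed)


if __name__ == "__main__":
    _, score = measure()
    print(f"{score:.1f} MFLOPS (interpreter-bound, illustrative only)")
```

The working set here is two scalars, which mirrors the idea of keeping the benchmark out of main memory, although a compiled C kernel is what actually exercises the FP and SSE units.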

We compiled flops with the latest Intel C++ compiler under Visual Studio .NET, using /O3 and architecture-specific flags, to give the Pentium M as solid a foundation as possible. All of the results are expressed in MFLOPS, with higher scores being better:

Test (% ADD, SUB, MUL, DIV)   AMD Athlon 64 3200+ (2.0GHz)   AMD Athlon 64 FX-55 (2.6GHz)   Intel Pentium 4 3.2GHz   Intel Pentium M 755 (2.0GHz)
1 (50,0,43,7)                 1576                           2057                           1274                     899
2 (43,29,14,14)               856                            1118                           790                      492
3 (35,12,53,0)                1388                           1802                           2476                     1470
4 (47,0,53,0)                 1244                           1622                           2792                     1601
5 (45,0,52,3)                 1477                           1923                           2351                     1019
6 (45,0,55,0)                 1466                           1908                           2762                     1607
7 (25,25,25,25)               458                            595                            365                      252
8 (43,0,57,0)                 1585                           2065                           2566                     1572
Average                       1256                           1636                           1922                     1114

The first comparison to look at is the Athlon 64 3200+ vs. the Pentium M 755, since both CPUs run at the same 2.0GHz clock speed. Despite the Pentium M's IPC-enhancing improvements, the Athlon 64 still outperforms it at the core level (without the aid of its memory controller) by almost 13% on average. But here's where the next Athlon 64 score comes into play: while the Pentium M will hit 2.26GHz by the end of this year, the Athlon 64 will be at or above 3.0GHz. The clock speed headroom of the Athlon 64's architecture gives it a huge performance advantage in flops, as you can see from the Athlon 64 FX-55 results (remember that the larger L2 cache of the FX-55 has no effect on the flops results, as the program runs entirely out of L1).

Next, we have one of the slower Pentium 4s vs. the Pentium M 755. Why not compare to a 3.6GHz or the new 3.8GHz Pentium 4? Well, look at how much the Pentium 4 3.2GHz outperforms the Pentium M 755: 72% on average, using Intel's 8.1 C++ compiler. When running optimized SSE2/3 code, the Pentium 4 is a much stronger FP performer than the Pentium M could ever be, which is very important for the following reason: the future of desktop applications lies in very floating-point intensive media transcoding tasks, and for those applications, the Pentium M just won't cut it. So, to those who feel that Intel will soon ditch NetBurst in favor of the Pentium M's architecture, the results speak for themselves. While elements of the Pentium M architecture will undoubtedly make an appearance in the Pentium 4's successor, its dated P6 execution core will not.
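The percentages quoted above can be re-derived from the table. The sketch below copies the eight per-test scores for each CPU, recomputes the averages, and checks the two comparisons against the Pentium M 755. The dictionary layout and function names are assumptions for illustration, and the MFLOPS-per-GHz figure is an extra normalization that the article itself does not report.

```python
# Scores copied from the flops table (MFLOPS, tests 1-8).
# Layout is an illustrative choice: name -> (clock in GHz, per-test scores).
RESULTS = {
    "Athlon 64 3200+": (2.0, [1576, 856, 1388, 1244, 1477, 1466, 458, 1585]),
    "Athlon 64 FX-55": (2.6, [2057, 1118, 1802, 1622, 1923, 1908, 595, 2065]),
    "Pentium 4 3.2GHz": (3.2, [1274, 790, 2476, 2792, 2351, 2762, 365, 2566]),
    "Pentium M 755": (2.0, [899, 492, 1470, 1601, 1019, 1607, 252, 1572]),
}


def average(cpu):
    """Arithmetic mean of the eight test scores for one CPU."""
    _, scores = RESULTS[cpu]
    return sum(scores) / len(scores)


def advantage_pct(cpu, baseline):
    """How much faster `cpu` is than `baseline` on average, in percent."""
    return (average(cpu) / average(baseline) - 1.0) * 100.0


def mflops_per_ghz(cpu):
    """Average score divided by clock speed (not a figure from the article)."""
    clock_ghz, _ = RESULTS[cpu]
    return average(cpu) / clock_ghz


if __name__ == "__main__":
    for cpu in RESULTS:
        print(f"{cpu}: average {average(cpu):.0f} MFLOPS, "
              f"{mflops_per_ghz(cpu):.0f} MFLOPS per GHz")
    for cpu in ("Athlon 64 3200+", "Pentium 4 3.2GHz"):
        print(f"{cpu} vs Pentium M 755: "
              f"+{advantage_pct(cpu, 'Pentium M 755'):.1f}%")
```

Dividing by clock speed is only a rough way to separate frequency from per-cycle throughput, but it shows how much of the Pentium 4's lead in this benchmark comes from its higher clock.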


77 Comments


  • Lupine - Wednesday, February 16, 2005 - link

    I'm surprised at these results. I'm setting up a new Dell Inspiron 9200 (M 725 @ 1.6GHz/400MHz FSB) and it is schooling both my Barton 2500+ @ 2.2GHz and TBred B 1700+ @ 2.2GHz running Stanford's Folding@Home project (600 point proteins: ~37min per frame for the XP boxes compared to ~34min per frame w/ the laptop).

    So, if it is so weak, what is allowing it to process WUs at such a competitive rate? Sure, that is slower than an A64, but competitive w/ most P4 procs.
  • fitten - Thursday, February 10, 2005 - link

    Something else to remember about the Banias/Dothan line of chips... Aggressive power reduction was the #1 goal of the design process. In a 'normal' chip design, not all pipeline stages are the same length, and the clock speed the chip runs at is set by the slowest part of the CPU. Since power usage is directly related to the frequency of the switching gates, the Intel engineers deliberately slowed down some parts of the chip to match the target release speeds (or get close to them) to reduce power consumption. This is, perhaps, the main reason why the frequencies don't scale as well as some would want them to.
  • Visual - Thursday, February 10, 2005 - link

    here's another thought... when the opterons launched initially at ECC DDR266, there were similar comments like "give it unbuffered DDR400 or higher and stay out of its way" :) well, now that we have that, ok it did improve performance a bit. but not hugely. shouldn't help the dothan significantly more too.
  • Visual - Thursday, February 10, 2005 - link

    I like how AMD got beaten by the P-M :) not because im intel fan, just because this will make things more interesting now.

    don't catch flame from this comment :p its my opinion

    Funny how you picked the game benchmarks btw, its almost as if you wanted to show the P-M lacking behind the A64... from what I've seen it beats A64 in HL2 and CSS, and that's a game you don't skip usually :) so why now?

    Also looks suspicious how in lots of tests where P-M performs well with the A64 clock-for-clock or beats it, there is almost no difference in the 3800+ and 4000+ results... like if L2 isnt all that important, yet L2 is exactly how everyone explains the P-M success

    Maybe we'll see some 2MB L2 A64 "emergency edition" once Dothan gets a decent desktop chipset, just like what intel did to (try to) save P4 from the A64 :)
    actually i'd be happy if Dothan motivates AMD to develop faster L2 cache or something.

    Knowing Intel, i dont expect they'd even try to match AMD's prices with the P-M... and there's a lot of room for AMD to decrease prices, as they're selling with quite a margin now. So for sure the P-M won't be cost-effective compared to A64, not if you don't care for ultra-low power consumption at least.

    also it doesn't look likely Dothan could scale beyond 2.6GHz on current 90nm tech. by the time it gets there, AMD should've launched the 2.8 FX and most likely 3GHz too. so I have no doubts AMD will keep the lead for quite a while... maybe the race to 65nm will be the next turning point, as it seems its going smooth for intel (at least for P-M)

    anyway, even if AMD is better in absolute performance, pricepoint and (arguably) clock-for-clock, you gotta admit it to the P-M, it does quite a punch. fun times are coming :)
  • Zebo - Wednesday, February 9, 2005 - link

    dobwal buy intel if you want mhz, AMD is for performance.
  • dobwal - Wednesday, February 9, 2005 - link

    i wasn't referring to the FX series. Plus you are not understanding the point i was trying to make. Lets take a look at the FX series.

    OPN            Model  Operating Freq.  Package
    ADAFX55DEI5AS  FX55   2600MHz          939-Pin
    ADAFX53DEP5AS  FX53   2400MHz          939-Pin
    ADAFX53CEP5AT  FX53   2400MHz          940-Pin
    ADAFX51CEP5AT  FX51   2200MHz          940-Pin
    ADAFX51CEP5AK  FX51   2200MHz          940-Pin

    the first FX51 was released around late third quarter 2003. So in a little over a year the FX series has only increased 400 Mhz. Can you automatically assume that the FX has poor scalability in terms of cpu speed? NO. You know why? Because the EE is underperforming and can't touch the FX. AMD has no need to push large scale speed increases out of the FX line, which would do nothing but increase cost with each new stepping it used to boost performance.

    The same goes for the Dothan at 2.26Ghz by the end of 2005. What other cpu offers the same level of performance vs. battery life. So why push for performance except to push sales.

    You simply can't determine the scalability of a cpu based on its roadmap, especially when it's the performance leader in its market segment and has no current viable competitor or one in the near future.
  • Aileur - Wednesday, February 9, 2005 - link

    Oh and, superpi relies on the fpu to do its calculations, so so much for this fpu is crap trend we have going here.

    http://mod.vr-zone.com.sg/Aopen_i855_review/25sPIm...
  • Aileur - Wednesday, February 9, 2005 - link

    Oh and before you start bragging about the better superpi1mb result of the a64
    http://www.akiba-pc.com/DFI_855/d17g_2608_spi1m.gi...

    this is 1 sec better, with 100mhz less, and single channel ram.
  • Aileur - Wednesday, February 9, 2005 - link

    Since you seem to like xtremesystems
    http://www.akiba-pc.com/DFI_855/d15g_2435_spi1m.PN...
    also a 1ghz overclock, also on default voltage

    Id like to see how an a64 would perform on a kt266 (if that were possible)

    Give the pentium m time to mature and all those "OMG HAHA YOU CPUZ IS SO HOT LOLOL!!!1111" will be obsolete.
  • Zebo - Wednesday, February 9, 2005 - link

    58 "How long has A64 been stuck on 2.4Ghz."
    ----------------------------------

    They're not. The 2.6 FX-55 has been out for months. More importantly AMD doesn't have to release new chips the way they dominate the benchmarks now. Could they? Hell ya. They got a nice buffer going. New FX's hit 3.0 on stock air. Cheap 90nm's are now hitting 2.7 on default Vcore and air. And by air I mean AMD's cheap all aluminum HS with an itty bitty 15mmx70mm fan, not Prescott's copper core screamers.

    T8000- You're clueless. Maybe it's the heat generated by your prescott making your head woozy, I dunno, but have a look here..1800 Mhz to 2800 Mhz on default Vcore stock fan.
    http://www.xtremesystems.org/forums/showthread.php...
