Next up, we'll look at floating point performance.

Flops, programmed by Al Aburto, is a very floating-point intensive benchmark. Analyses show that this benchmark contains:

70% floating point instructions;
only 4% branches; and
only 34% memory instructions.
Note that some of those 70% FP instructions are also memory instructions. Flops is not a real-world workload, but it does isolate raw FPU performance.

Al Aburto, about Flops:
"Flops.c is a 'C' program which attempts to estimate your systems floating-point 'MFLOPS' rating for the FADD, FSUB, FMUL, and FDIV operations based on specific 'instruction mixes' (see table below). The program provides an estimate of PEAK MFLOPS performance by making maximal use of register variables with minimal interaction with main memory. The execution loops are all small so that they will fit in any cache."
Because the program fits in the L1 cache, Flops shows the maximum double precision throughput that the core can deliver. Flops consists of 8 tests, and each test has a different, but well-known, instruction mix. The most frequently used instructions are FADD (addition), FSUB (subtraction) and FMUL (multiplication).
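
To make the idea of an MFLOPS estimate concrete, here is a minimal sketch of the general approach: run a small loop with a known number of floating point operations, time it, and divide. This is not Al Aburto's code. The kernel, the function names `fp_kernel` and `mflops`, and the 50% add / 50% multiply mix are all assumptions chosen for illustration, and in Python the interpreter overhead dominates, so the printed figure says nothing about a CPU's peak rate.

```python
import time


def fp_kernel(iterations):
    """Approximate the integral of x^2 on [0, 1] with a simple sum.

    Hypothetical kernel: each pass does 2 adds and 2 multiplies,
    i.e. a 50% FADD / 50% FMUL mix. Returns (result, flop_count).
    """
    x = 0.0
    step = 1.0 / iterations
    total = 0.0
    for _ in range(iterations):
        x += step              # 1 add
        total += x * x * step  # 2 multiplies + 1 add
    return total, 4 * iterations


def mflops(flop_count, seconds):
    """Millions of floating point operations per second."""
    return flop_count / (seconds * 1e6)


if __name__ == "__main__":
    start = time.perf_counter()
    result, flops = fp_kernel(1_000_000)
    elapsed = time.perf_counter() - start
    print(f"integral ~ {result:.4f}, {mflops(flops, elapsed):.1f} MFLOPS")
```

The loop keeps its working set to a handful of local variables, which mirrors the description above of small loops with minimal interaction with main memory.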

Flops results (MFLOPS, higher is better)

Module   FADD  FSUB  FMUL  FDIV  iMac G5 1.9GHz  iMac Core Duo 1.83GHz
1        50%   0%    43%   7%    705             876
2        43%   29%   14%   14%   490             366
3        35%   12%   53%   0%    2213            1216
4        47%   0%    53%   0%    1349            1178
5        45%   0%    52%   3%    868             1109
6        45%   0%    55%   0%    1509            1291
7        25%   25%   25%   25%   341             235
8        43%   0%    57%   0%    1440            1264
Average                          1114            942

One of the G5's strengths is its floating point performance, and we see an example of that here: on average, it holds an 18% performance advantage over the Core Duo. This does complicate the performance scene, as the move to Core Duo isn't necessarily going to be a clean victory for Apple today.
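
As a quick check on the averages and the 18% figure, the arithmetic can be reproduced from the per-module scores in the table. The variable names below are just for illustration.

```python
# Per-module Flops scores from the table above, modules 1 through 8.
g5 = [705, 490, 2213, 1349, 868, 1509, 341, 1440]
core_duo = [876, 366, 1216, 1178, 1109, 1291, 235, 1264]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


g5_avg = mean(g5)
cd_avg = mean(core_duo)

# Relative advantage of the G5 average over the Core Duo average, in percent.
advantage = (g5_avg / cd_avg - 1) * 100

# Modules (1-based) where the Core Duo posts the higher score.
core_duo_wins = [i + 1 for i, (g, c) in enumerate(zip(g5, core_duo)) if c > g]

print(round(g5_avg), round(cd_avg), round(advantage), core_duo_wins)
# prints: 1114 942 18 [1, 5]
```

The average hides some variation: the Core Duo is ahead in modules 1 and 5, while the G5 leads in the other six.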

The last architectural performance test was the Queens benchmark, which does a great job of measuring the performance of a CPU's branch predictor. 

To test the branch prediction, we used the benchmark "Queens". Queens is a very well-known problem where you have to place n chess queens on an n x n board. The catch is that no queen may be able to attack any other. The algorithm behind this benchmark is an exhaustive search for a placement in which the queens don't attack each other, and it contains some very branch intensive code.

Queens has about:

23% branches
45% memory instructions
No FP operations

On a PIII, the branch misprediction rate is up to 19% (typical: 9%). Queens fits entirely in the L1 cache.
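
The exhaustive search described above can be sketched as a row-by-row backtracking routine. This is an illustrative version, not the benchmark's own source: the function name `count_solutions` is hypothetical, and this sketch counts every valid placement, whereas the benchmark may stop at the first one. It does show where the branches come from, since every candidate square triggers a data-dependent test that is hard to predict.

```python
def count_solutions(n):
    """Count the ways to place n non-attacking queens on an n x n board."""
    cols = set()   # columns that already hold a queen
    diag1 = set()  # occupied "\" diagonals, identified by row - col
    diag2 = set()  # occupied "/" diagonals, identified by row + col

    def place(row):
        if row == n:
            return 1  # every row has a queen: one complete solution
        found = 0
        for col in range(n):
            # Data-dependent branch: is this square attacked?
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue
            cols.add(col)
            diag1.add(row - col)
            diag2.add(row + col)
            found += place(row + 1)
            # Backtrack: remove the queen and try the next column.
            cols.remove(col)
            diag1.remove(row - col)
            diag2.remove(row + col)
        return found

    return place(0)


if __name__ == "__main__":
    print(count_solutions(8))  # prints: 92
```

A pure Python search at N=16 would take a very long time, so small boards are the practical way to try this sketch.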

As Johan mentioned in his article, it seemed as if a good branch predictor was very important to the G5's designers. The necessity for a good branch predictor is also evident when you look at how long it takes the G5 to access main memory. For this test, we looked at Queens performance with 16 queens on the chessboard:

Branch Predictor Performance - Queens (N=16)

The G5 completely dominates the Core Duo here. On a chip with a relatively short pipeline, branch prediction usually gets less attention than on a chip with a longer pipe.

35 Comments

  • Anand Lal Shimpi - Tuesday, January 31, 2006 - link

    Turning off one core leaves the full 2MB of cache for the other core to use since it is a shared L2.

    Take care,
    Anand
  • Eug - Tuesday, January 31, 2006 - link

    quote:

    Turning off one core leaves the full 2MB of cache for the other core to use since it is a shared L2.

    Take care,
    Anand

    Cool thanks.

    P.S. I have read elsewhere that the new iMac Core Duo uses less than half of the CPU's processing power to play back H.264 Hi-Def 1920x1080 video at a full 24 fps. If true, that's great, because my iMac 2.0 chokes on that. It plays back relatively smoothly, but only at about 12-15 fps.

    That bodes well for a future single-core Yonah Mac mini.

    Then again, probably not, considering that I suspect the iMac Core Duo does so well on H.264 playback because of its Radeon X1600. I'd doubt the Mac mini would get anything close to that any time soon.
  • Anand Lal Shimpi - Tuesday, January 31, 2006 - link

    Max CPU utilization (across both CPUs) when playing a 1080p stream scaled to fit the screen is about 60%, but it usually hovers below 50%. I am not sure whether the X1600's H.264 decode acceleration is taken advantage of (I doubt it); I'm trying to find out now. Also remember that on the PC side, the X1600 will only accelerate up to 720p.

    Take care,
    Anand

  • Anand Lal Shimpi - Tuesday, January 31, 2006 - link

    I just confirmed with ATI, the X1600's H.264 decode acceleration is currently not supported under OS X. ATI is working with Apple on trying to get the support built in, but currently it isn't there.

    Take care,
    Anand
  • Eug - Tuesday, January 31, 2006 - link

    quote:

    I just confirmed with ATI, the X1600's H.264 decode acceleration is currently not supported under OS X. ATI is working with Apple on trying to get the support built in, but currently it isn't there.

    Thanks again for the info. That's actually good news in a way. Things are looking up for that single-core Yonah Mac mini HTPC.
  • andrep74 - Tuesday, January 31, 2006 - link

    Isn't performance/Watt a function of the CPU, not the platform?
  • Kyteland - Tuesday, January 31, 2006 - link

    That picture of Jobs doesn't say "PC vs Intel" it says "PowerPC vs Intel". Jobs is just standing in the way. He's comparing the old mac to the new mac.
  • Calin - Tuesday, January 31, 2006 - link

    You could think about it that way - but in the end, the buyer is interested in the total energy consumption/heat production (as this is what he pays for, and what he must get rid of).
    Have you heard of the Toyota D4D engine? It has a record of 2.4 liters (less than a gallon) of diesel fuel used per hundred kilometers (about 60 miles). However, the same engine in a Land Cruiser 4x4 with all options will get you much less (maybe a quarter of that).
    It isn't worth talking about performance per watt at the processor level; it is better to look at the platform level.
  • BUBKA - Tuesday, January 31, 2006 - link

    Were these benches done with a USB 2.0 device plugged in?
  • Furen - Tuesday, January 31, 2006 - link

    I was under the impression that Intel was blaming Microsoft for that, so that would not apply to OS X, though if the driver works perfectly for every platform except Napa I'd guess it's a hardware problem that MS will fix in software (which is well enough as long as it works). The power consumption difference is probably less than 10W anyway. It matters on a notebook but hardly matters on a desktop.
