Understanding the Performance Numbers

As Intel and AMD add more and more cores to their CPUs, two main challenges arise in keeping these CPUs scaling. Cache coherency messages can add a lot of latency and absorb a lot of bandwidth, and at the same time all those cores require more and more memory bandwidth. So the memory subsystem plays an important role. We still use our older Stream binary. This binary was compiled by Alf Birger Rustad using v2.4 of PathScale's C compiler. It is a multi-threaded, 64-bit Linux Stream binary. The following compiler switches were used:

-Ofast -lm -static -mp

We ran the Stream benchmark on SUSE Linux Enterprise Server (SLES) 11. Stream reports four numbers: copy, scale, add and triad. In our opinion triad is the most relevant, as it is a mix of the other three.
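For reference, the four Stream kernels and the way Stream turns a timing into a bandwidth figure can be sketched as below. This is a minimal Python illustration under stated assumptions, not the benchmark itself: the function names are hypothetical, the arrays are plain lists, and timing this Python code would measure interpreter overhead rather than memory bandwidth. The byte accounting (two arrays touched per element for copy and scale, three for add and triad, 8-byte doubles, 1 MB = 10^6 bytes) follows Stream's own convention.

```python
"""Sketch of the four Stream kernels and Stream-style bandwidth accounting."""

BYTES_PER_DOUBLE = 8

# Number of arrays each kernel reads or writes per element, as Stream counts them.
ARRAYS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def stream_copy(a):
    """c[i] = a[i]"""
    return list(a)


def stream_scale(c, q):
    """b[i] = q * c[i]"""
    return [q * x for x in c]


def stream_add(a, b):
    """c[i] = a[i] + b[i]"""
    return [x + y for x, y in zip(a, b)]


def stream_triad(b, c, q):
    """a[i] = b[i] + q * c[i]: a scale and an add fused into one pass."""
    return [x + q * y for x, y in zip(b, c)]


def bandwidth_mb_s(kernel, n, seconds):
    """Convert one kernel's run time over n-element arrays into MB/s (1 MB = 10**6 bytes)."""
    total_bytes = ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    print(stream_triad([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], 3.0))  # [31.0, 62.0, 93.0]
    # Hypothetical timing: triad over 100 million elements in 0.25 s.
    print(bandwidth_mb_s("triad", 100_000_000, 0.25))  # 9600.0
```

Because triad moves three arrays per element while copy moves two, the same elapsed time yields a higher reported bandwidth for triad than for copy.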

Stream Triad on 64-bit Linux (maximum threads)

The new DDR3 memory controller gives the Opteron 6100 series wings. Compared to the Opteron 2435, which uses DDR2-800, bandwidth has increased by 130%. Each core gets more bandwidth, which should help a lot of HPC applications. It is a pity, of course, that the 1.8 GHz northbridge limits the memory subsystem. It would be interesting to see 8-core versions with higher-clocked northbridges for the HPC market.
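The per-core claim follows from simple arithmetic. Assuming both test servers have the same number of sockets, the Opteron 2435 has six cores per socket and the Opteron 6174 has twelve, so a 130% increase in total bandwidth spread over twice as many cores works out to roughly 15% more bandwidth per core. A minimal sketch (the helper name is hypothetical):

```python
def per_core_gain(total_gain, old_cores, new_cores):
    """Relative change in bandwidth per core, given the relative change in
    total bandwidth and the core counts before and after (same socket count assumed)."""
    return (1.0 + total_gain) * old_cores / new_cores - 1.0


# Opteron 2435 (6 cores) -> Opteron 6174 (12 cores), +130% total bandwidth
gain = per_core_gain(1.30, old_cores=6, new_cores=12)
print(f"{gain:.0%}")  # 15%
```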

Also notice that the new Xeon 5600 handles DDR3-1333 a lot more efficiently: with exactly the same DDR3-1333 DIMMs, we measured 15% higher bandwidth than on the older Xeon 5570.

The other important metric for the memory subsystem is latency. Most of our older latency benchmarks (such as the latency test of CPUID) are no longer valid, so we turned to the latency test in SiSoft Sandra 2010.

CPU                 Clock (GHz)  L1 (clocks)  L2 (clocks)  L3 (clocks)  Memory (ns)
Intel Xeon X5670    2.93         4            10           56           87
Intel Xeon X5570    2.80         4            9            47           81
AMD Opteron 6174    2.20         3            16           57           98
AMD Opteron 2435    2.60         3            16           56           113

With Nehalem, Intel increased the L1 cache latency from 3 cycles to 4; the trade-off was meant to allow for future scaling as the basic architecture evolves. The Xeons have the smallest (256 KB) but the fastest L2 cache. The L3 cache of the Xeon X5570 is the fastest, but that latency advantage disappears on the Xeon X5670, whose L3 cache grows from 8 to 12 MB.
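Because the cache latencies in the table are given in clock cycles while the four CPUs run at different clock speeds, it can help to convert them to nanoseconds: one cycle at f GHz lasts 1/f ns, so latency in ns = cycles / f. The sketch below applies this to the L3 column; it assumes the cycle counts are core clocks at the listed frequency and ignores turbo modes. The dictionary and helper name are illustrative only.

```python
# L3 latency in core clock cycles and core clock in GHz, taken from the table.
L3_LATENCY = {
    "Intel Xeon X5670": (56, 2.93),
    "Intel Xeon X5570": (47, 2.80),
    "AMD Opteron 6174": (57, 2.20),
    "AMD Opteron 2435": (56, 2.60),
}


def cycles_to_ns(cycles, clock_ghz):
    """One cycle at f GHz lasts 1/f ns, so latency_ns = cycles / f."""
    return cycles / clock_ghz


for cpu, (cycles, ghz) in L3_LATENCY.items():
    print(f"{cpu}: {cycles_to_ns(cycles, ghz):.1f} ns")
# Intel Xeon X5670: 19.1 ns
# Intel Xeon X5570: 16.8 ns
# AMD Opteron 6174: 25.9 ns
# AMD Opteron 2435: 21.5 ns
```

In absolute time the X5570's L3 still comes out ahead, while the lower-clocked Opteron 6174 falls further behind the Opteron 2435 than the nearly identical cycle counts suggest.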

It is also interesting that the move from DDR2-800 to DDR3-1333 has decreased memory latency by 15 ns, or about 13% (from 113 ns to 98 ns). There's nothing but good news for the 12-core Opteron here: more bandwidth and lower-latency memory access per core.
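The relative change in memory latency between the two Opteron platforms can be computed directly from the table's memory column. A minimal sketch (the helper name is hypothetical):

```python
def relative_change(old, new):
    """Fractional change from old to new; negative means a decrease."""
    return (new - old) / old


# Opteron 2435 (DDR2-800) -> Opteron 6174 (DDR3-1333), memory latency in ns
change = relative_change(113, 98)
print(f"{change:.1%}")  # -13.3%
```

The absolute saving is 15 ns; expressed relative to the 113 ns baseline, that is a little over 13%.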


58 Comments


  • kokotko - Saturday, April 24, 2010

    why you are NOT SHARING same "shareable" components - like PSU ??????

    NO WONDER THE NUMBERS ARE WORSE ! ! !
  • blurian589 - Tuesday, May 11, 2010

    3ds max crashes because of the mental ray renderer. remove the plugin from loading and max will start up. its due to mental ray cannot see more than 16 threads (physical or virtual via hyper-threading). please do test the max rendering performance. thanks
  • Desired_Username - Tuesday, June 29, 2010

    In the final words it states "We estimate that the new Opteron 6174 is about 20% slower than the Xeon 5670 in virtualized servers with very high VM counts. " But in the virtualization section I can't seem to figure out what brought you to that conclusion. The VMmark scores for the Cisco X5680 system was 35.83@26 tiles. You have the VMmark for the 6176SE at 31 which is dead on to the HP DL385 G7 which got 30.96@22 tiles. I see the X5680 15% better at best. And the Cisco x5680 system had 192GB of memory to the HP 6176SE system had 128GB. What am I missing here?
  • jeffjeff - Wednesday, September 22, 2010

    I appreciate AMD's lower CPU cost but on the other hand, Oracle will license me their RDBMS per core and whether it's an Intel 56xx or AMD 61xx, I am still paying a relation of .5 license per core.

    So in the end, I would pay 6 cores for AMD and 3 cores for Intel. The price per core is much higher than the hardware price difference.

    Any thought or solutions on this issue would be appreciated...

    Joffrey
  • stealthy - Wednesday, November 24, 2010

    Would it be possible to get the xml parameter files you have used in this test ?
    We are currently in a trial phase at my company to see how the current crop of intel boxes (dual Xeon X5460 procs) hold up against a new z10 system.
    Did you run the swingbench on the server itself or did you use a dedicated client to test ?
  • Big_Mr_Mac - Thursday, December 16, 2010

    In 1991 I had an AMD 386-40 that kicked the snot out of Intel pride and joy 486DX2-66. Benchmarks were 25%+ across the board over Intel. Then Intel lied to the market and started passing off cull processors as viable options calling the 25Mhz and 50Mhz processors, when they were actually processors that failed the benchmarks for 33Mhz and 66Mhz respectively.

    In 1998 When Win98 Beta was released I was building Servers and workstations at a Tech-company and Again the AMD was kicking the snot out of Intel. Load times on new system builds, boot time and performance. The Intel chips could not hack it. Then when MS release their actual market version of Win98...all of a sudden you could not even use an AMD processor to run it. You had to wait 2 weeks for MS come up with a "AMD Patch" to run on an AMD system.

    One think I have seen over 20 years in the industry is that Intel will, Lie, Cheat, Steal and Bribe to try and get the upper hand on AMD. Always have....Always Will!!
  • rautamiekka - Saturday, December 25, 2010

    Why the fuck are you testing with WinServer and M$ SQL ? Just reading this makes my blood boil 9 times in a second.
  • polbel - Saturday, May 21, 2011

    i've been an amd fan for as long as i can remember. started fixing computers in 1979. used to fix mai basic four minis in the mid-80s that were built on amd bit-slice bipolar cpus on boards that cost 15,000$.

    just got 2 opteron 6172 cpus from ebay for what i thought was peanuts (450 $ each) only to discover upon delivery that both had hairline cracks at a 45 degree angle on one corner of the contact pad surface. looking at their web site i could figure i was out on limb and they would laugh in my face if asked for warranty support on these not-boxed cpus. i know some dumb ass managed to break those cpu corners, and tried to shove the crap to an ebay sucker, but the problem lies deeper, mostly in the g34 socket physical design itself of these otherwise beautiful electronic products. the edge of the metal cover doesn't reach the edge of the fiber board, leaving some unsupported area to be broken by dumb asses mimicking the old days when they could put a 40-pin dip cpu upside-down in its socket. so i'm freshly reviewing my belief system about amd while i figure a solution for this crap-hits-the-fan situation. wish i could have told amd engineers to cover theses last millimeters at the bleeding edge. they might say this and that about warranty, i still hold them responsible for this preventable disaster.

    paul :-)
