Understanding the Performance Numbers

As Intel and AMD add more and more cores to their CPUs, two main challenges arise in keeping these CPUs scaling: cache coherency messages can add a lot of latency and absorb a lot of bandwidth, while at the same time all those cores demand ever more memory bandwidth. That makes the memory subsystem critical. We still use our older Stream binary, compiled by Alf Birger Rustad with v2.4 of PathScale's C compiler. It is a multi-threaded, 64-bit Linux Stream binary, built with the following compiler switches:

-Ofast -lm -static -mp

We ran the Stream benchmark on SUSE Linux Enterprise Server (SLES) 11. Stream reports four numbers: Copy, Scale, Add and Triad. In our opinion Triad (a = b + q*c) is the most relevant, as it combines the operations of the other three.
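For readers unfamiliar with Stream, the four kernels are simple loops over large arrays of doubles. The sketch below is plain Python that only illustrates what each kernel computes and how many bytes Stream counts per element; it is not a bandwidth benchmark (the binary we used is compiled, multi-threaded code). The array names a, b, c, the scalar q and the helper bandwidth_mb_s are our own illustrative choices following the usual Stream convention.

```python
# Illustrative sketch of the four Stream kernels (NOT a benchmark).
# Destination array comes first in each signature; names are illustrative.

BYTES_PER_DOUBLE = 8

# Words Stream counts per element: nominal reads plus writes.
WORDS_PER_ELEMENT = {
    "copy": 2,   # read a, write c
    "scale": 2,  # read c, write b
    "add": 3,    # read a and b, write c
    "triad": 3,  # read b and c, write a
}

def copy(c, a):
    for i in range(len(a)):
        c[i] = a[i]

def scale(b, c, q):
    for i in range(len(c)):
        b[i] = q * c[i]

def add(c, a, b):
    for i in range(len(a)):
        c[i] = a[i] + b[i]

def triad(a, b, c, q):
    for i in range(len(b)):
        a[i] = b[i] + q * c[i]

def bandwidth_mb_s(kernel, n, seconds):
    """Counted bytes divided by elapsed time, in MB/s (1 MB = 10**6 bytes)."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * BYTES_PER_DOUBLE * n
    return bytes_moved / seconds / 1e6
```

A real run times each kernel over arrays far larger than the caches; the byte counts are what turn those timings into MB/s. Stream counts only these nominal reads and writes, so any extra write-allocate traffic is not included in the reported figure.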

Chart: Stream Triad on 64-bit Linux, maximum threads

The new DDR3 memory controller gives the Opteron 6100 series wings. Compared to the Opteron 2435, which uses DDR2-800, bandwidth has increased by 130%. Even with twice as many cores, each core gets more bandwidth, which should help a lot of HPC applications. It is a pity, of course, that the 1.8 GHz northbridge limits the memory subsystem. It would be interesting to see 8-core versions with higher-clocked northbridges for the HPC market.
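Why each core still comes out ahead can be checked with a line of arithmetic. The sketch below assumes the 130% gain compares fully loaded systems with the same number of sockets, and uses 6 cores per Opteron 2435 versus 12 per Opteron 6174; the variable names are illustrative.

```python
# Per-core bandwidth change, Opteron 2435 -> Opteron 6174.
# Assumption: the +130% total gain compares equal socket counts at max threads.

total_gain = 1.30             # +130% total Triad bandwidth (from the text)
cores_old, cores_new = 6, 12  # cores per socket: 2435 vs 6174

total_ratio = 1 + total_gain                          # new/old total bandwidth
per_core_ratio = total_ratio * cores_old / cores_new  # spread over twice the cores
print(f"total: x{total_ratio:.2f}, per core: x{per_core_ratio:.2f}")
# prints "total: x2.30, per core: x1.15"
```

So under that assumption each Magny-Cours core sees roughly 15% more bandwidth than an Istanbul core, despite the doubled core count.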

Also notice that the new Xeon 5600 handles DDR3-1333 a lot more efficiently. We measured 15% higher bandwidth from exactly the same DDR3-1333 DIMMs compared to the older Xeon 5570.  

The other important metric for the memory subsystem is latency. Most of our older latency benchmarks (such as CPUID's latency test) are no longer valid, so we turned to the latency test of SiSoftware Sandra 2010.

CPU                  Clock (GHz)  L1 (cycles)  L2 (cycles)  L3 (cycles)  Memory (ns)
Intel Xeon X5670     2.93         4            10           56           87
Intel Xeon X5570     2.80         4            9            47           81
AMD Opteron 6174     2.20         3            16           57           98
AMD Opteron 2435     2.60         3            16           56           113

With Nehalem, Intel increased the L1 cache latency from 3 cycles to 4, a tradeoff meant to allow for future scaling as the basic architecture evolves. The Xeons have the smallest (256 KB) but the fastest L2 cache. The Xeon X5570 has the fastest L3 cache, but measured in clock cycles that advantage has disappeared on the Xeon X5670 as the L3 grew from 8 to 12 MB.
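Because the L3 figures are given in clock cycles, chips with different clock speeds are easiest to compare after converting to nanoseconds (time = cycles / GHz). The sketch below does that for the L3 column of the latency table, assuming the cycle counts refer to the nominal clock listed there (turbo is ignored); the dictionary and helper name are illustrative.

```python
# Convert Sandra L3 latencies from core clock cycles to nanoseconds.
# Assumption: cycle counts refer to the nominal clock in the table (no turbo).

cpus = {
    "Xeon X5670":   (2.93, 56),  # (clock in GHz, L3 latency in cycles)
    "Xeon X5570":   (2.80, 47),
    "Opteron 6174": (2.20, 57),
    "Opteron 2435": (2.60, 56),
}

def cycles_to_ns(cycles, ghz):
    # One cycle lasts 1/ghz nanoseconds.
    return cycles / ghz

for name, (ghz, l3_cycles) in cpus.items():
    print(f"{name}: L3 {l3_cycles} cycles = {cycles_to_ns(l3_cycles, ghz):.1f} ns")
```

This gives roughly 19.1, 16.8, 25.9 and 21.5 ns respectively. The cycle counts of the X5670 and the Opteron 6174 are nearly equal, but at its higher clock the Xeon's L3 still answers sooner in absolute time.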

It is also interesting that the move from DDR2-800 to DDR3-1333 has reduced memory latency, from 113 ns to 98 ns, or by about 13%. There's nothing but good news for the 12-core Opteron here: more bandwidth and lower-latency memory access per core.
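The memory latency change between the two Opterons can be worked out from the table. The drop is about 13% measured against the old 113 ns figure; equivalently, the Opteron 2435's latency is about 15% higher than the 6174's. The sketch also expresses both latencies in core clock cycles, again assuming the nominal clocks from the table; helper names are illustrative.

```python
# Memory latency from the table: Opteron 2435 (DDR2-800) vs Opteron 6174 (DDR3-1333).
# Assumption: cycle conversion uses the nominal clocks listed in the table.

old_ns, new_ns = 113, 98

def percent_drop(old, new):
    # Reduction relative to the old value, in percent.
    return (old - new) / old * 100

def ns_to_cycles(ns, ghz):
    # A latency of ns nanoseconds spans ns * ghz core cycles.
    return ns * ghz

print(f"latency drop: {percent_drop(old_ns, new_ns):.1f}%")
print(f"2435: {ns_to_cycles(old_ns, 2.60):.0f} cycles, "
      f"6174: {ns_to_cycles(new_ns, 2.20):.0f} cycles")
# prints "latency drop: 13.3%" and "2435: 294 cycles, 6174: 216 cycles"
```

Measured in core cycles the improvement looks even larger, because the 6174 runs at a lower clock than the 2435.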

58 Comments

  • 564265425722557 - Monday, March 29, 2010

    1. Why is the TDP of the 65W ACP Magny-Cours a question mark? And are you sure the TDP of the 80W ACP ones is 115W?

    2. The Intel systems have only 24GB of RAM against the 32GB of RAM in the 2S Magny-Cours. That's why the 100GB database test favors the Magny-Cours by a large margin.
  • JohanAnandtech - Monday, March 29, 2010

    AMD told us the TDP values of the Magny-Cours at 80 and 105W ACP. The TDP values of the lower-power versions have not been disclosed yet.

    And as we disclosed on the benchmark config page, none of the benchmarks uses more than 20 GB. vApus Mark I uses about 19 GB, and the SQL Server test uses much less. While the SQL Server test has to scan through the complete index, it does not access the complete 100 GB of data. There is absolutely no advantage for the Opterons there. We checked.

    The fact that we spec the servers like that is a direct consequence of their number of memory channels (three per socket for the Xeon, four for the Opteron). There is not much we can do about that.
  • Penti - Tuesday, March 30, 2010

    How about 4P performance? It's cheap now and it's AMD's whole selling point. I guess you can get a 4P 48-core 128GB system for not that much. How would that compare to, say, a 2P Nehalem 12-core 92GB system? Wouldn't they cost about the same? Will it still be competitive against 8-core 2P Nehalem-EX? And how about 4P Nehalem-EX (like the 6-core versions)? How about the 8-core versions of the 6100 series Opterons?
  • elnexus - Wednesday, March 31, 2010

    In answer to cost:

    Compare our 2P Xeon 5600-series Workstation: http://elnexus.com/products.aspx?line_id=15514
    with our 4P Opteron 6100-series Workstation: http://elnexus.com/products.aspx?line_id=15635

    (I hope this isn't condemned as advertising, since it is an attempt to answer a question about price vs performance.)

    Note how low-priced the 6128 chip is (it is the default chip included in the base price).

    AMD, I think, is running away from Intel if you factor in the price...
  • Penti - Wednesday, March 31, 2010

    Thanks, I don't condemn it as advertising; this is a new platform, so it's interesting, and prices for complete systems are still hard to find. Basically, a 4P 8-core 6100-series Opteron system with 128GB DDR3 ECC REG costs as much as a 2P six-core Xeon (Westmere-EP) with 96GB DDR3 ECC REG, mainly because you can use cheaper 4GB sticks and still get 128GB, and partly because there's no longer any markup for >2P parts. I guess that accounts for something. Yeah, the 6128 costs virtually nothing extra for being 4P-compatible. I guess that helps AMD in a lot of workload scenarios. And since you can get 4P in 1U, there's really nothing that speaks against it. It will be interesting to see what Nehalem-EX can do, though.
  • TitanusComp - Wednesday, April 6, 2011

    You can really get a good idea by comparing these two products:

    48 Cores:
    http://www.titanuscomputers.com/A400-AMD-Workstati...

    24 Cores (Quad SLI Capable)
    http://www.titanuscomputers.com/X450-Intel-High-Pe...

    Now, the thing to consider: do you need CPU or GPU power?
  • duploxxx - Monday, March 29, 2010

    To make the whole benchmark complete, I think you should ask AMD for some Opteron 6136 chips to get a full review.
  • duploxxx - Monday, March 29, 2010

    And add the 56xx quad-core counterpart, of course.
  • JohanAnandtech - Tuesday, March 30, 2010

    We are working on it. Expect an update with new SKUs this month. I would say next week, but I would like to take some time to do some in-depth analysis.
  • Hacp - Monday, March 29, 2010

    Anand,
    I want to ask: why are you biased against AMD? You should base your tests on price. AMD is selling its 12-core for the price of an Intel 6-core. Compare apples to apples! Do a 12-core vs 6-core comparison and see who wins. Otherwise, you are doing a disservice.
