Making Sense of the New Interlagos Opteron

This second look at the current Xeon and Opteron platforms added OLAP, ERP, and OLTP power and performance data. Combine this with our first review and the other publicly available benchmark and power data and we should be able to evaluate the new Opteron 6200 more accurately. So in which situations does the Opteron 6200 make sense? We'll start with the perspective of the server buyer.

Positioning the Opteron 6276

First let's look at the pricing. The Opteron 6276 is priced similarly to an E5649, which is clocked 5% lower than the X5650 we tested. If you calculate the price of a Dell R710 with the Xeon E5649 and compare it with a similarly specced Dell R715 with the Opteron 6276, you end up with more or less the same acquisition cost. However, the E5649 has an 80W TDP and should thus consume a bit less power. That is why we argued that the Opteron 6276 should at least offer a price/performance bonus and perform like an X5650. The X5650 is roughly $220 more expensive, so you end up with the dual socket Xeon system costing about $440 more. On a fully specced server, that is about a 10% price difference.
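The pricing argument above comes down to two lines of arithmetic. The following is a minimal sketch of it, using only the figures quoted in the text ($220 per CPU, two sockets, about 10%). The function names are ours, and the implied system price is a back-of-the-envelope estimate derived from the "about 10%" remark, not a quoted Dell price.

```python
# Figures taken from the text; everything else is an illustrative assumption.
PRICE_DELTA_PER_CPU_USD = 220   # X5650 vs. E5649, per CPU
SOCKETS = 2                     # dual socket server
RELATIVE_GAP_PERCENT = 10       # "about a 10% price difference"


def system_price_delta(delta_per_cpu, sockets):
    """Extra cost of the whole system when every socket gets the pricier CPU."""
    return delta_per_cpu * sockets


def implied_system_price(delta, gap_percent):
    """System price at which `delta` amounts to `gap_percent` of the total."""
    return delta * 100 / gap_percent


delta = system_price_delta(PRICE_DELTA_PER_CPU_USD, SOCKETS)
print(delta)                                               # 440
print(implied_system_price(delta, RELATIVE_GAP_PERCENT))   # 4400.0
```

In other words, the 10% figure corresponds to a fully specced server in the region of $4400.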

The Opteron 6276 offered similar performance to the Xeon in our MySQL OLTP benchmarks. If we take into account the hard to quantify TPC-C benchmarks, the Opteron 6276 offers equal to slightly better OLTP performance. So for midrange OLTP systems, the Opteron 6276 makes sense if the higher core count does not increase your software license costs. The same is true for low end ERP systems.

When we look at higher end OLTP and the ERP market beyond the low end, the cost of buying server hardware is lost in the noise. The Westmere-EX will be the top choice in that case: higher thread count and performance, better RAS, and a higher number of DIMM slots.

AMD also lost the low end OLAP market: the Xeon offers a (far) superior performance/watt ratio on MySQL. In the midrange and high end OLAP market, the software costs of, for example, SQL Server increase the importance of performance and performance/watt and make server hardware costs a minor issue. The "performance first" OLAP market in particular will be dominated by the Xeon, which is available in SKUs of up to 3.06GHz without an increase in TDP.

The strong HPC performance and the low price continue to make the Opteron a very attractive platform for HPC applications. While we haven't tested this ourselves, even Intel admits that they are "challenged in that area".

The Xeon E5, aka Sandy Bridge EP

There is little doubt that the Xeon E5 will be a serious threat to the new Opteron. The Xeon E5 offers, for example, twice the peak AVX throughput. Add to this the fact that the Xeon will get a quad channel DDR3-1600 memory interface and you know that the Opteron's leadership in HPC applications is going to be challenged. Luckily for AMD, the 8-core top models of the Xeon E5 will not be cheap according to leaked price tables. Much will depend on how the 6-core midrange models fare against the Opteron.

The Hardware Enthusiast Point of View

The disappointing results in the non-server applications are easy to explain, as the architecture is clearly more targeted at server workloads. However, the server workloads show a very blurry picture as well. The server performance results of the new Opteron are nothing less than confusing. It can be very capable in some applications (OLTP, ERP, HPC) but disappointing in others (OLAP, rendering). The same is true for the performance/watt results. And of course, if you name a new architecture Bulldozer and you target it at the server space, you expect something better than "similar to a midrange Xeon".

It is clear to us that quite a few things are suboptimal in the first implementation of this new AMD architecture, but the second integer cluster (CMT) is not one of them: it is doing an excellent job. As long as the front end is working at full speed, we measured a solid 70 to 90% increase in performance from enabling CMT (we will give more detail in our next article). CMT works superbly and always gives better results than SMT... until you end up with heavy locking contention issues. That indicates that something goes wrong in the front end. Software that does not scale well could be served well by low core count "Valencia" Opteron 4200s, but at the time of writing, the best AMD could offer was a 3.3GHz 6-core. The architecture is clearly capable of reaching very high clock speeds, but we saw very little performance increase from Turbo Core.

What we end up with then is more questions. That means it's time for us to do some deep profiling and see if we can get some more answers. Until then, we hope you've enjoyed our second round of Interlagos benchmarking, and as always, comments and feedback on our testing methods are welcome.


46 Comments


  • sonofgodfrey - Thursday, February 9, 2012 - link

    Have you explicitly tested one socket vs. two sockets? We've found an immense increase in contention once a cache-line has to be shared between sockets on some systems.
  • JohanAnandtech - Friday, February 10, 2012 - link

    That is one suggestion I will try out next week. Thanks!
  • Klimax - Thursday, February 9, 2012 - link

    Hello.

    Nice tests.

    However, I would like to see MySQL tested on Windows Server 2008 R2.
    It would be an interesting comparison.

    (Especially due to http://channel9.msdn.com/shows/Going+Deep/Arun-Kis... )
  • Klimax - Thursday, February 9, 2012 - link

    Title of post is wrong... (I have deleted second thing and forgot to fix title)
  • Scali - Thursday, February 9, 2012 - link

    Unless I'm mistaken, the Xeon 5650 is a 1.17B transistor chip, where the Interlagos 6276 is a 2.4B transistor chip.
    In that light, doesn't that make Intel's SMT implementation a lot better than CMT?
    I mean, yes CMT may give more of a performance boost when you increase the threadcount. But considering the fact that AMD spends more than twice the number of transistors on the chip... well, that's pretty obvious.
    AMD might as well just have used conventional cores.
    The true strength of SMT is not so much that it improves performance in multithreaded scenarios, but that it does so at virtually no extra cost in terms of transistors (and with little or no impact on the single-threaded performance either).
  • JohanAnandtech - Friday, February 10, 2012 - link

    Interlagos is a 1.2 billion transistor chip (maybe 1.3 but anyway). Most of those transistors are spent on the L3 cache: about 0.5 billion. Only 213 million transistors are in a module and each module contains a 2 MB L2-cache, probably good for 120 million transistors. That leaves 90 million transistors to the core, and it has been stated that the second cluster added 12%. So that second cluster costs about 12 million transistors, or 48 million on the total 4 module die. That is less than 5% of the total transistor count but you get a 30-90% performance boost!

    So for AMD, this was clearly a great choice.

    SMT is perfect for Intel, as the Intel architecture puts all instructions in one big ROB.

    For very low IPC server workloads, I think the CMT approach gives better results. Unfortunately AMD lowered some of the CMT benefits by keeping the data cache so small and by the low associativity of the I-cache.
  • Scali - Friday, February 10, 2012 - link

    Uhhh, I think you're wrong here... the 4-module Bulldozer is a 1.2B chip (Zambezi). But you tested the 8-module Interlagos (16 threads), which is TWO Zambezi dies in one package.
    Hence 2*1.2 = 2.4B transistors.
  • JohanAnandtech - Friday, February 10, 2012 - link

    Ok, it is two chips of 1.2 billion. That doesn't change anything about our analysis of CMT.
  • Scali - Friday, February 10, 2012 - link

    Not in the article, because you did not factor in transistor count (which is the flaw I tried to point out in the first place... comparing two chips, where one has twice the transistor count of the other, is quite the apples-to-oranges comparison. One would expect a chip with twice the transistor count to be considerably better in multithreading scenarios, not 'catching up' to the smaller chip).

    But in your above post, I think it changes everything about your analysis. All your figures have to be done times two.
    Which makes it a very poor comparison, not only to Intel, but also to AMD's own previous line of CPUs.
    The 6174 Magny Cours is actually beating Interlagos, with 'only' 12 threads, no kind of CMT/SMT, and 'only' 1.8B transistors.

    How does that make CMT look like a great choice for AMD?
  • slycer.tech - Friday, February 10, 2012 - link

    From what I read on the benchmark configuration page, Anand used 2x Intel Xeon X5650. So 2x 1.17B = 2.34B. I think that is comparable to the AMD CPU used in this test. Am I right?
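The transistor budget debated in the comments above is easy to check. The sketch below redoes the arithmetic using only the figures the commenters quote (213 million transistors per module, roughly 120 million for the L2, about 12 million for the second integer cluster, four modules and 1.2 billion transistors per die, two dies per Interlagos package). These are the commenters' estimates, not official AMD numbers, and the function names are ours.

```python
# All figures in millions of transistors, as quoted in the comment thread.
MODULE_TRANSISTORS_M = 213    # one Bulldozer module
L2_TRANSISTORS_M = 120        # estimated 2 MB L2 cache per module
SECOND_CLUSTER_M = 12         # rounded estimate for the second integer cluster
MODULES_PER_DIE = 4
DIE_TRANSISTORS_M = 1200      # one 4-module die
DIES_PER_PACKAGE = 2          # Interlagos packages two dies


def core_logic_per_module():
    """Transistors left in a module once the L2 cache is subtracted."""
    return MODULE_TRANSISTORS_M - L2_TRANSISTORS_M


def second_cluster_share(dies):
    """Return (cluster total, chip total, cluster share in percent) for `dies` dies."""
    cluster_total = SECOND_CLUSTER_M * MODULES_PER_DIE * dies
    chip_total = DIE_TRANSISTORS_M * dies
    return cluster_total, chip_total, 100 * cluster_total / chip_total


print(core_logic_per_module())                  # 93
print(second_cluster_share(1))                  # (48, 1200, 4.0)
print(second_cluster_share(DIES_PER_PACKAGE))   # (96, 2400, 4.0)
```

Counting both dies doubles the absolute numbers (96 million out of 2.4 billion), but the share attributed to the second integer cluster stays at about 4% either way. The calculation says nothing about the separate question of how the whole 2.4 billion transistor package compares with other CPUs.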
