The Opteron 6276: a closer look


by Johan De Gelas on February 9, 2012 6:00 AM EST


MS SQL Server 2008 R2 at Medium Load

First we test at a moderate load (20-40% load) with 125 concurrent users. Note that these numbers are one set of results from our testing of a complete chain of concurrencies (25, 40, 80, 100, 125, 200, 300 ... concurrent users). You will see the complete listing of the results later in the review. For this overview, we focus on a specific concurrency. The database is "warmed up" with a test using 25 concurrent users. We always discard the result at 25 concurrent users, because the first concurrency level shows some disk I/O peaks.
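The sweep-and-discard procedure can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual test harness: the `measure` callback and the `fake_measure` stand-in are hypothetical names, and the list of levels stops where the text above trails off.

```python
from typing import Callable, Dict, List

# Concurrency levels named in the text; the real chain continues past 300.
CONCURRENCY_LEVELS: List[int] = [25, 40, 80, 100, 125, 200, 300]
WARMUP_LEVEL = 25  # first run only warms up the database


def run_sweep(measure: Callable[[int], float],
              levels: List[int] = CONCURRENCY_LEVELS) -> Dict[int, float]:
    """Run every concurrency level in order, then drop the warm-up result."""
    results = {users: measure(users) for users in levels}
    # The warm-up run still executes (it fills the caches), but its
    # result is discarded because of the disk I/O peaks it contains.
    results.pop(WARMUP_LEVEL, None)
    return results


def fake_measure(users: int) -> float:
    """Hypothetical stand-in for a real benchmark run; returns a dummy value."""
    return float(users)


if __name__ == "__main__":
    print(run_sweep(fake_measure))
```

The point of keeping the warm-up inside the sweep is that every later level then runs against an already warm database.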

We don't look at the throughput numbers here: we only demand 125 queries per second, and all servers deliver somewhere between 117 and 122 queries per second. Instead, we focus on response times.

MS SQL Server 2008

Here the response times are very interesting. From the "full load" numbers, you might conclude that the Opteron 6220 is untenable, as it delivers 30-40% lower throughput while consuming just as much power as the other servers. From those same numbers, we would conclude that the Opteron 6174 is the server platform to get. Switch to a moderate load and our conclusions change.

When testing at medium load, we get a much more accurate and nuanced picture, as your servers will probably be spending a lot of time running this kind of load. It seems that if you want to save some power (e.g. run the "Balanced" power profile), the Opteron 6220 comes close to the new champion, the Xeon X5650. Since turbo is not enabled in this mode, the 6220 leverages its higher clock speed to outperform the other Opterons.

Interestingly, the Dynamic Voltage and Frequency Scaling (DVFS) of the Opteron 6174 performs pretty badly compared to the Xeon and the new Opteron. Enabling DVFS increases the response times by 116% (!) on the older Opteron. The Xeon and Opteron 6276 also take a significant, but lower, hit in response time: +66% and +78% respectively.
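To make those percentages concrete, here is a small Python sketch that turns a relative increase into a slowdown factor and an absolute response time. The 50 ms baseline is a made-up number chosen purely for illustration; only the percentages come from our measurements.

```python
def slowdown_factor(increase_pct: float) -> float:
    """A +116% increase means the response time is 2.16x the baseline."""
    return 1.0 + increase_pct / 100.0


def with_dvfs_penalty(baseline_ms: float, increase_pct: float) -> float:
    """Response time after applying a relative increase to a baseline."""
    return baseline_ms * slowdown_factor(increase_pct)


if __name__ == "__main__":
    HYPOTHETICAL_BASELINE_MS = 50.0  # assumption, not a measured value
    for name, pct in [("Opteron 6174", 116), ("Opteron 6276", 78), ("Xeon", 66)]:
        ms = with_dvfs_penalty(HYPOTHETICAL_BASELINE_MS, pct)
        print(f"{name}: +{pct}% -> {slowdown_factor(pct):.2f}x, {ms:.0f} ms")
```

In other words, an increase of more than 100% means the response time more than doubles.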

The Opteron 6220 suffers much less from this problem, as response times only grow 22%. That clearly indicates that the new Opteron deals much better with DVFS. The 6276 probably gets such a high penalty in "Balanced" mode because it can no longer boost to 2.6GHz or 3.2GHz. A better adapted power policy could definitely improve performance at lower loads. We measured the impact of turbo on the power consumption and it was less than 10%. The energy (power * time) increase was even lower (a few %) as the CPU could put the cores to sleep more quickly.
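Why the energy increase can be smaller than the power increase follows directly from energy = power * time. The Python sketch below uses a deliberately simplified model (one average power figure over one active period, idle power ignored), and the ratios fed into it are hypothetical values picked to match the "less than 10%" and "a few %" ranges, not measured data.

```python
def energy_joules(avg_power_watts: float, duration_s: float) -> float:
    """Energy is average power multiplied by time."""
    return avg_power_watts * duration_s


def relative_energy_change_pct(power_ratio: float, time_ratio: float) -> float:
    """Percent change in energy when power and active time both change.

    power_ratio: new power / old power (e.g. 1.08 for +8%)
    time_ratio:  new active time / old active time (e.g. 0.95 for -5%)
    """
    return (power_ratio * time_ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical: turbo draws 8% more power but finishes the work 5% sooner.
    change = relative_energy_change_pct(power_ratio=1.08, time_ratio=0.95)
    print(f"energy change: {change:+.1f}%")
```

With those assumed ratios, an 8% rise in power turns into an energy rise of only a few percent, because the cores spend less time awake.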

If you think these kinds of response times (<100 ms) don't matter, don't forget that the slowest 5% of queries can easily show 20-50x higher response times. Those are exactly the queries the users might start to complain about.
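A short Python sketch shows how a typical figure can hide that tail. It uses the nearest-rank percentile method, which is one common definition among several, and a synthetic sample (94 queries at 40 ms, 6 queries at 30x that) invented for illustration only.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Synthetic response times in ms: mostly fast, with a slow tail (30x slower).
    samples = [40.0] * 94 + [1200.0] * 6
    print("median:", percentile_nearest_rank(samples, 50))
    print("p95:   ", percentile_nearest_rank(samples, 95))
    print("mean:  ", sum(samples) / len(samples))
```

In this made-up sample the median stays at the fast value while the 95th percentile lands on the slow queries, which is why tail figures deserve a separate look.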

Let us look at the power figures.

MS SQL Server 2008

As the Xeon is able to put its cores to sleep more quickly and deeply, it is a real winner in "Balanced" mode. But notice how the Opteron 6174's performance/watt is no longer attractive: it needs just as much power as the Opteron 6276 in "Balanced" mode but delivers worse response times. Meanwhile, the Opteron 6220 fails to impress; it did deliver very decent response times, but it needs 26% more power than the Xeon, which is saving a significant amount of power in "Balanced" mode.

46 Comments


sonofgodfrey - Thursday, February 9, 2012 - link
Have you explicitly tested one socket vs. two sockets? We've found an immense increase in contention once a cache-line has to be shared between sockets on some systems.
JohanAnandtech - Friday, February 10, 2012 - link
That is one suggestion I will try out next week. Thanks!
Klimax - Thursday, February 9, 2012 - link
Hello.

Nice tests.

However, I would like to see MySQL tested on Windows Server 2008 R2.
Would be an interesting comparison.

(Especially due to http://channel9.msdn.com/shows/Going+Deep/Arun-Kis... )
Klimax - Thursday, February 9, 2012 - link
Title of post is wrong... (I have deleted second thing and forgot to fix title)
Scali - Thursday, February 9, 2012 - link
Unless I'm mistaken, the Xeon 5650 is a 1.17B transistor chip, where the Interlagos 6276 is a 2.4B transistor chip.
In that light, doesn't that make Intel's SMT implementation a lot better than CMT?
I mean, yes CMT may give more of a performance boost when you increase the threadcount. But considering the fact that AMD spends more than twice the number of transistors on the chip... well, that's pretty obvious.
AMD might as well just have used conventional cores.
The true strength of SMT is not so much that it improves performance in multithreaded scenarios, but that it does so at virtually no extra cost in terms of transistors (and with little or no impact on the single-threaded performance either).
JohanAnandtech - Friday, February 10, 2012 - link
Interlagos is a 1.2 billion transistor chip (maybe 1.3, but anyway). Most of those transistors are spent on the L3 cache: about 0.5 billion. Only 213 million transistors are in a module and each module contains a 2 MB L2 cache, probably good for 120 million transistors. That leaves 90 million transistors for the core, and it has been stated that the second cluster added 12%. So that second cluster costs about 12 million transistors, or 48 million on the total 4 module die. That is less than 5% of the total transistor count but you get a 30-90% performance boost!

So for AMD, this was clearly a great choice.

SMT is perfect for Intel, as the Intel architecture puts all instructions in one big ROB.

For very low IPC server workloads, I think the CMT approach gives better results. Unfortunately AMD lowered some of the CMT benefits by keeping the data cache so small and by the low associativity of the Icache.
Scali - Friday, February 10, 2012 - link
Uhhh, I think you're wrong here... the 4-module Bulldozer is a 1.2B chip (Zambezi). But you tested the 8-module Interlagos (16 threads), which is TWO Zambezi dies in one package.
Hence 2*1.2 = 2.4B transistors.
JohanAnandtech - Friday, February 10, 2012 - link
Ok, it is two chips of 1.2 billion. That doesn't change anything about our analysis of CMT.
Scali - Friday, February 10, 2012 - link
Not in the article, because you did not factor in transistor count (which is the flaw I tried to point out in the first place... comparing two chips, where one has twice the transistor count of the other, is quite the apples-to-oranges comparison. One would expect a chip with twice the transistor count to be considerably better in multithreading scenarios, not 'catching up' to the smaller chip).

But in your above post, I think it changes everything about your analysis. All your figures have to be done times two.
Which makes it a very poor comparison, not only to Intel, but also to AMD's own previous line of CPUs.
The 6174 Magny Cours is actually beating Interlagos, with 'only' 12 threads, no kind of CMT/SMT, and 'only' 1.8B transistors.

How does that make CMT look like a great choice for AMD?
slycer.tech - Friday, February 10, 2012 - link
From what I read on the benchmark configuration page, Anand used 2x Intel Xeon X5650. So 2x 1.17B = 2.34B. I think that is comparable to the AMD CPU used in this test. Am I right?
