MS SQL Server 2008 Power Analysis

We'll let power consumption be the final judge:

Concurrency  CMT  No CMT  Power Increase    HTT  No HTT  Power Increase
                          (CMT vs. No CMT)               (HTT vs. No HTT)
25           144  141     2%                159  156     2%
40           156  160     -2%               169  163     4%
80           185  194     -5%               182  174     4%
100          211  209     1%                192  177     8%
125          223  224     0%                201  186     8%
200          250  254     -2%               235  213     10%
300          290  283     3%                277  251     10%
350          305  288     6%                299  275     9%
400          316  288     10%               303  275     10%
500          316  291     9%                314  275     14%
600          324  299     9%                312  285     9%
800          324  308     5%                320  289     11%

CMT increases power consumption by roughly 5-10%, but only at high loads (a concurrency of 350 and up). The extra clusters probably allow the modules (as AMD likes to call the cores) to sleep more frequently at lighter loads, where we measure little or no increase, or even a small decrease, in power consumption. The message is clear: there is no reason to disable CMT when running MS SQL Server.

Hyper-Threading, by contrast, seems to increase power consumption at every load level. At higher concurrencies, the extra performance must be paid for with a roughly 10-14% power increase, so you might consider disabling Hyper-Threading if you want to cap maximum power draw for some reason (e.g. getting too close to the maximum number of amps allowed in your rack).
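The percentage columns can be recomputed from the raw readings. The snippet below is only a rough sketch: the names `READINGS`, `pct_increase`, and `increases` are invented for illustration, and the unit is assumed to be watts, which the table does not state. Because it works from unrounded ratios, a few rows may land a point away from the rounded figures printed in the table.

```python
# Power readings copied from the table above:
# (concurrency, CMT, no CMT, HTT, no HTT).
# Assumption: the unit is watts; the table itself does not say.
READINGS = [
    (25, 144, 141, 159, 156),
    (40, 156, 160, 169, 163),
    (80, 185, 194, 182, 174),
    (100, 211, 209, 192, 177),
    (125, 223, 224, 201, 186),
    (200, 250, 254, 235, 213),
    (300, 290, 283, 277, 251),
    (350, 305, 288, 299, 275),
    (400, 316, 288, 303, 275),
    (500, 316, 291, 314, 275),
    (600, 324, 299, 312, 285),
    (800, 324, 308, 320, 289),
]


def pct_increase(enabled, disabled):
    """Relative power increase, in percent, from enabling the feature."""
    return (enabled / disabled - 1.0) * 100.0


def increases(readings):
    """Return {concurrency: (cmt_pct, htt_pct)} for every table row."""
    return {
        c: (pct_increase(cmt, no_cmt), pct_increase(htt, no_htt))
        for c, cmt, no_cmt, htt, no_htt in readings
    }


if __name__ == "__main__":
    for c, (cmt_pct, htt_pct) in increases(READINGS).items():
        print(f"{c:>4}  CMT {cmt_pct:+5.1f}%   HTT {htt_pct:+5.1f}%")
```

Printing one decimal place makes it easier to see where the CMT penalty starts to appear and that the Hyper-Threading figure never goes negative.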

MS SQL Server OLAP Conclusion

We invested 10 times more time in our MS SQL Server testing, but frankly we are glad we did. The Opteron 6174 seems to be a true champion from a simple "throughput/power at 100%" analysis, but the reality is that servers hardly ever run at such loads. Under light loads, the Opteron 6174 is either slower and consumes more power (Balanced power setting) or it consumes quite a bit more (High Performance power setting) while being roughly on par with the competition in terms of performance. At medium load, the Opterons are beaten solidly by the Xeon; the Xeon consumes quite a bit less power in "Balanced" and performs a lot better (response times).

At the end of the day, the Xeon X5650 is the better chip (especially in "Balanced" mode), but it's also the more expensive one. The Opteron 6276's price/performance/watt ratio remains quite attractive, but if pricing is taken into account, everything will depend on which MS SQL Server license you get. We will leave that analysis to other people, as an economic analysis of complex, customer-unfriendly licensing is definitely out of the scope of this article.

Comments Locked

  • sonofgodfrey - Thursday, February 9, 2012

    Have you explicitly tested one socket vs. two sockets? We've found an immense increase in contention once a cache-line has to be shared between sockets on some systems.
  • JohanAnandtech - Friday, February 10, 2012

    That is one suggestion I will try out next week. Thanks!
  • Klimax - Thursday, February 9, 2012


    Nice tests.

    However, I would like to see MySQL tested on Windows Server 2008 R2.
    It would be an interesting comparison.

    (Especially due to )
  • Klimax - Thursday, February 9, 2012

    Title of post is wrong... (I have deleted second thing and forgot to fix title)
  • Scali - Thursday, February 9, 2012

    Unless I'm mistaken, the Xeon 5650 is a 1.17B transistor chip, where the Interlagos 6276 is a 2.4B transistor chip.
    In that light, doesn't that make Intel's SMT implementation a lot better than CMT?
    I mean, yes CMT may give more of a performance boost when you increase the threadcount. But considering the fact that AMD spends more than twice the number of transistors on the chip... well, that's pretty obvious.
    AMD might as well just have used conventional cores.
    The true strength of SMT is not so much that it improves performance in multithreaded scenarios, but that it does so at virtually no extra cost in terms of transistors (and with little or no impact on the single-threaded performance either).
  • JohanAnandtech - Friday, February 10, 2012

    Interlagos is a 1.2 billion transistor chip (maybe 1.3, but anyway). Most of those transistors are spent on the L3 cache: about 0.5 billion. Only 213 million transistors are in a module, and each module contains a 2 MB L2 cache, probably good for 120 million transistors. That leaves about 90 million transistors for the core, and it has been stated that the second cluster added 12%. So that second cluster costs about 12 million transistors, or 48 million on the total 4-module die. That is less than 5% of the total transistor count, but you get a 30-90% performance boost!

    So for AMD, this was clearly a great choice.

    SMT is perfect for Intel, as the Intel architecture puts all instructions in one big ROB.

    For very low IPC server workloads, I think the CMT approach gives better results. Unfortunately, AMD lowered some of the CMT benefits by keeping the data cache so small and by the low associativity of the Icache.
  • Scali - Friday, February 10, 2012

    Uhhh, I think you're wrong here... the 4-module Bulldozer is a 1.2B chip (Zambezi). But you tested the 8-module Interlagos (16 threads), which is TWO Zambezi dies in one package.
    Hence 2*1.2 = 2.4B transistors.
  • JohanAnandtech - Friday, February 10, 2012

    Ok, it is two chips of 1.2 billion. That doesn't change anything about our analysis of CMT.
  • Scali - Friday, February 10, 2012

    Not in the article, because you did not factor in transistor count (which is the flaw I tried to point out in the first place... comparing two chips, where one has twice the transistor count of the other, is quite the apples-to-oranges comparison. One would expect a chip with twice the transistor count to be considerably better in multithreading scenarios, not 'catching up' to the smaller chip).

    But in your above post, I think it changes everything about your analysis. All your figures have to be multiplied by two.
    Which makes it a very poor comparison, not only to Intel, but also to AMD's own previous line of CPUs.
    The 6174 Magny Cours is actually beating Interlagos, with 'only' 12 threads, no kind of CMT/SMT, and 'only' 1.8B transistors.

    How does that make CMT look like a great choice for AMD?
  • - Friday, February 10, 2012

    From what I read on the benchmark configuration page, Anand used 2x Intel Xeon X5650. So 2 x 1.17B = 2.34B. I think that is comparable to the AMD CPU used in this test. Am I right?
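
The transistor arithmetic traded back and forth in this thread can be re-checked with a minimal sketch. All inputs are the commenters' own figures, several of which they describe as estimates, and the variable names are made up for illustration. It only reproduces the sums and takes no side in the argument.

```python
# Figures quoted by the commenters, in millions of transistors.
MODULE_TOTAL = 213               # transistors per Bulldozer module
L2_PER_MODULE = 120              # "probably", per the comment
DIE_TOTAL = 1200                 # 1.2 billion per 4-module die
MODULES_PER_DIE = 4
SECOND_CLUSTER_PER_MODULE = 12   # the comment's rounded estimate

# What is left for core logic once the L2 cache is subtracted
# (the comment rounds this to "90 million").
core_logic = MODULE_TOTAL - L2_PER_MODULE

# Estimated cost of the second integer cluster across one die,
# and its share of that die's transistor budget.
cmt_cost_per_die = SECOND_CLUSTER_PER_MODULE * MODULES_PER_DIE
cmt_share = cmt_cost_per_die / DIE_TOTAL

# Totals raised later in the thread, in billions of transistors.
interlagos_package = 2 * 1.2     # two 4-module dies in one package
xeon_pair = 2 * 1.17             # two Xeon X5650 chips

if __name__ == "__main__":
    print(f"core logic per module: {core_logic}M")
    print(f"CMT cost per die:      {cmt_cost_per_die}M ({cmt_share:.1%})")
    print(f"Interlagos package:    {interlagos_package:.2f}B")
    print(f"Two Xeon X5650:        {xeon_pair:.2f}B")
```

Doubling both the cluster cost and the die count leaves the percentage share unchanged, while the absolute totals double. That is why the two sides can each point to a different number.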
