MS SQL Server 2008 R2 at Medium Load

First we test at a moderate load (20-40% load) with 125 concurrent users. Note that these numbers are one set of results from our testing of a complete range of concurrencies (25, 40, 80, 100, 125, 200, 300 ... concurrent users). You will see the complete listing of the results later in the review. For this overview, we focus on a specific concurrency. The database is "warmed up" with a test using 25 concurrent users. We always discard the result at 25 concurrent users, because you see some disk I/O peaks at that first concurrency.

We don't look at the throughput numbers here: all servers deliver somewhere between 117 and 122 queries per second, because we only demand 125 queries per second. Instead, we focus on response times.
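As a rough sanity check on why throughput sits just below the demanded rate, here is a minimal sketch of a closed-loop load model. It assumes every simulated user waits a fixed think time between queries. The one-second think time, the 40 ms response time, and the function name are illustrative assumptions, not details of the test client.

```python
def closed_loop_throughput(users: int, think_s: float, response_s: float) -> float:
    """Queries per second when each user repeats: think, send query, wait for result."""
    cycle_s = think_s + response_s  # time one user needs per query
    if cycle_s <= 0:
        raise ValueError("think time plus response time must be positive")
    return users / cycle_s


# Hypothetical values: 125 users, 1 s think time, 40 ms response time.
qps = closed_loop_throughput(users=125, think_s=1.0, response_s=0.040)
print(f"{qps:.1f} queries/s")
```

Under these assumptions the model lands at roughly 120 queries per second. Throughput barely moves when the response time changes by tens of milliseconds, which is why response time is the more telling metric at this load.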

MS SQL Server 2008

Here the response times are very interesting. From the "full load" numbers, you might conclude that the Opteron 6220 is untenable, as it delivers 30-40% lower throughput while consuming just as much power as the other servers. From those same numbers, we would conclude that the Opteron 6174 is the server platform to get. Switch to a moderate load and our conclusions change.

When testing at medium load, we get a much more accurate and nuanced picture, as your servers will probably be spending a lot of time running this kind of load. It seems that if you want to save some power (e.g. run the "Balanced" power profile), the Opteron 6220 comes close to the new champion, the Xeon X5650. Since turbo is not enabled in this mode, the 6220 leverages its higher clockspeed to outperform the other Opterons.

Interestingly, the Dynamic Voltage and Frequency Scaling (DVFS) of the Opteron 6174 performs pretty badly compared to the Xeon and the new Opteron. Enabling DVFS increases the response times by 116% (!) on the older Opteron. The Xeon and Opteron 6276 also take a significant, but lower, hit in response time: +66% and +78% respectively.
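To make the percentages concrete, here is a small sketch of how such a relative increase is computed. The two response times are hypothetical values chosen only so that the result matches the 116% figure; they are not read from the chart.

```python
def percent_increase(baseline_ms: float, measured_ms: float) -> float:
    """Growth of measured_ms relative to baseline_ms, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (measured_ms - baseline_ms) / baseline_ms * 100.0


# Hypothetical response times (ms): DVFS disabled vs. DVFS enabled.
dvfs_off_ms = 50.0
dvfs_on_ms = 108.0
print(f"+{percent_increase(dvfs_off_ms, dvfs_on_ms):.0f}%")  # +116%
```

Note that an increase of more than 100% means the response time more than doubles.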

The Opteron 6220 suffers much less from this problem, as response times only grow 22%. That clearly indicates that the new Opteron deals much better with DVFS. The 6276 probably gets such a high penalty in "Balanced" mode because it can no longer boost to 2.6GHz or 3.2GHz. A better adapted power policy could definitely improve performance at lower loads. We measured the impact of turbo on the power consumption and it was less than 10%. The energy (power * time) increase was even lower (a few %) as the CPU could put the cores to sleep more quickly.
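The gap between the power increase and the energy increase follows directly from energy = power * time. A minimal sketch, using hypothetical percentages (8% more power, 5% less active time) that are assumptions for illustration rather than measured values:

```python
def energy_change_percent(power_change_pct: float, time_change_pct: float) -> float:
    """Percent change in energy (power * time) given percent changes in each factor."""
    ratio = (1 + power_change_pct / 100.0) * (1 + time_change_pct / 100.0)
    return (ratio - 1) * 100.0


# Hypothetical: turbo draws 8% more power but finishes the work in 5% less time.
print(f"{energy_change_percent(8.0, -5.0):+.1f}% energy")  # +2.6% energy
```

Because the two factors multiply, a shorter active time offsets part of the higher power draw.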

If you think these kinds of response times (<100 ms) don't matter, don't forget that the slowest 5% of queries can easily show 20-50x higher response times. Those are exactly the queries the users might start to complain about.
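A short sketch shows why an average hides those slow queries. The sample set is entirely hypothetical (94 queries at 40 ms and 6 queries at 1200 ms, i.e. 30x slower), and the nearest-rank percentile used here is just one common definition.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in ms: most queries are fast, a few are 30x slower.
samples = [40] * 94 + [1200] * 6

print("mean  :", sum(samples) / len(samples))  # 109.6
print("median:", percentile(samples, 50))      # 40
print("p95   :", percentile(samples, 95))      # 1200
```

In this made-up data set the median looks perfectly healthy, while the 95th percentile is what the unlucky users actually experience.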

Let us look at the power figures.

MS SQL Server 2008

As the Xeon is able to put its cores to sleep more quickly and deeply, it is the real winner in "Balanced" mode. But notice how the Opteron 6174 performance/watt is no longer attractive: it needs just as much power as the Opteron 6276 in "Balanced" mode but delivers worse response times. Meanwhile, the Opteron 6220 fails to impress; it did deliver very decent response times, but it needs 26% more power than the Xeon, which is saving a significant amount of power in "Balanced" mode.

46 Comments

  • Scali - Friday, February 10, 2012 - link

    No, because if you read the ENTIRE benchmark configuration page, you'd see that all the AMD systems had 2 CPUs as well.
  • Scali - Saturday, February 11, 2012 - link

    Oh, and while we're at it... the Intel system had only 48 GB of 1333 memory, where the AMDs had 64 GB of 1600 memory.
    (Yes, Bulldozer is THAT bad)
  • PixyMisa - Saturday, February 11, 2012 - link

    Or rather, MySQL scales that poorly.

    What we can tell from this article is that if you want to run a single instance of MySQL as fast as possible and don't want to get involved with subtle performance tuning options, the Opteron 6276 is not the way to go.

    For other workloads, the result can be very different.
  • JohanAnandtech - Saturday, February 11, 2012 - link

    Feel free to send me a suggestion on how to set up another workload. We know how to tune MySQL. So far none of these settings helped. The issue discussed (spinlocks) cannot be easily solved.
  • Scali - Saturday, February 11, 2012 - link

    I'm not sure if you bothered to read the entire article, because MySQL was not the only database that was tested.
    There were also various tests with MS SQL, and again, Interlagos failed to impress compared to both Magny Cours-based Opterons and the Xeon system.
  • JohanAnandtech - Saturday, February 11, 2012 - link

    The clockspeed of the RAM has a small impact here. 64 vs 48 GB does not matter.
  • Scali - Saturday, February 11, 2012 - link

    Not saying it does... Just pointing out that the AMD system had more impressive specs on paper, yet failed to deliver the performance.
  • JohanAnandtech - Saturday, February 11, 2012 - link

    Again, it is not CMT that makes AMD's transistor count explode but the combination of 2x L3 caches and 4x 2M L2-caches. You can argue that AMD made poor choices concerning caches, but again it is not CMT that made the transistor count grow.

    I am not arguing that AMD's performance/billion transistors is great.
  • Scali - Saturday, February 11, 2012 - link

    I think you are looking at it from the wrong direction.
    You are trying to compare SMT and CMT, but contrary to what AMD wants to make everyone believe, they are not very similar technologies.
    You see, SMT enables two threads to run on one physical core, without adding any kind of execution units, cache or anything. It is little more than some extra logic so that the OoOE buffers can handle two thread contexts at the same time, rather than one.

    So the thing with SMT is that it REDUCES the transistorcount required for running two threads. By nearly 100%.
    CMT on the other hand does not reduce the transistorcount nearly as much. So if you are merely looking at an 'explosion of transistor count', you are missing the point of what SMT really does.

    Other than that, your argument is still flawed. Even an 8-thread Bulldozer has a higher transistor count than the 12-thread Xeon here. It's not just cache. CMT just doesn't pack as many threads per transistor as SMT does... and to make matters worse, CMT also has a negative impact on single-threaded performance (which again, if you are looking at it from the wrong direction, may look like better scaling in threadcount... but effectively, both with low and high threadcounts, the Xeon is the better option... and this is just a midrange Xeon compared to a high-end Interlagos. The Xeon can scale to higher clockspeeds, improving both single-threaded and multithreaded performance for the same transistorcount).

    So what your article says is basically this:
    CMT, which is nearly the same as having full cores, especially in integer-only tasks such as databases, since you have two actual integer cores, has nearly the same scaling in threadcount as conventional multicore CPUs.
    Which has a very high 'duh'-factor, since it pretty much *is* conventional multicore.
    It does not reduce transistorcount, nor does it improve performance, so what's the point?
  • JohanAnandtech - Friday, February 10, 2012 - link

    Semantics :-). I can call it a core with CMT, or a module with 2 cores. Both are valid.
