SQL Server 2008 Enterprise R2

We have been using the Flemish/Dutch web 2.0 website Nieuws.be for some time. 99% of the loads on the database are selects and about 5% of them are stored procedures. You can find a more detailed description here.

We have improved our testing methodology and updated the SQL Server version, so the results are no longer comparable with previous results. We used to publish the highest throughput possible, but we have found that it is not entirely fair. At the highest throughput, there was a very small (<1%) percentage of connection errors (client side timeouts), but those timeouts could make the results vary by about 5-10%. A better configured .NET data provider improved this situation. We adapted the .Net data provider to support the same timeout as the MS SQL Server standard timeout (60s), and we now meticulously scan our logs for errors and discard all results that have any error rates.

Since turbo modes are disabled in the "Balanced" Power management policy and we want to evaluate both power and performance, we tested with both the "Balanced" and "High Performance" power management policies.

MS SQL Server 2008

We have reported this before: when it comes to pure OLAP MS SQL server throughput, the Opteron Magny-Cours is unbeatable. Notice that both the Xeon and the new Opteron 6276 can hardly leverage Turbo (e.g. "High Performance" mode) even though both Intel and AMD talk about the potential to run at higher clockspeeds even when all cores are active. While this integer intensive workload comes nowhere close to consuming what a Linpack run would require, the TDP headroom is no longer there to enable a clockspeed boost.

Looking at the results, the Opteron 6276 disappoints somewhat as it is not capable of outperforming its older brother. However, it performs relatively close to the Xeon and is thus far from a dud.

At maximum CPU load, the response times paint a very similar picture as the throughput numbers:

MS SQL Server 2008

When we look at the response times, the Opteron 6174's leadership is confirmed and emphasized. The fact that a 16-cluster 2.3GHz Opteron offers 30% more throughput and twice as fast response times as its 8-cluster 3GHz brother is a clear testimony to the excellent scaling capabilities of SQL server.

Since performance/watt is an extremely important metric, we follow up with a power measurement:

MS SQL Server 2008

There's no doubt about it: the Opteron 6174 is the performance/watt champion in this particular task.

This is the classical way to evaluate server performance, but should you base your purchase on these numbers alone? Frankly, no. The 100% full load evaluation is incomplete, and it's not related to the real world way of using a database. It just shows what your servers can spit out when running at their maximum, a situation most people either try to avoid or never see as so many other bottlenecks (I/O, lock contention) kick in before you see 100% CPU utilization. It is the best method to evaluate HPC machines, but a short-sighted method for almost any server application (web, database, etc.).

In other words, server benchmarks at 100% are just one datapoint, but we should test at lower concurrencies as well. That is why we simulated 40, 80, 100, 125, 200, 300, 400, 600 and 800 users with our vApus stress testing client. Each user starts a query with between 900 and 1100 ms "thinking time" (so on average 1 second). At 600 and 800 users, our servers achieve their maximum throughput, but how do these servers handle "low" and "midrange" workloads? Let us see what happened when we tested with everyday "normal" loads.

Benchmark Configuration MS SQL Server 2008 R2 at Medium Load
Comments Locked

46 Comments

View All Comments

  • Scali - Friday, February 10, 2012 - link

    No, because if you read the ENTIRE benchmark configuration page, you'd see that all the AMD systems had 2 CPUs as well.
  • Scali - Saturday, February 11, 2012 - link

    Oh, and while we're at it... the Intel system had only 48 GB of 1333 memory, where the AMDs had 64 GB of 1600 memory.
    (Yes, Bulldozer is THAT bad)
  • PixyMisa - Saturday, February 11, 2012 - link

    Or rather, MySQL scales that poorly.

    What we can tell from this article is that if you want to run a single instance of MySQL as fast as possible and don't want to get involved with subtle performance tuning options, the Opteron 6276 is not the way to go.

    For other workloads, the result can be very different.
  • JohanAnandtech - Saturday, February 11, 2012 - link

    Feel free to send me a suggestion on how to setup another workload. We know how to tune MySQL. So far none of these settings helped. The issue discussed (spinlocks) can not be easily solved.
  • Scali - Saturday, February 11, 2012 - link

    I'm not sure if you bothered to read the entire article, because MySQL was not the only database that was tested.
    There were also various tests with MS SQL, and again, Interlagos failed to impress compared to both Magny Cours-based Opterons and the Xeon system.
  • JohanAnandtech - Saturday, February 11, 2012 - link

    The clockspeed of the RAM has a small impact here. 64 vs 48 GB does not matter.
  • Scali - Saturday, February 11, 2012 - link

    Not saying it does... Just pointing out that the AMD system had more impressive specs on paper, yet failed to deliver the performance.
  • JohanAnandtech - Saturday, February 11, 2012 - link

    Again, it is not CMT that makes AMD's transistor count explode but the combination of 2x L3 caches and 4x 2M L2-caches. You can argue that AMD made poor choices concerning caches, but again it is not CMT that made the transistor count grow.

    I am not arguing that AMD's performance/billion transistors is great.
  • Scali - Saturday, February 11, 2012 - link

    I think you are looking at it from the wrong direction.
    You are trying to compare SMT and CMT, but contrary to what AMD wants to make everyone believe, they are not very similar technologies.
    You see, SMT enables two threads to run on one physical core, without adding any kind of execution units, cache or anything. It is little more than some extra logic so that the OoOE buffers can handle two thread contexts at the same time, rather than one.

    So the thing with SMT is that it REDUCES the transistorcount required for running two threads. By nearly 100%.
    CMT on the other hand does not reduce the transistorcount nearly as much. So if you are merely looking at an 'exposion of transistor count', you are missing the point of what SMT really does.

    Other than that, your argument is still flawed. Even an 8-thread Bulldozer has a higher transistor count than the 12-thread Xeon here. It's not just cache. CMT just doesn't pack as many threads per transistor as SMT does... and to make matters worse, CMT also has a negative impact on single-threaded performance (which again, if you are looking at it from the wrong direction, may look like better scaling in threadcount... but effectively, both with low and high threadcounts, the Xeon is the better option... and this is just a midrange Xeon compared to a high-end Interlagos. The Xeon can scale to higher clockspeeds, improving both single-threaded and multithreaded performance for the same transistorcount).

    So what your article says is basically this:
    CMT, which is nearly the same as having full cores, especially in integer-only tasks such as databases, since you have two actual integer cores, has nearly the same scaling in threadcount as conventional multicore CPUs.
    Which has a very high 'duh'-factor, since it pretty much *is* conventional multicore.
    It does not reduce transistorcount, nor does it improve performance, so what's the point?
  • JohanAnandtech - Friday, February 10, 2012 - link

    Semantics :-). I can call it a core with CMT, or a module with 2 cores. Both are valid.

Log in

Don't have an account? Sign up now