SQL Server 2008 R2 Enterprise

We have been using the Flemish/Dutch Web 2.0 website Nieuws.be as a benchmark for some time. 99% of the load on the database consists of selects, and about 5% of those are stored procedures. You can find a more detailed description here.

We have improved our testing methodology (read more about it here) and updated the SQL Server, so the results are only comparable to our last Opteron 6276 review, and not to any of the older reviews before it.

MS SQL Server 2008

Since performance/watt is an extremely important metric, we follow up with a power measurement:

MS SQL Server 2008

The Xeon E5-2690 is by far the fastest in this discipline, but the difference in power consumption compared to the rest of the pack is significant. The Xeon E5-2690 needs 140W more than its slower brother, the 95W TDP Xeon E5-2660. That is 70W extra per CPU. This clearly indicates that the fastest Xeon is running closer to its TDP than the 2.2 GHz version. The Xeon E5-2660 offers more than 20% better performance per Watt than the 135W TDP Xeon.
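The performance-per-Watt comparison is simple arithmetic, and a minimal Python sketch makes it explicit. The throughput and power values below are hypothetical placeholders chosen only so that the power gap is 140W; they are not the measurements from the charts, and the function names are ours.

```python
def perf_per_watt(throughput: float, power_watts: float) -> float:
    """Return throughput delivered per Watt of measured system power."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput / power_watts


def relative_advantage(a: float, b: float) -> float:
    """Return how much larger a is than b, as a fraction (0.20 means 20%)."""
    return a / b - 1.0


# Hypothetical placeholder numbers, NOT the benchmark results:
# a faster system drawing 500W and a slower one drawing 360W (140W less).
fast_system = perf_per_watt(throughput=1000.0, power_watts=500.0)
slow_system = perf_per_watt(throughput=900.0, power_watts=360.0)

advantage = relative_advantage(slow_system, fast_system)
print(f"Slower system delivers {advantage:.0%} more throughput per Watt")
```

With these made-up inputs the slower system gives up 10% of the throughput but saves 28% of the power, so it comes out ahead on performance per Watt.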

The Xeon E5-2660 is especially impressive if you compare it with the older Xeon. Despite the lower clockspeed, the new Xeon is capable of outperforming the Xeon 5650 by 30%.

Clock for clock and core for core, the Xeon E5 is 23% more efficient at SQL Server workloads than its older brother. Considering that it is pretty hard to extract higher IPC out of server workloads, we can say that the Sandy Bridge architecture is a winner when it comes to SQL databases.
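A "clock for clock, core for core" comparison normalizes throughput by core count and clock speed. The sketch below shows one way to do that in Python; the two chips and all of their numbers are hypothetical and are not the Xeon figures behind the 23% above.

```python
def per_core_per_clock(throughput: float, cores: int, clock_ghz: float) -> float:
    """Normalize throughput by core count and clock speed (work per core per GHz)."""
    if cores <= 0 or clock_ghz <= 0:
        raise ValueError("cores and clock_ghz must be positive")
    return throughput / (cores * clock_ghz)


# Hypothetical chips, NOT the CPUs tested in this article.
old_chip = per_core_per_clock(throughput=100.0, cores=4, clock_ghz=3.0)
new_chip = per_core_per_clock(throughput=150.0, cores=6, clock_ghz=2.5)

# Fractional per-core, per-clock gain of the new chip over the old one.
gain = new_chip / old_chip - 1.0
print(f"Per-core, per-clock gain: {gain:.0%}")
```

Note that this normalization assumes the workload scales perfectly with both cores and clock speed, which real server workloads rarely do, so the result is best read as an estimate.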

Finally, let's check out the response times with 600 users sending off a query every second (on average):

MS SQL Server 2008

Response times are more or less linear (and low!) when the server is not yet saturated. Once the server is closer to or over its maximum throughput, response times tend to increase almost exponentially. Since the Xeon E5-2690 is capable of sustaining more than 600 users, it can still offer a very low response time. The other CPUs are saturated at this point.
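To illustrate why response times shoot up near saturation, here is a rough Python sketch using the textbook M/M/1 queueing approximation, where the mean response time is 1 / (capacity - arrival rate). This is a swapped-in simplification, not the model or tooling behind our benchmark, and the 650 queries/s capacity is a hypothetical value.

```python
import math


def mm1_response_time(capacity_qps: float, arrival_qps: float) -> float:
    """Mean response time in seconds for an M/M/1 queue.

    capacity_qps: maximum sustainable throughput (service rate).
    arrival_qps:  offered load (arrival rate).
    Returns infinity once the offered load reaches or exceeds capacity.
    """
    if capacity_qps <= 0 or arrival_qps < 0:
        raise ValueError("capacity must be positive and arrivals non-negative")
    if arrival_qps >= capacity_qps:
        return math.inf  # saturated: the queue grows without bound
    return 1.0 / (capacity_qps - arrival_qps)


# Hypothetical server that saturates at 650 queries/s (an assumption).
CAPACITY = 650.0
for load in (300.0, 600.0, 640.0, 650.0):
    rt = mm1_response_time(CAPACITY, load)
    label = "saturated" if math.isinf(rt) else f"{rt * 1000:.1f} ms"
    print(f"{load:5.0f} queries/s -> {label}")
```

In this simplified model the response time stays low over most of the load range and then climbs steeply in the last few percent before saturation, which is the same shape we see in the measurements.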

But as we pointed out in our previous article, server benchmarks at 100% are just one datapoint and we should test at lower concurrencies as well. Most people try to make sure that their database server almost never runs at 100% CPU load.


  • JohanAnandtech - Wednesday, March 7, 2012 - link

    Argh. You are absolutely right. I reversed all divisions. I am fixing this as we type. Luckily this does not alter the conclusion: LS-DYNA does not scale with clockspeed very well.
  • alpha754293 - Wednesday, March 7, 2012 - link

    I think that I might have an answer for you as to why it might not scale well with clock speed.

    When you start a multiprocessor LS-DYNA run, it goes through a stage where it decomposes the problem (through a process called recursive coordinate bisection (RCB)).

    This decomposition phase is done every time you start the run, and it only runs on a single processor/core. So, suppose that you have a dual-socket server where the processors say...are hitting 4 GHz. That can potentially be faster than say if you had a four-socket server, but each of the processors are only 2.4 GHz.

    In the first case, you have a small number of really fast cores (and so it will decompose the domain very quickly), whereas in the latter, you have a large number of much slower cores, so the decomposition will happen slowly, but it MIGHT be able to solve the rest of it slightly faster (to make up for the difference) just because you're throwing more hardware at it.

    Here's where you can do a little more experimenting if you like.

    Using the pfile (command line option/flag 'p=file'), not only can you control the decomposition method, but you can also tell it to write the decomposition to a file.

    So had you had more time, what I would have probably done is written out the decompositions for all of the various permutations you're going to be running. (n-cores, m-number of files.)

    When you start the run, instead of it having to decompose the problem over and over again each time it starts, you just use the decomposition that it's already done (once) and then that way, you would only be testing PURELY the solving part of the run, rather than from beginning to end. (That isn't to say that the results you've got are bad - it's good data), but that should help to take more variables out of the equation when it comes to why it doesn't scale well with clock speed. (It should).
  • IntelUser2000 - Tuesday, March 6, 2012 - link

    Please refrain from creating flamebait in your posts. Your post is almost like spam; there is almost no useful information in it. If you are going to love one side, don't hate the other.
  • Alexko - Tuesday, March 6, 2012 - link

    It's not "like spam", it's just plain spam at this point. A little ban + mass delete combo seems to be in order, just to clean up this thread, and probably others.
  • ultimav - Wednesday, March 7, 2012 - link

    My troll meter is reading off the charts with this guy. Reading between the lines, he's actually a hardcore AMD fan trying to come across as the Intel version of Sharikou to paint Intel fans in a bad light. Pretty obvious actually.
  • JohanAnandtech - Wednesday, March 7, 2012 - link

    We had to mass delete his posts as they indeed did not contain any useful info and were full of insults. The signal to noise ratio has been good the last few years, so we must keep it that way.

    Inteluser2000, Alexko, Ultimav, tipoo: thx for helping to keep the tone civil here. Appreciate it.

    - Johan.
  • tipoo - Wednesday, March 7, 2012 - link

    And thank you for removing that stuff.
  • tipoo - Tuesday, March 6, 2012 - link

    We get it. Don't spam the whole place with the same post.
  • tipoo - Tuesday, March 6, 2012 - link

    No, he's just a rational person. I don't care which company you like, if you say the same thing 10 times in one article someone's sure to get annoyed, and with justification.
  • MySchizoBuddy - Tuesday, March 6, 2012 - link

    I'm again requesting that when you do the benchmarks, please include a performance per watt metric along with stress testing by running folding@home for 48 hours straight.
