Real World Power

In the real world you do not run your virtualized servers at their maximum just to measure the potential performance. Neither do they run idle. The user base will create a certain workload and expect this workload to be performed with the lowest response times. The service provider (that is you!) wants the server to finish the job with the least amount of energy consumed. So the general idea behind this new benchmark scenario is that each server runs exactly the same workload and that we then measure the amount of energy consumed. It is similar to our previous article about server power consumption, but the methodology has been enhanced.

We made a new benchmark scenario. In this scenario, we changed three things compared to the vApus Mark II scenario:

  1. The number of users or concurrency per VM was lowered significantly to throttle the load
  2. The OLTP VMs are omitted
  3. We ran with two tiles

vApus Mark II loads the server with up to 800 users per second on the OLAP test, up to 50 users per second on the website, and the OLTP test is performing transactions as fast as it can. The idea is to give the server so much work that it is constantly running at 95-99% CPU load, allowing us to measure throughput performance quite well. vApus Mark II is designed as a CPU/memory benchmark.

To create a real world “equal load” scenario, we throttle the number of users to a point where you typically get somewhere between 30% and 60% CPU load on modern servers. As we cannot throttle our OLTP VM (Swingbench) as far we as know, we discarded the OLTP VM in this test. If we let the OLTP test run at maximum speed, the OLTP VM would completely dominate the measurements.

We run two tiles with 14 vCPUs (eight vCPUs for OLAP, three webservers with two vCPUs per tile), so in total 28 virtual CPUs are active. There are some minor tasks in the background: a very lightly loaded Oracle databases that feeds the three websites (one per tile), the VMware console (which idles most of the time), and of course the ESX hypervisor kernel. So all in all, you have a load on about 30-31 vCPUs. That means that some of the cores of the server system will be idleing, just like in the real world. On the HP DL380 G7, this “equal workload” benchmark gives the following CPU load graph:

On the Y-axis is CPU load, and on the X-axis is the periodic CPU usage. ESXtop was set up to measure CPU load every five seconds. Each test was performed three times: two times to measure performance and energy consumption, and the third time we did the same thing but with extensive ESXtop monitoring. To avoid having the CPU load in the third run much higher than the first two, we measured every five seconds. We measure the energy consumption over 15 minutes.

  vApus Mark   

Again, the dual Opteron numbers are somewhat high as we are running them in a quad socket machine. A Dell R715 is probably going to consume about 5% less. If we get the chance, we'll verify this. But even if the dual Opterons are not ideal measurements in comparison to the dual Xeon, they do give us interesting info.

Two Opterons CPUs are consuming 26.5 Wh (96.7 - 70.2). So if we extrapolate, this means roughly 55% (53 Wh out of 97Wh) of the total energy in our quad Opteron server is consumed by the four processors. Notice also that despite the small power handicap of the Opteron (a dual socket server will consume less), it was able to stay close to the Xeon X5670 based server when comparing maximum power (360W vs 330W). But once we introduce a 30-50% load, the gap between the dual Opteron setup and dual Xeon setup widens. In other words, the Opteron and Xeon are comparable at high loads, but the Xeon is able to save more power at lower loads. So there is still quite a bit of room for improvement: power gating will help the “Bulldozer” Opteron drive power consumption down at lower load.

Ok, enough interesting tidbits, who has the best performance per watt ratio?

Power Extremes: Idle and Full Load Response Times
Comments Locked

51 Comments

View All Comments

  • pablo906 - Saturday, September 11, 2010 - link

    High performance Oracle environments are exactly what's being virtualized in the Server world yet it's one of your premier benchmarks.

    /edit should read

    High performance Oracle environments are exactly what's not being virtualized in the Server world yet it's one of your premier benchmarks.
  • JohanAnandtech - Monday, September 13, 2010 - link

    "You run highly loaded Hypervisors. NOONE does this in the Enterprise space."

    I agree. Isn't that what I am saying on page 12:

    "In the real world you do not run your virtualized servers at their maximum just to measure the potential performance. Neither do they run idle."

    The only reason why we run with highly loaded hypervisors is to measure the peak throughput of the platform. Like VMmark. We know that is not realworld, and does not give you a complete picture. That is exactly the reason why there is a page 12 and 13 in this article. Did you miss those?
  • Per Hansson - Sunday, September 12, 2010 - link

    Hi, please use a better camera for pictures of servers that costs thousands of dollars
    In full size the pictures look terrible, way too much grain
    The camera you use is a prime example of how far marketing have managed to take these things
    10MP on a sensor that is 1/2.3 " (6.16 x 4.62 mm, 0.28 cm²)
    A used DSLR with a decent 50mm prime lens plus a tripod really does not cost that much for a site like this

    I love server pron pictures :D
  • dodge776 - Friday, September 17, 2010 - link

    I may be one of the many "silent" readers of your reviews Johan, but putting aside all the nasty or not-so-bright comments, I would like to commend you and the AT team for putting up such excellent reviews, and also for using industry-standard benchmarks like SAPS to measure throughput of the x86 servers.

    Great work and looking forward to more of these types of reviews!
  • lonnys - Monday, September 20, 2010 - link

    Johan -
    You note for the R815:
    Make sure you populate at least 32 DIMMs, as bandwidth takes a dive at lower DIMM counts.
    Could you elaborate on this? We have a R815 with 16x2GB and not seeing the expected performance for our very CPU intensive app perhaps adding another 16x2GB might help
  • JohanAnandtech - Tuesday, September 21, 2010 - link

    This comment you quoted was written in the summary of the quad Xeon box.

    16 DIMMs is enough for the R815 on the condition that you have one DIMM in each channel. Maybe you are placing the DIMMs wrongly? (Two DIMMs in one channel, zero DIMM in the other?)
  • anon1234 - Sunday, October 24, 2010 - link

    I've been looking around for some results comparing maxed-out servers but I am not finding any.

    The Xeon 5600 platform clocks the memory down to 800MHz whenever 3 dimms per channel are used, and I believe in some/all cases the full 1066/1333MHz speed (depends on model) is only available when 1 dimm per channel is used. This could be huge compared with an AMD 6100 solution at 1333MHz all the time, or a Xeon 7560 system at 1066 all the time (although some vendors clock down to 978MHz with some systems - IBM HX5 for example). I don't know if this makes a real-world difference on typical virtualization workloads, but it's hard to say because the reviewers rarely try it.

    It does make me wonder about your 15-dimm 5600 system, 3 dimms per channel @800MHz on one processor with 2 DPC @ full speed on the other. Would it have done even better with a balanced memory config?

    I realize you're trying to compare like to like, but if you're going to present price/performance and power/performance ratios you might want to consider how these numbers are affected if I have to use slower 16GB dimms to get the memory density I want, or if I have to buy 2x as many VMware licenses or Windows Datacenter processor licenses because I've purchased 2x as many 5600-series machines.
  • nightowl - Tuesday, March 29, 2011 - link

    The previous post is correct in that the Xeon 5600 memory configuration is flawed. You are running the processor in a degraded state 1 due to the unbalanced memory configuration as well as the differing memory speeds.

    The Xeon 5600 processors can run at 1333MHz (with the correct DIMMs) with up to 4 ranks per channel. Going above this results in the memory speed clocking down to 800MHz which does result in a performance drop to the applications being run.
  • markabs - Friday, June 8, 2012 - link

    Hi there,

    I know this is an old post but I'm looking at putting 4 SSDs in a Dell poweredge and had a question for you.

    What raid card did you use with the above setup?

    Currently a new Dell poweredge R510 comes with a PERC H700 raid card with 1GB cache and this is connect to a hot swap chassis. Dell want £1500 per SSD (crazy!) so I'm looking to buy 4 intel 520s and setup them up in raid 10.

    I just wanted to know what raid card you used and if you had a trouble with it and what raid setup you used?

    many thanks.

    Mark
  • ian182 - Thursday, June 28, 2012 - link

    I recently bought a G7 from www.itinstock.com and if I am honest it is perfect for my needs, i don't see the point in the higher end ones when it just works out a lot cheaper to buy the parts you need and add them to the G7.

Log in

Don't have an account? Sign up now