Measuring Real-world Power Consumption, Part 1

vApus FOS EWL

The Equal Workload (EWL) test is very similar to our previous vApus Mark II "Real-world Power" test. To create a real-world "equal workload" scenario, we throttle the number of users in each VM to a point where you typically get somewhere between 20% and 80% CPU load on a modern dual-CPU server. This is the CPU load of vApus FOS:

Note that we do not measure performance in the "start up" phase or at the end of the test.

Compare this with vApus FOS EWL:

In this case we measure until the very end. The amount of work to be done is always the same and the test always runs for the same length of time, so every tested system spends some time at idle: the faster the system, the sooner the workload is done and the longer it idles. For this test we therefore measure not power but energy (power x time) consumed.

The measured performance cannot be compared as in "system x is z% faster than system y", but it does give you an idea of how well the server handles the load and how quickly it starts saving energy by entering a low-power state.
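To make the distinction concrete, here is a minimal sketch of how energy (rather than average power) can be derived from a power trace over a fixed test window; the power traces and timings below are hypothetical, purely for illustration:

```python
# Minimal sketch of the EWL idea: all systems get the same amount of work
# and the test window is fixed, so we integrate power over the whole window
# to get energy. All power traces below are hypothetical, for illustration.

SAMPLE_INTERVAL_S = 1.0  # one power reading per second

def energy_wh(power_trace_w):
    """Integrate a 1 Hz power trace (watts) into watt-hours."""
    return sum(power_trace_w) * SAMPLE_INTERVAL_S / 3600.0

# A faster system runs hot for a shorter time, then idles at low power;
# a slower system draws less at load but stays busy far longer.
fast_system = [260.0] * 600 + [95.0] * 600    # finishes in 600 s, idles 600 s
slow_system = [230.0] * 1100 + [95.0] * 100   # busy for 1100 s of the window

print(f"fast system: {energy_wh(fast_system):.1f} Wh")  # lower energy despite higher peak power
print(f"slow system: {energy_wh(slow_system):.1f} Wh")
```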

vApus FOS EWL performance

The Xeons are all in the same ballpark. The AMD system with its slower CPUs needs more time to deal with this workload. One interesting thing to note is that Hyper-Threading does not boost throughput. That is not very surprising, considering that the total CPU load is between 20% and 80%. What about response time?

vApus FOS EWL Response Time

Note that we do not simply take a geomean of the raw response times. All response times are compared to the reference values, and those percentages (response time / reference response time) are then geometrically averaged.

The reference values are measured on the HP DL380 G7 running a native CentOS 5.6 client. We run four tiles of seven vCPUs on top of each server. So the value 117 means that the VMs are on average 17% slower than on the native machine. The 17% higher response times result from the fact that when a VM demands two virtual Xeon CPUs, the hypervisor cannot always oblige: it has 24 logical CPUs available, while 28 (7 vCPUs x 4 tiles) are requested. In contrast, the software running on the native machine gets two real cores.
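A minimal sketch of that scoring method, assuming made-up response times (only the procedure, dividing by the reference values and taking a geometric mean of the percentages, follows the text above):

```python
import math

# Sketch of the scoring described above: each response time is divided by
# the reference (native) response time, and the resulting percentages are
# geometrically averaged. All response times below are made up.

def geomean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

reference_ms = [120.0, 80.0, 200.0, 150.0]   # hypothetical native baseline
measured_ms  = [138.0, 96.0, 230.0, 180.0]   # same workloads inside the VMs

ratios_pct = [m / r * 100.0 for m, r in zip(measured_ms, reference_ms)]
print(f"score: {geomean(ratios_pct):.0f}")   # 117 would mean 17% slower than native
```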

Back to our results. The response time of the AMD based server tells us that even under medium load, a faster CPU can help to reduce the response time, which is the most important performance parameter anyway. However, Hyper-Threading does not help under these circumstances.

Also note that the Open Compute server handles this kind of load slightly better than the HP. So while the Open Compute servers offer a slightly lower top performance, they are at their best in the most realistic benchmark scenarios: between 20% and 80% CPU load. Of course, performance per watt remains the most important metric:

vApus FOS EWL Energy Consumption

When the CPU load is between 20% and 80%, which is realistic, the HP consumes 15% more energy. We can reduce the energy consumed by another 10% by disabling Hyper-Threading, which as noted does not improve performance in this scenario anyway.

Comments

  • jamdev12 - Thursday, November 3, 2011 - link

    I would definitely have to agree with you on this notion. HP servers are pretty expensive once you take 3-year warranties and 24/7 replacement options into account, so going with an Open Compute server is a nice alternative to the "I can do everything" server. Better to stick to something you can do pretty well and efficiently than to do many things poorly.
  • haplo602 - Friday, November 4, 2011 - link

    this is an option for somebody with a custom-built infrastructure and dedicated DC services. however, a general purpose server CANNOT do without them.

    since the server categories are different (general purpose vs custom built), the HP one does well (I'd say even excellent).
  • HollyDOL - Thursday, November 3, 2011 - link

    I would be quite interested in how they determined Java and C# are 2-3x slower than C++, since that seems pretty out of line with reality to me. I have seen a few C++ vs. Java tests and the differences were a matter of a few percent. Also, in my experience C# does the same jobs a little bit faster than Java, and the benchmark results generally confirm it.
    a few links:

    http://blog.cfelde.com/2010/06/c-vs-java-performan...
    http://reverseblade.blogspot.com/2009/02/c-versus-...
  • setzer - Thursday, November 3, 2011 - link

    I'm guessing they are comparing their own algorithms, and I hope they are good programmers in all the languages they tested; otherwise the tests don't mean anything.
  • Taft12 - Thursday, November 3, 2011 - link

    I'm not surprised that part of the article would lead to programming language holy wars, but general benchmarks are utterly useless for Facebook. They should (and surely do) care only about performance of the compiled code and hardware platforms that run the site.
  • bji - Thursday, November 3, 2011 - link

    It's illogical to suggest that an interpreted language like Java or C# could ever approach C++ in speed when the same level of optimization is applied to each.

    In my experience, the least optimized C++ code can sometimes be approximated in performance by the best optimized Java code, depending on the task in question.

    Of course, once you spend time optimizing the C++ code then there is no way for Java to keep up.

    I have never used C# but I expect the result for it would be very similar to Java due to the similar mechanics of the language implementation.

    That being said, in many situations raw speed is not the most important factor, and Java and C# can have significant advantages in terms of deployment mechanism, programmer productivity, etc., that make them very much the best choice in some situations; which is why they are, in fact, used in the situations where their advantages are best exploited and their weaknesses are least important.

    I think that Ruby takes the last paragraph even further; Ruby is so ungodly slow that it has to make up for it by allowing extreme productivity gains, and I expect that it must (I've never programmed in it to any significant extent), otherwise it wouldn't have any niche at all.
  • data003 - Thursday, November 3, 2011 - link

    While I've lurked this site for many years I just created an account to correct this erroneous bit of fail above.

    1. C# and Java are not interpreted languages. They are compiled at runtime into machine code.

    2. The C# JIT compiler can actually produce more efficient machine code than a compiled C++ binary.

    Since you have never used C# and clearly don't understand how it works, I'd suggest you refrain from commenting on it.
  • Jaybus - Friday, November 4, 2011 - link

    I agree that in some cases a JIT compiler can produce more efficient code, particularly when the application lends itself to runtime optimizations; however, that is far from typical. Usually, for a single process, the JIT code, once compiled, will be reasonably close, though the static C/C++ code has the edge.

    But that is for the typical case. Facebook is not a typical case. Each web server is constantly starting many, many short-lived processes. Each process must start up its own copy of the code. This is where JIT compares badly to ahead-of-time compilation. It isn't the execution speed of the code after the JIT gets it compiled; the problem is the startup delay. Even with caching, the bytecode still must be compiled at least once for each new process, which in Facebook's case is millions of times. There is no such delay with ahead-of-time compilation. Therefore, Java and C# have no chance of competing in Facebook's environment.
  • erwinerwinerwin - Thursday, November 3, 2011 - link

    I wonder whether the power savings justify creating new hardware with a green power architecture and the cost of a custom-built power supply running on 270 volts, if it only saves about 10-20 percent of power consumption on average, rather than, let's say, making a corporate deal with the best power/performance server producer on the market and modifying it with water cooling (for example)???
  • Menetlaus - Thursday, November 3, 2011 - link

    Power savings absolutely justifies the work they did in customizing.

    20 W less power consumption x 8,760 hours of 24/7/365 operation = 175 kWh (per server per year)
    175 kWh x $0.10/kWh = $17.50 in power savings/year

    Just looking at the final image in the article, there are easily 30 racks of 30 servers visible (30 x 30 x $17.50 =) $15,750/year in power savings.

    Since most power going into a computer ends up as wasted heat, if the 900 servers (from above) were consuming the additional 20 W, this would be ~18 kW of additional heat being produced, which needs to be cooled. This offers additional operational and capital cost savings due to the smaller cooling requirements.

    Water cooling may be a more efficient way of pulling heat out of the server rack, but the additional parts to move the water around the facility and to cool it adds to the total costs. Water is more efficient because it carries more heat/volume than air and with the piping the heat can be taken outside of the server room, while fans heat the air around the servers where another method of removing the heat is then required.

    The custom power supply at 270 V and custom motherboard aren't really that difficult to get, as so many makers of each part already do custom designs for major PC makers (Dell/HP/etc). The difference between 208 V and 270 V from an electrical design standpoint isn't a big change, and neither is removing parts from a motherboard.

    In short, it's economy of scale. You or I wouldn't be able to do this for a dozen personal systems as the cost per system would be huge; on the other hand, for anyone managing 1,000's of servers, the 20 W per server adds up quickly.
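    For what it's worth, the arithmetic in this comment is easy to sanity-check; here is a minimal sketch using the commenter's example figures (the 20 W delta, $0.10/kWh rate, and 30 racks x 30 servers are their assumptions, not measured data):

```python
# Sanity check of the savings arithmetic above; the 20 W delta, $0.10/kWh
# rate, and 30 racks x 30 servers are the commenter's example figures.

WATTS_SAVED_PER_SERVER = 20.0
HOURS_PER_YEAR = 24 * 365          # 8,760 hours of 24/7/365 operation
PRICE_PER_KWH = 0.10               # USD
SERVERS = 30 * 30                  # ~900 servers visible in the photo

kwh_per_server = WATTS_SAVED_PER_SERVER * HOURS_PER_YEAR / 1000.0
dollars_per_server = kwh_per_server * PRICE_PER_KWH
fleet_savings = dollars_per_server * SERVERS
avoided_heat_kw = WATTS_SAVED_PER_SERVER * SERVERS / 1000.0

print(f"{kwh_per_server:.1f} kWh saved per server per year")    # ~175 kWh
print(f"${dollars_per_server:.2f} saved per server per year")   # ~$17.50
print(f"${fleet_savings:,.0f} saved per year fleet-wide")       # ~$15,750
print(f"{avoided_heat_kw:.0f} kW less heat to cool away")       # ~18 kW
```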
