Measuring Real-world Power Consumption, Part 1

vApus FOS EWL

The Equal Workload (EWL) test is very similar to our previous vApus Mark II "Real-world Power" test. To create a real-world "equal workload" scenario, we throttle the number of users in each VM to the point where you typically get somewhere between 20% and 80% CPU load on a modern dual-CPU server. This is the CPU load of vApus FOS:

Note that we do not measure performance in the "start up" phase or at the end of the test.

Compare this with vApus FOS EWL:

In this case we measure until the very end. The amount of work to be done is always equal and the test always runs for the same length of time, so every tested system spends some time at idle: the faster the system, the sooner the workload is done and the more time is spent idling. For this test we therefore do not measure power but energy (power x time) consumed.
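
As a rough illustration of the distinction, the sketch below (Python, with purely hypothetical power readings and sample interval, not our measured data) shows how the energy consumed over the fixed test window is derived from a series of power measurements:

    # Energy = power integrated over time. With power sampled at a fixed
    # interval, a simple sum is a good enough approximation.
    # NOTE: the readings and interval below are illustrative, not measured data.
    sample_interval_s = 1.0                                # one reading per second
    power_readings_w = [210, 245, 250, 230, 150, 95, 90]   # watts, hypothetical

    energy_joules = sum(p * sample_interval_s for p in power_readings_w)
    energy_wh = energy_joules / 3600.0                     # 1 Wh = 3600 J
    print(f"Energy consumed: {energy_wh:.2f} Wh")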

The measured performance cannot be compared as in "system x is z% faster than system y", but it does give you an idea of how well the server handles the load and how quickly it will save energy by entering a low power state.

vApus FOS EWL performance

The Xeons are all in the same ballpark. The AMD system with its slower CPUs needs more time to deal with this workload. One interesting thing to note is that Hyper-Threading does not boost throughput. That is not very surprising, considering that the total CPU load is between 20 and 80%. What about response time?

vApus FOS EWL response time

Note that we do not simply take a geomean of the response times. All response times are compared to the reference values. Those percentages (response time / reference response time) are then geometrically averaged.

The reference values are measured on the HP DL380 G7 running a native CentOS 5.6 client. We run four tiles of seven vCPUs on top of each server. So the value 117 means that the VMs are on average 17% slower than on the native machine. The 17% higher response times are a result of the fact that when a VM demands two virtual Xeon CPUs, the hypervisor cannot always oblige. It has 24 logical CPUs available, and 28 (7 vCPUs x 4 tiles) are requested. In contrast, the software running on the native machine gets two real cores.
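
As a rough sketch of how this score is computed (Python, with hypothetical response times rather than our measured data), each response time is divided by its reference value and the ratios are then averaged geometrically; a result of 117 means the VMs are on average 17% slower than the native reference:

    import math

    # Hypothetical response times in ms: reference values come from the native
    # (non-virtualized) run, measured values from the VMs under test.
    reference_ms = [120.0, 85.0, 200.0, 60.0]
    measured_ms  = [140.0, 95.0, 240.0, 70.0]

    # Normalize each response time against its reference, then take the
    # geometric mean of the ratios instead of averaging the raw times.
    ratios = [m / r for m, r in zip(measured_ms, reference_ms)]
    geo_mean = math.prod(ratios) ** (1.0 / len(ratios))

    score = 100.0 * geo_mean   # e.g. 117 => on average 17% slower than native
    print(f"Relative response time score: {score:.0f}")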

Back to our results. The response time of the AMD-based server tells us that even under medium load, a faster CPU can help to reduce the response time, which is the most important performance parameter anyway. However, Hyper-Threading does not help under these circumstances.

Also note that the Open Compute server handles this kind of load slightly better than the HP. So while the Open Compute servers offer a slightly lower top performance, they are at their best in the most realistic benchmark scenarios: between 20% and 80% CPU load. Of course, performance per watt remains the most important metric:

vApus FOS EWL Energy Consumption

When the CPU load is between 20 and 80%, which is realistic, the HP consumes 15% more energy. We can reduce the energy consumed by another 10% if we disable Hyper-Threading, which as noted does not improve performance in this scenario anyway.

Comments

  • harrkev - Thursday, November 3, 2011 - link

    You should look again at the sine-wave plots. Power factor has more to do with the phase of the current relative to the voltage than with how closely the current resembles a sine wave.

    As an example, a purely capacitive or purely inductive load will draw a perfect sine-wave current (completely out of phase with the voltage), yet have a power factor very close to zero...

    So, those graphs do not really tell us much unless you actually crunch the numbers and calculate the real power factor.

    http://en.wikipedia.org/wiki/Power_factor#Non-line...
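
    To make that concrete, here is a quick numerical sketch (Python, with made-up waveforms): a current that is a perfect sine wave but shifted 90 degrees from the voltage gives a power factor of essentially zero.

        import math

        n = 1000                                           # samples over one AC cycle
        v_peak, i_peak, phase = 325.0, 10.0, math.pi / 2   # 90 degree current shift

        v = [v_peak * math.sin(2 * math.pi * k / n) for k in range(n)]
        i = [i_peak * math.sin(2 * math.pi * k / n + phase) for k in range(n)]

        real_power = sum(vk * ik for vk, ik in zip(v, i)) / n   # average of v*i
        v_rms = math.sqrt(sum(vk * vk for vk in v) / n)
        i_rms = math.sqrt(sum(ik * ik for ik in i) / n)

        power_factor = real_power / (v_rms * i_rms)
        print(f"Power factor: {power_factor:.3f}")   # ~0 despite a perfect sine wave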
  • ezekiel68 - Thursday, November 3, 2011 - link

    On page 2:

    "The next piece in the Facebook puzzle is that the Open Source tools are Memcached."

    In fact, the tools are not memcached. Instead, software objects from the PHP/C++ stack, programmed by the engineers, are stored in Memcached. Side note - those in the know pronounce it "mem-cache-dee", emphasizing with the last syllable that it is a network daemon (similar to how the DNS server "bind" is pronounced "bin-dee"). So the next piece is Memcached, but the tools are not 'memcached'.
  • JohanAnandtech - Thursday, November 3, 2011 - link

    That is something that went wrong in the final editing by Jarred. Sorry about that, and I feel bad about dragging Jarred into this, but unfortunately that is what happened. As you can see further on ("Facebook mostly uses memcached to alleviate database load"), I was not under the impression that the "Open Source tools are Memcached". :-)
  • ezekiel68 - Thursday, November 3, 2011 - link

    I was pretty sure it was a mistake and I only mentioned it to have the blemish removed - I've been following and admiring your technical writing since the early 2000s. Please keep on bringing us great server architecture pieces. Don't worry about Jarred, he's fine too. We all make mistakes.
  • Dug - Thursday, November 3, 2011 - link

    I'm curious what the cost of these servers would be compared to something like the HP.
  • Lucian Armasu - Thursday, November 3, 2011 - link

    According to SemiAccurate, Facebook is considering Calxeda's recently announced ARM servers, too. It could be a lot more efficient to run something like Facebook on those types of servers.
  • JohanAnandtech - Thursday, November 3, 2011 - link

    I personally doubt that very much. The memcached servers are hardly CPU intensive, but a 32-bit ARM processor will not fit the bill. Even when ARM gets 64-bit, it is safe to say that x86 will offer many more DIMM slots. It remains to be seen how the Watt/RAM cache ratio will turn out. Until 64-bit ARMs arrive with quite a few memory channels: no go IMHO.

    And the processing-intensive parts of the Facebook architecture are going to be very slow on the ARMs.

    The funny thing about the ARM presentations is that they assume that virtualization does not exist in the x86 world. A 24-thread x86 server with 128 GB can maybe run 30-60 VMs on it, lowering the cost to something like 5-10W per VM. A 5W ARM server is probably not even capable of running one of those machines at a decent speed. You'll be faced with serious management overhead from dealing with 30x more servers (or worse!) and high response times (single-threaded performance takes a huge dive!) just to save a bit on the power bill.

    As a general rule: if Atom-based servers have not made it onto the shortlist yet, ARM-based ones are surely not going to replace x86 either.
  • tspacie - Thursday, November 3, 2011 - link

    The Facebook servers take a higher line voltage for increased efficiency. What voltage was supplied to the HP server for these tests?
  • JohanAnandtech - Thursday, November 3, 2011 - link

    Both servers used 230V. I have added this to the benchmark page (thanks, good question). So in reality the Facebook server can consume slightly less.
  • Alex_Haddock - Thursday, November 3, 2011 - link

    TBH we'd position SL-class servers for this kind of scenario rather than the DL380 G7 (which does have a DC power option, btw), so I'm not sure it is a relevant comparison. Though I understand using what is available to test.
