The Results that Matter

Before you jump ahead to the charts below, we suggest taking some time to properly interpret the results. First of all, we simulate between 5 to 15 "busy" users on the web server per second. As a user clicks somewhere on the website, this can result in a few requests or tens of requests. For example, accessing the forum on the website results in two simple "GET" requests, while posting a reply results in an avalanche of 56 POSTs and GETs. That is why we report performance in "responses per second". Responses are somewhat similar from the CPU load point of view if you look at a statistically large enough number of them. User actions are so wildly different that in some cases performing two user actions per seconds can require more processing power and network bandwidth than 20 user actions per second.

Webhosting throughtput—average over 24 web servers

At the low concurrencies, the Intel machine leverages turboboost and its exceptionally high per core performance. At the higher web loads, the total throughput of the 96 (24x quad-core SoCs) ARM Cortex-A9 cores is up to 50% higher than the low power 32 thread/16 core (2x Octal core) Xeons. Even the mighty 2660 cannot beat the herd of ARM SoCs.

While we have lots and lots of experience with x86 servers, we had almost none with ARM based servers, so we met up with the people of Calxeda engineering and got some valuable optimization tips. It turns out that the internal switch fabric can be tuned in various ways. For example, the link speed from one node is by default set to 2.5 gbit/s, which is rather high considering that we are mostly CPU limited and use less than 0.5Gbit/s per node. Setting the link speed of each node to 1Gbit/s should lower power and gives more than enough bandwidth. We also updated to a slightly newer kernel (155) from the Calxeda kernel PPA (Personal Package Archive). This allowed us to make use of Dynamic Voltage and Frequency Scaling (DVFS, P-states) using the CPUfreq tool. First let's see if all these power saving tweaks have reduced the total throughput.

Webhosting throughtput Optimized—average over 24 web servers

The changes did not give any boost in throughput (in many cases the scores might even be slightly slower), but the changes might lower power use and/or response times. Let's look at that next.

Response Times

Webhosting Response time optimized—average over 24 web servers

Again, the Intel machine performs better at lower concurrencies, but our ARM server delivers lower response times at high load. Our optimizations have had no effect on response times.

Our Real World Test Energy and Power
Comments Locked

99 Comments

View All Comments

  • kfreund - Friday, March 15, 2013 - link

    Keep in mind that this is VERY early in the life cycle, and therefore costs are artificially high due to low volumes. Ramp up the volumes, and the prices will come WAY down.
  • wsw1982 - Wednesday, April 3, 2013 - link

    Ja, IF they have high volume. But even if there is high volume, it's shared between different ARM suppliers and needless to say, the ATOM. How much can it be for one company?

    But the question is where the ARM get the volume? less performance, comparable power consumption, less performance/watt rational (not this kind extreme bias case ), less flexibility, less software support (stability), vendor specific (you can build a normal server, but can you build up a massive parallel cluster?), oh, don't forgot, more (much more) expensive. Which company will sacrifice themselves to beef up the market volume of the ARM server?
  • Sputnik_b - Thursday, March 14, 2013 - link

    Hi Johan,
    Nice job benchmarking and analyzing the results. Our group at EPFL has recently done some work aimed at understanding the demands that scale-out workloads, such as web serving, place on processor architectures. Our findings very much agree with your benchmark conclusions for the Xeon/Calxeda pair. However, a key result of our work was that many-core processors (with dozens of simple cores per chip) are the sweet spot with regard to performance per TCO dollar. I encourage you to take a look at our work -- http://parsa.epfl.ch/~grot/pubs/SOP-TCO_IEEEMicro....
    Please consider benchmarking a Tilera system to round-out your evaluation.
    Best regards!
  • Sputnik_b - Thursday, March 14, 2013 - link

    Sorry, bad URL in the post above. This should work: http://parsa.epfl.ch/~grot/pubs/SOP-TCO_IEEEMicro....
  • aryonoco - Friday, March 15, 2013 - link

    LWN.net has a very interesting write-up on a talk given by Facebook's Director of Capacity Engineering & Analysis on the future of ARM servers and how they see ARM servers fit in with their operation. I think it gives valuable insight on this topic.

    http://lwn.net/SubscriberLink/542518/bb5d5d3498359... (free link)
  • phoenix_rizzen - Friday, March 15, 2013 - link

    ARM already has hardware virtualisation extensions. Linux-KVM has already been ported over to support it.
  • Andys - Saturday, March 16, 2013 - link

    Great article, finally good to see some realistic benchmarks run on the new ARM platform.

    But I feel that you screwed up in one regard: You should have tested the top Xoen CPU also - the E5-2690.

    As you know from your own previous articles, Intel's top CPUs are also the most power efficient under full load, and the price would still be cheaper than the full loaded Calxeda box anyway.
  • an3000 - Monday, March 25, 2013 - link

    It is a test using wrong software stack. Yes, I am not afraid to say that! Apache will never be used on such ARM servers. They are exact match for Memcached or Nginx or another set-get type services, like static data serving. Using Apache or LAMP stack is too much favorable for Xeon.
    What I would like to see is: Xeon server with max RAM non-virtualized running 4-8 (similar to core count) instances of Memcached/Nginx/lighttpd vs cluster of ARM cores doing the same light task. Measure performance and power usage.
  • wsw1982 - Wednesday, April 3, 2013 - link

    My suggestion will be let them run one hard-disk to one hard-disk copy and measure the power usage:)

Log in

Don't have an account? Sign up now