Building and Compiling

We compiled the 7z source by running make -jx, where x is the number of threads. Compiling is a branch-intensive (22%) workload that consists mostly of loads and stores (about 40%).
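For readers who want to repeat this kind of measurement, here is a minimal sketch of timing a parallel build at several thread counts. It is not the script used for this article. The helper names `make_command` and `time_build` are hypothetical, and the sketch assumes the source tree's Makefile provides a `clean` target.

```python
import subprocess
import time


def make_command(threads):
    """Build the argument list for 'make -jN' with N parallel jobs."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["make", f"-j{threads}"]


def time_build(source_dir, threads):
    """Run a clean build in source_dir and return wall-clock seconds.

    Assumption: the Makefile has a 'clean' target. If it does not,
    the failed clean step is ignored (check=False).
    """
    subprocess.run(["make", "clean"], cwd=source_dir, check=False)
    start = time.perf_counter()
    subprocess.run(make_command(threads), cwd=source_dir, check=True)
    return time.perf_counter() - start
```

Calling `time_build(path, n)` for n from 1 to 4 would give one wall-clock figure per thread count, which is the shape of data a compile chart like this one is built from.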

Looking at the single-thread performance, the ARM Cortex-A9 and Atom are in the same ballpark. This is the kind of workload where the Sandy Bridge core of the Xeon really shines. You need about eight Cortex-A9 cores to beat one Xeon (without HT). And it must be said: compiling inside a virtual machine on top of the Xeon E5 is a very pleasant experience compared to the long wait times on the Atom and ECX.

GCC compile—1 to 4 threads

Lessons so Far

A quad-core Cortex-A9 performs well in server workloads that are mostly memory latency sensitive. A quad-core Cortex-A9 ECX-1000 at 1.4GHz has no trouble competing with Atoms at slightly higher clockspeeds (1.6GHz). There is only one exception: bandwidth intensive workloads.

Both Atom and ARM based servers have the disadvantage of being rather slow in typical "management" tasks such as compiling, installing, and updating software. Compiling a rather simple piece of software in a VM with only two Xeon vCPUs (running on one core + HTT) took only 37 seconds. A single-core Atom server needed 275 seconds, while the quad-core ARM ECX-1000 needed 137 seconds.
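To put those three compile times in proportion, the short sketch below divides each by the Xeon VM's result. The timings are the ones quoted above; the dictionary keys and the `slowdown` helper are illustrative names only.

```python
# Compile times in seconds, as quoted in the text.
compile_seconds = {
    "Xeon VM (2 vCPUs)": 37,
    "Atom (single-core)": 275,
    "ECX-1000 (quad-core)": 137,
}

baseline = compile_seconds["Xeon VM (2 vCPUs)"]


def slowdown(seconds, baseline_seconds):
    """Return how many times longer a run took than the baseline."""
    return seconds / baseline_seconds


ratios = {
    name: round(slowdown(seconds, baseline), 1)
    for name, seconds in compile_seconds.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the Xeon VM's compile time")
```

By this arithmetic the Atom takes roughly 7.4 times as long as the Xeon VM and the ECX-1000 roughly 3.7 times as long, so the quad-core ARM node finishes in about half the Atom's time.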

But the Boston Viridis is much more than just a chassis with 24 server nodes. It has a high performance switching fabric. So it's time to see what this server can do in a real server environment.

  • kfreund - Friday, March 15, 2013 - link

    Keep in mind that this is VERY early in the life cycle, and therefore costs are artificially high due to low volumes. Ramp up the volumes, and the prices will come WAY down.
  • wsw1982 - Wednesday, April 3, 2013 - link

    Yes, IF they have high volume. But even if there is high volume, it's shared between different ARM suppliers and, needless to say, the Atom. How much can it be for one company?

    But the question is where ARM gets the volume. Less performance, comparable power consumption, a worse performance/watt ratio (outside of this kind of extremely biased case), less flexibility, less software support (stability), vendor-specific hardware (you can build a normal server, but can you build a massively parallel cluster?), and, don't forget, a (much) higher price. Which company will sacrifice itself to beef up the market volume of ARM servers?
  • Sputnik_b - Thursday, March 14, 2013 - link

    Hi Johan,
    Nice job benchmarking and analyzing the results. Our group at EPFL has recently done some work aimed at understanding the demands that scale-out workloads, such as web serving, place on processor architectures. Our findings very much agree with your benchmark conclusions for the Xeon/Calxeda pair. However, a key result of our work was that many-core processors (with dozens of simple cores per chip) are the sweet spot with regard to performance per TCO dollar. I encourage you to take a look at our work -- http://parsa.epfl.ch/~grot/pubs/SOP-TCO_IEEEMicro....
    Please consider benchmarking a Tilera system to round-out your evaluation.
    Best regards!
  • Sputnik_b - Thursday, March 14, 2013 - link

    Sorry, bad URL in the post above. This should work: http://parsa.epfl.ch/~grot/pubs/SOP-TCO_IEEEMicro....
  • aryonoco - Friday, March 15, 2013 - link

    LWN.net has a very interesting write-up on a talk given by Facebook's Director of Capacity Engineering & Analysis on the future of ARM servers and how they see ARM servers fit in with their operation. I think it gives valuable insight on this topic.

    http://lwn.net/SubscriberLink/542518/bb5d5d3498359... (free link)
  • phoenix_rizzen - Friday, March 15, 2013 - link

    ARM already has hardware virtualisation extensions. Linux-KVM has already been ported over to support it.
  • Andys - Saturday, March 16, 2013 - link

    Great article, finally good to see some realistic benchmarks run on the new ARM platform.

    But I feel that you screwed up in one regard: you should have tested the top Xeon CPU also - the E5-2690.

    As you know from your own previous articles, Intel's top CPUs are also the most power efficient under full load, and the price would still be lower than that of the fully loaded Calxeda box anyway.
  • an3000 - Monday, March 25, 2013 - link

    It is a test using the wrong software stack. Yes, I am not afraid to say that! Apache will never be used on such ARM servers. They are an exact match for Memcached, Nginx, or other set-get type services, like static data serving. Using Apache or a LAMP stack favors the Xeon far too much.
    What I would like to see is: a non-virtualized Xeon server with max RAM running 4-8 instances (similar to the core count) of Memcached/Nginx/lighttpd vs a cluster of ARM cores doing the same light task. Measure performance and power usage.
  • wsw1982 - Wednesday, April 3, 2013 - link

    My suggestion would be to let them run a hard-disk-to-hard-disk copy and measure the power usage :)
