ARM based servers hold the promise of extremely low power and excellent performance per Watt ratios. It's theoretically possible to place an incredible number of servers into a single rack; there are already implementations with as many as 1000 ARM servers in one rack (48 server nodes in a 2U chassis). What's more, all of those nodes consume less than 5KW combined (or around 5W per quad-core ARM node). But whenever a new technology is hyped, it's important to remain objective. The media loves to rave about new trends and people like reading about "some new thing"; however, at the end of the day the system administrator has to keep his IT services working and convince his boss to invest in new technologies.

At first sight, the relatively low performance per core of ARM CPUs seems like a bad match for servers. The dominant CPU in the server market is without doubt Intel's Xeon. The success of the Xeon family is largely rooted in its excellent single-threaded (or per core) performance at moderate power levels (70-95W). Combine this exceptional single-threaded performance with a decent core count and you get good performance in almost any kind of application. Economies of scale and the resulting price levels are also very important, but the server market has been more than willing to pay a little extra if the response times are lower and the energy bills moderate.

A data point proving that single-threaded performance is still important is the evolution of the T-series of Oracle (or Sun if you prefer). The Sun T3 had 16 cores with 128 threads; the T4 however had only 8 cores with 8 threads each, and CEO Larry Ellison touted more than once that single-threaded performance was massively improved, up to five times faster. Do we really need another server with a flock of slow but energy efficient cores? Has history not taught us that a few "bulls" is better than "a flock of chickens"?

History has also shown that the amount of memory per server is very important. Many HPC and virtualization applications are limited by the amount of RAM. The current Cortex-A9 generation of ARM CPUs has a 32-bit address bus and does not support more than 4GB.

And yet, the interest in ARM-based servers is growing, and there is more to it than just hype. Yes, ARM-based CPUs still lack the number crunching power and the massive amount of DIMM slots that Xeon's memory controller can handle, but ARM CPUs score extremely well when it comes to cost and power consumption.

ARM based CPU have also made giant steps forward when it comes to performance. To give you a few data points: a dual ARM Cortex-A9 at 1.2GHz (Samsung Exynos 1.2GHz) introduced in 2011 compresses more than 10 times faster than the typical ARM 11 based cores in 2008. The SunSpider performance increased by a factor 20 according to Anand's measurements on the iPhones (though part of that is almost certainly thanks to browser and software optimizations). The latest ARM Cortex-A15 is again quite a bit more powerful, offering about 50% higher performance. The A57 will add 64-bit support and is estimated to deliver 20 to 30% higher performance. In short, the single-threaded performance is increasing quickly, and the same is true for the amount of RAM that can be addresssed. The ARM Cortex-A9 is limited to 4GB but the Cortex-A15 should be able to address 16GB while the A57 will be able to address a lot more.

It is likely just a matter of time before ARM products can start to chip away at segments of the server market. How much time? The best way to find out is to look at the most mature ARM server shipping today: the Calxeda based Boston Viridis. Just what can this server handle today, where does it have the potential to succeed, and what are its shortcomings? Let's find out.

It's a Cluster, Not a Server
POST A COMMENT

102 Comments

View All Comments

  • Madpacket - Wednesday, March 13, 2013 - link

    And all of a sudden AMD's acquisition of SeaMicro is starting to make sense. Thanks Johan, great article! Reply
  • JohanAnandtech - Wednesday, March 13, 2013 - link

    I really really hope they downscale the current SeaMicro's soon. Because with a starting price at $139000, they are not catering to the typical SME :-). Reply
  • joshv - Wednesday, March 13, 2013 - link

    It seems this has a very narrow application in VM hosting, but I am not sure it's applicable when you have the choice of just scaling up memory or process usage of the single instance Xeon server. For example, I could load 24 instances of my production middle tier on the ARM server - or I could run one instance on a Xeon server and give it all the memory and make sure it spawns enough threads to keep all the internal cores busy. Perhaps my middle tier software has issues with handling all that RAM, so maybe I run 4 instances of it as a process, not a biggy.

    I am going to bet that the Xeon server will win as it won't have the VM overhead.
    Reply
  • Kurge - Wednesday, March 13, 2013 - link

    I would be interested in a bare metal comparison. Since you're serving up the same app why would you split it between 24 VMs on the Xeon server? It's a bit contrived.

    Just load up Server 2012 and IIS or Linux + Apache straight up on the Xeon and see how it performs.
    Reply
  • MrSpadge - Wednesday, March 13, 2013 - link

    Very interesting!

    I'd prefer a fat machine with virtualized servers to get automatic load balancing, but it's not like one couldn't shuffle tasks around in the ARM farm. And there's room for improvement: be it the next Atom or the memory controller in the current ECX-1000 CPUs. And take a look at how badly they scale from 2 to 4 threads - surely, there's lot's of rooms left!
    Reply
  • rubyl - Wednesday, March 13, 2013 - link

    What is the average CPU utilization for the Viridis nodes and for the Xeon system under the 5 different concurrency loads (for the 24 webserver workload)? Reply
  • gercho - Wednesday, March 13, 2013 - link

    When you said " The next generation ARM servers are already on the way and will probably hit the market in the third quarter of this year. The "Midway" SoC is based on a 28nm (TSMC) Cortex-A15 chip. A 28nm A15 offers 50% higher single-threaded integer performance at slightly higher power levels and can address up to 16GB of RAM." As far as I know the A15 cores have 50% more performance but consume 3X more power, that's not "slightly"......... Reply
  • nofumble62 - Wednesday, March 13, 2013 - link

    50% more performance at 3X more power... reminding me of the Netburst architect. Reply
  • thenewguy617 - Wednesday, March 13, 2013 - link

    Can you please point me to sources of your number?
    Thanks
    Reply
  • Wilco1 - Thursday, March 14, 2013 - link

    Where on earth you do get that 3x from? So far no 28nm Cortex-A15 chips have been released. The A15 in the Exynos Octo uses about 1.25W per core at 1.8GHz according to Samsung. That's slightly more power than a Calxeda A9 uses per core, but the A15 gives twice the performance per core. Reply

Log in

Don't have an account? Sign up now