Calxeda's ARM server tested

Author: Johan De Gelas

by Johan De Gelas on March 12, 2013 7:14 PM EST


Building and Compiling

We compiled the 7z source by running make -jx (with x being the number of threads). Compiling is a branch-intensive (22%) workload that also consists largely of loads and stores (about 40%).
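As a rough illustration of how such a compile test could be timed, here is a minimal Python sketch. It is not the harness used for this article: the helper names (make_command, time_command, benchmark) are hypothetical, and it assumes the source tree's makefile provides a clean target so that every run starts from scratch.

```python
import subprocess
import time


def make_command(threads):
    """Build the argument list for a parallel make with `threads` jobs."""
    return ["make", f"-j{threads}"]


def time_command(cmd, cwd=None):
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(thread_counts, source_dir):
    """Time a full rebuild for each job count; returns {threads: seconds}."""
    results = {}
    for n in thread_counts:
        # Assumption: a `clean` target exists, so each build starts cold.
        subprocess.run(["make", "clean"], cwd=source_dir, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        results[n] = time_command(make_command(n), cwd=source_dir)
    return results
```

Calling benchmark(range(1, 5), "path/to/source") would time builds with one to four jobs. Wall-clock time is used because it is what an administrator waiting on the build actually experiences.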

Looking at the single-thread performance, the ARM Cortex-A9 and Atom are in the same ballpark. This is the kind of workload where the Sandy Bridge core of the Xeon really shines. You need about eight Cortex-A9 cores to beat one Xeon (without HT). And it must be said: compiling inside a virtual machine on top of the Xeon E5 is a very pleasant experience compared to the long wait times on the Atom and ECX.

GCC compile—1 to 4 threads

Lessons so Far

A quad-core Cortex-A9 performs well in server workloads that are mostly memory latency sensitive. A quad-core Cortex-A9 ECX-1000 at 1.4GHz has no trouble competing with Atoms at slightly higher clockspeeds (1.6GHz). There is only one exception: bandwidth intensive workloads.

Both Atom and ARM based servers have the disadvantage of being rather slow in typical "management" tasks such as compiling, installing, and updating software. Compiling a rather simple piece of software in a VM with only two Xeon vCPUs (running on one core + HTT) took only 37 seconds. A single-core Atom server needed 275 seconds, while the quad-core ARM ECX-1000 needed 137 seconds.
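To put those compile times in proportion, the short Python sketch below divides each one by the Xeon VM result. The three timings are the ones quoted above; the dictionary labels and the slowdown_vs helper are illustrative names, not part of the original test setup.

```python
# Compile times in seconds, as quoted in the text above.
compile_seconds = {
    "Xeon VM (2 vCPUs)": 37,
    "Atom (single core)": 275,
    "ECX-1000 (quad-core)": 137,
}


def slowdown_vs(baseline, times):
    """Return each system's compile time as a multiple of the baseline's."""
    base = times[baseline]
    return {name: round(seconds / base, 1) for name, seconds in times.items()}


ratios = slowdown_vs("Xeon VM (2 vCPUs)", compile_seconds)
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the Xeon VM compile time")
```

By this measure the single-core Atom takes about 7.4 times as long as the two Xeon vCPUs, and the quad-core ECX-1000 about 3.7 times as long.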

But the Boston Viridis is much more than just a chassis with 24 server nodes. It has a high performance switching fabric. So it's time to see what this server can do in a real server environment.

99 Comments

tuxRoller - Tuesday, March 12, 2013 - link
Why WOULD you expect DVFS to boost performance?
You seem to find it somewhat revelatory that the scores are slightly lower (though the difference is perhaps statistically meaningless).
dig23 - Tuesday, March 12, 2013 - link
On-demand seems a fair choice to me; it's the best you can do on these OSes. But I would be very interested to see energy efficiency numbers when DVFS is working on a swarm of ARM nodes... :)
tuxRoller - Tuesday, March 12, 2013 - link
It's not the CPU governor I'm talking about but DVFS in particular.
There's bound to be some small amount of latency involved with the process.
Its point isn't best performance but energy efficiency, which is why I made the comment in the first place.
JarredWalton - Tuesday, March 12, 2013 - link
There's the potential for DVFS to optimize for better performance on a few cores while putting some of the other cores into a lower P-state, but I think that would be more for stuff like Turbo Boost/Turbo Core. It's also possible Johan is referring to the potential for the optimizations to simply improve performance in general.
JohanAnandtech - Wednesday, March 13, 2013 - link
Can you tell me where I got you confused? Because I write "This allowed us to make use of Dynamic Voltage and Frequency Scaling (DVFS, P-states) using the CPUfreq tool. First let's see if all these power saving tweaks have reduced the total throughput."

So it should have been clear that we are looking for a better performance/watt ratio. The interesting thing to note is that ARM benefits from P-states, and that Intel's excellent implementation of C-states makes P-states almost useless.
Twonky - Wednesday, March 13, 2013 - link
For information: about a year ago, the following post on the LinkedIn ARM Based Group gave a link to an M.Sc. thesis publishing figures on the performance/watt ratio for Cortex-A8 and Cortex-A9 based boards:
www.linkedin.com/groups/Single-CortexA8-CortexA9-in-comparison-85447.S.84348310
AncientWisdom - Tuesday, March 12, 2013 - link
Very interesting read, thanks!
staiaoman - Tuesday, March 12, 2013 - link
Damn, Johan. As always, an incredible writeup. Interesting thought experiment to figure that an upper bound on damage to INTC server share might be found by simply looking at how much of the market is running applications like your web server here (where single-threaded performance isn't as important).

Intel powering phones and ARM chips in servers...the end is nigh.
JohanAnandtech - Thursday, March 14, 2013 - link
Thanks Staiaoman :-). I'll leave the thought experiment to you :-)
