Calxeda's ARM server tested
by Johan De Gelas on March 12, 2013 7:14 PM EST
To measure the integer processing potential of the various CPUs, we'll turn to several different workloads. First up, we have 7z LZMA compression and decompression, again looking at performance with one to four threads. On the next page, we'll look at gcc compiler performance.
Compression is a low IPC workload that's sensitive to memory parallelism and latency. The instruction mix is a bit different, but this kind of workload is still somewhat similar to many server workloads.
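For readers who want to try a similar one-to-four-thread scaling test on their own hardware, here is a minimal sketch using the LZMA implementation in Python's standard library. It is not the 7z benchmark used for the results here. The payload generator, the function names, and the preset are assumptions chosen purely for illustration, so the absolute numbers are not comparable to the 7z scores.

```python
import lzma
import os
import time
from concurrent.futures import ThreadPoolExecutor


def make_payload(size_bytes: int = 1 << 20) -> bytes:
    """Build a half-random, half-repetitive buffer (hypothetical test data)."""
    half = size_bytes // 2
    text = b"integer workload " * (size_bytes // 17 + 1)
    return (os.urandom(half) + text)[:size_bytes]


def time_roundtrip(payload: bytes) -> tuple[float, float]:
    """Return (compress_seconds, decompress_seconds) for one LZMA round trip."""
    t0 = time.perf_counter()
    packed = lzma.compress(payload, preset=1)  # low preset keeps the run short
    t1 = time.perf_counter()
    unpacked = lzma.decompress(packed)
    t2 = time.perf_counter()
    if unpacked != payload:
        raise ValueError("LZMA round trip did not reproduce the input")
    return t1 - t0, t2 - t1


def throughput_mb_per_s(threads: int, payload: bytes) -> float:
    """Run one round trip per thread at the same time; return aggregate MB/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(time_roundtrip, [payload] * threads))
    wall = time.perf_counter() - start
    return threads * len(payload) / 1e6 / wall


if __name__ == "__main__":
    data = make_payload()
    for n in range(1, 5):  # one to four threads, as in the tests on this page
        print(f"{n} thread(s): {throughput_mb_per_s(n, data):.1f} MB/s")
```

Threads are used here on the assumption that CPython's lzma module releases the global interpreter lock while compressing; if scaling looks flat on your system, a process pool is the safer choice.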
Clock for clock, the out-of-order Cortex-A9 inside the Calxeda ECX-1000 beats the in-order Atom core. A single Cortex-A9 has no trouble beating the older Atoms and comes close to the much higher clocked N2800. Overall, the N2800 and ECX-1000 perform similarly.
Decompression is pretty branch intensive and depends on the latencies of multiply and shift instructions.
Branch mispredictions are common, and the Atom copes with them well thanks to its Simultaneous Multithreading (SMT) core. The boost from Hyper-Threading is very large here: a second ARM Cortex-A9 core gives a 52% boost, while Hyper-Threading gives a 56% boost. This is very much the exception as far as Hyper-Threading performance is concerned.
Looking at both decompression and compression, it looks like a quad-core ARM Cortex-A9 is about as fast as one Xeon core (without Hyper-Threading) at the same clock. We need about six Cortex-A9 cores to match the Xeon core with Hyper-Threading enabled. The quad-core ECX-1000 at 1.4GHz is also close to the dual-core, four-threaded Atom at 1.86GHz. This bodes well for Calxeda, as the 6.1W Atom S1240 only runs at 1.6GHz.
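The core-count comparisons above can be turned into a quick back-of-envelope calculation. The sketch below only restates the approximate figures from the text ("about" four and "about" six Cortex-A9 cores per Xeon core, and the two chips being "close"), so the ratios it prints are rough illustrations and not measured results. The function and variable names are hypothetical.

```python
def implied_uplift(units_baseline: float, units_boosted: float) -> float:
    """Ratio of boosted to baseline throughput, expressed in equivalent A9 cores."""
    return units_boosted / units_baseline


def core_ghz(cores: int, clock_ghz: float) -> float:
    """Aggregate clock budget: physical cores multiplied by clock speed."""
    return cores * clock_ghz


# Approximate equivalences quoted in the text (same clock speed assumed).
A9_CORES_PER_XEON_CORE = 4      # Xeon core without Hyper-Threading
A9_CORES_PER_XEON_CORE_HT = 6   # Xeon core with Hyper-Threading

# Implied Hyper-Threading gain on the Xeon in this workload: 6 / 4 = 1.5
ht_uplift = implied_uplift(A9_CORES_PER_XEON_CORE, A9_CORES_PER_XEON_CORE_HT)

# Quad-core ECX-1000 at 1.4GHz versus dual-core (four-thread) Atom at 1.86GHz.
ecx_budget = core_ghz(4, 1.4)
atom_budget = core_ghz(2, 1.86)

# If the two chips deliver roughly equal throughput, this ratio is how much
# more work each Atom core-GHz (with Hyper-Threading) does than an A9 core-GHz.
budget_ratio = ecx_budget / atom_budget

if __name__ == "__main__":
    print(f"Implied Xeon Hyper-Threading uplift: {ht_uplift:.2f}x")
    print(f"ECX-1000 core-GHz: {ecx_budget:.2f}, Atom core-GHz: {atom_budget:.2f}")
    print(f"Core-GHz ratio (ECX-1000 / Atom): {budget_ratio:.2f}")
```

Because the inputs are rounded statements, treat the outputs as order-of-magnitude guides only.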
tuxRoller - Tuesday, March 12, 2013
Why WOULD you expect DVFS to boost performance?
You seem to think it slightly revelational that the scores are slightly lower (but perhaps statistically meaningless).
dig23 - Tuesday, March 12, 2013
On-demand seems a fair choice to me; it's the best you can do on these OSes. But I will be very interested to see energy efficiency numbers when DVFS is working on a swarm of ARM nodes... :)
tuxRoller - Tuesday, March 12, 2013
It's not the CPU governor I'm talking about but DVFS in particular.
There's bound to be some small amount of latency involved with the process.
Its point isn't best performance but energy efficiency, which is why I made the comment in the first place.
JarredWalton - Tuesday, March 12, 2013
There's the potential for DVFS to optimize for better performance on a few cores while putting some of the other cores into a lower P-state, but I think that would be more for stuff like Turbo Boost/Turbo Core. It's also possible Johan is referring to the potential for the optimizations to simply improve performance in general.
JohanAnandtech - Wednesday, March 13, 2013
Can you tell me where I got you confused? Because I write "This allowed us to make use of Dynamic Voltage and Frequency Scaling (DVFS, P-states) using the CPUfreq tool. First let's see if all these power saving tweaks have reduced the total throughput."
So it should have been clear that we are looking for a better performance/watt ratio. The interesting thing to note is that ARM benefits from P-states, and that Intel's excellent implementation of C-states makes P-states almost useless.
Twonky - Wednesday, March 13, 2013
For information: about a year ago, the following post on the LinkedIn ARM Based Group gave a link to an M.Sc. thesis publishing figures on the performance/watt ratio for Cortex-A8 and Cortex-A9 based boards:
AncientWisdom - Tuesday, March 12, 2013
Very interesting read, thanks!
staiaoman - Tuesday, March 12, 2013
Damn, Johan. As always, an incredible writeup. Interesting thought experiment to figure that an upper bound on damage to INTC server share might be found by simply looking at how much of the market is running applications like your web server here (where single-threaded performance isn't as important).
Intel powering phones and ARM chips in servers...the end is nigh.
JohanAnandtech - Thursday, March 14, 2013
Thanks Staiaoman :-). I'll leave the thought experiment to you :-)