Calxeda's ARM server tested

Author: Johan De Gelas

by Johan De Gelas on March 12, 2013 7:14 PM EST


Integer Processing

To measure the integer processing potential of the various CPUs, we'll turn to several different workloads. First up, we have 7z LZMA compression and decompression, again looking at performance with one to four threads. On the next page, we'll look at gcc compiler performance.
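The one-to-four-thread methodology can be approximated on any machine with a rough sketch like the one below. It is not the 7z benchmark used in this article: it assumes Python's standard `lzma` module as a stand-in, relies on CPython releasing the GIL inside the LZMA calls so that threads can run in parallel, and uses a synthetic payload. The helper names (`make_payload`, `time_workers`, `scaling_table`) are made up for illustration, and the resulting numbers are not comparable with the charts on this page.

```python
import lzma
import os
import time
from concurrent.futures import ThreadPoolExecutor


def make_payload(size: int = 1 << 20) -> bytes:
    """Build repetitive test data (a hypothetical stand-in for a real benchmark corpus)."""
    chunk = os.urandom(256)
    return (chunk * (size // len(chunk) + 1))[:size]


def time_workers(func, items, workers: int) -> float:
    """Return wall-clock seconds needed to apply func to every item with `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(func, items))
    return time.perf_counter() - start


def scaling_table(payload: bytes, max_threads: int = 4, jobs_per_thread: int = 2):
    """Measure compression and decompression throughput (MB/s) for 1..max_threads threads."""
    packed = lzma.compress(payload)
    rows = []
    for n in range(1, max_threads + 1):
        jobs = n * jobs_per_thread           # keep the work per thread constant
        megabytes = jobs * len(payload) / 1e6
        comp_s = time_workers(lzma.compress, [payload] * jobs, n)
        decomp_s = time_workers(lzma.decompress, [packed] * jobs, n)
        rows.append((n, megabytes / comp_s, megabytes / decomp_s))
    return rows


if __name__ == "__main__":
    for threads, comp, decomp in scaling_table(make_payload()):
        print(f"{threads} thread(s): compress {comp:7.1f} MB/s, decompress {decomp:7.1f} MB/s")
```

Dividing the throughput at two threads by the throughput at one thread gives the kind of scaling percentage quoted further down this page.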

Compression

Compression is a low IPC workload that's sensitive to memory parallelism and latency. The instruction mix is a bit different, but this kind of workload is still somewhat similar to many server workloads.

7z LZMA Compression - 1 to 4 threads

Clock for clock, the out-of-order Cortex-A9 inside the Calxeda ECX-1000 beats the in-order Atom core. A single Cortex-A9 has no trouble beating the older Atoms and comes close to the much higher clocked N2800. The N2800 and ECX-1000 perform similarly.

Decompression

Decompression is pretty branch intensive and depends on the latencies of multiply and shift instructions.

7z LZMA Decompression - 1 to 4 threads

Branch mispredictions are common, and the Atom copes with them well thanks to its Simultaneous Multi-Threading (SMT) core. The boost from Hyper-Threading is very large here: a second ARM Cortex-A9 core gives a 52% boost, while Hyper-Threading gives a 56% boost. This is very much the exception as far as Hyper-Threading performance is concerned.

Looking at both decompression and compression, it looks like a quad-core ARM Cortex-A9 is about as fast as one Xeon core (without Hyper-Threading) at the same clock. We need about six Cortex-A9 cores to match the Xeon core with Hyper-Threading enabled. The quad-core ECX-1000 at 1.4GHz is also close to the dual-core, four-threaded Atom at 1.86GHz. This bodes well for Calxeda as the 6.1W S1240 only runs at 1.6GHz.
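The "about six cores" estimate can be sanity-checked with a little arithmetic. The sketch below assumes simple linear scaling across Cortex-A9 cores, which is an idealization, and the function names are made up for illustration. Plugging in the 56% Hyper-Threading boost measured for decompression gives an upper bound of roughly 6.2 cores. Working backwards from six cores implies an average boost of about 50% across both workloads, which is an inference from the text rather than a measured figure.

```python
def cores_to_match(cores_per_xeon_core: float, smt_boost: float) -> float:
    """Cortex-A9 cores needed to match one Xeon core once SMT adds `smt_boost` (0.56 = 56%).

    Assumes performance scales linearly with the number of A9 cores (an idealization).
    """
    return cores_per_xeon_core * (1.0 + smt_boost)


def implied_smt_boost(cores_needed: float, cores_per_xeon_core: float) -> float:
    """SMT boost implied by needing `cores_needed` A9 cores instead of `cores_per_xeon_core`."""
    return cores_needed / cores_per_xeon_core - 1.0


if __name__ == "__main__":
    # Four A9 cores roughly equal one Xeon core without Hyper-Threading (from the text).
    upper = cores_to_match(4, 0.56)      # decompression boost, the best case for SMT
    average = implied_smt_boost(6, 4)    # boost implied by the "about six cores" estimate
    print(f"cores needed with a 56% boost: {upper:.2f}")
    print(f"boost implied by six cores:    {average:.0%}")
```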


99 Comments


JohanAnandtech - Wednesday, March 13, 2013 - link
Hmmm ... There is almost no info on how that hypervisor works. It is hard to imagine that kind of system would scale very well. How does it keep the caches coherent? Do you have info on that?
timbuktu - Wednesday, March 13, 2013 - link
I can't speak directly to ScaleMP, but it looks similar to NUMAlink.

http://en.wikipedia.org/wiki/NUMAlink

Reading through this article about Calxeda, great job BTW, I couldn't help but think about the old SGI hardware that seemed pretty similar, with MIPS (and later Itanium) processors connected through a switch with NUMAlink. I haven't played with NUMAlink directly in almost a decade, but back then cheaper Altix slabs were ring topology while higher end hardware was switched. In the end though, you could put a bunch of 1U racks together and have a single system image. Like you mentioned though, cache coherency was exceptionally important. Since we have a UV here, I can point you to the documentation for that box.

http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc...

Everything old is new again, I suppose. Well, except NUMAlink never went away. =D
Tunrip - Wednesday, March 13, 2013 - link
I'd be interested in knowing how the Xeon compared if you did the same test without the virtual machines.
JohanAnandtech - Wednesday, March 13, 2013 - link
The website won't scale to 32 logical cores I am afraid... but we can try to see how far we can get
Colin1497 - Wednesday, March 13, 2013 - link
A better question might be "is 24 VMs a logical number to use?" Would more or fewer VMs work better? It appears that you have 24 VMs because you have 24 ARM nodes?
duploxxx - Wednesday, March 13, 2013 - link
Very interesting, loved reading it. But although it is early in the ball game, I do think there are other, far better solutions in the pipeline from the big OEMs:

HP Moonshot
http://h17007.www1.hp.com/us/en/iss/110111.aspx
JohanAnandtech - Wednesday, March 13, 2013 - link
Isn't it remarkable how PR people manage to fill so many pages with "extreme" and "the future" without telling us anything? My frustration grew even more when I clicked the "get the facts" page. That is more like "You are not getting any facts at all".
DuckieHo - Wednesday, March 13, 2013 - link
Since these are set up as webservers, what's the power consumption at say 20-40% load? Usually there is some load instead of completely idle.
JohanAnandtech - Wednesday, March 13, 2013 - link
Good suggestion... you'd like to see a step-by-step power measurement like SPECpower, right? Let me try that.
DanNeely - Wednesday, March 13, 2013 - link
I'd be interested in seeing where the limits are, and what happens when you start pushing single chips up to and slightly beyond them. Calxeda's hardware has proved competitive on a very friendly workload (which I didn't really expect to happen until their A15 product), but in the real world a set of small websites is unlikely to all have equal load levels. Virtual servers on larger CPUs should give more headroom for load spikes, so knowing what the limits of Calxeda's hardware are strikes me as fairly important.
