Measuring Bandwidth

Stream measures "sustainable memory bandwidth" and is thus a good indication of how a CPU will handle data intensive applications. Dr. John McCalpin is the developer and maintainer of the STREAM benchmark.

We compiled with gcc 4.7 on all platforms and used the -O3 -fopen -static settings. It is important to remark that this version of gcc has been optimized by the Linaro group, a non-profit software engineering effort. Linaro's objective is to optimize the kernel and typical tools for the ARM-Cortex A-series CPUs.

On the Intel CPUs we force the threads to make use of Hyper-Threading with taskset. So for example, the four threads measurement is done on two physical cores with four threads. This gives an idea of how a quad-core ARM server node compares to a virtual machine that gets a few physical and few logical cores from the hypervisor. It also allows us to evaluate how two threads on top of an Atom core compare to two ARM cores. When you compare CPUs with similar power consumption, you typically get two ARM cores for each Hyper-Threaded Atom core.

Stream Triad—1 to 4 threads

The ARM based server is a pretty bad choice right now for memory intensive workloads. Even with four cores and DDR3-1333, the useable bandwidth is less than one sixth of what one Xeon core can sustain.

In a similar vein, the ECX-1000 is not capable of providing more bandwidth than an Atom system equipped with DDR2-667. However, both the Atom and ARM cores are pretty bad when it comes to bandwidth. Although the specs claim that the CPUs can drive one channel of DDR3-1066, the measured bandwidth comes nowhere near the theoretical 8.5GB/s that such a DIMM can deliver.

Benchmarking Configuration Integer Processing
Comments Locked

99 Comments

View All Comments

  • JohanAnandtech - Wednesday, March 13, 2013 - link

    Hmmm ... There is almost no info on how that hypervisor works. It is hard to imagine that kind of system would scale very well. How does it keep Cache coherent? Do you have info on that?
  • timbuktu - Wednesday, March 13, 2013 - link

    I can't speak directly to ScaleMP, but it looks similar to NUMALink.

    http://en.wikipedia.org/wiki/NUMAlink

    Reading through this article about Calxedas, great job BTW, I couldn't help but think about the old SGI hardware that seemed pretty similar with MIPs (and later Itanium) processors connected through a switch with NUMALink. I haven't played with NUMALink directly in almost a decade, but back then cheaper Altix slabs were ring topology while higher end hardware was switched. In the end though, you could put together a bunch of 1U racks together and have a single system image. Like you mentioned though, cache coherency was exceptionally important. Since we have a uv here, I can point you to the documentation for that box.

    http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc...

    Everything old is new again, I suppose. Well, except NUMAlink never went away. =D
  • Tunrip - Wednesday, March 13, 2013 - link

    I'd be interested in knowing how the Xeon compared if you did the same test without the virtual machines.
  • JohanAnandtech - Wednesday, March 13, 2013 - link

    The website won't scale to 32 logical cores I am afraid... but we can try to see how far we can get
  • Colin1497 - Wednesday, March 13, 2013 - link

    A better question might be "is 24 VM's a logical number to use?" Would more or fewer VM's work better? The appearance is that you have 24VM's because you have 24 ARM nodes?
  • duploxxx - Wednesday, March 13, 2013 - link

    very interesting, loved reading it. But although early in the ball game I do think there are other way better solutions in the pipe-line from the big OEM:

    HP Moonshot
    http://h17007.www1.hp.com/us/en/iss/110111.aspx
  • JohanAnandtech - Wednesday, March 13, 2013 - link

    Isn't remarkable how PR people manage to fill so many pages with "extreme" and "the future" without telling anything. Frustation became even higher when I clicked "get the facts" page. That is more like "You are not getting any facts at all".
  • DuckieHo - Wednesday, March 13, 2013 - link

    Since these are set up as webservers, what's the power consumption at say 20-40% load? Usually there is some load instead of completely idle.
  • JohanAnandtech - Wednesday, March 13, 2013 - link

    Good suggestion... you'll like to see a step by step power measurement like SpecPower right? Let me try that.
  • DanNeely - Wednesday, March 13, 2013 - link

    I'd be interested in seeing where, and what happens when you start pushing single chips to and slightly beyond their limits. Calxeda's hardware's proved competitive on a very friendly workload (which I didn't really expect would happen until their A15 product); but in the real world a set of small websites are unlikely to all have equal load levels. Virtual servers on larger CPUs should give more headroom for load spikes; so knowing what the limits on Calxeda's hardware are strikes me as fairly important.

Log in

Don't have an account? Sign up now