Measuring Bandwidth

STREAM measures "sustainable memory bandwidth" and is thus a good indicator of how a CPU will handle data-intensive applications. Dr. John McCalpin is the developer and maintainer of the STREAM benchmark.
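To make the measurement concrete, here is a minimal sketch of the Triad kernel and of the byte accounting behind a STREAM-style bandwidth figure. It is an illustration in Python, not the benchmark itself: the function names are ours, 8-byte doubles are assumed, and a pure Python loop is limited by the interpreter rather than by memory, so the number it prints says nothing about the hardware.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """Triad kernel: a[i] = b[i] + scalar * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes(n, word_bytes=8):
    """Bytes counted for one Triad pass: two reads and one write per element."""
    return 3 * n * word_bytes


def bandwidth_mb_s(n, seconds, word_bytes=8):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one Triad pass over n elements."""
    return triad_bytes(n, word_bytes) / seconds / 1e6


if __name__ == "__main__":
    n = 200_000  # illustrative size only; a real run uses arrays far larger than the caches
    a = array("d", [0.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n

    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start

    # Interpreter-bound: this demonstrates the arithmetic, not the memory system.
    print(f"{bandwidth_mb_s(n, elapsed):.1f} MB/s")
```

The point of the sketch is the accounting: each Triad element moves 24 bytes, so the reported bandwidth is simply the bytes moved divided by the elapsed time.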

We compiled with gcc 4.7 on all platforms, using the -O3 -fopenmp -static settings. It is important to note that this version of gcc has been optimized by the Linaro group, a non-profit software engineering effort whose objective is to optimize the kernel and typical tools for the ARM Cortex-A series CPUs.

On the Intel CPUs we use taskset to force the threads onto Hyper-Threading siblings. For example, the four-thread measurement runs on two physical cores, each executing two threads. This gives an idea of how a quad-core ARM server node compares to a virtual machine that gets a few physical cores and a few logical cores from the hypervisor. It also allows us to evaluate how two threads on one Atom core compare to two ARM cores. When you compare CPUs with similar power consumption, you typically get two ARM cores for each Hyper-Threaded Atom core.
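The pinning scheme can be sketched as a small helper that fills both Hyper-Threads of a core before moving to the next core and then builds the matching taskset command line. This is a hedged illustration: the sibling layout and the ./stream binary name are hypothetical, and on a real Linux machine the sibling pairs would have to be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.

```python
def cpus_for_threads(threads, siblings):
    """Return logical CPU ids, using every Hyper-Thread of a core before the next core.

    `siblings` lists the logical CPU ids of each physical core, e.g. [(0, 4), (1, 5)].
    """
    flat = [cpu for core in siblings for cpu in core]
    if threads < 1 or threads > len(flat):
        raise ValueError("thread count does not fit the given topology")
    return flat[:threads]


def taskset_command(threads, siblings, program):
    """Build a `taskset -c <cpu-list> <program>` argument list for the chosen CPUs."""
    cpu_list = ",".join(str(cpu) for cpu in cpus_for_threads(threads, siblings))
    return ["taskset", "-c", cpu_list] + list(program)


if __name__ == "__main__":
    # Hypothetical layout: four physical cores, logical CPUs n and n+4 share a core.
    assumed_siblings = [(0, 4), (1, 5), (2, 6), (3, 7)]

    # Four threads land on two physical cores, as in the measurement described above.
    # "./stream" is a placeholder name for the benchmark binary.
    print(" ".join(taskset_command(4, assumed_siblings, ["./stream"])))
```

With the assumed layout, four threads map to logical CPUs 0, 4, 1 and 5, which is two physical cores with two threads each. The OpenMP thread count would still have to be set separately, for instance through the OMP_NUM_THREADS environment variable.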

[Chart: STREAM Triad, 1 to 4 threads]

The ARM-based server is a poor choice right now for memory-intensive workloads. Even with four cores and DDR3-1333, the usable bandwidth is less than one sixth of what a single Xeon core can sustain.

In a similar vein, the ECX-1000 cannot provide more bandwidth than an Atom system equipped with DDR2-667. That said, neither the Atom nor the ARM cores do well when it comes to bandwidth. Although the specs claim that the CPUs can drive one channel of DDR3-1066, the measured bandwidth comes nowhere near the theoretical 8.5 GB/s that such a DIMM can deliver.
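The theoretical figure follows from simple arithmetic: transfers per second times the width of the memory bus. A minimal sketch, assuming a standard 64-bit (8-byte) DIMM data bus and using the nominal transfer rate from the module name; the function name is ours:

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes one transfer moves the full bus width on every channel.
    """
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    # Memory types mentioned in the text, one channel each.
    for name, rate in [("DDR2-667", 667), ("DDR3-1066", 1066), ("DDR3-1333", 1333)]:
        print(f"{name}: {peak_bandwidth_gb_s(rate):.1f} GB/s")
```

For DDR3-1066 this gives 1066 million transfers per second times 8 bytes, or roughly 8.5 GB/s, the ceiling that the measured Triad results fall well short of.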



Comments

  • tuxRoller - Tuesday, March 12, 2013 - link

    Why WOULD you expect DVFS to boost performance?
    You seem to think it slightly revelatory that the scores are slightly lower (though the difference is perhaps statistically meaningless).
  • dig23 - Tuesday, March 12, 2013 - link

    On-demand seems a fair choice to me; it's the best you can do on these OSes. But I will be very interested to see energy efficiency numbers when DVFS is working on a swarm of ARM nodes... :)
  • tuxRoller - Tuesday, March 12, 2013 - link

    It's not the CPU governor I'm talking about but DVFS in particular.
    There's bound to be some small amount of latency involved with the process.
    Its point isn't best performance but energy efficiency, which is why I made the comment in the first place.
  • JarredWalton - Tuesday, March 12, 2013 - link

    There's the potential for DVFS to optimize for better performance on a few cores while putting some of the other cores into a lower P-state, but I think that would be more for stuff like Turbo Boost/Turbo Core. It's also possible Johan is referring to the potential for the optimizations to simply improve performance in general.
  • JohanAnandtech - Wednesday, March 13, 2013 - link

    Can you tell me where I confused you? I wrote: "This allowed us to make use of Dynamic Voltage and Frequency Scaling (DVFS, P-states) using the CPUfreq tool. First let's see if all these power saving tweaks have reduced the total throughput."

    So it should have been clear that we are looking for a better performance/watt ratio. The interesting thing to note is that ARM benefits from P-states, and that Intel's excellent implementation of C-states makes P-states almost useless.
  • Twonky - Wednesday, March 13, 2013 - link

    For information: about a year ago, the following post on the LinkedIn ARM Based Group gave a link to an M.Sc. thesis publishing figures on the performance/watt ratio for Cortex-A8 and Cortex-A9 based boards:
  • AncientWisdom - Tuesday, March 12, 2013 - link

    Very interesting read, thanks!
  • staiaoman - Tuesday, March 12, 2013 - link

    Damn, Johan. As always, an incredible writeup. It's an interesting thought experiment: an upper bound on the damage to INTC's server share might be found by simply looking at how much of the market is running applications like your web server here (where single-threaded performance isn't as important).

    Intel powering phones and ARM chips in servers...the end is nigh.
  • JohanAnandtech - Thursday, March 14, 2013 - link

    Thanks Staiaoman :-). I'll leave the thought experiment to you :-)
