X-Gene 1, Atom C2000 and Xeon E3: Exploring the Scale-Out Server World

Author: Johan De Gelas

by Johan De Gelas on March 9, 2015 2:00 PM EST


Memory Subsystem Bandwidth

While the Xeon E5 has ample bandwidth for most applications courtesy of its quad-channel memory subsystem, the Xeon E3 and Atom C2000 have only two memory channels. They are also limited to DDR3-1600, whereas the Xeon E5 supports DDR4-2133, so memory bandwidth can be a bottleneck for some applications.
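To put rough numbers on that gap, the theoretical peak bandwidth of a memory subsystem can be estimated as channels × transfer rate × bytes per transfer. The sketch below assumes standard 64-bit (8-byte) DDR channels; the function name `peak_bandwidth_gbs` is ours and purely illustrative, and these are paper figures, not measurements.

```c
/* Theoretical peak memory bandwidth in GB/s (1e9 bytes/s).
   Assumption: each DDR channel has a 64-bit (8-byte) data bus. */
double peak_bandwidth_gbs(int channels, double megatransfers_per_s)
{
    const double bytes_per_transfer = 8.0;   /* 64-bit channel */
    return channels * megatransfers_per_s * 1e6 * bytes_per_transfer / 1e9;
}
```

Under that assumption, `peak_bandwidth_gbs(2, 1600)` gives 25.6 GB/s for the dual-channel DDR3-1600 parts, while `peak_bandwidth_gbs(4, 2133)` gives roughly 68.3 GB/s for a quad-channel DDR4-2133 setup. Measured Stream numbers always land below these ceilings.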

We measured the memory bandwidth in Linux. The binary was compiled with the Open64 compiler 5.0 (Opencc). It is a multi-threaded, OpenMP based, 64-bit binary. The following compiler switches were used:

-Ofast -mp -ipa

To keep things simple, we only report the Triad sub-benchmark of our OpenMP enabled Stream benchmark.
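For readers unfamiliar with Triad, the kernel computes a[i] = b[i] + scalar * c[i] over large arrays and counts three arrays' worth of traffic per pass. What follows is a minimal sketch of that idea, not the Stream source or our exact binary: the helper names (`now_seconds`, `triad_bytes`, `triad_bandwidth_gbs`), the fill values and the best-of-N timing are our own assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds; uses the OpenMP timer when built with OpenMP support. */
double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);   /* C11 fallback */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Triad kernel: a[i] = b[i] + scalar * c[i], split across threads. */
void triad(double *a, const double *b, const double *c, double scalar, size_t n)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Bytes counted per Triad pass: two arrays read, one written. */
double triad_bytes(size_t n)
{
    return 3.0 * (double)n * (double)sizeof(double);
}

/* Hypothetical helper: best-of-reps bandwidth in GB/s (1e9 bytes/s),
   or -1.0 on allocation failure, a wrong result, or an unusable timing. */
double triad_bandwidth_gbs(size_t n, int reps)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    double best = -1.0;
    int ok = (a && b && c);

    if (ok) {
        for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }
        for (int r = 0; r < reps; r++) {
            double t0 = now_seconds();
            triad(a, b, c, 3.0, n);
            double dt = now_seconds() - t0;
            if (dt > 0.0 && (best < 0.0 || dt < best))
                best = dt;
        }
        /* Check the result (1 + 3 * 2 = 7) so the work cannot be skipped. */
        for (size_t i = 0; i < n; i++)
            if (a[i] != 7.0) ok = 0;
    }
    free(a); free(b); free(c);
    return (ok && best > 0.0) ? triad_bytes(n) / best / 1e9 : -1.0;
}
```

Built with OpenMP enabled (for example GCC's -fopenmp), the loop is spread across all cores; without it the pragma is ignored and the kernel runs on a single thread. For a meaningful number, the arrays should be several times larger than the last-level cache, otherwise the result reflects cache rather than DRAM bandwidth.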

Stream Triad

First of all, we should note that the clock speed of the CPU has very little influence on the Stream score. Notice the small difference (12.7%) between the Xeon E3-1240, which can boost to 3.6GHz, and the Xeon E3-1230L, which is limited to 2.3GHz (Turbo Boost with four cores busy), despite a clock speed advantage of more than 50% for the former.

The Xeon E3-1200 v3 is slightly more efficient than the Xeon E3-1200 v2; we measured a 7% bandwidth improvement. The Xeon E3 also offers up to 33% more bandwidth than the Atom C2750 with the same DIMMs.

To do an apples-to-apples comparison with the X-Gene 1, we compiled the same OpenMP-enabled Stream benchmark with GCC 4.8.2 (-O3 -fopenmp -static).

Stream Triad GCC 4.8.2

The Xeon E3 has the most efficient memory controller: it can extract almost as much bandwidth as the quad-channel memory controller of the X-Gene and about 46% more than the Atom. Our guess is that the X-Gene still has quite a bit of headroom to improve the memory subsystem. There is work to be done on the compiler side and on the hardware.


47 Comments


IBleedOrange - Monday, March 9, 2015
EETimes is wrong.
Google "Intel Denverton"

beginner99 - Monday, March 9, 2015
Maybe it would be good to mention at the start of the article that the X-Gene is made on a 40nm process. I read the article and think for myself that the X-Gene is crap and in the end you get the explanation. It's on 40nm vs Atoms on Intel 22nm. It's a huge difference and currently the article is a bit misleading, e.g. shining a bad light on X-Gene and ARM. (And I say this even though I always was a proponent of Intel Big cores in almost all server applications).

Stephen Barrett - Monday, March 9, 2015
If APM had a newer part to test then we would have tested it. XG2 is simply not out yet. So the fact that APM has their flagship SoC on an older process is not misleading... It's the facts. The currently available Intel parts have a process advantage.

warreo - Monday, March 9, 2015
Mentioning it at the start would be good from a technical disclosure standpoint, but I'm not sure for the purposes of this article it truly matters. The article is comparing what is currently available now from APM and Intel. Reality is Intel will likely have a significant process advantage for the foreseeable future, and if you wanted to see a like for like comparison on a process basis, then you'll probably need to wait 2-3 years for X-Gene to get on 22nm, meanwhile Intel will have moved on to 10nm.

CajunArson - Monday, March 9, 2015
The 40nm process is only really relevant when it comes to the power-consumption comparisons.
A 28nm.. or 20nm or 16nm... part with the same cores at the same clockspeeds will register the exact same level of performance. The only difference will be that the smaller lithographic processes should provide that level of performance in a smaller power envelope.

JohanAnandtech - Monday, March 9, 2015
well, with so much time invested in an article, I always hope people will read the pages between page 1 and 18 too :-p. It is mentioned in the overview of the SoCs on page 5 and quite a few times at other pages too.

colinstu - Monday, March 9, 2015
what server is on the bottom of the first page?

JohanAnandtech - Monday, March 9, 2015
A very old MSI server :-). Just to show people what webfarms used before the micro server era.

Samus - Monday, March 9, 2015
I use the Xeon E3-1230v3 in desktop applications all the time. It's basically an i7 for the price of an i5.

And a lot of IT depts dump them on eBay cheap when they upgrade their servers. They can be had well under $200 lightly used. The 80w TDP could theoretically have some drawbacks for boost time, but the real-world performance according to passmark elongated tests doesn't seem to show any difference between its boost potential and that of an 88w i7-k

Great CPUs.

Alone-in-the-net - Monday, March 9, 2015
In both your compilers, you need to specify -march=native so that the compiler can optimize for the architecture you are running on, -O3 is not enough. This enables the compiler to use CPU-specific instructions.
