X-Gene 1, Atom C2000 and Xeon E3: Exploring the Scale-Out Server World
by Johan De Gelas on March 9, 2015 2:00 PM ESTQuick Overview of the SoCs
In this review, we compare four different SoCs:
- Intel's Xeon E3-1240 v3 3.4GHz
- Intel's Xeon E3-1230L v3 1.8GHz
- Intel's Xeon E3-1265L v2 2.5GHz
- Intel's Atom C2750 2.4GHz
- AppliedMicro's X-Gene 1 2.4GHz
We have discussed the Xeon E3-1200 v3, Atom C2000, and X-Gene in more detail in our previous article. What follows is a quick discussion of why we tested these specific SKUs.
The Intel Xeon E3-1240 v3 is a speedy (3.4GHz, eight threads) Xeon E3 that is still affordable and has a decent TDP (69W). If you want a 6% higher clock (3.6GHz), Intel charges you 2.3X more. The Xeon E3-1240 v3 has an excellent performance per dollar ratio.
The Xeon E3-1230L v3 paper specs are incredible: eight cores that can boost to up to 2.8GHz (with a base clock of 1.8GHz) and a very low TDP of 25W. To see how much progress Intel has made, we compare it with the 45W Intel E3-1265L v2 at 2.5GHz based on the Ivy Bridge core. Will the Haswell core be enough to overcome the 700MHz (1.8 vs 2.5GHz) lower clock speed, which is necessary to make the chip work with a very low 25W TDP? How does this very low power Xeon with the brawny core compare to the Atom C2750?
The Atom C2750 is Intel's fastest Atom-based Xeon. We are very curious to see if there are applications where the eight lean cores can outperform the four wide cores of the Xeon E3.
And last but not least, the X-Gene 2.4GHz, the first server SoC incarnation of the ARMv8-A or AArch64 instruction set. The X-Gene has twice as many memory channels and can support twice as many DIMM slots as its Intel competitors. The cache architecture is a mix of the Atom C2000 and Xeon E3. Just like the Atom, two cores share a smaller L2 cache (256KB vs 1MB). And like the Xeon E3 (and unlike the Atom C2000), the X-Gene also has access to and 8MB L3 cache. Less positive is the antiquated 40nm production process and the fact that power management is much less sophisticated than Intel's solutions. The result is a relatively high 40W TDP.
While not every application was available on the X-Gene, we gathered enough datapoints to do a meaningful comparison. Where will the first productized ARMv8 chip land? Will it be an Atom C2000 or Xeon E3 killer, or neither? What kind of applications run well, and what kind of applications are still running much faster on a x86 chip?
We've added a few CPUs/SoCs to further improve the comparison. We've thrown in the Atom N2800 to mimic one of the worst Intel server CPUs ever (well, maybe "Paxville MP" was worse), the Atom S1260. The Xeon X5470 ("Harpertown", Penryn architecture) is also featured just to satisfy our curiosity and show how much performance has evolved. To understand the performance of the different SoCs, we should also take into account that the Intel chips almost always run at a higher clock speed than the advertised clock speed, thanks to Turbo Boost.
Overview of Clock Speeds | ||||
SoC | Max. Turbo Boost | Turbo Boost with Two Cores |
Turbo Boost with All Cores |
TDP |
Xeon E3-1240v3 3.4 | 3800 | 3600 | 3600 | 80W |
Xeon E3-1230Lv3 1.8 | 2800 | 2300 | 2300 | 25W |
Xeon E3-1220v2 3.1 | 3500 | 3500 | 3300 | 69W |
Xeon E3-1265Lv2 2.5 | 3500 | 3400 | 3100 | 45W |
Atom C2750 2.4 | 2600 | 2600 | 2400 | 20W |
X-Gene 1 2.4 | N/A | N/A | 2400 | 40W |
The 1.8GHz clock of the 25W TDP Xeon E3-1230L v3 may seem pretty low, but in reality the chip clocks at 2.3GHz and more. Single-threaded performance is even better with a top speed of 2.8GHz. The same is true for the Xeon E3-1265L v2, which has an even greater delta between the advertised clock speed (2.5GHz) and the actual clock speed (3.1 – 3.4GHz) when we run our benchmarks.
47 Comments
View All Comments
JohanAnandtech - Tuesday, March 10, 2015 - link
Are you sure this is up to date? gcc tells me -march=native is not supported.JohanAnandtech - Tuesday, March 10, 2015 - link
Update. march=native does not work. I have tried -march=armv8-a but does not do much (it is probably the default). O3 makes the biggest difference. Omit it and you get 5.7 GB/s. With -O3, I am at 18 GB/s and more (stream m400)Alone-in-the-net - Tuesday, March 10, 2015 - link
Apologies. For AArch64 the only is "armv8-a", for intel, -march=native sets it to use the one for your CPU.https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/AArch...
https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-...
From version 4.9.x and above of GCC, you can really start to add tuning for the CPU.
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/AArch...
-mtune=name
Specify the name of the target processor for which GCC should tune the performance of the code. Permissible values for this option are: ‘generic’, ‘cortex-a53’, ‘cortex-a57’.
Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. The only permissible value is ‘cortex-a57.cortex-a53’.
Where none of -mtune=, -mcpu= or -march= are specified, the code will be tuned to perform well across a range of target processors.
Alone-in-the-net - Tuesday, March 10, 2015 - link
Also support for the XGene1 as a compilation target is only from GCC5.https://gcc.gnu.org/gcc-5/changes.html
Support has been added for the following processors (GCC identifiers in parentheses): ARM Cortex-A72 (cortex-a72) and initial support for its big.LITTLE combination with the ARM Cortex-A53 (cortex-a72.cortex-a53), Cavium ThunderX (thunderx), Applied Micro X-Gene 1 (xgene1). The GCC identifiers can be used as arguments to the -mcpu or -mtune options, for example: -mcpu=xgene1
The_Assimilator - Monday, March 9, 2015 - link
So AMD, how's that bet on ARM you made looking now?extide - Monday, March 9, 2015 - link
Don't count them out yet. I really wish that intel didn't abandon ARM for the Atom, I bet they could come out with a sweet armv8 core if they had to, and on their process it would be sweet.BlueBlazer - Monday, March 9, 2015 - link
That AMD Opteron A1100 looking more like abandonware as more time passes on, and that was like 8 months ago. Until now not a single real world deployment nor was used in any of AMD's own SeaMicro servers. Currently available as development kit with a rather steep price tag.tuxRoller - Monday, March 9, 2015 - link
You REALLY should be using GCC 5. that includes many improvements for the armv8 isa. I'd suggest grabbing a nightly of Fedora 22, but Ubuntu 15.04 may be using gcc5 as well.Wilco1 - Monday, March 9, 2015 - link
Agreed, nobody doing anything on AArch64 should contemplate using GCC4.8. Even 4.9 is way out of date. GCC5.0 with latest GLIBC gives major speedups across the board.JohanAnandtech - Tuesday, March 10, 2015 - link
"Way out of date?" We tried out 4.9.2, which has been released on October 30th 2014. That is about 4 months old. https://www.gnu.org/software/gcc/releases.html. Latest version is 4.8.4, 5.0 has not even been released AFAIK.