X-Gene 1, Atom C2000 and Xeon E3: Exploring the Scale-Out Server World
by Johan De Gelas on March 9, 2015 2:00 PM EST
Quick Overview of the SoCs
In this review, we compare five different SoCs:
- Intel's Xeon E3-1240 v3 3.4GHz
- Intel's Xeon E3-1230L v3 1.8GHz
- Intel's Xeon E3-1265L v2 2.5GHz
- Intel's Atom C2750 2.4GHz
- AppliedMicro's X-Gene 1 2.4GHz
We have discussed the Xeon E3-1200 v3, Atom C2000, and X-Gene in more detail in our previous article. What follows is a quick discussion of why we tested these specific SKUs.
The Intel Xeon E3-1240 v3 is a speedy (3.4GHz, eight threads) Xeon E3 that is still affordable and has a decent TDP (80W). If you want a 6% higher clock (3.6GHz), Intel charges you 2.3X more. The Xeon E3-1240 v3 has an excellent performance per dollar ratio.
The Xeon E3-1230L v3 paper specs are incredible: four cores (eight threads) that can boost up to 2.8GHz (with a base clock of 1.8GHz) and a very low TDP of 25W. To see how much progress Intel has made, we compare it with the 45W Intel E3-1265L v2 at 2.5GHz based on the Ivy Bridge core. Will the Haswell core be enough to overcome the 700MHz (1.8 vs 2.5GHz) lower clock speed, which is necessary to make the chip work with a very low 25W TDP? How does this very low power Xeon with the brawny core compare to the Atom C2750?
The Atom C2750 is Intel's fastest Atom-based server SoC. We are very curious to see if there are applications where the eight lean cores can outperform the four wide cores of the Xeon E3.
And last but not least, the X-Gene 2.4GHz, the first server SoC incarnation of the ARMv8-A or AArch64 instruction set. The X-Gene has twice as many memory channels and can support twice as many DIMM slots as its Intel competitors. The cache architecture is a mix of the Atom C2000 and Xeon E3. Just like the Atom, two cores share an L2 cache, although a smaller one (256KB vs 1MB). And like the Xeon E3 (and unlike the Atom C2000), the X-Gene also has access to an 8MB L3 cache. Less positive is the antiquated 40nm production process and the fact that power management is much less sophisticated than Intel's solutions. The result is a relatively high 40W TDP.
While not every application was available on the X-Gene, we gathered enough datapoints to do a meaningful comparison. Where will the first productized ARMv8 chip land? Will it be an Atom C2000 or Xeon E3 killer, or neither? What kind of applications run well, and what kind of applications are still running much faster on an x86 chip?
We've added a few CPUs/SoCs to further improve the comparison. We've thrown in the Atom N2800 to mimic the Atom S1260, one of the worst Intel server CPUs ever (well, maybe "Paxville MP" was worse). The Xeon X5470 ("Harpertown", Penryn architecture) is also featured just to satisfy our curiosity and show how much performance has evolved. To understand the performance of the different SoCs, we should also take into account that the Intel chips almost always run at a higher clock speed than the advertised clock speed, thanks to Turbo Boost.
Overview of Clock Speeds
SoC (base clock in GHz) | Max. Turbo Boost (MHz) | Turbo Boost with Two Cores (MHz) | Turbo Boost with All Cores (MHz) | TDP |
Xeon E3-1240v3 3.4 | 3800 | 3600 | 3600 | 80W |
Xeon E3-1230Lv3 1.8 | 2800 | 2300 | 2300 | 25W |
Xeon E3-1220v2 3.1 | 3500 | 3500 | 3300 | 69W |
Xeon E3-1265Lv2 2.5 | 3500 | 3400 | 3100 | 45W |
Atom C2750 2.4 | 2600 | 2600 | 2400 | 20W |
X-Gene 1 2.4 | N/A | N/A | 2400 | 40W |
The 1.8GHz clock of the 25W TDP Xeon E3-1230L v3 may seem pretty low, but in reality the chip clocks at 2.3GHz and more. Single-threaded performance is even better with a top speed of 2.8GHz. The same is true for the Xeon E3-1265L v2, which has an even greater delta between the advertised clock speed (2.5GHz) and the actual clock speed (3.1 – 3.4GHz) when we run our benchmarks.
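The gap between advertised and sustained clocks can be tabulated with a few lines of Python. This is only a minimal sketch: the numbers are copied from the clock speed table, while the names `clocks_mhz` and `all_core_uplift` are illustrative assumptions and not part of any tool used in this review.

```python
# Base clock and all-core Turbo Boost clock in MHz, copied from the table.
# The X-Gene 1 has no turbo, so its all-core clock equals its base clock.
clocks_mhz = {
    "Xeon E3-1240v3": (3400, 3600),
    "Xeon E3-1230Lv3": (1800, 2300),
    "Xeon E3-1220v2": (3100, 3300),
    "Xeon E3-1265Lv2": (2500, 3100),
    "Atom C2750": (2400, 2400),
    "X-Gene 1": (2400, 2400),
}


def all_core_uplift(base_mhz, all_core_mhz):
    """Return (absolute gain in MHz, relative gain in percent) over base clock."""
    delta = all_core_mhz - base_mhz
    return delta, 100.0 * delta / base_mhz


uplift = {soc: all_core_uplift(*pair) for soc, pair in clocks_mhz.items()}

for soc, (delta, pct) in uplift.items():
    print(f"{soc:16s} +{delta:4d} MHz ({pct:4.1f}%)")
```

With these table values, the Xeon E3-1265L v2 gains 600MHz with all cores active versus 500MHz for the Xeon E3-1230L v3, although in relative terms the latter's gain is the larger one (roughly 28% versus 24%).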
47 Comments
Wilco1 - Tuesday, March 10, 2015 - link
GCC4.9 doesn't contain all the work in GCC5.0 (close to final release, but you can build trunk). As you hinted in the article, it is early days for AArch64 support, so there is a huge difference between a 4.9 and 5.0 compiler, so 5.0 is what you'd use for benchmarking.
JohanAnandtech - Tuesday, March 10, 2015 - link
You must realize that the situation in the ARM ecosystem is not as mature as on x86. The X-Gene runs on a specially patched kernel that has some decent support for ACPI, PCIe etc. If you do not use this kernel, you'll get in all kinds of hardware trouble. And afaik, gcc needs a certain version of the kernel.
Wilco1 - Tuesday, March 10, 2015 - link
No you can use any newer GCC and GLIBC with an older kernel - that's the whole point of compatibility.
Btw your results look wrong - X-Gene 1 scores much lower than Cortex-A15 on the single threaded LZMA tests (compare with results on http://www.7-cpu.com/). I'm wondering whether this is just due to using the wrong compiler/options, or running well below 2.4GHz somehow.
JohanAnandtech - Tuesday, March 10, 2015 - link
Hmm. The A57 scores 1500 at 1.9 GHz on compression. The X-Gene scores 1580 with gcc 4.8 and 1670 with gcc 4.9. Our scores are on the low side, but it is not like they are impossibly low.
Ubuntu 14.04, 3.13 kernel and gcc 4.8.2 was and is the standard environment that people will get on the m400. You can tweak a lot, but that is not what most professionals will do. Then we would also have to start testing with icc on Intel. I am not convinced that the overall picture will change that much with lots of tweaking.
Wilco1 - Tuesday, March 10, 2015 - link
Yes, and I'd expect the 7420 will do a lot better than the 5433. But the real surprise to me is that X-Gene 1 doesn't even beat the A15 in Tegra K1 despite being wider, newer and running at a higher frequency - that's why the results look too low.
I wouldn't call upgrading to the latest compiler tweaking - for AArch64 that is kind of essential given it is early days and the rate of development is extremely high. If you tested 32-bit mode then I'd agree GCC 4.8 or 4.9 are fine.
CajunArson - Tuesday, March 10, 2015 - link
This is all part of the problem: Requiring people to use cutting edge software with custom recompilation just to beat a freakin' Atom much less a real CPU?
You do realize that we could play the same game with all the Intel parts. Believe me, the people who constantly whine that Haswell isn't any faster than Sandy Bridge have never properly recompiled computationally intensive code to take advantage of AVX2 and FMA.
The fact that all those Intel servers were running software that was only compiled for a generic X86-64 target without requiring any special tweaking or exotic hacking is just another major advantage for Intel, not some "cheat".
Klimax - Tuesday, March 10, 2015 - link
And if we are going for cutting edge compiler, then why not ICC with Intel's nice libraries... (pretty sure even ancient atom would suddenly look not that bad)
Wilco1 - Tuesday, March 10, 2015 - link
To make a fair comparison you'd either need to use the exact same compiler and options or go all out and allow people to write hand optimized assembler for the kernels.
68k - Saturday, March 14, 2015 - link
You can't seriously claim that recompiling an existing program with a different (well known and mature) compiler is equal to hand optimizing things in assembler. Hint, one of the options is ridiculously expensive, one is trivial.
aryonoco - Monday, March 9, 2015 - link
Thank you Johan. Very very informative article. This is one of the least reported areas of IT in general, and one that I think is poised for significant uptake in the next 5 years or so.
Very much appreciate your efforts into putting this together.