X-Gene 1, Atom C2000 and Xeon E3: Exploring the Scale-Out Server World
by Johan De Gelas on March 9, 2015 2:00 PM EST
Saving Power at Idle
Efficiency is very important in many scenarios, so let's start by checking out idle power consumption. We quickly realized that many servers, especially micro servers, would use simpler boards with far fewer chips than our ASUS P9D-MH. It is also clear that it is very hard to make a decent apples-to-apples comparison as the boards are very different. The Xeon 1200 V3 board (ASUS) is very feature-rich, the Intel board of our Xeon E3 is simpler, and the board inside the HP m300 is bare-bones.
But with some smart measurements and some deduction we can get there. By disabling SAS controllers and other features, we can determine how much a simpler board would consume, e.g. a Xeon E3 board similar to the one in the m300. To estimate the range and impact of the motherboard and other components, we also test the Xeon E3-1230L v3 in two other situations: running on the Supermicro board with cooling not included and on the feature-rich ASUS P9D (a small fan is included here). You can find the results below.
(*) Calculated as if the Xeon E3 was run in an "m300-ish" board.
The Supermicro nodes are quite efficient, with less than 29W per node. We measured this by dividing the measurement of four nodes by four. However, out of the box the fans have a tendency to run at high RPM, resulting in a power consumption of 7W per node in idle and up to 10W per node under load.
The m400 cartridge has eight DIMMs (instead of four) and a 10 Gbit controller (Mellanox ConnectX-3 Pro Dual 10GbE NIC, disabled). Those features will probably consume a few watts. But this is where reality and marketing collide. If you just read news bits about the ARM ecosystem, it is all robust and mature: after all, ARM's 64-bit efforts started back in 2012. The reality is that building such an ecosystem takes a lot of time and effort. The ARM server software ecosystem is – understandably – nowhere near the maturity of x86. One peek at the ARM 64-bit kernel discussion and you'll see that there is a lot of work to be done: ACPI and PCIe support for example are still works in progress.
The X-Gene in the HP m400 cartridge runs on a patched kernel that is robust and stable. But even if we subtract about 5W for the extra DIMMs and disabled 10GbE NIC, 32W is a lot more than what the Atom C2750 requires. When running idle, the Atom C2750, the four low-voltage 8GB DDR3 DIMMs, the 120GB SSD, and the dual 1GbE controller need no more than 11W. Even if we take into account that the power consumption of fans is not included, it shows how well HP engineered these cartridges and how sophisticated the Intel power management is.
For your information, the m350 cartridge goes even lower: 21W for four nodes. Of course, these amazing power figures come with some hardware limitations (two DIMMs per node, only small M.2 flash storage available).
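The per-node figures in this section come from measuring several nodes at once and dividing by the node count. A minimal sketch of that arithmetic follows, using the 21W four-node m350 reading quoted above; the helper name `per_node_watts` is ours and purely illustrative, not part of any measurement tool.

```python
def per_node_watts(total_watts: float, nodes: int) -> float:
    """Average power per node from one measurement covering several nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be a positive integer")
    return total_watts / nodes


# Figure from the text: an m350 cartridge idles at 21W for four nodes.
m350_per_node = per_node_watts(21.0, 4)
print(f"m350: {m350_per_node:.2f} W per node")  # prints "m350: 5.25 W per node"
```

Keep in mind that this yields an average: it says nothing about how evenly the nodes share the load, and, as noted above, fan power is not always part of the measurement.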
47 Comments
Wilco1 - Tuesday, March 10, 2015 - link
GCC4.9 doesn't contain all the work in GCC5.0 (close to final release, but you can build trunk). As you hinted in the article, it is early days for AArch64 support, so there is a huge difference between a 4.9 and 5.0 compiler, so 5.0 is what you'd use for benchmarking.
JohanAnandtech - Tuesday, March 10, 2015 - link
You must realize that the situation in the ARM ecosystem is not as mature as on x86. The X-Gene runs on a specially patched kernel that has some decent support for ACPI, PCIe etc. If you do not use this kernel, you'll get in all kinds of hardware trouble. And afaik, gcc needs a certain version of the kernel.
Wilco1 - Tuesday, March 10, 2015 - link
No you can use any newer GCC and GLIBC with an older kernel - that's the whole point of compatibility.
Btw your results look wrong - X-Gene 1 scores much lower than Cortex-A15 on the single threaded LZMA tests (compare with results on http://www.7-cpu.com/). I'm wondering whether this is just due to using the wrong compiler/options, or running well below 2.4GHz somehow.
JohanAnandtech - Tuesday, March 10, 2015 - link
Hmm. The A57 scores 1500 at 1.9 GHz on compression. The X-Gene scores 1580 with gcc 4.8 and 1670 with gcc 4.9. Our scores are on the low side, but it is not like they are impossibly low.
Ubuntu 14.04, 3.13 kernel and gcc 4.8.2 was and is the standard environment that people will get on the m400. You can tweak a lot, but that is not what most professionals will do. Then we would also have to start testing with icc on Intel. I am not convinced that the overall picture will change that much with lots of tweaking.
Wilco1 - Tuesday, March 10, 2015 - link
Yes, and I'd expect the 7420 will do a lot better than the 5433. But the real surprise to me is that X-Gene 1 doesn't even beat the A15 in Tegra K1 despite being wider, newer and running at a higher frequency - that's why the results look too low.
I wouldn't call upgrading to the latest compiler tweaking - for AArch64 that is kind of essential given it is early days and the rate of development is extremely high. If you tested 32-bit mode then I'd agree GCC 4.8 or 4.9 are fine.
CajunArson - Tuesday, March 10, 2015 - link
This is all part of the problem: Requiring people to use cutting edge software with custom recompilation just to beat a freakin' Atom much less a real CPU?
You do realize that we could play the same game with all the Intel parts. Believe me, the people who constantly whine that Haswell isn't any faster than Sandy Bridge have never properly recompiled computationally intensive code to take advantage of AVX2 and FMA.
The fact that all those Intel servers were running software that was only compiled for a generic X86-64 target without requiring any special tweaking or exotic hacking is just another major advantage for Intel, not some "cheat".
Klimax - Tuesday, March 10, 2015 - link
And if we are going for cutting edge compiler, then why not ICC with Intel's nice libraries... (pretty sure even ancient atom would suddenly look not that bad)
Wilco1 - Tuesday, March 10, 2015 - link
To make a fair comparison you'd either need to use the exact same compiler and options or go all out and allow people to write hand optimized assembler for the kernels.
68k - Saturday, March 14, 2015 - link
You can't seriously claim that recompiling an existing program with a different (well known and mature) compiler is equal to hand optimizing things in assembler. Hint, one of the options is ridiculously expensive, one is trivial.
aryonoco - Monday, March 9, 2015 - link
Thank you Johan. Very very informative article. This is one of the least reported areas of IT in general, and one that I think is poised for significant uptake in the next 5 years or so.
Very much appreciate your efforts into putting this together.