X-Gene 1, Atom C2000 and Xeon E3: Exploring the Scale-Out Server Worldby Johan De Gelas on March 9, 2015 2:00 PM EST
Memory Subsystem Bandwidth
While the Xeon E5 has ample bandwidth for most applications courtesy of the massive quad-channel memory subsystem, the Xeon E3 and Atom C2000 only have two memory channels. The Xeon E3 and Atom C2000 also do not support the fastest DRAM modules (DDR3-1600, Xeon E5: DDR4-2133), so memory bandwidth can be a problem for some applications.
We measured the memory bandwidth in Linux. The binary was compiled with the Open64 compiler 5.0 (Opencc). It is a multi-threaded, OpenMP based, 64-bit binary. The following compiler switches were used:
-Ofast -mp -ipa
To keep things simple, we only report the Triad sub-benchmark of our OpenMP enabled Stream benchmark.
First of all, we should note that the clock speed of the CPU has very little influence on the Stream score. Notice the small difference (12.7%) between the Xeon E3-1240 that can boost to 3.6GHz and the Xeon E3-1230L that is limited to 2.3GHz (Turbo Boost with four cores busy).
The Xeon E3-1200 v3 is slightly more efficient than the Xeon E3-1200 v2; we measured a 7% bandwidth improvement. The Xeon E3 also offers up to 33% more bandwidth than the Atom C2750 with the same DIMMs.
To do an apples-to-apples comparison with the X-Gene 1, we compiled the same OpenMP enabled Stream benchmark (O3 –fopenmp –static).
The Xeon E3 has the most efficient memory controller: it can extract almost as much bandwidth as the quad-channel memory controller of the X-Gene and about 46% more than the Atom. Our guess is that the X-Gene still has quite a bit of headroom to improve the memory subsystem. There is work to be done on the compiler side and on the hardware.