Memory Subsystem: Bandwidth

As we mentioned before, the IBM POWER8 has a memory subsystem which is more similar to the Xeon E7's than the E5's. The IBM POWER8 connects to 4 "Centaur" buffer cache chips, which have both a 19.2 GB/s read and 9.6 GB/s write link to the processor, or 28.8 GB/s in total. So the 105 GB/s aggregate bandwidth of the POWER8 is not comparable to Intel's peak bandwidth. Intel's peak bandwidth is the result of 4 channels of DDR4-2400 that can either write or read at 76.8 GB/s (2.4 GHz x 8 bytes per channel x 4 channels).

Bandwidth is of course measured with John McCalpin's Stream bandwidth benchmark. We compiled the stream 5.10 source code with gcc 5.2.1 64 bit. The following compiler switches were used on gcc:

-Ofast -fopenmp -static -DSTREAM_ARRAY_SIZE=120000000

The latter option makes sure that stream tests with array sizes which are not cacheable by the Xeons' huge L3 caches.

It is important to note why we use the GCC compiler and not vendors' specialized compilers: the GCC compiler is not as good at vectorizing the code. Intel's ICC compiler does that very well, and as result shows the bandwidth available to highly optimized HPC code, which is great for that code in the real world, but it's not realistic for multi-threaded server applications.

With ICC, Intel can use the very wide 256-bit load units to their full potential and we measured up to 65 GB/s per socket. But you also have to consider that ICC is not free, and GCC is much easier to integrate and automate into the daily operations of any developer. No licensing headaches, no time consuming registrations.

Stream Triad (GCC)

The combination of the powerful four load and two store subsystem of the POWER8 and the read/write interconnect between the CPU and the Centaur chips makes it much easier to offer more bandwidth. The IBM POWER8 delivers a solid 90 GB/s despite using old DDR3-1333 memory technology.

Intel claims higher bandwidth numbers, but those numbers can only be delivered in vectorized software.

Configuration and Benchmark Selection Memory Subsystem: Latency Measurements
Comments Locked


View All Comments

  • JohanAnandtech - Thursday, July 28, 2016 - link

    Ah, you will have to wait for the improved P8 which is the first Power going after HPC :-)
  • RISC is RISKY! - Tuesday, August 2, 2016 - link

    I would support "Brutalizer". Every processor has its strength and weakness. If memory architecture is considered, for the same capacity, Intel is conjested memory, IBM is very distributed and Oracle-Sun is something in between. So Intel will always have memory B/W problem every way. IBM has memory efficiency problem. Oracle in theory doesn't have problem, but with 2 dimm per ch, that look like have problem. Oracle-Sun is for highly branched workload in the real world. Intel is for 1T/Core more of single threaded workloads and IBM is for mixed workloads with 2T-4T/Core priority. So supercomputing workloads will work fast on IBM now, compared to intel and sparc, while analytics and graph and other distributed will work faster on SPARC M7 and S7 (although S7 is resource limited). While for intel, a soft mix of applications and highly customized os is better. Leave the business decisions and the sales price. List prices are twice as much as sales price in the real world. These three processors (xeon e5v4, power8-9, sparc m7-s7) are thoroughly tuned for different work spaces with very little overlap. So there's no point in comparing them other than their specs. Its like comparing a falcon and a lion and a swordfish. Their environments are different even though all of them hunt. Thats in the real world. So benchmarks are not the real proof. We at the university of IITD have lots and lots of intel xeon e5v4, some P8 (10-15 single and dual sockets), and a very few (1-2 two socket M7 and 2 two socket S7). We run anything and every thing on any of these, we get our hands on. And this is the real world conclusion. So don't fight. Its a context centric supply.
  • RISC is RISKY! - Tuesday, August 2, 2016 - link

    of processors!
  • rootvgnet - Friday, August 12, 2016 - link

    Johan - interesting article, I enjoyed it - especially after I discovered how to get to the next page.

    As far as the comments go - 1) a good article will get a diverse response (from those with an open, read querying, mind.
    2) I agree with those who, in other words are saying: "there is no 'one size fits all'." And my gut reaction is that you are providing a level of detail that assists in determining which platform/processor "fits my need"

    Looking forward to part2.

Log in

Don't have an account? Sign up now