Memory Subsystem: Bandwidth

For this review we completely overhauled our testing of John McCalpin's Stream bandwidth benchmark. We compiled the Stream 5.10 source code with the Intel compiler (icc) for Linux version 16 or gcc 4.8.4, both 64-bit. The following compiler switches were used on icc:

-fast -openmp -parallel

The results are expressed in GB per second. The following compiler switches were used on gcc:

-O3 -fopenmp -static

Stream allows us to estimate the maximum performance increase that DDR4-2400 (Xeon E5 v4) can offer over DDR4-2133 (Xeon E5 v3).
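As a rough upper bound, theoretical peak bandwidth scales linearly with the memory transfer rate. The sketch below works through that arithmetic. It assumes an 8-byte (64-bit) data bus per channel, four memory channels per socket and a dual-socket system; those figures and the helper name `peak_bandwidth_gbs` are illustrative assumptions, not values taken from this section.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=4, sockets=2, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    channels, sockets and bus_bytes are assumed platform parameters.
    """
    bytes_per_s = mega_transfers_per_s * 1e6 * bus_bytes * channels * sockets
    return bytes_per_s / 1e9


ddr4_2133 = peak_bandwidth_gbs(2133)
ddr4_2400 = peak_bandwidth_gbs(2400)

# Relative gain that the higher transfer rate alone can provide.
gain_percent = (ddr4_2400 / ddr4_2133 - 1) * 100

if __name__ == "__main__":
    print(f"DDR4-2133 peak: {ddr4_2133:.1f} GB/s")
    print(f"DDR4-2400 peak: {ddr4_2400:.1f} GB/s")
    print(f"Gain from transfer rate alone: {gain_percent:.1f}%")
```

Under these assumptions the faster memory can add at most about 12.5% from transfer rate alone; a measured gain beyond that would have to come from other differences between the two platforms.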

Stream Triad

The Xeon E5 v4 with DDR4-2400 delivers about 15% higher performance than the v3 when we compile Stream with icc. To put this into perspective: DDR-4 @ 1600 delivered 80 GB/s.

The difference between DDR4-2400 and DDR4-2133 is negligible with gcc.
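For readers unfamiliar with the Triad kernel, here is a minimal sketch of what it computes and how the bandwidth figure is derived from the timing. It is not the Stream source code: the function names and array size are our own, and interpreted Python will come nowhere near hardware bandwidth, so only the structure and the byte accounting are meaningful. Triad touches three arrays of 8-byte doubles per iteration (two reads, one write), and the rate is taken from the fastest of several runs.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth_gbs(n, seconds):
    """Bandwidth in GB/s: 3 arrays x 8 bytes x n elements per pass."""
    bytes_moved = 3 * 8 * n
    return bytes_moved / seconds / 1e9


def run(n=200_000, repeats=3):
    """Run the kernel several times and report the best (fastest) pass."""
    a = array("d", [0.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(a, b, c, 3.0)
        best = min(best, time.perf_counter() - start)
    return a, triad_bandwidth_gbs(n, best)


if __name__ == "__main__":
    _, gbs = run()
    print(f"Triad (pure Python, illustrative only): {gbs:.3f} GB/s")
```

In the real benchmark the loop is compiled and spread across threads with OpenMP, which is why the compiler and its switches matter so much for the result.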

Memory Subsystem: Latency

To measure latency, we use the open source TinyMemBench benchmark. The source was compiled for x86 with gcc 4.8.2 and optimization was set to "-O2". The measurement is described well by the manual of TinyMemBench:

Average time is measured for random memory accesses in the buffers of different sizes. The larger the buffer, the more significant the relative contributions of TLB, L1/L2 cache misses, and DRAM accesses become. All the numbers represent extra time, which needs to be added to L1 cache latency (4 cycles).

We tested with dual random read, as we wanted to see how the memory system coped with multiple read requests. 
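To illustrate the idea of a dual random read, the sketch below walks two independent random chains through two buffers in the same loop, so that two unrelated reads are outstanding per iteration. This is a generic pointer-chasing illustration and not TinyMemBench's actual code; the names `build_chain` and `dual_chase` are hypothetical, and in Python the interpreter overhead dominates the timing, so the number it prints says nothing about real DRAM latency.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a random permutation that forms one single cycle.

    Uses Sattolo's algorithm, so following chain[i] repeatedly from any
    start visits every element exactly once before returning.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def dual_chase(chain_a, chain_b, steps):
    """Follow two independent chains at once; return end indices and ns/access."""
    ia = ib = 0
    start = time.perf_counter()
    for _ in range(steps):
        ia = chain_a[ia]  # each load depends on the previous one in its chain
        ib = chain_b[ib]  # the second chain is independent of the first
    elapsed = time.perf_counter() - start
    return ia, ib, elapsed / (2 * steps) * 1e9


if __name__ == "__main__":
    n = 1 << 16  # buffer size in elements; vary it to cross cache levels
    a = build_chain(n, seed=1)
    b = build_chain(n, seed=2)
    _, _, ns = dual_chase(a, b, n)
    print(f"{ns:.1f} ns per access (includes interpreter overhead)")
```

The single-cycle permutation matters: it guarantees that the walk touches the whole buffer in an order a hardware prefetcher cannot predict, instead of looping over a small subset that stays in cache.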

The larger the L3 caches get, the higher the latency. Latency has almost doubled from the Xeon E5 v1 to the Xeon E5 v4 while capacity has almost tripled (55 MB vs 20 MB). Still, this will result in a small performance hit in many non-virtualized applications that do not need such a large L3.


112 Comments

  • ltcommanderdata - Friday, April 1, 2016 - link

    Does anyone know the Windows support situation for Broadwell-EP for workstation use? Microsoft said Broadwell is the last fully supported processor for Windows 7/8.1 with Skylake getting transitional support and Kaby Lake will not be supported. So how does Broadwell-EP fit in? Is it lumped in with Broadwell and is fully supported or will it be treated like Skylake with temporary support until 2018 and only critical security updates after that? And following on will Skylake-EP see any Windows 7/8.1 support at all or will it not be supported since it'll presumably be released after Kaby Lake?
  • extide - Friday, April 1, 2016 - link

    When MS says they are not supporting Skylake on Windows 7, that DOES NOT MEAN it won't work. It just means they are not going to add any specific support for that processor in the older OS's. They are not adding in the speed shift support, essentially.

    For some reason the press has not made this very clear, and many people are freaking out thinking that there will be a hard break here where stuff will straight up not work. That is not the case.

    Broadwell has no new OS level features over Haswell (unlike Skylake with speed shift) so there is nothing special about Broadwell to the OS. As the poster above mentions, they are all x86 cpu's and will all still work with x86 OS's.

    The difference here is between "Fully Supported" and "Compatible". Skylake and even Kaby Lake will be compatible with Windows 7/8/8.1.
  • aryonoco - Friday, April 1, 2016 - link

    Johan, this is yet again by far the best Enterprise CPU benchmark that's available anywhere on the net.

    Thank you for your detailed, scientific and well documented work. Work like this is not easy; I can only imagine how many man-hours (weeks?) compiling this article must have taken. I just want you to know that it's hugely appreciated.
  • JohanAnandtech - Friday, April 1, 2016 - link

    Great to read this after weeks of hard work! :-D
  • fsdjmellisse - Friday, April 1, 2016 - link

    hello, i want to buy E5-2630L v4
    any one can give me website for buy it ?

    Best regards
  • HrD - Friday, April 1, 2016 - link

    I'm confused by the following:

    "The following compiler switches were used on icc:

    -fast -openmp -parallel

    The results are expressed in GB per second. The following compiler switches were used on icc:

    -O3 -fopenmp -static"

    Shouldn't one of these refer to icc and the other to gcc?
  • JohanAnandtech - Friday, April 1, 2016 - link

    Pretty sure I did not mix them up. "-fast" does not work on gcc neither does -fopenmp work on icc.
  • patrickjp93 - Friday, April 1, 2016 - link

    Um, wrong and wrong. -Ofast works with GCC 4.9 and later for sure. And -fopenmp is a valid ICC flag post-ICC 13.
  • JohanAnandtech - Saturday, April 2, 2016 - link

    "-fast" is a typical icc flag. (I did not write -"Ofast" that works on gcc 4.8 too)
  • extide - Friday, April 1, 2016 - link

    Johan, if you read the comment, you can see that you mention icc for BOTH.
