Memory Subsystem: Bandwidth

For this review we completely overhauled our testing of John McCalpin's Stream bandwidth benchmark. We compiled the Stream 5.10 source code with either the Intel compiler (icc) for Linux version 16 or gcc 4.8.4, both 64-bit. The following compiler switches were used on icc:

 -fast  -openmp  -parallel

The results are expressed in GB per second. The following compiler switches were used on gcc:

-O3 -fopenmp -static

Stream allows us to estimate the maximum performance increase that DDR4-2400 (Xeon E5 v4) can offer over DDR4-2133 (Xeon E5 v3).
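As a point of reference for that estimate, theoretical peak bandwidth is simply channels times bus width times transfer rate. The sketch below works this out for both memory speeds. The four memory channels per socket, the 8-byte channel width, and the dual-socket configuration are assumptions made for illustration rather than figures stated above, and `peak_bandwidth_gbs` is a hypothetical helper name.

```python
# Theoretical peak DRAM bandwidth = transfers/s * bytes per transfer * channels * sockets.
# Assumptions (not stated in the text): 4 channels per socket, 8-byte-wide channels,
# and a dual-socket system.

def peak_bandwidth_gbs(mega_transfers_per_s, channels_per_socket=4,
                       sockets=2, bytes_per_transfer=8):
    """Return theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_s = (mega_transfers_per_s * 1e6 * bytes_per_transfer
                   * channels_per_socket * sockets)
    return bytes_per_s / 1e9

ddr4_2133 = peak_bandwidth_gbs(2133)
ddr4_2400 = peak_bandwidth_gbs(2400)
uplift_pct = (ddr4_2400 / ddr4_2133 - 1) * 100

print(f"DDR4-2133: {ddr4_2133:.1f} GB/s")
print(f"DDR4-2400: {ddr4_2400:.1f} GB/s")
print(f"Theoretical uplift: {uplift_pct:.1f}%")
```

Measured Stream results depend on the compiler and on the memory controller as well, so they need not track this ratio exactly.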

Stream Triad

The Xeon E5 v4 with DDR4-2400 delivers about 15% higher performance than the v3 when we compile Stream with icc. To put this into perspective: DDR4-1600 delivered 80 GB/s.

The difference between DDR4-2400 and DDR4-2133 is negligible with gcc.
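For readers unfamiliar with how Stream arrives at a GB/s figure, here is a minimal sketch of the Triad kernel and its byte accounting. It is written in pure Python purely for illustration: the array length `n` and the helper names are hypothetical, and interpreter overhead dominates the timing, so the number it prints says nothing about the hardware. The real benchmark is compiled C with OpenMP and uses arrays far larger than the caches.

```python
import time

def triad(b, c, scalar):
    """Stream Triad kernel: a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]

def triad_bandwidth_gbs(n, seconds, bytes_per_element=8):
    """Triad touches three arrays per element: two reads and one write."""
    return 3 * bytes_per_element * n / seconds / 1e9

n = 1_000_000          # hypothetical array length, chosen only to keep the demo quick
b = [1.0] * n
c = [2.0] * n

start = time.perf_counter()
a = triad(b, c, 3.0)
elapsed = time.perf_counter() - start

# Illustrative only: this measures the Python interpreter, not the memory subsystem.
print(f"Triad: {triad_bandwidth_gbs(n, elapsed):.3f} GB/s (pure Python)")
```

The point of the sketch is the accounting: 24 bytes are counted per element for double-precision data, and the reported bandwidth is that byte count divided by the elapsed time.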

Memory Subsystem: Latency

To measure latency, we use the open source TinyMemBench benchmark. The source was compiled for x86 with gcc 4.8.2 and optimization was set to "-O2". The measurement is described well by the manual of TinyMemBench:

Average time is measured for random memory accesses in the buffers of different sizes. The larger the buffer, the more significant the relative contributions of TLB, L1/L2 cache misses, and DRAM accesses become. All the numbers represent extra time, which needs to be added to L1 cache latency (4 cycles).

We tested with dual random read, as we wanted to see how the memory system coped with multiple read requests. 
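Because TinyMemBench reports extra time on top of the L1 latency, converting a reported figure into a total latency requires adding those 4 cycles back at the core clock. A small sketch of that arithmetic follows; the input values are hypothetical placeholders, not measurements from this review, and the function names are ours.

```python
def total_latency_ns(extra_ns, clock_ghz, l1_cycles=4):
    """Add the L1 hit latency (given in cycles) to the reported extra time."""
    l1_ns = l1_cycles / clock_ghz   # a clock of X GHz means X cycles per nanosecond
    return extra_ns + l1_ns

def ns_to_cycles(ns, clock_ghz):
    """Convert a time in nanoseconds into core clock cycles."""
    return ns * clock_ghz

# Hypothetical inputs, not results from this review
extra_ns = 10.0
clock_ghz = 2.0

total = total_latency_ns(extra_ns, clock_ghz)
print(f"Total latency: {total:.1f} ns, or {ns_to_cycles(total, clock_ghz):.0f} cycles")
```

The clock speed to plug in is the frequency the core actually ran at during the test, which may differ from the nominal base clock when Turbo Boost is active.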

The larger the L3 caches get, the higher the latency. Latency has almost doubled from the Xeon E5 v1 to the Xeon E5 v4 while capacity has almost tripled (55 MB vs 20 MB). Still, this will result in a small performance hit in many non-virtualized applications that do not need such a large L3.


112 Comments


  • jhh - Thursday, March 31, 2016 - link

    The article says TSX-NI is supported on the E5, but if one looks at Intel ARK, it says it's not. Do the processors say they support TSX-NI? Or is this another one of the things which will be left for the E7?
  • JohanAnandtech - Friday, April 1, 2016 - link

    Intel's official slides say: "supports TSX". All SKUs, no exceptions.
  • Oxford Guy - Thursday, March 31, 2016 - link

    Bigger, badder, still obsolete cores.
  • patrickjp93 - Friday, April 1, 2016 - link

    Obsolete? Troll.
  • Oxford Guy - Tuesday, April 5, 2016 - link

    Unlike you, propagandist, I know what Skylake is.
  • benzosaurus - Thursday, March 31, 2016 - link

    "You can replace a dual Xeon 5680 with one Xeon E5-2699 v4 and almost double your performance while halving the CPU power consumption."

    I mean you can, but you can buy 4 X5680s for a quarter the price of a single E5-2699v4. It takes a lot of power savings to make that worthwhile. The pricing in the server market's always seemed weirdly non-linear to me.
  • warreo - Friday, April 1, 2016 - link

    Presumably, it's not just about TCO. Space is at a premium in a datacenter, and so being able to fit more performance per sq ft also warrants a higher price, just like how notebook parts have historically been more expensive than their desktop equivalents.
  • ShieTar - Friday, April 1, 2016 - link

    But you don't get 4 1366-Systems for the price of one 2011-3 System. Depending on your Memory, Storage and Interconnect Needs, even two full Systems based on the Xeon 5680 may cost you more than one system based on the E5-2699 v4. One less Infiniband-Adapter can easily save you 500$ in Hardware.

    And you are not only halving the CPU power consumption, but also the power consumption of the rest of the system that you no longer use, so instead of 140W you are saving probably at least 200W per System, which can already add up to more than 1k$ in electricity and cooling bills for a 24/7 machine running for 3 years.

    And last, but by no means least, fewer parts mean less space, less chance for failure, less maintenance effort. If you happily waste a few hours here or there to maintain your own workstation, you don't do the math, but if you have to pay somebody to do it, salaries matter quickly. With an MTBF for an entire server rarely being much higher than 40.000 hours, and recovery/repair easily taking you a person-day of work, each system generates about 1.7 hours of work per year. Cost of work (it's more than salaries, of course) probably comes up to 100$ per hour for a skilled technical administrator, thus producing another 500$ over 3 years of added operational cost.

    And of course, space matters as well. If your data center is filled, it can be more cost effective to replace the old CPUs with new expensive ones, rather than build a new facility to fill with more old Systems.

    If you add it all up, I doubt you can get a System with a Xeon 5680 and operate it over 3 years for anything below 20.000$. So going from two 20.000$ Systems to a single 24.000$ System (because of an extra 4000$ for the big CPU) should save you a lot of money in the long run.
  • JohanAnandtech - Friday, April 1, 2016 - link

    Where do you get your pricing info from? I cannot imagine that server vendors still sell X5680s.
  • extide - Friday, April 1, 2016 - link

    Yeah, if you go used. No enterprise sysadmin worth his salt is ever going to put used gear that is out of warranty and out of support into production.
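The back-of-the-envelope operating cost figures in ShieTar's comment above can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: the 200 W saving, 40.000 hour MTBF, one person-day per repair, 100$ hourly cost and 3 year period come from that comment, while the 8-hour person-day and the 0.20$ per kWh price (meant to cover electricity plus cooling) are assumed here for illustration.

```python
HOURS_PER_YEAR = 24 * 365   # a machine running 24/7

def energy_cost(watts_saved, years, price_per_kwh):
    """Cost of the power no longer drawn over the given period."""
    kwh = watts_saved * HOURS_PER_YEAR * years / 1000
    return kwh * price_per_kwh

def maintenance_hours_per_year(mtbf_hours, repair_hours):
    """Expected repair work per year: failures per year times hours per repair."""
    return HOURS_PER_YEAR / mtbf_hours * repair_hours

def maintenance_cost(mtbf_hours, repair_hours, hourly_rate, years):
    """Labor cost of that repair work over the given period."""
    return maintenance_hours_per_year(mtbf_hours, repair_hours) * hourly_rate * years

# price_per_kwh and the 8-hour person-day are assumptions, not figures from the comment
power = energy_cost(watts_saved=200, years=3, price_per_kwh=0.20)
hours = maintenance_hours_per_year(mtbf_hours=40_000, repair_hours=8)
labor = maintenance_cost(mtbf_hours=40_000, repair_hours=8, hourly_rate=100, years=3)

print(f"Power and cooling over 3 years: {power:.0f}$")
print(f"Maintenance: {hours:.2f} hours per year, {labor:.0f}$ over 3 years")
```

With these inputs the results land close to the comment's estimates of roughly 1k$ for power and cooling and about 500$ for maintenance; a different energy price shifts the first figure proportionally.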
