The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads

Name: The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads
Item: The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads
Author: Johan De Gelas

by Johan De Gelas on March 31, 2016 12:30 PM EST

112 Comments | Add A Comment

112 Comments

Memory Subsystem: Bandwidth

For this review we completely overhauled our testing of John McCalpin's Stream bandwidth benchmark. We compiled the stream 5.10 source code with the Intel compiler for linux version 16 or gcc 4.8.4, both 64 bit. The following compiler switches were used on icc:

-fast -openmp -parallel

The results are expressed in GB per second. The following compiler switches were used on gcc:

-O3 –fopenmp –static

Stream allows us to estimate the maximum performance increase that DDR-2400 (Xeon E5 v4) can offer over DDR-2133 (Xeon E5 v3).

Stream Triad

The Xeon E5 v4 with DDR4-2400 delivers about 15% higher performance then the v3 when we compile Stream with icc. To put this into perspective: DDR-4 @ 1600 delivered 80 GB/s.

The difference between DDR-4 2400 and DDR-4 2133 is negligible with gcc.

Memory Subsystem: Latency

To measure latency, we use the open source TinyMemBench benchmark. The source was compiled for x86 with gcc 4.8.2 and optimization was set to "-O2". The measurement is described well by the manual of TinyMemBench:

Average time is measured for random memory accesses in the buffers of different sizes. The larger the buffer, the more significant the relative contributions of TLB, L1/L2 cache misses, and DRAM accesses become. All the numbers represent extra time, which needs to be added to L1 cache latency (4 cycles).

We tested with dual random read, as we wanted to see how the memory system coped with multiple read requests.

The larger the L3 caches get, the higher the latency. Latency has almost doubled from the Xeon E5 v1 to the Xeon E5 v4 while capacity has almost tripled (55 MB vs 20 MB). Still, this will result in a small performance hit in many non-virtualized applications that do no need such a large L3.

Single Core Integer Performance With SPEC CPU2006 Multi-Threaded Integer Performance

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

112 Comments

View All Comments

Casper42 - Thursday, March 31, 2016 - link
HPE just dropped the 64GB LRDIMMs a week or two back.
They are now exactly 2x the 32GB LRDIMM as far as List Price goes.
LRDIMMs across the board are 31% more expensive than RDIMMs.
wishgranter - Tuesday, April 5, 2016 - link
http://www.techpowerup.com/221459/samsung-starts-m...
wishgranter - Tuesday, April 5, 2016 - link
While introducing a wide array of 10nm-class DDR4 modules with capacities ranging from 4GB for notebook PCs to 128GB for enterprise servers, Samsung will be extending its 20nm DRAM line-up with its new 10nm-class DRAM portfolio throughout the year.
nathanddrews - Thursday, March 31, 2016 - link
Perf/W is obviously a very exciting metric for server farmers and it generally exciting from a basic technology perspective, but it's absolute performance isn't amazing. Anyway, it's not like I'll be buying one anyway. LOL
asendra - Thursday, March 31, 2016 - link
This interest me in so far as this would be the updated processors in a supposedly-coming-this-year Mac Pro refresh. Not that I would personally fork that much cash, but I'm interested to see how much of a jump they will make.

But things seam rather bleak. No wonder they decided to wait 3 years for a refresh.
MrSpadge - Thursday, March 31, 2016 - link
Not sure which years you're counting in, but for the majority of us it takes 1.5 years from 09/2014 to today.
https://en.wikipedia.org/wiki/Haswell_%28microarch...
asendra - Thursday, March 31, 2016 - link
Apple didn't update the MacPros with Haswell-EP. They are still using Ivy Bridge
tipoo - Thursday, March 31, 2016 - link

Wonder what they'll do on the GPU side though. Too early for next generation 14nm FF GPUs from anyone, if Nvidia was even a choice due to OpenCL politics. Another GCN 1.0 part in 2016 would be...A bag of hurt.

Still waiting on the high end 15" rMBP to have something better than GCN 1.0...The performance, shockingly, hasn't come all that far from even my Iris Pro model. Maybe double, which is something, but I'd like larger than that to upgrade from integrated...
extide - Thursday, March 31, 2016 - link
Nah, if they refresh it late this year, like in august or something like that, then 14/16nm FF GPU's will be available.

At worst we would get GCN 1.2, but yeah it would suck to see 28nm GPU's put in there...
mdriftmeyer - Thursday, March 31, 2016 - link
On what planet do you not grasp FinFET 14nm end of June from AMD?

The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads

Memory Subsystem: Bandwidth

Memory Subsystem: Latency

Post Your Comment

112 Comments

View All Comments

Casper42 - Thursday, March 31, 2016 - link

wishgranter - Tuesday, April 5, 2016 - link

wishgranter - Tuesday, April 5, 2016 - link

nathanddrews - Thursday, March 31, 2016 - link

asendra - Thursday, March 31, 2016 - link

MrSpadge - Thursday, March 31, 2016 - link

asendra - Thursday, March 31, 2016 - link

tipoo - Thursday, March 31, 2016 - link

extide - Thursday, March 31, 2016 - link

mdriftmeyer - Thursday, March 31, 2016 - link

Log in

Don't have an account? Sign up now