The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads
by Johan De Gelas on March 31, 2016 12:30 PM EST
SAP S&D
The SAP S&D 2-Tier benchmark has always been one of my favorites. It is probably the most real-world benchmark of all the server benchmarks run by the vendors: a full-blown application living on top of a heavy relational database. And don't forget that SAP is one of the most successful software companies out there, the market leader in Enterprise Resource Planning (ERP).
We analyzed the SAP benchmark in depth in one of our earlier articles and found that it is:
- Very parallel, resulting in excellent scaling
- Low to medium IPC, mostly due to "branchy" code
- Somewhat limited by memory bandwidth
- Fond of large caches, as it is sensitive to memory latency
- Very sensitive to sync ("cache coherency") latency
Let us see how the new Xeon E5 fares in this ERP benchmark.
[Chart: SAP S&D 2-Tier benchmark results; (est) = preliminary data]
The ever-increasing L3 cache, high core counts, and better NUMA coherency support of Broadwell-EP play well with SAP. It is almost as if Intel builds these Xeons for SAP alone. As a result, the current Xeon is no less than three times faster than the Xeon E5-2690 (v1).
112 Comments
SkipPerk - Friday, April 8, 2016
"Anyone putting Microsoft on bare hardware these days is nuts"
This brother is speakin the truth!
warreo - Thursday, March 31, 2016
Can someone clarify this line for me?
"The average performance increase versus the Xeon E5-2690 is 3%, and the Broadwell cores get a boost of no less than 19%."
Does that mean the IPC increase is 19% for Broadwell, offset by a ~16% decline in clock speed to get to a 3% average performance increase? That doesn't make sense to me, as going from 3.8 GHz (E5-2690) to 3.6 GHz (E5-2699 v4) is only a 5% decline in max clock speed.
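The arithmetic behind the reading proposed in this question can be checked with a short sketch. It assumes, as a simplification, that an IPC gain and a clock-speed change combine multiplicatively, so a +19% boost needs roughly a 13% (not 16%) reduction to net out at +3%. The clock figures are the max turbo speeds quoted in the comment; `pct_change` and `required_factor` are hypothetical helper names.

```python
# Minimal sketch: percentage changes compound multiplicatively, not additively.
# Assumption: performance = IPC factor * clock factor.

def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage."""
    return (new / old - 1.0) * 100.0

def required_factor(gain_pct: float, net_pct: float) -> float:
    """Multiplier needed to turn a +gain_pct% boost into a +net_pct% result."""
    return (1.0 + net_pct / 100.0) / (1.0 + gain_pct / 100.0)

# Max turbo: E5-2690 (3.8 GHz) -> E5-2699 v4 (3.6 GHz), as quoted above.
clock_change = pct_change(3.8, 3.6)
# Factor that takes a +19% boost down to a +3% net result.
offset = required_factor(19.0, 3.0)

print(f"max turbo change: {clock_change:.1f}%")
print(f"offset needed:    {(offset - 1.0) * 100.0:.1f}%")
```

Run as-is, it reports a max turbo change of about -5.3% and a required offset of about -13.4%. The max turbo gap alone therefore cannot account for the difference, whichever way the percentages are combined.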
ShieTar - Thursday, March 31, 2016
I understood it as "the -Ofast setting boosts Broadwell by 19%", so with the -O2 setting it was actually 16% slower than the 2690.
And I think the AT theory, based on the original measurements, is that the 3.6 GHz boost is not even held for a significant amount of time, so Broadwell in reality comes with an even bigger decline in clock speed.
warreo - Thursday, March 31, 2016
Your interpretation makes much more sense than mine, but it still doesn't quite add up. The improvement from using -Ofast vs. -O2 is 13% on average, and the lowest improvement is 4% on xalancbmk, well below the "no less than 19%" quoted by Johan.
Perhaps the rest of the disparity comes from normalizing for sustained clock speeds, as you suspect? Johan, is that correct?
Ryan Smith - Thursday, March 31, 2016
I've reworded that passage to make it clearer. But ShieTar's interpretation was basically correct.
"Switching from -O2 to -Ofast improves Broadwell-EP's absolute performance by over 19%. Meanwhile the relative performance advantage versus the Xeon E5-2690 averages 3%."
JohanAnandtech - Thursday, March 31, 2016
That means -Ofast has much more effect on Broadwell. What I mean is that -Ofast is 19% faster than -O2 on Broadwell, while it is 3% faster on Sandy Bridge. I assume that the older the architecture, the better the compiler is able to optimize for it without special tricks.
warreo - Friday, April 1, 2016
Thanks for the clarification. Loved the review, great work, Johan!
Pinn - Thursday, March 31, 2016
I'm still happy I went with the 6-core X99 over the 8-core. Massive core counts are nice to see available, but I don't see the true value. It looks like you have to do the same rough math to see whether the clock speed reduction is worth the extra cores.
Oxford Guy - Tuesday, April 5, 2016
Why would there be "true value" for six and not for eight?
Pinn - Wednesday, April 6, 2016
Single-threaded workloads.
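The "rough math" of clock speed versus core count can be sketched with Amdahl's law. This is a standard model swapped in for illustration, not something the review uses. The sketch assumes equal IPC on both chips, so performance scales linearly with clock. The 6-core/3.5 GHz and 8-core/3.0 GHz configurations are hypothetical placeholders, not figures from the review.

```python
# Minimal sketch: clock speed vs. core count under Amdahl's law.
# Assumptions: equal IPC, perfect clock scaling, no memory or turbo effects.

def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Speedup over one core when only parallel_fraction of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

def relative_throughput(clock_ghz: float, cores: int, parallel_fraction: float) -> float:
    """Throughput in arbitrary units: clock times the Amdahl speedup."""
    return clock_ghz * amdahl_speedup(cores, parallel_fraction)

SIX_CORE = (3.5, 6)    # hypothetical: 6 cores at 3.5 GHz
EIGHT_CORE = (3.0, 8)  # hypothetical: 8 cores at 3.0 GHz

for p in (0.0, 0.5, 0.9, 0.99):
    six = relative_throughput(*SIX_CORE, p)
    eight = relative_throughput(*EIGHT_CORE, p)
    winner = "8-core" if eight > six else "6-core"
    print(f"parallel fraction {p:4.2f}: 6-core {six:5.2f}, 8-core {eight:5.2f} -> {winner}")
```

With these placeholder numbers, the higher-clocked 6-core wins for single-threaded and moderately parallel work. The 8-core only pulls ahead once roughly 90% or more of the work scales across cores.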