Intel Xeon E5 Version 3: Up to 18 Haswell EP Cores
by Johan De Gelas on September 8, 2014 12:30 PM EST
Java Server Performance
According to the documentation, the SPECjbb 2013 benchmark has "a usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations". It uses the latest Java 7 features and makes use of XML, compressed communication, and messaging with security. We tested with four groups of transaction injectors and backends.
Several readers commented that we should try to optimize for lower response times instead of just optimizing for maximum throughput, so we have changed our relatively basic tuning. We left out "-XX:+AggressiveOpts", as it still poses some risk to stability and does not increase performance tangibly, and we added "-XX:+AlwaysPreTouch". We are also more generous with the amount of allocated memory. These results are thus no longer comparable to our previous results. Our full parameters are:
"-server -Xmx8G -Xms8G -Xmn4G -XX:+AlwaysPreTouch -XX:+UseLargePages"
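As a rough illustration of what these parameters ask for, the sketch below parses the heap-related flags from the string above. The helper name `parse_heap_flags` is hypothetical and not part of any benchmark tooling; the only facts it relies on are the standard HotSpot meanings of -Xmx (maximum heap), -Xms (initial heap) and -Xmn (young generation size).

```python
import re

# Flag string exactly as quoted in the article.
JVM_FLAGS = "-server -Xmx8G -Xms8G -Xmn4G -XX:+AlwaysPreTouch -XX:+UseLargePages"

# Binary multipliers for the size suffixes HotSpot accepts.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_heap_flags(flags: str) -> dict:
    """Return the -Xmx/-Xms/-Xmn sizes in bytes (hypothetical helper)."""
    sizes = {}
    pattern = r"-X(mx|ms|mn)(\d+)([KMG])"
    for name, number, unit in re.findall(pattern, flags, re.IGNORECASE):
        sizes[name.lower()] = int(number) * _UNITS[unit.upper()]
    return sizes


heap = parse_heap_flags(JVM_FLAGS)

# Equal -Xms and -Xmx means the heap never has to grow at run time.
fixed_heap = heap["ms"] == heap["mx"]

# Fraction of each JVM's heap reserved for the young generation.
young_share = heap["mn"] / heap["mx"]

print(f"fixed-size heap: {fixed_heap}, young generation share: {young_share:.0%}")
```

In this configuration each JVM gets a fixed 8GB heap, half of which is young generation, and -XX:+AlwaysPreTouch makes the JVM touch those pages at startup rather than on first use.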
With these settings, the benchmark takes about 47GB-52GB of RAM. The first metric is basically maximum throughput.
Our new tuning has resulted in higher results, and all of the new Xeons scale well. However, if you start looking at it from a performance/watt perspective, the results are good but not spectacular. The power consumption of the Xeon E5-2695 v3 is similar to that of the Xeon E5-2697 v2, and the former has a 13% performance advantage.
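To make the performance/watt remark concrete, here is a minimal sketch of the arithmetic. It assumes "similar" power consumption means a power ratio of about 1.0; that value is an assumption for illustration, not a measured figure, and the function name is hypothetical.

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance/watt of the new part versus the old one."""
    return perf_ratio / power_ratio


# 13% more throughput (from the text) at an ASSUMED identical power draw.
ratio = perf_per_watt_ratio(perf_ratio=1.13, power_ratio=1.0)
print(f"performance/watt improvement: {(ratio - 1) * 100:.0f}%")
```

With equal power the performance/watt gain simply equals the throughput gain; any extra power drawn by the newer part would shrink it.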
The Critical-jOPS metric is a throughput metric under a response time constraint (SLA).
With our new tuning, the Critical-jOPS results make a lot more sense, so we believe we have taken a step forward. Notice that the Xeon E5-2695 v3, despite its clock speed disadvantage (2.3GHz at the least, 2.8GHz at the most), is capable of keeping up with the Xeon E5-2697 v2 (2.7GHz at the least, 3GHz at the most). The improvements in Haswell are measurable.
However, it must be said that while this is a step forward if you're buying a server, it's not a large one. You get 13% more throughput and the same response time for a few hundred dollars less (Xeon E5-2695 v3 vs E5-2697 v2).
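The clock figures quoted above can be turned into a quick estimate of how large the frequency deficit of the Xeon E5-2695 v3 is. This is only a sketch of the arithmetic with a hypothetical helper name; the two parts also differ in core count, so it should not be read as a per-core IPC measurement.

```python
def clock_deficit(new_ghz: float, old_ghz: float) -> float:
    """Fraction by which the new part's clock falls short of the old one's."""
    return 1.0 - new_ghz / old_ghz


# Clock speeds as quoted in the text (lowest and highest for each part).
base_deficit = clock_deficit(new_ghz=2.3, old_ghz=2.7)
turbo_deficit = clock_deficit(new_ghz=2.8, old_ghz=3.0)

print(f"deficit at the lowest clocks:  {base_deficit:.1%}")
print(f"deficit at the highest clocks: {turbo_deficit:.1%}")
```

The E5-2695 v3 runs somewhere between roughly 7% and 15% slower in clock terms, yet matches the older chip on Critical-jOPS and beats it on maximum throughput.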
85 Comments
cmikeh2 - Monday, September 8, 2014
In the SKU comparison table you have the E5-2690V2 listed as a 12/24 part when it is in fact a 10/20 part. Just a tiny quibble. Overall a fantastic read.
KAlmquist - Monday, September 8, 2014
Also, the 2637 v2 is 4/8, not 6/12.
isa - Monday, September 8, 2014
Looking forward to a new supercomputer record using these behemoths.
Bruce Allen - Monday, September 8, 2014
Awesome article. I'd love to see Cinebench and other applications tests. We do a lot of rendering (currently with older dual Xeons) and would love to compare these new Xeons versus the new 5960X chips - software license costs per computer are so high that the 5960X setups will need much higher price/performance to be worth it. We actually use Cinema 4D in production so those scores are relevant. We use V-Ray, Mental Ray and Arnold for Maya too but in general those track with the Cinebench scores so they are a decent guide. Thank you!
Ian Cutress - Monday, September 8, 2014
I've got some E5 v3 Xeons in for a more workstation oriented review. Look out for that soon :)
fastgeek - Monday, September 8, 2014
From my notes a while back... two E5-2690 v3's (all cores + turbo enabled) under 2012 Server yielded 3,129 for multithreaded and 79 for single.
While not Haswell, I can tell you that four E5-4657L V2's returned 4,722 / 94 respectively.
Hope that helps somewhat. :-)
fastgeek - Monday, September 8, 2014
I don't see a way to edit my previous comment; but those scores were from Cinebench R15
wireframed - Saturday, September 20, 2014
You pay for licenses for render Nodes? Switch to 3DS, and you get 9999 nodes for free (unless they changed the licensing since I last checked). :)
Lone Ranger - Monday, September 8, 2014
You make mention that the large core count chips are pretty good about raising their clock rate when only a few cores are active. Under Linux, what is the best way to see actual turbo frequencies? cpuinfo doesn't show live/actual clock rate.
JohanAnandtech - Monday, September 8, 2014
The best way to do this is using Intel's PCM. However, this does not work right now (only on Sandy and Ivy, not Haswell). I deduced it from the fact that performance was almost identical and from previous profiling of some of our benchmarks.