OpenFOAM

Several of our readers have already suggested that we look into OpenFOAM. That's easier said than done, as good benchmarking means you have to master the software somewhat. Luckily, my lab was able to work with the professionals of Actiflow. Actiflow specialises in combining aerodynamics and product design. Calculating aerodynamics involves the use of CFD software, and Actiflow uses OpenFOAM to accomplish this. To give you an idea of what these skilled engineers can do: they worked with Ferrari to improve the underbody airflow of the Ferrari 599 and increase its downforce.

The Ferrari 599: an improved product thanks to OpenFOAM.

We were allowed to use one of their test cases as a benchmark, but we are not allowed to discuss the specific solver. All tests were done on OpenFOAM 2.2.1 and Open MPI 1.6.3.

Many CFD calculations do not scale well on clusters unless you use InfiniBand, and InfiniBand switches are quite expensive; even then there are limits to scaling. Unfortunately, we do not have an InfiniBand switch in the lab. Although its latency is not as low as InfiniBand's, we do have a good 10G Ethernet infrastructure, which performs rather well.
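For readers unfamiliar with how OpenFOAM jobs get spread across a cluster, a typical parallel run uses the standard decomposePar utility and an MPI launcher. This is only a sketch: since we cannot disclose the actual solver or case, the case name, solver (simpleFoam) and rank count below are placeholders, not what was benchmarked.

```shell
# Hypothetical case and host file -- illustrative only, not our actual test case.
# Split the mesh into one subdomain per MPI rank; the decomposition method
# and subdomain count are set in system/decomposeParDict.
decomposePar -case motorBike

# Launch the solver across the cluster nodes listed in hosts.txt.
# Every inter-node exchange of boundary data crosses the network here,
# which is why interconnect latency dominates scaling.
mpirun -np 16 --hostfile hosts.txt simpleFoam -case motorBike -parallel

# Stitch the per-processor results back into a single dataset.
reconstructPar -case motorBike
```

Each iteration of the solver exchanges halo data between neighbouring subdomains, so the more nodes you add, the more the run time is dictated by network latency rather than raw FLOPS.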

So we added a fifth configuration to our testing: the quad-node Intel Server System H2200JF. The only CPU that we have eight of right now is the Xeon E5-2650L 1.8GHz. Yes, it is not perfect, but this is the start of our first clustered HPC benchmark. This way we can get an idea of whether or not the Xeon E7 v2 platform can replace a complete quad-node cluster system and at the same time offer much higher RAM capacity.

OpenFOAM test

The results are pretty amazing: the quad Xeon E7-4890 v2 runs circles around our quad-node HPC cluster. Even if we were to outfit the cluster with 50% higher-clocked Xeons, the quad Xeon E7 v2 would still be the winner. Of course, there is no denying that our quad-node cluster is a lot cheaper to buy. Even with an InfiniBand switch, an HPC cluster with dual-socket servers is a lot cheaper than a quad-socket Intel Xeon E7 v2.

However, this bodes well for the soon-to-be-released Xeon E5-46xx v2 parts: QPI links have even lower latency than InfiniBand. But since we do not have a lot of HPC testing experience, we'll leave it up to our readers to discuss this in more detail.

Another interesting detail is that the Xeon E5-2650L at 1.8GHz is about twice as fast as a Xeon L5650. We found AVX code inside OpenFOAM 2.2.1, so we assume this is one of the cases where AVX improves FP performance tremendously. Seasoned OpenFOAM users, let us know whether this is an accurate assessment.

Comments
  • Kevin G - Monday, February 24, 2014 - link

    With POWER8 due out later this year, I suspect they'll be updating their old benchmarks with the newer hardware.

    The real question is why IBM has never submitted benchmarks for their z-series mainframes. Performance data there is very lacking, though z-series customers tend to fall into two groups: legacy mainframe applications, and those who desire ultimate RAS regardless of performance.
  • Phil_Oracle - Tuesday, February 25, 2014 - link

    Yes, we shall see what Power8 delivers and when. It's already a year late according to IBM's "3-year cadence"; Power7 is 4 years old this month! As for the mainframe, it's not about performance, it's about uptime, but at some point you can get uptime through clustering and redundancy, and then performance becomes the issue. We once did a POC comparing the performance of the latest mainframe vs. SPARC M6, and we estimated the SPARC M6-32 to deliver 2-3x higher MIPS! As you can imagine, the customer is migrating.
  • Kevin G - Tuesday, February 25, 2014 - link

    Everyone has been suffering chip delays, it seems. Intel, even with their process advantage, looks to be nine months to a year behind schedule for their 14 nm roll-out. IBM/TSMC/GF/Samsung are similarly behind in their roll-out of 22/20 nm class logic.

    There has been a desire for ages to get off of mainframes in some industries. Reliability elsewhere is 'good enough' and performance is better, but the reason some don't migrate is simply software costs. I used to work in such a shop, and the mainframe hung around due to the extensive cost of porting and validating all the legacy software. Also, 'if it ain't broke, don't fix it' was a theme at that place, and well, the mainframe was never broken. I figure that many mainframe shops fall into that category.

    A decked-out M6-32 outrunning a mainframe by 2x in some CPU tests is within reason. I'm more curious as to what the specific workloads were. In IO-bound tests, the mainframe is still competitive due to the raw number of coprocessors and the dedicated hardware thrown into that niche. Flash in the enterprise has helped narrow the IO gap significantly, but I don't think it has managed to surpass the ancient mainframe architecture.
  • PowerTrumps - Monday, February 24, 2014 - link

    Probably because most of their numbers have, by and large, held up to the competition. Unlike Sun SPARC and now Oracle SPARC, which disappeared from the benchmark scene for years with T1-T3 and most Fujitsu-based servers. Oracle cherry-picked obscure benchmarks with T4, and now with T5 they have had a lot to make up. So, although you make it sound impressive, let's not forget the past and the gap that needed to be filled.
  • Phil_Oracle - Tuesday, February 25, 2014 - link

    I'm a 15-year Sun veteran now at Oracle, so yes, I agree that in the past, with older-generation SPARC, especially the first-generation T-Series, Sun only benchmarked where the T-Series did well and avoided benchmarks where it didn't, as it was designed for web-tier workloads. That was 5 generations ago! But that's my point: a vendor isn't going to publish a result that looks poor, or worse than the previous version, so every vendor "cherry picks", as you say. Not having a benchmark tells me that either the previous version is better, the new version isn't that much better, or it's worse (whether in throughput, per-core performance, etc). In any case, the more benchmarks, the better the sign that it's leading.

    And while SPARC T4 was really the first Oracle-developed SPARC processor, it caught up to competing CPUs, and with SPARC T5 and even SPARC M6, it's hard to argue that SPARC T5 is not leading: 16 cores with 8 threads/core @ 3.6GHz and glueless scalability to 8 sockets, and SPARC M6 @ 12 cores, 8 threads/core, up to 32 sockets, now almost a year old. Intel's latest Xeon Ivy Bridge-EX has finally caught up, but in certain areas, like DB and middleware performance, it is still lacking the benchmark proof points to show it is superior. And as for Power8, well, we'll just have to wait and see what the systems will deliver and when. Clearly they are aiming at SPARC on the high end, now that Itanium is all but dead, and against Xeon in the entry-to-mid range.
  • thunng8 - Friday, February 21, 2014 - link

    Great for Intel that they have finally marginally overtaken a several-year-old IBM box in the SAP SD benchmark. The only trouble is that the 2.5x faster POWER8 (compared to POWER7) is coming in the next few months.
  • extide - Friday, February 21, 2014 - link

    Keep in mind that IBM POWER chips are typically 200-250W TDP chips. So yeah, on a performance-per-watt scale, these are quite impressive!
  • Kevin G - Friday, February 21, 2014 - link

    POWER7 is 200W and POWER7+ is 180W. Still higher than Intel but not as bad as you'd think.
  • JohanAnandtech - Saturday, February 22, 2014 - link

    Do you have a source for that? It is pretty hard to find good info on those CPUs. Or I have missed it somehow.
  • Kevin G - Saturday, February 22, 2014 - link

    IBM, like Intel, bins chips by power consumption. It looks like there are indeed 250W POWER7s, but they do scale down to 150W.

    800W MCM for supercomputing, 200W POWER7 die @ 3.83 GHz:
    http://www.theregister.co.uk/Print/2009/11/27/ibm_...
    The final shipping speed was 3.83 GHz, which falls into the 3.5 to 4.0 GHz target range in the article.

    250W for high end boxes & 150W for blade systems:
    http://www.realworldtech.com/forum/?threadid=12393...
    Note that this was an early IBM paper, and that 300W-per-socket figure could have been provisioning for future dual-die POWER7+ modules.

    250W for POWER7 @ 4.0 GHz and 250W for POWER7+ @ 4.5 GHz:
    http://www-05.ibm.com/cz/events/febannouncement201...

    I'm trying to find the source for the 180W POWER7+ figure. The difficulty is that it appeared in a discussion about Intel's Poulson Itanium, which consumes 10W less.
