The Xeon E5-2600: Dual Sandy Bridge for Servers
by Johan De Gelas on March 6, 2012 9:27 AM EST - Posted in
- IT Computing
- Virtualization
- Opteron
- Xeon
- Cloud Computing
LS-DYNA
LS-DYNA is a "general purpose structural and fluid analysis simulation software package capable of simulating complex real world problems", developed by the Livermore Software Technology Corporation (LSTC). It is used in the automotive, aerospace, construction, military, manufacturing and bioengineering industries. Even simple simulations take hours to complete, so even a small performance increase results in tangible savings. On top of that, many of our readers have asked us to benchmark HPC workloads. Reason enough to include our own LS-DYNA benchmarks.
These numbers are not directly comparable with AMD's and Intel's benchmarks, as we did not perform any special tuning beyond using the message passing interface (MPI) version of LS-DYNA (ls971_mpp_hpmpi) to get maximum scalability from the solver. This is the HP-MPI build of LS-DYNA 971.
Our first test is the "refined revised" Neon crash test simulation.

This is one of the few benchmarks (besides SAP) where the Opteron 6276 outperforms the older Opteron 6174 by a tangible margin (about 20%) and is significantly faster than the Xeon 5600: 40% faster, to be precise. However, the 6276's direct competitor, the Xeon E5-2630, will do a bit better (see the E5-2660 6C score). If you are aiming for the best performance, the best Xeons are impossible to beat: the Xeon E5-2660 offers 26% better performance and the E5-2690 is 46% faster. It is interesting to note that LS-DYNA does not scale well with clock speed: the 32% higher clock speed of the Xeon E5-2690 (relative to the E5-2660) results in only a 15% speed increase.
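The clock-scaling observation is simple ratio arithmetic on the figures above. Below is a minimal sketch, assuming the nominal base clocks of 2.9 GHz (E5-2690) and 2.2 GHz (E5-2660) and assuming the 46% and 26% speedups are both measured against the same baseline (presumably the Opteron 6276). The helper names `ratio_gain` and `scaling_efficiency` are hypothetical, chosen for illustration.

```python
def ratio_gain(new: float, old: float) -> float:
    """Return the relative gain of `new` over `old` as a fraction (0.32 == 32%)."""
    return new / old - 1.0


def scaling_efficiency(perf_gain: float, clock_gain: float) -> float:
    """Fraction of a clock-speed increase that shows up as extra performance."""
    return perf_gain / clock_gain


# Nominal base clocks in GHz (assumed, from Intel's published specifications).
E5_2690_GHZ = 2.9
E5_2660_GHZ = 2.2

# Speedups quoted in the text, assumed to share one baseline (the Opteron 6276).
E5_2690_SPEEDUP = 1.46
E5_2660_SPEEDUP = 1.26

clock_gain = ratio_gain(E5_2690_GHZ, E5_2660_GHZ)         # ~0.318
perf_gain = ratio_gain(E5_2690_SPEEDUP, E5_2660_SPEEDUP)  # ~0.159
efficiency = scaling_efficiency(perf_gain, clock_gain)    # ~0.50

print(f"clock +{clock_gain:.0%}, performance +{perf_gain:.0%}, efficiency {efficiency:.2f}")
```

With the rounded percentages this yields a performance gain of about 16% rather than 15%; the article's figure presumably comes from the unrounded scores. Either way, only about half of the extra clock speed turns into extra LS-DYNA performance.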
A few other things are worth noting. Hyper-Threading delivered only a very small performance increase (+5%). Memory bandwidth does not seem to be critical either: performance increased by only 6% when we replaced DDR3-1333 with DDR3-1600. If LS-DYNA were severely bottlenecked by memory speed, we should have seen a performance increase closer to 20% (1600 vs 1333 MT/s).
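The same ratio logic applies to the memory test. A small sketch, assuming peak bandwidth scales linearly with the transfer rate (same channel count and bus width on both configurations); `bandwidth_sensitivity` is a hypothetical helper, not part of any benchmark tool.

```python
def bandwidth_sensitivity(perf_gain: float, old_mts: int, new_mts: int) -> float:
    """Share of a memory-bandwidth increase that turns into extra performance.

    Assumes peak bandwidth is proportional to the transfer rate in MT/s,
    i.e. the channel count and bus width stay the same.
    """
    bandwidth_gain = new_mts / old_mts - 1.0
    return perf_gain / bandwidth_gain


# Figures from the text: moving from DDR3-1333 to DDR3-1600 gave about +6%.
sensitivity = bandwidth_sensitivity(0.06, 1333, 1600)
print(f"bandwidth +{1600 / 1333 - 1:.0%}, sensitivity {sensitivity:.2f}")
```

A fully bandwidth-bound workload would sit near 1.0 on this scale; a value of roughly 0.3 suggests most of the extra bandwidth goes unused by this LS-DYNA run.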
CMT boosted the Opteron 6276's performance by up to 33%, which seems odd at first since LS-DYNA is a typical floating point intensive application. As the shared floating point unit "outsources" loads and stores to the integer cores, the most logical explanation is that LS-DYNA is limited by load/store bandwidth. This is in sharp contrast with, for example, 3ds Max, where the additional overhead of 16 extra threads slowed the shared FP unit down instead of speeding it up.
All of the CPUs also seem to have made good use of their turbo capabilities: the AMD Opteron ran at 2.6 GHz most of the time, the Xeon E5-2690 at 3.3 GHz and the Xeon E5-2660 at 2.6 GHz.
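Because the chips ran at turbo clocks, the nominal 32% clock gap between the E5-2690 and E5-2660 is not exactly what was in play during the run. A small sketch, using the observed clocks quoted above and assuming nominal base clocks of 2.3 GHz (Opteron 6276), 2.9 GHz (E5-2690) and 2.2 GHz (E5-2660) from the vendors' spec sheets rather than from this test:

```python
# Clocks in GHz: (nominal base clock, clock observed during the LS-DYNA run).
# Base clocks are assumed from vendor spec sheets; observed clocks are from the text.
CLOCKS = {
    "Opteron 6276": (2.3, 2.6),
    "Xeon E5-2690": (2.9, 3.3),
    "Xeon E5-2660": (2.2, 2.6),
}


def turbo_uplift(base_ghz: float, observed_ghz: float) -> float:
    """Relative clock gain delivered by turbo over the nominal base clock."""
    return observed_ghz / base_ghz - 1.0


for name, (base, observed) in CLOCKS.items():
    print(f"{name}: +{turbo_uplift(base, observed):.0%} over base")

# Effective clock gap between the two Xeons while running the benchmark.
effective_gap = CLOCKS["Xeon E5-2690"][1] / CLOCKS["Xeon E5-2660"][1] - 1.0
print(f"E5-2690 vs E5-2660 at observed clocks: +{effective_gap:.0%}")
```

At the observed clocks the gap shrinks to roughly 27%, still well above the ~15% performance difference, so the conclusion that LS-DYNA scales poorly with clock speed holds either way.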
The second test is the "Three Vehicle Collision Test" simulation, which runs a lot longer.

The three vehicle collision test does not change the benchmarking picture; it confirms our earlier findings. The Opteron Interlagos does well, but the Xeon E5 is the new HPC champion.

65 Comments
fredisdead - Saturday, April 07, 2012
From the 'article' .....'The Opteron might also have a role in the low end, price sensitive HPC market, where it still performs very well. It won't have much of chance in the high end clustered one as Intel has the faster and more power efficient PCIe interface'
Well, if that's the case, why exactly would AMD be scoring so many design wins with Interlagos? Including this one ...
http://www.pcmag.com/article2/0,2817,2394515,00.as...
http://www.eweek.com/c/a/IT-Infrastructure/Cray-Ti...
U think those guys at Cray were going for low performance? In fact, it seems like AMD has rather been cleaning up in the HPC market since the arrival of Interlagos. And the markets have picked up on it: AMD stock has been thru the roof since the start of the year. Or just see how many Intel processors occupy the top 10 supercomputers on the planet. Nuff said ...
InsaneScientist - Wednesday, March 07, 2012
Johan, in the specs, where you have this line: Transistors (Billion) 2,26 2x 1,2 2x 904 1,17
I sure hope that 2x 904 (Billion) is a typo... otherwise AMD has some serious explaining to do. ;)
Should be 2x ,904 (I think? Would be 2x .904 for me, I assume you follow the same rules...)
iliev - Wednesday, March 07, 2012
Page 5, Benchmark Configuration: R2208GZ4GSSPP specs table... The E5-2660 is 2.2GHz, not 2.9GHz.
dodge776 - Wednesday, March 07, 2012
Hi Johan, always look forward to reading your server reviews at AT, but no SAPS benchmarks this time?
ppennisi - Wednesday, March 07, 2012
For maximum VMware performance on Opteron Interlagos CPUs, it's better to disable C1E and enable HPC mode where available. On a fresh installation of ESXi 5.0 on a Dell R715, I found that leaving C1E enabled literally crippled VM performance.
boudini - Thursday, March 08, 2012
I'm not sure I would recommend using iray as a reliable benchmark renderer in 3ds max. It is not a self-configuring mental ray, but an unbiased renderer which behaves quite differently from mental ray and most other renderers such as vray, final render and brazil. It is comparable to maxwell and fryrender, but is very new compared to those two longer-established unbiased render engines. It also attempts to use the GPU to add to its calculations, which could significantly skew results. Using mental ray or vray might well give you quite a different result, and besides, I don't think iray is widely used in the industry.
omega4711 - Friday, March 09, 2012
This. The results of iray are mostly dependent on the GPU. The lack of proper scaling certainly isn't due to Amdahl's law. Just use mentalray with small enough render buckets and you can easily keep 64+ threads busy. Also, due to the limitations of iray, it can (at this moment) only be used in about 1-3% of real world scenarios.
Please, for all the people that care about these benchmarks, use mentalray and/or vray.
Otherwise, it's a brilliant article.
silverblue - Thursday, March 08, 2012
You've put that Interlagos has 4x2MB L2, but that would only be true for Valencia; Interlagos is 8x2MB.
aranyagag - Thursday, March 08, 2012
You forgot the E5-2687W, with a 150W TDP and higher clock speeds.
colonelclaw - Friday, March 09, 2012
Hi there, thanks for an excellent article. With regard to the rendering benchmarks, would you consider using VRay as a rendering engine? It's fast becoming the industry standard, is compatible with all the big hitters (Max, Maya, Softimage etc), is cross-platform, and, I believe, is incredibly well coded to scale with cores.
It's also incredibly popular, not something you could say about iRay right now.