LS-DYNA

LS-DYNA is a "general purpose structural and fluid analysis simulation software package capable of simulating complex real world problems", developed by the Livermore Software Technology Corporation (LSTC). It is used by the automobile, aerospace, construction, military, manufacturing and bioengineering industry. Even simple simulations take hours to complete, so even a small performance increase results in tangible savings. Add to that that many of our readers have been asking that we perform some benchmarking with HPC workloads. So reasons enough to include our own LS-DYNA benchmarking.

These numbers are not directly comparable with AMD's and Intel's benchmarks, as we did not perform any special tuning beyond using the message passing interface (MPI) version of LS-DYNA (ls971_mpp_hpmpi, the HP-MPI build of LS-DYNA 9.71) to run the solver for maximum scalability.
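For those who want to set up something similar: an MPP LS-DYNA job of this kind is typically launched through the MPI runtime's mpirun wrapper. The snippet below is only a minimal sketch of such a launch, not our actual job script; the rank count and input deck filename are hypothetical placeholders.

```python
import subprocess

# Minimal sketch of launching the MPP (distributed-memory) LS-DYNA solver.
# ls971_mpp_hpmpi is the HP-MPI build of LS-DYNA 9.71 used in this article;
# the rank count and input deck filename are hypothetical placeholders.
ranks = 32                          # e.g. one MPI rank per physical core
deck = "neon_refined_revised.k"     # placeholder name for the input deck

cmd = [
    "mpirun", "-np", str(ranks),    # HP-MPI launcher: spawn <ranks> processes
    "ls971_mpp_hpmpi",              # the MPP LS-DYNA binary
    f"i={deck}",                    # LS-DYNA reads its input deck via i=<file>
]
subprocess.run(cmd, check=True)     # raise if the solver exits with an error
```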

Our first test is the refined, revised Neon crash test simulation.

LS-Dyna Neon-Refined Revised

This is one of the few benchmarks (besides SAP) where the Opteron 6276 outperforms the older Opteron 6174 by a tangible margin (about 20%), and it is significantly faster than the Xeon 5600: 40% faster, to be precise. However, the direct competitor of the 6276, the Xeon E5-2630, should do a bit better (see the E5-2660 6C score). If you are aiming for the best performance, the top Xeons are impossible to beat: the Xeon E5-2660 offers 26% better performance, and the 2690 is 46% faster. It is interesting to note that LS-DYNA does not scale well with clock speed: the 32% higher clock speed of the Xeon E5-2690 results in only a 15% speed increase.

A few other interesting things to note: we saw only a very small performance increase (+5%) from Hyper-Threading. Memory bandwidth does not seem to be critical either, as performance increased by only 6% when we replaced DDR3-1333 with DDR3-1600. If LS-DYNA were severely bottlenecked by memory speed, we should have seen a performance increase close to 20% (1600 vs. 1333), as the quick check below illustrates.
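To make that argument concrete, here is a back-of-the-envelope check of the scaling figures quoted above (just a sketch; the only inputs are the percentages from our results):

```python
# Back-of-the-envelope check of the clock and memory scaling figures above.
def scaling_efficiency(resource_gain: float, speedup: float) -> float:
    """Fraction of a resource increase that actually shows up as speedup."""
    return speedup / resource_gain

# Clock speed: +32% clocks (E5-2690 vs. E5-2660) yielded only +15% performance.
print(f"clock scaling efficiency:  {scaling_efficiency(0.32, 0.15):.0%}")     # ~47%

# Memory: DDR3-1600 offers ~20% more bandwidth than DDR3-1333 (1600/1333 - 1),
# yet we measured only a 6% gain, so memory speed is not the main bottleneck.
headroom = 1600 / 1333 - 1
print(f"memory scaling efficiency: {scaling_efficiency(headroom, 0.06):.0%}")  # ~30%
```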

CMT boosted the Opteron 6276's performance by up to 33%, which seems odd at first since LS-DYNA is a typical floating-point intensive application. As the shared floating-point unit "outsources" loads and stores to the integer cores, the most logical explanation is that LS-DYNA is limited by load/store bandwidth. This is in sharp contrast with, for example, 3DS Max, where the overhead of 16 extra threads slowed the shared FP unit down instead of speeding it up.

All of the CPUs also made good use of their turbo capabilities: the AMD Opteron ran at 2.6 GHz most of the time, the Xeon E5-2690 at 3.3 GHz, and the Xeon E5-2660 at 2.6 GHz.

The second test is the "Three Vehicle Collision Test" simulation, which runs a lot longer.

LS-Dyna Three Vehicle Collision Test

The three vehicle collision test does not change the benchmarking picture; it confirms our earlier findings. The Opteron "Interlagos" does well, but the Xeon E5 is the new HPC champion.

Comments

  • BSMonitor - Tuesday, March 6, 2012 - link

    My question as well.

    What is the Intel roadmap for Ivy Bridge in this arena? The same timeframe as IVB-E, I would guess.

    Wondering if my Intel dividends will pile up enough for me to afford one! Haha
  • devdeepc - Friday, September 2, 2016 - link

    Based on the paper specs, AMD's 6276, 6274 and Intel's 2640 and 2630 are in a neck-and-neck race.
  • fredisdead - Saturday, April 7, 2012 - link

    From the 'article' .....

    'The Opteron might also have a role in the low end, price sensitive HPC market, where it still performs very well. It won't have much of chance in the high end clustered one as Intel has the faster and more power efficient PCIe interface'

    Well, if that's the case, why exactly would AMD be scoring so many design wins with Interlagos? Including this one ...

    http://www.pcmag.com/article2/0,2817,2394515,00.as...

    http://www.eweek.com/c/a/IT-Infrastructure/Cray-Ti...

    U think those guys at Cray were going for low performance? In fact, it seems like AMD has been rather cleaning up in the HPC market since the arrival of Interlagos. And the markets have picked up on it; AMD stock is thru the roof since the start of the year. Or just see how many Intel processors occupy the top 10 supercomputers on the planet. Nuff said ...
  • iwod - Tuesday, March 6, 2012 - link

    And yet I can't find a single comment on how and why this was "making this CPU quite a challenge, even for Intel."

    In my view, it seems Intel now uses the server market and Atom/SoC parts to fill its 32nm capacity whenever it introduces a new node in consumer products.
  • extide - Tuesday, March 6, 2012 - link

    A large part of Intel's long-term strategy involves keeping the fabs occupied.

    Latest-gen fabs (currently 22nm) produce bleeding edge CPUs, usually in the consumer space.

    One gen back (32nm) produces server/workstation/mobile CPUs.

    Two gens back (45nm) produces other things like chipsets, and possibly Itanium chips.

    Even three gens back (65nm) probably still exists in some places, making some chipsets as well.

    Their goal is to get as much use as possible from their investment in building the fabs themselves.
  • Kevin G - Tuesday, March 6, 2012 - link

    65 nm is still used for Itanium, though the Poulson chip, made on a 32 nm process, is due sometime this year. If you want to compare die sizes, the 65 nm Tukwila design is 699 mm^2.

    The main reason why 32 nm Sandy Bridge-E has been released so close to the release of 22 nm Ivy Bridge chips is that the initial Ivy Bridge chips are consumer centric. Intel performs additional testing on its server centric designs. This is particularly true as Sandy Bridge-E is not just replacing the dual socket Westmere-EP chips but also part of the quad socket Westmere-EX market. RAS demands jump when going from dual to quad socket, and that is reflected in additional testing. Implementing PCI-E 3.0 and QPI 1.1 also added to the testing time.

    You are correct, though, that Intel does use its older process nodes for various chipsets and IO chips. However, as Intel marches toward SoC designs, the utility of keeping these older process nodes in action is decreasing.
  • meloz - Tuesday, March 6, 2012 - link

    >And yet I can't find a single comment on how and why this was "making this CPU quite a challenge, even for Intel."

    Because it is such a massive die? 416 mm²? Large dies usually have a lower yield, and Intel's 32 nm process is still cutting edge (if only for a few more weeks, heh).

    Look at how TSMC, Global Flounderings et al are struggling. An impressive achievement by Intel.
  • MrSpadge - Tuesday, March 6, 2012 - link

    A significant amount of functionality has been added to the SB cores, and Intel can't afford mistakes in such CPUs.
  • BSMonitor - Tuesday, March 6, 2012 - link

    More than that, though: the SNB-E and Xeon E5 chips are not duplicates of the SNB desktop chips.

    Look at Anand's die shot of SNB-E vs. the die shot of SNB. The CPU cores, L3 cache, and controllers are arranged completely differently. That makes sense, as SNB-E doesn't have to deal with 40% of the die being GPU transistors. So what we have now with Intel is two completely different dies between Xeon/SNB-E and Core. The individual CPU cores are the same, but the rest of the die is completely different.

    SNB-E:
    http://www.anandtech.com/show/5091/intel-core-i7-3...

    SNB:
    http://www.anandtech.com/show/4083/the-sandy-bridg...
  • cynic783 - Tuesday, March 6, 2012 - link

    omg these benches are so biased it's not even funny. everyone knows AMD offers clock-for-clock more punch than Intel, and lower power as well
