The Xeon E5-2600: Dual Sandy Bridge for Servers

Author: Johan De Gelas

by Johan De Gelas on March 6, 2012 9:27 AM EST

LS-DYNA

LS-DYNA is a "general purpose structural and fluid analysis simulation software package capable of simulating complex real world problems", developed by the Livermore Software Technology Corporation (LSTC). It is used by the automobile, aerospace, construction, military, manufacturing and bioengineering industries. Even simple simulations take hours to complete, so even a small performance increase results in tangible savings. In addition, many of our readers have been asking us to benchmark HPC workloads. That is reason enough to include our own LS-DYNA benchmarking.

These numbers are not directly comparable with AMD's and Intel's benchmarks, as we did not perform any special tuning besides using the message passing interface (MPI) version of LS-DYNA (ls971_mpp_hpmpi) to run the solver with maximum scalability. This is the HP-MPI version of LS-DYNA 9.71.

Our first test is a refined revised Neon crash test simulation.

LS-Dyna Neon-Refined Revised

This is one of the few benchmarks (besides SAP) where the Opteron 6276 outperforms the older Opteron 6174 by a tangible margin (about 20% faster), and it is significantly faster than the Xeon 5600: 40%, to be more precise. However, the direct competitor of the 6276, the Xeon E5-2630, will do a bit better (see the E5-2660 6C score). If you are aiming for the best performance, it is impossible to beat the best Xeons: compared with the Opteron 6276, the Xeon E5-2660 offers 26% better performance and the E5-2690 is 46% faster. It is interesting to note that LS-DYNA does not scale well with clock speed: the 32% higher clock speed of the Xeon E5-2690 results in only a 15% speed increase over the E5-2660.
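The percentages above come from a lower-is-better metric, so the order of the division matters. The following is a minimal sketch of that arithmetic, not the method used to produce the charts. It takes the Neon run times quoted in the reader comments below and assumes they are elapsed seconds; the names `neon_seconds`, `percent_faster` and `compare` are made up for illustration.

```python
# Run times for the Neon-Refined Revised test, as quoted in the reader
# comments (assumed here to be elapsed seconds; lower is better).
neon_seconds = {
    "Xeon X5650": 872,
    "Opteron 6276": 621,
    "Xeon E5-2660 6C": 570,
    "Xeon E5-2660": 492,
    "Xeon E5-2690": 426,
}


def percent_faster(baseline_seconds, contender_seconds):
    """Return how much faster the contender is than the baseline, in percent.

    For a lower-is-better metric the speedup is baseline / contender.
    Dividing the other way round understates the gain.
    """
    return (baseline_seconds / contender_seconds - 1.0) * 100.0


def compare(times, contender, baseline):
    """Look up both run times and return the rounded percentage gain."""
    return round(percent_faster(times[baseline], times[contender]))


if __name__ == "__main__":
    pairs = [
        ("Opteron 6276", "Xeon X5650"),
        ("Xeon E5-2660", "Opteron 6276"),
        ("Xeon E5-2690", "Opteron 6276"),
        ("Xeon E5-2690", "Xeon E5-2660"),
    ]
    for contender, baseline in pairs:
        gain = compare(neon_seconds, contender, baseline)
        print(f"{contender} vs {baseline}: +{gain}%")
```

Dividing the shorter time by the longer one instead (for example 426/621) gives the share of the baseline's run time that the faster chip needs, which is a different and smaller-looking figure than the speedup.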

A few other interesting things to note: we saw only a very small performance increase (+5%) due to Hyper-Threading. Memory bandwidth does not seem to be critical either, as performance increased by only 6% when we replaced DDR3-1333 with DDR3-1600. If LS-DYNA were severely bottlenecked by memory speed, we should have seen a performance increase close to 20% (1600 vs 1333).
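One rough way to read these scaling figures is to divide the observed gain by the gain in the underlying resource. The sketch below does that for the memory and clock speed numbers quoted in the text. The helper name `scaling_efficiency` is hypothetical and this ratio is only a back-of-the-envelope indicator, not a standard metric.

```python
def scaling_efficiency(resource_gain, observed_gain):
    """Fraction of a resource increase that shows up as extra performance.

    Both arguments are relative gains, e.g. 0.20 for +20%.
    A value near 1.0 means the workload scales almost linearly with the
    resource; a value near 0.0 means the resource is not the bottleneck.
    """
    if resource_gain <= 0:
        raise ValueError("resource_gain must be positive")
    return observed_gain / resource_gain


# DDR3-1333 -> DDR3-1600 is roughly a 20% increase in transfer rate.
ddr3_gain = 1600 / 1333 - 1.0

# Observed gains taken from the text: +6% from faster memory,
# +15% from a 32% higher clock speed.
memory_eff = scaling_efficiency(ddr3_gain, 0.06)
clock_eff = scaling_efficiency(0.32, 0.15)

if __name__ == "__main__":
    print(f"memory speed scaling: {memory_eff:.2f}")
    print(f"clock speed scaling:  {clock_eff:.2f}")
```

By this measure roughly a third of the extra memory speed and a little under half of the extra clock speed turn into performance, which fits the observation that neither is the dominant limit.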

CMT boosted the Opteron 6276's performance by up to 33%, which seems weird at first since LS-DYNA is a typical floating point intensive application. As the shared floating point unit "outsources" loads and stores to the integer cores, the most logical explanation is that LS-DYNA is limited by load/store bandwidth. This is in sharp contrast with, for example, 3DS Max, where the additional overhead of 16 extra threads slowed the shared FP down instead of speeding it up.

Also, the CPUs from both vendors seem to have made good use of their turbo capabilities. The AMD Opteron was running at 2.6 GHz most of the time, the Xeon E5-2690 at 3.3 GHz and the Xeon E5-2660 at 2.6 GHz.

The second test is the "Three Vehicle Collision Test" simulation, which runs a lot longer.

LS-Dyna Three Vehicle Collision Test

The three vehicle collision test does not change the benchmarking picture; it confirms our earlier findings. The Opteron Interlagos does well, but the Xeon E5 is the new HPC champion.

81 Comments

alpha754293 - Tuesday, March 6, 2012 - link
Thanks for running those.

Are those results with HTT or without?

If you can write a little more about the run settings that you used (with/without HTT, number of processes), that would be great.

Very interesting results though.

It would have been interesting to see what the power consumption and total energy consumption numbers would be for these runs (to see if having the faster processor would really be that beneficial).

Thanks!
alpha754293 - Tuesday, March 6, 2012 - link
I should work with you more to get you running some Fluent benchmarks as well.

But, yes, HPC simulations DO take a VERY long time. And we beat the crap out of our systems on a regular basis.
jhh - Tuesday, March 6, 2012 - link
This is the most interesting part to me, as someone interested in high network I/O. With the packets going directly into cache, as long as they get processed before they get pushed out by subsequent packets, the packet processing code doesn't have to stall waiting for the packet to be pulled from RAM into cache. Potentially, the packet never needs to be written to RAM at all, avoiding using that memory capacity. In the other direction, web servers and the like can produce their output without ever putting the results into RAM.
meloz - Tuesday, March 6, 2012 - link
I wonder if this Data Direct I/O Technology has any relevance to audio engineering? I know that latency is a big deal for those guys. In past I have read some discussion on latency at gearslutz, but the exact science is beyond me.

Perhaps future versions of protools and other professional DAWs will make use of Data Direct I/O Technology.
Samus - Tuesday, March 6, 2012 - link
wow. 20MB of on-die cache. thats ridiculous.
PwnBroker2 - Tuesday, March 6, 2012 - link
dont know about the others but not ATT. still using AMD even on the new workstation upgrades but then again IBM does our IT support, so who knows for the future.

the new xeon's processors are beasts anyways, just wondering what the server price point will be.
tipoo - Tuesday, March 6, 2012 - link
"AMD's engineers probably the dumbest engineers in the world because any data in AMD processor is not processed but only transferred to the chipset."

...What?
tipoo - Tuesday, March 6, 2012 - link
Think you've repeated that enough for one article?
tipoo - Wednesday, March 7, 2012 - link
Like the Ivy bridge comments, just for future readers note that this was a reply to a deleted troll and no longer applies.
IntelUser2000 - Tuesday, March 6, 2012 - link
Johan, you got the percentage numbers for LS-Dyna wrong.

You said for the first one: the Xeon E5-2660 offers 20% better performance, the 2690 is 31% faster. It is interesting to note that LS-Dyna does not scale well with clockspeed: the 32% higher clockspeed of the Xeon E5-2690 results in only a 14% speed increase.

E5-2690 vs Opteron 6276: +46%(621/426)
E5-2660 vs Opteron 6276: +26%(621/492)
E5-2690 vs E5-2660: +15%(492/426)

In the conclusion you said the E5 2660 is "56% faster than X5650, 21% faster than 6276, and 6C is 8% faster than 6276"

Actually...

LS Dyna Neon-

E5-2660 vs X5650: +77%(872/492)
E5-2660 vs 6276: +26%(621/492)
E5-2660 6C vs 6276: +9%(621/570)

LS Dyna TVC-

E5-2660 vs X5650: +78%(10833/6072)
E5-2660 vs 6276: +35%(8181/6072)
E5-2660 6C vs 6276: +13%(8181/7228)

It's funny how you got the % numbers for your conclusions. It's merely the ratio of lower number vs higher number multiplied by 100.
