The Xeon E5-2600: Dual Sandy Bridge for Servers

Name: The Xeon E5-2600: Dual Sandy Bridge for Servers
Item: The Xeon E5-2600: Dual Sandy Bridge for Servers
Author: Johan De Gelas

by Johan De Gelas on March 6, 2012 9:27 AM EST

81 Comments | Add A Comment

81 Comments

Rendering: Blender 2.6.0

Blender is a very popular open source renderer with a large and active community. We tested the 64-bit Windows edition, using version 2.6.0a. If you like, you can perform this benchmark very easily too. We used the metallic robot, a scene with rather complex lighting (reflections) and raytracing. Furthermore to make the benchmark more repetitive, we changed the following parameters:

The resolution was set to 2560x1600
Antialiasing was set to 16
We disabled compositing in post processing
Tiles were set to 16x16 (X=16, Y=16)
Threads was set to auto (one thread per CPU is set).

As we have explained, the current 24 and 32 core CPUs benefit from using a much larger number of tiles than we have previously used (64, 8x8). That is why we raised the number of tiles to 256 (16x16), though all CPUs perform better at this setting.

To make the results easier to read, we again converted the reported render time into images rendered per hour, so higher is better.

Blender 2.6.0

Blender is Xeon territory for sure, as Blender mostly runs in the L1 and L2 cache. Therefore a E5-2630 (2.3 GHz, 15 MB L3, $612) will probably perform about 4% faster than the six-core Xeon E5-2660 in this test. Our six-core Xeon E5-2660 is about 26% faster than the best Opteron. We estimate that the Xeon E5-2630 will offer more or less the same performance at an almost 30% lower pricepoint than the Opteron 6276. Whether you have a lot or little to spend, the Xeon E5 is your best bet for Blender.

Rendering Performance: 3DSMax 2012

As requested, we're reintroducing our 3DS Max benchmark. We used the "architecture" scene which is included in the SPEC APC 3DS Max test. As the Scanline renderer is limited to 16 threads, we're using the iray render engine, which is basically an self-configuring Mental Ray render engine.

We rendered at 720p (1280x720) resolution. We measured the time it takes to render 10 frames (from 20 to 29) with SSE enabled. We recorded the time and then calculated (3600 seconds * 10 frames / time recorded) how many frames a certain CPU configuration could render in one hour. All results are reported as rendered images per hour; higher is thus better. We used the 64-bit version of 3ds Max 2008 on 64-bit Windows 2008 R2 SP1.

3DSMax 2012 Architecture

Even with the advanced iray renderer, 3DS Max rendering reaches our scaling limits. The 32-thread Xeons do not come close to 100% CPU load (more like 90%) and in between the frames there are small periods of single threaded processing. Amdahl's law is most likely reason here. We suspect that highly clocked lower core count models can pass the 53 fps barrier we're seeing here.

Rendering Performance: Cinebench HPC: LSTC's LS Dyna

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

81 Comments

View All Comments

alpha754293 - Tuesday, March 6, 2012 - link
Thanks for running those.

Are those results with HTT or without?

If you can write a little more about the run settings that you used (with/without HTT, number of processes), that would be great.

Very interesting results thought.

It would have been interesting to see what the power consumption and total energy consumption numbers would be for these runs (to see if having the faster processor would really be that beneficial).

Thanks!
alpha754293 - Tuesday, March 6, 2012 - link
I should work with you more to get you running some Fluent benchmarks as well.

But, yes, HPC simulations DO take a VERY long time. And we beat the crap out of our systems on a regular basis.
jhh - Tuesday, March 6, 2012 - link
This is the most interesting part to me, as someone interested in high network I/O. With the packets going directly into cache, as long as they get processed before they get pushed out by subsequent packets, the packet processing code doesn't have to stall waiting for the packet to be pulled from RAM into cache. Potentially, the packet never needs to be written to RAM at all, avoiding using that memory capacity. In the other direction, web servers and the like can produce their output without ever putting the results into RAM.
meloz - Tuesday, March 6, 2012 - link
I wonder if this Data Direct I/O Technology has any relevance to audio engineering? I know that latency is a big deal for those guys. In past I have read some discussion on latency at gearslutz, but the exact science is beyond me.

Perhaps future versions of protools and other professional DAWs will make use of Data Direct I/O Technology.
Samus - Tuesday, March 6, 2012 - link
wow. 20MB of on-die cache. thats ridiculous.
PwnBroker2 - Tuesday, March 6, 2012 - link
dont know about the others but not ATT. still using AMD even on the new workstation upgrades but then again IBM does our IT support, so who knows for the future.

the new xeon's processors are beasts anyways, just wondering what the server price point will be.
tipoo - Tuesday, March 6, 2012 - link
"AMD's engineers probably the dumbest engineers in the world because any data in AMD processor is not processed but only transferred to the chipset."

...What?
tipoo - Tuesday, March 6, 2012 - link
Think you've repeated that enough for one article?
tipoo - Wednesday, March 7, 2012 - link
Like the Ivy bridge comments, just for future readers note that this was a reply to a deleted troll and no longer applies.
IntelUser2000 - Tuesday, March 6, 2012 - link
Johan, you got the percentage numbers for LS-Dyna wrong.

You said for the first one: the Xeon E5-2660 offers 20% better performance, the 2690 is 31% faster. It is interesting to note that LS-Dyna does not scale well with clockspeed: the 32% higher clockspeed of the Xeon E5-2690 results in only a 14% speed increase.

E5-2690 vs Opteron 6276: +46%(621/426)
E5-2660 vs Opteron 6276: +26%(621/492)
E5-2690 vs E5-2660: +15%(492/426)

In the conclusion you said the E5 2660 is "56% faster than X5650, 21% faster than 6276, and 6C is 8% faster than 6276"

Actually...

LS Dyna Neon-

E5-2660 vs X5650: +77%(872/492)
E5-2660 vs 6276: +26%(621/492)
E5-2660 6C vs 6276: +9%(621/570)

LS Dyna TVC-

E5-2660 vs X5650: +78%(10833/6072)
E5-2660 vs 6276: +35%(8181/6072)
E5-2660 6C vs 6276: +13%(8181/7228)

It's funny how you got the % numbers for your conclusions. It's merely the ratio of lower number vs higher number multiplied by 100.

The Xeon E5-2600: Dual Sandy Bridge for Servers

Rendering: Blender 2.6.0

Post Your Comment

81 Comments

View All Comments

alpha754293 - Tuesday, March 6, 2012 - link

alpha754293 - Tuesday, March 6, 2012 - link

jhh - Tuesday, March 6, 2012 - link

meloz - Tuesday, March 6, 2012 - link

Samus - Tuesday, March 6, 2012 - link

PwnBroker2 - Tuesday, March 6, 2012 - link

tipoo - Tuesday, March 6, 2012 - link

tipoo - Tuesday, March 6, 2012 - link

tipoo - Wednesday, March 7, 2012 - link

IntelUser2000 - Tuesday, March 6, 2012 - link

Log in

Don't have an account? Sign up now