AMD's 12-core "Magny-Cours" Opteron 6174 vs. Intel's 6-core Xeon

Author: Johan De Gelas

by Johan De Gelas on March 29, 2010 12:00 AM EST

Rendering: Blender 2.5 Alpha 2

Blender 2.5 Alpha 2
Operating System	Windows 2008 Enterprise R2 (64-bit)
Software	Blender 2.5 Alpha 2
Benchmark software	Built-in render engine

3dsmax 2010 crashed on almost all our servers. Granted, it is not meant to be run on a server but on a workstation. We’ll try some tests with Backburner later when the 2011 version is available. In the meantime, it is time for something less bloated and especially less expensive: Blender.

Blender has been getting a lot of positive attention, and judging by its fast-growing community, it is on its way to becoming one of the most popular 3D animation packages out there. The current stable version, 2.49, can only render with up to 8 threads. Blender 2.5 Alpha 2 can go up to 64. To our surprise, the software was pretty stable, so we went ahead and started testing.

If you like, you can easily run this benchmark yourself. We used the “metallic robot”, a scene with rather complex lighting (reflections!) and raytracing. To make the benchmark more repeatable, we changed the following parameters:

The resolution was set to 2560 x 1600
Anti-aliasing was set to 16
Compositing in post processing was disabled
Tiles were set to 8x8 (X=8, Y=8)
Threads were set to auto (one thread per CPU)

Let us first check out the results on Windows 2008 R2:

Blender 2.5 Alpha 2 Windows

At first the Opteron 6174 results were simply horrible: 44.6 seconds, slower than the dual Opteron six-core!

Ivan Paulos Tomé, the official maintainer of the Brazilian Blender 3D Wiki, gave us some interesting advice. The default number of tiles is apparently set to 5x5. This results in a short period of 100% CPU load on the Opteron 6174, followed by a long period where the CPU load drops below 30%. We first assumed that 8x6, twice as many tiles as the number of CPUs, would be best. After some experimenting, we found that 8x8 is the best for all machines. The Xeons and six-core Opterons gained 10%, while the 12-core Opteron became 40% (!) faster. This underlines that the more cores you have, the harder they are to make good use of.
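
The effect of the tile count on CPU load can be illustrated with a toy model. The sketch below is not how Blender actually schedules tiles. It assumes that every tile costs the same amount of work and that each core takes one tile per round, and the function name `utilization` is made up for illustration. It only shows why 25 tiles on 24 cores leave most cores idle for roughly half of the render.

```python
import math


def utilization(tiles_x: int, tiles_y: int, cores: int) -> float:
    """Fraction of total core time spent rendering in the toy model.

    Assumption (not from the article): all tiles take equal time and
    are handed out in rounds of at most one tile per core.
    """
    tiles = tiles_x * tiles_y
    rounds = math.ceil(tiles / cores)  # the last round may be mostly idle
    return tiles / (rounds * cores)


if __name__ == "__main__":
    # 24 cores: dual Opteron 6174; 12 cores: dual six-core Opteron.
    for grid in [(5, 5), (8, 6), (8, 8)]:
        for cores in (12, 24):
            share = utilization(grid[0], grid[1], cores)
            print(f"{grid[0]}x{grid[1]} tiles on {cores} cores: {share:.0%} busy")
```

In this model the default 5x5 grid keeps 24 cores only about 52% busy, while 8x6 reaches 100% and 8x8 about 89%. The measurements favored 8x8 instead, which may mean that tiles in this scene are far from equal in cost, something the model does not capture.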

Blender can be run on several operating systems, so let us see what happens under 64-bit Linux (SUSE SLES 11).

Rendering: Blender 2.5 Alpha 2 on SLES 11

Blender 2.5 Alpha 2
Operating System	SUSE SLES 11, Linux Kernel 2.6.27.19-5-default SMP
Software	Blender 2.5 Alpha 2
Benchmark software	Built-in render engine

Blender 2.5 Alpha 2 Linux

What happened here? Not only is Blender 50 to 70% faster on Linux, the tables have turned. As the software is still in the Alpha 2 phase, it is good to take the results with a grain of salt, but still. For some reason, the Linux version is capable of keeping the cores fed much longer. On Windows, the first half of the benchmark is spent at 100% CPU load, and then it quickly goes down to 75, 50 and even 25% CPU load. On Linux, the CPU load, especially on the Opteron 6174, stays at 99-100% for much longer.

So is the Opteron 6174 the one to get? We are not sure. If these benchmarks are still accurate when we test with the final 2.5 version, there is a good chance that the octal-core 6136 2.4 GHz will be the Blender champion. It has a much lower price and slightly higher performance per core for less complex rendering work. We hope to follow up with new benchmarks. It is pretty amazing what Blender does with a massive number of cores. At the same time, we imagine Intel's engineers will quickly find out why the Blender engine fails to make good use of the dual Xeon X5670's 24 logical cores. This is far from over yet…

58 Comments

wolfman3k5 - Monday, March 29, 2010 - link
Great review! Thanks for the review, when will you guys be reviewing the AMD Phenom II X6 for us mere mortals? I wonder how the Phenom II X6 will stack up against the Core i7 920/930.

Keep up the good work!
ash9 - Tuesday, March 30, 2010 - link
Since SSE4.1 and SSE4.2 are not in AMD's chips, it's Anand's way of getting an easy benchmark win, seeing as some of these benchmark tests probably use them:

http://blogs.zdnet.com/Ou/?p=719
August 31st, 2007
SSE extension wars heat up between Intel and AMD

"Microprocessors take approximately five years to go from concept to product and there is no way Intel can add SSE5 to their Nehalem product and AMD can’t add SSE4 to their first-generation 45nm CPU “Shanghai” or their second-generation 45nm “Bulldozer” CPU even if they wanted to. AMD has stated that they will implement SSE4 following the introduction of SSE5 but declined to give a timeline for when this will happen."

asH
mariush - Tuesday, March 30, 2010 - link
One of the best optimized and multi threaded applications out there is the open source video encoder x264.

Would it be possible to test how well 2x8 and 2x12 AMD configurations work at encoding 1080p video at some very high quality settings?

A workstation with 24 cores from AMD would cost almost as much as a single-socket 6-core system from Intel, so it would be interesting to see whether the increase in frequency and the additional SSE instructions would be more of an advantage than the number of cores.
Aclough - Tuesday, March 30, 2010 - link
I wonder if the difference between the Windows and Linux test results is related to the recentish changes in the scheduler? From what I understand the introduction of the CFS in 2.6.23 was supposed to be really good for large numbers of cores, and I'm given to understand that before that the Linux scheduler worked similarly to the recent Windows one. It would be interesting to try running that benchmark with a 2.6.22 kernel or one with the old O(1) patched in.

Or it could just be that Linux tends to be more tuned for throughput whereas Windows tends to be more tuned for low latency. Or both.
Aclough - Tuesday, March 30, 2010 - link
In any event, the place I work for is a Linux shop and our workload is probably most similar to Blender, so we're probably going to continue to buy AMD.
ash9 - Tuesday, March 30, 2010 - link
http://www.egenera.com/pdf/oracle_benchmarks.pdf

"Performance testing on the Egenera BladeFrame system has demonstrated that the platform is capable of delivering high throughput from multiple servers using Oracle Real Application Clusters (RAC) database software. Analysis using Oracle’s Swingbench demonstration tool and the Calling Circle schema has shown very high transactions-per-minute performance from single-node implementations with dual-core, 4-socket SMP servers based on Intel and AMD architectures running a 64-bit-extension Linux operating system. Furthermore, results demonstrated 92 percent scalability on either server type up to at least 10 servers. The BladeFrame’s architecture naturally provides a host of benefits over other platforms in terms of manageability, server consolidation and high availability for Oracle RAC."
nexox - Tuesday, March 30, 2010 - link
It could also be that Linux has a NUMA-aware scheduler, so it'd try to keep data stored in RAM that is connected to the core running the thread that needs to access the data. I probably didn't explain that too well, but it'd cut down on memory latency because it would minimize going out over the HT links to fetch data. I doubt that Windows does this, given that Intel hasn't had NUMA systems for very long yet.

I'd sort of like to see more Linux benchmarks, since that's really all I'd ever consider running on data center-class hardware like this, and since apparently Linux performance has very little to do with Windows performance, based on that one test.
yasbane - Wednesday, May 19, 2010 - link
Agreed. I do find it disappointing that they put so few benchmarks for Linux for servers, and so many for Windows.

-C
jbsturgeon - Tuesday, March 30, 2010 - link
I like the review and enjoyed reading it. I can't help but feel the benchmarks are less a comparison of CPUs and more a study of how well the apps can be threaded, as well as the implementation of that threading -- higher-clocked CPUs will be better for serial code and more cores will win for apps that are well threaded. In scientific number crunching (the code I write), more cores always wins (AMD). We do use Fluent too, so thanks for including those benchmarks!!
jbsturgeon - Tuesday, March 30, 2010 - link
Obviously that rule can be altered by a killer memory bus :-).
