Render Servers

To get a better idea of how the different server platforms compare, we also ran some rendering tests. Most of our tests (MySQL, DB2, and SPECjbb2005) are very integer intensive, whereas render tests are floating point intensive. We start with a simple Cinebench 9.5 benchmark (on 32-bit Windows 2003), which is based on Maxon's Cinema 4D rendering engine.

Cinebench 9.5

| CPU | 1280x720 |
|---|---|
| Quad Opteron 880 2.4 | 1720 |
| Dual Quad Xeon E5345 2.33 | 1686 |
| Dual DC Xeon 5160 3.0 | 1456 |
| Quad DC Xeon 7130M 3.2 | 1272 |
| Quad Xeon E5345 2.33 | 1169 |
| Dual Opteron 880 2.4 | 1121 |
| Dual DC Xeon 5060 3.73 | 1079 |
| Dual DC Xeon 7130M 3.2 | 889 |

Four 2.4GHz Opteron cores are a bit slower than four 2.33GHz Xeons, but when we look at the eight core scores the Opteron is a bit faster. Again, it seems that the Opteron system scales better.

Cinebench 9.5 (32 bit)
Per core performance

| CPU | Quad core | Octal core | Scaling 4->8 |
|---|---|---|---|
| Xeon 7130 3.2 GHz | 889 | 1272 | 43% |
| Xeon 5345 2.33 GHz | 1169 | 1686 | 44% |
| Opteron 880 2.4 GHz | 1121 | 1720 | 53% |
| Opteron 890 2.8 GHz | 1297 | 1990 | 53% |
| Xeon 5160 3 GHz | 1456 | N/A | N/A |
| Xeon scaling 2.33 -> 3 GHz | 25% | | |
| Opteron 880 vs. Quad core Xeon 2.33 GHz | -4% | 2% | 21% |
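
The percentages in this table follow directly from the scores. The snippet below is only a sketch of that arithmetic, not part of the test setup: the helper names `scaling_pct` and `relative_pct` and the dictionary layout are illustrative assumptions, and the only inputs are the Cinebench scores listed above.

```python
# Cinebench 9.5 scores from the table above: (four cores, eight cores).
# The dictionary layout and helper names are illustrative assumptions.
SCORES = {
    "Xeon 7130 3.2 GHz": (889, 1272),
    "Xeon 5345 2.33 GHz": (1169, 1686),
    "Opteron 880 2.4 GHz": (1121, 1720),
    "Opteron 890 2.8 GHz": (1297, 1990),
}


def scaling_pct(score_4: float, score_8: float) -> float:
    """Percentage gain in score when going from four to eight cores."""
    return (score_8 / score_4 - 1.0) * 100.0


def relative_pct(a: float, b: float) -> float:
    """Percentage by which score a exceeds score b (negative if a is lower)."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    for cpu, (four, eight) in SCORES.items():
        print(f"{cpu}: {scaling_pct(four, eight):.0f}% from 4 to 8 cores")

    opt4, opt8 = SCORES["Opteron 880 2.4 GHz"]
    xeon4, xeon8 = SCORES["Xeon 5345 2.33 GHz"]
    print(f"Opteron 880 vs. Xeon 2.33 GHz, 4 cores: {relative_pct(opt4, xeon4):.0f}%")
    print(f"Opteron 880 vs. Xeon 2.33 GHz, 8 cores: {relative_pct(opt8, xeon8):.0f}%")
```

Because Cinebench reports a score where higher is better, the gain is simply the ratio of the two scores minus one.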

Why do we analyze this in so much detail? Cinebench, like most renderers, couldn't care less about the memory subsystem. We tested our Clovertown system with both two and four memory channels, and the results were exactly the same. Therefore, we are pretty sure the slightly worse scaling of the Xeon E5345 is not a result of limited bandwidth or higher latency. There must be something else that limits scalability, and that something else is most likely cache coherency.

Cinebench 9.5 (32 bit)
Per socket performance

| CPU | Dual Socket |
|---|---|
| Quad core Xeon 2.33 GHz vs. Xeon 5160 | 16% |
| Quad core Xeon 2.33 GHz vs. Opteron 880 | 50% |
| Quad core Xeon 2.33 GHz vs. Opteron 890 | 30% |

Cinebench is popular because it is an easy benchmark to run, but 3dsmax is a very popular application in actual use. We tested with 3dsmax version 9, which has been improved to work better with multi-core systems. We used the "architecture" scene, which has been our favorite benchmarking scene for years. All tests were done with 3dsmax's default scanline renderer with SSE enabled, and we rendered at HD 720p resolution. We measure the time it takes to render frames 20 to 22.


3DS Max 9 Architecture

| CPU | 1280x720 |
|---|---|
| Quad Opteron 880 2.4 | 273 |
| Dual Quad Xeon E5345 2.33 | 308 |
| Dual DC Xeon 5160 3.0 | 309 |
| Quad Xeon E5345 2.33 | 392 |
| Dual DC Xeon 5060 3.73 | 419 |
| Quad DC Xeon 7130M 3.2 | 443 |
| Dual Opteron 880 2.4 | 454 |

This cannot be a coincidence anymore: a single Xeon E5345 leaves the dual Opteron 880 far behind, but a dual Xeon E5345 trails the quad Opteron. It is not only the application that matters; the dataset has an impact too. Take a look at the table below, where we rendered at both 720p and 480p resolution.

3DS Max 9 Architecture

| CPU | 720x480 | 1280x720 |
|---|---|---|
| Quad Opteron 880 2.4 | 137 | 273 |
| Dual Quad Xeon E5345 2.33 | 138 | 308 |
| Dual DC Xeon 5160 3.0 | 133 | 309 |
| Quad Xeon E5345 2.33 | 167 | 392 |
| Dual DC Xeon 5060 3.73 | 188 | 419 |
| Quad DC Xeon 7130M 3.2 | 201 | 443 |
| Dual Opteron 880 2.4 | 196 | 454 |
| Scaling Opteron 880 | 43% | 66% |
| Scaling Xeon E5345 | 21% | 27% |
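
The two scaling rows can be recomputed from the render times. Since these are times rather than scores, lower is better and the ratio is inverted compared to Cinebench. The sketch below only illustrates that arithmetic; the name `time_scaling_pct` and the data layout are assumptions made for the example.

```python
# Render times for frames 20 to 22 of the architecture scene, taken from the
# table above, as (four cores, eight cores). Lower is better.
# Keys and helper names are illustrative assumptions.
RENDER_TIMES = {
    "Opteron 880": {"720x480": (196, 137), "1280x720": (454, 273)},
    "Xeon E5345": {"720x480": (167, 138), "1280x720": (392, 308)},
}


def time_scaling_pct(time_4_cores: float, time_8_cores: float) -> float:
    """Throughput gain from doubling the core count, derived from render times."""
    return (time_4_cores / time_8_cores - 1.0) * 100.0


if __name__ == "__main__":
    for cpu, by_resolution in RENDER_TIMES.items():
        for resolution, (t4, t8) in by_resolution.items():
            gain = time_scaling_pct(t4, t8)
            print(f"{cpu} at {resolution}: {gain:.0f}% faster with eight cores")
```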

As you can see, the resolution at which you normally render determines how much you benefit from eight cores. Using an octal core machine to render relatively low resolution movies is like driving a potent 8 cylinder engine in a crowded city: all the horsepower goes to waste as you accelerate for a short period and then hit the brakes when approaching a red light. The same is true for rendering: unless you are rendering a complex scene at high resolution, the multi-core engine can never show its full potential. Thanks to better scaling, the quad Opteron platform still has a small advantage.
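
One rough way to put a number on this is Amdahl's law. This is a back-of-the-envelope model that we are assuming for illustration, not something we measured: it lumps serial work and any other overhead (cache coherency traffic, for example) into a single effective parallel fraction p. With S(n) = 1 / ((1 - p) + p / n), the ratio r between the four core and eight core render times gives p = 8(r - 1) / (7r - 6). The function name below is hypothetical.

```python
def effective_parallel_fraction(time_4_cores: float, time_8_cores: float) -> float:
    """Solve Amdahl's law for p given the 4-core and 8-core render times.

    Assumes S(n) = 1 / ((1 - p) + p / n). The speedup ratio
    r = S(8) / S(4) = time_4_cores / time_8_cores then yields
    p = 8 * (r - 1) / (7 * r - 6).
    """
    r = time_4_cores / time_8_cores
    return 8.0 * (r - 1.0) / (7.0 * r - 6.0)


if __name__ == "__main__":
    # Render times from the 3DS Max 9 table: (four cores, eight cores).
    cases = {
        "Opteron 880, 720x480": (196, 137),
        "Opteron 880, 1280x720": (454, 273),
        "Xeon E5345, 720x480": (167, 138),
        "Xeon E5345, 1280x720": (392, 308),
    }
    for label, (t4, t8) in cases.items():
        p = effective_parallel_fraction(t4, t8)
        print(f"{label}: effective parallel fraction {p:.2f}")
```

Under this model, a perfect doubling (r = 2) corresponds to p = 1, and the lower resolution yields a smaller effective fraction on both platforms, which matches the observation that a short, simple render leaves the extra cores underused.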

3DSMax 9 (32 bit)
Per socket performance

| CPU | Dual Socket |
|---|---|
| Quad core Xeon 2.33 GHz vs. Xeon 5160 | 0% |
| Quad core Xeon 2.33 GHz vs. Opteron 880 | 47% |
| Quad core Xeon 2.33 GHz vs. Opteron 890 | 27% |

However, when it comes to price/performance, it is not the quad core Xeon or the Opteron that wins, but most likely the Xeon 5160. It is more flexible, as it will outperform the quad core Xeon in any scene that is less complex than architecture and at resolutions lower than 720p. Only if your scenes use radiosity lighting can we see a clear advantage for the quad core Xeon. We noticed that the Xeon was up to 40% faster in such scenes.

15 Comments

  • zsdersw - Friday, December 29, 2006

    quote:

    as opposed to a single die approach like Smithfield and Paxville DP


    Smithfield/Paxville is a MCM chip (two pieces of silicon in one package), as well.
  • Khato - Wednesday, December 27, 2006

    Agreed on it being quite the good review, save for the lack of power consumption numbers/analysis. Form factor and power consumption can be just as important as the performance when the application can be spread across multiple machines, now can't it? At the very least, it would be nice to link to the power consumption numbers for the opteron platform in the first review it showed up in (which puts the dual clovertown at 365W load, while the quad 880 is supposedly 657W load.)
  • rowcroft - Wednesday, December 27, 2006

    Loved the article, great job.

    I'm in the process of purchasing two dual quad core servers for VMWare use. Looking at the cost to performance analysis, it would be worth mentioning that many of the high end applications are licensed on a per socket basis. This alone is saving us $20,000 on our VMWare license and making it a compelling solution.

    I would love to see more of this type of article as well- very interesting and not something you can easily find elsewhere on the net. (Tom's hardware reviewed the chip running XP Pro!)
  • duploxxx - Friday, December 29, 2006

    If you think that reading this review will help you to decide what to buy as VMWARE base you are going the wrong way! Yes these small tests are in favor of the new MCW architecture as we saw before and since heavy workload seems hard to test for some sites like anand! keep in mind that VMWARE is heavy workload, you combine the cpu and ram to whatever you want, guess what the fsb can't be combined like you wish!

    thinking that a 2x quad will outperform the 4p opteron is a big laugh! the fsb will kill your whole ESX instantly from 4+ os on your system with normal load.

    the money you save is indeed for sure, the power you lose is another thing!

    friendly info from a certified esx 3.0 beta tester :)
  • Viditor - Wednesday, December 27, 2006

    Probably one of your most thorough and well-rounded articles Johan...many thanks!
    It was nice to see you working with large (16GB) memory.
    If you do get a Socket F system, will you be updating the article?
