Rendering: Blender 2.6.0

Blender is a very popular open source renderer with a large and active community. We tested the 64-bit Windows edition, using version 2.6.0a. If you like, you can easily run this benchmark yourself. We used the metallic robot, a scene with rather complex lighting (reflections) and raytracing. To make the benchmark more repeatable, we changed the following parameters:

  1. The resolution was set to 2560x1600
  2. Antialiasing was set to 16
  3. We disabled compositing in post processing
  4. Tiles were set to 16x16 (X=16, Y=16)
  5. Threads were set to auto (one thread per CPU).

As we have explained, the current 24 and 32 core CPUs benefit from using a much larger number of tiles than we have previously used (64, 8x8). That is why we raised the number of tiles to 256 (16x16), though all CPUs perform better at this setting.
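As a rough illustration of the tile counts above, the following sketch computes how many tiles each render thread has to work through. It assumes one thread per core and evenly distributed tiles; the function name is ours for illustration and is not a Blender setting.

```python
def tiles_per_thread(tiles_x: int, tiles_y: int, threads: int) -> float:
    """Average number of tiles each render thread processes."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return (tiles_x * tiles_y) / threads


# Compare the old 8x8 setting with the new 16x16 setting
# on 24-thread and 32-thread configurations.
for tiles_x, tiles_y in [(8, 8), (16, 16)]:
    for threads in (24, 32):
        avg = tiles_per_thread(tiles_x, tiles_y, threads)
        print(f"{tiles_x}x{tiles_y} tiles, {threads} threads: {avg:.2f} tiles per thread")
```

With 64 tiles, a 32-thread system gets only two tiles per thread on average, so a single slow tile can presumably leave many threads idle near the end of the render. At 256 tiles the work is split more finely.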

To make the results easier to read, we again converted the reported render time into images rendered per hour, so higher is better.

Blender 2.6.0

Blender is Xeon territory for sure, as Blender mostly runs in the L1 and L2 caches. Therefore an E5-2630 (2.3 GHz, 15 MB L3, $612) will probably perform about 4% faster than the six-core Xeon E5-2660 in this test. Our six-core Xeon E5-2660 is about 26% faster than the best Opteron. We estimate that the Xeon E5-2630 will offer more or less the same performance at an almost 30% lower price point than the Opteron 6276. Whether you have a lot or little to spend, the Xeon E5 is your best bet for Blender.
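One plausible reading of the "about 4%" estimate is simple clock scaling: if the workload stays in the L1 and L2 caches, performance should track core frequency closely. The sketch below assumes a 2.2 GHz base clock for the E5-2660 (not stated in the text above) and perfectly linear scaling with clock speed, which is an idealization.

```python
def clock_scaling_gain(new_ghz: float, base_ghz: float) -> float:
    """Relative performance gain, assuming performance scales linearly with clock."""
    if base_ghz <= 0:
        raise ValueError("base_ghz must be positive")
    return new_ghz / base_ghz - 1.0


# Assumption: E5-2660 base clock of 2.2 GHz; the E5-2630 clock of 2.3 GHz is from the text.
gain = clock_scaling_gain(2.3, 2.2)
print(f"Estimated gain from clock speed alone: {gain:.1%}")
```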

Rendering Performance: 3DSMax 2012

As requested, we're reintroducing our 3DS Max benchmark. We used the "architecture" scene which is included in the SPEC APC 3DS Max test. As the Scanline renderer is limited to 16 threads, we're using the iray render engine, which is basically a self-configuring Mental Ray render engine.

We rendered at 720p (1280x720) resolution. We measured the time it takes to render 10 frames (from 20 to 29) with SSE enabled. We recorded the time and then calculated (3600 seconds * 10 frames / time recorded) how many frames a certain CPU configuration could render in one hour. All results are reported as rendered images per hour; higher is thus better. We used the 64-bit version of 3ds Max 2012 on 64-bit Windows Server 2008 R2 SP1.
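The conversion from render time to frames per hour can be sketched as follows. The function name and the 720-second timing are hypothetical and serve only to illustrate the arithmetic; they are not measured results.

```python
def frames_per_hour(render_seconds: float, frames: int = 1) -> float:
    """Convert a measured render time into frames rendered per hour.

    Implements: 3600 seconds * frames / time recorded.
    """
    if render_seconds <= 0:
        raise ValueError("render_seconds must be positive")
    return 3600.0 * frames / render_seconds


# Hypothetical timing: 10 frames rendered in 720 seconds.
print(frames_per_hour(720.0, frames=10))  # 50.0
```

With `frames=1`, the same function covers the single-image case used for the Blender results.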

3DSMax 2012 Architecture

Even with the advanced iray renderer, 3DS Max rendering reaches our scaling limits. The 32-thread Xeons do not come close to 100% CPU load (more like 90%), and in between the frames there are small periods of single-threaded processing. Amdahl's law is the most likely reason here. We suspect that highly clocked, lower core count models can pass the 53 frames per hour barrier we're seeing here.
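To illustrate how Amdahl's law caps scaling, here is a minimal sketch. The parallel fraction of 0.95 is an assumed value chosen for illustration, not something we measured for iray.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Theoretical speedup over one thread according to Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads <= 0:
        raise ValueError("threads must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Assumption: 95% of the work parallelizes; the rest is single threaded.
ASSUMED_PARALLEL_FRACTION = 0.95

for threads in (8, 16, 32):
    speedup = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, threads)
    print(f"{threads} threads: {speedup:.2f}x speedup")
```

Under this assumption, going from 16 to 32 threads adds far less than a doubling of throughput, which is consistent with higher clocks helping more than extra cores once the serial portions dominate.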

81 Comments

  • JohanAnandtech - Wednesday, March 7, 2012 - link

    Argh. You are absolutely right. I reversed all divisions. I am fixing this as we type. Luckily this does not alter the conclusion: LS-DYNA does not scale with clockspeed very well.
  • alpha754293 - Wednesday, March 7, 2012 - link

    I think that I might have an answer for you as to why it might not scale well with clock speed.

    When you start a multiprocessor LS-DYNA run, it goes through a stage where it decomposes the problem (through a process called recursive coordinate bisection (RCB)).

    This decomposition phase is done every time you start the run, and it only runs on a single processor/core. So, suppose that you have a dual-socket server where the processors, say, are hitting 4 GHz. That can potentially be faster than, say, if you had a four-socket server where each of the processors is only 2.4 GHz.

    In the first case, you have a small number of really fast cores (and so it will decompose the domain very quickly), whereas in the latter, you have a large number of much slower cores, so the decomposition will happen slowly, but it MIGHT be able to solve the rest of it slightly faster (to make up for the difference) just because you're throwing more hardware at it.

    Here's where you can do a little more experimenting if you like.

    Using the pfile (command line option/flag 'p=file'), not only can you control the decomposition method, but you can also tell it to write the decomposition to a file.

    So had you had more time, what I would have probably done is written out the decompositions for all of the various permutations you're going to be running. (n-cores, m-number of files.)

    When you start the run, instead of it having to decompose the problem over and over again each time it starts, you just use the decomposition that it's already done (once). That way, you would be testing PURELY the solving part of the run, rather than from beginning to end. (That isn't to say that the results you've got are bad; it's good data.) But that should help to take more variables out of the equation when it comes to why it doesn't scale well with clock speed. (It should.)
  • IntelUser2000 - Tuesday, March 6, 2012 - link

    Please refrain from creating flamebait in your posts. Your post is almost like spam, almost no useful information is there. If you are going to love one side, don't hate the other.
  • Alexko - Tuesday, March 6, 2012 - link

    It's not "like spam", it's just plain spam at this point. A little ban + mass delete combo seems to be in order, just to clean up this thread, and probably others.
  • ultimav - Wednesday, March 7, 2012 - link

    My troll meter is reading off the charts with this guy. Reading between the lines, he's actually a hardcore AMD fan trying to come across as the Intel version of Sharikou to paint Intel fans in a bad light. Pretty obvious actually.
  • JohanAnandtech - Wednesday, March 7, 2012 - link

    We had to mass delete his posts as they indeed did not contain any useful info and were full of insults. The signal to noise ratio has been good in recent years, so we must keep it that way.

    Inteluser2000, Alexko, Ultimav, tipoo: thx for helping to keep the tone civil here. Appreciate it.

    - Johan.
  • tipoo - Wednesday, March 7, 2012 - link

    And thank you for removing that stuff.
  • tipoo - Tuesday, March 6, 2012 - link

    We get it. Don't spam the whole place with the same post.
  • tipoo - Tuesday, March 6, 2012 - link

    No, he's just a rational person. I don't care which company you like, if you say the same thing 10 times in one article someone's sure to get annoyed, and with justification.
  • MySchizoBuddy - Tuesday, March 6, 2012 - link

    I'm again requesting that when you do the benchmarks, please include a performance per watt metric along with stress testing by running folding@home for 48 hours straight.
