3DSMax 9 Backburner Rendering

A Supermicro Twin 1U could also be used as part of a rendering farm. We tested with 3D Studio Max version 9, which has been improved to work better with multi-core systems (compared to versions 7 and 8). We used the "architecture" scene of the SPEC APC test (3DS Max 7), which has been our favorite benchmarking scene for years.

All tests are done with 3dsmax's default scanline renderer with SSE enabled, and we render at HD 720p resolution. We measure the time it takes to render frames 20 to 29, exactly 10 frames. This time, however, we installed the Backburner server on each node. The Backburner server allows us to send our rendering jobs over the network to each node. Both nodes are still connected via our Gigabit D-Link DGS-1224T switch. We then make use of "net render". Below you can see the Backburner server in action.


Notice how the first node renders the even frames. Below you can see how this looks on the 3DSMax client. The Backburner monitor gives us an overview of the current rendering. Also note that one node is acting as both a rendering node and a Backburner client.
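
As a rough illustration of the split described above, the alternating assignment of frames 20 to 29 to two nodes can be sketched in Python. This is only a sketch: the article does not describe how Backburner actually schedules frames, and the function name and the node names "node1" and "node2" are hypothetical.

```python
def assign_frames(first_frame, last_frame, nodes):
    """Deal frames out to the nodes in turn (round-robin).

    Illustrative only: this is an assumed model, not Backburner's scheduler.
    """
    assignment = {node: [] for node in nodes}
    for frame in range(first_frame, last_frame + 1):
        # The first frame goes to the first node, the next to the second, and so on.
        node = nodes[(frame - first_frame) % len(nodes)]
        assignment[node].append(frame)
    return assignment


# Frames 20 to 29 (10 frames) split over two hypothetical node names.
plan = assign_frames(20, 29, ["node1", "node2"])
for node, frames in plan.items():
    print(node, frames)
```

With two equally fast nodes, this simple model gives the first node the even frames and the second node the odd ones, which matches what the test showed.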


That does not hamper performance, however. "2N" means that Backburner used both nodes of the Twin 1U. So the orange bar represents the 3dsmax performance we get from rendering on two nodes, with one dual core Xeon 5160 ("Woodcrest") in each node; one socket in each node was left unused. "Dual Xeon 5160 3 GHz" means that we use only one node, with both sockets occupied by Xeon 5160 CPUs.


We notice a 98% speedup going from one node (941s) to two nodes (475s). Of course, with only two nodes the network overhead is minimal, but this is superb scaling!
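
For readers who want to check the arithmetic, here is a minimal Python sketch that derives the speedup from the two render times quoted above. The function names are ours, and "parallel efficiency" (speedup divided by node count) is an extra metric we add for illustration; it is not part of the benchmark itself.

```python
def speedup_percent(time_before_s, time_after_s):
    """Percentage speedup when the render time drops from before to after."""
    return (time_before_s / time_after_s - 1.0) * 100.0


def parallel_efficiency(time_one_node_s, time_n_nodes_s, n_nodes):
    """Fraction of ideal linear scaling achieved with n_nodes."""
    return time_one_node_s / (time_n_nodes_s * n_nodes)


one_node_s = 941.0   # one node, from the results above
two_nodes_s = 475.0  # two nodes, from the results above

print(f"Speedup: {speedup_percent(one_node_s, two_nodes_s):.0f}%")
print(f"Parallel efficiency: {parallel_efficiency(one_node_s, two_nodes_s, 2):.1%}")
```

The two times give a ratio of about 1.98, which is the 98% speedup, or roughly 99% of ideal linear scaling for two nodes.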

Also notice that two Xeon 5160s in one node are a little slower, about 6%, than one Xeon 5160 in each node. The most likely explanation is that the "setup" of each frame (before rendering) adds overhead, as this process is not multithreaded. With two nodes, two frames are set up in parallel, using one core per node, so only two cores are idle at that time. On a single node, only one core is busy processing this pre-rendering code and the other three cores are idle.


28 Comments


  • JohanAnandtech - Monday, May 28, 2007

    Those DIMM slots are empty :-)
  • yacoub - Monday, May 28, 2007

    ohhh hahah thought they were filled with black DIMMs :D
  • yacoub - Monday, May 28, 2007

    Also on page 8:

    quote:

    In comparison, with 2U servers, we save about 130W or about 30% thanks to Twin 1U system

    You should remove that first comma. It was throwing me off because the way it reads it sounds like the 2U servers save about 130W but then you get to the end of the sentence and realize you mean "in comparison with 2U servers, we save about 130W or about 30% thanks to Twin 1U". You could also say "Compared with 2U servers, we save..." to make the sentence even more clear.

    Thanks for an awesome article, btw. It's nice to see these server articles from time to time, especially when they cover a product that appears to offer a solid TCO and strong comparative with the competition from big names like Dell.
  • JohanAnandtech - Monday, May 28, 2007

    Fixed! Good point
  • gouyou - Monday, May 28, 2007

    The part about InfiniBand's performance being much better as you increase the number of cores is really misleading.

    The graph is mixing cores and nodes, so you cannot tell anything. We are in an era where a server has 8 cores: the scaling is completely different as it will depend less on the network. BTW, is the graph made for single core servers? Dual cores?
  • MrSpadge - Monday, May 28, 2007

    Gouyou, there's a link called "this article" in the part on InfiniBand which answers your question. In the original article you can read that they used dual 3 GHz Woodcrests.

    What's interesting is that the difference between InfiniBand and GigE is actually more pronounced for the dual core Woodcrests compared with single core 3.4 GHz P4s (at 16 nodes). The explanation given is that the faster dual core CPUs need more communication to sustain performance. So it seems like their algorithm uses no locality optimizations to exploit the much faster communication within a node.

    @BitJunkie: I second your comment, very nice article!

    MrS
  • BitJunkie - Monday, May 28, 2007

    Nice article, I'm most impressed by the breadth and the detail you drilled into, and also the clarity with which you presented your thinking / results. It's always good to be stretched, and this is a great example of how to approach things in a structured, logical way.

    Don't mind the "it's an enthusiast site" comments. Some people will be stepping outside their comfort zone with this and won't thank you for it ;)
  • JohanAnandtech - Monday, May 28, 2007

    Thanks, very encouraging comment.

    And I guess it doesn't hurt the "enthusiast" is reminded that "pcs" can also be fascinating in another role than "Hardcore gaming machine" :-). Many of my students need the same reminder: being an ITer is more than booting Windows and your favorite game. My 2-year old daughter can do that ;-)
  • yyrkoon - Monday, May 28, 2007

    It is however nice to learn about InfiniBand. This is a technology I have been interested in for a while now, and I was under the impression it was not going to be implemented until PCIe v2.0 (maybe I missed something here).

    I would still rather see this technology in the desktop class PC, and if this is yet another enterprise driven technology, then people such as myself, who were hoping to use it for decent home networking (remote storage), are once again left out in the cold.
  • yyrkoon - Monday, May 28, 2007

    quote:

    And I guess it doesn't hurt the "enthusiast" is reminded that "pcs" can also be fascinating in another role than "Hardcore gaming machine" :-). Many of my students need the same reminder: being an ITer is more than booting Windows and your favorite game. My 2-year old daughter can do that ;-)


    And I am sure every gamer out there knows what iSCSI *is* . . .

    Even in 'IT' a 16 core 1U rack is a specialty system, and while they may be semi common in the load balancing/failover scenario (or maybe even used extensively in parallel processing, yes, and even more possible uses . . .), they are still not all that common compared to the 'standard' server. Recently, a person that I know deployed 40k desktops/30k servers for a large company, and wouldn't you know it, not one had more than 4 cores . . . and I have personally contracted work from TV/Radio stations (and even the odd small ISP), and outside of the odd 'Toaster', most machines in these places barely use 1 core.

    I also find technologies such as 802.3ad link aggregation, iSCSI, AoE, etc. interesting, and sometimes like playing around with things like openMosix or the latest/hottest Linux distro, but at the end of the day, other than experimentation, these things typically do not entertain me. Most of the above, and many other technologies for me, are just a means to an end, not entertainment.

    Maybe it is enjoyable staring at a machine of this type, not being able to use it to its full potential outside of the work place ? Personally I would not know, and honestly I really do not care, but if this is the case, perhaps you need to take notice of your 2 year old daughter, and relax once in a while.

    The point here? The point being: perhaps *this* 'gamer' you speak of knows a good bit more about 'IT' than you give him credit for, and maybe even makes a fair amount of cash at the end of the day while doing so. Or maybe I am a *real* hardware enthusiast, who would rather be reading about technology, instead of reading yet another 'product review'. Especially since any person worth their paygrade in IT should already know how this system (or anything like it) is going to perform beforehand.
