Faster Unaligned Cache Accesses & 3D Rendering Performance

3dsmax r9

Our benchmark, as always, is the SPECapc 3dsmax 8 test, but for the purposes of this article we ran only the CPU rendering tests, not the GPU tests.

[Chart: 3dsmax 9]

The results are reported as render times in seconds and the final CPU composite score is a weighted geometric mean of all of the test scores.
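For reference, a weighted geometric mean over n test results t_i with weights w_i has the standard form below. The weights SPECapc actually uses are not listed in this article, so w_i is left symbolic. This is the general definition, not SPEC's published scoring formula.

```latex
S = \left( \prod_{i=1}^{n} t_i^{\,w_i} \right)^{1 / \sum_{i=1}^{n} w_i}
  = \exp\!\left( \frac{\sum_{i=1}^{n} w_i \ln t_i}{\sum_{i=1}^{n} w_i} \right)
```

Because both CPUs are scored with the same weights, the ratio of their two composites equals the weighted geometric mean of the per-test ratios, so one large outlier moves the composite less than it would move an arithmetic mean.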

CPU / 3dsmax Score Breakdown (render time in seconds, lower is better)

CPU               | Radiosity | Throne Shadowmap | CBALLS2 | SinglePipe2 | Underwater | SpaceFlyby | UnderwaterEscape
Nehalem (2.66GHz) | 12.891    | 11.193           | 5.729   | 20.771      | 24.112     | 30.66      | 27.357
Penryn (2.66GHz)  | 19.652    | 14.186           | 13.547  | 30.249      | 32.451     | 33.511     | 31.883


The CBALLS2 workload is where we see the biggest speedup with Nehalem: performance more than doubles. It turns out that CBALLS2 calls a function in the Microsoft C Runtime Library (msvcrt.dll) that can magnify the Core architecture's performance penalty when accessing data that is not aligned to cache line boundaries. Through some circuit tricks, Nehalem has significantly lower latency on unaligned cache accesses, hence the huge improvement in the CBALLS2 score. CBALLS2 is the only workload in our SPECapc 3dsmax test that really stresses the current Core architecture's unaligned access penalty, but Nehalem still posts a strong performance improvement across the board in 3dsmax.
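To make "not aligned to cache line boundaries" concrete, here is a minimal Python sketch that checks whether a load of a given size straddles two cache lines. It assumes 64-byte lines (the line size on both Penryn and Nehalem). The function names are hypothetical, and the sketch models only the address arithmetic, not the size of the penalty or which msvcrt.dll routine CBALLS2 actually calls.

```python
# Assumption: 64-byte cache lines, as on Penryn and Nehalem.
CACHE_LINE = 64


def crosses_cache_line(addr: int, size: int, line: int = CACHE_LINE) -> bool:
    """True if a `size`-byte access starting at byte address `addr` spans two lines."""
    first_line = addr // line
    last_line = (addr + size - 1) // line
    return first_line != last_line


def split_fraction(size: int, line: int = CACHE_LINE) -> float:
    """Fraction of starting offsets within one line at which a `size`-byte access splits."""
    splits = sum(crosses_cache_line(offset, size, line) for offset in range(line))
    return splits / line


if __name__ == "__main__":
    print(crosses_cache_line(0x1000, 16))  # line-aligned 16-byte load: False
    print(crosses_cache_line(0x1004, 16))  # unaligned, but inside one line: False
    print(crosses_cache_line(0x1038, 16))  # bytes 0x1038..0x1047 span two lines: True
    for size in (4, 8, 16):
        print(size, split_fraction(size))
```

For a 16-byte access at a uniformly random offset, 15 of the 64 possible starting offsets split across two lines, so code that streams through memory with misaligned 16-byte loads hits the split case often.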

Nehalem is just over 40% faster than Penryn, clock for clock, in 3dsmax.
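As a rough sanity check on the "just over 40%" figure, the sketch below computes per-test speedups from the score breakdown table and combines them with a geometric mean. It assumes equal weights for every test, which is not necessarily what SPECapc uses, so treat the result as an approximation of the official composite.

```python
import math

# Render times in seconds from the CPU / 3dsmax Score Breakdown table (lower is better).
TESTS = ["Radiosity", "Throne Shadowmap", "CBALLS2", "SinglePipe2",
         "Underwater", "SpaceFlyby", "UnderwaterEscape"]
NEHALEM = [12.891, 11.193, 5.729, 20.771, 24.112, 30.66, 27.357]
PENRYN = [19.652, 14.186, 13.547, 30.249, 32.451, 33.511, 31.883]


def weighted_geomean(values, weights=None):
    """Weighted geometric mean; uses equal weights if none are given (an assumption)."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)


# Per-test speedup: Penryn time / Nehalem time (above 1 means Nehalem is faster).
speedups = {name: p / n for name, p, n in zip(TESTS, PENRYN, NEHALEM)}

if __name__ == "__main__":
    for name, s in speedups.items():
        print(f"{name:17s} {s:.2f}x")
    overall = weighted_geomean(list(speedups.values()))
    print(f"Geometric-mean speedup (equal weights): {overall:.2f}x")
```

With equal weights this comes out to roughly 1.42x, with CBALLS2 (about 2.36x) contributing the largest single ratio.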

Cinebench R10

A benchmarking favorite, Cinebench R10 is designed to give us an indication of performance in the Cinema 4D rendering application.

[Chart: Cinebench R10]

Cinebench also shows healthy gains with Nehalem: performance went up 20% clock for clock over Penryn.

We also ran the single-threaded Cinebench test to see how performance improved on a per-core basis vs. Penryn (Update: the original single-threaded Penryn Cinebench numbers were incorrect; we've included the correct ones):

[Chart: Cinebench R10 - Single Threaded Benchmark]

Cinebench shows only about a 3% increase in core-to-core performance from Penryn to Nehalem at the same clock speed (3015 vs. 2931). For applications that don't go out to main memory much and stay confined to a single core, Nehalem behaves very much like Penryn. Remember that outside of the memory architecture and the Hyper Threading tweaks to the core, Nehalem's list of improvements is very specific (e.g. faster unaligned cache accesses).

The single thread to multiple thread scaling of Penryn vs. Nehalem is also interesting:

Cinebench R10                       | 1 Thread | N Threads | Speedup
Nehalem (2.66GHz)                   | 3015     | 12596     | 4.18x
Core 2 Quad Q9450 (Penryn, 2.66GHz) | 2931     | 10445     | 3.56x


The speedup confirms what you'd expect in a well-threaded FP test like Cinebench: Nehalem scales better thanks to Hyper Threading. If Nehalem had the same 3.56x scaling factor we saw with Penryn, it would score 10733 (3015 × 3.56), virtually in line with Penryn's 10445. It's Hyper Threading that puts Nehalem over the edge and accounts for the rest of the gain here.
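The scaling arithmetic in this paragraph can be reproduced from the Cinebench table with a few lines of Python. The "Penryn-like scaling" estimate simply multiplies Nehalem's single-threaded score by Penryn's rounded 3.56x factor, as the text does; it is a back-of-the-envelope estimate, not a measurement.

```python
# Cinebench R10 scores from the table above (higher is better).
nehalem_1t, nehalem_nt = 3015, 12596
penryn_1t, penryn_nt = 2931, 10445


def scaling(single: int, multi: int) -> float:
    """Multithreaded score divided by the single-threaded score."""
    return multi / single


nehalem_scaling = scaling(nehalem_1t, nehalem_nt)  # about 4.18x
penryn_scaling = scaling(penryn_1t, penryn_nt)     # about 3.56x

# Estimate: Nehalem's single-thread score scaled by Penryn's factor, rounded to 3.56 as in the text.
no_ht_estimate = nehalem_1t * round(penryn_scaling, 2)

if __name__ == "__main__":
    print(f"Nehalem scaling: {nehalem_scaling:.2f}x")
    print(f"Penryn scaling:  {penryn_scaling:.2f}x")
    print(f"Single-thread gain: {nehalem_1t / penryn_1t - 1:.1%}")
    print(f"Multithread gain:   {nehalem_nt / penryn_nt - 1:.1%}")
    print(f"Estimate at Penryn-like scaling: {no_ht_estimate:.0f}")
    print(f"Remaining gain over that estimate: {nehalem_nt / no_ht_estimate - 1:.1%}")
```

The last line is the part of the multithreaded result that this simple model leaves unexplained, which the text attributes to Hyper Threading.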

While many 3D rendering and video encoding tests can take at least some advantage of more threads, what about applications that can't? One aspect of Nehalem's performance we're not really stressing here is its integrated memory controller (IMC), since most of these benchmarks ended up being more compute intensive. Where HT doesn't give it the edge, we can expect some pretty reasonable gains from Nehalem's IMC alone. The Nehalem we tested here is crippled in that respect thanks to a premature motherboard, but gains on the order of 20% in single or lightly threaded applications are a reasonable expectation.


POV-Ray 3.7 Beta 24

POV-Ray is a popular raytracer that also includes a built-in benchmark. We used the 3.7 beta, which has SMP support, and ran the built-in multithreaded benchmark.

[Chart: POV-Ray 3.7 Beta 24]

Finally, POV-Ray echoes what we've seen elsewhere, with a 36% performance improvement over the 2.66GHz Core 2 Quad Q9450. Note that Nehalem continues to be faster than even the fastest Penryns available today, despite this early sample's lower clock speed.


108 Comments


  • SiliconDoc - Monday, July 28, 2008

    Crysis- etc. :

    Pete, you can be very happy knowing it will do folding like mad, and you can fantasize that you've cured cancer while you spend your money for some tax subsidized already to the hilt University program, because you're such a good and loving person.
    ( I know YOU didn't mean anything like that - see sarcasm! )
    In the mean time, the OLD HT single core chips will do just fine cranking most games, and dual core or core2duo or 2180 or some other then $40 chip will be a few percentage pts. shy.
    My gawd, they've got our number.
    I bet they "unlock it !!!!! " OMG ! for like 2 grand if you're cooooool you can get one!
  • Crank the Planet - Thursday, June 5, 2008

    I know it may be exciting but the article sounds fan-boyish. For most of the marks it shows what intel is claiming 20-30% boost. He gets one mark to go 50% and now it's 20-50% boost?? He compares in another mark AMD 21 and nehalem 14 and says it's almost 50% faster!!! and then compares penryn 18 and nehalem 14 and says it's 28%. I think the AMD mark was more like 35%.

    As I've said before everybody knows AMD was going to hurt themselves in the short run by buying ATI. If they didn't buy ATI I think things would be very different. Now that the last year of payments is being made for buying ATI AMD will be able to get back into the game.

    Intel has only now integrated the memory controller. Everybody knew as soon as they did they would see a nice bump. They haven't had any significant innovations in a long time. AMD is in the same position they were before K8. Just give them some time to finish absorbing ATI, then watch out- fusion is just around the corner :)
  • hs635 - Tuesday, June 17, 2008

    Fuck off retard
  • masouth - Friday, June 6, 2008

    What kind of idiot fan-boy drivel is this?

    "He gets one mark to go 50% and now it's 20-50% boost??"

    Ummm, yes?

    1, 2, 3, 4, 8

    What is the range of those numbers? 1-8, right?

    Does the majority of them being in the 1-5 range somehow negate the fact that the actual range is 1-8?

    THINK PEOPLE!
  • michael2k - Thursday, June 5, 2008

    You're the one that sounds like a fanboy.

    What makes you think Intel's CPU-GPU integration won't be as fabulous as their IMC or quad-core components? Intel doesn't need "significant innovations" (nor does AMD), they just need higher performance, lower power, and lower cost, which is exactly what they have.

    Innovations only exist to serve those aspects.
  • Justin Case - Sunday, June 8, 2008

    Wrong.

    AMD64 (the instruction set) isn't about "more performance". Virtualization isn't about "more performance". Hardware no-execute flags aren't about "more performance". SATA's hot-plug ability isn't about "more performance".

    Your statement shows the kind of lack of vision that brought us the Pentium 4.

    I for one am far more excited about technology that allows me to do something new or different than "technology" that simply lets me do the same stuff faster. 99% of CPU cycles on the planet go unused anyway.

  • zsdersw - Thursday, June 5, 2008

    Given the overall tone of your reply, the criticism of the article as "fan-boyish" is, really, the pot calling the kettle black.
  • Visual - Thursday, June 5, 2008

    so you agree as well? yeah, me too.
    they are both black. they are both fanboys :)
  • zsdersw - Thursday, June 5, 2008

    I've said nothing about agreeing with anything. What I have said, though, is that a fanboy calling someone else a fanboy is perhaps not indicative of any objective truth.
  • Jynx980 - Saturday, June 7, 2008

    It will be a great day when I can read any CPU discussion without the word fanboy in it.

    The close-up of the chip has waaaaaay too much thermal compound on it.

    Is it just me or is the first pic of the Intel roadmap rather... phallic?
