Faster Unaligned Cache Accesses & 3D Rendering Performance

3dsmax r9

Our benchmark, as always, is the SPECapc 3dsmax 8 test, but for the purpose of this article we only run the CPU rendering tests and not the GPU tests.

3dsmax 9

The results are reported as render times in seconds and the final CPU composite score is a weighted geometric mean of all of the test scores.

CPU / 3dsmax Score Breakdown   Radiosity   Throne Shadowmap   CBALLS2   SinglePipe2   Underwater   SpaceFlyby   UnderwaterEscape
Nehalem (2.66GHz)              12.891s     11.193s            5.729s    20.771s       24.112s      30.66s       27.357s
Penryn (2.66GHz)               19.652s     14.186s            13.547s   30.249s       32.451s      33.511s      31.883s
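
As a rough cross-check of the table above, the per-test speedups can be computed directly from the render times. The sketch below also takes an unweighted geometric mean of those speedups. The equal weighting is an assumption, since the actual SPECapc weights are not given here, so the result only approximates the composite score.

```python
import math

# Render times in seconds, copied from the table above.
tests = ["Radiosity", "Throne Shadowmap", "CBALLS2", "SinglePipe2",
         "Underwater", "SpaceFlyby", "UnderwaterEscape"]
nehalem = [12.891, 11.193, 5.729, 20.771, 24.112, 30.66, 27.357]
penryn = [19.652, 14.186, 13.547, 30.249, 32.451, 33.511, 31.883]

# Lower render time is better, so speedup = Penryn time / Nehalem time.
speedups = {t: p / n for t, p, n in zip(tests, penryn, nehalem)}


def geometric_mean(values, weights=None):
    """Weighted geometric mean; equal weights are assumed if none are given."""
    values = list(values)
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)


overall = geometric_mean(speedups.values())

for name, s in speedups.items():
    print(f"{name:18s} {s:.2f}x")
print(f"Unweighted geometric mean: {overall:.2f}x")
```

Passing a list of per-test weights as the second argument would reproduce a weighted composite, if those weights were known.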


The CBALLS2 workload is where we see the biggest speedup with Nehalem: performance more than doubles. It turns out that CBALLS2 calls a function in the Microsoft C Runtime Library (msvcrt.dll) that can magnify the Core architecture's performance penalty when accessing data that is not aligned with cache line boundaries. Through some circuit tricks, Nehalem now has significantly lower latency for unaligned cache accesses, and thus we see a huge improvement in the CBALLS2 score here. CBALLS2 is the only workload within our SPECapc 3dsmax test that really stresses the unaligned cache access penalty of the current Core architecture, but there's a pretty strong performance improvement across the board in 3dsmax.
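
To make "not aligned with cache line boundaries" concrete: an access is split when its first and last bytes fall in different cache lines. The helper below is a hypothetical illustration, not code from 3dsmax or msvcrt.dll, and it assumes 64-byte cache lines.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size


def crosses_cache_line(address: int, size: int, line: int = CACHE_LINE_BYTES) -> bool:
    """Return True if an access of `size` bytes at `address` touches more than one cache line."""
    first_line = address // line
    last_line = (address + size - 1) // line
    return first_line != last_line


# A 16-byte load at offset 48 covers bytes 48..63 and stays inside one line.
print(crosses_cache_line(48, 16))  # False
# The same load at offset 56 covers bytes 56..71 and straddles two lines.
print(crosses_cache_line(56, 16))  # True
```

Accesses of the second kind are the ones that pay the penalty described above.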

Nehalem is just over 40% faster than Penryn, clock for clock, in 3dsmax.

Cinebench R10

A benchmarking favorite, Cinebench R10 is designed to give us an indication of performance in the Cinema 4D rendering application.

Cinebench R10

Cinebench also shows healthy gains with Nehalem: performance went up 20% clock for clock over Penryn.

We also ran the single-threaded Cinebench test to see how performance improved on an individual core basis vs. Penryn (Updated: The original single-threaded Penryn Cinebench numbers were incorrect; we've included the correct ones):

Cinebench R10 - Single Threaded Benchmark

Cinebench shows us only about a 3% increase in core-to-core performance from Penryn to Nehalem at the same clock speed (3015 vs. 2931). For applications that don't go out to main memory much and can stay confined to a single core, Nehalem behaves very much like Penryn. Remember that outside of the memory architecture and HT tweaks to the core, Nehalem's list of improvements is very specific (e.g. faster unaligned cache accesses).

The single thread to multiple thread scaling of Penryn vs. Nehalem is also interesting:

Cinebench R10                            1 Thread   N-Threads   Speedup
Nehalem (2.66GHz)                        3015       12596       4.18x
Core 2 Quad Q9450 - Penryn - (2.66GHz)   2931       10445       3.56x

The speedup confirms what you'd expect in a well-threaded FP test like Cinebench: Nehalem manages to scale better thanks to Hyper Threading. If Nehalem had the same 3.56x scaling factor that we saw with Penryn, it would score 10733, virtually in line with Penryn's 10445. It's Hyper Threading that puts Nehalem over the edge and accounts for the rest of the gain here.
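
The arithmetic behind that comparison can be sketched in a few lines, using only the scores from the Cinebench table. The variable names are illustrative.

```python
# Cinebench R10 scores from the table above.
nehalem_1t, nehalem_nt = 3015, 12596
penryn_1t, penryn_nt = 2931, 10445

# Multi-threaded scaling = N-thread score / single-thread score.
nehalem_scaling = nehalem_nt / nehalem_1t
penryn_scaling = penryn_nt / penryn_1t

# Hypothetical Nehalem score if it only scaled as well as Penryn,
# using the rounded 3.56x factor quoted in the text.
nehalem_at_penryn_scaling = nehalem_1t * 3.56

print(f"Nehalem scaling: {nehalem_scaling:.2f}x")
print(f"Penryn scaling:  {penryn_scaling:.2f}x")
print(f"Nehalem at 3.56x: {nehalem_at_penryn_scaling:.0f}")
print(f"Measured multi-threaded gain over Penryn: {nehalem_nt / penryn_nt - 1:.1%}")
```

The gap between the hypothetical score and the measured 12596 is the portion of the gain the text attributes to Hyper Threading.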

While many 3D rendering and video encoding tests can take at least some advantage of more threads, what about applications that don't? One aspect of Nehalem's performance we're really not stressing much here is its IMC (integrated memory controller) performance, since most of these benchmarks ended up being more compute intensive. Where HT doesn't give it the edge, we can expect some pretty reasonable gains from Nehalem's IMC alone. The Nehalem we tested here is crippled in that respect thanks to a premature motherboard, but gains on the order of 20% in single or lightly threaded applications are a good expectation to have.

POV-Ray 3.7 Beta 24

POV-Ray is a popular raytracer that also ships with a built-in benchmark. We used the 3.7 beta, which has SMP support, and ran the built-in multithreaded benchmark.

POV-Ray 3.7 Beta 24

Finally, POV-Ray echoes what we've seen elsewhere, with a 36% performance improvement over the 2.66GHz Core 2 Quad Q9450. Note that Nehalem continues to be faster than even the fastest Penryns available today, despite the lower clock speed of this early sample.

108 Comments

  • SiliconDoc - Monday, July 28, 2008 - link

    Oh yeah, and we're getting the knocked down lesser pins version probably, though not set in stone they won't be able to resist bending us all over and making all the massive die and tool and cutting restructurings required to pump out the lesser pinned models... while they tell us "it's cost effective" ( means they can charge 18 different rates and swirl the markets in confusion and gigantic price differences for mere few percentage performance differences).
    They sure have a lot of time to diggle around with it all, don't they- and a lot of capacity, a lot of marketers, a lot of board makers/changers...
    Oh gawd it's a multi-tentacled monster... just realize they had their group megaspam session and have figured the most confusing, confounding, and master profiteering into it all. It's got nothing to do with practicality or delivering us the performance we desire. NOTHING.
  • gochichi - Friday, June 6, 2008 - link

    Someone mentioned the breaking laws in the past (intel did).

    Just look at the distress that AMD is under. While they had the superior products, they couldn't make deals with Dell and so on. As soon as they were finally able to make deals fairly, Intel obliterated them on performance.

    So while they should have been piling up an R&D fund during their "crown years" they hardly grew. To the extent that even though their CPUs are not competitive they are still growing in overall market share.

    I gotta balance my desire for performance now, and my ongoing desire for performance. I can't imagine how having AMD wiped out would be good for the long term. Performance is moving up surely enough but why can't we have the full rate of improvement? I mean, let's stop polluting the world with obsolete brand new equipment. I think the legal battle between Intel and AMD prevents Intel from eliminating AMD. The more they beat up on AMD, the higher the damages of their breaking the law and the higher the penalty for Intel.

    I think AMD can make a strong comeback though. They had a sloppy start with the AMD-ATI merger but ATI is actually not far at all from NVIDIA in terms of design and performance. These pendulums do swing, and perhaps AMD's chips will be better next time. I think the price-point wars are the most important. If you can deliver a nice quad-core or 3x core for about $100.00 you're gonna be in business or at least have market share.

  • BSMonitor - Friday, June 6, 2008 - link

    Giving a company incentives to exclusively sell your products is not a violation of any law. Aka, is E.A. Sports in violation of the law by signing an exclusive contract with the NFLPA ? No. How many GM dealers sell more than GM lines of cars? Not many... There are many other reasons to be exclusive besides a "monopoly deal".

    Were Dell customers complaining about not having the choice of AMD processors? Not enough of them, clearly. You think for a second Dell would lose market share for Intel? Sorry, the answer is Hell No.

    When AMD did have a strong processor lineup, they also hit manufacturing capacity walls.... Quite simply, AMD does not have the capacity to fill Intel's market share. It's not like there were AMD processors on the shelves because Dell was exclusively Intel...

    Intel has more Fabs. Fabs don't get built overnight to meet demand... Now, AMD has inferior products and a couple more Fabs... Too little too late as they say...
  • hs635 - Tuesday, June 17, 2008 - link

    Get aids and die painfully cunt
  • Justin Case - Sunday, June 8, 2008 - link

    [quote]Giving a company incentives to exclusively sell your products is not a violation of any law.[/quote]

    Actually, it is, if you control more than a certain share (typically 50%) of the market.

    You can give volume discounts but you cannot make the cost depend on what other products your client sells.

    If you're under that "critical" market share, you can do pretty much anything you want. Above it, the rules change (and there are very good reasons for that, as anyone who's studied macroeconomy knows).

    There's really no need to come up with "examples" or ill-fitting "analogies". That's just the way the law is, and everyone who studied trade law knows that (including Intel's legal department). They've already been fined in Korea, they're on their way to being fined in the EU and Japan, and they'll probably be fined in the US too.

    Unless they bribe the right people like Microsoft did, of course.
  • SiliconDoc - Monday, July 28, 2008 - link

    I caught a couple articles on how Nvidia was hammering vendors for price structures - and how they were going to do it, a bit ahead of time of when it hit. Yeah, it hit, I saw it, eggs (hint) were broken all over the place.
    It's a kind of tyranny... lol
    Uhh, thank computers I guess, since they've made everything like that so easy to track and enforce ("private" enforcement not law enforcement)...
    Expect a lot more of it, too. Everything moves so fast in business, and courts move so slowly.
  • The Zerg - Friday, June 6, 2008 - link

    Guys... here's an example of bad luck, bad tech or both:

    I work in a corporation. A very large one, the largest in a specific industry.
    We use Intel-based CPUs. Worldwide.
    My Centrino (in its Dell Latitude incarnation) died two days ago (causes unknown - and this caused a lot of trouble). Be sure that I had some nice words for Intel in that moment.
    I use AMD at home (it was the best bang for the buck at that time). One week ago (and Hell YES, this is the bare truth) my ASUS motherboard died, together with an Athlon 3500+.
    See? Nobody's perfect. Maybe 2 strong CPU players (makers) are better than just one. Maybe I will not use an ASUS motherboard next time, because I have another 3-4 serious options...
    For the AMD/Intel fans: I am a Canon fan, but I really respect Nikon, Leica and Sony for their outstanding products. And: I can buy a 1Ds Mark III, but I currently own a 40D - "because I can 95% of the games with it"
    And there is never too little too late for a World Press Photo award :)
  • Barack Obama - Friday, June 6, 2008 - link

    Nehalem is looking to be beastly good. Let's see if it can combo well with Windows 7 and its multi-touch capabilities.
  • Egglick - Thursday, June 5, 2008 - link

    Here is my biggest question: Will these chips work with DDR2? In my opinion, DDR3 still isn't worth the price premium by a long shot.
  • coldpower27 - Friday, June 6, 2008 - link

    This shouldn't be much of an issue by the time this thing ships for mainstream platforms ala LGA1160, sometime in Early-Mid 2009.

    DDR3 is still cost prohibitive now, you're looking at about 2x as much for the same amount of memory. However in 6-9 months prices can change a lot.

