Faster Unaligned Cache Accesses & 3D Rendering Performance

3dsmax r9

Our benchmark, as always, is the SPECapc 3dsmax 8 test, but for the purposes of this article we only ran the CPU rendering tests and not the GPU tests.

3dsmax 9

The results are reported as render times in seconds and the final CPU composite score is a weighted geometric mean of all of the test scores.

| CPU / 3dsmax Score Breakdown | Radiosity | Throne Shadowmap | CBALLS2 | SinglePipe2 | Underwater | SpaceFlyby | UnderwaterEscape |
|---|---|---|---|---|---|---|---|
| Nehalem (2.66GHz) | 12.891s | 11.193s | 5.729s | 20.771s | 24.112s | 30.66s | 27.357s |
| Penryn (2.66GHz) | 19.652s | 14.186s | 13.547s | 30.249s | 32.451s | 33.511s | 31.883s |


The CBALLS2 workload is where we see the biggest speedup with Nehalem: performance more than doubles. It turns out that CBALLS2 calls a function in the Microsoft C Runtime Library (msvcrt.dll) that can magnify the Core architecture's performance penalty when accessing data that is not aligned with cache line boundaries. Through some circuit tricks, Nehalem now has significantly lower-latency unaligned cache accesses, and thus we see a huge improvement in the CBALLS2 score here. CBALLS2 is the only workload within our SPECapc 3dsmax test that really stresses the unaligned cache access penalty of the current Core architecture, but there's a pretty strong performance improvement across the board in 3dsmax.
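To make "not aligned with cache line boundaries" concrete, here is a minimal sketch of how one might check whether a memory access straddles two cache lines. It assumes 64-byte cache lines; the function names and the example addresses are hypothetical and are not taken from msvcrt.dll or from the benchmark itself.

```python
CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines


def is_naturally_aligned(address: int, size: int) -> bool:
    """True if the access starts on a multiple of its own size."""
    return address % size == 0


def crosses_cache_line(address: int, size: int,
                       line_bytes: int = CACHE_LINE_BYTES) -> bool:
    """True if the bytes [address, address + size) touch two cache lines."""
    first_line = address // line_bytes
    last_line = (address + size - 1) // line_bytes
    return first_line != last_line


# Hypothetical 16-byte accesses at different starting offsets.
for addr in (48, 60, 64):
    print(addr, is_naturally_aligned(addr, 16), crosses_cache_line(addr, 16))
```

An access that starts at offset 60 and reads 16 bytes touches both the first and second line, which is the kind of split access that tends to be most expensive on the older core.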

Nehalem is just over 40% faster than Penryn, clock for clock, in 3dsmax.
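As a rough cross-check of that figure, the per-test render times above can be combined with a geometric mean. This is only a sketch: the real SPECapc composite uses its own weights, which are not given here, so equal weights are assumed and the helper name `weighted_geomean` is hypothetical.

```python
import math


def weighted_geomean(values, weights):
    """Weighted geometric mean: exp(sum(w * ln(v)) / sum(w))."""
    total_weight = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)


# Render times in seconds, in the column order of the table above.
nehalem = [12.891, 11.193, 5.729, 20.771, 24.112, 30.66, 27.357]
penryn = [19.652, 14.186, 13.547, 30.249, 32.451, 33.511, 31.883]

# Lower time is better, so speedup = Penryn time / Nehalem time.
speedups = [p / n for p, n in zip(penryn, nehalem)]

# Assumption: equal weights, since the actual SPECapc weights are unknown here.
equal_weights = [1.0] * len(speedups)
composite_speedup = weighted_geomean(speedups, equal_weights)

print(f"CBALLS2 speedup: {speedups[2]:.2f}x")
print(f"Equal-weight geometric mean speedup: {composite_speedup:.2f}x")
```

With equal weights the result lands at roughly 1.4x, in the same range as the composite figure quoted above, though the two are not computed the same way.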

Cinebench R10

A benchmarking favorite, Cinebench R10 is designed to give us an indication of performance in the Cinema 4D rendering application.

Cinebench R10

Cinebench also shows healthy gains with Nehalem: performance went up 20% clock for clock over Penryn.

We also ran the single-threaded Cinebench test to see how performance improved on an individual core basis vs. Penryn (Updated: the original single-threaded Penryn Cinebench numbers were incorrect; we've included the correct ones):

Cinebench R10 - Single Threaded Benchmark

Cinebench shows us only about a 3% increase in core-to-core performance from Penryn to Nehalem at the same clock speed. For applications that don't go out to main memory much and can stay confined to a single core, Nehalem behaves very much like Penryn. Remember that outside of the memory architecture and HT tweaks to the core, Nehalem's list of improvements is very specific (e.g. faster unaligned cache accesses).

The single thread to multiple thread scaling of Penryn vs. Nehalem is also interesting:

| Cinebench R10 | 1 Thread | N Threads | Speedup |
|---|---|---|---|
| Nehalem (2.66GHz) | 3015 | 12596 | 4.18x |
| Core 2 Quad Q9450 - Penryn - (2.66GHz) | 2931 | 10445 | 3.56x |

 

The speedup confirms what you'd expect in a well-threaded FP test like Cinebench: Nehalem manages to scale better thanks to Hyper-Threading. If Nehalem had the same 3.56x scaling factor that we saw with Penryn, it would score 10733, virtually in line with Penryn's 10445. It's Hyper-Threading that puts Nehalem over the edge and accounts for the rest of the gain here.
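The arithmetic behind that comparison can be sketched as follows. The scores come from the table above; the variable names are illustrative, and attributing the leftover gain entirely to Hyper-Threading is the article's interpretation rather than something this calculation proves.

```python
# Cinebench R10 scores from the table above.
nehalem_1t, nehalem_nt = 3015, 12596
penryn_1t, penryn_nt = 2931, 10445


def scaling(single_thread: float, multi_thread: float) -> float:
    """Multi-threaded score divided by single-threaded score."""
    return multi_thread / single_thread


nehalem_scaling = scaling(nehalem_1t, nehalem_nt)
penryn_scaling = scaling(penryn_1t, penryn_nt)

# Hypothetical Nehalem score if it only scaled as well as Penryn (3.56x).
hypothetical_nt = nehalem_1t * round(penryn_scaling, 2)

# Remaining gain of the measured score over that hypothetical score.
extra_gain = nehalem_nt / hypothetical_nt - 1

print(f"Nehalem scaling: {nehalem_scaling:.2f}x")
print(f"Penryn scaling:  {penryn_scaling:.2f}x")
print(f"Nehalem at Penryn-like scaling: {hypothetical_nt:.0f}")
print(f"Gain beyond Penryn-like scaling: {extra_gain:.1%}")
```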

While many 3D rendering and video encoding tests can take at least some advantage of more threads, what about applications that can't? One aspect of Nehalem's performance we're really not stressing much here is its integrated memory controller (IMC), since most of these benchmarks ended up being more compute intensive. Where HT doesn't give it the edge, we can expect some pretty reasonable gains from Nehalem's IMC alone. The Nehalem we tested here is crippled in that respect thanks to a premature motherboard, but gains on the order of 20% in single or lightly threaded applications are a reasonable expectation.

 

POV-Ray 3.7 Beta 24

POV-Ray is a popular raytracer that also ships with a built-in benchmark. We used the 3.7 beta, which has SMP support, and ran the built-in multithreaded benchmark.

POV-Ray 3.7 Beta 24

Finally, POV-Ray echoes what we've seen elsewhere, with a 36% performance improvement over the 2.66GHz Core 2 Quad Q9450. Note that Nehalem continues to be faster than even the fastest Penryns available today, despite the lower clock speed of this early sample.

108 Comments

  • Poepstamper - Thursday, June 5, 2008 - link

    im not a fanboy but i like AMD better,i dont like big corporations anyways.
    but im pretty worried if AMD has no answer to this,then we would have to pay lots more for a processor.
  • Genx87 - Thursday, June 5, 2008 - link

    Being an AMD fan and sometimes fanboi over the past 12 years. My last major game rig build was a Core 2 Duo 6600. I did an upgrade 3 weeks ago with an E8400. I built a new computer for a friend who has had AMD chips since 1999 with an E7200.

    AMD needs to start making a show.
  • NullSubroutine - Thursday, June 5, 2008 - link

    First off, I am not a fan of either company, just to get that out of the way.

    You do realize that Nehalem is not or will not be a mainstream product for quite some time into 2009. Enthusiast may get a few chips in limited quantities, probably in the $1500+ range. Otherwise this is designed to be a high end Server processor. It will take some time for it to trickle down to be something most average people will buy and use.

    Intel is hitting back the same way AMD hit at Intel back with the K8. Making a great scaleable high performance server chip and letting it trickle its way down to the mainstream market.

    Trying to compare Nehalem to any AMD processor (or even most Core2/quad) is like trying to compare a Chevy Mailibu to a Formula 1 race car, its just not the same thing.

    What is exciting about Nehalem right now is the technological advancement of some of the stuff Intel has done, and the happiness that it will one day be availble mainstream.

    AMD is not going to be put in a bad position (other than the one its already in) in the mainstream desktop market with Nehalem - not for probably near another year. It will hurt it in the Server market, but at first Intel wont have many of these chips availible, so AMD will have a minor chance with a Shanghai or Bulldozer core - if they can actually execute a launch.

    AMD is also not trying to stay equal with Intel, it doesnt have the resources to do it. You are likely, in any near time frame, going to see AMD come out and just PWN Intel in performance numbers. You will see AMD put together what they call a good 'platform' meaning. You can buy your whole platform, MB/CPU/GPU/etc from AMD and it will be a solid platform.

    It's not going to win bragging rights to a bunch of 'nerds' running gaming websites claming how AMD sucks so much. You will probably see that actually, people saying how 1500 dollar processor pwns some 200 dollar one. AMD isnt currently trying to win performance crowns or win over enthusiast that spend boat loads of money on a CPU (or GPU) they are trying to push the mainstream market, which actually has the largest number of people to sell to. However I am sure they would like to keep their server side doing well (it makes good margins).

    I don't think you are going to see an AMD come back to any pure performance crowns. You may some crowns for price/performance/power for the whole platform.
  • NullSubroutine - Thursday, June 5, 2008 - link

    supposed to say You are NOT likely to see AMD come PWN Intel...

    And you could compare 8 series Opteron to Nehalem...
  • AmberClad - Thursday, June 5, 2008 - link

    That picture of the socket -- I only recall a single board with that colored PCB in the INQ's coverage of the wall of Nehalem boards. Maybe that picture is giving away more than intended, as far as the identity of the company that provided the sample? (I suppose it's possible that whoever leaked the Nehalem sample isn't the same person that provided the motherboard.)
  • RaynorWolfcastle - Thursday, June 5, 2008 - link

    These benches are mighty impressive for such an immature platform. There had better be some serious performance and clock speed bumps in store for AMD's K10.5 or they will be dead in the water.

    Also, is there any indication as to when Intel will start transitioning Nehalem to the mobile space? I have a 1st gen (Core Duo) MacBook Pro that's getting a little long in the tooth and I'm debating whether to jump on the Montevina train or wait for Nehalem mobile. I'd love to get a mobile Nehalem if it launches any time in 1H09.
  • emboss - Friday, June 6, 2008 - link

    Non-Extremely-Expensive-Edition single-socket Nehalems now aren't coming until sometime in 2H09, so you'll probably be lucky to see any mobile Nehalems in 2009 at all.

    As such, I'd say Intel failed to tick on time. Conroe hit mainstream July 06 (eg: E6400). We should be getting mainstream Nehalems in 1 month, not 12+ months.

    Then again, Intel has been futzing around with the release dates quite a bit, so it may get pulled forward.
  • piroroadkill - Thursday, June 5, 2008 - link

    Because Nehalem is frankly so much faster than the already rapid Core 2, that as already said, AMD is going to be struggling for a long time to come.

    Unless some miracle occurs, all I know is right now I want Nehalem.
  • TonyB - Thursday, June 5, 2008 - link

    can it play crysis
  • PeteRoy - Thursday, June 5, 2008 - link

    Exactly, where are the gaming performance? It's the first thing I care about by a long way.

    I don't do all the other stuff you benched on my PC.
