Nehalem's Weakness: Cache

Intel opted for a very Opteron-like cache hierarchy with Nehalem: each core gets a small L2 cache, and all of the cores sit behind one large, shared L3 cache. This sort of setup benefits applications with large codebases that are also well threaded, such as the workloads you'd encounter on a database server. The problem is that the CPU launching today, the Core i7, is designed for the desktop.

Let's look at a quick comparison between Nehalem's and Penryn's cache setups:

                                       Intel Nehalem          Intel Penryn
L1 Size / L1 Latency                   64KB / 4 cycles        64KB / 3 cycles
L2 Size / L2 Latency                   256KB / 11 cycles      6MB* / 15 cycles
L3 Size / L3 Latency                   8MB / 39 cycles        N/A
Main Memory Latency (DDR3-1600 CAS7)   107 cycles (33.4 ns)   160 cycles (50.3 ns)

*Note: Penryn's 6MB L2 is shared by each pair of cores.
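Dividing the L2 sizes in the table by the number of cores sharing them shows how large the per-core gap is. This is a back-of-the-envelope sketch: the helper name `l2_share_per_core_kb` is hypothetical, and it assumes Penryn's shared 6MB L2 is split evenly between its two cores, which a shared cache doesn't actually guarantee.

```python
# Back-of-the-envelope per-core L2 capacity, using the sizes from the table.
# Assumption: a shared L2 is divided evenly among the cores that share it.

KB = 1          # work in kilobytes
MB = 1024 * KB


def l2_share_per_core_kb(l2_size_kb: float, cores_sharing: int) -> float:
    """Hypothetical helper: L2 capacity available to each core, in KB."""
    return l2_size_kb / cores_sharing


nehalem_l2 = l2_share_per_core_kb(256 * KB, 1)  # private 256KB L2 per core
penryn_l2 = l2_share_per_core_kb(6 * MB, 2)     # 6MB L2 shared by 2 cores

print(f"Nehalem L2 per core: {nehalem_l2:.0f} KB")
print(f"Penryn L2 per core:  {penryn_l2:.0f} KB")
print(f"Penryn has {penryn_l2 / nehalem_l2:.0f}x the L2 per core")
```

Nehalem's 8MB L3 works out to 2MB per core on the four-core Core i7, but at 39 cycles it is more than twice as slow as Penryn's 15-cycle L2.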

Nehalem's L2 cache does get a bit faster, but the speed doesn't make up for the lack of size. I suspect that Intel will address the L2 size issue with the 32nm shrink, but until then most applications will have to deal with a significantly reduced L2 cache size per core. The performance impact is mitigated by two things: 1) the fast L3 cache, and 2) the very fast on-die memory controller. Fortunately for Nehalem, most applications can't fit entirely within cache; even the large 6MB and 12MB L2 caches of its predecessors can't contain everything, which gives Nehalem's L3 cache and memory controller room to level the playing field.
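One way to see how the L3 and the faster memory controller can level the playing field is a simple weighted-average latency model: multiply each level's load-to-use latency from the table by the fraction of loads that level serves, then sum. The hit fractions below are invented, illustrative numbers rather than measurements, the function name `expected_latency` is hypothetical, and the model ignores out-of-order execution and overlapping misses, so treat it as a rough sketch only.

```python
# Rough expected load latency: sum of (fraction of loads served at a level)
# x (that level's load-to-use latency in cycles, from the table).
# The hit fractions are HYPOTHETICAL, chosen only for illustration.

NEHALEM_LATENCY = [4, 11, 39, 107]  # L1, L2, L3, main memory
PENRYN_LATENCY = [3, 15, 160]       # L1, L2, main memory


def expected_latency(hit_fractions, latencies):
    """Weighted-average load latency in cycles (ignores overlapping misses)."""
    if len(hit_fractions) != len(latencies):
        raise ValueError("need one hit fraction per cache level")
    if abs(sum(hit_fractions) - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(f * lat for f, lat in zip(hit_fractions, latencies))


# Scenario A (hypothetical): cache-friendly, 1% of loads go to DRAM.
# Penryn's big L2 catches 4%; on Nehalem that same 4% splits across L2 and L3.
a_penryn = expected_latency([0.95, 0.04, 0.01], PENRYN_LATENCY)
a_nehalem = expected_latency([0.95, 0.02, 0.02, 0.01], NEHALEM_LATENCY)

# Scenario B (hypothetical): working set spills out of cache, 5% go to DRAM.
b_penryn = expected_latency([0.90, 0.05, 0.05], PENRYN_LATENCY)
b_nehalem = expected_latency([0.90, 0.03, 0.02, 0.05], NEHALEM_LATENCY)

print(f"A: Penryn {a_penryn:.2f} cycles, Nehalem {a_nehalem:.2f} cycles")
print(f"B: Penryn {b_penryn:.2f} cycles, Nehalem {b_nehalem:.2f} cycles")
```

With these made-up fractions, Penryn comes out slightly ahead in the cache-friendly case (about 5.05 vs. 5.87 cycles), while Nehalem pulls ahead once more loads go out to memory (about 10.06 vs. 11.45 cycles), which is consistent with the idea that Nehalem's advantage depends on how often a workload leaves the cache.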

The end result, as you'll soon see, is that in some cases Nehalem's architecture manages to take two steps forward and two steps back, resulting in zero net improvement over Penryn. The perfect example is 3D gaming, as you can see below:

                    Intel Nehalem (3.2GHz)   Intel Penryn (3.2GHz)
Age of Conan        123 fps                  107.9 fps
Race Driver GRID    102.9 fps                103 fps
Crysis              40.5 fps                 41.7 fps
Far Cry 2           115.1 fps                102.6 fps
Fallout 3           83.2 fps                 77.2 fps

Age of Conan, Far Cry 2, and Fallout 3 show significant performance improvements when not GPU bound, while Crysis and Race Driver GRID show no benefit at all from Nehalem. It's almost Prescott-like in that Intel put a lot of architectural innovation into a design that can, at times, offer no performance improvement over its predecessor. Where Nehalem differs from Prescott is that it can also offer tremendous performance increases, and it sits at the very opposite end of the power efficiency spectrum, but we'll get to that in a moment.
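The per-game differences are easy to compute from the frame rates in the gaming table. In this sketch the function names are hypothetical, and the 5% cutoff for calling a result "no real change" is an arbitrary assumption chosen for illustration, not part of the original test methodology.

```python
# Percent change of Nehalem vs. Penryn (both at 3.2GHz), from the fps table.
RESULTS_FPS = {
    # game: (Nehalem fps, Penryn fps)
    "Age of Conan": (123.0, 107.9),
    "Race Driver GRID": (102.9, 103.0),
    "Crysis": (40.5, 41.7),
    "Far Cry 2": (115.1, 102.6),
    "Fallout 3": (83.2, 77.2),
}

# Assumption: an arbitrary cutoff, not part of the original test methodology.
THRESHOLD_PCT = 5.0


def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def classify(delta_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a percent change using the (assumed) threshold."""
    if delta_pct >= threshold:
        return "faster on Nehalem"
    if delta_pct <= -threshold:
        return "faster on Penryn"
    return "no real change"


for game, (nehalem, penryn) in RESULTS_FPS.items():
    delta = percent_change(nehalem, penryn)
    print(f"{game:<18} {delta:+6.1f}%  {classify(delta)}")
```

Three of the five titles clear the 5% bar: Age of Conan (about +14%), Far Cry 2 (about +12%), and Fallout 3 (about +8%). Crysis and GRID land within a few percent, with Penryn slightly ahead in both.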

Comments

  • Gary Key - Monday, November 3, 2008

    "The 920 to 3.6/3.8 is a nice overclock but I wonder what you mean by proper cooling and how close you came to crossing the 80C "boundary"?"

    It was actually quite easy to do with the retail cooler; in fact, in our multitasking test of playing back a BD title while encoding a BD title, the core temps hit 98C. The Cinebench multi-core test and OCCT both pushed the core temps to 100C at various points. Our tests were run in a closed case loaded with a couple of HD4870 cards, two optical drives, three hard drives, and two case fans.

    Proper cooling (something we will cover shortly) consisted of the Thermalright Xtreme120, Vigor Monsoon II, and Cooler Master V8 along with the Freezone Elite. We were able to keep temps under 70C with a full load on air and around 45C with the Freezone unit.
  • Th3Eagle - Tuesday, November 4, 2008

    Wow, that's interesting. Can't wait to see the new article. Always nice to see an article about coolers.

    Thanks for the reply.
  • Anand Lal Shimpi - Monday, November 3, 2008

    Gary did the i7-920 tests, so I'll let him chime in there. We're also working on an overclocking guide that should help address some of these concerns.

    -A
  • whatthehey - Monday, November 3, 2008

    Tom's? You might as well reference HardOCP....

    Okay, THG sometimes gets things right, but I've seen far too many "exposé" articles where they talk about the end of the world to take them seriously. Ever since the i820 chipset fiasco, they seem to think everything is a big deal that needs a whistleblower.

    Anandtech got 3.8GHz with an i7-920, and I would assume due diligence in performance testing (i.e. it's not just POSTing, but actually running benchmarks and showing a performance improvement). I'm still running an overclocked Q6600, though, and the 3.6GHz I've hit is really far more than I need most of the time. I should probably run at 3.0GHz and shave 50-100W from my power use instead. But it's winter now, and with snow outside it's nice to have a little space heater by my feet!
  • The0ne - Monday, November 3, 2008

    Tom's Hardware and Anandtech were the websites I visited 13 years ago during my college years. Tom's has since been pushed far down my list of sites to visit, mainly due to their poor articles and their ad-littered, poorly designed website. If you have any kind of NoScript blocking enabled, there's quite a bit you need to allow to get the website working. The video commentary is a joke, as they're not professionals who can get the job done professionally... visually, anyhow.

    Anandtech has stayed true to its roots, and although I find some articles a bit confusing, I don't mind them at all. An example of this is the camera reviews :)
  • GaryJohnson - Monday, November 3, 2008

    Geez, calling a Core 2 a space heater. How soon we forget Prescott...
  • JarredWalton - Monday, November 3, 2008

    I think an overclocked Core 2 Quad is still very capable of rating as a space heater. The chips can easily use upwards of 150W when overclocked, which if memory serves is far more than any of the Prescott chips did. After all, we didn't see 1000W PSUs back in the Prescott era, and in fact I had a 350W PSU running a Pentium D 920 at 3.4 GHz without any trouble. :-)
  • Griswold - Tuesday, November 4, 2008

    Funny comparison. If it were just for the space heater argument's sake (well, 150W is by far not enough to qualify as a real space heater, to be honest), I could follow you, but saying the 150W of a 4-core, more-IPC-than-any-P4-can-ever-dream-of processor should or could be compared to the wattage of the infamous thermonuclear furnace AKA Prescott is a bit of a stretch, don't you think? :p
  • Ryan Smith - Monday, November 3, 2008

    Intel can call it supercalifragilisticexpialidocious until they're blue in the face, but take it from a local, it's Neh-Hay-Lem. Just see how it's pronounced in this news segment:

    http://www.katu.com/outdoors/3902731.html?video=YH...
  • mjrpes3 - Monday, November 3, 2008

    Any chance we'll see some database/apache benchmarks based on Nehalem soon?
