Nehalem's Weakness: Cache

Intel opted for a very Opteron-like cache hierarchy with Nehalem: each core gets a small L2 cache, and they all sit behind one large, shared L3 cache. This sort of setup benefits well-threaded applications with large codebases, for example the type of thing you'd encounter on a database server. The problem is that the CPU launching today, the Core i7, is designed to be used in a desktop.

Let's look at a quick comparison between Nehalem and Penryn's cache setups:

Metric                               | Intel Nehalem        | Intel Penryn
L1 Size / L1 Latency                 | 64KB / 4 cycles      | 64KB / 3 cycles
L2 Size / L2 Latency                 | 256KB / 11 cycles    | 6MB* / 15 cycles
L3 Size / L3 Latency                 | 8MB / 39 cycles      | N/A
Main Memory Latency (DDR3-1600 CAS7) | 107 cycles (33.4 ns) | 160 cycles (50.3 ns)

*Note: 6MB per 2 cores
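To put the cycle counts on a common footing, here is a minimal sketch that converts them to nanoseconds and back-computes the clock speed implied by the main memory row. It assumes both chips run at 3.2GHz; the latency table doesn't state a clock, so that figure is borrowed from the gaming comparison further down the page. The function names are purely illustrative.

```python
# Assumption: both CPUs run at 3.2 GHz. The latency table does not state a
# clock speed; 3.2 GHz is taken from the gaming comparison on this page.
ASSUMED_CLOCK_GHZ = 3.2


def cycles_to_ns(cycles, clock_ghz=ASSUMED_CLOCK_GHZ):
    """Convert a latency in core cycles to nanoseconds (1 cycle = 1/f ns at f GHz)."""
    return cycles / clock_ghz


def implied_clock_ghz(cycles, nanoseconds):
    """Back out the clock speed implied by a latency quoted in both units."""
    return cycles / nanoseconds


# Cache latencies in cycles, copied from the table.
cache_latency_cycles = {
    "Nehalem": {"L1": 4, "L2": 11, "L3": 39},
    "Penryn": {"L1": 3, "L2": 15},
}

for cpu, levels in cache_latency_cycles.items():
    for level, cycles in levels.items():
        print(f"{cpu} {level}: {cycles} cycles = {cycles_to_ns(cycles):.2f} ns")

# Main memory rows quote both cycles and nanoseconds, so the clock can be checked.
print(f"Nehalem implied clock: {implied_clock_ghz(107, 33.4):.2f} GHz")
print(f"Penryn implied clock: {implied_clock_ghz(160, 50.3):.2f} GHz")
```

Both main memory rows work out to roughly 3.2GHz, which suggests the cycle counts for the two chips can be compared directly.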

Nehalem's L2 cache does get a bit faster, but the speed doesn't make up for the lack of size. I suspect that Intel will address the L2 size issue with the 32nm shrink, but until then most applications will have to deal with a significantly smaller L2 cache per core. The performance impact is mitigated by two things: 1) the fast L3 cache, and 2) the very fast on-die memory controller. Fortunately for Nehalem, most applications can't fit entirely within cache, so even the large 6MB and 12MB L2 caches of its predecessors can't contain everything. That gives Nehalem's L3 cache and memory controller a chance to level the playing field.
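One rough way to reason about this tradeoff is the textbook average memory access time (AMAT) model. The sketch below plugs the latencies from the table into that model. The miss rates are made-up placeholders chosen only to illustrate the calculation, not measurements of either chip, and the `amat` helper is a hypothetical name.

```python
def amat(cache_levels, memory_latency):
    """Average memory access time in cycles.

    cache_levels: list of (hit_latency_cycles, miss_rate), ordered from L1 outward.
    memory_latency: main memory latency in cycles.
    Works from the outermost level inward: each level costs its own latency
    plus miss_rate times the cost of going one level further out.
    """
    cost = memory_latency
    for hit_latency, miss_rate in reversed(cache_levels):
        cost = hit_latency + miss_rate * cost
    return cost


# Latencies are from the table. Every miss rate is a HYPOTHETICAL placeholder.
L1_MISS = 0.05          # assumed equal for both chips
NEHALEM_L2_MISS = 0.50  # assumed high because the L2 is only 256KB
NEHALEM_L3_MISS = 0.30  # assumed
PENRYN_L2_MISS = 0.20   # assumed low because the L2 is 6MB

nehalem = amat([(4, L1_MISS), (11, NEHALEM_L2_MISS), (39, NEHALEM_L3_MISS)], 107)
penryn = amat([(3, L1_MISS), (15, PENRYN_L2_MISS)], 160)

print(f"Nehalem AMAT: {nehalem:.2f} cycles")
print(f"Penryn AMAT: {penryn:.2f} cycles")
```

Changing the placeholder miss rates moves the result in either direction, which is the point: whether the small L2 hurts depends on how much of an application's working set falls between 256KB and 6MB.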

The end result, as you'll soon see, is that in some cases Nehalem's architecture manages to take two steps forward and two steps back, resulting in zero net improvement over Penryn. The perfect example is 3D gaming, as you can see below:

Game             | Intel Nehalem (3.2GHz) | Intel Penryn (3.2GHz)
Age of Conan     | 123 fps                | 107.9 fps
Race Driver GRID | 102.9 fps              | 103 fps
Crysis           | 40.5 fps               | 41.7 fps
Far Cry 2        | 115.1 fps              | 102.6 fps
Fallout 3        | 83.2 fps               | 77.2 fps
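For a quick read of the table, the frame rates can be turned into percentage changes relative to Penryn. This is a small sketch using only the numbers above; `pct_change` is an illustrative helper name.

```python
def pct_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


# (Nehalem fps, Penryn fps), both at 3.2GHz, copied from the table.
results = {
    "Age of Conan": (123.0, 107.9),
    "Race Driver GRID": (102.9, 103.0),
    "Crysis": (40.5, 41.7),
    "Far Cry 2": (115.1, 102.6),
    "Fallout 3": (83.2, 77.2),
}

for game, (nehalem_fps, penryn_fps) in results.items():
    delta = pct_change(nehalem_fps, penryn_fps)
    print(f"{game}: {delta:+.1f}%")
```

The gains work out to roughly 14% in Age of Conan, 12% in Far Cry 2 and 8% in Fallout 3, while Race Driver GRID is flat and Crysis is about 3% slower.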

 

Age of Conan, Far Cry 2 and Fallout 3 show significant improvements in performance when not GPU bound, while Crysis and Race Driver GRID see absolutely no benefit from Nehalem. It's almost Prescott-like in that Intel put a lot of architectural innovation into a design that can, at times, offer no performance improvement over its predecessor. Where Nehalem differs from Prescott is that it can also offer tremendous performance increases, and it sits on the very opposite end of the power efficiency spectrum, but we'll get to that in a moment.


74 Comments


  • Jingato - Monday, November 3, 2008

    If the 920 can easily be overclocked to 3.8GHz on air, what incentive is there to purchase the 965 for more than triple the price?
  • TantrumusMaximus - Monday, November 3, 2008

    I don't understand why the tests were on such low resolutions... most gamers are running higher res than 1280x1024 etc etc....

    What gives?
  • daniyarm - Monday, November 3, 2008

    Because if they ran gaming benchmarks at higher res, the difference in FPS would be hardly visible and you wouldn't go out and buy a new CPU.
    If they are going to show differences between Intel and AMD CPUs, show Nehalem at 3.2 GHz vs 9950 OC to 3.2 GHz so we can see clock for clock differences in performance and power.
  • npp - Monday, November 3, 2008

    9950 consumes about 30W more at idle than the 965XE, and 30W less under load. I guess that OC'ing it to 3.2GHz will need more than 30W... Given that the 965 can process 4 more threads, I think the result should be more or less clear.
  • tim851 - Monday, November 3, 2008

    Higher resolutions stress the GPU more and it will become a bottleneck. Since the article was focussing on CPU power and not GPU power, they lowered the resolution enough to effectively take the GPU out of the picture.
  • Caveman - Monday, November 3, 2008

    It would be nice to see these CPU reviews use relevant "gaming" benchmarks. It would be good to see the results with something like MS flight simulator FSX or DCS Black Shark, etc... The flight simulators these days are BOTH graphically and calculation intensive, but really stress the CPU.
  • AssBall - Monday, November 3, 2008

    No, they don't, actually.
  • philosofool - Monday, November 3, 2008

    It would have been nice to see a proper comparison of power consumption. Given all of Intel's boasts about being able to shut off cores to save power, I'd like to see some figures about exact savings.
  • nowayout99 - Monday, November 3, 2008

    Ditto, I was wondering about power too.
  • Anand Lal Shimpi - Monday, November 3, 2008

    Soon, soon my friend :)

    -A
