Nehalem's Weakness: Cache

Intel opted for a very Opteron-like cache hierarchy with Nehalem: each core gets a small L2 cache, and they all sit behind one large, shared L3 cache. This sort of setup benefits applications with large codebases that are also well threaded, for example the type of things you'd encounter in a database server. The problem is that the CPU launching today, the Core i7, is designed to be used in a desktop.

Let's look at a quick comparison between Nehalem and Penryn's cache setups:

| | Intel Nehalem | Intel Penryn |
|---|---|---|
| L1 Size / L1 Latency | 64KB / 4 cycles | 64KB / 3 cycles |
| L2 Size / L2 Latency | 256KB / 11 cycles | 6MB* / 15 cycles |
| L3 Size / L3 Latency | 8MB / 39 cycles | N/A |
| Main Memory Latency (DDR3-1600 CAS7) | 107 cycles (33.4 ns) | 160 cycles (50.3 ns) |

*Note: 6MB per 2 cores
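
The cycle and nanosecond figures in the memory latency row are tied together by the core clock. As a rough sketch, assuming both chips run at 3.2GHz (the clock used in the gaming comparison on this page; the latency table itself doesn't state one), the conversion looks like this. The function names are only illustrative:

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds."""
    # At N GHz, N cycles elapse every nanosecond.
    return cycles / clock_ghz


def ns_to_cycles(ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to core clock cycles."""
    return ns * clock_ghz


CLOCK_GHZ = 3.2  # assumption: both CPUs clocked at 3.2GHz

# Nehalem: 107 cycles from the table
nehalem_ns = cycles_to_ns(107, CLOCK_GHZ)

# Penryn: 50.3 ns from the table
penryn_cycles = ns_to_cycles(50.3, CLOCK_GHZ)

print(f"Nehalem: {nehalem_ns:.1f} ns, Penryn: {penryn_cycles:.0f} cycles")
```

Under that assumption Nehalem's 107 cycles works out to roughly 33.4 ns, matching the table, while Penryn's 50.3 ns comes out nearer 161 cycles than 160, which is most likely just rounding in the measurement.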

Nehalem's L2 cache does get a bit faster, but the speed doesn't make up for the lack of size. I suspect that Intel will address the L2 size issue with the 32nm shrink, but until then most applications will have to deal with a significantly reduced L2 cache size per core. The performance impact is mitigated by two things: 1) the fast L3 cache, and 2) the very fast on-die memory controller. Fortunately for Nehalem, most applications can't fit entirely within cache, so even the large 6MB and 12MB L2 caches of its predecessors can't contain everything, which gives Nehalem's L3 cache and memory controller a chance to level the playing field.
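
To see how a smaller L2 can be offset by a fast L3 and faster memory, here is a minimal sketch of an expected-latency calculation using the latencies from the table above. The hit rates are purely hypothetical placeholders, not measurements, and the model assumes each table latency is the total cost of an access that is satisfied at that level:

```python
from typing import List, Tuple


def average_access_cycles(levels: List[Tuple[float, float]],
                          memory_cycles: float) -> float:
    """Expected latency of one memory access, in cycles.

    levels: (latency_cycles, hit_rate) per cache level, L1 first.
    hit_rate is the fraction of accesses reaching that level that hit there.
    """
    expected = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for latency, hit_rate in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    # Whatever misses every cache level pays the main memory latency.
    return expected + reach * memory_cycles


# Latencies come from the table; hit rates are HYPOTHETICAL.
NEHALEM = [(4, 0.95), (11, 0.60), (39, 0.80)]  # L1, small L2, shared L3
PENRYN = [(3, 0.95), (15, 0.90)]               # L1, large L2, no L3

nehalem_avg = average_access_cycles(NEHALEM, memory_cycles=107)
penryn_avg = average_access_cycles(PENRYN, memory_cycles=160)

print(f"Nehalem: {nehalem_avg:.2f} cycles, Penryn: {penryn_avg:.2f} cycles")
```

With these made-up hit rates Penryn's big L2 still comes out ahead on average. Lower the Penryn L2 hit rate (a working set that spills out of 6MB) and the gap closes quickly, because every miss then pays the 160 cycle trip to memory instead of Nehalem's 39 cycle L3 or 107 cycle memory access.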

The end result, as you'll soon see, is that in some cases Nehalem's architecture manages to take two steps forward and two steps back, resulting in zero net improvement over Penryn. The perfect example is 3D gaming, as you can see below:

| | Intel Nehalem (3.2GHz) | Intel Penryn (3.2GHz) |
|---|---|---|
| Age of Conan | 123 fps | 107.9 fps |
| Race Driver GRID | 102.9 fps | 103 fps |
| Crysis | 40.5 fps | 41.7 fps |
| Far Cry 2 | 115.1 fps | 102.6 fps |
| Fallout 3 | 83.2 fps | 77.2 fps |

Age of Conan and Fallout 3 show significant improvements in performance when not GPU bound, while Crysis and Race Driver GRID see absolutely no benefit from Nehalem. It's almost Prescott-like in that Intel put a lot of architectural innovation into a design that can, at times, offer no performance improvement over its predecessor. Where Nehalem differs from Prescott is that it can also offer tremendous performance increases, and it sits on the very opposite end of the power efficiency spectrum, but we'll get to that in a moment.
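
To put numbers on the gaming results, the relative change per title can be worked out directly from the frame rates in the table. This is just a quick sketch; the `percent_change` helper is an illustrative name:

```python
# Frame rates from the gaming table: (Nehalem fps, Penryn fps), both at 3.2GHz
FPS = {
    "Age of Conan": (123.0, 107.9),
    "Race Driver GRID": (102.9, 103.0),
    "Crysis": (40.5, 41.7),
    "Far Cry 2": (115.1, 102.6),
    "Fallout 3": (83.2, 77.2),
}


def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for game, (nehalem, penryn) in FPS.items():
    change = percent_change(nehalem, penryn)
    print(f"{game:<18} {change:+.1f}%")
```

Run over the table, this gives about +14% for Age of Conan and +8% for Fallout 3, while Race Driver GRID is essentially flat and Crysis is about 3% slower.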


73 Comments


  • npp - Tuesday, November 4, 2008

    Well, the funny thing is THG got it all messed up, again - they posted a large "CRIPPLED OVERCLOCKING" article yesterday, and today I saw a kind of apology from them - they seem to have overlooked a simple BIOS switch that prevents the load through the CPU from rising above 100A. Having a month to prepare the launch article, they didn't even bother to tweak the BIOS a bit. That's why I'm not taking their articles seriously, not because they are biased towards Intel or AMD - they are simply not up to the standards (especially those here @anandtech).
  • gvaley - Tuesday, November 4, 2008

    Now give us those 64-bit benchmarks. We already knew that Core i7 will be faster than Core 2, we even knew how much faster.
    Now, it was expected that 64-bit performance will be better on Core i7 than on Core 2. Is that true? Draw a parallel between the following:

    Performance jump from 32- to 64-bit on Core 2
    vs.
    Performance jump from 32- to 64-bit on Core i7
    vs.
    Performance jump from 32- to 64-bit on Phenom
  • badboy4dee - Tuesday, November 4, 2008

    And what are those numbers on the charts there? Are they frames per second? Higher is better then, if that's what they are. Charts need more detail or explanation to them, dude!

    TSM
  • MarchTheMonth - Tuesday, November 4, 2008

    I don't believe I saw this anywhere else, but the spots for the cooler on the mobo, are they the same as on LGA 775? I.e., can we use (non-Intel) coolers that exist now for the new socket?
  • marc1000 - Tuesday, November 4, 2008

    No, the new socket is different. The holes are 80mm apart; on socket 775 they were 72mm apart.
  • Agitated - Tuesday, November 4, 2008

    Any info on whether these parts provide an improvement on virtualized workloads or maybe what the various vm companies have planned for optimizing their current software for nehalem?
  • yyrkoon - Tuesday, November 4, 2008

    Either I am not reading things correctly, or the 130W TDP does not look promising for an end user such as myself who requires/wants a low-power, high-performance CPU.

    The future in my book is using less power, not more, and Intel does not right now seem to be going in this direction. To top things off, the performance increase does not seem to be enough to justify this power increase.

    Being completely off grid (100% solar / wind power), there seem to be very few options... I would like to see this change. Right now as it stands, sticking with the older architecture seems to make more sense.
  • 3DoubleD - Tuesday, November 4, 2008

    130W TDP isn't much worse than previous generations of quad core processors, which were ~100W TDP. Also, TDP isn't a measure of power usage, but of the thermal dissipation a system requires to keep the operating temperature below a set value (e.g. Tjmax). So if Tjmax is lower for i7 processors than it is for past quad cores, it may use the same amount of power but have a higher TDP requirement. The article indicates that power draw has increased, but usually with a large increase in performance. Page 9 of the article has determined that this chip has a greater performance/watt than its predecessors by a significant margin.

    If you are looking for something that is extremely low power, you shouldn't be looking at a quad core processor. Go buy a laptop (or an EeePC-type laptop with an Atom processor). Intel has kept true to its promise of 2% performance increase for every 1% power increase (e.g. a higher performance per watt value).

    Also, you would probably save more power overall if you just hibernate your computer when you aren't using it.
  • Comdrpopnfresh - Monday, November 3, 2008

    Do differing cores have access to another's L2? Is it directly, through QPI, or through L3?
    Also, is the L2 inclusive in the L3; does the L3 contain the L2 data?
  • xipo - Monday, November 3, 2008

    I know games are not the strong area of Nehalem, but there are 2 games I'd like to see tested: Unreal T. 3 and Half Life 2 E2, just to know how Nehalem handles those 2 engines ;D
