Nehalem.

Nuh - hay - lem

At least that's how Intel PR pronounces it.

I've been racking my brain for the past month on how best to review this thing, what angle to take, it's tough. You see, with Conroe the approach was simple: the Pentium 4 was terrible, AMD proudly wore its crown and Intel came in and turned everyone's world upside down. With Nehalem, the world is fine, it doesn't need fixing. AMD's pricing is quite competitive, Intel's performance is solid, power consumption isn't getting out of control...things are nice.

But we've got that pesky tick-tock cadence and things have to change for the sake of change (or more accurately, technological advancement, I swear I'm not getting cynical in my old age):


2008, that's us, that's Nehalem.

Could Nehalem ever be good enough? It's the first tock after Conroe, that's like going on stage after the late Richard Pryor, it's not an enviable position to be in. Inevitably Nehalem won't have the same impact that Conroe did, but what could Intel possibly bring to the table that it hasn't already?

Let's go ahead and get started, this is going to be interesting...

Nehalem's Architecture - A Recap

I spent 15 pages and thousands of words explaining Intel's Nehalem architecture in detail already, but what I'm going to try and do now is summarize that in a page. If you want greater detail please consult the original article, but here are the cliff's notes.



Nehalem, as I've mentioned countless times before, is a "tock" processor in Intel's tick-tock cadence. That means it's a new microarchitecture but based on an existing manufacturing process, in this case 45nm.

A quad-core Nehalem is made up of 731M transistors, down from 820M in Yorkfield, the current quad-core Core 2s based on the Penryn microarchitecture. The die size has gone up, however, from 214 mm^2 to 263 mm^2. That's fewer transistors, but less densely packed ones. Part of this is due to a reduction in cache size and part of it is due to a fundamental rearchitecting of the microprocessor.
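As a rough sanity check on the "less densely packed" point, the figures quoted above can be turned into an average transistor density. This is only a back-of-the-envelope sketch in Python: the function name is made up for illustration, and a whole-die average says nothing about how dense the cache is compared to the logic.

```python
def density_m_per_mm2(transistors_millions, die_area_mm2):
    """Average transistor density in millions of transistors per mm^2."""
    return transistors_millions / die_area_mm2

# Figures as quoted in the article text.
nehalem = density_m_per_mm2(731, 263)
yorkfield = density_m_per_mm2(820, 214)

# Relative density of Nehalem compared to Yorkfield (1.0 would be equal).
relative = nehalem / yorkfield

print(f"Nehalem:   {nehalem:.2f} M transistors/mm^2")
print(f"Yorkfield: {yorkfield:.2f} M transistors/mm^2")
print(f"Nehalem is at {relative:.0%} of Yorkfield's average density")
```

With those inputs, Nehalem lands at roughly 2.8M transistors per mm^2 against about 3.8M for Yorkfield, so its average density is a bit over a quarter lower.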

Nehalem is Intel's first "native" quad-core design, meaning that all four cores are a part of one large, monolithic die. Each core has its own L1 and L2 caches, and all four sit behind a large 8MB L3 cache. The L1 cache remains unchanged in size from Penryn (the current 45nm Core 2 architecture), although its latency is higher at 4 cycles vs. 3. The L2 cache gets a little faster but also gets a lot smaller at 256KB per core, whereas the lowest end Penryns split 3MB of L2 among two cores. The L3 cache is a new addition and serves as a common pool that all four cores can access, which will really help in cache-intensive multithreaded applications (such as those you'd encounter in a server). Nehalem also gets a three-channel, on-die DDR3 memory controller, if you haven't heard by now.
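To put the cache changes in numbers, here is a small Python sketch that adds up the L2 and L3 capacities mentioned above. The variable names are illustrative, and treating a shared L2 as an even "per-core share" is a simplifying assumption: on Penryn either core can in practice use more than half of the shared 3MB.

```python
KB_PER_MB = 1024
CORES = 4

# Nehalem, as described above: private 256KB L2 per core, shared 8MB L3.
nehalem_l2_per_core_kb = 256
nehalem_l3_shared_kb = 8 * KB_PER_MB
nehalem_l2_l3_total_kb = CORES * nehalem_l2_per_core_kb + nehalem_l3_shared_kb

# Lowest end Penryn, as described above: 3MB of L2 shared by two cores.
# Assumption: split evenly, purely for comparison purposes.
penryn_l2_share_per_core_kb = (3 * KB_PER_MB) // 2

# How much smaller Nehalem's private L2 is than that nominal share.
l2_shrink_factor = penryn_l2_share_per_core_kb / nehalem_l2_per_core_kb

print(f"Nehalem L2 + L3 total: {nehalem_l2_l3_total_kb / KB_PER_MB:.0f} MB")
print(f"Penryn nominal L2 share per core: {penryn_l2_share_per_core_kb} KB")
print(f"Nehalem L2 per core is {l2_shrink_factor:.0f}x smaller")
```

Under that even-split assumption, each Nehalem core has a sixth of the L2 that a low-end Penryn core nominally had, with the shared 8MB L3 making up the difference.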

At the core level, everything gets deeper in Nehalem. The CPU is just as wide as before and the pipeline stages haven't changed, but the reservation station, load and store buffers and OoO scheduling window all got bigger. Peak execution power hasn't gone up, but Nehalem should be much more efficient at using its resources than any Core microarchitecture before it.

Once again, to address the server space, Nehalem increases the size of its TLBs and adds a new 2nd level unified TLB. Branch prediction is also improved, but primarily for database applications.

Hyper Threading is back in its typical 2-way fashion, so a single quad-core Nehalem can work on 8 threads at once. Here we have yet another example of Nehalem making more efficient use of the execution resources rather than simply throwing more transistors at the problem. With Penryn, Intel hit nearly 1 billion transistors for a desktop quad-core chip. Clearly Nehalem was an attempt to both address the server market and make more efficient use of those transistors before the next big jump across the billion transistor mark.


74 Comments


  • npp - Tuesday, November 04, 2008 - link

    Well, the funny thing is THG got it all messed up, again - they posted a large "CRIPPLED OVERCLOCKING" article yesterday, and today I saw a kind of apology from them - they seem to have overlooked a simple BIOS switch that prevents the load through the CPU from rising above 100A. Having a month to prepare the launch article, they didn't even bother to tweak the BIOS a bit. That's why I'm not taking their articles seriously, not because they are biased towards Intel or AMD - they are simply not up to the standards (especially those here @anandtech).
  • gvaley - Tuesday, November 04, 2008 - link

    Now give us those 64-bit benchmarks. We already knew that Core i7 will be faster than Core 2, we even knew how much faster.
    Now, it was expected that 64-bit performance will be better on Core i7 than on Core 2. Is that true? Draw a parallel between the following:

    Performance jump from 32- to 64-bit on Core 2
    vs.
    Performance jump from 32- to 64-bit on Core i7
    vs.
    Performance jump from 32- to 64-bit on Phenom
  • badboy4dee - Tuesday, November 04, 2008 - link

    And what are those numbers on the charts there? Are they frames per second? Higher is better then, if that's what they are. Charts need more detail or explanation to them, dude!

    TSM
  • MarchTheMonth - Tuesday, November 04, 2008 - link

    I don't believe I saw this anywhere else, but the spots for the cooler on the mobo, are they the same as on LGA 775? I.e. can we use (non-Intel) coolers that exist now for the new socket?
  • marc1000 - Tuesday, November 04, 2008 - link

    No, the new socket is different. The holes are 80mm apart; on socket 775 they were 72mm apart.
  • Agitated - Tuesday, November 04, 2008 - link

    Any info on whether these parts provide an improvement on virtualized workloads, or maybe what the various VM companies have planned for optimizing their current software for Nehalem?
  • yyrkoon - Tuesday, November 04, 2008 - link

    Either I am not reading things correctly, or the 130W TDP does not look promising for the end user such as myself that requires/wants a low powered high performance CPU.

    The future in my book is using less power, not more, and Intel does not right now seem to be going in this direction. To top things off, the performance increase does not seem to be enough to justify this power increase.

    Being completely off grid(100% solar / wind power), there seem to be very few options . . . I would like to see this change. Right now as it stands, sticking with the older architecture seems to make more sense.
  • 3DoubleD - Tuesday, November 04, 2008 - link

    130W TDP isn't much worse than previous generations of quad core processors, which were ~100W TDP. Also, TDP isn't a measure of power usage, but of the required thermal dissipation of a system to maintain an operating temperature below a set value (e.g. Tjmax). So if Tjmax is lower for i7 processors than it is for past quad cores, it may use the same amount of power, but have a higher TDP requirement. The article indicates that power draw has increased, but usually with a large increase in performance. Page 9 of the article has determined that this chip has a greater performance/watt than its predecessors by a significant margin.

    If you are looking for something that is extremely low power, you shouldn't be looking at a quad core processor. Go buy a laptop (or an EeePC-type laptop with an Atom processor). Intel has kept true to its promise of 2% performance increase for every 1% power increase (eg. a higher performance per watt value).

    Also, you would probably save more power overall if you just hibernate your computer when you aren't using it.
  • Comdrpopnfresh - Monday, November 03, 2008 - link

    Do differing cores have access to another's L2? Is it directly, through QPI, or through L3?
    Also, is the L2 inclusive in the L3; does the L3 contain the L2 data?
  • xipo - Monday, November 03, 2008 - link

    I know games are not the strong area of Nehalem, but there are 2 games I'd like to see tested: Unreal T. 3 and Half Life 2 E2.. just to know how Nehalem handles those 2 engines ;D
