Nehalem.

Nuh - hay - lem

At least that's how Intel PR pronounces it.

I've been racking my brain for the past month on how best to review this thing, what angle to take, it's tough. You see, with Conroe the approach was simple: the Pentium 4 was terrible, AMD proudly wore its crown and Intel came in and turned everyone's world upside down. With Nehalem, the world is fine, it doesn't need fixing. AMD's pricing is quite competitive, Intel's performance is solid, power consumption isn't getting out of control...things are nice.

But we've got that pesky tick-tock cadence and things have to change for the sake of change (or more accurately, technological advancement, I swear I'm not getting cynical in my old age):


2008, that's us, that's Nehalem.

Could Nehalem ever be good enough? It's the first tock after Conroe, that's like going on stage after the late Richard Pryor, it's not an enviable position to be in. Inevitably Nehalem won't have the same impact that Conroe did, but what could Intel possibly bring to the table that it hasn't already?

Let's go ahead and get started, this is going to be interesting...

Nehalem's Architecture - A Recap

I spent 15 pages and thousands of words explaining Intel's Nehalem architecture in detail already, but what I'm going to try and do now is summarize that in a page. If you want greater detail please consult the original article, but here are the cliff's notes.


Nehalem

Nehalem, as I've mentioned countless times before, is a "tock" processor in Intel's tick-tock cadence. That means it's a new microarchitecture but based on an existing manufacturing process, in this case 45nm.

A quad-core Nehalem is made up of 731M transistors, down from 820M in Yorkfield, the current quad-core Core 2s based on the Penryn microarchitecture. The die size has gone up however, from 214 mm^2 to 263 mm^2. That's fewer transistors but less densely packed ones, part of this is due to a reduction in cache size and part of it is due to a fundamental rearchitecting of the microprocessor.

Nehalem is Intel's first "native" quad-core design, meaning that all four cores are a part of one large, monolithic die. Each core has its own L1 and L2 caches, and all four sit behind a large 8MB L3 cache. The L1 cache remains unchanged from Penryn (the current 45nm Core 2 architecture), although it is slower at 4 cycles vs. 3. The L2 cache gets a little faster but also gets a lot smaller at 256KB per core, whereas the lowest end Penryns split 3MB of L2 among two cores. The L3 cache is a new addition and serves as a common pool that all four cores can access, which will really help in cache intensive multithreaded applications (such as those you'd encounter in a server). Nehalem also gets a three-channel, on-die DDR3 memory controller, if you haven't heard by now.

At the core level, everything gets deeper in Nehalem. The CPU is just as wide as before and the pipeline stages haven't changed, but the reservation station, load and store buffers and OoO scheduling window all got bigger. Peak execution power hasn't gone up, but Nehalem should be much more efficient at using its resources than any Core microarchitecture before it.

Once again to address the server space Nehalem increases the size of its TLBs and adds a new 2nd level unified TLB. Branch prediction is also improved, but primarily for database applications.

Hyper Threading is back in its typical 2-way fashion, so a single quad-core Nehalem can work on 8 threads at once. Here we have yet another example of Nehalem making more efficient use of the execution resources rather than simply throwing more transistors at the problem. With Penryn Intel hit nearly 1 billion transistors for a desktop quad-core chip, clearly Nehalem was an attempt to both address the server market and make more efficient use of those transistors before the next big jump and crossing the billion transistor mark.

Multiple Clock Domains and My Concern
POST A COMMENT

74 Comments

View All Comments

  • Spectator - Monday, November 03, 2008 - link

    that sht is totally logical.

    And Im proper impressed. I would do that.

    you can re-process your entire stock at whim to satisfy the current market. that sht deserves some praise, even more so when die shrinks happen. Its an apparently seemless transition. Unless world works it out and learns how to mod existing chips?

    Chukkle. but hey im drunk; and I dont care. I just thought that would be a logical step. Im still waiting for cheap SSD's :P

    Spectator.
    Reply
  • tential - Monday, November 03, 2008 - link

    We already knew nehalem wasn't going to be that much of a game changer. The blog posts you guys had up weeks ago said that because of the cache sizes and stuff not to expect huge gains in performance of games if any. However because of hyperthreading I think there also needs to be some tests to see how multi tasking goes. No doubt those gains will be huge. Virus scanning while playing games and other things should have extremely nice benefits you would think. Those tests would be most interesting although when I buy my PC nehalem will be mainstream. Reply
  • npp - Monday, November 03, 2008 - link

    I'm very curious to see some scientific results from the new CPUs, MATLAB and Mathematica benchmarks, and maybe some more. It's interesting to see if Core i7 can deliver something on these fronts, too. Reply
  • pervisanathema - Monday, November 03, 2008 - link

    I was afraid Nehalem was going to be a game changer. My wallet is grateful that its overall performance gains do not even come close to justifying dumping my entire platform. My x3350 @ 3.6GHz will be just fine for quite some time yet. :)

    Additionally, its relatively high price means that AMD can still be competitive in the budget to low mid range market which is good for my wallet as well. Intel needs competition.
    Reply
  • iwodo - Monday, November 03, 2008 - link

    Since there are virtually no performance lost when using Dual Channel. Hopefully we will see some high performance DDR3 with low Latency next year?
    And which means apart from having half the core, Desktop version doesn't look so bad.

    And since you state the Socket 1366 will be able to sit a Eight Core inside, i expect the 11xx socket will be able to suit a Quad Core as well?

    So why we dont just have 13xx Socket to fit it all? Is the cost really that high?
    Reply
  • QChronoD - Monday, November 03, 2008 - link

    How long are they going to utilize this new socket??
    $284 for the i7-920 isn't bad, but will it be worth the extra to buy a top end board that will appreciate a CPU upgrade 1-2 years later? Or is this going to be useless once Intel Ticks in '10?
    Reply
  • steveyballme - Monday, November 03, 2008 - link

    We worked side by side with Intel to be sure that Vista was optimised for running on this thing!

    http://fakesteveballmer.blogspot.com">http://fakesteveballmer.blogspot.com
    Reply
  • Strid - Monday, November 03, 2008 - link

    Great article. I enjoyed reading it. One thing I stumbled upon though.

    "The PS/2 keyboard port is a nod to the overclocking crowd as is the clear CMOS switch."

    What makes a PS/2 port good for overclockers? I see the use for the clear CMOS switch, but ...
    Reply
  • 3DoubleD - Monday, November 03, 2008 - link

    In my experience USB keyboards do not consistently allow input during the POST screen. If you are overclocking and want to enter the BIOS or cancel an overclock you need a keyboard that works immediately once the POST screen appears. I've been caught with only a USB keyboard and I got stuck with a bad overclock and had to reset the CMOS to gain control back because I couldn't cancel the overclock. Reply
  • Clauzii - Monday, November 03, 2008 - link

    I thought the "USB Legacy support" mode was for exactly that? So legacy mode is for when the PC are booted in DOS, but not during pre? Reply

Log in

Don't have an account? Sign up now