Looking at Nehalem

Let’s start with this diagram:

What we have above is a single Nehalem core, note that you won’t actually be able to buy one of these as it doesn’t include the memory controller, L3 cache and all of what Intel considers to be the un-core. The diagram is drawn correctly; the execution engine isn’t even 1/3 of the core and nearly as much die area is dedicated to the out-of-order scheduling and retirement logic. Now you can understand why Atom is an in-order core.

A single Nehalem core isn’t made up of a majority of cache. Approximately 1/3 of the core is L1/L2 cache, 1/3 is the out of order execution engine and the remaining 1/3 is decode, the branch prediction logic, memory ordering and paging. Obviously this is a bit deceiving since you’re only looking at the core here, the un-core includes a massive 8MB L3 cache which changes the balance of things considerably:

Here we have a full quad-core Nehalem. What Intel calls the un-core is the L3 cache, the I/O, the memory controller logic and the Quick Path Interconnects (QPI). Desktop Nehalem processors will only have one QPI link (QPI 0) while server/workstation chips will have two (QPI 0 and QPI 1).

The Nehalem architecture is designed to be scalable and modular, you will see dual-core, quad-core and eight-core versions in 2009:

Some versions of Nehalem will also include a graphics core, it will be located in the "un-core" of Nehalem as you'll soon see. The graphics won’t be Larrabee based, it will simply be a derivative of the current G45 architecture.

Index Not Another Conroe


View All Comments

  • retardliquidator - Thursday, August 21, 2008 - link

    ... think again.

    more luck next time before starting the flamebait about not two bytes wide but 20bits.

    effective usable speed is exactly 2bytes, as with 10/8 coding you need 20bits to encode your 16 relevant ones.

    you fail at failing.
  • defter - Friday, August 22, 2008 - link

    Links are 20-bit wide, regardless of encoding or whether 1,2,8,16 or 20 bits are used to tranmist the data.

    I wonder who is flamebaiting here, a previous poster just mentioned the correct link width, he wasn't talking about "usable speed".
  • rbadger - Thursday, August 21, 2008 - link

    "Each QPI link is bi-directional supporting 6.4 GT/s per link. Each link is 2-bytes wide..."

    This is actually incorrect. Each link is 20 bits wide, not 16 (2 bytes). This information is on the slide posted directly below the paragraph.
  • JarredWalton - Thursday, August 21, 2008 - link

    It's 20-bits but using a standard 8/10 encoding mechanism, so of the 20 bits only 16 are used to transmit data and the other four bits are (I believe) for clock signaling and/or error correction. It's the same thing we see with SATA and HyperTransport. Reply
  • ltcommanderdata - Thursday, August 21, 2008 - link

    Since the PCU has a firmware, I wonder if it will be updatable? It would be useful if lessons learn in the power management logic of later steppings and in Westmere can be brought back to all Nehalems through a firmware update for lower power consumption or even better performance with better Turbo mode application. Although a failed or corrupt firmware update on a CPU could be very problematic. Reply
  • wingless - Thursday, August 21, 2008 - link

    I thought about this when I read about it the first time too. Flashing your CPU could kill the power management or the whole CPU in one fell swoop! Reply

Log in

Don't have an account? Sign up now