Nehalem will support 2-way SMT (two threads per core), much like the Pentium 4 did before it. Nehalem has a shorter pipeline than NetBurst and a greater ability to get data to the cores, so SMT has more opportunity to increase parallelism (and thus performance) on Nehalem than it did on Pentium 4.

The cache subsystem of Nehalem is almost entirely changed from Penryn. While Nehalem has the same 32KB L1 instruction and data caches as Penryn, the L2 and L3 caches are brand new. Each core in a quad-core Nehalem now has its own, much smaller 256KB L2 cache, which Intel is calling "low latency" (potentially lower latency than Penryn's L2 thanks to the smaller cache size). In place of the shared L2, Intel equipped Nehalem with a large 8MB L3 cache that is fully shared by all cores.

This setup seems very similar to AMD's Phenom architecture, though it is obviously built on Intel's Core 2 base. The major difference is that Nehalem's cache hierarchy is inclusive, not exclusive like AMD's. In an inclusive architecture, each larger level of cache also holds a copy of the data in the smaller cache levels closer to the core.
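To make the inclusive versus exclusive trade-off concrete, here is a minimal sketch of the capacity arithmetic using the cache sizes quoted above. The function name `unique_capacity` is hypothetical, and the model is deliberately simplified: it assumes a fully inclusive L3 duplicates everything in the private caches, and that a fully exclusive design would duplicate nothing.

```python
KB = 1024
MB = 1024 * KB

def unique_capacity(l1_per_core, l2_per_core, l3_shared, cores, inclusive):
    """Bytes of distinct data the on-chip caches can hold (simplified model)."""
    private = cores * (l1_per_core + l2_per_core)
    if inclusive:
        # Assumption: every line in a private cache is also present in L3,
        # so the private caches add no unique capacity.
        return l3_shared
    # Assumption: a fully exclusive hierarchy never duplicates a line.
    return l3_shared + private

# Sizes from the article: 32KB L1 instruction + 32KB L1 data, 256KB L2 per core,
# 8MB shared L3, four cores.
l1 = 32 * KB + 32 * KB
nehalem_inclusive = unique_capacity(l1, 256 * KB, 8 * MB, cores=4, inclusive=True)
hypothetical_exclusive = unique_capacity(l1, 256 * KB, 8 * MB, cores=4, inclusive=False)

# Capacity spent on duplicates in the inclusive case, in KB.
duplicated_kb = (hypothetical_exclusive - nehalem_inclusive) // KB
print(duplicated_kb)
```

Under these assumptions the inclusive design gives up 1280KB (1.25MB) of unique capacity out of 8MB. In exchange, a lookup in the L3 alone is enough to tell whether a line could be in any core's private cache.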

Nehalem effectively adopts the only remaining advantages AMD held over Intel, namely memory performance and interconnect speed, and you can expect a tremendous performance increase going from Penryn to Nehalem because of this. Intel is expecting memory accesses to be around twice as fast in Nehalem as they are in Penryn, where they are already incredibly fast thanks to aggressive prefetchers. If you think Intel's performance advantage is significant today, Nehalem should completely redefine your perspective - AMD needs its Bobcat and Bulldozer cores if it wants to compete.

Intel has also added a new 2nd level TLB in Nehalem, similar in approach to its new 2nd level branch predictor. The first level TLB does a good job of keeping the cores fed quickly, but if a virtual-to-physical address mapping isn't found in the first level TLB, Nehalem can now look in the second level TLB before falling back to the page tables in the caches and memory, which keeps performance high and latency low.
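The lookup order described above can be sketched as a toy model. This is not Intel's implementation: the class names, the LRU replacement policy, and the tiny table sizes are all assumptions chosen for illustration, and a plain dictionary stands in for the page tables.

```python
from collections import OrderedDict

class LRUTable:
    """Small fully associative table with LRU replacement (assumed policy)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, pfn):
        self.entries[vpn] = pfn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

class TwoLevelTLB:
    def __init__(self, l1_entries, l2_entries, page_table):
        self.l1 = LRUTable(l1_entries)
        self.l2 = LRUTable(l2_entries)
        self.page_table = page_table  # stand-in for the real page tables
        self.stats = {"l1_hit": 0, "l2_hit": 0, "page_walk": 0}

    def translate(self, vpn):
        """Map a virtual page number to a physical frame number."""
        pfn = self.l1.lookup(vpn)
        if pfn is not None:
            self.stats["l1_hit"] += 1
            return pfn
        pfn = self.l2.lookup(vpn)
        if pfn is not None:
            self.stats["l2_hit"] += 1      # cheaper than a page walk
        else:
            self.stats["page_walk"] += 1   # slowest path
            pfn = self.page_table[vpn]
            self.l2.insert(vpn, pfn)
        self.l1.insert(vpn, pfn)
        return pfn

# Hypothetical workload: 4 pages touched twice, with a 2-entry first level.
page_table = {vpn: vpn + 1000 for vpn in range(8)}
tlb = TwoLevelTLB(l1_entries=2, l2_entries=8, page_table=page_table)
for _ in range(2):
    for vpn in range(4):
        tlb.translate(vpn)
print(tlb.stats)
```

In this contrived run the working set is larger than the first level, so every access in the second pass misses there but hits in the second level. Without the second level, all four of those accesses would have been page walks, which is the case the new TLB is meant to catch.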

The TLB enhancements look to be particularly well suited to server workloads, and we suspect that Intel may be looking to really take on Opteron with Nehalem.

Above you see examples of the first Nehalem platforms - they should look very similar to the block diagrams of AMD K8 platforms we've seen for years now. The first high end desktop Nehalem parts will have an integrated 3-channel DDR3 memory controller supporting DDR3-800, DDR3-1066 and DDR3-1333.
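For a rough sense of what three channels of DDR3 could deliver, here is a back-of-the-envelope calculation of theoretical peak bandwidth. These figures are not from Intel; the sketch assumes the standard 64-bit (8-byte) DDR3 channel width and treats the speed grade as megatransfers per second. Sustained real-world bandwidth will be lower.

```python
CHANNELS = 3
BYTES_PER_TRANSFER = 8  # assumption: standard 64-bit DDR3 channel

def peak_bandwidth_gb_s(mega_transfers_per_s, channels=CHANNELS):
    """Theoretical peak in GB/s (1 GB = 10**9 bytes)."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s * 1e6 / 1e9

# Speed grades listed in the article.
peaks = {speed: peak_bandwidth_gb_s(speed) for speed in (800, 1066, 1333)}
for speed, gb_s in peaks.items():
    print(f"DDR3-{speed}: {gb_s:.1f} GB/s peak across {CHANNELS} channels")
```

Under these assumptions the theoretical peak works out to roughly 19.2GB/s, 25.6GB/s and 32.0GB/s for the three speed grades.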

On the server side you'll see registered memory support from Nehalem's IMC.

View All Comments

  • haplo602 - Tuesday, March 18, 2008 - link

    I wonder what the real world usage will be. I mean, first you need to get Microsoft to code a new version of Windows to eat all that horsepower. Then you are back at the beginning... You have more cores but Windows is using most of them again (or not using all of them in the case of an old version).

    Anyway I don't see any significant benefits of these CPUs except highend server and workstation load.

    Consumer will drift more into the console or memory/specialised processing unit (GPU, sound processors ...) markets ...
  • oldhoss - Tuesday, March 18, 2008 - link

    The screwdriver is actually fuel for the IFRPS (Intel Fusion Reactor power supply), rated @ 1.21 Jigawatts! ;-P
  • brshoemak - Monday, March 17, 2008 - link

    I assume I'm not the only one who notices the glass of OJ on the 4 core Nehalem system? Kinda odd as I doubt they carry a lot of spares around.
  • ryback - Tuesday, March 18, 2008 - link

    It's not OJ. It's a screwdriver.
  • tmouse - Tuesday, March 18, 2008 - link

    It's part of the new processor cooling system. Also Intel's additional strategy Tick, Tock, Crock : enough alcohol = even BETTER coverage by the press ;)
  • 7Enigma - Tuesday, March 18, 2008 - link

    HAHAHAHA! Very nice.
  • Imaginer - Monday, March 17, 2008 - link

    With Intel doing things that way, I would expect the PC platform to finally have a standard instruction set for graphics processing, similar to general purpose computing with the x86 standard. Would that mean that it would be A LOT EASIER for game developers to produce for the PC, akin to the way they are specializing for a particular console right now?

    I like that idea very much. Hopefully AMD/ATi and Nvidia would eventually be in on the standard as well.
  • Griswold - Tuesday, March 18, 2008 - link

    I too think you got it all wrong on that one. See the other comment.
  • kaddar - Monday, March 17, 2008 - link

    No, because in general game development isn't done on instruction sets or assembly, it's done in programming languages utilizing APIs. Specifically, DirectX or OpenGL. The architecture is abstracted away, and rightly so.
  • Nihility - Monday, March 17, 2008 - link

    Sounds pretty exciting. The huge cache on the Penryn procs does a pretty good job of negating the side effects of the slower memory interconnect so I'd be surprised if we see huge gains from Nehalem just because of the memory part as it wasn't that big of a bottleneck. Probably see more benefits on the server side. However, 8 cores is definitely a treat.
