Nehalem will support 2-way SMT (two threads per core), much like the Pentium 4 did before it. With a shorter pipeline than NetBurst and a greater ability to get data to the cores, there's more opportunity for increased parallelism (and thus performance) thanks to SMT on Nehalem than on Pentium 4.

The cache subsystem of Nehalem is almost entirely changed from Penryn. While Nehalem has the same 32KB L1 instruction and data caches as Penryn, the L2 and L3 caches are brand new. Each core in a quad-core Nehalem now has its own smaller 256KB L2 cache, which Intel is calling "low latency" (potentially lower latency than Penryn's thanks to the smaller cache size). Having ditched the shared L2, Intel equipped Nehalem with a large 8MB L3 cache that is fully shared by all cores.

This setup seems very similar to AMD's Phenom architecture, though it is obviously built on Intel's Core 2 base. The major difference is that Nehalem's cache hierarchy is inclusive, not exclusive like AMD's. An inclusive architecture means that each cache level holds a copy of the data in the levels below it (those closer to the core), so anything sitting in L1 or L2 is also present in L3.
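To make the inclusive versus exclusive trade-off concrete, here is a rough back-of-the-envelope sketch using the cache sizes quoted above. It assumes an idealized model: a strictly inclusive hierarchy stores nothing beyond what fits in L3, while a strictly exclusive one never duplicates a line across levels. Real designs are not necessarily this strict, and the function name is purely illustrative.

```python
KB = 1024

def unique_capacity(l1_per_core, l2_per_core, l3_shared, cores, inclusive):
    """Bytes of distinct data the whole hierarchy can hold (idealized model)."""
    if inclusive:
        # Everything in L1/L2 is duplicated in L3, so L3 sets the limit.
        return l3_shared
    # Strictly exclusive: no line lives in two levels, so capacities add up.
    return l3_shared + cores * (l1_per_core + l2_per_core)

# Quad-core figures from the article: 32KB L1I + 32KB L1D, 256KB L2, 8MB L3.
l1 = (32 + 32) * KB
l2 = 256 * KB
l3 = 8 * 1024 * KB

for inclusive in (True, False):
    label = "inclusive" if inclusive else "exclusive"
    total_kb = unique_capacity(l1, l2, l3, cores=4, inclusive=inclusive) // KB
    print(f"{label}: {total_kb} KB of unique data")
```

The inclusive design gives up some effective capacity, but in exchange a miss in the L3 guarantees that no core's private cache holds the line, which cuts down on snoop traffic between cores.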

Nehalem effectively adopts the only remaining advantages AMD held over Intel with respect to memory performance and interconnect speed, so you can expect a tremendous performance increase going from Penryn to Nehalem. Intel expects memory accesses to be around twice as fast in Nehalem as in Penryn, which is already incredibly fast thanks to its aggressive prefetchers. If you think Intel's performance advantage is significant today, Nehalem should completely redefine your perspective. AMD needs its Bobcat and Bulldozer cores if it wants to compete.

Intel has also added a new second-level TLB in Nehalem, similar in approach to its new second-level branch predictor. The first-level TLB does a good job of keeping the cores fed quickly, but if a virtual-to-physical address mapping isn't found there, Nehalem can now look in the second-level TLB instead of immediately going to the cache for the page table entry, keeping performance high and latency low.
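The lookup order described above can be sketched as a toy model. Everything here is an assumption made for illustration: the entry counts, the LRU replacement policy, the 4KB page size, and the dictionary standing in for the page tables are not Intel's actual design.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # 4KB pages, assumed for illustration

class LruTlb:
    """A tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical frame number

    def lookup(self, vpn):
        if vpn in self.map:
            self.map.move_to_end(vpn)  # mark as most recently used
            return self.map[vpn]
        return None

    def insert(self, vpn, pfn):
        self.map[vpn] = pfn
        self.map.move_to_end(vpn)
        if len(self.map) > self.entries:
            self.map.popitem(last=False)  # evict the least recently used entry

def translate(vaddr, l1, l2, page_table):
    """Return (physical address, where the mapping was found)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    source = "L1 TLB"
    pfn = l1.lookup(vpn)
    if pfn is None:
        source = "L2 TLB"
        pfn = l2.lookup(vpn)
        if pfn is None:
            source = "page walk"   # slow path: consult the page tables
            pfn = page_table[vpn]
            l2.insert(vpn, pfn)
        l1.insert(vpn, pfn)        # refill the first level on any L1 miss
    return pfn * PAGE_SIZE + offset, source
```

With a 2-entry first level and an 8-entry second level, touching three different pages pushes the oldest mapping out of the first level; the next access to that page is then served by the second level rather than by the slow path, which is the case the second-level TLB exists for.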

The TLB enhancements look to be particularly beneficial for server workloads; we suspect that Intel may be looking to really take on Opteron with Nehalem.

Above you see examples of the first Nehalem platforms - they should look very similar to the block diagrams of AMD K8 platforms we've seen for years now. The first high-end desktop Nehalem parts will have an integrated 3-channel DDR3 memory controller supporting DDR3-800, DDR3-1066 and DDR3-1333.
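For a sense of scale, the theoretical peak bandwidth of such a controller can be estimated as channels × bytes per transfer × transfers per second. The sketch below assumes standard 64-bit (8-byte) DDR3 channels and uses the nominal speed grades from the text; sustained real-world bandwidth will be lower.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer

def peak_bandwidth_gb_s(mega_transfers_per_s, channels=3):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB here)."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s / 1000

for speed in (800, 1066, 1333):
    print(f"DDR3-{speed}: {peak_bandwidth_gb_s(speed):.1f} GB/s")
```

At DDR3-1333 this works out to roughly 32 GB/s of peak bandwidth across the three channels.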

On the server side you'll see registered memory support from Nehalem's IMC.


53 Comments


  • Yongsta - Tuesday, March 18, 2008 - link

    Anime Characters don't count.
  • RamIt - Monday, March 17, 2008 - link

    Yep, you are the only one that cares :)
  • teko - Monday, March 17, 2008 - link

    I also ponder on the title and image. Actually, I clicked and read the whole article trying to look for a connection.

    Yea, I think it's bad taste.
  • Owls - Tuesday, March 18, 2008 - link

    Who cares. If only Elliot Spitzer paid his hookers in Penryns no one would have known.
  • Deville - Monday, March 17, 2008 - link

    Wow! Exciting stuff!

    AMD? Hello???
  • InternetGeek - Monday, March 17, 2008 - link

    But would you rather buy a Tick processor or a Tock processor?

    In any case you have to accept that the following generation (Tick or Tock) will perform faster. It's how Intel makes money.

    The scenario is keeping your computer for 3-4 years. I reckon that's still the average time, basically until your GPU can no longer play new games decently. On the CPU side, I think buying a Tock processor might be a better deal because you're getting the refined version of your generation.

    Problem is the way Intel introduces their SIMD extensions. I've seen that done randomly (Ticks or Tocks) and sometimes you do want to have those extensions.

    Is there a way to correlate Ticks/Tocks and SIMD extensions?
  • ocyl - Monday, March 17, 2008 - link

    For some unknown reasons, Intel's "tick-tock" terminology is used inversely of the same phrase's common understanding, per Page 4 of the article. With Intel, "tock" refers to a brand new generation, while "tick" refers to refinement of the current generation. Why this is the case, I have no idea.
  • InternetGeek - Monday, March 17, 2008 - link

    I think you got it wrong. It is perfectly explained at http://www.intel.com/technology/tick-tock/index.ht...

    Tick is a new silicon process, Tock is an upgrade.

    ---
    Year 1: "Tick"
    Intel delivers a new silicon process technology that dramatically increases transistor density versus the previous generation. This technology is used to enhance performance and energy efficiency by shrinking and refining the existing microarchitecture.

    Year 2: "Tock"
    Intel delivers an entirely new processor microarchitecture to optimize the value of the increased number of transistors and technology updates that are now available.
    ---
  • InternetGeek - Monday, March 17, 2008 - link

    Ehm, You're actually correct. Tick is the upgrade. Tock is the new thing.
  • masher2 - Monday, March 17, 2008 - link

    Not quite. "Tick" is the new silicon process (the shrink). "Tock" is the new uCore architecture. They're both "new things".

    From a refinement perspective, the Tick generation will also typically include uCore refinements, and the Tock likewise includes process refinements.
