Opening the Kimono: Intel Details Nehalem and Tempts with Larrabee

Author: Anand Lal Shimpi

by Anand Lal Shimpi on March 17, 2008 5:00 PM EST



Nehalem will support 2-way SMT (two threads per core), much like the Pentium 4 did before it with Hyper-Threading. With a shorter pipeline than NetBurst and a greater ability to get data to the cores, Nehalem gives SMT more opportunity to extract parallelism (and thus performance) than it had on the Pentium 4.
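
The latency-hiding argument behind SMT can be illustrated with a toy model. It is deliberately simpler than real SMT: each thread alternates a burst of `work` cycles in which it can issue with `stall` cycles spent waiting on memory, and the core issues from at most one thread per cycle. That makes it closer to fine-grained (interleaved) multithreading than to true SMT, where instructions from both threads can share a single cycle. The function name and the parameter values are hypothetical; the only point is that a second thread can fill cycles the first would leave idle.

```python
def utilization(num_threads: int, work: int, stall: int, cycles: int) -> float:
    """Fraction of cycles in which the core issues from some thread.

    Toy model (assumption): each thread repeats `work` issue cycles followed
    by `stall` cycles waiting on memory; at most one thread issues per cycle.
    """
    remaining_work = [work] * num_threads
    remaining_stall = [0] * num_threads
    last = -1  # index of the thread that issued most recently
    busy = 0
    for _ in range(cycles):
        # Round-robin: find a ready thread, starting after the last issuer.
        issuer = None
        for k in range(1, num_threads + 1):
            t = (last + k) % num_threads
            if remaining_stall[t] == 0:
                issuer = t
                break
        # One cycle passes for every thread still waiting on memory.
        for t in range(num_threads):
            if remaining_stall[t] > 0:
                remaining_stall[t] -= 1
        if issuer is None:
            continue  # nothing ready: an idle cycle
        busy += 1
        last = issuer
        remaining_work[issuer] -= 1
        if remaining_work[issuer] == 0:
            # Burst finished: this thread now waits on memory, then starts over.
            remaining_work[issuer] = work
            remaining_stall[issuer] = stall
    return busy / cycles


if __name__ == "__main__":
    one = utilization(1, work=2, stall=6, cycles=72_000)
    two = utilization(2, work=2, stall=6, cycles=72_000)
    print(f"1 thread:  {one:.3f}")  # 0.250
    print(f"2 threads: {two:.3f}")  # 0.444
```

With these made-up numbers a single thread keeps the core busy a quarter of the time, while two threads raise that to about 44%, because the second thread issues while the first is stalled.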

Nehalem's cache subsystem is almost entirely changed from Penryn's. While Nehalem keeps Penryn's 32KB L1 instruction and data caches, the L2 and L3 caches are brand new. Each core in a quad-core Nehalem now has its own, smaller 256KB L2 cache, which Intel calls "low latency" (a smaller cache can be accessed faster, so it may well have lower latency than Penryn's larger L2). In place of Penryn's shared L2, Intel equipped Nehalem with a large 8MB L3 cache shared by all cores.
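
Totaling the per-core figures above for a quad-core part is simple arithmetic; the sketch below just makes it explicit. All sizes are in KiB and come from the paragraph above; the constant and function names are illustrative.

```python
# All sizes in KiB, taken from the quad-core Nehalem figures in the text.
CORES = 4
L1I_PER_CORE = 32
L1D_PER_CORE = 32
L2_PER_CORE = 256
L3_SHARED = 8 * 1024  # one 8MB pool shared by every core


def cache_totals(cores: int = CORES) -> dict:
    """Aggregate on-chip capacity per cache level, in KiB."""
    return {
        "L1 (I+D)": cores * (L1I_PER_CORE + L1D_PER_CORE),  # private, per core
        "L2": cores * L2_PER_CORE,                         # private, per core
        "L3": L3_SHARED,                                   # shared, counted once
    }


if __name__ == "__main__":
    for level, size in cache_totals().items():
        print(f"{level:<9} {size:>5} KiB")
```

That works out to 256 KiB of L1, 1 MiB of L2 and 8 MiB of L3 across the chip, so the shared L3 is eight times the combined capacity of all four L2 caches.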

This setup is very similar to AMD's Phenom architecture, although it is obviously built on Intel's Core 2 foundation. The major difference is that Nehalem's cache hierarchy is inclusive, whereas AMD's is exclusive. The inclusive architecture means that the L3 keeps a copy of every line held in the cores' L1 and L2 caches.
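
The toy model below is a sketch, not Intel's design. It collapses each core's L1 and L2 into one private cache, makes every cache fully associative with LRU replacement, and refreshes the shared level on every access; all class and method names are hypothetical. What it does show is the mechanism that keeps an inclusive hierarchy inclusive: when the shared L3 evicts a line, that line is back-invalidated from every core's private cache. A standard consequence of inclusion is that a miss in the L3 proves no core holds the line, so no core needs to be snooped. The cost is capacity: unique data on chip is bounded by the L3's size, whereas an exclusive hierarchy can hold distinct lines at every level.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUCache:
    """Fully associative set of line addresses with LRU replacement (a simplification)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines: "OrderedDict[int, None]" = OrderedDict()

    def __contains__(self, line: int) -> bool:
        return line in self.lines

    def touch(self, line: int) -> Optional[int]:
        """Insert or refresh a line; return the evicted victim, if any."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return None
        self.lines[line] = None
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None

    def invalidate(self, line: int) -> None:
        self.lines.pop(line, None)


class InclusiveHierarchy:
    """Per-core private caches (L1 and L2 collapsed) plus an inclusive shared L3."""

    def __init__(self, cores: int, private_lines: int, shared_lines: int):
        self.private = [LRUCache(private_lines) for _ in range(cores)]
        self.shared = LRUCache(shared_lines)

    def access(self, core: int, line: int) -> None:
        # Fill the shared level first. If that evicts a victim, back-invalidate
        # it everywhere so no private cache holds a line the L3 lacks.
        victim = self.shared.touch(line)
        if victim is not None:
            for cache in self.private:
                cache.invalidate(victim)
        # A private-level eviction needs no extra work: the L3 still has the line.
        self.private[core].touch(line)

    def inclusion_holds(self) -> bool:
        return all(line in self.shared for cache in self.private for line in cache.lines)

    def cores_to_snoop(self, line: int) -> List[int]:
        # An L3 miss proves no core holds the line, so nobody needs a snoop.
        if line not in self.shared:
            return []
        return [i for i, cache in enumerate(self.private) if line in cache]


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    hierarchy = InclusiveHierarchy(cores=4, private_lines=4, shared_lines=16)
    for _ in range(10_000):
        hierarchy.access(rng.randrange(4), rng.randrange(64))
        assert hierarchy.inclusion_holds()
    print("inclusion held after every access")
```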

Nehalem effectively adopts the only remaining advantages AMD held over Intel, memory performance and interconnect speed, so you can expect a tremendous performance increase going from Penryn to Nehalem. Intel expects memory accesses on Nehalem to be around twice as fast as on Penryn, whose aggressive prefetchers already make its memory accesses incredibly fast. If you think Intel's performance advantage is significant today, Nehalem should completely redefine your perspective: AMD will need its Bobcat and Bulldozer cores if it wants to compete.

Intel has also added a new second-level TLB to Nehalem, similar in approach to its new second-level branch predictor. The first-level TLB does a good job of keeping the cores fed quickly, but if a virtual-to-physical address translation isn't found there, Nehalem can now check the second-level TLB instead of walking the page tables (which means going out to the caches, or even to memory), keeping performance high and latency low.
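
The lookup order described above can be sketched as a toy two-level TLB. The class name, the entry counts, full associativity and LRU replacement are all illustrative assumptions rather than Nehalem's actual TLB organization; `page_walk` stands in for the slow path that reads the page tables.

```python
from collections import OrderedDict
from typing import Callable, Dict


class TwoLevelTLB:
    """Toy two-level TLB mapping virtual page numbers (VPNs) to physical frames.

    Sizes, full associativity and LRU replacement are illustrative assumptions.
    """

    def __init__(self, l1_entries: int, l2_entries: int,
                 page_walk: Callable[[int], int]):
        self.l1: "OrderedDict[int, int]" = OrderedDict()
        self.l2: "OrderedDict[int, int]" = OrderedDict()
        self.l1_entries = l1_entries
        self.l2_entries = l2_entries
        self.page_walk = page_walk  # slow path: read the page tables
        self.stats: Dict[str, int] = {"l1_hit": 0, "l2_hit": 0, "walk": 0}

    @staticmethod
    def _fill(tlb: "OrderedDict[int, int]", capacity: int, vpn: int, pfn: int) -> None:
        tlb[vpn] = pfn
        tlb.move_to_end(vpn)
        if len(tlb) > capacity:
            tlb.popitem(last=False)  # evict the least recently used entry

    def translate(self, vpn: int) -> int:
        if vpn in self.l1:
            self.stats["l1_hit"] += 1
            self.l1.move_to_end(vpn)
            return self.l1[vpn]
        if vpn in self.l2:
            # Second-level hit: no page walk, just refill the first level.
            self.stats["l2_hit"] += 1
            pfn = self.l2[vpn]
            self.l2.move_to_end(vpn)
        else:
            # Missed both levels: walk the page tables (cache or memory accesses).
            self.stats["walk"] += 1
            pfn = self.page_walk(vpn)
            self._fill(self.l2, self.l2_entries, vpn, pfn)
        self._fill(self.l1, self.l1_entries, vpn, pfn)
        return pfn


if __name__ == "__main__":
    page_table = {vpn: 1000 + vpn for vpn in range(256)}  # hypothetical mappings
    tlb = TwoLevelTLB(l1_entries=8, l2_entries=64, page_walk=page_table.__getitem__)
    for _ in range(10):
        for vpn in range(32):  # too many pages for the first level, fine for the second
            tlb.translate(vpn)
    print(tlb.stats)  # {'l1_hit': 0, 'l2_hit': 288, 'walk': 32}
```

In this model only the first sweep pays for page walks; without the second level, all 320 lookups would have missed the small first-level TLB and needed a walk.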

The TLB enhancements look particularly well suited to server workloads, which touch large amounts of memory and therefore miss in the TLB more often; we suspect Intel is looking to take on Opteron directly with Nehalem.
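
One back-of-the-envelope way to see why server workloads stress the TLB is TLB reach: the amount of memory a TLB can map without a page walk, which is simply entries × page size. The entry counts below are hypothetical placeholders chosen only to show the arithmetic (this article does not give Nehalem's TLB sizes), and the 4KB page size assumes standard x86 small pages.

```python
PAGE_KIB = 4  # standard 4KB x86 page (assumption); large pages multiply the reach


def tlb_reach_kib(entries: int, page_kib: int = PAGE_KIB) -> int:
    """Memory a TLB can map without a page walk: entries x page size, in KiB."""
    return entries * page_kib


if __name__ == "__main__":
    # Hypothetical entry counts, not figures from this article.
    for level, entries in (("first-level TLB", 64), ("second-level TLB", 512)):
        print(f"{level}: {entries} entries -> {tlb_reach_kib(entries)} KiB")
```

With these placeholder sizes the reach grows from 256 KiB to 2 MiB, still tiny next to a server working set measured in gigabytes, which is why a larger second level that saves page walks pays off most on such workloads.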

Above are examples of the first Nehalem platforms; they should look very familiar to anyone who has seen block diagrams of AMD K8 platforms over the past few years. The first high-end desktop Nehalem parts will have an integrated three-channel DDR3 memory controller supporting DDR3-800, 1066 and 1333.
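
The arithmetic behind a three-channel controller is straightforward: each DDR3 channel is 64 bits (8 bytes) wide, so theoretical peak bandwidth is transfers per second × 8 bytes × channels. The sketch below uses the nominal speed grades (the exact DDR3-1066 and DDR3-1333 rates are slightly higher, 1066.67 and 1333.33 MT/s) and ignores all real-world overhead, so treat the results as upper bounds rather than Nehalem measurements. The function name is illustrative.

```python
CHANNELS = 3            # three DDR3 channels, per the text
BYTES_PER_TRANSFER = 8  # each DDR3 channel is 64 bits wide


def peak_bandwidth_gbs(mt_per_s: float, channels: int = CHANNELS) -> float:
    """Theoretical peak in GB/s (10**9 bytes/s); sustained bandwidth is lower."""
    return mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    for speed in (800, 1066, 1333):  # nominal speed grades from the text
        print(f"DDR3-{speed}: {peak_bandwidth_gbs(speed, channels=1):.1f} GB/s per channel, "
              f"{peak_bandwidth_gbs(speed):.1f} GB/s across {CHANNELS} channels")
```

DDR3-1333 across three channels gives roughly 32 GB/s of theoretical peak bandwidth, 1.5 times what the same DIMMs would provide on a dual-channel controller.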

On the server side you'll see registered memory support from Nehalem's IMC.

53 Comments

pugster - Monday, March 24, 2008 - link
Intel Core 2 Duo is probably good for business, but the OS doesn't need anything more than 2 cores running at an average of 2GHz. I know that there are people out there who want the latest and greatest for games, but more and more people would rather buy a game console like the PS2 than put money down for a GeForce 9800. It seems that the only way for Intel to make money is by making new products like Silverthorne or getting back into the flash memory race.
PlasmaBomb - Thursday, March 20, 2008 - link
Since it is based on Penryn, isn't 16 MB of cache an odd number? Should that not be 18 MB? (i.e. 3 x dual cores at 6 MB each)
IntelUser2000 - Sunday, March 23, 2008 - link
PlasmaBomb, Penryn has 6MB L2, not L3. Dunnington has 16MB L3 in addition to whatever L2 it will have, please read!
perzy - Wednesday, March 19, 2008 - link
Larrabee, that's good news. Finally some competition in the graphics department!
Let's face it, right now you can get 2 xbox 360's and an ipod for the price of one fast graphics card...that can't be right.
AcaClone - Tuesday, March 18, 2008 - link
What can I say ...
AcaClone - Tuesday, March 18, 2008 - link
On second thought - I guess that it is possible that the demo software is indeed multithreaded, but that only one thread is running when left idle??
ajg - Tuesday, March 18, 2008 - link
The slide showing "Intel: The Architecture for Life" is a page lifted from AMD's slide "Diversifying Platform Design Tracks"

Link below:
http://www.tgdaily.com/index.php?option=com_conten...

The CPU architecture is no different. I guess you can't expect an old dog to come up with new tricks?
clnee55 - Tuesday, March 18, 2008 - link
Yes, AMD said it but couldn't do it. Easier said than done.
micha90210 - Tuesday, March 18, 2008 - link
Is that possible? There's a limit in XP to 3.25GB of RAM. XP can't handle 16GB... is that picture real?
oldhoss - Tuesday, March 18, 2008 - link
I'd venture to guess either XP Pro x64, or Windows 2003 Server.
