Nehalem - Everything You Need to Know about Intel's New Architectureby Anand Lal Shimpi on November 3, 2008 1:00 PM EST
- Posted in
Once again, I’ve already talked about the Nehalem cache hierarchy in great detail so I’ll keep this to a quick overview.
Nehalem, like AMD’s Phenom, features a 3-level cache hierarchy. There’s a 64KB L1 cache (32KB I + 32KB D), a 256KB L2 cache (per core, unshared) and up to an 8MB L3 cache (shared among all cores).
The L1 cache is the same size as what we have in Penryn, but it’s actually slower (4 cycles vs. 3 cycles). Intel slowed down the L1 cache as it was gating clock speed, especially as the chip grew in size and complexity. Intel estimated a 2 - 3% performance hit due to the higher latency L1 cache in Nehalem.
The L2 cache also gets a hit - while in Penryn we had a 6MB L2 cache shared between two cores, Nehalem moves the L2 cache next to each individual core and reduces the size to a meager 256KB. We haven’t had a high performance Intel CPU with such a small L2 cache since the first Pentium 4. The smaller L2 is quicker, it only takes 10 cycles from load to get data out of the L2 cache.
The L2 cache acts as a buffer to the L3 cache so you don’t have all of the cores banging on the L3 cache, requiring tons of bandwidth.
The L3 cache is shared by all cores and in the initial Core i7 processors will be 8MB large, although its size will vary depending on the number of cores. Multi-threaded applications that are being worked on by all cores will enjoy the large, shared L3 cache.
Intel defended its reasoning for using an inclusive cache architecture with Nehalem, something it has always done in the past. Nehalem’s L3 cache is inclusive in that it contains all data stored in the L1 and L2 caches as well. The benefit is that if the CPU looks for data in L3 and doesn’t find it, it knows that the data doesn’t exist in any core’s L1 or L2 caches - thereby saving core snoop traffic, which not only improves performance but reduces power consumption as well.
An inclusive cache also prevents core snoop traffic from getting out of hand as you increase the number of cores, something that Nehalem has to worry about given its aspirations of extending beyond 4 cores.