Nehalem Part 3: The Cache Debate, LGA-1156 and the 32nm Future
Date: November 19th, 2008
Topic: CPU & Chipset
Manufacturer: Intel
Author: Anand Lal Shimpi

Another Part? Oh there will be more

In an unexpected turn of events I found myself deep in conversation with many Intel engineers as well as Pat Gelsinger himself about the design choices made in Nehalem. At the same time, Intel just released its 2009 roadmap which outlined some of the lesser known details of the mainstream LGA-1156 Nehalem derivatives.

I hadn’t planned on my next Nehalem update being about caches and mainstream parts, but here we go. For further reading I'd suggest our first two Nehalem articles and the original Nehalem architecture piece.

Nehalem’s Cache: More Controversial Than You’d Think

I spoke with Ronak Singhal, Chief Architect on Nehalem, at Intel’s Core i7 launch event last week in San Francisco and I said to him: “I think you got the cache sizes wrong on Nehalem”. I must be losing my shyness.

He thought I was talking about the L3 cache and asked if I meant it needed to be bigger, and I clarified that I was talking about the anemic 256KB L2 per core.

We haven’t seen a high end Intel processor with only 256KB L2 per core since Willamette, the first Pentium 4. Since then Intel has been on a steady ramp upwards as far as cache sizes go. I made a graph of L2 cache size per core of all of the major high end Intel cores for the past decade:


[Chart: L2 cache size per core for the major high end Intel cores of the past decade]

For the most part we’ve got a linear trend. There are a few outliers, but you can see that by early 2008 you’d expect Intel CPUs to have around 2 - 3MB of L2 cache per core. Now look at the lower right of the chart; see the little orange outlier? Yeah, that’s the Core i7 with its 256KB L2 cache per core. It’s like 2002 - 2007 never happened.

If we look at total on-chip cache size however (L2 + L3), the situation is very different:


[Chart: total on-chip cache size (L2 + L3) for the same Intel cores]

Now we’ve got exponential growth in cache size, not linear, and all of a sudden the Core i7 conforms to societal norms. To understand why, we have to look at what happened around 2005 - 2006: Intel started shipping dual-core CPUs. As core count went up, so did the total amount of cache per chip. Dual-core CPUs quickly started shipping with 2MB and 4MB of cache per chip, and the outgoing 45nm quad-core Penryns had 12MB of L2 cache on a single package.
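To put numbers on the per-core versus per-chip view, here is a minimal Python sketch using only figures quoted in this article. Spreading Penryn's 12MB of L2 evenly across its four cores is a simplifying assumption for illustration, and the names `chips` and `total_cache_mb` are made up for this example.

```python
# Cache figures quoted in this article; all sizes in KB.
KB_PER_MB = 1024

chips = {
    # name: (cores, L2 per core in KB, shared L3 in KB)
    "Core i7 (Nehalem)": (4, 256, 8 * KB_PER_MB),
    # Assumption: the 12MB of L2 is averaged evenly over four cores, no L3.
    "45nm quad-core Penryn": (4, 3 * KB_PER_MB, 0),
}

def total_cache_mb(cores, l2_per_core_kb, l3_kb):
    """Total L2 + L3 on the package, in MB."""
    return (cores * l2_per_core_kb + l3_kb) / KB_PER_MB

for name, (cores, l2_kb, l3_kb) in chips.items():
    total = total_cache_mb(cores, l2_kb, l3_kb)
    print(f"{name}: {l2_kb} KB L2 per core, {total:.0f} MB total on-chip cache")
```

Looked at per core, the Core i7 has a small fraction of Penryn's L2. Looked at per chip, the two land in the same ballpark.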

The move to multi-core chip designs meant that the focus was no longer on feeding the individual core, but making sure all of the cores on the chip were taken care of. It’s all so very socialist (oh no! ;) ).

Nehalem was designed to be a quad-core product, but also one that’s able to scale up to 8 cores and down to 2 cores. Intel believes in this multi-core future, so designing for dual-core didn’t make sense: eventually dual-core will go away in desktops. That future is still a few years away, but it’s the course we’re on nonetheless.


AMD's shift to an all quad-core client roadmap

Intel is pushing the shift to quad-core, much like AMD is. By 2010 all of AMD’s mainstream and enthusiast CPUs will be quad-core with the ultra low end being dual-core, a trend that will continue into 2011. The shift to quad-core makes sense; unfortunately, very few consumer applications today benefit from four cores. I hate to keep re-using this same table but it most definitely applies here:

Back when AMD introduced its triple-core Phenom parts I put together a little table illustrating the speedup you get from one, two and four cores in SYSMark 2007:

SYSMark 2007                                       Overall   E-Learning   Video Creation   Productivity   3D
Intel Celeron 420 (1 core, 512KB, 1.6GHz)          55        52           55               54             58
Intel Celeron E1200 (2 cores, 512KB, 1.6GHz)       76        68           91               70             78
% Increase from 1 to 2 cores                       38%       31%          65%              30%            34%
Intel Core 2 Duo E6750 (2 cores, 4MB, 2.66GHz)     138       147          141              120            145
Intel Core 2 Quad Q6700 (4 cores, 8MB, 2.66GHz)    150       145          177              121            163
% Increase from 2 to 4 cores                       8.7%      0%           26%              1%             12%
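For anyone who wants to check the percentage rows, here is a short Python sketch that recomputes them from the raw SYSMark 2007 scores in the table above. The helper names `pct_increase` and `compare` are made up for this example.

```python
# SYSMark 2007 scores copied from the table above.
CATEGORIES = ["Overall", "E-Learning", "Video Creation", "Productivity", "3D"]
SCORES = {
    "Celeron 420 (1 core)": [55, 52, 55, 54, 58],
    "Celeron E1200 (2 cores)": [76, 68, 91, 70, 78],
    "Core 2 Duo E6750 (2 cores)": [138, 147, 141, 120, 145],
    "Core 2 Quad Q6700 (4 cores)": [150, 145, 177, 121, 163],
}

def pct_increase(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0

def compare(slower, faster):
    """Per-category percentage change between two CPUs in SCORES."""
    pairs = zip(CATEGORIES, SCORES[slower], SCORES[faster])
    return {cat: pct_increase(b, a) for cat, b, a in pairs}

for slower, faster in [
    ("Celeron 420 (1 core)", "Celeron E1200 (2 cores)"),
    ("Core 2 Duo E6750 (2 cores)", "Core 2 Quad Q6700 (4 cores)"),
]:
    print(f"{slower} -> {faster}")
    for cat, gain in compare(slower, faster).items():
        print(f"  {cat:<15}{gain:6.1f}%")
```

Run against these scores, the quad-core's E-Learning result comes out about 1.4% below the dual-core's, which the table shows as 0%.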

 

Not only are four cores unnecessary for most consumers today, but optimizing a design for four cores by opting for very small, low latency L2 caches and a large, higher latency L3 cache for the chip isn’t going to yield the best desktop performance.

A Nehalem optimized for two cores would have a large L2 cache similar to what we saw happening on the first graph, but one optimized for four or more cores would look like what the Core i7 ended up being. What’s impressive is that Intel, in optimizing for a quad-core design, was still able to ensure that performance either didn’t change at all or improved in applications that aren’t well threaded.

Apparently the L2 cache size was and still is a controversial issue within Intel; many engineers still feel like it is too small for current workloads. The problem with making it larger is not just one of die size, but also one of latency. Intel managed to get Nehalem’s L2 cache down to 10 cycles, and the next bump in L2 size would add another 1 - 2 cycles to its latency. At 512KB per core, taking up to 20% longer to access the cache was simply unacceptable to the designers.
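The 20% figure is simply the upper end of that 1 - 2 cycle range measured against the 10-cycle baseline. A tiny Python sketch of the arithmetic follows; `latency_penalty_pct` is a name made up for this example.

```python
BASE_L2_LATENCY = 10  # cycles, Nehalem's 256KB L2 as quoted above

def latency_penalty_pct(extra_cycles, base_cycles=BASE_L2_LATENCY):
    """Extra access time as a percentage of the baseline latency."""
    return extra_cycles * 100.0 / base_cycles

for extra in (1, 2):
    total = BASE_L2_LATENCY + extra
    print(f"+{extra} cycle(s): {total} cycles, {latency_penalty_pct(extra):.0f}% longer")
```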

In fact, going forward there’s no guarantee that the L2 caches will see growth in size, but the focus instead may be on making the L3 cache faster. Right now the 8MB L3 cache takes around 41 cycles to access, but there’s clearly room for improvement - getting a 30 cycle L3 should be within the realm of possibility. I pushed Ronak for more details on how Intel would achieve a lower latency L3, but the best I got was “microarchitectural tweaks”.
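To get a rough feel for what a faster L3 could buy, here is a back-of-the-envelope Python sketch of average load latency across the cache hierarchy. Only the 10-cycle L2 and the 41- and 30-cycle L3 figures come from this article. The hit rates, the 4-cycle L1 and the 200-cycle memory latency are placeholder assumptions and not measurements, and the model treats each quoted latency as the total load-to-use time for a hit at that level.

```python
def expected_load_latency(hit_rates, latencies, memory_latency):
    """Average load latency in cycles.

    hit_rates: local hit rate of each cache level, L1 first
    latencies: total load-to-use latency of each level, L1 first
    """
    reach = 1.0  # fraction of loads that get as far as this level
    total = 0.0
    for hit, latency in zip(hit_rates, latencies):
        total += reach * hit * latency
        reach *= 1.0 - hit
    return total + reach * memory_latency

# Hypothetical values, chosen only to make the sketch runnable.
HIT_RATES = (0.95, 0.80, 0.75)  # L1, L2, L3 local hit rates (assumed)
L1_LATENCY = 4                  # cycles (assumed, not from the article)
MEMORY_LATENCY = 200            # cycles (assumed)

today = expected_load_latency(HIT_RATES, (L1_LATENCY, 10, 41), MEMORY_LATENCY)
faster_l3 = expected_load_latency(HIT_RATES, (L1_LATENCY, 10, 30), MEMORY_LATENCY)

print(f"41-cycle L3: {today:.3f} cycles per load on average")
print(f"30-cycle L3: {faster_l3:.3f} cycles per load on average")
```

Plug in different hit rates and the gap widens or shrinks accordingly. The more often a workload spills out of the L2, the more a faster L3 matters.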

As I mentioned before, Ronak wanted the L3 to be bigger on Nehalem; at 8MB that’s only 2MB per core and merely sufficient in his eyes. There are two 32nm products due out in the next 2 years, and I suspect that at least one of them will have an even larger L3 to continue the exponential trend I showed in the second chart above.

Could the L2 be larger? Sure. But Ronak and his team ultimately felt that the tradeoff between size/latency was necessary for what Intel’s targets were with Nehalem. And given its 0 - 60% performance increase, clock for clock, over Penryn - I can’t really argue.


33 Comments - Last by SiliconDoc, 328 days ago
P55 chipset PCIe support by Lonyo, 366 days ago
If the Lynnfield has x16 PCIe, and the diagram shows no SB/MCH, does that mean the P55 will be a single chip design and include the extra PCIe slots, and might it be possible to do triple SLI if manufacturers use the PCIe slots from the CPU as well as the chipset?

Want to understand why Intel went with such a small L2 cache on Nehalem? by piesquared, 366 days ago
Nope, don't give a shit. But do want to know what keeps happening to all these AMD and ATI reviews you keep promising over and over.

RE: Want to understand why Intel went with such a small L2 cache on Nehalem? by whatthehey, 365 days ago
You want an AMD review? Here's one for you: AMD's current products suck for the vast majority of users. The only place they're worthwhile is in the 8S server space; otherwise, they cost too much and deliver too little. Their dual-core parts were awesome when all they had to do was beat Pentium D, but Intel has progressed substantially since then and all AMD has got is a bloated, buggy, slow POS known as Phenom. At least the name is right: it's a phenomenal failure.

Or maybe you mean the various ATI reviews posted during the past couple months?
http://www.anandtech.com/video/showdoc.aspx?i=3441
http://www.anandtech.com/video/showdoc.aspx?i=3437
http://www.anandtech.com/video/showdoc.aspx?i=3420
http://www.anandtech.com/video/showdoc.aspx?i=3415
http://www.anandtech.com/video/showdoc.aspx?i=3405

Oh, but that's not good enough for the AMD fanboyz! Everyone needs to baby AMD and talk about how awesome they are, when AMD is busily circling the drain and getting ready to spin off their fabrication to a separate company. ATI is doing pretty well, and AMD made some good hardware in the past; unfortunately, it doesn't look like they were able to continue to compete.

And honestly, it's no big surprise. Even Intel is having a tough time competing with their own products. Nehalem is a nice design, but as I've told others we are at the point where 95% of people don't need anything more than a three year old Athlon 64 X2. Quad-core only matters to a small number of desktop users at best, and here Intel and AMD are both looking to hex-core and octal-core in the not too distant future. That's great if you do video work or 3D rendering, but pretty much useless for everyone else.

I lust after the new Nehalem upgrades as much as the next guy, but invariably I come back to the realization that my pathetic Q6600 @ 3.00GHz (yes, I backed off from 3.6GHz when I realized that the extra voltage and stress on the system wasn't actually improving performance in any of the applications I use on a regular basis) was more than fast enough for any current program. About the only thing I need right now is an upgrade in the video card department, and I don't need Nehalem for that!

RE: Want to understand why Intel went with such a small L2 cache on Nehalem? by Regs, 365 days ago
AMD's primary problem in the industry is setting a tangible goal/deadline and actually meeting that goal or deadline. "Well at least they won't release it all buggy en' what not". Maybe that line works for video games, but not in this cutthroat industry. Intel has been beating AMD to the punch time and again in the CPU market and eroding AMD's sales and market share. Which is why AMD has to retool, reorganize, and follow through with their roadmaps or else they'll have to figure out what products they can actually compete with. Shanghai, even though a little 18 months late, is a good sign of execution by AMD; delaying a CPU/GPU combination platform for a notebook until 2011 is not. Notebooks are a big source of revenue AMD will be passing up in the next 2 years to Intel and they're really going to need something highly competitive if they wish to earn any market share back by 2011.

RE: Want to understand why Intel went with such a small L2 cache on Nehalem? by Griswold, 365 days ago
After reading

"AMD's current products suck for the *vast majority* of users"

I knew that your entire posting would have less substance than a steaming pile of cow dung. Why is it that the most clueless people always type up the biggest shitstorm of incoherent garbage..?

RE: Want to understand why Intel went with such a small L2 cache on Nehalem? by whatthehey, 364 days ago
Okay, my first sentence or two was off base, I admit. It's because piesquared made an asinine comment about an article. Anand gives an interesting piece about cache sizes, and some prick responds with, "nope - all I care about is hearing about AMD's same-old same-old designs!" Most of us like to think about the ramifications of cache sizes and CPU architectures, and frankly AMD doesn't have a lot to discuss in that area right now. Nehalem is a pretty major change to Intel's recent architectures, and as such it's worth discussing.

If you'd climb off your high horse for a minute and read the rest of my post (rather than getting your "I Love AMD" dander up, oh great and noble Griswold, defender of AMD), you'd see a lot of facts that are hard to argue with. Performance wise, AMD is sucking Intel's dust in pretty much every area except 8S heavily loaded servers, and there the bigger deal is they do better in performance per watt. Pricing on their CPUs is good, but only because they need to lower prices in order to compete - and Intel has been matching them quite well.

That said, anyone that doesn't see MASSIVE problems for AMD right now has some serious blinders on. This new Foundry Company split is going to put a stake in their heart, mark my words. I only hope someone steps in to fill the void when AMD inevitably fails and disappears, because you just can't compete with Intel by breaking your business into smaller pieces that will have more problems working together than they do when they're all under the same umbrella. If AMD is already behind schedules repeatedly with the current setup, how are they going to do better when they become fabless and have to go through a third party for the various stages of production?

RE: Want to understand why Intel went with such a small L2 cache on Nehalem? by chizow, 365 days ago
LOL. There was a brilliant post on DT that basically claims AMD has now shifted their focus to producing Roadmaps. A bit harsh, but honestly pretty accurate.

Wait til AMD actually releases a new product before getting all emo about a lack of AMD reviews.

AMD? by CEO Ballmer, 366 days ago
AMD is still in the game?
I had written them off!

http://fakesteveballmer.blogspot.com

RE: AMD? by Derbalized, 365 days ago
AMD is still in the game.
AMD is designing Intels next chip.
Probaly with an integrated memory controller also.

RE: AMD? by Derbalized, 365 days ago
I probaly should have spelled probably right. LOL

Copyright © 1997-2009 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.