Homework: How Turbo Mode Works

AMD and Intel both figured out the practical maximum power consumption of a desktop CPU. Intel actually discovered it first, through trial and error, in the Prescott days. At the high end that's around 130W; for the upper mainstream market it's 95W. That's why all high end CPUs ship with 120 - 140W TDPs.

Regardless of whether you have one, two, four, six or eight cores - the entire chip has to fit within that power envelope. A single core 95W chip gets to have that one core eating up the entire power budget; this is where very high clock speed single core CPUs come from. A 95W dual core processor means that each core individually has to use less power than the single core 95W processor, so tradeoffs are made: each core runs at a lower clock speed. A 95W quad core processor requires that each core use less power than in either the single or dual core 95W processor, resulting in more tradeoffs: each core runs at a lower clock speed than in the 95W dual core processor.

The diagram below helps illustrate this:

[Diagram: TDP remains constant from single core through dual, quad and hex core chips; the tradeoff made as core count rises is a lower clock speed per core.]

The TDP is constant; you can't ramp power indefinitely - you eventually run into cooling and thermal density issues. The variables are core count and clock speed (at least today): if you increase one, you have to decrease the other.
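
A quick back-of-envelope sketch makes the tradeoff concrete. Dynamic power scales roughly with frequency times voltage squared, and voltage has to rise with frequency, so splitting a fixed TDP across more cores costs clock speed quickly. Everything below (the scale factor `k` and the voltage/frequency relationship) is an illustrative assumption, not Intel's figures:

```python
# Back-of-envelope sketch of the core count vs. clock speed tradeoff.
# All constants are illustrative assumptions, not Intel's figures:
# dynamic power scales roughly as f * V^2, and V must rise with f.

def core_power(freq_ghz, k=12.0):
    """Approximate dynamic power (watts) of one core at a given clock."""
    voltage = 0.9 + 0.1 * freq_ghz      # assumed linear V/f relationship
    return k * freq_ghz * voltage ** 2

def max_clock(tdp_watts, cores, step=0.01):
    """Highest clock (GHz) at which `cores` active cores fit in the TDP."""
    freq = 0.0
    while cores * core_power(freq + step) <= tdp_watts:
        freq += step
    return freq

for n in (1, 2, 4, 6):
    print(f"{n} core(s) in a 95W TDP -> ~{max_clock(95.0, n):.2f} GHz each")
```

Because power grows faster than linearly with clock speed, each doubling of core count costs well under half the frequency in this toy model - which is exactly why a quad core at a given TDP ships noticeably slower than its dual core sibling.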

Here's the problem: what happens if you're not using all four cores of the 95W quad core processor? You're only consuming a fraction of the 95W TDP because parts of the chip are idle, but your chip ends up being slower than a 95W dual core processor since it's clocked lower. The consumer thus has to choose between a faster dual core and a slower quad core processor.

A smart processor would realize that its cores aren't frequency limited, just TDP limited. Furthermore, if half the chip is idle then the active cores could theoretically run faster.

That smart processor is Lynnfield.

Intel made a very important announcement when Nehalem launched last year. Everyone focused on cache sizes, performance or memory latency, but the most important part of Nehalem was far more subtle: the Power Gate Transistor.

Transistors are supposed to act as light switches - allowing current to flow when they're on, and stopping the flow when they're off. One side effect of constantly shrinking transistor feature size and increasing performance is that current continues to flow even when the transistor is switched off. It's called leakage current, and when you've got a few hundred million transistors that are supposed to be off but are still drawing current, power efficiency suffers. You can reduce leakage current, but doing so also impacts performance; the processes with the lowest leakage can't scale as high in clock speed.

Using some clever materials engineering, Intel developed a very low resistance, low leakage transistor that can effectively drop any circuits behind it to near-zero power consumption: a true off switch. This is the Power Gate Transistor.

On a quad-core Phenom II, if two cores are idle, blocks of transistors are placed in the off state but they still consume power thanks to leakage current. On any Nehalem processor, if two cores are idle, the Power Gate Transistors that feed the cores their supply current are turned off, and thus the two cores are almost completely shut off - with extremely low leakage current. This is why nothing can touch Nehalem's idle power.
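
To put rough numbers on the difference, here is a small sketch comparing an idle core that is merely clock gated (switching stops, leakage remains) against one behind a power gate (supply current cut off, leakage nearly eliminated). The per-core power and leakage fraction are assumptions for illustration, not measured Phenom II or Nehalem values:

```python
# Illustrative idle power comparison: clock gating vs. power gating.
# The per-core power and leakage fraction are assumptions for this
# sketch, not measured Phenom II or Nehalem values.

ACTIVE_CORE_W = 20.0      # assumed power of one fully loaded core
LEAKAGE_FRACTION = 0.30   # assumed share of that power lost to leakage

def idle_core_power(power_gated):
    """Clock gating stops switching power but leakage remains;
    a power gate cuts the supply, removing almost all leakage too."""
    leakage = ACTIVE_CORE_W * LEAKAGE_FRACTION
    return leakage * 0.02 if power_gated else leakage

for gated, label in ((False, "clock gated only"),
                     (True, "behind a power gate")):
    print(f"2 idle cores, {label}: {2 * idle_core_power(gated):.2f} W")
```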

Since Nehalem can effectively turn off idle cores, it can free up some of that precious TDP we were talking about above. The next step then makes perfect sense. After turning off idle cores, let's boost the speed of active cores until we hit our TDP limit.

On every single Nehalem (Lynnfield included) lie around 1 million transistors (about the complexity of a 486) whose sole task is managing power. This is the PCU (Power Control Unit): it turns cores off, underclocks them and is generally charged with keeping power usage to a minimum. Lynnfield's PCU is largely the same as Bloomfield's; the architecture remains the same, although it has a higher sampling rate for monitoring the state of all of the cores and the demands on them.

The PCU is responsible for turbo mode.
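
Conceptually, the turbo logic amounts to a simple control loop: power gate the idle cores, then raise the multiplier on the active ones until the estimated package power hits the TDP or a frequency ceiling. The sketch below captures only the idea; the multipliers, power model and ceiling are made-up assumptions, not Lynnfield's actual turbo bins or constraints:

```python
# A highly simplified sketch of the kind of loop a PCU might run.
# Real turbo logic works in fixed multiplier bins under temperature,
# current and power constraints; every number below is a made-up
# assumption used only to illustrate reallocating TDP headroom.

TDP_W = 95.0
BCLK_MHZ = 133
BASE_MULT = 20          # assumed base multiplier
MAX_MULT = 25           # assumed silicon/frequency ceiling

def estimated_power(active_cores, mult):
    """Assumed power model; idle cores are power gated to ~0 W."""
    per_core_w = 15.0 + 3.0 * (mult - BASE_MULT)   # made-up numbers
    return active_cores * per_core_w

def turbo_multiplier(active_cores):
    """Raise the shared multiplier until the TDP or ceiling is hit."""
    mult = BASE_MULT
    while mult < MAX_MULT and estimated_power(active_cores, mult + 1) <= TDP_W:
        mult += 1
    return mult

for n in (1, 2, 4):
    m = turbo_multiplier(n)
    print(f"{n} active core(s): {m}x -> {m * BCLK_MHZ / 1000:.2f} GHz")
```

With one or two active cores the loop runs into the frequency ceiling long before the TDP; with all four cores loaded, the TDP bites after only a couple of bins - the same shape of behavior turbo mode exhibits in practice.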

Comments

  • jasperjones - Tuesday, September 8, 2009 - link

    Wonderful article as usual on AT. Read the articles on the website of your main competitor minutes before and didn't learn nearly as much about the LGA 1156 platform as I did here. Well done!

    I have one somewhat cheap comment. I always feel there's only one thing I do for which I really "need" my Core i7. And that's test-driving and debugging my well-threaded code (which makes use of OpenMP, MPI, threaded Intel MKL, etc.) before scp-ing it over to a cluster. Obviously, when testing code, I run using 8 threads. Still think that the Core i7 is probably more competitive in that area (performance/$ wise) than in the ones which this review focuses on (simply because I assume such code puts enough stress on the processors such that turbo-boosting is out of the question). On the other hand, I don't really care if gzip takes 2.5 or 3 seconds to compress a file (or if flac takes 8 or 9 seconds to encode my wav).

    As I said, it's a cheap point. Just saying that I feel I primarily need "oomph" when running well-threaded stuff. Again, great article!

  • AeroWB - Tuesday, September 8, 2009 - link

    Thanks for the interesting read. I do agree with some other people that some things are missing (a clock for clock comparison) and some things are weird (Core i7 with DDR3-1066). Some people are saying that everyone is overclocking their Core i7, and while most readers of this article are probably geeks who overclock, I also read these articles as a system builder and I know that at least 95% of my customers don't overclock, so I really dig non-overclocked comparisons and results.
    There is also one thing I do not agree on. Let's have another look at the page "The Best Gaming CPU?" and at the DoWII results. What I see there is totally different from your conclusion, though you do mention it in the text: Bloomfield has a lower minimum framerate than Lynnfield, but still your conclusion is that Bloomfield is better than Lynnfield and Lynnfield is better than the Core 2 E8600. Ehm???
    Let's be clear: the Core i7 920 really sucks here, because with its really low minimum fps you will have stutters. Great gaming is all about having a butter smooth framerate, which depending on the game type needs to be between 30 and 60 FPS. Basically the best game experience here will probably be with the E8600, as it has the highest minimum at 33 FPS, which is great for RTS gaming. In order to say which CPU is best you should have an extra statistic like how much and how long the framerate dropped below 30 FPS, but as we do not have this data the minimum framerate is our next best thing. As we've seen before, the Core i7 is good when using SLI/Crossfire but on par with the Core 2 when using a single GPU. Intel also told us themselves that the Core i7 was not made for gaming but for taking a bigger part in the server market. When increasing resolution/quality settings with a single GPU, the Phenom II was often as fast and sometimes even faster than the Core i7. Unfortunately most gaming CPU comparisons are done at low to medium resolutions and quality, so this effect couldn't be seen in most tests. So gaming with the Core i7 920 only made sense when using SLI/Crossfire (as it scaled much better with these than the Phenom II) or when paying the extra money (over the Phenom II) because you used the system mostly for other tasks like video editing.
    Now we can see this gaming problem of the Core i7 has been (at least partly) solved with Lynnfield, but the Phenom II 965 still has a higher minimum than the Core i5 750, so I would still prefer that one.
    The other gaming tests are not really relevant, as all CPUs score a minimum of 60 FPS (ok, one exception at 59), so you won't notice any difference between the tested CPUs at those settings.
    Still, it is probable that the better gaming CPU in these tests will also be better at higher settings, but given the weird Core i7 / Phenom II results I want to see tests at higher settings or with more demanding games. And we want minimum and average results to determine which is best.
    Sorry for the long post
  • iwodo - Tuesday, September 8, 2009 - link

    I am waiting for Sandy Bridge or even Ivy Bridge for FMA.

    For now, a two year old C2Q with the money spent on a graphics card will do fine.
    The whole LGA socket and naming scheme is a complete mess.

    Don't get me wrong, it is a good processor, but not the jump that Pentium 4 to C2D was.

    Money spent on an SSD and graphics is much better value.
  • JonnyDough - Tuesday, September 8, 2009 - link

    My dual core Opty 185 is still doing fine...Fallout 3 is still playable with my 8800GTS 640. The system has a slight OC and is chugging along at a minimum of 45FPS in the game on decent settings. Granted, it can't play every game - but I can only play one at a time anyway, and my life does not revolve around gaming. Hello...BEER PONG!
  • Griswold - Tuesday, September 8, 2009 - link

    I agree. I'll get excited when the 32nm dual cores with HT arrive. That would be a worthwhile "upgrade" (but a downgrade in number of cores, simply because I don't need 4 physical cores that much anymore) from my Q6600 on a P35.

    Still, it's a good product, just not worth an upgrade for everyone.
  • strikeback03 - Tuesday, September 8, 2009 - link

    I was hoping there would be 32nm quads in this cycle, but it appears not. I'd definitely like something faster than my E6600/P965, but don't think it is worthwhile in time or money to just go to a C2Q.
  • R3MF - Tuesday, September 8, 2009 - link

    "I spent much of the past year harping on AMD selling Nehalem-sized Phenom IIs for less than Intel sold Nehalems. With Lynnfield, Intel actually made Nehalem even >>>bigger<<< all while driving prices down."

    I think you mean smaller.
  • strikeback03 - Tuesday, September 8, 2009 - link

    Nope, he meant bigger. Same process + more transistors = larger die, as is illustrated in the table.
  • JonnyDough - Tuesday, September 8, 2009 - link

    I think AMD realized years ago that they had awoken a sleeping giant, and it was a smart move to start thinking about competing graphically when they did. They saw how IBM had to change when Intel reared its ugly head. If you put all your eggs in one basket, you'll surely drop your next meal at some point. Diversifying into new markets was a smart move. Anyone who said that AMD didn't have good leadership didn't know what they were saying. Sure, money got really tight - but that's what has to happen to someone in a very competitive market at some point. Just take a look at GM. Giants crumble, little guys take over, and giants can muster a comeback...
  • blyndy - Tuesday, September 8, 2009 - link

    "I think AMD realised years ago that they had awoken a sleeping giant, and it was a smart move to start thinking about competing graphically when they did."

    That's an interesting thought.

    I think there were two main reasons why AMD acquired ATI:
    1) In response to the news of Larrabee -- a pre-emptive defensive move.
    2) To diversify in preparation for Intel's technological onslaught to finally kill its only CPU competitor.

    So it may have been a smart move. On the other hand, knowing how patent-riddled the CPU business is, maybe they could have ramped up R&D instead, but AMD is puny next to Intel.
