AMD’s Turbo: It Works

In the Pentium 4 days Intel quickly discovered that there was a ceiling in terms of how much heat you could realistically dissipate in a standard desktop PC without resorting to more exotic cooling methods. Prior to the Pentium 4, desktop PCs saw generally rising TDPs for both CPUs and GPUs with little regard to maximum power consumption. It wasn’t until we started hitting physical limits of power consumption and heat dissipation that Intel (and AMD) imposed some limits.

High-end desktop CPUs now spend their days bumping up against 125 - 140W limits, while mainstream CPUs are down at 65W. Mobile CPUs are generally below 35W. These TDP limits become a problem as you scale up clock speed or core count.

In homogeneous multicore CPUs you’ve got a number of identical processor cores that together have to share the maximum TDP of the processor. If a single hypothetical 4GHz processor core hits 125W, then to fit two of them into the same TDP you have to run the cores at a lower clock speed, say 3.6GHz. Want a quad-core version? Drop the clock speed again. Six cores? Now you’re probably down to 3.2GHz.

[Figure panels: Single Core, Dual Core, Quad Core, Hex Core]

This is fine if all of your applications are multithreaded and can use all available cores, but life is rarely so perfect. Instead you’ve got a mix of applications and workloads that’ll use anywhere from one to six cores. Browsing the web may only task one or two cores, gaming might use two or four and encoding a video can use all six. If you opt for a six core processor you get great encoding performance, but worse gaming and web browsing performance. Go for a dual core chip and you’ll run the simple things quickly, but suffer in encoding and gaming performance. There’s no winning.

With Nehalem, Intel introduced power gate transistors. Stick one of these in front of a supply voltage line to a core, turn it off and the entire core shuts off. In the past AMD and Intel only put gates in front of the clock signal going to a core (or blocks of a core). This would make sure the core remained inactive, but it could still leak power, a problem that got worse with smaller transistor geometries. Power gate transistors, however, address both active and leakage power: an idle core can be almost completely shut off.

If you can take a single core out of the TDP equation, then with some extra logic (around 1M transistors on Nehalem) you can increase the frequency of the remaining cores until you run into TDP or other physical limitations. This is how Intel’s Turbo Boost technology works. Depending on how many cores are active and the amount of power they’re consuming, a CPU with Intel’s Turbo Boost can run at up to some predefined frequency above its stock speed.
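To make the idea concrete, here is a minimal Python sketch of that kind of decision: step the clock of the active cores upward until an estimated power figure would exceed the TDP or a predefined ceiling is reached. Everything in it is an assumption for illustration, not Intel's actual algorithm: the function names, the 133MHz step, the example wattages, and above all the power model (per-core power scaling with the cube of frequency, gated cores drawing nothing).

```python
def estimated_power(freq_mhz, base_mhz, active_cores, watts_per_core_at_base):
    """Hypothetical power model: per-core power grows with the cube of
    frequency, and power-gated (idle) cores draw nothing."""
    scale = (freq_mhz / base_mhz) ** 3
    return active_cores * watts_per_core_at_base * scale


def turbo_frequency(base_mhz, max_turbo_mhz, active_cores, tdp_watts,
                    watts_per_core_at_base, step_mhz=133):
    """Raise the clock one step at a time while the estimated power stays
    within the TDP and the predefined turbo ceiling is not exceeded."""
    freq = base_mhz
    while freq + step_mhz <= max_turbo_mhz:
        candidate = freq + step_mhz
        power = estimated_power(candidate, base_mhz, active_cores,
                                watts_per_core_at_base)
        if power > tdp_watts:
            break  # the next step would blow the power budget
        freq = candidate
    return freq


# Illustrative numbers only: a 4-core part at 2660MHz with a 95W TDP,
# where each core is assumed to draw 23.75W at the base clock.
for active in (1, 2, 4):
    print(active, "active core(s):",
          turbo_frequency(2660, 2926, active, 95.0, 23.75), "MHz")
```

With all four cores busy there is no headroom in this toy model, so the clock stays at its base value. With one or two cores active, the gated cores free up enough budget to reach the ceiling.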

With Thuban, AMD introduces its own alternative called Turbo Core. The original Phenom processor had the ability to adjust the clock speed of each individual core. AMD disabled this functionality with the Phenom II to avoid some performance problems we ran into, but it’s back with Thuban.

If half (or more) of the CPU cores on a Thuban die are idle, Turbo Core does the following:

1) Decreases the clock speed of the idle cores down to as low as 800MHz.
2) Increases the voltage of all of the cores.
3) Increases the clock speed of the active cores up to 500MHz above their default clock speed.
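
The three steps above can be sketched in a few lines of Python. This is a simplified model built only from the description in this article, not AMD's implementation: the function name and the boolean "voltage raised" flag are made up for illustration, and real hardware makes these decisions continuously based on conditions the sketch ignores.

```python
IDLE_FLOOR_MHZ = 800   # from the article: idle cores drop as low as 800MHz
MAX_BOOST_MHZ = 500    # from the article: active cores gain up to 500MHz


def turbo_core_clocks(core_busy, default_mhz, boost_mhz=MAX_BOOST_MHZ):
    """Return (per-core clocks in MHz, voltage_raised) for a list of
    booleans saying whether each core is busy.

    Hypothetical helper: Turbo Core is modeled as engaging only when half
    or more of the cores are idle.
    """
    boost_mhz = min(boost_mhz, MAX_BOOST_MHZ)
    total = len(core_busy)
    idle = sum(1 for busy in core_busy if not busy)

    # Fewer than half the cores idle: every core stays at its default clock.
    if total == 0 or idle * 2 < total:
        return [default_mhz] * total, False

    # Step 1: idle cores fall to the floor. Step 3: busy cores get the boost.
    clocks = [default_mhz + boost_mhz if busy else IDLE_FLOOR_MHZ
              for busy in core_busy]
    # Step 2: voltage is raised for all cores, reported here as a flag.
    return clocks, True


# A six-core part at 3200MHz with three busy cores and a 400MHz boost.
print(turbo_core_clocks([True, False, False, True, False, True], 3200, 400))
```

The 3200MHz default and 400MHz boost in the usage line are chosen to match a part that tops out at 3.6GHz; other models would use different values.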

The end result is the same as Intel’s Turbo Boost from a performance standpoint. Lightly threaded apps see a performance increase. Even heavily threaded workloads might have periods of time that are bound by the performance of a single thread, and they benefit from AMD’s Turbo Core as well. In practice, Turbo Core appears to work. While I rarely saw the Phenom II X6 1090T hit 3.6GHz, I would see the occasional jump to 3.4GHz. As you can tell from the screenshot above, there's very little consistency between the cores and their operating frequencies; each seems to run as fast or as slow as it possibly can.

AMD's Turbo Core Benefit (AMD Phenom II X6 1090T)

Benchmark                 Turbo Core Disabled   Turbo Core Enabled   Performance Increase
x264-HD 3.03 1st Pass     71.4 fps              74.5 fps             4.3%
x264-HD 3.03 2nd Pass     29.4 fps              30.3 fps             3.1%
Left 4 Dead               117.3 fps             127.2 fps            8.4%
7-zip Compression Test    3069 KB/s             3197 KB/s            4.2%
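
For anyone who wants to check the last column, the percentages follow directly from the two measurements in each row. The short Python sketch below recomputes them; the dictionary layout and function name are just for illustration.

```python
# (Turbo Core disabled, Turbo Core enabled) scores taken from the table above.
results = {
    "x264-HD 3.03 1st Pass": (71.4, 74.5),
    "x264-HD 3.03 2nd Pass": (29.4, 30.3),
    "Left 4 Dead": (117.3, 127.2),
    "7-zip Compression Test": (3069, 3197),
}


def percent_increase(disabled, enabled):
    """Relative gain of the Turbo Core run over the baseline, in percent."""
    return (enabled - disabled) / disabled * 100.0


for name, (disabled, enabled) in results.items():
    print(f"{name}: {percent_increase(disabled, enabled):.1f}%")
```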

Turbo Core generally increased performance between 2 and 10% in our standard suite of tests. Given that the max clock speed increase on a Phenom II X6 1090T is 12.5%, that’s not a bad range of performance improvement. Intel’s CPUs stand to gain a bit more (and use less power) from turbo thanks to the fact that Lynnfield, Clarkdale, et al. will physically shut off idle cores rather than just underclock them.
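
The 12.5% figure is simply the gap between the 1090T's default clock and its top Turbo Core clock. The Python sketch below works that out and shows how much of that headroom each measured gain represents. The 3.2GHz default clock is the 1090T's stock speed and is not stated in this section, so treat it as an assumption here; the helper names are illustrative.

```python
BASE_MHZ = 3200       # assumed stock clock of the Phenom II X6 1090T
MAX_TURBO_MHZ = 3600  # top Turbo Core clock mentioned in the article


def percent_gain(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100.0


def share_of_headroom(measured_percent, clock_percent):
    """Fraction of the maximum clock gain that showed up as performance."""
    return measured_percent / clock_percent


clock_gain = percent_gain(BASE_MHZ, MAX_TURBO_MHZ)

# Measured gains from the Turbo Core benefit table.
measured_gains = {
    "x264-HD 3.03 1st Pass": 4.3,
    "x264-HD 3.03 2nd Pass": 3.1,
    "Left 4 Dead": 8.4,
    "7-zip Compression Test": 4.2,
}

print(f"Maximum clock gain: {clock_gain:.1f}%")
for name, gain in measured_gains.items():
    print(f"{name}: {share_of_headroom(gain, clock_gain):.0%} of the clock headroom")
```

Because the chip rarely sits at its top turbo clock, and not every test is bound by one or two threads, none of the tests should be expected to capture the full headroom.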

I have noticed a few situations where performance in a benchmark was unexpectedly low with Turbo Core enabled. This could be an artifact of independent core clocking similar to what we saw in the Phenom days; however, I saw no consistent issues in my time with the chip thus far.


168 Comments


  • kwm - Wednesday, April 28, 2010 - link

    did the big bad caps scare you. sorry
  • Taft12 - Tuesday, April 27, 2010 - link

    OF COURSE 6 CPU CORES WILL PROVIDE A TANGIBLE BENEFIT TO VIRTUALIZED PLATFORMS!
  • Skiprudder - Tuesday, April 27, 2010 - link

    To folks asking for a more detailed overclocking review, I would just say that Anand almost always releases an in-depth OC article on a new CPU architecture anywhere from a day to a week later. I think he usually wants to get the basic info out first, then delve into the nitty gritty for those who OC.
  • silverblue - Tuesday, April 27, 2010 - link

    I think Thuban could've been a little better realised;

    1) Higher uncore speed - whatever happened to the touted 5.2GT/s HT3 link?
    2) Triple channel controller - AMD have been using dual channel controllers for the best part of a decade - this HAS to be starving Thuban
    3) Keeping the Phenom II's core control system. Phenom I may have been more elegant, but even if Thuban is faster at ramping up the voltages, it'll still result in issues with XP and Vista. So, the targeted audience, at least for Microsoft users, would be Windows 7.

    Which reminds me... SysMark is on Vista, an OS known to cause issues with Phenoms. Would this have detrimentally affected the X6's scores, even if two cores are being taxed?

    I don't think extra cache would be viable for AMD. The Athlon II X4s aren't far behind equivalent clocked Phenom II X4s even without any L3 cache, plus the added expense and die complexity would've just pushed prices, and temperatures, upwards. Of course, a higher model with 8MB of L3 cache would be nice to see.

    It's not really a disappointment to see Thuban fail to topple the entry-level Nehalems. Remember that they're logical 8-thread CPUs and are thus more efficient at keeping their pipelines fed. You can still get a high-end AMD setup for cheaper than the competing Intel setups; just throw some heavily threaded software at it and it'll do very nicely. The new X4s may just give Intel cause to drop prices though.

    One final thing - AMD's offerings are known to perform far closer to Intel CPUs when every single bit of eye candy is enabled in games, including AA, and pushing the resolution upwards. It may have been more telling had this been done.
  • silverblue - Tuesday, April 27, 2010 - link

    Ignore the last bit; it wouldn't be a good indication of the power of Thuban.
  • gruffi - Wednesday, April 28, 2010 - link

    1) Higher uncore speed means higher power consumption and probably less power efficiency.
    2) You would need a new platform that makes the current one obsolete. You would also need much more time and money to validate.
    3) I actually see no problem.

    Sry, but your claims are unrealistic or pointless.
  • silverblue - Thursday, April 29, 2010 - link

    "1) Higher uncore speed means higher power consumption and probably less power efficiency."
    You could just reduce the clock speeds to compensate, assuming a higher uncore yields a satisfactory performance increase. The i7-920 has an uncore speed of 2.13GHz and Phenom IIs at 2GHz.

    "2) You would need a new platform that makes the current one obsolete. You would also need much more time and money to validate."
    Fair dos.

    "3) I actually see no problem."
    The potential for a thread hitting an idle core would still be there as, even with Turbo CORE doing its thing, there would be the potential for three idle cores, however this will be minimised if AMD has decreased the delay needed for a core to ramp back up from 800MHz.

    "Sry, but your claims are unrealistic or pointless."
    That's fine.
  • jonup - Tuesday, April 27, 2010 - link

    Nice read; well done Anand! Are you planning to do an OC follow up like you've done in the past? Also I noticed that on the second "CPU Specification Comparison" chart on the first page "AMD Phenom II X4 965" is included twice.
    p.s. What's IOMMU? Can someone explain please?
  • Ryan Smith - Tuesday, April 27, 2010 - link

    The short answer is that an IOMMU is a memory mapping unit (MMU) for I/O devices (video cards, network controllers, etc). For most readers of this site, the only time they'd use an IOMMU is when using a virtual machine, as an IOMMU allows the virtualized OS to talk more or less directly to the hardware by translating the virtual addresses to the physical addresses the hardware is using. However, it does have other uses.
  • jonup - Tuesday, April 27, 2010 - link

    Thanks!
