AMD’s Turbo: It Works

In the Pentium 4 days Intel quickly discovered that there was a ceiling in terms of how much heat you could realistically dissipate in a standard desktop PC without resorting to more exotic cooling methods. Prior to the Pentium 4, desktop PCs saw generally rising TDPs for both CPUs and GPUs with little regard to maximum power consumption. It wasn’t until we started hitting physical limits of power consumption and heat dissipation that Intel (and AMD) imposed some limits.

High-end desktop CPUs now spend their days bumping up against 125 - 140W limits, while mainstream CPUs are down at 65W. Mobile CPUs are generally below 35W. These TDP limits become a problem as you scale up clock speed or core count.

In homogeneous multicore CPUs you’ve got a number of identical processor cores that together have to share the maximum TDP of the processor. If a single hypothetical 4GHz processor core hits 125W, then to fit two of them into the same TDP you have to run the cores at a lower clock speed, say 3.6GHz. Want a quad-core version? Drop the clock speed again. Six cores? Now you’re probably down to 3.2GHz.

[Figure: Single Core, Dual Core, Quad Core, Hex Core]

This is fine if all of your applications are multithreaded and can use all available cores, but life is rarely so perfect. Instead you’ve got a mix of applications and workloads that’ll use anywhere from one to six cores. Browsing the web may only task one or two cores, gaming might use two or four and encoding a video can use all six. If you opt for a six core processor you get great encoding performance, but worse gaming and web browsing performance. Go for a dual core chip and you’ll run the simple things quickly, but suffer in encoding and gaming performance. There’s no winning.

With Nehalem, Intel introduced power gate transistors. Stick one of these in front of a supply voltage line to a core, turn it off and the entire core shuts off. In the past AMD and Intel only put gates in front of the clock signal going to a core (or blocks of a core). This would make sure the core remained inactive, but it could still leak power - a problem that got worse with smaller transistor geometries. Power gate transistors, however, address both active and leakage power: an idle core can be almost completely shut off.

If you can take a single core out of the TDP equation, then with some extra logic (around 1M transistors on Nehalem) you can increase the frequency of the remaining cores until you run into TDP or other physical limitations. This is how Intel’s Turbo Boost technology works. Depending on how many cores are active and the amount of power they’re consuming a CPU with Intel’s Turbo Boost can run at up to some predefined frequency above its stock speed.
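As a rough illustration of the idea above, the sketch below steps the clock of the active cores up one bin at a time while a toy power estimate stays within the TDP. The power model (power scaling linearly with frequency, a gated core drawing nothing), the 133MHz bin size, the bin cap, and every name here are assumptions made for illustration; this is not Intel's actual algorithm.

```python
def turbo_frequency(active_cores, total_cores, base_mhz, tdp_watts,
                    bin_mhz=133, max_bins=4):
    """Toy model: raise the clock of the active cores while staying within TDP.

    Assumptions (all simplifications): the chip draws exactly tdp_watts with
    every core active at base_mhz, per-core power scales linearly with
    frequency, and a power-gated (idle) core draws nothing.
    """
    if not 0 < active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")

    # Power drawn by one core per MHz under the linear assumption.
    watts_per_core_mhz = tdp_watts / (total_cores * base_mhz)

    freq = base_mhz
    for _ in range(max_bins):
        candidate = freq + bin_mhz
        estimated_power = active_cores * candidate * watts_per_core_mhz
        if estimated_power > tdp_watts:
            break  # the next bin would exceed the power budget
        freq = candidate
    return freq
```

In this model a hypothetical 2660MHz quad-core with all four cores busy has no headroom and stays at 2660MHz, while the same chip with a single active core climbs until it hits the bin cap.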

With Thuban, AMD introduces its own alternative called Turbo Core. The original Phenom processor had the ability to adjust the clock speed of each individual core. AMD disabled this functionality with the Phenom II to avoid some performance problems we ran into, but it’s back with Thuban.

If half (or more) of the CPU cores on a Thuban die are idle, Turbo Core does the following:

1) Decreases the clock speed of the idle cores down to as low as 800MHz.
2) Increases the voltage of all of the cores.
3) Increases the clock speed of the active cores up to 500MHz above their default clock speed.
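The three steps above can be sketched as a small function. This is a simplified model under stated assumptions, not AMD's implementation: it always applies the limits quoted in the list (the full boost and the 800MHz floor), it leaves every core at its default clock when Turbo Core is not engaged, and it does not model the voltage increase at all. The function and parameter names are hypothetical.

```python
def turbo_core_clocks(core_active, base_mhz, boost_mhz=500, idle_mhz=800):
    """Return per-core clocks in MHz following the three steps above.

    core_active: list of booleans, True where a core is busy.
    The voltage increase (step 2) is noted here but not modeled.
    """
    idle = core_active.count(False)

    # Turbo Core engages only when half (or more) of the cores are idle.
    if idle * 2 < len(core_active):
        return [base_mhz] * len(core_active)

    # Step 1: idle cores drop to the floor clock.
    # Step 3: active cores run above their default clock.
    return [base_mhz + boost_mhz if active else idle_mhz
            for active in core_active]
```

For a six-core part at 3200MHz with a 400MHz boost and two busy cores, the model returns 3600MHz for the two active cores and 800MHz for the other four. With four busy cores it leaves all six at 3200MHz.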

The end result is the same as Intel’s Turbo Boost from a performance standpoint. Lightly threaded apps see a performance increase. Even heavily threaded workloads might have periods of time that are bound by the performance of a single thread - they benefit from AMD’s Turbo Core as well. In practice, Turbo Core appears to work. While I rarely saw the Phenom II X6 1090T hit 3.6GHz, I would see the occasional jump to 3.4GHz. As you can tell from the screenshot above, there's very little consistency between the cores and their operating frequencies - each one seems to run as fast or as slow as it possibly can.

AMD's Turbo Core Benefit

AMD Phenom II X6 1090T | Turbo Core Disabled | Turbo Core Enabled | Performance Increase
x264-HD 3.03 1st Pass | 71.4 fps | 74.5 fps | 4.3%
x264-HD 3.03 2nd Pass | 29.4 fps | 30.3 fps | 3.1%
Left 4 Dead | 117.3 fps | 127.2 fps | 8.4%
7-zip Compression Test | 3069 KB/s | 3197 KB/s | 4.2%

Turbo Core generally increased performance between 2 and 10% in our standard suite of tests. Given that the max clock speed increase on a Phenom II X6 1090T is 12.5%, that’s not a bad range of performance improvement. Intel’s CPUs stand to gain a bit more (and use less power) from turbo thanks to the fact that Lynnfield, Clarkdale, et al. will physically shut off idle cores rather than just underclock them.
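The percentages quoted here are easy to check. The short snippet below recomputes the gains from the table above and the 12.5% clock ceiling; the 3200MHz stock clock for the 1090T is filled in as an assumption consistent with the 3.6GHz turbo figure, and the helper name is just for illustration.

```python
def percent_gain(before, after):
    """Percentage increase going from before to after."""
    return (after - before) / before * 100

# (Turbo Core disabled, Turbo Core enabled), taken from the table above.
results = {
    "x264-HD 3.03 1st Pass": (71.4, 74.5),
    "x264-HD 3.03 2nd Pass": (29.4, 30.3),
    "Left 4 Dead": (117.3, 127.2),
    "7-zip Compression Test": (3069, 3197),
}

gains = {name: round(percent_gain(off, on), 1)
         for name, (off, on) in results.items()}

# Assumed clocks for the 1090T: 3200MHz stock, 3600MHz maximum turbo.
max_clock_gain = percent_gain(3200, 3600)

for name, gain in gains.items():
    print(f"{name}: {gain}% of a possible {max_clock_gain}%")
```

Even the best case in the table, Left 4 Dead, lands below the clock ceiling, which is what you would expect when the boost only applies part of the time.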

I have noticed a few situations where performance in a benchmark was unexpectedly low with Turbo Core enabled. This could be an artifact of independent core clocking similar to what we saw in the Phenom days; however, I saw no consistent issues in my time with the chip thus far.

168 Comments

  • JGabriel - Tuesday, April 27, 2010 - link

    I agree with your approach, Anand. As a customer, and for a general review, I'm most interested in what performance I can get without increasing the core voltage and power consumption.

    Pushing the overclock seems more suited for a separate article, or for sites specializing in overclocking or gaming.

  • GullLars - Tuesday, April 27, 2010 - link

    +1 on more overclocking testing. It could be a new article dedicated to investigating OC vs stock against the i5 CPUs. Also, I remember seeing some results for power consumption per performance on different clocks and voltages; that would be interesting to see for the PII X6.
    I would also like to see what effect just bumping the FSB up from 200 makes, and how high you get it with stable turbo modes. Finding the max stable FSB with lowered multiplier gives a hint to how far it can clock, and how far the non-BE versions can clock.

    I'm considering a 1090T BE or 1055T to replace my 9850BE (too hot, and won't OC) until Bulldozer chips come out. If the 1055T can take a 20-25% FSB increase and stay stable, that would be my choice.
  • Anand Lal Shimpi - Tuesday, April 27, 2010 - link

    Ask and you shall receive: http://anandtech.com/show/3676/phenom-ii-x6-4ghz-a...
  • jav6454 - Tuesday, April 27, 2010 - link

    I miss when Intel was on the run for the market... AMD's 6-core does well, but it's marginally better than a 4-core from Intel... true you get more cores, but is the small performance gain worth it? I think not.
  • KaarlisK - Tuesday, April 27, 2010 - link

    Overclocking it is definitely interesting, and it would be nice to know whether its power consumption is much lower or not. It's a lower bin, so it should have slightly higher power consumption at the same clock, but it does have a much lower clock.
  • Anand Lal Shimpi - Tuesday, April 27, 2010 - link

    We didn't have an actual Phenom II 1055T, we simply underclocked our 1090T and lowered the turbo core ratios to simulate one. This is why we don't have power consumption results for it either.

    Take care,
    Anand
  • adrien - Tuesday, April 27, 2010 - link

    Hi,

    I find the results of the 7zip compression benchmark a bit weird. I own a Phenom II X4 955 and I'm currently compressing something (800MB) in a windows7 virtual machine with only two cores (way to go MS!). Still, I'm getting something more than 2MB/s (and this virtual machine has crappy disk I/O, maybe less than 4MB/s sustained). LZMA is slower on data that compresses badly so it's not possible to draw conclusions but overall, it still looks weird.

    Which version of 7zip is being used? 4 or 9? And which minor version?
    Also, are you using LZMA or LZMA2? IIRC, LZMA can only use 2 threads while LZMA2 can use more (maybe powers of two, not sure).
    Also, I think that the 64bit versions are faster. I guess you're using 64bit binaries but it's better to check (running 32bit software on 64bit OSes indeed has a cost), maybe like 5% to 15%.
  • Anand Lal Shimpi - Tuesday, April 27, 2010 - link

    I'm using 7-zip 9.10 beta and LZMA compression in order to compare to previous results, which unfortunately limits us to two threads. In the future we'll start transitioning to LZMA2 to take advantage of the 4+ core CPUs on the market. Today I offer both the benchmark results (max threads) and the compression test (2 threads) to give users an idea of the spectrum of performance.

    Take care,
    Anand
  • adrien - Wednesday, April 28, 2010 - link

    OK, very good. Thanks. :-)

    That explains why the deca (and actually quad) cores don't benefit much from that. It'd probably deserve a mention on the chart or next to it.
  • adrien - Wednesday, April 28, 2010 - link

    Not decacores but hexacores of course. Too early in the morning I guess.
