Power Management and Real Turbo Core

Like Llano, Bulldozer incorporates significant clock and power gating throughout its design. Power gating allows individual idle cores to be almost completely powered down, opening up headroom for active cores to be clocked above and beyond their base operating frequency. Intel calls this dynamic clock speed adjustment Turbo Boost, while AMD refers to it as Turbo Core.

The Phenom II X6 featured a rudimentary version of Turbo Core without any power gating. As a result, Turbo Core was hardly active in those processors and when it was on, it didn't stay active for very long at all.

Bulldozer's Turbo Core is far more robust. While it still uses Llano's digital estimation method of determining power consumption (e.g. the CPU knows ALU operation x consumes y watts of power), the results should be far more tangible than what we've seen from any high-end AMD processor in the past.
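To make the digital estimation idea concrete, here is a minimal sketch of an activity-based power estimate: count events over an interval, multiply each count by a known energy cost, and divide by the elapsed time. The event names and per-event energy values are purely hypothetical placeholders, not AMD data, and the real hardware logic is certainly more involved.

```python
# Hypothetical per-event energy costs in nanojoules.
# These numbers are illustrative assumptions, NOT measured or published AMD values.
ENERGY_NJ = {"alu_op": 0.5, "fpu_op": 1.5, "l2_access": 2.0}


def estimate_power_watts(event_counts, interval_s, energy_nj=ENERGY_NJ):
    """Estimate average power from event counts observed over interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Total energy = sum over event types of (cost per event * number of events).
    total_nj = sum(energy_nj[name] * count for name, count in event_counts.items())
    # Convert nanojoules to joules, then divide by time to get watts.
    return total_nj * 1e-9 / interval_s


# Made-up activity counts for a 1 ms sampling window.
counts = {"alu_op": 2_000_000, "fpu_op": 500_000, "l2_access": 100_000}
print(f"{estimate_power_watts(counts, interval_s=0.001):.2f} W")
```

The appeal of this approach is that it needs no analog power sensor: the estimate comes entirely from counters the chip already has.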

Turbo Core's granularity hasn't changed with the move to Bulldozer, however. If half (or fewer) of the processor cores are active, max turbo is allowed. If any more cores are active, a lower turbo frequency can be selected. Those are the only two frequencies available above the base frequency.
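The two-level rule can be sketched in a few lines. This is only an illustration of the policy as described here, using the FX-8150 clocks quoted later in this section as defaults; the `has_headroom` flag is an assumed stand-in for the power estimate and is not a real hardware interface.

```python
def turbo_ceiling_ghz(active_cores, total_cores=8, base=3.6,
                      mid_turbo=3.9, max_turbo=4.2, has_headroom=True):
    """Return the highest clock (GHz) the two-level turbo rule would permit.

    has_headroom is a hypothetical flag standing in for the chip's
    power estimate leaving room to clock up.
    """
    if not 0 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 0 and total_cores")
    if not has_headroom:
        return base          # no power headroom: stay at the base clock
    if active_cores * 2 <= total_cores:
        return max_turbo     # half or fewer cores active: max turbo allowed
    return mid_turbo         # more than half active: lower turbo step


for n in (1, 4, 5, 8):
    print(n, "active cores ->", turbo_ceiling_ghz(n), "GHz")
```

Note that the function returns a ceiling, not a guaranteed clock: whether the chip actually reaches it depends on the workload and power budget.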

AMD doesn't currently have a Turbo Core monitoring utility, so we turned to Core Temp to record CPU frequency while running various workloads. This let us measure the impact of Turbo Core on Bulldozer compared to Phenom II X6 and Sandy Bridge.

First let's pick a heavily threaded workload: our x264 HD benchmark. Each run of our x264 test is composed of two passes: a lightly threaded first pass that analyzes the video, and a heavily threaded second pass that performs the actual encode. Our test runs four times before outputting a result. I measured the frequency of Core 0 over the duration of the test.

Let's start with the Phenom II X6 1100T. By default the 1100T should run at 3.3GHz, but with half or fewer cores enabled it can turbo up to 3.7GHz. If Turbo Core is able to work, I'd expect to see some jumps up to 3.7GHz during the lightly threaded passes of our x264 test:

Unfortunately we see nothing of the sort. Turbo Core is pretty much non-functional on the Phenom II X6, at least running this workload. Average clock speed is a meager 3.31GHz, just barely above stock and likely only due to ASUS being aggressive with its clocking.

Now let's look at the FX-8150 with Turbo Core. The base clock here is 3.6GHz, max turbo is 4.2GHz and the intermediate turbo is 3.9GHz:

Ah that's more like it. While the average is only 3.69GHz (+2.5% over stock), we're actually seeing some movement here. This workload in particular is hard on any processor as you'll see from Intel's 2500K below:

The 2500K runs at 3.3GHz by default, but thanks to turbo it averages 3.41GHz for the duration of this test. We even see a couple of jumps to 3.5 and 3.6GHz. Intel's turbo is a bit more consistent than AMD's, but average clock increase is quite similar at 3%.

Now let's look at the best case scenario for turbo: a heavy single threaded application. A single demanding application, even for a brief period of time, is really where these turbo modes can truly shine. Turbo helps launch applications quicker, make windows appear faster and make an easy time of churning through bursty workloads.

We turn to our usual favorite Cinebench 11.5, as it has an excellent single-threaded benchmark built in. Once again we start with the Phenom II X6 1100T:

Turbo Core actually works on the Phenom II X6, albeit for a very short duration. We see a couple of blips up to 3.7GHz but the rest of the time the chip remains at 3.3GHz. Average clock speed is, once again, 3.31GHz.

Bulldozer does far better:

Here we see blips up to 4.2GHz and pretty consistent performance at 3.9GHz, exactly what you'd expect. Average clock speed is 3.93GHz, a full 9% above the 3.6GHz base clock of the FX-8150.

Intel's turbo fluctuates much more frequently here, moving between 3.4GHz and 3.6GHz as it runs into TDP limits. The average clock speed remains at 3.5GHz, or a 6% increase over the base. For the first time ever, AMD actually does a better job at scaling frequency via turbo than Intel. While I would like to see more granular turbo options, it's clear that Turbo Core is a real feature in Bulldozer and not the half-hearted attempt we got with Phenom II X6. I measured the performance gains due to Turbo Core across a number of our benchmarks:

Average performance increased by just under 5% across our tests. It's nothing earth-shattering, but it's a start. Don't forget how unassuming the first implementations of Turbo Boost were on Intel architectures. I do hope that with future generations we may see even more significant gains from Turbo Core on Bulldozer derivatives.
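The average clock figures quoted in this section can be checked against the base clocks with a little arithmetic. The snippet below simply restates the numbers from the text; the dictionary labels are our own shorthand.

```python
# (base GHz, measured average GHz), as quoted in the text above.
results = {
    "Phenom II X6 1100T, x264": (3.3, 3.31),
    "FX-8150, x264": (3.6, 3.69),
    "2500K, x264": (3.3, 3.41),
    "Phenom II X6 1100T, Cinebench 1T": (3.3, 3.31),
    "FX-8150, Cinebench 1T": (3.6, 3.93),
    "2500K, Cinebench 1T": (3.3, 3.5),
}


def uplift_percent(base_ghz, avg_ghz):
    """Percentage by which the average clock exceeds the base clock."""
    return (avg_ghz / base_ghz - 1) * 100


for label, (base, avg) in results.items():
    print(f"{label}: {uplift_percent(base, avg):.1f}% over base")
```

Keep in mind that clock uplift and performance uplift are different things: a 9% higher average clock does not necessarily translate into 9% more performance.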

Independent Clock Frequencies

When AMD introduced the original Phenom processor it promised more energy efficient execution by being able to clock each core independently. You could have a heavy workload running on Core 0 at 2.6GHz, while Core 3 ran a lighter thread at 1.6GHz. In practice, we felt Phenom's asynchronous clocking was a burden as the CPU/OS scheduler combination would sometimes take too long to ramp up a core to a higher frequency when needed. The result, at least back then, was that you'd get significantly lower performance in these workloads that shuffled threads from one core to the next. The problem was so bad that AMD abandoned asynchronous clocking altogether in Phenom II.

The feature is back in Bulldozer, and this time AMD believes it will be problem free. The first major change is that, with Windows 7, core parking should keep some threads from haphazardly dancing around all available cores. The second change is that Bulldozer can ramp frequencies up and down much quicker than the original Phenom ever could. Chalk that up to a side benefit of Turbo Core being a major part of the architecture this time around.
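A toy model shows why ramp time matters so much when threads hop between independently clocked cores. It assumes a thread always lands on a core sitting at the low clock, that the core jumps to the high clock in one step after a fixed ramp delay, and that migrations happen at a regular interval. The 1.6GHz and 2.6GHz values are the Phenom example clocks from above; the ramp and migration times are invented for illustration and are not measurements.

```python
def effective_ghz(low_ghz, high_ghz, ramp_ms, migrate_every_ms):
    """Time-weighted average clock seen by a thread that changes cores
    every migrate_every_ms and waits ramp_ms at low_ghz after each move."""
    if migrate_every_ms <= 0 or ramp_ms < 0:
        raise ValueError("migrate_every_ms must be > 0 and ramp_ms >= 0")
    slow_ms = min(ramp_ms, migrate_every_ms)   # time stuck at the low clock
    fast_ms = migrate_every_ms - slow_ms       # remaining time at the high clock
    return (low_ghz * slow_ms + high_ghz * fast_ms) / migrate_every_ms


# Hypothetical timings: a slow ramp versus a fast ramp, same migration rate.
for ramp in (50, 5):
    avg = effective_ghz(1.6, 2.6, ramp_ms=ramp, migrate_every_ms=100)
    print(f"ramp {ramp} ms -> {avg:.2f} GHz effective")
```

Both of the changes described above attack this problem: core parking lengthens the time between migrations, and faster ramping shrinks the time spent at the low clock.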

Asynchronous clocking in Bulldozer hasn't proven to be a burden in any of our tests thus far; however, I'm reluctant to embrace it as an advantage just yet, at least not until we've had some more experience with the feature under our belts.

430 Comments

  • Elric42 - Thursday, December 1, 2011 - link

    I wanted to say one thing. I don't have one, but a friend of mine does, and he showed me something my i5 can't do: he was playing a game called Crysis (if that's how you spell it) and running a video editing program at the same time. I can't do that with my i5; if I did, the game would start to lag. Crysis takes a lot out of your CPU (bad programming; even video cards have trouble with the game), but BD seems to multitask better than my i5 can. Just wondering if it's more for peeps who do a lot of stuff at one time.
  • ZyferXY - Monday, January 2, 2012 - link

    Thanks for pointing that out, because not so long ago I saw a video on AMD's web site where they were showing off an AMD Llano notebook vs. an Intel Sandy Bridge Core i7 notebook. They started the same benchmark on both notebooks and the Intel was quite fast, but as they opened more and more programs at the same time the Intel started to drop in performance while the AMD kept running stable. So my suggestion would be to run all the benchmarks on the Bulldozer and the i7 2600K again, but this time open about 10 or 20 other programs at the same time; then you will truly see the Bulldozer shine. I am not an AMD fanboy: my current build is an Intel Pentium G860, and I am very disappointed in myself. I should've gone with the AMD q640; it was around the same price when I bought it. My next build will be an AMD FX4100. HA
  • makaira - Thursday, December 8, 2011 - link

    Well I very excitedly bought a 8150 based system for number crunching as the performance/$ looked very good. I could buy a "quiet" system for Aus $ 1130 with SSD and only 8Gb RAM.
    I had previously purchased a Intel i7 2600K, but could never get it to overclock and run 64 bit Java app (Napoleon Spike from DUG) 24/7, it fell over after 6 hrs or 12 or 23 or 47, it always fell over despite water cooling.
    Now the bulk of my work is done by Xeons in the rack, with a couple of dual 5680's systems doing the heavy lifting (2 x 6 core + hyperthreading looks like 24 CPU's to OS). These are good stable systems with 96Gb RAM, but high overall system cost.
    I wanted a few cheap and moveable fast CPU's. Boy did the Bulldozer fail to deliver
    More is Better measure in Bytes inversion throughput/minute
    BD 8150 115-123k in 8/8 threads i.e. flat out
    i7 2600 237-268k in 8/8 threads i.e. flat out
    Xeon dual 5680 333-356k in 12/24 threads i.e.half loaded
    i7-870 166k in 8/8 threads i.e flat out
    Xeon Dual E5520 190k 12/16 threads
    Xeon Dual 5430 132k 8/8 threads

    The Bulldozer is the slowest and the newest....very poor performance. Eclipsed by Intel at a similar price point. I might as well replace the MB and CPU and go with an i7 3960 or 3930...
  • wepexpert117 - Thursday, December 8, 2011 - link

    I dunno if anyone noticed, but if you study the architectures carefully, then what AMD calls a 'module' is comparable to a 'core' of Intel's. Intel's Hyperthreading allows two logical thread executions per core. But AMD's TruCore theory only allows one thread per core. The Intel i5-2500K has 4 physical cores and 8 logical threads. Compared to that, the most powerful of the AMD chips, the FX-8170, contains 4 modules which can execute 8 threads, with 2 cores per module, each core executing 1 thread. On the other hand, the i7-2600K contains 6 physical cores and 12 logical threads. Hence by no chance can the FX-8150 match the capability of the 2600K, as the latter has 2 more cores to add to the power. As for the results of the benchmarking, they also agree with the fact that the FX-8150 is comparable to, albeit a little less powerful than, the i5-2500K, because of the architecture difference between Intel's core and AMD's Bulldozer. If AMD ever brings out (according to them) a 12 core FX processor (prob. FX-12XXX), then it would be really interesting to see how that matches up with the i7-2600K. Although the shared L2 cache architecture is what may be detrimental to the performance of these processors.
  • Jondenmark - Saturday, December 24, 2011 - link

    Something is wrong. If I look at a die shot of Llano, then the core is about 1½ times the size of the 1 MB L2 cache. If I look at a Bulldozer module, it is about 1½ times the size of the 2 MB L2 cache. To me this indicates that a Bulldozer module is about 100% larger than a Phenom II core, which is far from the 12% more core size that AMD has previously indicated was the cost of adding another core to form a module. The 12% was expected to allow AMD to add nearly double the core count on a given process node to convince the server market and give plenty of die space for the GPU on the Llano APU. Where am I wrong and what is right?
  • 8 core cpu - Friday, January 6, 2012 - link

    This 8 Core Cpu (http://8corecpu.com/) is high spreed CPU. It is best than other CPU
  • Raven0628 - Saturday, January 14, 2012 - link

    I beleive amd realy missed it shoot badly, but it is still the right social choice caus what will happen if intel get x86 monopol and they are still resonably priced and whene you have to live with it in every day life will you realy notice the diferance in perfomance. Unless you realy to go for all the top of the line in every part of your system you will got for the top of intel i7.
    But i'v never did and alway ended up with reliable good perfomance amd sys for less than 800$ counting with the power supply i had to replace. this year. my point unless you want a death machine go for amd and you will feel better with your self ;).
    PS. sry for the terible english.
  • Ernst0 - Sunday, February 5, 2012 - link

    Hey guys.

    There is no doubt that whatever critiques have been posted are valid but I skimmed a few pages and saw no "Consumer" comments.

    I have purchased an 8150 with a AMD3+ motherboard and will be putting the unit together.

    In my days since the Z80 and 48k this represents the nicest cpu ever for me.
    It was affordable, and I will have 8 cores to task with my hobby programming, such as trying to factor RSA numbers or the ilk, so the AMD 8-core is a dream system for the price.

    I picked up case, mother board power supply, 1.5 TB drive DVD, 1 gb video, 16 gb ram, 28 inch monitor, wall mount for monitor so I can have two 28's with one the long way for source code and perhaps something else.. Anyway $1200 is the cost.
    Now this is my first bare-bones experience too so all in all it is exciting to get such a dream machine and I am happy to step forward and support AMD

    I don't know what awaits when the memory arrives and I boot up but it feels like Starship already and I have vowed to learn OpenMP under GCC to advance into multi-core programming.

    So perhaps there will be issues. Perhaps this is not all that, nor is it what will come, but from where I am at I am still on the AMD home team and my money is flowing in the economy.

    I went from trs 80 to Amiga then to twin AMD single core chips on one Motherboard, Moved to the early quad cores dreaming of dual quad cores when a system with 8 cores of that day would have cost $4900 and now picked up a system that as a boy in 1973 I would have considered Alien-ufo technology for about what I paid for dual single core chips just a few years ago.

    So BullDozer can't be all that bad. The price is good! I will see how she runs. I often peg cores at 100% for days when searching for RSA factors.. Looks like I get more bang for the same bucks this time and I am all for that.

    Thank you AMD for such a wonderful cpu. I plan to make use and thanks to the motherboard I can watch out for heat issues much easier than ever,

    Not to mention it looks like the sound system is way advanced over the last computer as well.

    So from a consumer / hobby programmer point of view this is very cool indeed.

    Ernst
  • mumbles - Sunday, February 12, 2012 - link

    Thank you for being the first to actually contribute some real world response to this architecture. So many trolls on this thread that are intel fanboys.

    Also, if you're using Xen with this thing, I would be interested in seeing some feedback on how multiple guests (like more than 4) act when trying to fight for floating point processor time. It would be interesting also to see if 4 floating point threads and 4 integer threads can all run at the same time with no waiting. That might be asking too much for now tho.
