Power Management and Real Turbo Core

Like Llano, Bulldozer incorporates significant clock and power gating throughout its design. Power gating allows individual idle cores to be almost completely powered down, opening up headroom for the active cores to be clocked above and beyond their base operating frequency. Intel calls this dynamic clock speed adjustment Turbo Boost, while AMD refers to it as Turbo Core.

The Phenom II X6 featured a rudimentary version of Turbo Core without any power gating. As a result, Turbo Core rarely engaged on those processors, and when it did kick in it didn't stay active for long.

Bulldozer's Turbo Core is far more robust. While it still uses Llano's digital estimation method of determining power consumption (e.g. the CPU knows that ALU operation x consumes y watts), the results should be far more tangible than what we've seen from any high-end AMD processor in the past.
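
To make the estimation approach concrete, here's a minimal sketch of how event-based power accounting might work. The event names and energy weights below are entirely hypothetical, and the real hardware does this in dedicated logic, not software; this is only meant to illustrate the idea of summing per-event energy costs instead of reading an analog sensor:

    # Hypothetical digital power estimation: assign a fixed energy cost to
    # each microarchitectural event and sum activity counts, rather than
    # measuring power directly. All names and weights here are invented.
    EVENT_ENERGY_NJ = {
        "alu_op":       0.4,   # nanojoules per event (illustrative values)
        "fpu_op":       1.1,
        "l2_access":    2.5,
        "dram_access": 15.0,
    }

    def estimated_power_watts(event_counts, interval_s):
        """Estimate average power over an interval from activity counters."""
        total_nj = sum(EVENT_ENERGY_NJ[e] * n for e, n in event_counts.items())
        return total_nj * 1e-9 / interval_s  # nJ -> J, then J/s = W

    # Example: counters sampled over a 1ms window
    counts = {"alu_op": 2_000_000, "fpu_op": 300_000,
              "l2_access": 50_000, "dram_access": 2_000}
    print(f"{estimated_power_watts(counts, 1e-3):.2f} W")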

Turbo Core's granularity hasn't changed with the move to Bulldozer, however. If half (or fewer) of the processor cores are active, max turbo is allowed. If more cores are active, a lower intermediate turbo frequency can be selected. Those are the only two frequencies available above the base clock.
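
Expressed as code, the selection policy is simple. Here's a minimal sketch using the FX-8150's published clocks; note that the real hardware also checks thermal/TDP headroom before granting any boost (omitted here), and falls back to the 3.6GHz base clock when no headroom exists:

    def turbo_frequency_ghz(active_cores, total_cores=8,
                            mid_turbo=3.9, max_turbo=4.2):
        """Pick the turbo ceiling from the number of active cores."""
        if active_cores <= total_cores // 2:
            return max_turbo   # half or fewer cores busy: max turbo allowed
        return mid_turbo       # more than half busy: intermediate turbo only

    for n in (1, 4, 5, 8):
        print(f"{n} active core(s) -> up to {turbo_frequency_ghz(n)} GHz")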

AMD doesn't currently offer a Turbo Core monitoring utility, so we turned to Core Temp to record CPU frequency across various workloads and measure the impact of Turbo Core on Bulldozer compared to the Phenom II X6 and Sandy Bridge.
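
The sampling itself is trivial. We used Core Temp on Windows, but for reference, a rough equivalent on a Linux box just polls the cpufreq interface once a second; this is a sketch, not the tool we actually used:

    import time

    # Core 0's current frequency, exposed in kHz by the cpufreq driver
    FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"

    samples = []
    for _ in range(60):                          # one minute at 1Hz
        with open(FREQ_PATH) as f:
            samples.append(int(f.read()) / 1e6)  # kHz -> GHz
        time.sleep(1.0)

    print(f"avg {sum(samples)/len(samples):.2f} GHz, "
          f"min {min(samples):.2f}, max {max(samples):.2f}")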

First let's pick a heavily threaded workload: our x264 HD benchmark. Each run of our x264 test is composed of two passes: a lightly threaded first pass that analyzes the video, and a heavily threaded second pass that performs the actual encode. Our test runs four times before outputting a result. I measured the frequency of Core 0 over the duration of the test.

Let's start with the Phenom II X6 1100T. By default the 1100T should run at 3.3GHz, but with half or fewer cores active it can turbo up to 3.7GHz. If Turbo Core is working, I'd expect to see some jumps up to 3.7GHz during the lightly threaded passes of our x264 test:

Unfortunately we see nothing of the sort. Turbo Core is pretty much non-functional on the Phenom II X6, at least running this workload. Average clock speed is a meager 3.31GHz, just barely above stock and likely only due to ASUS being aggressive with its clocking.

Now let's look at the FX-8150 with Turbo Core. The base clock here is 3.6GHz, max turbo is 4.2GHz and the intermediate turbo is 3.9GHz:

Ah, that's more like it. While the average is only 3.69GHz (+2.5% over stock), we're actually seeing some movement here. This workload in particular is hard on any processor, as you'll see from Intel's 2500K below:

The 2500K runs at 3.3GHz by default, but thanks to turbo it averages 3.41GHz over the duration of this test. We even see a couple of jumps to 3.5GHz and 3.6GHz. Intel's turbo is a bit more consistent than AMD's, but the average clock increase is quite similar at around 3%.
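
The uplift math is straightforward: average the sampled clocks and compare against the base frequency. A quick sketch (the sample values below are illustrative, not our raw logs):

    def turbo_uplift(samples_ghz, base_ghz):
        """Average clock and percent gain over base from frequency samples."""
        avg = sum(samples_ghz) / len(samples_ghz)
        return avg, (avg / base_ghz - 1.0) * 100

    avg, pct = turbo_uplift([3.3, 3.4, 3.5, 3.4, 3.45], base_ghz=3.3)
    print(f"avg {avg:.2f} GHz, +{pct:.1f}% over base")  # avg 3.41 GHz, +3.3%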

Now let's look at the best case scenario for turbo: a heavy single threaded application. A single demanding thread, even for a brief period of time, is really where these turbo modes can truly shine. Turbo helps applications launch quicker, makes windows appear faster and makes short work of bursty workloads.

We turn to our usual favorite Cinebench 11.5, as it has an excellent single-threaded benchmark built in. Once again we start with the Phenom II X6 1100T:

Turbo Core actually works on the Phenom II X6, albeit for a very short duration. We see a couple of blips up to 3.7GHz, but the rest of the time the chip remains at 3.3GHz. Average clock speed is, once again, 3.31GHz.

Bulldozer does far better:

Here we see blips up to 4.2GHz and pretty consistent performance at 3.9GHz, exactly what you'd expect. Average clock speed is 3.93GHz, a full 9% above the 3.6GHz base clock of the FX-8150.

Intel's turbo fluctuates much more frequently here, moving between 3.4GHz and 3.6GHz as it runs into TDP limits. The average clock speed works out to 3.5GHz, a 6% increase over base. For the first time, AMD actually does a better job of scaling frequency via turbo than Intel. While I would like to see more granular turbo options, it's clear that Turbo Core is a real feature in Bulldozer and not the half-hearted attempt we got with the Phenom II X6. I measured the performance gains due to Turbo Core across a number of our benchmarks:

Average performance increased by just under 5% across our tests. It's nothing earth shattering, but it's a start. Don't forget how unassuming the first implementations of Turbo Boost were on Intel architectures. I hope that with future generations we'll see even more significant gains from Turbo Core in Bulldozer derivatives.

Independent Clock Frequencies

When AMD introduced the original Phenom processor it promised more energy efficient execution thanks to the ability to clock each core independently. You could have a heavy workload running on Core 0 at 2.6GHz while Core 3 ran a lighter thread at 1.6GHz. In practice, Phenom's asynchronous clocking proved a burden: the CPU/OS scheduler combination would sometimes take too long to ramp a core up to a higher frequency when needed. The result, at least back then, was significantly lower performance in workloads that shuffled threads from one core to the next. The problem was bad enough that AMD abandoned asynchronous clocking altogether in the Phenom II.

The feature is back in Bulldozer, and this time AMD believes it will be problem free. The first major change comes courtesy of Windows 7: core parking should keep threads from haphazardly dancing around all available cores. The second change is that Bulldozer can ramp frequencies up and down much quicker than the original Phenom ever could. Chalk that up as a side benefit of Turbo Core being a major part of the architecture this time around.
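
If you want to watch the behavior yourself, per-core clocks are easy to poll. A sketch for Linux, where cpufreq exposes one frequency node per logical core; under a single-threaded load you'd expect to see one core ramped up while the rest idle low:

    import glob
    import re
    import time

    # One scaling_cur_freq node per logical core, sorted by core number
    paths = sorted(
        glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq"),
        key=lambda p: int(re.search(r"cpu(\d+)", p).group(1)))

    def read_ghz(path):
        with open(path) as f:
            return int(f.read()) / 1e6           # kHz -> GHz

    for _ in range(10):                          # ten snapshots, 1s apart
        print("  ".join(f"{read_ghz(p):.2f}" for p in paths))
        time.sleep(1.0)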

Asynchronous clocking in Bulldozer hasn't proven to be a burden in any of our tests thus far; however, I'm reluctant to call it an advantage just yet, at least not until we've had more experience with the feature.

Comments

  • THizzle7XU - Wednesday, October 12, 2011 - link

    Well, why would you target the variable PC segment when you can program for a well-established, large user-base platform with a single configuration and make a ton more money, probably with far less QA work, since there's only one set of hardware to test (two for multi-platform PS3 games)?

    And it's not like 360/PS3 games suddenly look like crap 5-6 years into their cycles. Think about how good PS2 games looked seven years into that system's life cycle (God of War 2). Devs are just now getting the most out of the hardware. It's a great time to be playing games on 360/PS3 (and PC!).
  • GatorLord - Wednesday, October 12, 2011 - link

    Consider what AMD is, what AMD isn't, and where computing is headed, and this chip really begins to make sense. While these benches seem frustrating to those of us on a desktop today, I think a slightly deeper dive shows that there is a whole world of hope here...with these chips, not something later.

    I dug into the deal with Cray and Oak Ridge: Cray is selling ORNL massively powerful computers (think petaflops) using Bulldozer CPUs to control Nvidia Tesla GPUs, which perform the bulk of the processing. The GPUs do vastly more and faster FPU calculations, while the CPU is far better at dishing out the grunt work and processing the results for use by humans, software or other hardware. This is the future of High Performance Computing, today, but on a government scale. OK, so what? I'm a client user.

    Here's what: AMD is actually best at making GPUs...no question. They have been in the GPGPU space as long as Nvidia, except AMD's engineers can collaborate on both CPU and GPU projects simultaneously without a bunch of awkward NDAs and antitrust BS getting in the way. They can obviously turn humble server chips into supercomputers by harnessing the many cores on a graphics card; how much more than we've seen is possible on our lowly desktops when this rebranded server chip enslaves the Ferraris on the PCI bus next door...the GPUs?

    I get it...it makes perfect sense now. Don't waste die real estate on FPUs when the ones next door are hundreds or thousands of times better and faster too. This is not the beginning of the end of AMD, but the end of the beginning (to shamelessly quote Churchill). Now all that cryptic talk about a supercomputer in your tablet makes sense...think Llano with a so-so CPU and a big GPU on the same die, with some code tweaks to schedule the GPU as a massive FPU, and the picture starts taking shape.

    Now imagine a full blown server chip (BD) harnessing full blown GPUs...Radeon 6XXX or 7XXX...and we are talking about performance improvements measured in orders of magnitude, not percentage points. Is AMD crazy? I'm thinking crazy like a fox.

    Oh...as a disclaimer: while I'm long AMD, I'm just an enthusiast like the rest of you and not a shill. I want both companies to make fast chips that I can use to run Monte Carlos and linear regressions...it just looks like AMD has figured out how to play the hand they're holding for a change. Here's to the future for us all.
  • Menoetios - Wednesday, October 12, 2011 - link

    I think you bring up a very good point here. This chip looks like it's designed to be very closely paired with a highly programmable GPU, which is where the GPU roadmaps are leading over the next year and a half. While the apples-to-apples nature of this review draws a disappointing picture, I'm very curious how AMD's "Fusion" products will look next year, as the various compute elements of the CPU and GPU become more tightly integrated. Bulldozer appears to fit perfectly into an ecosystem that we don't quite have yet.
  • GatorLord - Wednesday, October 12, 2011 - link

    Exactly. Ecosystem...I like it. This is what it must feel like to pick up a flashlight at the entrance to the tunnel when all you're used to is clubs and torches. Until you find the switch, it just seems worse at either...then voila!
  • actionjksn - Wednesday, October 12, 2011 - link

    Wow, I hope that made you feel better about the crappy chip also known as "Man With A Shovel".
    I was just hoping AMD would quit forcing Intel to keep crippling their chips just to keep from putting AMD out of business. AMD better fix this abortion quick; this is getting old.
  • GatorLord - Thursday, October 13, 2011 - link

    Feeling fine. Not as good in the short run, but better about the long run. Unfortunately, due to constraints, it takes AMD too long to get stuff dialed in, and by the time they do, Intel has already made an end run and beaten them to the punch.

    Intel can do that; they're 40x as big as AMD. Actually, and this may sound crazy until you digest it, the smartest thing Intel could do is spin off a couple of really good dev labs as competitors. Relying on AMD to drive your competition is risky, in that AMD may not be able to innovate fast enough to push Intel where it could be if there were more and better sharks in the water nipping at its tail.

    You really need eight or more highly capable, highly aggressive competitors to create a fully functioning market free of monopolistic and oligopolistic sluggishness and BS hand signalling. This space is too capital intensive for that for the time being, with current chip-making technology being what it is.
  • yankeeDDL - Wednesday, October 12, 2011 - link

    Just to be the devil's advocate ...
    The launch event in London sported two PCs, side by side, running Cinebench.
    One had the Core i5-2500K, the other the FX-8150.
    Of course, these systems were prepared by AMD, so the results from Anand are clearly more reliable (at least all of the conditions are documented).
    Nevertheless, it is clear that in AMD's demo the FX runs faster. Not by a lot, but it is clearly faster than the i5.
    Video: http://www.viddler.com/explore/engadget/videos/335...

    Even so, assuming that this was a valid datapoint, things won't change too much: the i5-2500k is cheaper and (would be) slightly slower than the FX8150 in the most heavily threaded benchmark. But it would be slightly better than Anand's results show.
  • KamikaZeeFu - Wednesday, October 12, 2011 - link

    "Nevertheless, it is clear that in the demo from AMD, the FX runs faster. Not by a lot, but it is clearly faster than the i5."

    Check the review, specifically the Cinebench R11.5 multithreaded chart.
    Anand's numbers mirror AMD's. Multithreaded workloads are the only case where the 8150 will outperform an i5-2500K, because it can process twice as many threads.

    Really disappointed in AMD here, but I expected subpar performance because AMD was eerily quiet about the FX line as far as performance went.

    Desktop BD is a complete failure: they were aiming for high clock speeds and made sacrifices to get there, yet still missed their objective. By the time the process matures and 4GHz 'dozers hit the channel, Ivy Bridge will be out.

    As far as server performance goes, I'm not even sure they will succeed there.
    As seen in the review, clock-for-clock performance isn't up compared to the previous generation, and in some cases it's actually slower. Considering that servers run at lower clocks in the first place, I don't see BD being any threat to Intel's server lineup.

    Four years to develop this chip, and their motto seemed to be "we'll do NetBurst, but make it not fail".
  • medi01 - Wednesday, October 12, 2011 - link

    So the CPU is a bottleneck in your games, eh?
  • TekDemon - Wednesday, October 12, 2011 - link

    It's not, but people don't buy CPUs just for today's games; generally you want your system to be future proof, so the more headroom a CPU shows in these benchmarks, the better it holds up over the long term. Look back at CPU benchmarks from 3-4 years ago and you'll see that the CPUs that barely passed muster back then easily bottleneck you today, whereas the CPUs that had extra headroom are still usable for gaming. For example, the Core 2 Duo E8400 or E8500 is still a very capable gaming CPU, especially when given a mild overclock, and frankly in games that only use a few threads (like Starcraft 2) it gives Bulldozer a run for its money.
    I'm not a fanboy either way, since I own that E8400 as well as a Phenom II (unlocked to X4, OC'ed to 3.9GHz) and an i5 2500K, but if I were building a new system I sure as heck would want extra headroom for future-proofing.
    That said? Of course these chips will be more than enough for general use. They're just not going to be good for high-end systems. And even in a general use situation, the power consumption is just crappy compared to the Intel solutions; even if you can argue that it's more than enough power for most people, why would you want to use more electricity?
