Bulldozer's Power Management

AMD confirmed that Bulldozer's power management is an improved version of the power management found in the "Llano" CPU. Just like Llano, Bulldozer has a Digital APM Module. The APM module samples a number of performance counter signals, and these samples are used to estimate dynamic power with 98% accuracy. Now combine this power estimate with Bulldozer's module-level power gating and vastly improved clock gating, and you can start to understand what is possible.
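AMD does not say which counter signals the APM module samples or how it weights them. The following is therefore only a minimal sketch of the general idea: a linear model that turns sampled activity rates into a power estimate. The event names, the weights, and the static-power floor are hypothetical values chosen for illustration. A module that has been power gated is simply counted as drawing no power.

```python
from typing import Dict, List, Optional

# HYPOTHETICAL per-event weights (watts per event/ns); AMD's real
# signals and coefficients are not public.
WEIGHTS: Dict[str, float] = {
    "int_ops": 4.0,
    "fp_ops": 6.0,
    "l2_accesses": 3.0,
}
STATIC_POWER_W = 5.0  # HYPOTHETICAL leakage floor for an active module


def estimate_module_power(rates: Dict[str, float]) -> float:
    """Estimate one module's power (W) as static floor + weighted activity."""
    dynamic = sum(WEIGHTS[event] * rate for event, rate in rates.items())
    return STATIC_POWER_W + dynamic


def estimate_chip_power(modules: List[Optional[Dict[str, float]]]) -> float:
    """Sum per-module estimates; None marks a power-gated module (~0 W)."""
    return sum(estimate_module_power(m) for m in modules if m is not None)


if __name__ == "__main__":
    int_only = {"int_ops": 2.0, "fp_ops": 0.0, "l2_accesses": 0.5}
    mixed = {"int_ops": 2.0, "fp_ops": 1.5, "l2_accesses": 0.5}
    # One busy integer thread, the other seven modules power gated.
    print(estimate_chip_power([int_only] + [None] * 7))
    # All eight modules running mixed integer/FP work.
    print(estimate_chip_power([mixed] * 8))
```

With these made-up weights, the script prints 14.5 and 188.0. A lightly loaded, integer-only chip sits far below any plausible cap, while a fully loaded FP workload does not.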


Bulldozer reduces the number of active, power-consuming circuits through vastly improved clock gating

If your application runs only one or a few threads on your 8-module, 16-core Interlagos CPU, several of those modules might be power gated. And if you run integer-only threads, clock gating the unused parts of each module (such as the FPU) might be enough to stay under the configured TDP. In those cases, it won't be necessary to limit the clock speed. That is really great, especially in the real world.

In the real world, only a few HPC applications behave like the SPEC CPU rate benchmarks, which spawn threads across all cores. Most server applications do not fully utilize all available cores all the time. Sometimes only one thread will be really critical, and the perceived application performance will depend on it. A little later, several threads might demand CPU power (but not all cores will be busy). Only for a certain percentage of the time are all the cores in use. That is exactly why the cheaper Magny-Cours makes so much sense for HPC applications, yet struggles to keep up with the higher-clocked, higher-IPC Xeon Westmere cores when running OLTP and ERP applications. Putting a power cap on a Magny-Cours means even lower frequencies and, as a result, even longer response times (as we have measured here).

By adding power consumption measurements to the CPU, AMD allows Bulldozer to run most server applications at full speed unless you lower the TDP too far. (Obviously, if the TDP is lowered enough, the CPU will not be able to operate at higher frequencies, and response times will suffer too.) The maximum throughput will be a little lower, but most server applications almost never run at maximum throughput. In fact, maximum throughput only matters for HPC applications and benchmarks. For real human users, response times are the only thing that matters.
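The paragraph above implies a feedback loop: compare the estimated power against the configured cap and lower the clock only when the estimate actually exceeds it. AMD has not described its firmware algorithm here, so this is a hypothetical sketch. The P-state frequencies, the 2 W hysteresis margin, and the function name are all assumptions.

```python
# HYPOTHETICAL P-state table in GHz; index 0 is the fastest state.
P_STATES_GHZ = [2.6, 2.3, 2.0, 1.7, 1.4]


def next_pstate(current: int, estimated_w: float, cap_w: float,
                margin_w: float = 2.0) -> int:
    """Pick the next P-state index from the power estimate and the cap.

    Over the cap: step one P-state slower (if a slower one exists).
    Under the cap by more than margin_w: step one P-state faster.
    Otherwise hold, so the clock does not oscillate around the cap.
    """
    if estimated_w > cap_w and current < len(P_STATES_GHZ) - 1:
        return current + 1
    if estimated_w < cap_w - margin_w and current > 0:
        return current - 1
    return current


if __name__ == "__main__":
    cap = 80.0
    state = 0
    # Made-up power estimates: moderate load, a burst, then moderate again.
    for watts in [55.0, 62.0, 91.0, 86.0, 79.0, 60.0]:
        state = next_pstate(state, watts, cap)
        print(f"{watts:5.1f} W -> {P_STATES_GHZ[state]} GHz")
```

At moderate load the loop never leaves the top P-state. It only steps down during the burst and climbs back once there is headroom. In a real controller, the estimate itself would also fall as the clock drops.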

The beauty of this new power cap system is that under normal circumstances (e.g. the server running at 40-70% load), response times will hardly be any longer. At the same time, the administrator can make sure that the server cluster does not exceed the capacity of the cooling equipment and the power lines.

This TDP Power Cap technology could be very interesting to small and medium businesses too, not only to owners of large server clusters. TDP Power Cap could be a way to make sure that your colocated servers never exceed the maximum number of amps allocated to you, so that you will not have to pay unexpectedly high electricity bills. However, whether this ideal world of low response times and low electricity bills becomes a reality for Bulldozer server owners will also depend on the availability of a good, decently priced management tool that allows the administrator to configure the TDP on all servers simultaneously.
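As a rough illustration of the colocation case, the arithmetic below turns a circuit's amperage into a per-server and per-CPU power budget. Every input is an assumption: the 20 A / 208 V circuit, the 80% continuous-load derating (a common electrical-code practice; check your provider's terms), the 150 W of non-CPU load, and the server count. The sketch also treats every figure as wall power and ignores power-supply losses.

```python
def per_server_budget_w(circuit_amps: float, volts: float, servers: int,
                        derate: float = 0.8) -> float:
    """Usable circuit power split evenly across servers (watts)."""
    return circuit_amps * volts * derate / servers


def cpu_cap_w(server_budget_w: float, non_cpu_w: float, sockets: int) -> float:
    """Power left for each CPU socket after the rest of the server's load."""
    return (server_budget_w - non_cpu_w) / sockets


if __name__ == "__main__":
    # HYPOTHETICAL colocation setup: 10 dual-socket servers on one circuit.
    budget = per_server_budget_w(circuit_amps=20, volts=208, servers=10)
    per_cpu = cpu_cap_w(budget, non_cpu_w=150, sockets=2)
    print(f"per server: {budget:.1f} W, per CPU: {per_cpu:.1f} W")
```

With these inputs, it prints a budget of 332.8 W per server and 91.4 W per CPU. That CPU figure is the kind of number an administrator would then enter as the TDP cap.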

On a standard server, you will get a BIOS section that allows you to tweak the TDP in 1W increments (up to 64 power settings), a good step forward compared to the current P-state settings. But to control a server cluster efficiently, good management software is needed. Currently, you have to buy all your servers from the same vendor (HP, for example) and then pay for management software such as HP's Insight Control. To really unlock this technology, AMD or one of its partners needs to make sure this kind of software is widely available; perhaps some open source code?
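Because of the BIOS granularity described above (1W steps, at most 64 settings), a management tool has to snap whatever cap it computes onto a setting the CPU accepts. Purely for illustration, this sketch assumes that the 64 settings run downward in 1W steps from the part's rated TDP; the real range for each SKU is not given here. `apply_to_fleet` is a hypothetical helper that only computes values and does not talk to any vendor's management interface.

```python
import math
from typing import Dict, Iterable


def quantize_cap(requested_w: float, max_tdp_w: int, settings: int = 64) -> int:
    """Snap a requested cap to a 1 W setting within the assumed range.

    Rounds down so the cap never exceeds the request, then clamps to
    [max_tdp_w - (settings - 1), max_tdp_w]. Requests below the lowest
    setting are raised to it, since nothing lower is available.
    """
    min_w = max_tdp_w - (settings - 1)
    return max(min_w, min(max_tdp_w, math.floor(requested_w)))


def apply_to_fleet(hosts: Iterable[str], requested_w: float,
                   max_tdp_w: int) -> Dict[str, int]:
    """HYPOTHETICAL helper: compute the setting each host would receive."""
    setting = quantize_cap(requested_w, max_tdp_w)
    return {host: setting for host in hosts}


if __name__ == "__main__":
    # HYPOTHETICAL 115 W part; hostnames are placeholders.
    print(apply_to_fleet(["node01", "node02", "node03"], 91.4, max_tdp_w=115))
```

Rounding down keeps the fleet under the computed budget. Clamping at the bottom of the range is the one case where the applied cap can exceed the request, which is worth flagging to the administrator.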


59 Comments


  • mino - Friday, July 15, 2011

    "single-threaded performance is still a sore spot for Bobcat compared to other architectures"

    What "other architectures"??? To my knowledge there are exactly ZERO other architectures with faster single-threaded performance at the power level Bobcat plays at.

    The faster "competitors" are either running at their lowest feasible power levels (SB, C2D) or are vastly slower (Atom, A15 etc.).
  • JarredWalton - Saturday, July 16, 2011

    We never said "at the same power level". Bobcat is much faster than Atom, but Core 2 beats Bobcat silly, and Core 2010/2011 are even faster. Bobcat is fine for low power, low performance, decent multimedia; that's not the same as being good for general use.
  • GaMEChld - Saturday, July 16, 2011

    Why even go to Intel for the comparison? Bobcat loses to the old STARS cores too, doesn't it? Athlon II, Phenom II, Llano? Generally it's assumed that comparisons are made between competing products for a given market or price point.

    What sense would there be in reviewing an Intel Atom chip, and then taking the time to say, well, sadly the Intel Atom does not have as good single-threaded performance as the Core i7 990X Super Jeebus Edition? Or that the Radeon 5450 does not offer superior graphics performance to the GTX 590? Well, duh!
  • 529th - Friday, July 15, 2011

    AMD seems to be highlighted a lot around the word "server"

    .. just not my market.. what a letdown for anyone hoping for CPUs that compete with Intel on the desktop

    fee nom - whutever
  • shmmy - Friday, July 15, 2011

    Wow, really? Do you people really need to nitpick the details on stuff that's not even out yet? 8 cores, 10 cores, who the heck cares? Get back to work, slackers! :)
  • JarredWalton - Sunday, July 17, 2011

    Core 2 ULV (all the CULV stuff from early 2010) already offered us power levels similar to Bobcat, with better per-core performance. What it didn't offer was the GPU side of things, which is why Optimus was useful. Since the article states that "single-threaded performance is still a sore spot for Bobcat compared to other architectures", it seemed fairly obvious that we were discussing Bobcat in the greater market, not just Bobcat in low-power uses. And yet, Mino went and complained regardless.

    For those interested in a few comparisons:

    Core 2 SU7200 @ 1.3GHz w/GMA4500
    (ASUS UL80Vt: http://www.anandtech.com/show/2886)
    PCMark Vantage: 2993
    CB10 1CPU: 1643
    CB10 SMP: 3138
    x264 1st Pass: 18.12 FPS
    x264 2nd Pass: 4.5 FPS
    Idle Power: ~5.94W
    Internet Power: ~8.59W
    H.264 Power: ~13.96W

    Core i3-330UM @ 1.2GHz w/HD Graphics
    (ASUS UL80Jt: http://www.anandtech.com/show/4009)
    PCMark Vantage: 3558
    CB10 1CPU: 1724
    CB10 SMP: 3859
    x264 1st Pass: 21.45 FPS
    x264 2nd Pass: 5.67 FPS
    Idle Power: ~7.91W
    Internet Power: ~10.5W
    H.264 Power: ~17.68W

    AMD E-350 @ 1.6GHz w/6310M
    (MSI X370: http://www.anandtech.com/show/4218/)
    PCMark Vantage: 2511
    CB10 1CPU: 1158
    CB10 SMP: 2175
    x264 1st Pass: 13.96 FPS
    x264 2nd Pass: 3.43 FPS
    Idle Power: ~7.47W
    Internet Power: ~8.81W
    H.264 Power: ~13.57W

    So when Mino says that "to my knowledge there are exactly ZERO other architectures with faster single-threaded performance at the power level Bobcat plays at", he is either uninformed, ignorant, or totally biased. CULV, way back in late 2009, offered 42% higher single-threaded performance than Bobcat in early 2011, with lower idle and Internet power draw (H.264 power is roughly equal). Core 2010 ULV improved performance further at the cost of higher power: it's 49% faster but uses 6% to 30% more power than Bobcat. Either way, in performance per watt both CULV and i3-ULV do better than Bobcat. They also have much worse IGPs, so it's not a complete loss for AMD.
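The percentages in the comment can be rechecked directly from the numbers listed above. This short script uses only those figures. Cinebench was not run at the measured power levels, so the points-per-watt value is only a rough proxy that pairs the single-threaded CB10 score with the Internet power draw.

```python
# Figures copied from the comment: CB10 1CPU score and power draw in watts.
SYSTEMS = {
    "Core 2 SU7200": {"cb10_1cpu": 1643, "idle": 5.94, "internet": 8.59, "h264": 13.96},
    "Core i3-330UM": {"cb10_1cpu": 1724, "idle": 7.91, "internet": 10.5, "h264": 17.68},
    "AMD E-350": {"cb10_1cpu": 1158, "idle": 7.47, "internet": 8.81, "h264": 13.57},
}


def pct_diff(a: float, b: float) -> float:
    """How much larger a is than b, in percent (negative if smaller)."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    base = SYSTEMS["AMD E-350"]
    for name, s in SYSTEMS.items():
        perf = pct_diff(s["cb10_1cpu"], base["cb10_1cpu"])
        power = ", ".join(f"{k} {pct_diff(s[k], base[k]):+.0f}%"
                          for k in ("idle", "internet", "h264"))
        pts_per_w = s["cb10_1cpu"] / s["internet"]  # rough proxy only
        print(f"{name}: CB10 1CPU {perf:+.0f}%, power {power}, "
              f"{pts_per_w:.0f} pts/W")
```

The output reproduces the 42% and 49% single-threaded gaps and the 6% to 30% power increase for the i3-330UM. It also shows the SU7200 drawing about 3% more than the E-350 in the H.264 test, so its power advantage applies to idle and browsing rather than across the board.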

    Even so, architecturally I don't think Bobcat has a lot of legs. Going quad-core does nothing for single-threaded performance, and multi-threaded performance on a low power design is sort of silly to discuss. It's the same problem I have with ARM: sure, they can do low power really well, but what happens when you need more performance? For many tasks, a 2.0GHz dual-core ARM is no worse than a 2.0GHz quad-core ARM, and in raw compute performance even Atom is likely faster than ARM right now.

    Windows 8 running on ARM is going to be interesting; can the chip even handle a full OS like Windows? Will it do so while still offering good battery life? I'd say Bobcat is the bare minimum performance we need for a full Windows OS to work well, and Bobcat is at least twice as fast as Atom. Will ARM manage to equal Bobcat next year? I wouldn't bet on it, but maybe I'll be wrong.
  • morohmoroh - Friday, July 22, 2011

    I have two hands.

    5 fingers on my left hand and 5 fingers on my right hand.

    I cannot grab a rock with 1 finger. I can still grab it with both hands using 2 fingers each, but it's still hard, so I decided to grab the rock with 3 or 5 fingers.

    Now I have 8 hands and 40 fingers, how about that?

    If I could grab a rock with invisible hands and fingers, it would look like magic.

    My question is: core = brain, or core = hand with fingers?

    Cheers
  • Cyberius - Tuesday, September 20, 2011

    I hope desktop Bulldozer includes TDP Power Cap in the AMD OverDrive utility, like the power control the Radeon 6900 series has in the Catalyst utility. That would be a great option for us.
