TDP Power Cap

What makes these new Opterons truly intriguing is that they will offer a user-configurable TDP, which AMD calls TDP Power Cap. This means you can buy almost any CPU and then scale its TDP down to fit your server's power budget. In the server market, raw performance isn't necessarily the number one concern the way it is when building a gaming rig. As readers of our data center section know, what really counts is the performance per watt. Servers need to be as energy efficient as possible while still delivering excellent performance.

John Fruehe (AMD) states, "With the new TDP Power Cap for AMD Opteron processors based on the upcoming 'Bulldozer' core, customers will be able to set TDP power limits in 1 watt increments." It gets even better: "Best of all, if your workload does not exceed the new modulated power limit, you can still get top speed because you aren’t locking out the top P-state just to reach a power level."

That sounds too good to be true: we can still get the best performance from our server while we limit the TDP of the CPU. Let's delve a little deeper.

Power Capping

Power capping is nothing new. The idea is not to save energy (kWh) but to limit the amount of power (watts) that a server or a cluster of servers can draw. That may sound contradictory, but it is not. If your CPU processes a task at maximum speed, it can return to idle very quickly and save power there. If you cap your CPU, the task takes longer, and your server ends up using about the same amount of energy because the CPU spends less time in idle, where it could drop to a lower P-state or even go to sleep (C-states). So power capping makes no sense in a gaming rig: it would reduce your frame rates without saving any energy. Buying CPUs with a lower maximum TDP is similar: our own measurements have shown that low-power CPUs do not necessarily save energy compared to their siblings with higher TDP specs.
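The "race to idle" argument above is easy to verify with some back-of-the-envelope arithmetic. All the numbers below are invented for illustration, not measurements:

```python
# Hypothetical numbers: a fixed task over a fixed observation window,
# run once uncapped ("race to idle") and once with a power cap.
full_load_watts = 100.0   # assumed uncapped package power under load
capped_watts = 60.0       # assumed capped package power
idle_watts = 10.0         # assumed idle power in a deep C-state
window_s = 20.0           # observation window in seconds

# Suppose the task takes 10 s uncapped; capping to 60 W slows it
# roughly proportionally (a simplification) to about 16.7 s.
t_uncapped = 10.0
t_capped = t_uncapped * (full_load_watts / capped_watts)

def energy(load_w, load_t):
    """Energy over the window: load phase plus idle for the remainder."""
    return load_w * load_t + idle_watts * (window_s - load_t)

e_uncapped = energy(full_load_watts, t_uncapped)
e_capped = energy(capped_watts, t_capped)

print(f"uncapped: {e_uncapped:.0f} J over {window_s:.0f} s")  # 1100 J
print(f"capped:   {e_capped:.0f} J over {window_s:.0f} s")    # ~1033 J
```

The total energy comes out nearly the same; what the cap changes is the *peak* power, which is exactly the point of the next paragraph.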

In a data center, you have lots of servers connected to the same power lines, which can only deliver a certain amount of current (amps) at a certain voltage (48, 115, 230 V...). You are also limited by the heat density of your servers. So the administrator wants to make sure that the cluster of servers never exceeds the cooling capacity or the current limits of the power lines. Power capping makes the power usage and the cooling requirements of your servers predictable.
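The capacity math behind this is simple. A quick sketch, with all the figures (breaker rating, per-server cap) being illustrative assumptions:

```python
# Rough capacity check for one rack feed (illustrative numbers only).
voltage = 230.0           # V, single-phase feed
breaker_amps = 16.0       # A, circuit limit
server_cap_watts = 350.0  # assumed per-server power cap

# P = V * I (ignoring power factor for simplicity)
available_watts = voltage * breaker_amps
max_servers = int(available_watts // server_cap_watts)
print(f"{available_watts:.0f} W available -> up to {max_servers} capped servers")
# -> 3680 W available -> up to 10 capped servers
```

Without a hard cap per server, the admin would have to budget for the worst-case peak of each box instead, and fit far fewer servers on the same feed.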

Current power capping techniques limit the processor's P-states: even under heavy utilization, the CPU never reaches its top frequency. From a performance point of view, this is a rather crude way of keeping maximum power under control. The thing to remember here is that higher frequencies always improve processing performance, while extra cores only improve performance in ideal circumstances (no lock contention, enough threads, etc.). Limiting frequency to reduce power often leaves a server running far below what it could deliver, in both performance and power use, just to be "safe".
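The difference between locking out P-states and capping actual power can be shown in a toy model. Everything below (the P-state table, the cap value, the assumption that power scales linearly with load) is invented for illustration:

```python
# Toy model: P-state locking vs. a power cap on actual draw.
# Each entry: (frequency in GHz, assumed full-load package watts in that state).
P_STATES = [(2.6, 115.0), (2.1, 85.0), (1.7, 65.0), (1.4, 50.0)]
POWER_CAP = 90.0  # W, the configured limit

def pstate_lock(load):
    """Crude capping: lock out every P-state whose *full-load* power exceeds
    the cap, regardless of what the workload actually draws."""
    freq, _ = next((f, w) for f, w in P_STATES if w <= POWER_CAP)
    return freq

def power_cap(load):
    """TDP-cap style: run the fastest P-state whose power *at this load*
    stays under the cap (power naively assumed to scale with load)."""
    for freq, watts in P_STATES:
        if watts * load <= POWER_CAP:
            return freq
    return P_STATES[-1][0]

for load in (0.5, 0.8, 1.0):
    print(f"load {load:.0%}: lock -> {pstate_lock(load)} GHz, "
          f"cap -> {power_cap(load)} GHz")
```

At 50% load the P-state lock still holds the CPU at 2.1 GHz, while the power cap lets it run at the full 2.6 GHz because actual draw stays under the limit; only when the workload would push power past the cap does it throttle. That is the behavior AMD's quote describes.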



Comments

  • duploxxx - Friday, July 15, 2011 - link

    according to many, anything which is branded "PENTIUM" is the uber CPU doesn't matter what is behind....
  • Broheim - Friday, July 15, 2011 - link

    >according to many


    don't have one? then gtfo.
  • formulav8 - Friday, July 15, 2011 - link

    Grow up. He was just messing around.
  • Broheim - Friday, July 15, 2011 - link

    no, he's a raging AMD fanboy. I have yet to see a single post from him that doesn't bash intel or praise AMD in some form or another.
  • AnandThenMan - Friday, July 15, 2011 - link

    So he's the exact opposite of you.
  • Broheim - Saturday, July 16, 2011 - link

    erm, I have nothing against AMD, this rig has an unlocked HD6950...

    are you just butthurt because I called you out on your bitching about Anand's benchmarking?
  • just4U - Saturday, July 16, 2011 - link

    Currently I am on a Sandy Bridge 2500K, and in the last year I've been on an i7 920, a 1055T, and a few $60 AMD cheapies. As far as I am concerned, they are all good. I didn't notice night and day improvements like I did when I moved to the A64 and Core 2. So I think we are sort of at a ceiling right now (excepting specific tasks) where just about any new CPU is good enough.
  • JohanAnandtech - Friday, July 15, 2011 - link

    It is possible that your tests are using the x87 FPU. The Phenom can process up to three instructions per cycle out of order, while the P4 can hardly sustain one FP instruction per cycle.

    Parallel, multithreaded software is of course much faster on a 6-core than a single P4 core :-).

    And it would be very hard to find a benchmark where a P4 at 4 GHz is faster than a Phenom II at 2.8 GHz. I cannot imagine that anyone has published one. The P4 has a much slower memory interface (very high latency vs. the Phenom's IMC), much smaller caches (16 KB vs 64 KB L1), and is outmatched in every aspect of FP processing power (64- vs 128-bit SIMD, a triple fast x87 FPU vs a single slow one)...
  • SanX - Friday, July 15, 2011 - link

    Amazingly, that factor-of-two performance increase was per CPU, of course. The whole 6-core, non-overclocked AMD CPU was 2.42/0.50, or almost 5 times faster than the 2-core Intel E8400 overclocked to 3.8GHz!

    Here are the numbers for the parallel algebra test (you can download the test code from equation dot com, or I have it too for different compilers) for Intel and AMD, in seconds, when I switch on different numbers of cores:

    Intel E8400 @ 3.8GHz:
    1 4.64 seconds
    2 2.42

    AMD Phenom II X6:
    1 2.46
    2 1.22
    3 0.83
    4 0.67
    5 0.58
    6 0.50

    I invite anyone to do the test on their CPUs.
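The scaling in the numbers above is easy to quantify; this small script just recomputes the speedups and parallel efficiencies from the posted times:

```python
# Times in seconds, copied from the post above.
e8400 = {1: 4.64, 2: 2.42}  # 2-core Intel E8400 @ 3.8 GHz
x6 = {1: 2.46, 2: 1.22, 3: 0.83, 4: 0.67, 5: 0.58, 6: 0.50}  # 6-core AMD

for cores, t in x6.items():
    speedup = x6[1] / t
    print(f"{cores} cores: speedup {speedup:.2f}x, "
          f"efficiency {speedup / cores:.0%}")

# The cross-CPU comparison quoted above: 2.42 / 0.50
print(f"6-core AMD vs 2-core E8400: {e8400[2] / x6[6]:.2f}x faster")
```

The AMD chip scales at roughly 82% parallel efficiency at 6 cores (4.92x speedup), and the 2.42/0.50 ratio works out to 4.84x, i.e. the "almost 5 times" claimed.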
  • JarredWalton - Friday, July 15, 2011 - link

    Using 64-bit "bench1_gfortran_64.exe":

    Core 2 QX6700 @ 3.2GHz:
    1 CPU = 4.55s
    2 CPU = 2.33s
    3 CPU = 1.62s
    4 CPU = 1.34s

    Core i7-965 @ 3.6GHz:
    1 CPU = 3.93s
    2 CPU = 1.97s
    3 CPU = 1.33s
    4 CPU = 1.01s
    5 CPU = 0.87s
    6 CPU = 0.80s
    7 CPU = 0.72s
    8 CPU = 0.69s

    Of course, none of that really tells us much, because we don't know how the application was compiled or what optimizations are in place. There's only one 64-bit compiled version but there are four 32-bit compiled versions. Let's just see what happens with the 32-bit versions on the QX6700 for a second:

    Core 2 QX6700 @ 3.2GHz Absoft:
    1 CPU = 7.01s
    2 CPU = 3.54s
    3 CPU = 2.40s
    4 CPU = 1.90s

    Core 2 QX6700 @ 3.2GHz gfortran:
    1 CPU = 10.73s
    2 CPU = 5.40s
    3 CPU = 3.67s
    4 CPU = 2.87s

    Core 2 QX6700 @ 3.2GHz Intel Fortran:
    1 CPU = 4.70s
    2 CPU = 2.40s
    3 CPU = 1.76s
    4 CPU = 1.47s

    Core 2 QX6700 @ 3.2GHz Lahey/Fujitsu:
    1 CPU = 5.38s
    2 CPU = 2.73s
    3 CPU = 1.95s
    4 CPU = 1.56s

    What does that tell us? As expected, the Intel compiler version is the fastest in 32-bit mode. What's more, the gfortran 32-bit version is the slowest on Intel. Since the only 64-bit version is from gfortran, it would appear that a 64-bit Intel version would come in around twice as fast. That's only speculation based on the 32-bit compiled executables, but given your above numbers it looks like you're probably using the 64-bit version. (If not, why does my 3.2GHz quad-core outperform your 3.8GHz dual-core when looking at the 32-bit Intel speeds?)
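    That "around twice as fast" extrapolation can be checked with the 1-CPU QX6700 times posted above, under the assumption that the Intel-vs-gfortran ratio carries over from 32-bit to 64-bit:

```python
# Single-thread QX6700 times (seconds) from the post above.
gfortran_32 = 10.73  # gfortran, 32-bit
intel_32 = 4.70      # Intel Fortran, 32-bit
gfortran_64 = 4.55   # gfortran, 64-bit (the only 64-bit build)

# Assumption: Intel's compiler advantage is the same in 64-bit mode.
ratio = gfortran_32 / intel_32       # ~2.28x
est_intel_64 = gfortran_64 / ratio   # ~1.99 s
print(f"estimated Intel-compiled 64-bit time: {est_intel_64:.2f} s "
      f"(vs {gfortran_64} s for gfortran)")
```

    That lands at roughly 2.0 s, about 2.3x faster than the 64-bit gfortran build, consistent with the speculation, though of course it is only an extrapolation.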

    Anyway, there are certain types of code that AMD does quite well at running, but overall I'd say it's clear that Intel's Nehalem/Lynnfield/Sandy Bridge CPUs are significantly faster than the Phenom II X6 offerings.
