Not So Fast!

Power management, especially dynamic voltage and frequency scaling (DVFS), does come with a performance cost. Ever since DVFS was introduced, both Intel and AMD have claimed that this performance cost is "negligible", but we know better now. On the dual-core Athlon X2 and the Phenom I, for example, it was impossible to get decent HD video decoding with DVFS enabled. There are three important performance problems with dynamic power management:

  1. Transitioning from one P-state to another takes a while, especially if you scale up.
  2. Active cores will probe idle or lower P-state cores quite frequently.
  3. The OS power manager has to predict whether a process will soon need more processing power. As a result, the OS changes P-states much more slowly than the hardware.
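To make the third point concrete, here is a minimal sketch of a sampling-based frequency governor, loosely in the spirit of the Linux "ondemand" governor but not its actual code. The P-state table and the 80% threshold are hypothetical. Because the governor only looks at the load measured over the previous sampling period, a sudden burst is served at the old, low clock until the next sample.

```python
# Minimal sketch of a sampling-based DVFS governor.
# The P-state table and threshold are hypothetical, not from any real CPU or OS.

P_STATES_MHZ = [800, 1600, 2300]  # hypothetical P-states, lowest first
UP_THRESHOLD = 0.80               # hypothetical: jump to max above 80% busy


def next_frequency(current_mhz, utilization):
    """Choose the clock for the next sampling period.

    `utilization` is the busy fraction measured over the *previous*
    period, so every decision lags the workload by one period.
    """
    if utilization > UP_THRESHOLD:
        return P_STATES_MHZ[-1]  # busy: go straight to the top P-state
    # Otherwise take the lowest P-state that keeps the observed work
    # below the threshold.
    needed_mhz = current_mhz * utilization / UP_THRESHOLD
    for freq in P_STATES_MHZ:
        if freq >= needed_mhz:
            return freq
    return P_STATES_MHZ[-1]


def simulate(demand_mhz):
    """Replay per-period demand (MHz worth of work) through the governor.

    Returns the clock that was actually in effect during each period.
    """
    freq = P_STATES_MHZ[0]
    in_effect = []
    for demand in demand_mhz:
        in_effect.append(freq)  # chosen before this period's demand was seen
        utilization = min(1.0, demand / freq)
        freq = next_frequency(freq, utilization)
    return in_effect


if __name__ == "__main__":
    # A short burst: idle, two busy periods, idle again.
    print(simulate([100, 2000, 2000, 100, 100]))  # [800, 800, 2300, 2300, 800]
```

In this toy trace the first busy period still runs at 800MHz, because the governor only notices the load at the next sample.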

Suppose that the OS decides that the CPU can clock down to a lower P-state. Just a few ms later, a running process requires a lot more performance. The voltage must now be increased, and this takes a while. During that time, the CPU wastes more power than it should: processing is briefly suspended, and the clock speed cannot increase until the higher voltage has been reached and is stable. If this scenario repeats often, the small power savings of the lower P-state will be outweighed by the power lost in quickly scaling back up to a higher clock and voltage. It is important to understand that each voltage increase causes a short period in which power is consumed without any processing taking place. The same problem applies to entering a C-state: enter it too quickly and performance suffers, because it takes some time to wake that core up again.
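The trade-off in the previous paragraph can be reduced to a break-even time. This is a minimal sketch under a simple, hypothetical model: a round trip down and back up wastes a fixed amount of energy, and the lower P-state saves a constant amount of power. Staying high for a time t costs P_high * t; going low costs P_low * t plus the round-trip energy, so switching only pays off when t exceeds E_round_trip / (P_high - P_low). All numbers are illustrative, not measurements from this article.

```python
# Break-even check for dropping to a lower P-state.
# Simple hypothetical model: the round trip down and back up wastes a fixed
# amount of energy, and the low P-state saves a constant amount of power.


def break_even_seconds(p_high_w, p_low_w, e_round_trip_j):
    """Minimum time in the low P-state before switching saves energy.

    e_round_trip_j: energy wasted by going down and back up again, e.g. while
    the voltage ramps and no useful work is done.
    """
    saved_w = p_high_w - p_low_w
    if saved_w <= 0:
        return float("inf")  # the low state saves nothing, never switch
    return e_round_trip_j / saved_w


def worth_downclocking(predicted_stay_s, p_high_w, p_low_w, e_round_trip_j):
    """True if the predicted stay in the low P-state beats the break-even time."""
    return predicted_stay_s > break_even_seconds(p_high_w, p_low_w, e_round_trip_j)


if __name__ == "__main__":
    # Hypothetical figures: 20 W vs 12 W, 0.04 J wasted per round trip.
    print(break_even_seconds(20.0, 12.0, 0.04))         # about 0.005 s (5 ms)
    print(worth_downclocking(0.003, 20.0, 12.0, 0.04))  # False: demand returns after 3 ms
    print(worth_downclocking(0.050, 20.0, 12.0, 0.04))  # True: stays low for 50 ms
```

The "just a few ms later" scenario above is the first case: the predicted stay is shorter than the break-even time, so the downclock costs energy instead of saving it.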

The remaining problem is a bit more subtle: if you lower the P-state of one core, another core that sends a snoop to this "slow" core will get a much slower answer, which lowers the performance of the active core. According to some researchers [5], this performance decrease is about 5% at 800MHz on a "Barcelona" Opteron. If P-states could go as low as 400MHz, the performance impact would be 30% or more! That is why lower P-states are not used: a core running below 800MHz would wreak havoc on the performance/watt ratio of the CPU. It is also why "Smart Fetch" dumps the L1 and L2 caches into the L3 cache. This not only avoids waking up the idle core too soon, but also avoids the performance hit associated with snooping a "napping" core. Intel's CPUs do not have this problem: because the L3 cache is inclusive, data that cannot be found in the L3 cache will not be found in any core's L1 or L2 cache either.
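The difference between the two cache designs can be illustrated with a toy model of what happens after an L3 miss. This is a hypothetical, heavily simplified sketch, not how either vendor's coherence logic is implemented: it assumes one shared L3 and no probe filter, models only the L3-miss path, and does not model Smart Fetch. The `Core` class and function names are illustrative.

```python
# Toy model of the snoop traffic caused by an L3 miss.
# Hypothetical and heavily simplified: one shared L3, no probe filter,
# and only the L3-miss path is modeled.
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    low_p_state: bool  # a "napping" core answers snoops slowly


def probes_after_l3_miss(requester, cores, inclusive_l3):
    """Cores that must be snooped when `requester` misses in the shared L3."""
    if inclusive_l3:
        # Inclusive L3: every line held in any L1/L2 is also in the L3,
        # so an L3 miss proves no core has it; go straight to memory.
        return []
    # Non-inclusive L3: the line may still sit in another core's L1/L2,
    # so every other core has to be asked.
    return [core for core in cores if core is not requester]


def slow_probes(requester, cores, inclusive_l3):
    """Number of those snoops that land on a core in a low P-state."""
    probed = probes_after_l3_miss(requester, cores, inclusive_l3)
    return sum(core.low_p_state for core in probed)


if __name__ == "__main__":
    cores = [Core("core0", False), Core("core1", True),
             Core("core2", True), Core("core3", False)]
    print(slow_probes(cores[0], cores, inclusive_l3=False))  # 2 slow answers to wait for
    print(slow_probes(cores[0], cores, inclusive_l3=True))   # 0
```

With a non-inclusive L3, the requesting core has to wait on the two slow cores; with an inclusive L3, the miss never reaches them.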

The bottom line is that power management is quite complex: there is no silver bullet. Go to low/idle states too quickly and you end up burning more power while delivering less performance. At the same time, if the OS keeps the clock speed too high, the CPU might never achieve decent power savings. The OS must take into account the most likely behavior of the application and the capabilities of the hardware.
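One common way an OS combines those two inputs for idle states is to pick the deepest C-state whose break-even time ("target residency") fits the predicted idle period and whose wake-up latency is acceptable. Linux's "menu" cpuidle governor works along broadly similar lines. The sketch below is only an illustration of that idea: the state table and numbers are hypothetical, and the prediction itself is taken as an input.

```python
# Sketch: choose an idle (C-)state from a predicted idle time.
# The state table below is hypothetical; real values come from the hardware/firmware.
from typing import NamedTuple


class CState(NamedTuple):
    name: str
    exit_latency_us: int      # time needed to wake the core up again
    target_residency_us: int  # minimum stay for the state to save energy


C_STATES = [  # hypothetical values, shallowest first
    CState("C1", exit_latency_us=2, target_residency_us=2),
    CState("C3", exit_latency_us=50, target_residency_us=150),
    CState("C6", exit_latency_us=150, target_residency_us=600),
]


def pick_c_state(predicted_idle_us, latency_limit_us, states=C_STATES):
    """Deepest state that pays off for the predicted idle time and wakes fast enough.

    The shallowest state is used as a fallback even if it fails both checks.
    """
    choice = states[0]
    for state in states:
        if (state.target_residency_us <= predicted_idle_us
                and state.exit_latency_us <= latency_limit_us):
            choice = state
    return choice.name


if __name__ == "__main__":
    print(pick_c_state(1000, 1000))  # C6: long idle, relaxed latency
    print(pick_c_state(200, 1000))   # C3: idle too short for C6
    print(pick_c_state(1000, 100))   # C3: C6 wakes up too slowly
```

If the prediction is too optimistic, the core enters a state it leaves before the break-even point (wasted power and a slow wake-up); if it is too pessimistic, the core stays in a shallow state and never achieves decent savings.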

35 Comments

  • UrQuan3 - Thursday, January 21, 2010

    I'm trying to remember for 2008, but wasn't there a way to either force or suggest thread/core affinity? It looks like the scheduler was hopping all over the place on the Opterons.
  • JarredWalton - Thursday, January 21, 2010

    You guys better pay attention and answer this post, or his species will try to enslave and/or wipe out the entire galaxy! ;-)
  • mino - Wednesday, January 20, 2010

    I'm not asking why you use them for this article.
    They are fine examples of low-power platforms, even if from vastly different markets.

    But,
    WHY ON EARTH DO YOU KEEP TALKING LIKE THEY WERE COMPARABLE THROUGHOUT THE ARTICLE ???
  • IntelUser2000 - Wednesday, January 20, 2010

    By the way, I don't know whether you have the settings wrong or that's just how it works, but on the home versions of Windows, Turbo Boost is not affected by the power plan: Balanced uses Turbo Boost just as well on my Windows 7 Home Premium system with a Core i5 661.

  • JarredWalton - Wednesday, January 20, 2010

    I was wondering this as well, but I'm not familiar with Windows Server... what I do know is that Power Saver on consumer Windows OSes really limits the CPU frequency scaling features, and it sort of looks like Balanced on the Server OS has aspects of consumer "Power Saver" as well as some elements of "Balanced". Odd to see only two power settings available, where Win7 now has at least 3 and often 5.
  • mino - Wednesday, January 20, 2010

    It seems a classic example of the KISS strategy: choose the most sensible options and so reduce decision complexity for IT people.

    Modes like "Max battery" have no reason to exist on a server box anyway.
  • RobinBee - Tuesday, January 19, 2010

    If you use your pc as a music server:

    Power-saving methods ruin sound quality even when using a good sound card. The problem is »electronic« sound distortion. I do not know why this happens.

    Also, the number of IRQs per second chosen for a network card can ruin sound quality too. Why, I do not know.
  • Anato - Tuesday, January 19, 2010

    I'm interested to see results from other operating systems, which may be better at scheduling processes across different CPUs: namely, do they avoid CPU hopping, and is their power management as efficient as that of Windows?

    Most interested in:
    Linux and Solaris
  • JohanAnandtech - Tuesday, January 19, 2010

    Excellent suggestion :-). The problem is keeping the application the same. We tested SQL Server 2008 on Windows 2008, and of course that cannot be done on Linux. However, I am no stranger to Linux as a server.

    I am no fan of MySQL on Windows, but maybe this has improved. Would MySQL on Windows and Linux make sense as a comparison?
  • maveric7911 - Tuesday, January 19, 2010

    Why not use oracle ;)
