Saving Power at Low Load

Measuring idle power is important in some applications, as operating system schedulers may choose to "race to idle", i.e. perform the task as quickly as possible so the CPU can return to an idle state. This strategy is only worthwhile if the idle state consumes very little power, but many server applications run at relatively low, yet almost never zero, load. One example is a web server that is visited from all around the globe. Thus it is equally interesting to see how the processors deal with this kind of situation. We started Fritzmark with two threads to see how the operating system and hardware cope. First we look at the delivered performance.

Fritzmark integer processing: 2 thread performance

In performance mode, the Xeon L3426 is capable of pushing its clock speed up to 2.66GHz, but not always; its performance is equal to that of a similar Xeon at 2.5GHz. This is in contrast to the Xeon X3470, which can almost always keep its clock speed at 3.33GHz, and as such delivers performance equal to a Xeon that would always run at that speed. The reason for this difference is that the PCU (Power Control Unit) of the L3426 has less headroom: it cannot dissipate more than 45W, while the X3470 is allowed to dissipate up to 95W. Still, the performance boost is quite impressive: Turbo Boost offers 34% better performance on the L3426 compared to the "normal" 1.86GHz clock.
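That 34% figure follows directly from the sustained-equivalent clocks quoted above, if we assume performance scales linearly with clock speed. A quick back-of-the-envelope sketch:

```python
# Turbo Boost gain estimate for the Xeon L3426: base clock is 1.86GHz,
# and under Turbo Boost it sustains the equivalent of roughly 2.5GHz
# (per the measurements above). Assumes performance scales with clock.

def turbo_speedup(base_ghz: float, effective_ghz: float) -> float:
    """Relative performance gain from running at effective_ghz vs base_ghz."""
    return effective_ghz / base_ghz - 1

gain = turbo_speedup(1.86, 2.5)
print(f"L3426 Turbo Boost gain: {gain:.0%}")  # ~34%
```

The same arithmetic explains why the X3470, which sustains 3.33GHz nearly all the time, simply behaves like a fixed 3.33GHz part.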

Now let's confront the performance levels with the power consumption.

integer processing: 2 threads

The six-core Opteron is clearly a better choice than its faster clocked quad-core sibling. In power saving mode it is capable of reducing power by 8W more while offering the same level of performance. That is a small surprise: do not forget that the "Istanbul" Opteron has twice as many idle cores leaking power as the "Shanghai" CPU.

The Nehalem based core offers very high performance per thread, about 40% higher than the Opteron's architecture is capable of achieving, but it comes at a price, as we see power shoot up very quickly. Part of the reason, of course, is that the Nehalem is more efficient at idle. We assume - based on early component level power measurements - that the idle power of the Xeons is about 9W (power plan Balanced) and that of the Opterons about 14W (power plan Balanced). Note that the exact numbers are not really important. Since the RAM is hardly touched, we assume that power is only raised by 1W per DIMM on average. Based on these assumptions we can estimate CPU + VRM power, measured at the outlet.

System Power Estimates

System                      Calculation (at outlet)    CPU + VRM Power   Notes
Xeon X3470 (performance)    119W - 4W - 60W + 13W      68W               system idle was 73W = 13W CPU + 60W rest; 4W = 4 x 1W per DIMM
Xeon L3426 (performance)    99W - 4W - 60W + 11W       46W
Xeon L3426 (balanced)       90W - 4W - 60W + 9W        35W
Opteron 2435 (performance)  102W - 4W - 70W + 18W      46W               system idle was 88W = 18W CPU + 70W rest
Opteron 2435 (balanced)     100W - 4W - 70W + 14W      40W
Opteron 2389 (performance)  114W - 4W - 70W + 22W      62W

First of all, you might be surprised that the Turbo Boosted L3426 needs 46W. Don't forget that this is measured at the power outlet, so 46W at 90% PSU efficiency means that the CPU + VRMs got about 41W delivered. Yes, these numbers are not entirely accurate, but that is not the point. Our component level power measurements still need some work, but we have reason to believe that the numbers above are close enough to draw some conclusions.
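The estimates above all reduce to one formula: CPU + VRM power = outlet power - DIMM power - idle "rest of system" power + idle CPU power, with the rest-of-system share derived from the measured system idle. A minimal sketch reproducing the table (the 1W-per-DIMM and 90% PSU efficiency figures are the assumptions stated above):

```python
# Estimate CPU + VRM power from wall-outlet measurements, using the same
# assumptions as the table: 4 DIMMs at 1W each, and the system's idle
# power split into a CPU share and a rest-of-system share.

def cpu_vrm_power(outlet_w, rest_idle_w, cpu_idle_w, dimm_w=4):
    """Outlet power minus DIMMs and the idle rest-of-system share,
    plus the CPU's own idle power (the formula used in the table)."""
    return outlet_w - dimm_w - rest_idle_w + cpu_idle_w

def dc_power(outlet_w, psu_efficiency=0.90):
    """Power actually delivered past the PSU at a given efficiency."""
    return outlet_w * psu_efficiency

# Xeon X3470, performance plan: system idle was 73W = 13W CPU + 60W rest
print(cpu_vrm_power(119, rest_idle_w=60, cpu_idle_w=13))  # 68
# Xeon L3426 with Turbo Boost: 46W at the outlet, ~41W actually delivered
print(round(dc_power(46), 1))  # 41.4
```

Plugging in the other rows of the table (99W, 90W, 102W, 100W, 114W with their respective idle splits) reproduces the remaining estimates the same way.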

  1. AMD's platform consumes a bit too much at idle, but...
  2. The six-core Opteron CPUs are much more efficient than the quad-core in these circumstances
  3. Intel's 95W Xeons offer stellar performance but the high IPC requires quite a bit of power
  4. The low power versions offer an excellent performance / Watt ratio

So if we take the platform out of the picture, the low power Xeon with Turbo Boost consumes about the same as the "normal" six-core Opteron, but its performance is 16% better. Is this a success or a failure? Did Intel's Power Control Unit save a considerable amount of power? In other words, would the power consumption of the Xeons be much higher if they didn't have a PCU? Let's dive deeper.

Comments
  • JohanAnandtech - Tuesday, January 19, 2010 - link

    Well, Oracle has a few downsides when it comes to this kind of testing. It is not very popular among small and medium businesses AFAIK (our main target), and we still haven't worked out why it performs much worse on Linux than on Windows. So choosing Oracle is a sure way to make the project time explode... IMHO.
  • ChristopherRice - Thursday, January 21, 2010 - link

    Works worse on Linux than Windows? You likely have a setup issue with the kernel parameters or within Oracle itself. I actually don't know of any enterprise location that uses Oracle on Windows anymore. Generally it's all RHEL4/RHEL5/Sun.
  • TeXWiller - Monday, January 18, 2010 - link

    The 34xx series supports four quad rank modules, giving today a maximum supported amount of 32GB per CPU (and board). The 24GB limit is that of the three channel controller with unbuffered memory modules.
  • pablo906 - Monday, January 18, 2010 - link

    I love Johan's articles. I think this has some implications for how virtualization solutions may be the most cost effective. When you're running at 75% capacity on every server, I think the AMD solution could possibly become more attractive. I think I'm going to have to do some independent testing in my datacenter with this.

    I'd like to mention that focusing on VMware is a disservice to virtualization technology as a whole. It would be like not having benchmarked the K6-3+ just because P2s and Celerons were the mainstream and SS7 boards weren't quite up to par. There are situations, primarily virtualizing Linux, where Citrix XenServer is a better solution. Also, many people who are buying Server '08 licenses are getting Hyper-V licenses bundled in for "free."

    I've known several IT Directors in very large Health Care organization who are deploying a mixed Hyper-V XenServer environment because of the "integration" between the two. Many of the people I've talked to at events around the country are using this model for at least part of the Virtualization deployments. I believe it would be important to publish to the industry what kind of performance you can expect from deployments.

    You can do some really interesting HomeBrew SAN deployments with OpenFiler or OpeniSCSI that can compete with the performance of EMC, Clarion, NetApp, LeftHand, etc. NFS deployments I've found can bring you better performance and manageability. I would love to see some articles about the strengths and weaknesses of the storage subsystem used and how it affects each type of deployment. I would absolutely be willing to devote some datacenter time and experience with helping put something like this together.

    I think this article really lends itself well into tieing with the Virtualization talks and I would love to see more comments on what you think this means to someone with a small, medium, and large datacenter.
  • maveric7911 - Tuesday, January 19, 2010 - link

    I'd personally prefer to see KVM over XenServer. Even Red Hat is ditching Xen for KVM. In the environments I work in, Xen is actually being decommissioned in favor of VMware.
  • JohanAnandtech - Tuesday, January 19, 2010 - link

    I can see the theoretical reasons why some people are excited about KVM, but I still don't see the practical ones. Who is using this in production? Getting Xen, VMware or Hyper-V to do their job is pretty easy; KVM does not even seem to be close to beta. It is hard to get working, and it is nowhere near Xen when it comes to reliability. Admittedly, those are our first impressions, but we are no virtualization rookies.

    Why do you prefer KVM?
  • VJ - Wednesday, January 20, 2010 - link

    "It is hard to get working, and it is nowhere near Xen when it comes to reliability."

    I found Xen (separate kernel boot at the time) more difficult to work with than KVM (kernel module), so I'm thinking that the particular (host) platform you're using (Windows?) may be geared towards one of them.

    If you had to set it up yourself then that may explain reliability issues you've had?

    On Fedora Linux, it shouldn't be more difficult than Xen.
  • Toadster - Monday, January 18, 2010 - link

    One of the new technologies released with the Xeon 5500 (Nehalem) is Intel Intelligent Power Node Manager, which controls P/T states within the server CPU. This is a good article on existing P/C states, but will you guys be doing a review of the newer control technologies as well?

    http://communities.intel.com/community/openportit/...
  • JohanAnandtech - Tuesday, January 19, 2010 - link

    I don't think it is "newer". Going to C6 for idle cores is less than a year old, remember :-).

    It seems to be a sort of manager which monitors the electrical input (PDU based?) and then lowers the P-states to keep the power at a certain level. Did I miss something? (I only glanced at it quickly.)

    Personally, I think HP is more onto something by capping the power inside their server management software. But I still have to evaluate both. We will look into that.
  • n0nsense - Monday, January 18, 2010 - link

    Maybe I missed something in the article, but from what I see at home, the C2Q (and C2D) can manage frequencies per core.
    I'm not sure it is possible under Windows, but in Linux it just works this way. You can actually see each core at its own frequency.
    Moreover, you can select for each core at which frequency it should run.
