The new methodology

At AnandTech, giving you real-world measurements has always been the goal of this site. Contrary to the vast majority of IT sites out there, we don't believe in letting some consultant or analyst spell it out for you. We give you our measurements, as close to the real world as possible, and we give you our opinion based on those measurements, but ultimately it is up to you to decide how to interpret the numbers. If we make a mistake in our reasoning somewhere, you tell us in the comment box; we investigate it and get back to you. It is a slow process, but we firmly believe in it. And that is exactly what happened with our articles on “dynamic power management” and “testing low power CPUs”.

The former article was written to understand how the current power management techniques work. We needed a very simple, well-understood benchmark to keep the complexity down, and it taught us a lot about the Dynamic Voltage and Frequency Scaling (DVFS) techniques that AMD and Intel use. But as we admitted, our Fritz Chess benchmark was and is not a good choice if you want to apply these new insights to your own datacenter.

“Testing low power CPUs” went much less in depth, but used a real-world benchmark: our vApus Mark I, which simulates a heavy consolidated virtualization load. The numbers were very interesting, but the article had one big shortcoming: it only measured at 90-100% load or at idle. The reason is that the vApus benchmark score was based on throughput, and to measure the throughput of a system, you have to stress it close to its maximum. So we could not measure performance accurately unless we went for top performance. That is fine for an HPC workload, but not for a commercial virtualization, database, or web workload.

Therefore, based on our readers' feedback, we went for a different approach. We launched “one tile” of the vApus benchmark on each of the tested servers. Such a tile consists of an OLAP database (4 vCPUs), an OLTP database (4 vCPUs), and two web VMs (2 vCPUs each), for a total of 12 virtual CPUs. That is far fewer than what a typical high-end dual CPU server can offer: from the point of view of the Windows 2008, Linux, or VMware ESX scheduler, the best Xeon 5600 (“Westmere”) and Opteron 6100 (“Magny-Cours”) offer 24 logical or physical cores. To the hypervisor, those logical or physical cores are Hardware Execution Contexts (HECs), and the hypervisor schedules VMs onto these HECs. Typically, each of the 12 virtual CPUs needs somewhere between 50 and 90% of one core. Since we have twice as many cores (HECs) as required, we expect the load on the complete system to hover between 25 and 45%. Although this is not perfect, it is much closer to the real world. Most virtualized servers never sit idle for long: with so many VMs, there is always something to do. System administrators also want to avoid CPU loads over 60-70%, as response times may rise exponentially beyond that point.
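To make the arithmetic explicit, here is a minimal back-of-envelope sketch of how that 25-45% expectation is derived; its only input is the per-vCPU demand range quoted above:

```python
# Back-of-envelope check of the expected system load for one vApus tile.
# Assumption (from the text): each vCPU demands 50-90% of one physical core.

VCPUS = 4 + 4 + 2 * 2   # OLAP (4) + OLTP (4) + two web VMs (2 vCPUs each) = 12
HECS = 24               # hardware execution contexts of a high-end dual-CPU server

for demand in (0.50, 0.90):            # low and high end of per-vCPU demand
    load = VCPUS * demand / HECS       # fraction of the whole system kept busy
    print(f"per-vCPU demand {demand:.0%} -> system load {load:.0%}")

# Prints 25% and 45%: the load range quoted above.
```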

There is more. Instead of measuring throughput, we focus on response time. At the end of the day, the maximum number of pages your server can serve is nice to know, but not all that important. The response time your system offers at a given load matters much more: users appreciate low response times. Nobody is going to be happy that your server can serve up to 10,000 requests per second if each page takes 10 seconds to load.
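To illustrate the difference in approach, here is a minimal sketch of measuring response time at a fixed, moderate request rate rather than pushing for maximum throughput. The URL and request rate are hypothetical placeholders, not our actual vApus setup:

```python
# Minimal sketch: record per-request latency at a fixed, moderate request rate
# instead of measuring maximum throughput. URL and rate are placeholders.
import time
import urllib.request

URL = "http://testserver.example/page"   # hypothetical test URL
RATE = 10.0                              # target requests per second
SAMPLES = 100

latencies = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    with urllib.request.urlopen(URL) as resp:    # one request, timed end to end
        resp.read()
    elapsed = time.perf_counter() - start
    latencies.append(elapsed)
    time.sleep(max(0.0, 1.0 / RATE - elapsed))   # pace requests at the target rate

latencies.sort()
print(f"median:          {latencies[len(latencies) // 2] * 1000:.1f} ms")
print(f"95th percentile: {latencies[int(len(latencies) * 0.95)] * 1000:.1f} ms")
```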

Comments

  • WillR - Thursday, July 15, 2010 - link

    10p per kWh may be low in the UK, but it's not low for residential users in the US, where $0.12/kWh is the average. Just pulled out a bill from earlier this year and we paid 7.3 cents per kWh at my house. What it comes down to is that the data center potentially overcharges people in the 19 to 38 cents per kWh range (using your numbers), but rates can be higher than $0.20/kWh in high-density areas like NYC or SF. The extra cost should go toward paying for upgrades and expansion of their infrastructure, so it's not unreasonable.

    Worth mentioning, to put things in perspective: four 250 watt servers use 720 kWh/month, while the average house in the US uses 920 kWh/month, so it's not really as simple a setup as one might initially think (the arithmetic is sketched after this comment).

    http://www.eia.doe.gov/cneaf/electricity/esr/table... provides a nice table of average rates and usages.
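For reference, the 720 kWh/month figure in the comment above checks out; a minimal sketch of the arithmetic, using the commenter's own numbers:

```python
# Check the commenter's figure: four 250 W servers running 24/7 for a month.
SERVERS = 4
WATTS = 250
HOURS = 24 * 30                 # a 30-day month

kwh = SERVERS * WATTS * HOURS / 1000
print(f"{kwh:.0f} kWh/month")   # 720 kWh, vs. ~920 kWh for the average US home

# At the average US residential rate of $0.12/kWh quoted above:
print(f"${kwh * 0.12:.2f}/month")   # about $86/month
```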
  • knedle - Thursday, July 15, 2010 - link

    I'm not sure if you're aware, but in most countries residential users pay a much lower price per kWh (at least a factor of two) than commercial users. Commercial users also pay extra for electricity used during the day and get it cheaper at night.
    This is why there are some factories in Europe that work only at night.
  • WillR - Thursday, July 15, 2010 - link

    That is not the case in the US. Residential pays higher rates than either Commercial or Industrial users.

    http://www.eia.doe.gov/cneaf/electricity/epm/table...

    Average Retail Price (cents/kWh)
    Sector        Mar-10   Mar-09
    Residential    11.20    11.33
    Commercial     10.03    10.07
    Industrial      6.50     6.79

    Industrial settings tend to use very large amounts of energy in a very small area or for a small number of clients, so they get cheaper bulk rates: the utility sells a lot of power with little administrative overhead. It's also often the case that they can get a high-voltage line installed directly to the plant, which is expensive to install but increases efficiency dramatically.

    These averages may reflect heavy use of off-peak consumption, but most plants I've experienced operate 24/7. Much of it is politics and bargaining for a better rate on the contract.
  • DaveSylvia - Thursday, July 15, 2010 - link

    Hey Johan, great article as usual! Always enjoyed and appreciated your articles including those from back in the day at Ace's Hardware!
  • JohanAnandtech - Thursday, July 15, 2010 - link

    Good memory :-). I have been part of Anand's team for 6 years now, the same amount of time that I spent at Ace's.
  • DaveSylvia - Thursday, July 15, 2010 - link

    Yeah! One of the first tech articles I recall reading was back in 1999. It was about how pipeline length influences CPU clock speeds. You used the Pentium II, K6, and DEC Alpha as examples :). All good stuff!
  • MrSpadge - Thursday, July 15, 2010 - link

    Or you could say that disabling Turbo Boost (by using the “balanced” power plan) results in a 10% throughput disadvantage.


    Isn't there a power plan which lets the CPUs turbo up (as max performance does) and also lets them clock down when not needed (as balanced does)? It seems outright stupid to take turbo away from a Nehalem-like chip.

    MrS
  • has407 - Thursday, July 15, 2010 - link

    No reason you shouldn't be able to specify a power plan that does both (see the sketch at the end of this reply), but for whatever reason one isn't provided out of the box.

    I'd guess that, given the relatively small difference in idle power between "performance" and "balanced" (which seems to be more of a "power capped" plan), maybe they (presumably the OEM?) decided it wasn't worth it.

    There may also be stability issues with some system configurations or support concerns, as there's yet another set of variables to deal with.
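For what it's worth, here is a sketch of how such a plan might be built on Windows Server 2008 via powercfg. It assumes the documented aliases (SCHEME_BALANCED, SUB_PROCESSOR, PROCTHROTTLEMAX, PROCTHROTTLEMIN) and that a 100% maximum processor state leaves turbo available, which is worth verifying on a given system:

```python
# Sketch: clone the "balanced" plan, then raise the maximum processor state to
# 100% so turbo stays available, while a low minimum state still allows idle
# downclocking. Assumes documented powercfg aliases on Windows Server 2008.
import subprocess

def powercfg(*args):
    return subprocess.run(["powercfg", *args], capture_output=True,
                          text=True, check=True).stdout

# Duplicate the balanced scheme; powercfg prints the new scheme's GUID.
out = powercfg("-duplicatescheme", "SCHEME_BALANCED")
guid = out.split()[3]   # "Power Scheme GUID: <guid> (name)" -> fourth token

# Max processor state 100% (turbo allowed, assumed), min 5% (deep downclock).
powercfg("-setacvalueindex", guid, "SUB_PROCESSOR", "PROCTHROTTLEMAX", "100")
powercfg("-setacvalueindex", guid, "SUB_PROCESSOR", "PROCTHROTTLEMIN", "5")
powercfg("-setactive", guid)
```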
  • has407 - Thursday, July 15, 2010 - link

    Johan -- That brings up an interesting question: how much of the underlying CPU's power management are you testing versus a particular vendor or OS configuration? I'd expect them to closely reflect each other, assuming everyone has done their job.

    As you're using Win 2008, it would be interesting to see what powercfg.exe shows for the various parameters under the different modes and systems; e.g., "Busy Adjust Threshold", "Increase Policy", "Time Check", "Increase Percent", "Domain Accounting Policy", etc. Are there significant differences across systems/CPUs for the same profile? (A sketch for dumping these follows below.)
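A minimal sketch of how those parameters might be dumped for comparison across servers. Only the /q syntax is documented; /qh (assumed to take the same arguments) additionally lists the hidden processor settings that the names above belong to:

```python
# Dump the processor power-management settings of the active plan so they can
# be diffed across servers. /qh also shows hidden settings such as
# "Busy Adjust Threshold"; its syntax is assumed to match /q.
import subprocess

for flag in ("/q", "/qh"):
    result = subprocess.run(["powercfg", flag, "SCHEME_CURRENT", "SUB_PROCESSOR"],
                            capture_output=True, text=True)
    print(f"--- powercfg {flag} ---")
    print(result.stdout)
```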
  • Whizzard9992 - Thursday, July 15, 2010 - link

    Heat dissipation is also a concern, no? It's expensive to cool a datacenter. Low power should bring cooling costs down.

    There's also a question of density. You can fit more low-power cores into 1U of space because of the lower heat dissipation, and multi-node blades are cheaper than 2U workhorses. Rack space is expensive for a lot of reasons. Just look at the Atom HPC servers: I bet the Atom would score pretty low in performance-per-watt versus even the LP XEON, but its sheer size and thermal envelope fit it in places the XEON can't.

    Frankly, I'd be surprised if the low-power XEON saved "energy" at the same workloads versus full-power, given that both are on the same architecture. LP XEONs are really an architecture choice, greasing the transition to many-core via horizontal scaling. A good desktop analogy would be: "Is one super-fast core better than a slower multi-core?" Fortunately for the datacenter, most servers only need one or the other.

    Also, with physical nodes scaling out horizontally, entire nodes can be powered down during down times, with significant power savings. This is software-bound, and I haven't seen this in action yet, but it's a direction nonetheless.

    Without getting into all of the details, I think a proper TCO analysis is in order. This article really only touches on actual power consumption, where there are no real surprises: the full-power part peaks a little higher in performance, and the LP part stays within a tighter thermal envelope.

    The value of low power is really low heat in the datacenter. I'd like to see something that covers node density and cooling costs as well. A datacenter with all LP servers is unrealistic, seeing as some applications that scale vertically will dictate higher-performing processors. It would be nice to see what the total cost would be for, say, a 2,000 node datacenter with an 80% LP population versus a 20% LP population. The TDP suggests a 1/3 drop in cooling costs and 1/3 better density. (A rough back-of-envelope along those lines follows below.)
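A very rough sketch of the comparison the comment asks for. Every number below (per-node draw, PUE, electricity rate) is an illustrative assumption, not a measurement from the article:

```python
# Rough TCO sketch for the 2,000-node comparison suggested above.
# All figures are illustrative assumptions, not measured values.
NODES = 2000
FULL_W, LP_W = 250, 170        # assumed average draw per node (watts)
PUE = 1.8                      # assumed power usage effectiveness (cooling overhead)
RATE = 0.12                    # $/kWh, the average US rate quoted earlier
HOURS = 24 * 365

def yearly_power_cost(lp_fraction):
    avg_w = lp_fraction * LP_W + (1 - lp_fraction) * FULL_W
    kwh = NODES * avg_w * HOURS / 1000 * PUE   # IT load plus cooling overhead
    return kwh * RATE

for frac in (0.20, 0.80):
    print(f"{frac:.0%} LP nodes: ${yearly_power_cost(frac):,.0f}/year")
```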
