The new methodology
At Anandtech, giving you real-world measurements has always been the goal. Contrary to the vast majority of IT sites out there, we don’t believe in letting some consultant or analyst spell it out for you. We give you our measurements, as close to the real world as possible. We give you our opinion based on those measurements, but ultimately it is up to you to decide how to interpret the numbers. If we make a mistake in our reasoning somewhere, you tell us in the comment box, and we investigate and get back to you. It is a slow process, but we firmly believe in it. And that is what happened with our articles about “dynamic power management” and “testing low power CPUs”.
The former article was written to understand how the current power management techniques work. We needed a very easy, well-understood benchmark to keep the complexity down, and it allowed us to learn a lot about the current Dynamic Voltage and Frequency Scaling (DVFS) techniques that AMD and Intel use. But as we admitted, our Fritz Chess benchmark was and is not a good choice if you want to apply these new insights to your own datacenter.
“Testing low power CPUs” went much less in depth, but used a real world benchmark: our vApus Mark I, which simulates a heavy consolidated virtualization load. The numbers were very interesting, but the article had one big shortcoming: it only measured at 90-100% workload or idle. The reason for this is that the vApus benchmark score was based upon throughput. And to measure the throughput of a certain system, you have to stress it close to the maximum. So we could not measure performance accurately unless we went for the top performance. And that is fine for an HPC workload, but not for a commercial virtualization/database/web workload.
Therefore we went for a different approach based upon our readers' feedback. We launched “one tile” of the vApus benchmark on each of the tested servers. Such a tile consists of an OLAP database (4 vCPUs), an OLTP database (4 vCPUs) and two web VMs (2 vCPUs each). So in total we have 12 virtual CPUs. These 12 virtual CPUs are much less than what a typical high-end dual CPU server can offer. From the point of view of the Windows 2008, Linux or VMware ESX scheduler, the best Xeon 5600 (“Westmere”) and Opteron 6100 (“Magny-Cours”) can offer 24 logical or physical cores. To the hypervisor, those logical or physical cores are Hardware Execution Contexts (HECs). The hypervisor schedules VMs onto these HECs. Typically each of the 12 virtual CPUs needs somewhere between 50 and 90% of one core. Since we have twice as many cores or HECs as required, we expect the typical load on the complete system to hover between 25 and 45%. And although it is not perfect, this is much closer to the real world. Most virtualized servers never run idle for a long time: with so many VMs, there is always something to do. System administrators also want to avoid CPU loads over 60-70%, as this might make the response time go up exponentially.
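The load estimate above is simple arithmetic, and a minimal sketch makes it easy to redo for a different server. The function name and parameters below are our own illustration, not part of vApus, and the sketch assumes every vCPU needs the same share of one core and that each of the two web VMs has 2 vCPUs (which is what makes the total come out at 12).

```python
def expected_system_load(vcpus, hecs, per_vcpu_low, per_vcpu_high):
    """Return the (low, high) fraction of total host capacity in use.

    Hypothetical helper: assumes every vCPU demands the same share of one
    Hardware Execution Context (HEC) and ignores hypervisor overhead.
    """
    if hecs <= 0:
        raise ValueError("hecs must be positive")
    low = vcpus * per_vcpu_low / hecs
    high = vcpus * per_vcpu_high / hecs
    return low, high


# One tile: OLAP (4 vCPUs) + OLTP (4 vCPUs) + two web VMs (2 vCPUs each)
tile_vcpus = 4 + 4 + 2 * 2

# 24 HECs on the best dual-socket servers; each vCPU needs 50-90% of a core
low, high = expected_system_load(
    tile_vcpus, hecs=24, per_vcpu_low=0.50, per_vcpu_high=0.90
)
print(f"{tile_vcpus} vCPUs on 24 HECs: {low:.0%} to {high:.0%} load")
```

With the numbers from the text this reproduces the 25 to 45% range. On a server that exposes fewer HECs, the same tile would push the load correspondingly higher.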
There is more. Instead of measuring throughput, we focus on response time. At the end of the day, the maximum number of pages that your server can serve is nice to know, but not that important. The response time that your system offers at a certain load is much more important. Users will appreciate low response times. Nobody is going to be happy about the fact that your server can serve up to 10,000 requests per second if each page takes 10 seconds to load.
I have been to a number of AMD web conferences and seminars where they state the above.