Server Clash: DELL's Quad Opteron DELL R815 vs HP's DL380 G7 and SGI's Altix UV10
by Johan De Gelas on September 9, 2010 7:30 AM EST
At low utilization (30% to 60%), we cannot compare throughput, because it is more or less the same on all machines. Response times make the difference here. It is important to interpret the numbers carefully, though.
This might come as a surprise, but the dual Xeon X5670 inside the HP DL380 G7 comes out as the best (fastest) server here. The Xeon X5670 extracts more parallelism out of the code of one thread and can clock a single core quite a bit higher than the other cores. Response times are measured per URL/query, so single-threaded performance is the determining factor until all cores are working as hard as they can.
We are working on about 30 virtual CPUs, or “worlds” in the eyes of the ESX scheduler. The dual Xeon X5670 can offer 24 Hardware Execution Contexts (HECs), the quad Opteron 6174 can offer 48. However, the Opteron cannot leverage its HEC advantage enough in this scenario. The Xeon X7560 has more or less the same core as the X5670 at a lower clock, but it does not suffer from the small scheduling overhead that the Xeon X5670 incurs by having fewer HECs than there are worlds to run. That is why the 2.26 GHz Xeon X7560 shows only 10-15% higher response times.
So how important is this? Is the Xeon twice as fast as the Opteron? Not really. Remember that we measured this over a low-latency LAN. A typical web request sent from Europe to the AnandTech server in North Carolina will take up to 400 ms. In that scenario the extra 100 ms difference between the Xeon and Opteron will start to fade.
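To make the fading effect concrete, here is a minimal sketch of the arithmetic. The 400 ms network time and the 100 ms gap come from the text above; the server-side response times of 100 ms and 200 ms are illustrative assumptions chosen to match "twice as fast", not our measured results, and the function name is hypothetical.

```python
def perceived_ratio(slow_ms: float, fast_ms: float, network_ms: float = 0.0) -> float:
    """Ratio of end-to-end response times once network time is added to both."""
    return (slow_ms + network_ms) / (fast_ms + network_ms)


# Assumed server-side response times (illustrative only, 100 ms apart).
FAST_SERVER_MS = 100.0
SLOW_SERVER_MS = 200.0

# Network time for a request from Europe to North Carolina, upper bound from the text.
WAN_MS = 400.0

lan_ratio = perceived_ratio(SLOW_SERVER_MS, FAST_SERVER_MS)          # LAN: network time ignored
wan_ratio = perceived_ratio(SLOW_SERVER_MS, FAST_SERVER_MS, WAN_MS)  # WAN: 400 ms added to both

print(f"LAN: slower server takes {lan_ratio:.2f}x as long")
print(f"WAN: slower server takes {wan_ratio:.2f}x as long")
```

Under these assumptions, a 2x gap on the LAN shrinks to 1.2x (600 ms versus 500 ms) once the network time is included.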
The higher the load, the more the Opteron will narrow the gap as it starts to leverage the higher throughput.
The difference in user experience is hardly as dramatic as the numbers indicate. Whether you will care or not will also depend on the application. Some web requests can take up to 2 seconds (220 ms is only an average), so it really depends on how complex your application is. If you run at a light load and the heaviest requests are answered within half a second, nobody will notice if it is 300 or 180 ms. But if some of your requests take more than a second even under "normal" load, this difference will be noticeable.
So response time under "normal" load might not be as important as under heavy load, but the numbers above also show you that throughput is not everything. Single-threaded performance is still important, and we definitely feel that the UltraSPARC T2 approach is the wrong one for most business applications out there. A good balance between single-threaded and multi-core performance is still advisable for our web applications, which get heavier as we build upon feature-rich Content Management Systems.
Once we load the systems close to their maximum, a totally different picture emerges. Below you can see the response times with much higher concurrencies and the four tiles of full vApus Mark II testing. Remember that the concurrencies are 10 times higher and the OLTP test is included.
The quad Xeon wins in the web tests while the quad Opteron leads in the OLAP tests. The OLAP test is more bandwidth sensitive and that is one of the reasons that the quad Opteron configurations excel there.
The dual Xeon X5670 has only 24 HECs to offer while 72 worlds are constantly demanding CPU power. It is no wonder that the dual Xeon is completely swamped and, as a result, has the worst response times.
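The pressure on the scheduler can be expressed as worlds per HEC. The sketch below uses only the counts quoted in this article (about 30 worlds at light load, 72 at heavy load, 24 HECs for the dual Xeon X5670, 48 for the quad Opteron 6174). The helper name is hypothetical, and the ratio is a rough indicator only: it says nothing about how the ESX scheduler actually places worlds.

```python
def worlds_per_hec(worlds: int, hecs: int) -> float:
    """Worlds competing for each Hardware Execution Context (HEC)."""
    if hecs <= 0:
        raise ValueError("hecs must be positive")
    return worlds / hecs


# HEC counts as quoted in the article.
HECS = {
    "dual Xeon X5670": 24,
    "quad Opteron 6174": 48,
}

# World counts as quoted in the article (the light-load figure is approximate).
LOADS = {
    "light load": 30,
    "heavy load": 72,
}

ratios = {
    (server, load): worlds_per_hec(worlds, hecs)
    for server, hecs in HECS.items()
    for load, worlds in LOADS.items()
}

for (server, load), ratio in ratios.items():
    print(f"{server}, {load}: {ratio:.2f} worlds per HEC")
```

At heavy load the dual Xeon has three worlds contending for every HEC, against 1.5 for the quad Opteron. At light load the figures are 1.25 and 0.625, which is why the extra HECs of the Opteron bring little benefit there.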