Response Times

At low utilization, 30% to 60%, we cannot compare throughput: it is more or less the same on all machines. Response times make the difference here, but it is important to interpret the numbers carefully.

This might come as a surprise, but the dual Xeon X5670 inside the HP DL380 G7 comes out as the fastest server here. The Xeon X5670 extracts more parallelism out of the code of a single thread and clocks one core quite a bit higher than the others. Response times are measured per URL/query, so single-threaded performance is the determining factor until all cores are working as hard as they can.

We are working with about 30 virtual CPUs, or "worlds" in the eyes of the ESX scheduler. The dual Xeon X5670 can offer 24 Hardware Execution Contexts (HECs), while the quad Opteron 6174 can offer 48. However, the Opteron cannot leverage its HEC advantage enough in this scenario. The Xeon X7560 has more or less the same core as the X5670 but a lower clock; on the other hand, it does not suffer from the small scheduling overhead the Xeon X5670 incurs by having fewer HECs than worlds to run. That is why the 2.26 GHz Xeon X7560 shows only 10-15% higher response times.
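
To make the scheduling pressure concrete, here is a minimal sketch in Python (not part of the vApus Mark II suite) that relates the roughly 30 active worlds at this load level to the HEC counts quoted above; the input figures come straight from the text, only the ratio is computed.

# Minimal sketch: the oversubscription the ESX scheduler faces at ~30 busy
# worlds, using the HEC counts quoted in the text.
active_worlds = 30

hecs_per_config = {
    "dual Xeon X5670": 24,
    "quad Opteron 6174": 48,
}

for config, hecs in hecs_per_config.items():
    ratio = active_worlds / hecs
    note = "worlds must be time-sliced" if ratio > 1 else "each world can get its own HEC"
    print(f"{config}: {active_worlds} worlds / {hecs} HECs = {ratio:.2f} ({note})")

A ratio above 1.0 means the scheduler has to time-slice worlds onto HECs, which is the small overhead the dual Xeon X5670 pays here.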

So how important is this? Is the Xeon twice as fast as the Opteron? Not really. Remember that we measured this over a low-latency LAN. A typical web request sent from Europe to the AnandTech server in North Carolina takes up to 400 ms. In that scenario the 100 ms difference between the Xeon and the Opteron starts to fade.
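
A quick back-of-the-envelope illustration, again as a small Python sketch: the 400 ms trans-Atlantic figure is the one quoted above, while the 200 ms and 300 ms server-side response times are assumed round numbers chosen to match the ~100 ms gap, not measured results.

# Illustrative only: 400 ms WAN latency from the text; the 200/300 ms
# server-side response times are assumed example values, not measurements.
wan_round_trip_ms = 400
server_response_ms = {"dual Xeon X5670": 200, "quad Opteron 6174": 300}

for server, local_ms in server_response_ms.items():
    end_to_end_ms = local_ms + wan_round_trip_ms
    print(f"{server}: {local_ms} ms on the LAN, ~{end_to_end_ms} ms for a user in Europe")

gap_ms = server_response_ms["quad Opteron 6174"] - server_response_ms["dual Xeon X5670"]
print(f"The gap stays {gap_ms} ms in absolute terms, but shrinks from "
      f"{gap_ms / 200:.0%} of the Xeon's LAN time to {gap_ms / 600:.0%} of the end-to-end time.")

The absolute difference does not change, but relative to what the remote user actually experiences it becomes far less dramatic.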

The higher the load, the more the Opteron will narrow the gap, as it starts to leverage its higher throughput.

The difference in user experience is hardly as dramatic as the numbers indicate, and whether you care will also depend on the application. Some web requests can take up to 2 seconds (220 ms is only an average), so it really depends on how complex your application is. If you run at light load and the heaviest requests are answered within half a second, nobody will notice whether that is 300 or 180 ms. But if some of your requests take more than a second even under "normal" load, this difference will be noticeable.

So response time under "normal" load might not be as important as under heavy load, but the numbers above also show that throughput is not everything. Single-threaded performance is still important, and we definitely feel that the UltraSparc T2 approach is the wrong one for most business applications out there. A good balance between single-threaded and multi-core performance is still advisable for web applications that get heavier as we build upon feature-rich Content Management Systems.

Once we load the systems close to their maximum, a totally different picture emerges. Below you can see the response times at much higher concurrencies, with all four tiles of the full vApus Mark II test running. Remember that the concurrencies are 10 times higher and the OLTP test is included.

The quad Xeon wins in the web tests, while the quad Opteron leads in the OLAP tests. The OLAP test is more bandwidth-sensitive, and that is one of the reasons the quad Opteron configurations excel there.

The dual Xeon X5670 has only 24 HECs to offer while 72 worlds are constantly demanding CPU power. No wonder the dual Xeon is completely swamped and, as a result, has the worst response times.

Comments

  • jdavenport608 - Thursday, September 9, 2010 - link

    Appears that the pros and cons on the last page are not correct for the SGI server.
  • Photubias - Thursday, September 9, 2010 - link

    If you view the article in 'Print Format' then it shows correctly.
    Seems to be an AnandTech issue ... :p
  • Ryan Smith - Thursday, September 9, 2010 - link

    Fixed. Thanks for the notice.
  • yyrkoon - Friday, September 10, 2010 - link

    Hey guys, you've got to do better than this. The only thing that drew me to this article was the name "SGI", and your explanation of their system is nothing.

    Why not just come out and say, "Hey, look what I've got pictures of." That's about all the use I have for the "article". Sorry if you do not like that, Johan, but the truth hurts.
  • JohanAnandtech - Friday, September 10, 2010 - link

    It is clear that we do not focus on the typical SGI market. But you could have noticed that from the other competitors, and you know that HPC is not our main expertise; virtualization is. It is not really clear what your complaint is, so I assume it is the lack of HPC benchmarks. Care to make your complaint a little more constructive?
  • davegraham - Monday, September 13, 2010 - link

    I'll defend Johan here... SGI has basically cornered themselves into the cloud-scale marketplace, where their BTO-style of engagement has really allowed them to prosper. If you wanted a competitive story there, the Dell DCS series of servers (the C6100, for example) would be a better comparison.

    cheers,

    Dave
  • tech6 - Thursday, September 9, 2010 - link

    While the 815 is great value where the host is CPU bound, most VM workloads seem to be limited by memory rather than processing power. Another consideration is server (in particular memory) longevity, which is where the 810 inherits the 910's RAS features while the 815 misses out.

    I am not disagreeing with your conclusion that the 815 is great value, but only if your workload is CPU bound and you are willing to take the risk of not having RAS features in a data center application.
  • JFAMD - Thursday, September 9, 2010 - link

    True that there is a RAS difference, but you do have to weigh the budget and power differences to determine whether the RAS levels of the R815 (or even a Xeon 5600 system) are sufficient for your application. Keep in mind that the Xeon 7400 series did not have these RAS features, so if you were comfortable with the RAS levels of the 7400 series for these apps, then you have to question whether the new RAS features are a "must have". I am not saying that people shouldn't want more RAS (everyone should), but it is more a question of whether it is worth paying the extra price up front and the extra price every hour at the wall socket.

    For virtualization, the last time I talked to the VM vendors about attach rate, they said that their attach rate to platform matched the market (i.e. ~75% of their software was landing on 2P systems). So in the case of virtualization you can move to the R815 and still enjoy the economics of the 2P world but get the scalability of the 4P products.
  • tech6 - Thursday, September 9, 2010 - link

    I don't disagree, but the RAS issue also dictates the longevity of the platform. I have been in the hosting business for a while, and we see memory errors bring down 2+ year old HP blades in alarming numbers. If you budget for a 4-year life cycle, then RAS has to be high on your list of features to make that happen.
  • mino - Thursday, September 9, 2010 - link

    Generally I would agree, except that 2-year-old HP blades (G5) are the worst way to ascertain commodity x86 platform reliability.
    Reasons:
    1) inadequate cooling setup (you had better keep c7000 input air well below 20C at all costs)
    2) FBDIMMs love to overheat
    3) G5 blade mobos are a BIG MESS when it comes to memory compatibility => they clearly underestimated the tolerances needed
    4) All the points above hold true at least compared to the HS21*, and except for 1) also against the bl465*

    Speaking from about 3 years of operating all three boxen in similar conditions. This became most clear to us when the building power got cut off and all our BladeSystems died within minutes (well before running out of UPS), while our 5-year-old BladeCenter (hosting all infrastructure services) remained online even at 35C (where the temperature plateaued thanks to the dead HPs).
    Ironically, thanks to the dead production systems we did not have to shut down the infrastructure at all, as the UPSes easily lasted the 3 hours needed ...
