vApus Mark II

vApus Mark II is our newest benchmark suite, which tests how well servers cope with virtualizing "heavy duty" applications. We explained the benchmark methodology here.

[Chart: vApus Mark II score - VMware ESX 4.1]
* tested with 2 tiles instead of 4
** tested with 128GB instead of 64GB

Before we can even start analyzing these numbers, we must elaborate on a few benchmark nuances. We had to test several platforms in two different setups to make sure the comparison was as fair as possible. First, let's look at the Xeon 7560.

The Xeon 7560 has two memory controllers, and each controller has two serial memory interfaces (SMIs). Each SMI connects to a memory buffer, and each buffer needs two DIMMs. Each CPU thus needs eight DIMMs to achieve maximum bandwidth, so our quad Xeon X7560 needs 32 DIMMs. However, we also want to do a performance/watt comparison of these servers, so we decided to test with 16 DIMMs (64GB) in all servers. With only 16 DIMMs, and thus half the memory channels populated, bandwidth goes down from 58GB/s to 38GB/s, and bandwidth has a tangible impact in a virtualized environment. Therefore, we tested the quad Xeon with both 128GB and 64GB: the 128GB number represents the best performance of the quad Xeon 7560, while the 64GB number will allow us to determine performance/watt.
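To make the arithmetic concrete, here is a minimal sketch (plain Python, our own illustration; the constants simply encode the topology described above):

```python
# Illustrative sketch of the Xeon 7560 memory topology described above.
CONTROLLERS_PER_CPU = 2   # integrated memory controllers per socket
SMIS_PER_CONTROLLER = 2   # serial memory interfaces per controller
BUFFERS_PER_SMI = 1       # each SMI feeds one memory buffer
DIMMS_PER_BUFFER = 2      # DIMMs per buffer for maximum bandwidth

def dimms_for_max_bandwidth(cpus: int) -> int:
    """Minimum DIMM count to reach full memory bandwidth on `cpus` sockets."""
    return (cpus * CONTROLLERS_PER_CPU * SMIS_PER_CONTROLLER
            * BUFFERS_PER_SMI * DIMMS_PER_BUFFER)

print(dimms_for_max_bandwidth(1))   # 8:  DIMMs per CPU
print(dimms_for_max_bandwidth(4))   # 32: DIMMs for the quad Xeon X7560

# With only 16 of 32 DIMMs populated, measured bandwidth fell
# from ~58GB/s to ~38GB/s:
print(f"{1 - 38 / 58:.0%} bandwidth penalty")   # 34% bandwidth penalty
```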

Next, the dual Opteron and dual Xeon numbers. We tested with both 2-tile and 4-tile virtualization scenarios. With two tiles, we demand 36 virtual CPUs, which is more than enough to stress the dual socket servers. As these dual socket servers will be limited by memory space anyway, we feel that the 2-tile numbers are more representative. By comparing the 2-tile numbers with the 4-tile numbers, we take into account that the quad socket systems can leverage their higher number of DIMM slots. So comparing the 2-tile (dual socket) numbers with the 4-tile (quad socket) numbers is closest to the real world. However, if you feel that keeping the load identical is more important, we have added the 4-tile numbers as well. The 4-tile runs result in slightly higher scores for the dual socket systems, similar to how high VMmark scores are achieved. But if you look at the table below, you'll see that there is another reason why this is not the best way to benchmark:

The 4-tile benchmark achieves higher throughput, but the individual tiles perform very poorly. Remember that our reference scores (100%) are based on a quad-core Xeon 5570 at 2.93GHz. You can see that the 4-tile runs achieve only 13% (Opteron) or 11% (Xeon) of that quad Xeon 5500 reference on the Oracle OLTP test. That means the OLTP VM gets less than half a Xeon 5570 core, i.e. less than a 1.5GHz Xeon 5570. In the 2-tile test, the OLTP VM gets the performance of a full Xeon 5570 core (in the case of AMD, probably 1.5 Opteron "Istanbul" cores).
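As a quick sanity check, this small sketch (ours, not part of the vApus tooling) converts the relative scores above into "Xeon 5570 core equivalents", using the reference system defined above:

```python
# Convert a relative vApus score into "Xeon 5570 core equivalents".
# Purely illustrative; the 100% reference is a quad-core Xeon 5570 at 2.93GHz.
REF_CORES = 4
REF_CLOCK_GHZ = 2.93

def core_equivalents(score_pct: float) -> float:
    """How many Xeon 5570 cores' worth of performance the VM received."""
    return score_pct / 100 * REF_CORES

def ghz_equivalent(score_pct: float) -> float:
    """The same budget expressed as GHz of a single Xeon 5570 core."""
    return core_equivalents(score_pct) * REF_CLOCK_GHZ

# 4-tile OLTP scores quoted above: 13% (Opteron), 11% (Xeon)
for platform, score in [("Opteron, 4 tiles", 13), ("Xeon, 4 tiles", 11)]:
    print(f"{platform}: {core_equivalents(score):.2f} cores, "
          f"~{ghz_equivalent(score):.2f}GHz")
# Opteron, 4 tiles: 0.52 cores, ~1.52GHz
# Xeon, 4 tiles: 0.44 cores, ~1.29GHz
```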

In the real world, getting much more throughput at the expense of the response times of individual applications is acceptable for applications such as underutilized file servers and authentication servers (an Active Directory server might only see a spike at 9 AM). But vApus has always had the objective of measuring the performance of virtualized, performance-critical applications such as important web services, OLAP, and OLTP databases. Since performance matters here, we feel that the individual response time of the VMs is more important than pure throughput. For the rest of our performance analysis, we will use the 2-tile numbers for the dual Xeon and dual Opteron.

The quad Xeon has a 15% advantage over the quad Magny-Cours. In our last article, we noted that the quad Xeon 7560 might make sense even to people who don't feel that RAS is their top priority: the performance advantage over the dual socket servers was compelling enough to consider buying a few quad Xeons instead of two to three times as many dual Xeons. However, the Dell R815 and the 48 AMD cores inside it block the way down for the quad Intel platform. The price/performance of the Opteron platform is extremely attractive: you can almost buy two Dell R815s for the price of one quad Xeon server, and you still get 85% of the performance.

The performance advantage over the dual Xeon X5670 is almost 80% for a price premium of about 30%. You would need almost twice as many dual Intel servers to match the R815, so this is excellent value. Only power can spoil AMD's value party; we'll look into that later in this article.
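To put rough numbers on this, here is a small performance-per-dollar sketch. Only the ratios come from the text; the absolute prices are illustrative assumptions, anchored to the ~$12,000 R815 configuration discussed below:

```python
# Rough perf-per-dollar comparison. Ratios are from the article;
# absolute prices are assumptions for illustration only.
dual_x5670_price = 13_000                  # assumed: ~$1000 above a $12k R815
r815_6174_price = dual_x5670_price * 1.3   # "a price premium of about 30%"
quad_x7560_price = r815_6174_price * 2     # "almost two R815s for one quad Xeon"

dual_x5670_perf = 1.0                      # normalized to the dual Xeon X5670
r815_6174_perf = 1.8                       # "almost 80%" faster than the X5670
quad_x7560_perf = r815_6174_perf / 0.85    # the R815 delivers 85% of the quad

for name, perf, price in [
        ("Dual Xeon X5670", dual_x5670_perf, dual_x5670_price),
        ("Dell R815 (quad 6174)", r815_6174_perf, r815_6174_price),
        ("Quad Xeon X7560", quad_x7560_perf, quad_x7560_price)]:
    print(f"{name:<22} {perf / (price / 1_000):.3f} perf per $1000")
```

Under these assumptions, the R815 delivers roughly 40% more performance per dollar than the dual Xeon X5670, and about 70% more than the quad Xeon X7560.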

Although the quad Opteron 6136 may not enjoy the same fame as its twelve-core 6174 sibling, it is worth checking out. A Dell R815 equipped with four 6136 Opterons and 128GB costs about $12000. Compared to the dual Xeon 5670 with 128GB, you save about $1000 and get essentially 40% more performance for free. Not bad at all. But won’t that $1000 dissipate in the heat of extra power? Let us find out!

Comments

  • jdavenport608 - Thursday, September 9, 2010

    Appears that the pros and cons on the last page are not correct for the SGI server.
  • Photubias - Thursday, September 9, 2010

    If you view the article in 'Print Format' then it shows correctly.
    Seems to be an Anandtech issue ... :p
  • Ryan Smith - Thursday, September 9, 2010

    Fixed. Thanks for the notice.
  • yyrkoon - Friday, September 10, 2010

    Hey guys, you've got to do better than this. The only thing that drew me to this article was the name "SGI", and your explanation of their system is nothing.

    Why not just come out and say: "Hey, look what I've got pictures of". That's about all the use I have for the "article". Sorry if you do not like that Johan, but the truth hurts.
  • JohanAnandtech - Friday, September 10, 2010

    It is clear that we do not focus on the typical SGI market, but you may have noticed that from the other competitors: HPC is not our main expertise, virtualization is. It is not really clear what your complaint is, so I assume it is the lack of HPC benchmarks. Care to make your complaint a little more constructive?
  • davegraham - Monday, September 13, 2010

    I'll defend Johan here... SGI has basically cornered themselves into the cloud-scale marketplace, where their BTO style of engagement has really allowed them to prosper. If you wanted a competitive story there, the Dell DCS series of servers (the C6100, for example) would be a better comparison.

    cheers,

    Dave
  • tech6 - Thursday, September 9, 2010

    While the R815 is great value where the host is CPU bound, most VM workloads seem to be limited by memory rather than processing power. Another consideration is server (in particular memory) longevity, which is where the R810 inherits the R910's RAS features while the R815 misses out.

    I am not disagreeing with your conclusion that the R815 is great value, but only if your workload is CPU bound and you are willing to take the risk of not having RAS features in a data center application.
  • JFAMD - Thursday, September 9, 2010

    True that there is a RAS difference, but you have to weigh the budget and power differences to determine whether the RAS level of the R815 (or even a Xeon 5600 system) is sufficient for your application. Keep in mind that the Xeon 7400 series did not have these RAS features, so if you were comfortable with the RAS level of the 7400 series for these apps, you have to question whether the new RAS features are a "must have". I am not saying that people shouldn't want more RAS (everyone should), but it is more a question of whether it is worth paying the extra price up front and the extra price every hour at the wall socket.

    For virtualization, the last time I talked to the VM vendors about attach rates, they said that their attach rate per platform matched the market (i.e. ~75% of their software was landing on 2P systems). So in the case of virtualization you can move to the R815 and still enjoy the economics of the 2P world, but get the scalability of the 4P products.
  • tech6 - Thursday, September 9, 2010

    I don't disagree, but the RAS issue also dictates the longevity of the platform. I have been in the hosting business for a while, and we see memory errors bring down 2+ year old HP blades in alarming numbers. If you budget for a 4-year life cycle, then RAS has to be high on your list of features to make that happen.
  • mino - Thursday, September 9, 2010

    Generally I would agree, except that 2-year-old HP blades (G5) are the worst way to ascertain commodity x86 platform reliability.
    Reasons:
    1) inadequate cooling setup (you better keep c7000 input air well below 20C at all costs)
    2) FB-DIMMs love to overheat
    3) G5 blade mobos are a BIG MESS when it comes to memory compatibility => they clearly underestimated the tolerances needed

    4) All the points above hold true at least compared to the HS21*, and except for 1) also against the bl465*

    Speaking from 3 years of operating all three boxen in similar conditions. This became most clear to us when the building power got cut off and all our BladeSystems died within minutes (nowhere near running out of UPS), while our 5-year-old BladeCenter (hosting all infrastructure services) remained online even at 35C (where the temperature plateaued thanks to the dead HPs).
    Ironically, thanks to the dead production we did not have to kill the infrastructure at all, as the UPSes easily lasted the 3 hours needed ...
