vApus Mark II

vApus Mark II is our newest benchmark suite that tests how well servers cope with virtualizing "heavy-duty applications". We explained the benchmark methodology here.

[Chart: vApus Mark II score - VMware ESX 4.1]
* 2-tile instead of 4-tile test
** 128GB instead of 64GB

Before we can even start analyzing these numbers, we need to explain a few benchmark nuances. We had to test several platforms in two different setups to make the comparison as fair as possible. First, let's look at the Xeon 7560.

The Xeon 7560 has two memory controllers, and each controller has two serial memory interfaces (SMIs). Each SMI connects to two memory buffers, and each buffer needs two DIMMs. Each CPU thus needs eight DIMMs to achieve maximum bandwidth, so our quad Xeon X7560 needs 32 DIMMs. However, we also want to do a performance/watt comparison of these servers, so we decided to test with 16 DIMMs (64GB) in all servers. With only 16 DIMMs, bandwidth drops from 58GB/s to 38GB/s, and bandwidth has a tangible impact in a virtualized environment. Therefore, we tested with both 128GB and 64GB: the 128GB number represents the best performance of the quad Xeon 7560, while the 64GB number lets us determine performance/watt.
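For those who want the DIMM math spelled out, here is a minimal sketch. The bandwidth figures are the ones quoted above; the 4GB DIMM size is an assumption inferred from the 16 DIMMs = 64GB and 32 DIMMs = 128GB configurations.

```python
# Minimal sketch of the memory configurations discussed above.
# Figures come from the text; the 4GB DIMM size is an assumption
# (16 DIMMs = 64GB, 32 DIMMs = 128GB).

SOCKETS = 4
DIMMS_PER_CPU_FOR_PEAK = 8           # per the text: 8 DIMMs per X7560 for maximum bandwidth
DIMM_SIZE_GB = 4

peak_dimms = SOCKETS * DIMMS_PER_CPU_FOR_PEAK   # 32 DIMMs for the quad Xeon X7560

configs = {
    "best performance (128GB)": {"dimms": 32, "bandwidth_gb_s": 58},
    "perf/watt parity (64GB)":  {"dimms": 16, "bandwidth_gb_s": 38},
}

for name, cfg in configs.items():
    capacity = cfg["dimms"] * DIMM_SIZE_GB
    print(f"{name}: {cfg['dimms']}/{peak_dimms} DIMMs, {capacity}GB, ~{cfg['bandwidth_gb_s']}GB/s")
```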

Next, the dual Opteron and dual Xeon numbers. We tested with both 2-tile and 4-tile virtualization scenarios. With 2 tiles we demand 36 virtual CPUs, which is more than enough to stress the dual socket servers. As these dual socket servers will be limited by memory capacity, we feel that the 2-tile numbers are more representative. By comparing the 2-tile numbers with the 4-tile numbers, we take into account that the quad socket systems can leverage their higher number of DIMM slots. So comparing the 2-tile (dual socket) results with the 4-tile (quad socket) results is closest to the real world. However, if you feel that keeping the load identical is more important, we added the 4-tile numbers as well. The 4-tile runs result in slightly higher scores for the dual socket systems, which is similar to how high VMmark scores are achieved. But if you look at the table below, you'll see that there is another reason why this is not the best way to benchmark:

The four-tile benchmark achieves higher throughput, but the individual tiles perform very badly. If you remember, our reference scores (100%) are based on the quad-core Xeon 5570 at 2.93GHz. You can see that the 4-tile runs achieve only 13% (Opteron) or 11% (Xeon) of the quad-core Xeon 5500 reference on the Oracle OLTP test. That means the OLTP VM gets less than a 1.5GHz Xeon 5570 (half a Xeon 5570). In the 2-tile test, the OLTP VM gets the performance of a full Xeon 5570 core (in the case of AMD, probably 1.5 Opteron "Istanbul" cores).
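One way to read those per-tile percentages is sketched below, under the assumption that the 100% reference is the full 2.93GHz quad-core Xeon 5570 and using the OLTP figures quoted above.

```python
# Back-of-the-envelope reading of the per-tile OLTP scores above.
# Assumption: 100% = the full 2.93GHz quad-core Xeon 5570 reference.

REF_CORES = 4
REF_CLOCK_GHZ = 2.93

def core_equivalents(score_fraction: float) -> float:
    """Translate a per-tile score into 2.93GHz Xeon 5570 core equivalents."""
    return score_fraction * REF_CORES

for label, score in [("4-tile Opteron OLTP", 0.13), ("4-tile Xeon OLTP", 0.11)]:
    cores = core_equivalents(score)
    print(f"{label}: ~{cores:.2f} cores (~{cores * REF_CLOCK_GHZ:.1f}GHz single-core equivalent)")

# 13% -> ~0.5 cores (~1.5GHz) and 11% -> ~0.44 cores (~1.3GHz): less than half a
# Xeon 5570 per OLTP VM, which is why the 4-tile response times are hard to accept.
```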

In the real world, getting much more throughput at the expense of the response times of individual applications is acceptable for applications such as underutilized file servers and authentication servers (an Active Directory server might only see a spike at 9 AM). But vApus has always had the objective of measuring the performance of virtualized, performance-critical applications such as important web services, OLAP, and OLTP databases. Since performance matters here, we feel that the individual response time of the VMs is more important than pure throughput. For our further performance analysis we will use the 2-tile numbers for the dual Xeon and dual Opteron.

The quad Xeon has a 15% advantage over the quad Magny-Cours. In our last article, we noted that the quad Xeon 7560 might make sense even to people who don't consider RAS their top priority: the performance advantage over the dual socket servers was compelling enough to consider buying a few quad Xeons instead of two to three times as many dual Xeons. However, the Dell R815 and the 48 AMD cores inside it block the quad Intel platform's path down-market. The price/performance of the Opteron platform is extremely attractive: you can almost buy two Dell R815s for the price of a quad Xeon server, and you get 85% of the performance.

The performance advantage over the dual Xeon X5670 is almost 80% for a price premium of about 30%. You would need about twice as many dual Intel servers, so this is excellent value. Only power consumption can spoil AMD's value party; we'll look into that later in this article.

Although the quad Opteron 6136 may not enjoy the same fame as its twelve-core 6174 sibling, it is worth checking out. A Dell R815 equipped with four 6136 Opterons and 128GB costs about $12000. Compared to the dual Xeon 5670 with 128GB, you save about $1000 and get essentially 40% more performance for free. Not bad at all. But won’t that $1000 dissipate in the heat of extra power? Let us find out!
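To make the value argument concrete, here is a minimal sketch that turns the ballpark prices and relative scores quoted above into performance-per-dollar ratios. Exact configurations and street prices will vary, so treat these inputs as approximations rather than quotes.

```python
# Hedged performance-per-dollar comparison using the approximate figures above.
# Baseline: the dual Xeon X5670 system = 1.0 for both price and performance.

systems = {
    "Dual Xeon X5670 (128GB)":        {"price": 1.00,    "perf": 1.00},
    "Quad Opteron 6136 R815 (128GB)": {"price": 12 / 13, "perf": 1.40},  # ~$12k vs ~$13k, +40% perf
    "Quad Opteron 6174 R815":         {"price": 1.30,    "perf": 1.80},  # ~30% premium, almost +80% perf
}

for name, s in systems.items():
    print(f"{name}: perf/price = {s['perf'] / s['price']:.2f}x the dual Xeon baseline")
```

The quad Xeon 7560 is left out of this little table because the text only pins down its relative position (roughly the price of two R815s for about 18% more performance than one), but that relation is exactly what makes the R815 hard to ignore.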

Comments

  • pablo906 - Saturday, September 11, 2010 - link

    High performance Oracle environments are exactly what's being virtualized in the Server world yet it's one of your premier benchmarks.

    /edit should read

    High performance Oracle environments are exactly what's not being virtualized in the Server world yet it's one of your premier benchmarks.
  • JohanAnandtech - Monday, September 13, 2010 - link

    "You run highly loaded Hypervisors. NOONE does this in the Enterprise space."

    I agree. Isn't that what I am saying on page 12:

    "In the real world you do not run your virtualized servers at their maximum just to measure the potential performance. Neither do they run idle."

    The only reason we run with highly loaded hypervisors is to measure the peak throughput of the platform, like VMmark does. We know that is not real world and does not give you the complete picture. That is exactly why pages 12 and 13 of this article exist. Did you miss those?
  • Per Hansson - Sunday, September 12, 2010 - link

    Hi, please use a better camera for pictures of servers that cost thousands of dollars.
    In full size the pictures look terrible, way too much grain.
    The camera you use is a prime example of how far marketing has managed to take these things:
    10MP on a sensor that is 1/2.3" (6.16 x 4.62 mm, 0.28 cm²).
    A used DSLR with a decent 50mm prime lens plus a tripod really does not cost that much for a site like this.

    I love server pron pictures :D
  • dodge776 - Friday, September 17, 2010 - link

    I may be one of the many "silent" readers of your reviews Johan, but putting aside all the nasty or not-so-bright comments, I would like to commend you and the AT team for putting up such excellent reviews, and also for using industry-standard benchmarks like SAPS to measure throughput of the x86 servers.

    Great work and looking forward to more of these types of reviews!
  • lonnys - Monday, September 20, 2010 - link

    Johan -
    You note for the R815:
    Make sure you populate at least 32 DIMMs, as bandwidth takes a dive at lower DIMM counts.
    Could you elaborate on this? We have an R815 with 16x2GB and are not seeing the expected performance for our very CPU-intensive app; perhaps adding another 16x2GB might help?
  • JohanAnandtech - Tuesday, September 21, 2010 - link

    This comment you quoted was written in the summary of the quad Xeon box.

    16 DIMMs is enough for the R815 on the condition that you have one DIMM in each channel. Maybe you are placing the DIMMs wrongly? (Two DIMMs in one channel, zero DIMMs in the other?)
  • anon1234 - Sunday, October 24, 2010 - link

    I've been looking around for some results comparing maxed-out servers but I am not finding any.

    The Xeon 5600 platform clocks the memory down to 800MHz whenever 3 dimms per channel are used, and I believe in some/all cases the full 1066/1333MHz speed (depends on model) is only available when 1 dimm per channel is used. This could be huge compared with an AMD 6100 solution at 1333MHz all the time, or a Xeon 7560 system at 1066 all the time (although some vendors clock down to 978MHz with some systems - IBM HX5 for example). I don't know if this makes a real-world difference on typical virtualization workloads, but it's hard to say because the reviewers rarely try it.

    It does make me wonder about your 15-dimm 5600 system, 3 dimms per channel @800MHz on one processor with 2 DPC @ full speed on the other. Would it have done even better with a balanced memory config?

    I realize you're trying to compare like to like, but if you're going to present price/performance and power/performance ratios you might want to consider how these numbers are affected if I have to use slower 16GB dimms to get the memory density I want, or if I have to buy 2x as many VMware licenses or Windows Datacenter processor licenses because I've purchased 2x as many 5600-series machines.
  • nightowl - Tuesday, March 29, 2011 - link

    The previous post is correct in that the Xeon 5600 memory configuration is flawed. You are running the processor in a degraded state due to the unbalanced memory configuration as well as the differing memory speeds.

    The Xeon 5600 processors can run at 1333MHz (with the correct DIMMs) with up to 4 ranks per channel. Going above this results in the memory speed clocking down to 800MHz which does result in a performance drop to the applications being run.
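The DIMM-speed behavior described in the two comments above can be summarized in a small sketch; the speeds are the ones quoted in the comments, and the exact limits depend on the CPU SKU, DIMM type, and rank count, so this is illustrative only.

```python
# Rough summary of the Xeon 5600 memory-speed behavior described in the
# comments above; exact limits depend on CPU SKU, DIMM type, and rank count.

def xeon5600_mem_speed_mhz(dimms_per_channel: int, ranks_per_channel: int) -> int:
    """Approximate DDR3 speed for a given channel population (illustrative only)."""
    if dimms_per_channel >= 3 or ranks_per_channel > 4:
        return 800      # heavy population forces the memory bus down to 800MHz
    return 1333         # up to 2 DPC / 4 ranks can keep full speed on 1333MHz-capable SKUs

# The mixed config questioned above: 3 DPC on one socket, 2 DPC on the other
print(xeon5600_mem_speed_mhz(3, 6))  # -> 800
print(xeon5600_mem_speed_mhz(2, 4))  # -> 1333
```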
  • markabs - Friday, June 8, 2012 - link

    Hi there,

    I know this is an old post but I'm looking at putting 4 SSDs in a Dell poweredge and had a question for you.

    What raid card did you use with the above setup?

    Currently a new Dell PowerEdge R510 comes with a PERC H700 RAID card with 1GB cache, and this is connected to a hot-swap chassis. Dell want £1500 per SSD (crazy!) so I'm looking to buy 4 Intel 520s and set them up in RAID 10.

    I just wanted to know what RAID card you used, whether you had any trouble with it, and what RAID setup you used?

    many thanks.

    Mark
  • ian182 - Thursday, June 28, 2012 - link

    I recently bought a G7 from www.itinstock.com and if I am honest it is perfect for my needs. I don't see the point in the higher-end ones when it works out a lot cheaper to buy the parts you need and add them to the G7.
