vApus Mark II

vApus Mark II is our newest benchmark suite that tests how well servers cope with virtualizing "heavy duty" applications. We explained the benchmark methodology here.

vApus Mark II score - VMware ESX 4.1
* 2 tiles instead of 4 tiles test
** 128GB instead of 64GB

Before we can even start analyzing these numbers, we must elaborate about some benchmark nuances. We had to test several platforms in two different setups to make sure the comparison was as fair as possible. First, let's look at the Xeon 7560.

The Xeon 7560 has two memory controllers, and each controller has two serial memory interfaces (SMIs). Each SMI connects to two memory buffers, and each buffer needs two DIMMs. Each CPU thus needs eight DIMMs to achieve maximum bandwidth, so our quad Xeon X7560 needs 32 DIMMs. However, we also want to do a performance/watt comparison of these servers, so we decided to test with 16 DIMMs (64GB) in all servers. With only 16 DIMMs populated, bandwidth goes down from 58GB/s to 38GB/s, and bandwidth has a tangible impact in a virtualized environment. Therefore, we tested with both 128GB and 64GB: the 128GB number represents the best performance of the quad Xeon 7560, while the 64GB number allows us to determine performance/watt.

Next, the dual Opteron and dual Xeon numbers. We tested with both 2-tile and 4-tile virtualization scenarios. With 2 tiles we demand 36 virtual CPUs, which is more than enough to stress the dual socket servers. As these dual socket servers will be limited by memory space, we feel that the 2-tile numbers are more representative. By comparing the 2-tile numbers with the 4-tile numbers, we take into account that the quad socket systems will be able to leverage their higher number of DIMM slots. So comparing the 2-tile (dual socket) numbers with the 4-tile (quad socket) numbers is closest to the real world. However, if you feel that keeping the load the same is more important, we have added the 4-tile numbers as well. The 4-tile runs result in slightly higher scores for the dual socket systems, which is similar to how high VMmark scores are achieved. But if you look at the table below, you'll see that there is another reason why this is not the best way to benchmark:

The 4-tile benchmark achieves higher throughput, but the individual tiles perform very badly. Remember that our reference scores (100%) are based on the quad-core Xeon 5570 at 2.93GHz. You can see that the 4-tile benchmark runs achieve only 13% (Opteron) or 11% (Xeon) of a quad-core Xeon 5570 on the Oracle OLTP test. That means the OLTP VM gets less than a 1.5GHz Xeon 5570 (half a Xeon 5570). In the 2-tile test, the OLTP VM gets the performance of a full Xeon 5570 core (in the case of AMD, probably 1.5 Opteron "Istanbul" cores).
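The per-VM arithmetic works out roughly as follows (a sketch: the 13% and 11% scores and the 2.93GHz reference clock come from the text above; the 25% figure for the 2-tile case is our assumption, inferred from "a full Xeon 5570 core"):

```python
ref_clock_ghz = 2.93   # quad-core Xeon 5570, the 100% reference
ref_cores = 4

def equivalent_ghz(score):
    """Aggregate clock speed the VM effectively receives, relative to the reference."""
    return score * ref_cores * ref_clock_ghz

print(round(equivalent_ghz(0.13), 2))  # Opteron, 4 tiles: 1.52 GHz, i.e. < a 1.5GHz half-5570
print(round(equivalent_ghz(0.11), 2))  # Xeon, 4 tiles: 1.29 GHz
print(round(equivalent_ghz(0.25), 2))  # 2 tiles at ~25% (assumed): 2.93 GHz, one full 5570 core
```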

In the real world, getting much more throughput at the expense of the response times of individual applications is acceptable for applications such as underutilized file servers and authentication servers (an Active Directory server might only see a spike at 9 AM). But vApus has always had the objective of measuring the performance of virtualized performance-critical applications such as important web services, OLAP, and OLTP databases. Since performance matters here, we feel that the individual response time of the VMs is more important than pure throughput. For our further performance analysis we will use the 2-tile numbers of the dual Xeon and dual Opteron.

The quad Xeon has a 15% advantage over the quad Magny-Cours. In our last article, we noted that the quad Xeon 7560 might make sense even to people who don't feel that RAS is their top priority: the performance advantage over the dual socket servers was compelling enough to consider buying a few quad Xeons instead of two to three times as many dual Xeons. However, the Dell R815 and the 48 AMD cores inside it block the way downward for the quad Intel platform. The price/performance of the Opteron platform is extremely attractive: you can almost buy two Dell R815s for the price of one quad Xeon server, and you get 85% of the performance.

The performance advantage over the dual Xeon X5670 is almost 80% for a price premium of about 30%. You would need about twice as many dual-socket Intel servers, so this is excellent value. Only power consumption can spoil AMD's value party; we'll look into that later in this article.

Although the quad Opteron 6136 may not enjoy the same fame as its twelve-core 6174 sibling, it is worth checking out. A Dell R815 equipped with four Opteron 6136s and 128GB costs about $12,000. Compared to the dual Xeon X5670 with 128GB, you save about $1000 and get essentially 40% more performance for free. Not bad at all. But won't that $1000 dissipate in the heat of extra power? Let's find out!
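To make the value comparison concrete, here is a small sketch of the performance-per-dollar ratios implied by the numbers above (the prices are the ballpark figures from the text, not exact quotes):

```python
def value_ratio(perf_advantage, price_premium):
    """Performance per dollar relative to the baseline system (1.0 = equal value)."""
    return (1 + perf_advantage) / (1 + price_premium)

# Quad Opteron (R815) vs dual Xeon X5670: ~80% faster at a ~30% price premium.
print(round(value_ratio(0.80, 0.30), 2))  # 1.38 -- ~1.4x the performance per dollar

# Quad Opteron (R815) vs quad Xeon 7560: 85% of the performance at roughly
# half the price (nearly two R815s for the price of one quad Xeon server).
print(round(0.85 / 0.5, 2))  # 1.7
```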

VMmark Power Extremes: Idle and Full Load
51 Comments

  • cgaspar - Friday, September 10, 2010 - link

    The word you're looking for is "authentication". Is a simple spell check so much to ask?
  • JohanAnandtech - Friday, September 10, 2010 - link

    Fixed.
  • ESetter - Friday, September 10, 2010 - link

    Great article. I suggest including some HPC benchmarks other than STREAM. For instance, DGEMM performance would be interesting (using MKL and ACML for the Intel and AMD platforms, respectively).
  • mattshwink - Friday, September 10, 2010 - link

    One thing I would like to point out is that most of the customers I work with use VMware in an enterprise scenario. Failover/HA is usually a large issue, so we usually create (or at least recommend) VMware clusters with 2 or 3 nodes. Each node is then limited to roughly 40% usage (memory/CPU) so that if a failure occurs there is minimal or zero service disruption. In other words, we usually don't run highly loaded ESX hosts, and the 40% load numbers are the most interesting. Good article and lots to think about when deploying these systems....
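The headroom rule of thumb behind this comment can be sketched as an N+1 failover calculation (a simplification; the ~40% figure in the comment adds extra safety margin on top of the theoretical ceiling):

```python
def max_utilization(nodes, failed=1):
    """Highest per-host load that still fits on the survivors after `failed` hosts die."""
    return (nodes - failed) / nodes

print(round(max_utilization(2), 2))  # 0.5  -- 2-node cluster: stay below 50% per host
print(round(max_utilization(3), 2))  # 0.67 -- 3-node cluster: stay below ~67% per host
```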
  • lorribot - Friday, September 10, 2010 - link

    It would be nice to see some comparisons of blade systems in a similar vein to this article.

    Also, you say that one system is better at, say, databases while the other is better at VMware; what if you are running, say, a SQL database on a VMware platform? Which one would be best for that? How much does the application you are running in the VM affect the comparative performance figures you produce?
  • spinning rust - Saturday, September 11, 2010 - link

    is it really a question, anyone who has used both DRAC and ILO knows who wins. everyone at my current company has a tear come to their eyes when we remember ILO. over 4 years of supporting Proliants vs 1 year of Dell, i've had more hw problems with Dell. i've never before seen firmware brick a server, but they did it with a 2850, the answer, new motherboard. yay!
  • pablo906 - Saturday, September 11, 2010 - link

    This article should be renamed servers clash, finding alternatives to the Intel architecture. Yes it's slightly overpriced but it's extremely well put together. Only in the last few months has the 12c Opteron become an option. It's surprising you can build Dell 815's with four 71xx series and 10GB Nics for under a down payment on a house. This was not the case recently. It's a good article but it's clearly aimed to show that you can have great AMD alternatives for a bit more. The most interesting part of the article was how well AMD competed against a much more expensive 7500 series Xeon server. I enjoyed the article it was informative but the showdown style format was simply wrong for the content. Servers aren't commodity computers like desktops. They are aimed at a different type of user and I don't think that showdowns of vastly dissimilar hardware, from different price points and performance points, serve to inform IT Pros of anything they didn't already know. Spend more money for more power and spend it wisely......
  • echtogammut - Saturday, September 11, 2010 - link

    First off, I am glad that Anandtech is reviewing server systems, however I came away with more questions than answers after reading this article.

    First off, please test comparable systems. Your system specs were all over the board, and there were way too many variables that can affect performance for any relevant data to be extracted from your tests.

    Second, HP, SGI and Dell will configure your system to spec (i.e. use 4GB DIMMs, drives, etcetera) if you call them. However, it should be noted that HP memory must be replaced with HP memory, which is important when making a purchase. HP puts a "thermal sensor" on their DIMMs that forces you to buy their overpriced memory (also the reason they will use 1GB DIMMs unless you spec otherwise).

    Third, if this is going to be a comparison between three manufacturers' offerings, compare those offerings. I came away feeling I should buy an IBM system (which wasn't even "reviewed").

    Lastly, read the critiques others have written here; most are very valid.
  • JohanAnandtech - Monday, September 13, 2010 - link

    "First off, please test comparable systems."

    I cannot agree with this. I have noticed too many times that sysadmins make the decision to go for a certain system too early, relying too much on past experience. The choice for a "quad socket rack" or "dual socket blade" should not be made because you are used to dealing with these servers or because your partner pushes you in that direction.

    Just imagine that the quad Xeon 7500 had done very well in the power department. Too many people would never consider it because they are not used to buying higher-end systems. So they would populate a rack full of blades and lose the RAS, scalability, and performance advantages.

    I am not saying that this gut feeling is wrong most of the time, but I am advocating keeping an open mind. So the comparison of very different servers that can all do the job is definitely relevant.
  • pablo906 - Saturday, September 11, 2010 - link

    These VMware benchmarks are worthless. I've been digesting this for a long, long time and just had a light bulb moment when re-reading the review. You run highly loaded hypervisors. No one does this in the enterprise space. To make sure I'm not crazy, I just called several other IT folks who work in large environments (read: 500+ users minimum, most in the thousands) and they all run at <50% load on each server to allow for failure. I personally run my servers at 60% load and prefer running more servers to distribute I/O rather than fewer servers to consolidate heavily. With 3-5 servers I can really fine-tune the storage subsystem to remove I/O bottlenecks from both the interface and the disk subsystem.

    I understand that testing server hardware is difficult, especially from a virtualization standpoint, and I can't readily offer up better solutions to what you're trying to accomplish. All I can say is that there need to be more hypervisors tested, and some thought about workloads would go a long way. Testing a standard business-on-Windows setup would be informative: an SQL Server, an Exchange server, a SharePoint server, two DCs, and 100 users. I think every server I've ever seen tested here is complete overkill for that workload, but it's an extremely common workload. A remote environment such as TS or Citrix is another very common use of virtualization. The OS craps out long before the hardware does when running many users concurrently in a remote environment. Spinning up many relatively weak VMs is perfect for this kind of workload.

    High-performance Oracle environments aren't exactly what's being virtualized in the server world, yet that's one of your premier benchmarks. I've never seen a production high-load Oracle environment that wasn't running on some kind of physical cluster with fancy storage. Just my 2 cents.
