Stress Testing the High End

Our previous vApus Mark I gave an idea of how well systems perform when running several virtualized “heavy duty applications”: complex, network-bandwidth-gobbling web servers, large OLAP databases, and write-intensive OLTP databases. Our benchmark was mostly based on vApus, a software client that fires off requests as if real users were stressing the server. Several client machines each run a vApus “slave” instance, and a “master” vApus instance manages them (for example, by starting tests in sync) and collects the end results.
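To make the idea of a coordinated load generator concrete, here is a minimal Python sketch. It is not vApus code: the names `fake_request`, `run_load_test`, and `simulated_user` are hypothetical, the "request" is only a short sleep, and it assumes one thread per simulated user. A barrier plays the role of the synchronized start, and the main thread gathers the response times the way a coordinating instance would.

```python
import threading
import time
from statistics import mean


def fake_request(user_id: int) -> float:
    """Hypothetical stand-in for a real request; returns response time in seconds."""
    start = time.perf_counter()
    time.sleep(0.001)  # placeholder for network and server time
    return time.perf_counter() - start


def run_load_test(num_users: int, requests_per_user: int) -> list[float]:
    """Run one thread per simulated user and collect every response time."""
    results: list[float] = []
    results_lock = threading.Lock()
    # All user threads block here until the last one arrives (synchronized start).
    start_gate = threading.Barrier(num_users)

    def simulated_user(user_id: int) -> None:
        start_gate.wait()
        timings = [fake_request(user_id) for _ in range(requests_per_user)]
        with results_lock:  # protect the shared result list
            results.extend(timings)

    threads = [
        threading.Thread(target=simulated_user, args=(i,))
        for i in range(num_users)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    timings = run_load_test(num_users=50, requests_per_user=5)
    print(f"{len(timings)} requests, mean response time {mean(timings) * 1000:.2f} ms")
```

A real tool would replace `fake_request` with actual HTTP or SQL calls and would coordinate several machines over the network rather than threads inside one process.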

The first version of vApus had several limitations: it could simulate a maximum of about 1,500 users per client (a limit of 32-bit Windows-based software), and the number of clients that could be kept in sync was also limited. In the meantime, the core count of the servers that we test has been increasing at an almost ridiculous pace. When the first lines of vApus were written (at the end of 2006), octal-core servers were considered the high end. Only four years later, we are now looking at 64-thread and 48-core monsters. Our ambitious way of benchmarking (simulating real-world users rather than scripting benchmarks) resulted in scalability problems.

The lead developer of vApus, Dieter Vandroemme, decided to take all the lessons learned from 2.5 years of vApus development and apply them to a new vApus, built from scratch. Building on a new .NET 4.0 and 64-bit Windows foundation, and after spending a lot of time on software tuning, Dieter came up with a new vApus client that is capable of producing 10,000 threads in about 3.5 seconds; up to 15,000 threads can be active on one client. Since every simulated user needs one thread, you’ll understand why this is very cool: we can now test extremely strong servers with only one humble client. A Core i7-750 (2.66GHz) needs only 20% CPU load to sustain 15,000 “users” sending off SQL statements to the server. Our mighty 64-thread, 32-core quad Xeon X7560 at 2.26GHz was brought to its knees, as you can see below.

We were excited to see this happen: finally, we tamed the beast with 64 threads. Yes, you can easily stress out a server with HPC benchmarks such as Linpack or SpecFP, but measuring the potential of a server using popular business software is no easy feat. For example, we had to deal with severe thread contention on the client side. With several vApus instances, we are now ready to test the strongest servers, including those coming out in the next few years. We are even able to stress test complete clusters of modern servers with just a few clients.

vApus' ultimate goal is not to stress servers to their maximum; we use it mostly for measuring response time at a given workload and to test stability of applications. But of course, we could not resist the chance to use it as a benchmark too. It was time to build a new benchmark, and vApus Mark II was born.

51 Comments

  • davegraham - Wednesday, August 11, 2010 - link

    which is actually why you should be using a Cisco C460 for this type of test.

    dave
  • MySchizoBuddy - Wednesday, August 11, 2010 - link

    Is there an exact correlation between number of cores and VMs? How many VMs can a 48 core system support?

    Let's assume you want 100 systems virtualized. What's the minimum number of cores that will handle those 100 VMs?
  • dilidolo - Wednesday, August 11, 2010 - link

    Depends on how many vCPUs and how much memory you assign to each VM and how much physical memory your server has. CPU is rarely the bottleneck; memory and storage are.

    Then not all the VMs have the same workload. So no one can really answer your question.
  • davegraham - Wednesday, August 11, 2010 - link

    was going to say that a small amount of memory oversubscription is "ok" depending on the workload but you'd want that buffered with something a little more powerful than spinning disk (SSD, for example).
  • tech6 - Wednesday, August 11, 2010 - link

    The parameters for determining the optimal configuration for VMWare go well beyond just which CPU is faster. I like the AT stories about server tech but there need to be broader considerations of server features.

    1. Many applications are memory limited and not CPU bound so the memory flexibility may trump CPU power. That is why 256Gb with a dual 75xx or 6xxx series CPU in an 810 may well be the better choice than either a quad socket or dual socket 56xx configuration.

    2. Software licensing is a big part of choosing the server as it is often licensed per socket. Sometimes more cores and more memory are cheaper than more sockets.

    3. Memory reliability is another major issue. Large amounts of plain ECC memory will most likely result in problems 2-3 years after deployment. The platforms available with the 6xxx and 75xx series CPUs support memory reliability features that often make it a better choice for VM data centers.

    4. Power and density are another major issue that drives data center costs and must be given consideration when reviewing servers.
  • don_k - Wednesday, August 11, 2010 - link

    Would like to see some non-windows VM benchmarks as well as a different virtualisation application used and by extension an SQL server that does not come from microsoft. Also would like to see benchmarks on para-virtualised VMs along with full hardware virtualised VMs.

    The review as is is quite meaningless to anyone that does not run windows VMs and/or does not use VMware.

    You do have oracle on a windows VM so maybe oracle on a solaris/bsd VM as well as oracle on a linux para-virtualised guest.

    There is also no mention of how, if at all, the VMs were optimised for the workloads they are running. In particular and most importantly how are the DBs using the disks? Where is the data and where are the logs? How are the disks passed on to the VM (local file, separate partition, virtual volume, full access to one/more drives etc etc).

    Way too many variables to make any kind of an accurate conclusion in my opinion.
  • phoenix79 - Wednesday, August 11, 2010 - link

    I'm curious as to why you didn't include a quad-socket Magny-Cours system. I would have been very interested to see how it would have stacked up in this article.
  • Stuka87 - Wednesday, August 11, 2010 - link

    Ditto, I would like to see the best from each CPU maker. To really see which has the best price:performance ratio.
  • davegraham - Wednesday, August 11, 2010 - link

    if vApus II was available i could run it on my Magny-Cours.

    dave
  • JohanAnandtech - Thursday, August 12, 2010 - link

    The Dell R815 and quad MC deserve an article on their own.
