Stress Testing the High End

Our previous vApus Mark I gave an idea of how well systems perform when running several virtualized “heavy duty applications”: complex, bandwidth-gobbling web servers, large OLAP databases, and write-intensive OLTP databases. The benchmark was built mostly on vApus, a software client that fires off requests as if real users were stressing the server. Several client machines each run a vApus “slave” instance, while a “master” vApus instance manages them (for example, starting tests in sync) and collects the end results.

The first version of vApus had several limitations: it could simulate a maximum of about 1500 users per client (a limitation of 32-bit Windows software), and the number of clients that could be kept in sync was also limited. In the meantime, the core count of the servers that we test has been increasing at an almost ridiculous pace. When the first lines of vApus were written (at the end of 2006), eight-core servers were considered the high end. Only four years later we are now looking at 64-thread and 48-core monsters. Our ambitious way of benchmarking (simulating real-world users rather than scripting benchmarks) resulted in scalability problems.

The lead developer of vApus, Dieter Vandroemme, decided to take all the lessons learned from 2.5 years of vApus development and apply them to a new vApus, built from scratch. Working from a new .NET 4.0 and 64-bit Windows foundation, and spending a lot of time on software tuning, Dieter came up with a new vApus client that can create 10,000 threads in about 3.5 seconds; up to 15,000 threads can be active on one client. Since every simulated user needs one thread, you’ll understand why this is very cool: we can now test extremely strong servers with only one humble client. A Core i7-750 (2.66GHz) needs only 20% CPU load to sustain 15,000 “users” sending off SQL statements to the server. Our mighty 64-thread, 32-core quad Xeon X7560 at 2.26GHz was brought to its knees, as you can see below.
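To make the "one thread per simulated user" idea concrete, here is a minimal Python sketch. It is not vApus code (vApus is a .NET application and its internals are not shown here); the names `run_simulated_users` and `fake_request` are hypothetical, and the "request" is just a short sleep standing in for an SQL statement. The sketch starts one thread per user, measures how long spawning takes, and releases all users at the same moment.

```python
import threading
import time


def run_simulated_users(user_count, action):
    """Start one thread per simulated user and release them together.

    Returns (spawn_seconds, latencies), where latencies[i] is the time
    user i spent inside `action`. Illustrative only, not the vApus design.
    """
    # Every user thread plus the coordinating thread waits on this gate,
    # so all users begin their work at (nearly) the same moment.
    start_gate = threading.Barrier(user_count + 1)
    latencies = [None] * user_count

    def user(index):
        start_gate.wait()
        began = time.perf_counter()
        action(index)
        latencies[index] = time.perf_counter() - began

    spawn_began = time.perf_counter()
    threads = [threading.Thread(target=user, args=(i,)) for i in range(user_count)]
    for thread in threads:
        thread.start()
    spawn_seconds = time.perf_counter() - spawn_began

    start_gate.wait()  # open the gate: all users start firing requests
    for thread in threads:
        thread.join()
    return spawn_seconds, latencies


def fake_request(index):
    """Hypothetical stand-in for one request sent to the server under test."""
    time.sleep(0.01)


if __name__ == "__main__":
    spawn_seconds, latencies = run_simulated_users(200, fake_request)
    print(f"spawned 200 user threads in {spawn_seconds:.3f} s")
    print(f"slowest simulated request took {max(latencies) * 1000:.1f} ms")
```

The thread count a single client can sustain depends heavily on the runtime and operating system, which is why the move to a 64-bit foundation mattered for the real tool.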

We were excited to see this happen: finally we tamed the beast with 64 threads. Yes, you can easily stress out a server with HPC benchmarks such as Linpack or SpecFP, but measuring the potential of a server using popular business software is no easy feat. We had to deal with severe thread contention on the client side, for example. With several vApus instances, we are now ready to test the strongest servers, including those coming out in the next few years. We are even able to stress test complete clusters of modern servers with just a few clients.

vApus' ultimate goal is not to stress servers to their maximum; we use it mostly for measuring response time at a given workload and to test stability of applications. But of course, we could not resist the chance to use it as a benchmark too. It was time to build a new benchmark, and vApus Mark II was born.
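As an illustration of "measuring response time at a given workload", the sketch below summarizes a list of per-request response times into a few figures. The function name `summarize_response_times`, the choice of statistics, and the nearest-rank percentile method are assumptions made for this example; the article does not say how vApus aggregates its results.

```python
import statistics


def summarize_response_times(samples_ms):
    """Summarize response times (in milliseconds) for one workload level.

    Percentiles use the nearest-rank method on the sorted samples.
    This is an illustrative aggregation, not the one vApus uses.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    count = len(ordered)

    def percentile(p):
        # Integer ceiling of p * count / 100, clamped to at least rank 1.
        rank = max(1, (p * count + 99) // 100)
        return ordered[rank - 1]

    return {
        "count": count,
        "mean_ms": statistics.fmean(ordered),
        "median_ms": percentile(50),
        "p95_ms": percentile(95),
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical samples: 1 ms, 2 ms, ..., 100 ms.
    summary = summarize_response_times(list(range(1, 101)))
    for name, value in summary.items():
        print(f"{name}: {value}")
```

Reporting a high percentile next to the mean is useful because a handful of very slow requests can hide behind a healthy-looking average.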


51 Comments


  • blue_falcon - Tuesday, August 10, 2010

    The R715 is an AMD box.
  • webdev511 - Tuesday, August 10, 2010

    Yes, and the R715 has 2x AMD Opteron™ 6176SE, 2.3GHz with 12 cores per socket with an approx price of $8,000
  • fic2 - Tuesday, August 10, 2010

    4. Part of the Anandtech 13 year anniversary giveaway?!! ;o)
  • mino - Wednesday, August 11, 2010

    Big Thanks for that !
  • Etern205 - Tuesday, August 10, 2010

    *stares at cpu graph*
    ~Drrroooollllliiiieeeeeeee~~~~
  • yuhong - Tuesday, August 10, 2010

    The incorrect references to Xeon 7200 should be Xeon 7100.
    "Other reasons include the fact that some decision makers never really bothered to read the benchmarks carefully"
    You didn't even need to do that. Knowing the difference between NetBurst vs Core 2 vs Nehalem would have made it obvious.
  • ELC - Tuesday, August 10, 2010

    Isn't the price of software licenses a major factor in the choice of optimum server size?
  • webdev511 - Tuesday, August 10, 2010

    So does the NUMA barrier.

    I'd go for less sockets with more cores any day of the week and as a result Intel= second string.
  • Ratman6161 - Wednesday, August 11, 2010

    For the software licensing reasons I mentioned above, there is a distinct advantage to fewer sockets with more cores.
  • davegraham - Wednesday, August 11, 2010

    so NUMA is an interesting one. Intel's QPI bus is actually quite good and worth spending some time to get to know.

    dave
