The buyer's market approach: our newest testing methods

Astute readers have probably guessed what we will change in this newest server CPU evaluation, but we will let one of the professionals among our readers explain it. Here is his excellent feedback on how to improve our evaluations at it.anandtech.com:

"Increase your time horizon. Knowing the performance of the latest and greatest may be important, but most shops are sitting on stuff that's 2-3 years old. An important data point is how the new compares to the old. (Or to answer management's question: what does the additional money get us vs. what we have now? Why should we spend money to upgrade?)"

To help answer this question, we will include a three-year-old system in this review: a dual Dempsey system, which was introduced in the spring of 2006. The Dempsey (Xeon 5080) server might even be "too young", but because it is based on the "Blackford" chipset, it lets us use the same FB-DIMMs found in the newer Harpertown (Xeon 54xx) systems. That is important, as most of our tests require quite large amounts of memory.

A 3.73GHz Xeon 5080 Dempsey performed roughly equal to a 2.3GHz Xeon 51xx Woodcrest and a 2.6GHz dual-core Opteron in SAP and TPC-C. That should give you a few points of comparison, even though none of them is very meaningful. After all, we are using this old reference system to find out if the newest CPU is 2, 5, or 10 times faster; a few percent more or less does not matter in that case.
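The comparison against the reference system boils down to a simple ratio. The sketch below is only an illustration of that arithmetic: the function names, the 5% tolerance, and the scores in the usage note are assumptions, not figures from our tests. It assumes a higher score is better, as with transactions per second.

```python
def speedup(new_score: float, baseline_score: float) -> float:
    """Return how many times faster the new system is than the baseline.

    Assumes higher scores are better (e.g. transactions per second).
    """
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return new_score / baseline_score


def bucket(ratio: float, tolerance: float = 0.05) -> str:
    """Describe a speedup coarsely.

    Differences within `tolerance` (an assumed 5% by default) count as a tie,
    since a few percent more or less does not matter for this comparison.
    """
    if abs(ratio - 1.0) <= tolerance:
        return "roughly equal"
    if ratio < 1.0:
        return f"{1.0 / ratio:.1f}x slower"
    return f"{ratio:.1f}x faster"
```

With hypothetical scores of 100 for the old system and 500 for the new one, `bucket(speedup(500.0, 100.0))` reports a fivefold gain, while a 3% difference is reported as a tie.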

In our Shanghai review, we radically changed our benchmark methodology. Instead of throwing every software box we happen to have on the shelf and know very well at our servers, we decided that the "buyers" should dictate our benchmark mix. Basically, every software type that is really important should have at least one and preferably two representatives in the benchmark suite. In the table below, you will find an overview of the software types servers are bought for and the benchmarks you can find in this review. If you want more detail about each of these software packages, please refer to this page.

Benchmark Overview

| Server Software | Market Importance | Benchmarks used |
|---|---|---|
| ERP, OLTP | 10-14% | SAP SD 2-tier (Industry Standard benchmark) |
| | | Oracle Charbench (Free available benchmark) |
| | | Dell DVD Store (Open Source benchmark tool) |
| Reporting, OLAP | 10-17% | MS SQL Server (Real world + vApus) |
| Collaborative | 14-18% | MS Exchange LoadGen (MS own load generator for MS Exchange) |
| Software Dev. | 7% | Not yet |
| e-mail, DC, file/print | 32-37% | MS Exchange LoadGen |
| Web | 10-14% | MCS eFMS (Real World + vApus) |
| HPC | 4-6% | LS-DYNA, LINPACK (Industry Standard) |
| Other | 2%? | 3DSMax (Our own bench) |
| Virtualization | 33-50% | VMmark (Industry standard) |
| | | vApus test (in a later review) |

The combination of an older reference system and real world benchmarks that closely match the software that servers are bought for should offer you a new and better way of comparing server CPUs. We complement our own benchmarks with the more reliable industry standard benchmarks (SAP, VMmark) to reach this goal.

A look inside the lab

We had two weeks to test Nehalem, and tests like the Exchange tests and the OLTP tests take more than half a day to set up and perform, not to mention that it sometimes takes months to master them. Understanding how to properly configure a mail server like Exchange is completely different from configuring a database server. Our testing is now clearly beyond what one person can be expected to know. I would like to thank my colleagues at the Sizing Servers Lab for helping to perform all this complicated testing: Tijl Deneut, Liz Van Dijk, Thomas Hofkens, Joeri Solie, and Hannes Fostie. The Sizing Servers Lab is part of Howest, which is part of the Ghent University in Belgium. The most popular parts of our research are published here at it.anandtech.com.

 


Liz proudly showing that she was first to get the MS SQL Server testing done. Notice the missing parts: the Shanghai at 2.9GHz (still in the air) and the Linux Oracle OLTP test that we are still trying to get right.

 

The SQL Server and website testing was performed with vApus, our "Virtual Application Unique Stress testing" tool. This tool took our team, led by Dieter Vandroemme, two years of research and programming, but it was well worth it. It allows us to stress test real world databases, websites, and other applications with the real logs that those applications produce. vApus does not simply replay the logs: it simulates user behavior by intelligently choosing the actions that real users would perform, using different statistical distributions.
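To give a rough idea of what "choosing actions from a statistical distribution" can mean, here is a minimal sketch. It is not the vApus implementation, which is not described here. It assumes the simplest possible distribution, the empirical frequency of each action in a recorded log, and all function and action names are hypothetical.

```python
import random
from collections import Counter


def build_action_weights(log_actions):
    """Count how often each action appears in a recorded log."""
    counts = Counter(log_actions)
    actions = list(counts)
    weights = [counts[a] for a in actions]
    return actions, weights


def simulate_user(log_actions, n_actions, seed=None):
    """Pick n_actions for one simulated user.

    Actions are drawn at random, weighted by how often they occur in the log,
    rather than replayed in their original order.
    """
    rng = random.Random(seed)
    actions, weights = build_action_weights(log_actions)
    return rng.choices(actions, weights=weights, k=n_actions)
```

For a log in which "search" appears four times as often as "checkout", a simulated session will on average contain about four searches per checkout, but each simulated user gets a different sequence.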


You can see vApus in action in the picture above. Note that the errors are time-outs. For each selection of concurrent users we see the number of responses and the average response time. It is possible to dig deeper and examine the response time of each individual action. An action is one or more queries (for databases) or a number of URLs (for websites), for example all the URLs needed to open one webpage.
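The per-concurrency summary described above can be sketched as follows. This is an illustration under stated assumptions, not vApus code: the `Sample` record, the use of `None` to mark a time-out, and the millisecond unit are all hypothetical choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    concurrent_users: int
    response_time_ms: Optional[float]  # None marks a time-out (assumed encoding)


def summarize(samples):
    """Return {concurrent_users: (responses, avg_response_time_ms, timeouts)}."""
    grouped = defaultdict(list)
    for s in samples:
        grouped[s.concurrent_users].append(s.response_time_ms)

    summary = {}
    for users, times in sorted(grouped.items()):
        ok = [t for t in times if t is not None]  # successful responses only
        avg = sum(ok) / len(ok) if ok else None   # no average if everything timed out
        summary[users] = (len(ok), avg, len(times) - len(ok))
    return summary
```

Time-outs are counted separately and excluded from the average, so a level where many requests time out does not look artificially fast.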

We feel it is important to use real world applications from lesser-known companies because these kinds of benchmarks are impossible to optimize for. Manufacturers sometimes include special optimizations in their JVMs, compilers, and other developer tools with the sole purpose of gaining a few points in well-known benchmarks. Our real world benchmarks allow us to perform a sanity check on those results.


44 Comments


  • usamaah - Monday, March 30, 2009 - link

    Is it me or is page 2 of this article missing some information? The title of that 2nd page is "What Intel and AMD are Offering," but in the body of the text there are only descriptions of Intel's Xeon chips? Perhaps a new title to reflect the body, or add AMD info?
  • JohanAnandtech - Monday, March 30, 2009 - link

    I moved the AMD vs Intel pricing data to the back of the article as the pricing info is more interesting once you have seen the results. But forgot to change the title.. fixed. Thanks.
  • usamaah - Monday, March 30, 2009 - link

    Cool, thank you. Next time I'll finish reading the article before I make a comment, sorry ;-) Anyway wonderful article.
  • Ipatinga - Monday, March 30, 2009 - link

    Very nice to see a comparison over some generations of Xeon platform, including the new one (yet to be released).

    I would like to see a new article with Core i7 vs Xeon 5500... to check out if my Core i7 @ 3,7GHz is good enough in Maya 2009 (Windows XP 64bit, 12GB DDR3), or if a Xeon 5500 (each at 2,4GHz, for instance) in dual processor configuration will be a much better buy.
