The buyer's market approach: our newest testing methods

Astute readers have probably already guessed what we are changing in this newest server CPU evaluation, but we will let one of the professionals among our readers at it.anandtech.com give his excellent answer to the question of how to improve our evaluations:

"Increase your time horizon. Knowing the performance of the latest and greatest may be important, but most shops are sitting on stuff that's 2-3 years old. An important data point is how the new compares to the old. (Or to answer management's question: what does the additional money get us vs. what we have now? Why should we spend money to upgrade?)"

To help answer this question, we will include a three-year-old system in this review: a dual Dempsey system, introduced in the spring of 2006. The Dempsey or Xeon 5080 server might even be "too young", but as it is based on the "Blackford" chipset, it lets us use the same FB-DIMMs found in the newer Harpertown (Xeon 54xx) systems. That is important, as most of our tests require fairly large amounts of memory.

A 3.73GHz Xeon 5080 Dempsey performed roughly on par with a 2.3GHz Xeon 51xx Woodcrest and a 2.6GHz dual-core Opteron in SAP and TPC-C. That should give you a few points of comparison, even if none of them are very precise. After all, we are using this old reference system to find out whether the newest CPUs are 2, 5, or 10 times faster; a difference of a few percent does not matter in that case.

In our Shanghai review, we radically changed our benchmark methodology. Instead of throwing at our servers every software package we happen to have on the shelf and know well, we decided that the "buyers" should dictate our benchmark mix: every software type that is really important should have at least one, and preferably two, representatives in the benchmark suite. The table below gives an overview of the software types servers are bought for and the benchmarks you will find in this review. If you want more detail about each of these software packages, please refer to this page.

Benchmark Overview

Server Software        | Market Importance | Benchmarks Used
-----------------------|-------------------|------------------------------------------------
ERP, OLTP              | 10-14%            | SAP SD 2-tier (industry standard benchmark); Oracle Charbench (freely available benchmark); Dell DVD Store (open source benchmark tool)
Reporting, OLAP        | 10-17%            | MS SQL Server (real world + vApus)
Collaborative          | 14-18%            | MS Exchange LoadGen (Microsoft's own load generator for Exchange)
Software development   | 7%                | Not yet
E-mail, DC, file/print | 32-37%            | MS Exchange LoadGen
Web                    | 10-14%            | MCS eFMS (real world + vApus)
HPC                    | 4-6%              | LS-DYNA, LINPACK (industry standard)
Other                  | 2%?               | 3ds Max (our own benchmark)
Virtualization         | 33-50%            | VMmark (industry standard); vApus test (in a later review)

The combination of an older reference system and real world benchmarks that closely match the software that servers are bought for should offer you a new and better way of comparing server CPUs. We complement our own benchmarks with the more reliable industry standard benchmarks (SAP, VMmark) to reach this goal.

A look inside the lab

We had two weeks to test Nehalem, and tests like the Exchange and OLTP tests take more than half a day to set up and perform - not to mention that it sometimes takes months to master them. Understanding how to properly configure a mail server like Exchange is completely different from configuring a database server. Our testing has clearly grown beyond what one person can master alone. I would like to thank my colleagues at the Sizing Servers Lab for helping to perform all this complicated testing: Tijl Deneut, Liz Van Dijk, Thomas Hofkens, Joeri Solie, and Hannes Fostie. The Sizing Servers Lab is part of Howest, which is part of Ghent University in Belgium. The most popular parts of our research are published here at it.anandtech.com.

 


Liz proudly showing that she was first to get the MS SQL Server testing done. Notice the missing parts: the Shanghai at 2.9GHz (still in the air) and the Linux Oracle OLTP test that we are still trying to get right.

 

The SQL Server and website testing was performed with vApus, our "Virtual Application Unique Stress testing" tool. This tool took our team, led by Dieter Vandroemme, two years of research and programming, but it was well worth it. It allows us to stress test real world databases, websites, and other applications with the real logs that those applications produce. vApus does not simply replay the logs; it intelligently chooses the actions that real users would perform, using different statistical distributions.
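To make that idea concrete, here is a minimal Python sketch of the approach. This is not vApus code, and every name in it (Action counting, perform, think times) is hypothetical; it only illustrates how a load generator can derive a distribution of actions from a real log and then draw from that distribution instead of replaying the log line by line.

```python
import random
import time
from collections import Counter

# Hypothetical sketch: weight each action type by how often it appears in a
# real application log, then generate load by drawing from that distribution
# instead of replaying the log verbatim.

def action_weights(log_lines):
    """Count how often each action (query/URL) occurs in the log."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

def simulate_user(weights, actions_per_user, rng):
    """One simulated user: draw actions from the log-derived distribution."""
    names = list(weights)
    probs = [weights[n] for n in names]
    latencies = []
    for _ in range(actions_per_user):
        action = rng.choices(names, probs)[0]
        start = time.perf_counter()
        perform(action)  # placeholder for the real query or HTTP request
        latencies.append(time.perf_counter() - start)
        time.sleep(rng.expovariate(1.0))  # "think time" between user actions
    return latencies

def perform(action):
    pass  # placeholder: execute the database query or fetch the URL(s)
```

A real harness then ramps up the number of concurrent simulated users step by step and records the response times at each step, which is exactly what the vApus screenshot below shows.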


You can see vApus in action in the picture above. Note that the errors are time-outs. For each number of concurrent users, we see the number of responses and the average response time. It is also possible to dig deeper and examine the response time of each individual action. An action is one or more queries (for databases) or a group of URLs (for example, all the URLs necessary to open one web page).
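For those wondering what that per-concurrency report looks like in code, here is a minimal, hypothetical sketch of the aggregation step, not vApus source: for each concurrency level it counts completed responses, treats requests over a time-out threshold as errors, and computes the average response time.

```python
import statistics

TIMEOUT = 5.0  # seconds; assumed threshold - responses slower than this count as errors

def summarize(results):
    """results: {concurrency_level: [response_time_in_seconds, ...]}"""
    for users in sorted(results):
        times = results[users]
        ok = [t for t in times if t <= TIMEOUT]          # completed responses
        errors = len(times) - len(ok)                    # time-outs
        avg = statistics.mean(ok) if ok else float("nan")
        print(f"{users:4d} users: {len(ok):6d} responses, "
              f"{errors:4d} time-outs, avg {avg * 1000:7.1f} ms")
```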

The reason we feel it is important to use real world applications from lesser-known companies is that these kinds of benchmarks are impossible to optimize for. Manufacturers sometimes include special optimizations in their JVMs, compilers, and other developer tools with the sole purpose of gaining a few points in well-known benchmarks. Our real world benchmarks allow us to perform a sanity check on those results.

Comments

  • Veteran - Wednesday, April 1, 2009 - link

    I didn't mean to offend you, because I can imagine how much time it takes to test hardware properly. And I personally think that OLTP/OLAP testing is very innovative and needed, because otherwise people would have no idea what to buy for servers. You cannot let your server purchases be influenced by benchmarks that are meaningless for servers, like simple 3DMark 2006/Vantage/FPS tests, etc.
    You guys always do a great job of testing every piece of hardware, but it just feels too biased towards Intel. For example, on the last page of this review you get a link to the Intel Resource Center (in the same place as the next button). If you have things like that, you are not (trying to be) objective, IMO.
  • JohanAnandtech - Wednesday, April 1, 2009 - link

    Thank you for clarifying in a very constructive way.

    "the last page of this review you get a link to Intel resource Center"

    I can't say I am happy with that link, as it creates the wrong impression. But the deal is: editors don't get involved in ad management, and ad sales people don't get involved when it comes to content.

    So all I can say is: judge our content, not our ads. And like I said, it didn't stop us from claiming that Shanghai was by far the best server CPU a few months ago. That was not the conclusion on many sites.
  • Veteran - Wednesday, April 1, 2009 - link

    Thanks for clarifying this matter.

    But the ad sales people should know this creates the wrong impression. A review site (for me at least) is all about objectivity and credibility. When you place a link to Intel's Resource Center at the end of every review, it feels weird. People on forums already call AnandTech "Inteltech", and I don't think that is what you guys want.

    I have liked AnandTech ever since I was a kid, and I still do. You guys always have some of the most in-depth reviews (especially on the very technical side) and I like that. But you guys are gaining some very negative publicity on the net.
  • BaronMatrix - Tuesday, March 31, 2009 - link

    Unfortunately, I don't buy from or recommend criminals.
  • carniver - Wednesday, April 1, 2009 - link

    AMDZone is the biggest joke on the internet. I just went there to see how zealots like abinstein are still doing their damage control; just like before, he went on rambling about how Penryn is still weak against Shanghai, trotting out the old and tired excuses, like how if people had all bought AMD they could do drop-in upgrades, etc. ZootyGray... he's the biggest joke on AMDZone. None of them has the mental capacity to accept that AMD has been DEFEATED, which is disappointing but funny, to say the least.
  • duploxxx - Wednesday, April 1, 2009 - link

    It's not just AMDZone; you are just the opposite. It's like in Woodcrest and Conroe times: it's not because the high-end CPU is the best of all that the rest of the CPUs in the line are better by default. It's all about the price/performance ratio. Like the many who bought the low end and thought they had bought the better system: well, wrong bet.

    As mentioned before, why not test the mid-range? That is where the sales will be. Time to test the 5520-5530 against the 2380-82; after all, those have the same price.
  • carniver - Wednesday, April 1, 2009 - link

    Your argument is valid; however, it just so happens that for low-end 1S systems the Penryns are doing just fine against the Shanghais, while for higher-end 2S systems they used to be limited by memory bandwidth, letting AMD pull ahead. That is no longer the case: Intel now beats AMD in AMD's own territory.
  • CHADBOGA - Tuesday, March 31, 2009 - link

    You probably also can't afford to buy a computer, so I doubt that Intel will be too concerned with your AMDZone insanity. LOL!!!!
  • smilingcrow - Tuesday, March 31, 2009 - link

    Those grapes you are chewing on sure sound sour to me. Try listening to a few tracks by The Fun Loving Criminals to help take away the bad taste.
  • cjcoats - Tuesday, March 31, 2009 - link

    There's more to HPC applications than you indicate: environmental modeling apps, particularly, tend to be dominated by memory access patterns rather than by I/O or pure computation. Give me a ring if you'd like some help with that -- I'm local for you, in fact...
