Virtualization and Consolidation

VMmark, which we have discussed in detail before, tries to measure typical consolidation workloads: a combination of a light mail server, database, fileserver, and website with a somewhat heavier Java application. One VM just sits idle, representing workloads that have to be online but perform very little work (for example, a domain controller). In short, VMmark targets the scenario where you want to consolidate lots and lots of smaller apps on one physical server.

VMware VMmark

Very little VMmark benchmark data has been available so far, but it is obvious that this is the favorite playground of the Xeon 7500. It outperforms an octal (eight-socket) 2.8GHz Opteron by a large margin. Granted, the octal Opterons scale pretty badly in most applications, but VMmark is not one of them. It is reasonable to expect that a quad twelve-core Opteron 6100 series will outperform the older, higher-clocked octal six-core Opterons in many applications, including SAP, OLTP, and data mining benchmarks. After all, the communication between the cores has vastly improved. But VMmark runs many small independent applications, which usually run on the same node, so the chances are slim that the quad Opteron 6100 will come even close to the quad Xeon X7560.

vApus Mark I: Performance-Critical Virtualized Applications

As we've discussed previously, our vApus Mark I benchmark is due for a major overhaul. We found that the 24 cores of the Opteron 6172 were not at the expected 85-95% CPU load, and thus the numbers reported were below the potential of the twelve-core Opteron. To get an idea of where the Xeon X7560 would land, we disabled Hyper-Threading, as our test is capable of stressing 16 cores/threads easily. With Hyper-Threading disabled, the dual Xeon X7560 was about 5% slower than the Xeon X5670 with Hyper-Threading enabled, and about 13% faster than the dual eight-core Opteron 6136 at 2.4GHz. Since we have found that Hyper-Threading raises performance by about 15%, we estimate that the dual Xeon X7560 at 2.26GHz is about 10% faster than a Xeon X5670 at 2.93GHz, and about 29% faster than the eight-core 2.4GHz Opteron 6136. So core for core and clock for clock, the Xeon X7560 probably has a performance advantage in the neighborhood of 30% over the Opteron. Once vApus Mark II is ready, we'll provide more accurate numbers.
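
The estimate above can be retraced with a few lines of arithmetic. This is only a sketch: it assumes the Hyper-Threading gain simply multiplies the measured ratio, and it uses the rounded percentages quoted in the text, so it lands at roughly 9% and 30% rather than exactly on the 10% and 29% given above. The variable and function names are illustrative and not part of vApus Mark I.

```python
# Rounded figures taken from the paragraph above.
# Assumption: the Hyper-Threading uplift applies multiplicatively.
measured_vs_x5670 = 0.95    # X7560 (HT off) vs. X5670 (HT on): about 5% slower
measured_vs_opteron = 1.13  # X7560 (HT off) vs. Opteron 6136: about 13% faster
ht_gain = 1.15              # Hyper-Threading uplift: about 15%


def with_hyper_threading(ratio, gain=ht_gain):
    """Scale a ratio measured with HT off by the assumed HT gain."""
    return ratio * gain


est_vs_x5670 = with_hyper_threading(measured_vs_x5670)
est_vs_opteron = with_hyper_threading(measured_vs_opteron)

print(f"Estimated advantage over Xeon X5670:   {(est_vs_x5670 - 1) * 100:.0f}%")
print(f"Estimated advantage over Opteron 6136: {(est_vs_opteron - 1) * 100:.0f}%")
```

The small gap between these results and the quoted figures is most likely due to rounding of the measured inputs.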

However, that is not enough to win the price/performance or performance/watt comparison. An eight-core Xeon X7560 costs four times more than an Opteron 6136 with a similar clock speed and core count, and the server consumes a lot more power.

23 Comments

  • dastruch - Monday, April 12, 2010

    Thanks AnandTech! I've been waiting for a year for this very moment and if only those 25nm Lyndonville SSDs were here too.. :)
  • thunng8 - Monday, April 12, 2010

    For reference, IBM just released their octal-chip Power7 3.8GHz result for the SAP 2-tier benchmark. The result is 202180 SAPS, approximately 2.32x faster than the octal-chip Nehalem-EX.
  • Jammrock - Monday, April 12, 2010

    The article cover on the front page mentions 1 TB maximum on the R810 and then 512 GB on page one. The R910 is the 1TB version, the R810 is "only" 512GB. You can also do a single processor in the R810. Though why you would drop the cash on an R810 and a single proc I don't know.
  • vol7ron - Tuesday, April 13, 2010

    I wish I could afford something like this!

    I'm also curious how good it would be at gaming :) I know in many cases these server setups under-perform high end gaming machines, but I'd settle :) Still, something like this would be nice for my side business.
  • whatever1951 - Tuesday, April 13, 2010

    None of the Nehalem-EX numbers are accurate, because Nehalem-EX kernel optimization isn't in Windows 2008 Enterprise. There are only 3 commercial OSes right now that have Nehalem-EX optimization: Windows Server 2008 R2 with SQL Server 2008 R2, RHEL 5.5, SLES 11, and the soon-to-be-released CentOS 5.5 based on RHEL 5.5. Windows 2008 R1 has trouble scaling to 64 threads, and SQL Server 2008 R1 absolutely hates Nehalem-EX. You are cutting Nehalem-EX benchmarks short by 20% or so by using Windows 2008 R1.

    The problem isn't as severe for Magny-Cours, because the OS sees 4 or 8 sockets of 6 cores each via the enumerator, and thus treats it with the same optimization as an 8 socket 8400 series CPU.

    So, please rerun all the benchmarks.
  • JohanAnandtech - Tuesday, April 13, 2010

    It is a small mistake in our table. We have been using R2 for months now. We do use Windows 2008 R2 Enterprise.
  • whatever1951 - Tuesday, April 13, 2010

    Ok. Change the table to reflect Windows Server 2008 R2 and SQL Server 2008 R2 information please.

    Any explanation for such poor memory bandwidth? Damn, those SMBs must really slow things down or there must be a software error.
  • whatever1951 - Tuesday, April 13, 2010

    It is hard to imagine 4 channels of DDR3-1066 being 1/3 slower than even the Westmere-EPs. Can you remove half of the memory DIMMs to make sure that it isn't Dell's flex memory technology that's slowing things down intentionally to push sales toward the R910?
  • whatever1951 - Tuesday, April 13, 2010

    As far as I know, when you only populate two sockets on the R810, the Dell R810 flex memory technology routes the 16 DIMMs that used to be connected to the 2 empty sockets over to the 2 center CPUs. There could be significant memory bandwidth penalties induced by that.
  • whatever1951 - Tuesday, April 13, 2010

    "This should add a little bit of latency, but more importantly it means that in a four-CPU configuration, the R810 uses only one memory controller per CPU. The same is true for the M910, the blade server version. The result is that the quad-CPU configuration has only half the bandwidth of a server like the Dell R910 which gives each CPU two memory controllers."

    Sorry, should have read a little slower. Damn, Dell cut half the memory channels from the R810!!!! That's a retarded design, no wonder the memory bandwidth is so low!!!!!
