Virtualization & Consolidation

VMmark - which we discussed in great detail here - tries to measure typical consolidation workloads: a combination of a light mail server, database, fileserver, and website with a somewhat heavier java application. One VM is just sitting idle, representative of workloads that have to be online but which perform very little work (for example, a domain controller). In short, VMmark goes for the scenario where you want to consolidate lots and lots of smaller apps on one physical server.

VMWare VMmark
(*) preliminary benchmark data

Cisco has produced the first VMmark score for the Xeon X5600 series. The Cisco server with two X5680s at 3.3GHz achieved an impressive 35.83 score with 26 VMmark tiles. Twenty-six tiles, that is good for 156 VMs! Based on this number we can estimate where the Xeon X5670 will land. The 6174 numbers are based on AMD’s own preliminary data. 

VMmark is a clear victory for the Intel CPUs. Contrary to the SAP market, AMD can play the pricing card here. As long as you do not require dynamic resource scheduling, the software licences costs are nowhere like those of typical ERP projects. So the pricing of the hardware matters more. Also, contrary to other applications, there is no bonus for single threaded performance. The usage models of Databases, 3D Animation software and other all include scenarios where a number of cores will be idling while the others are working very hard. In a virtualization scenario where you are running tons of VMs, single threaded performance does not matter. So while Intel is clearly winning here, servers based on the newest Opteron might still be on the shortlist of those looking for good performance per dollar.

Decision Support benchmark: Nieuws.be vApus Mark I: Performance-Critical applications Virtualized
Comments Locked

58 Comments

View All Comments

  • zarjad - Friday, April 2, 2010 - link

    I understand that HT can be disabled in BIOS and that some benchmarks don't like HT.
  • elnexus - Wednesday, April 21, 2010 - link

    I can report that one of my customers, performing intensive image processing, found that DISABLING hyper-threading on a Nehalem-based workstation, actually IMPROVED performance considerably.

    It seems that certain applications don't like hyper-threading, while others do. I always recommend that my customers perform sensitivity analyses on their computing tasks with HT on and off, and then use whichever is best.
  • tracerburnout - Wednesday, March 31, 2010 - link

    How is it possible that Intel's Xeon X5670 rig returns 19k+ for a score while AMD's magny-cours returns only 2k+?? I only question the results of this benchmark chart because Intel's Xeon X5570 rig returns only around 1k. How can a X5670 be 19x faster than a X5570?? And I doubt the same is true for the magny-cours by being just 10.5% of what the X5670 can do.

    (is there an extra '0' by accident in there?)



    tracerburnout
    proud supporter of AMD, with a few Intel rigs for Linux only
  • JohanAnandtech - Thursday, April 1, 2010 - link

    No, it is just that Sisoft uses the new AES instructions of West-mere. It is a forward looking benchmark which tests only a small part of a larger website code base. So that 19x faster will probably result in 10 to 20% of the complete website being 19x faster. So the real performance impact will be a lot slower. It is interesting though to see how much faster these dedicated SIMD instructions are on these kinds of workloads.
  • alpha754293 - Thursday, April 1, 2010 - link

    If you guys need help with setting up or running the Fluent/LS-DYNA benchmarks let me know.

    I see that you don't really spend as much time writing or tweaking it as you do with some of the other programs, and that to me is a little concerning only because I don't think that it is showing the true potential of these processors if you run it straight out-of-the-box (especially with Fluent).

    Fluent tends to have a LOT of iterations, but it also tends to short-stroke the CPU (i.e. the time required to complete all of the calculations necessary is less than 1 second and therefore; doesn't make full use of the computational ability.)

    Also, the parallelization method (MPICH2 vs. HP MPI) makes a difference in the results.

    You want to make sure that the CPUs are fully loaded for a period of time such that at each iteration, there should be a noticable dwell time AT 100% CPU load. Otherwise, it won't really demonstrate the computational ability.

    With LS-DYNA, it also makes a difference whether it's SMP parallelization or MPP parallelization as well.
  • k_sarnath - Friday, April 2, 2010 - link

    The most baffling part is how linux could engage 12-CPUs much better than windows. I am obviously curious about the OS platform for other tests.. Similary MS SQL was able to scale well on multi-cores... In this context, I am not sure how we can look at the performance numbers... A badly scaling app or OS could show the 12-core one in bad light.
  • OneEng - Saturday, April 3, 2010 - link

    Hi Johan,

    I have followed your articles from the early day's at Ace's and have a good respect for the technical accuracy of your articles.

    It appears that the X5570 scaling between 4 and 8 cores has very little gain in the Oracle Calling Circle benchmark. Furthermore, the 24 cores of MC at 2.2Ghz are way behind. Westmere appears to do quite well, but really should not be able to best 8 cores in the X5570 with all else being equal.

    I have heard some state that the benchmark is thread bound to a low number of threads (don't know if I am buying this), but surely something fishy is going on here.

    It appears that there is either a real world application limit to core scaling on certain types of Oracle database applications (if there are, could you please explain what features an app has when these limits appear), or that the benchmark is flawed in some way.

    I have a good amount of experience in Oracle applications and have usually found that more cores and more memory make Oracle happy. My experience seems at odds with your latest benchmarks.

    Any feedback would be appreciated .... Thanks!
  • JohanAnandtech - Tuesday, April 6, 2010 - link

    I am starting to suspect the same. I am going to dissect the benchmark soon to see what is up. It is not disk related, or at least that surely it is not our biggest problem. Our benchmark might not be far from the truth though, I think Oracle really likes the big L3-cache of the Westmere CPU.

    If you have other ideas, mail at johanATthiswebsiteP
  • heliosblitz2 - Wednesday, April 7, 2010 - link

    You wrote
    Test-Setup:
    Xeon Server 1: ASUS RS700-E6/RS4 barebone
    Dual Intel Xeon "Gainestown" X5570 2.93GHz, Dual Intel Xeon “Westmere” X5670 2.93 GHz
    6x4GB (24GB) ECC Registered DDR3-1333

    "Also notice that the new Xeon 5600 handles DDR3-1333 a lot more efficiently. We measured 15% higher bandwidth from exactly the same DDR3-1333 DIMMs compared to the older Xeon 5570."

    That is not exactly the reason, I think.
    The reason ist you populated the second memory-bank in both setups.
    Intel specification:
    Westmere-1333MHZ-CPUs run with 1333 MHZ with second bank populated while
    Nehalem-1333MHZ-CPUs run with 1066 MHZ with second bank populated

    That could be updated.

    Compare tech docs on Intel site: datasheet Xeon 5500 Part 2 and datasheet Xeon 5600 Part 2

    Arnold.
  • gonerogue - Saturday, April 10, 2010 - link

    The Viper is a V10 and most certainly not a traditional muscle car ;)

Log in

Don't have an account? Sign up now