
  • blasterrr - Thursday, January 28, 2010 - link

    How about Itanium 2 benchmarks?
    We use Itanium 2 in our company for our SAP systems. I'd like to compare Itanium 2 performance with x86 performance.
    Does anyone know which architecture is better for most SAP applications?
  • joekraska - Thursday, October 08, 2009 - link


    I run a large virtualization enterprise for a Fortune 500 company. The platform of choice for virtualization is two socket systems, for several reasons. First, VMware charges roughly $2600 per socket. Second, a 4 socket system doesn't generally deliver double the performance of a 2 socket system. Third, 4 socket systems cost significantly more than two 2 socket systems. Finally, the best 2 socket systems for virtualization have a large number of DIMM slots per CPU (e.g., our choice: Dell M/R710, 9 DIMM slots per CPU, or theoretically Cisco UCS 250, 24 slots per CPU), and virtualization enterprises want memory. VMware doesn't charge you for the amount of memory you install, and that's what you need: memory.

    As an aside, I favor 2 socket systems categorically. Only if someone has a high-count single-system SMP need would I consider or permit anything else. 4 socket systems cost too much for what you get; it takes a problem that cannot be solved without one to justify the investment.

    Joe Kraska
    San Diego CA
  • solori - Wednesday, October 07, 2009 - link

    The vAPUS tile graphics marked as 2345 are really 2435... What happened to the 2389 in those tests?
  • solori - Wednesday, October 07, 2009 - link


    Good follow-up to your earlier comparisons. A lot of work goes into these things, and your team has done a fine job compiling the information here. I have just a few comments:

    With respect to the VMmark reference, you've taken a vector value (X@Y) and made a scalar out of it. The performance number (X) is gained across a number of running VMs (Y, in tiles), which in turn inflates the scalar you refer to as a "speed" (i.e. 13% slower, etc.). In fact, your speed component could be determined by taking X/Y and looking at the "tile ratio" to determine unit performance per tile. In doing so, you should see the "performance" gap close a bit.
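    The "tile ratio" idea above can be sketched in a few lines. This is purely illustrative - the scores and tile counts below are invented for the example, not VMmark results:

```python
# Hypothetical per-tile ("unit") performance from a VMmark-style
# vector result X@Y: aggregate score X earned across Y tiles.
# The numbers here are made up for illustration only.
results = {
    "System A": (23.5, 16),  # (aggregate score X, tiles Y)
    "System B": (20.5, 14),
}

for name, (score, tiles) in results.items():
    per_tile = score / tiles  # the "tile ratio": unit performance per tile
    print(f"{name}: aggregate={score}, per-tile={per_tile:.3f}")
```

    Comparing the per-tile figures instead of the raw aggregates is what should close the apparent "performance" gap somewhat.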

    This evaluation method also lends itself to what VMmark was created to achieve - a determination of performance as the platform scales across VMs. In other words, the implication of VMmark is that a system cannot scale due to its constituent applications being thread bound. By employing virtualization, the net number of active threads is maximized with little degradation on the per-application performance. When resource availability is impacted, the number of application groups (tiles) is at its maximum.

    Perhaps a significant reason VMmark and vAPUS differ so widely is that VMmark creates a case for resource exhaustion, while vAPUS's use of resources is more arbitrary. Fitting a benchmark to the available resources of one system seems very hard to avoid, and your attention to hex-core versus quad-core scheduling is right on point - hence the significant difference in vAPUS.1 results. Kudos again for taking that into account - it is something systems architects need to be more aware of, and a lot of benchmarks step around it.

    One interesting result of the 24 vCPU case is that the difference between the 2P/12-core Opteron and the 2P/16-thread Xeon comes down to the ratio of their clock speeds. Likewise, the difference between the 4P and 2P cases would indicate that the number of vAPUS tiles could have been increased for those systems.

    The issue still puzzling me about vAPUS is the sizing of the OLTP VMs. On the AMD machines you have in the lab, you could easily use the full database size with memory to spare, and increase the RAM allocated to the VMs accordingly. Doing so in the memory-cramped EP box would likely cripple its performance, but produce an admittedly more "real world" result. We don't see databases getting smaller out there anytime soon, nor do we see them being split up to fit nominal hardware... The 24GB Xeon is kind of base-model compared to the 64GB Opterons - you might want to reconsider your testing policy where that's concerned.

    On the virtualization use case, you cannot divorce the CAPEX economics of "right-sizing" your memory component. With too little memory, the Xeon has more threads than you can practically use; with too much memory, you out-strip the thread capacity (AMD or Intel) and drive up $/VM due to memory costs. With 8GB of DDR2 at about half the cost of 8GB of DDR3 (reg. ECC for both), memory is the largest single factor between the 5500s and 2400s where $/VM is concerned. Mixing consolidation and performance workloads across a VMM cluster (i.e. DRS in VMware) makes the value of additional GB/core per platform important.
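    The memory-driven $/VM point can be put into rough numbers. Everything here is an assumption for illustration (DIMM prices, base server cost, consolidation ratio), not data from the article:

```python
# Toy $/VM comparison where memory price dominates the platform delta.
# Assumed prices: 8GB reg. ECC DDR2 at roughly half the DDR3 price.
ddr3_per_8gb = 600.0   # assumed, Xeon 5500 (DDR3) side
ddr2_per_8gb = 300.0   # assumed, Opteron 2400 (DDR2) side

def dollars_per_vm(base_server_cost, dimm_price, dimm_count, vms_hosted):
    """Total platform cost divided by the VMs consolidated onto it."""
    return (base_server_cost + dimm_price * dimm_count) / vms_hosted

# Same base box cost, same DIMM count, same consolidation ratio:
print(dollars_per_vm(5000.0, ddr3_per_8gb, 8, 16))
print(dollars_per_vm(5000.0, ddr2_per_8gb, 8, 16))
```

    Even with every other cost held equal, the memory price gap alone moves $/VM noticeably - which is the point being made above.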

    Likewise, if you look at mature virtualization market approaches - rack or blade systems - you will not see many 2P systems force-fitted into 4P use cases, nor 4P systems used where 2P systems would suffice. The advantage of a 2P and 4P eco-system that supports seamless migration (i.e. vMotion or Live Migration) therefore requires (today) coming down in one camp or the other. In this case, the advantage lies with AMD (for now), and your report shows that to be a decent choice.

    I agree that EX will create a significant price gulf between EX and EP, which is not likely to help Intel in the 4P virtualization use case. With AMD's Magny-Cours on track for Q1/2010 in 2P and later 4P form (same basic platform), use cases that are solid 4P Istanbul contenders today will have a drop-in 2P Magny-Cours upgrade with solid enhanced migration capabilities. This can't do anything but put pressure on Intel to create a 4P competitor to AMD's offering in both capability and price.

    We've done significant research into CPU/memory pairings to find the "sweet spot" in $/VM, and it points to a lag in the market for consolidation utilization (or at least market intelligence). If the "typical" utilization scenario is 12-18 VMs, it is clear from your results that a significant amount of potential is wasted in either the Nehalem or the Istanbul platform. To maximize return, $/VM and Watt/VM must be considered in the deployment, pushing those numbers up by at least 50% per host. That said, memory re-enters the equation as a limiting factor - well beyond the 20GB in today's vAPUS test case.

    As for the hex-core Xeon, the writing was on the wall in the virtualization use case as 8P/quad-core Opterons have proven all but equal on performance (about 95%) to 8P/hex-core Xeons. Dunnington's power use did not help its cause either...

    As you indicated in your piece, specialized systems like Twin2 and blades create a better performance/watt opportunity for both 5500 and 2400 platforms (especially with Fiorano and SR5600 Socket-F options). Perhaps a great follow-up to this series would be a Twin2 comparison of the 5500 and 2400 variants...

    Collin C. MacMillan
    Solution Oriented LLC
  • JohanAnandtech - Thursday, October 08, 2009 - link

    Hi Collin,

    There is too much interesting stuff in your reaction to address every good point you make, so I will take a bit more time to digest this and send you an e-mail.

    A few things off the top of my head. Yes, a 64 GB quad Opteron machine using only 20 GB or so is not optimal. At the same time, we verify DQL (Disk Queue Length), so we are pretty sure that you are not going to gain much from making the cache larger. I'll check; maybe we simplified too much there. The reason for doing this is to keep things simple, as it is already hard enough to control the complexity of virtualized benchmarking. It is a good suggestion to increase the cache size of the OLTP component for systems with larger amounts of memory; I'll think about it.

    The resource exhaustion as done by VMmark is not perfect either, as you might be going for maximum throughput at the cost of the response time of individual applications. It is a pretty hard exercise. I guess we'll have to set a certain SLA: a max response time for each app, and then measure total throughput.
  • skrewler2 - Wednesday, October 07, 2009 - link

    Why do you never use a Sun box for your benchmarks?
  • JohanAnandtech - Wednesday, October 07, 2009 - link

    Which Sun box do you have in mind? And of course, like everyone, we are waiting to see what Oracle will do with Sun. While the Sun people used to send us test servers quite a few times a year or two ago, it has been very silent the past year.
  • duploxxx - Wednesday, October 07, 2009 - link

    Great article. However, it might have been a bit more interesting if you would also start to add price ranges. Comparing the best of everything is nice, but many people start to think that whatever version they might buy will always be a better choice for them, because they saw the highest benchmarks.
  • duploxxx - Wednesday, October 07, 2009 - link

    edit, wasn't finished yet :)

    Knowing that you can buy a 2s E5530 2.4GHz system at the same price as a 2s 2435 2.6GHz might already bring a whole different perspective.

    Also, comparing 2s against 4s is nice, and I really like your virtual benchmark; it gives much more realistic results, just as we have seen in our own software benchmarking that VMmark is no longer representative of the real world. You still can't compare power consumption, though. First of all, LP DIMMs now cost as much as normal DIMMs, and secondly, you only require 20GB of RAM in your test as you mentioned, so 44GB is wasted but still consuming a lot of power.

    Choosing between 2s and 4s is a difficult choice. We deploy about 400-500 2s servers a year on VM, preferring a higher server count for availability over bigger servers; 4s also needs a lot more fine tuning on IO than a 2s does, certainly if you use DRS, and it hits the farm much harder on an HA failure, etc. We started on AMD in 2004 and are not moving back to Intel just because they now have one decent server platform. As mentioned, check price/performance and think again whether the 55xx series is so far superior to the 24xx series if you buy mid-level servers, like 90% of the server market does. Oh, and we like Enhanced vMotion, of course.

  • SLIM - Tuesday, October 06, 2009 - link

    Any thoughts on the effect of AMD's new server chipset on VMmark performance? They claim it will help with I/O and particularly virtualized I/O performance.
  • Photubias - Wednesday, October 07, 2009 - link

    This surely remains to be tested, but the Fiorano platform (as this AMD chipset is called) is yet to be released.
  • solori - Wednesday, October 07, 2009 - link

    Fiorano (SR5690/SP5100, et al.) is out now for Socket-F and really requires an Istanbul to show its stuff (like IOV, etc.). With only a minor tweak to HT bus speeds, don't expect to see much improvement in memory bandwidth from Fiorano/Socket-F pairings. Where you should see improvement is in power consumption - pairing HE/EE Istanbul parts with Fiorano/Kroner should create a better performance/watt result in virtualization.

    Collin C. MacMillan
  • bpdski - Tuesday, October 06, 2009 - link

    It is pretty amazing how fast the new 55xx chips are. Personally, I am holding out on any new server purchases and deployments until the EX systems come out next year. I am pretty excited about the performance potential of a dual or quad octal-core system. I feel for AMD, but if the EX systems scale as well as they should, they are really going to crush the Opterons.
  • duploxxx - Wednesday, October 07, 2009 - link

    Two answers to that. First of all, looking at the design, EX will be way more expensive, creating a gap between the 2 socket and 4 socket platforms; even deploying only 2 octa-cores will be a very expensive baseline due to the motherboard layout. Too expensive, actually, with a lot of focus on trying to take RISC/SPARC market share.

    Second, don't you think AMD knows this? The C32/G34 platform launch is much closer than people think. AMD made a clear roadmap, and since 45nm everything looks to be in good shape. Keep in mind the CPU for the new platform is almost ready, since it is based on Istanbul, and the new platform chipset was also released a few weeks ago for the Socket F platform. You will also see much more OEM activity with this platform because it comes from one brand supplier, with no more need for the old NVIDIA/Broadcom parts.

    EX was delayed, delayed, delayed; if it continues like this, it will launch more or less at the same time, so keep that excitement in check. BTW, even if the 55xx series had again been a badly performing server part (which it finally is not - thank you, Intel), 75% of the market would still buy it just for the brand name..... :)
  • cosminliteanu - Tuesday, October 06, 2009 - link

    Many thanks for this article !
  • BrightCandle - Tuesday, October 06, 2009 - link

    A dual socket will easily fit in a 1U, but 1.25A is some serious extra cost within a colo.

    The 2U quad sockets, on the other hand, are pushing 500W+ - again serious extra money in a colo.

    The colos want you using 0.5A per 1U, so there is a major mismatch between these machines and the reality of the power you can actually get. Love the speed; not liking the cost of running them.
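    The mismatch above is easy to put into rough numbers. The 0.5A/U budget is from the comment; the supply voltage is an assumption (230V here - at 120V the gap would be even worse):

```python
# Rough colo power arithmetic: amps drawn vs. the per-U budget.
volts = 230.0              # assumed feed voltage
budget_amps_per_u = 0.5    # typical colo allowance, per the comment

def amps(watts, v=volts):
    return watts / v

quad_draw = amps(500)                 # 2U quad socket drawing 500W+
quad_budget = 2 * budget_amps_per_u   # 2U worth of budget
print(f"quad: draws {quad_draw:.2f}A against a {quad_budget:.1f}A budget")
```

    Under these assumptions the quad box draws more than double its allotted current, so you end up paying for extra circuits.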
  • sonicdeth - Tuesday, October 06, 2009 - link

    Thanks for this. Personally, I can't recommend any of the quad socket systems until we see Intel's Nehalem-EX early next year. The dual-socket 55xx series is just fantastic for the price (especially with VMware). We've deployed several HP 380 G6's and couldn't be happier.
  • Bazili - Tuesday, October 06, 2009 - link

    Great article. Congrats!!!

    Could you please include a software price analysis? I guess it can show huge differences between a 24-core box and an 8-core box.

  • tobrien - Tuesday, October 06, 2009 - link

    these are amazing articles, you guys do such an awesome job with these.

    thanks a ton!
  • JohanAnandtech - Wednesday, October 07, 2009 - link

    Thanks for the kudos! Much appreciated :-)
  • rbbot - Tuesday, October 06, 2009 - link

    Surely the high price of 8GB DIMMs isn't going to last very long, especially with Samsung about to launch 16GB parts soon.
  • Calin - Wednesday, October 07, 2009 - link

    8GB DIMMs have two markets: one is upgrades from 4GB or 2GB parts in older servers, the other is more memory in cheaper servers. As demand can be high, it all depends on the supply - and if the supply is low, prices are high.
    So, don't count on the price of 8GB DIMMs decreasing soon.
  • Candide08 - Tuesday, October 06, 2009 - link

    One performance factor that has not improved much over the years is the diminishing percentage of performance gained from each additional core.

    A second core adds about 60% performance to the system.
    Third, fourth, fifth and sixth cores all add lower (decreasing) percentages of real performance gains - due to multi-core overhead.

    A dual socket dual core system (4 processors) seems like the sweet spot to our organization.
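    The diminishing returns described above can be modeled crudely. The 60% figure for the second core is the commenter's; the geometric decay rate is purely an illustrative assumption:

```python
# Toy scaling model: each extra core contributes less than the last.
def total_speedup(cores, first_extra_gain=0.60, decay=0.75):
    speedup, gain = 1.0, first_extra_gain
    for _ in range(cores - 1):
        speedup += gain
        gain *= decay  # every further core adds a smaller slice
    return speedup

for n in (1, 2, 4, 6):
    print(f"{n} cores -> {total_speedup(n):.2f}x")
```

    Under these made-up constants, the fifth and sixth cores each add less than half of what the second core did - the intuition behind the "sweet spot" argument.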
  • Calin - Wednesday, October 07, 2009 - link

    If your load fits into four processors, then this is great. However, for some, this level of performance is not enough, and more is needed - even if it means paying four times as much for twice the performance.
  • hifiaudio2 - Tuesday, October 06, 2009 - link

    FYI, the R710 can take up to 192GB of RAM...


    not cheap :) but possible

  • JohanAnandtech - Tuesday, October 06, 2009 - link

    At $300 per GB, or the price of 2 times 4 GB DIMMs, I don't think 16 GB DIMMs are going to be a big success right now. :-)
  • wifiwolf - Wednesday, October 07, 2009 - link

    for at least 5 years you mean
  • mamisano - Tuesday, October 06, 2009 - link

    Great article; I just have a question about the power supplies. Why do the quad-socket servers need a 1200W PSU if the highest measured load was 512W? I know you would like to have some headroom, but it looks to me that a more efficient 750 - 900W PSU may have provided better power consumption results... or am I totally wrong? :)
  • JarredWalton - Tuesday, October 06, 2009 - link

    Maximum efficiency for most PSUs is obtained at a load of around 40-60% (give or take), so if you have a server running mostly under load, you would want a PSU rated at roughly twice the load power. (Plus a bit of headroom, of course.)
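    That rule of thumb is easy to turn into a sizing sketch. The 512W figure is the measured load quoted in the question above; the 50% target and 10% startup headroom are assumptions:

```python
# Size a PSU so the typical load sits near the efficiency sweet spot.
def recommended_psu_watts(load_watts, target_fraction=0.5, headroom=1.1):
    """Rating such that load/rating ~= target_fraction, plus startup headroom."""
    return load_watts / target_fraction * headroom

print(recommended_psu_watts(512))  # lands in the 1000-1200W class
```

    Which is why a 1000-1200W unit for a box measured at 512W is not as oversized as it first looks.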
  • JohanAnandtech - Wednesday, October 07, 2009 - link

    Actually, the best server PSUs are now at maximum efficiency (+/- 3%) between 30 and 95% load.


    And the reason our quads are using 1000W PSUs (not 1200W) is indeed that you need some headroom. We do not test the server with all DIMM slots filled, and you also need to take into account that you need a lot more power when starting up.
  • Casper42 - Tuesday, October 06, 2009 - link

    I know it's late, but on page 4 of this article you say you're using a dual 2389 setup where each chip is quad-core.

    Somehow that morphs into a "Quad Opteron 2389" on page 6, both in the text and in the graphic. Since a quad 2xxx is not possible, is this a dual 2389 or a quad 8389?

    Then on page 7 it becomes a Quad Opteron 8389

    Am I losing my mind?

    I see now that both a quad 8389 and a dual 2389 are listed on page 4, but why on earth did you bounce back and forth so much between them?
  • JohanAnandtech - Tuesday, October 06, 2009 - link

    You are not losing your mind. The "Quad 2389" is a quad 8389, of course. I have fixed the error. Thanks.

    The dual-socket machines were mostly used to check how the software scales (MS SQL Server, virtualization) and how the power consumption compares to the quad-socket machines. I hope this makes it clearer.
