AMD's dual and quad platform: consistency

AMD's PR is making a lot of noise about consistency, and rightly so. Apart from the obviously different multiprocessor capabilities, the quad socket and dual socket processors are exactly the same. In the case of virtualization, this allows you to optimize your virtual machines and hypervisors once and then clone them as much as you like. There are fewer worries when moving virtual machines around, and there is no fiddling with masking processor capabilities. This is also well illustrated when you check which mode VMware ESX virtual machines run in. The table is pretty simple for VMs running on top of an AMD processor: virtual machines on dual-core Opterons will run software virtualization, while on the quad-cores they will almost always run in the fastest mode (hardware virtualization combined with hardware assisted paging). The same is true for Hyper-V: it won't run on the dual-core Opterons, and it runs at full speed on the quad-cores. That is remarkably simple compared to the complete mess Intel made: some of the old Pentium 4 based CPUs support VT-x and some don't, some of the lower end Xeons launched in 2007 and 2008 don't, and so on.
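For readers who want to check which of these modes a given host could use, the following Python sketch maps the CPU feature flags that Linux reports in /proc/cpuinfo to the fastest mode available. The flag names are the ones the kernel reports (svm and npt for AMD-V and nested paging, vmx and ept for VT-x and Intel's extended page tables); the function, the enum and the simple precedence rule are our own illustrative assumptions, not VMware's or Microsoft's actual selection logic. In practice a hypervisor may still prefer software virtualization on a CPU that has AMD-V or VT-x but no hardware assisted paging.

```python
from enum import Enum


class VirtMode(Enum):
    """Virtualization modes, ordered from slowest to fastest."""
    SOFTWARE = "software virtualization (binary translation)"
    HW_VIRT = "hardware virtualization (AMD-V / VT-x)"
    HW_VIRT_HAP = "hardware virtualization + hardware assisted paging"


def fastest_mode(cpuinfo_flags: str) -> VirtMode:
    """Return the fastest mode a CPU could support, judged by its cpuinfo flags.

    Simplified heuristic (an assumption, not a hypervisor's real policy):
    - 'svm' (AMD-V) or 'vmx' (Intel VT-x) means hardware virtualization;
    - 'npt' (AMD nested paging) or 'ept' (Intel EPT) adds hardware assisted paging.
    """
    flags = set(cpuinfo_flags.split())
    has_hw_virt = bool(flags & {"svm", "vmx"})
    has_hap = bool(flags & {"npt", "ept"})
    if has_hw_virt and has_hap:
        return VirtMode.HW_VIRT_HAP
    if has_hw_virt:
        return VirtMode.HW_VIRT
    return VirtMode.SOFTWARE


if __name__ == "__main__":
    # Abbreviated, made-up flag strings for illustration only.
    print(fastest_mode("fpu sse2 svm npt lm"))  # AMD CPU with AMD-V and nested paging
    print(fastest_mode("fpu sse2 lm"))          # CPU without hardware virtualization
```

On a real Linux host you would feed it the text after "flags :" from one processor entry in /proc/cpuinfo.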

There is some inconsistency in HyperTransport and L3 cache speeds, but those only cause small performance variations, not software management troubles. Of course, AMD's very consistent dual and quad socket platform is not without flaws either. The NVIDIA MCP55 Pro chipset was at times pretty quirky when installing new virtualization software. Most of the time a patch took care of that, and the Opteron servers ran rock solid afterwards, but in the meantime a lot of valuable time was wasted. Also, the current platform has not evolved for years and is starting to show its age: we found that the motherboards consume a bit more power than they should. In 2010, all Opteron server platforms will use AMD chipsets only.

The core of the new hex-core Opteron is identical to that of the quad-core, but the "uncore" part has some improvements. Apart from the 2.8GHz 2387/8387 and 2.9GHz 2389/8389, most quad-core Opterons still connect with 1GHz HyperTransport links. The hex-core Opteron runs with speeds between 2 and 2.4GHz. It always connects to the other CPUs in the server via 2.4GHz HyperTransport links. That makes little difference in a 2P server, but in a 4P server, interconnect speed quickly becomes the limiting factor. Even at 2.4GHz (9.6GB/s interconnect), probe broadcasting can limit performance, which is why you can reserve up to 1MB of the L3 cache as a snoop filter. These improvements make the hex-core Opteron a more interesting choice for quad socket servers than the quad-core Opterons, even at lower clock speeds.
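The 9.6GB/s figure is simple arithmetic. HyperTransport moves data on both clock edges, so assuming a 16-bit link (2 bytes per transfer in each direction, our assumption for these server links), peak bandwidth per direction is the link clock × 2 transfers × 2 bytes. This small Python helper contrasts the 1GHz links of most quad-core Opterons with the hex-core's 2.4GHz links:

```python
def ht_bandwidth_gbs(link_clock_ghz: float, width_bits: int = 16) -> float:
    """Peak HyperTransport bandwidth in GB/s, per direction.

    HyperTransport transfers data on both clock edges (2 transfers per clock).
    The default 16-bit link width is an assumption for a typical server link.
    """
    giga_transfers = link_clock_ghz * 2   # GT/s
    bytes_per_transfer = width_bits / 8   # 16 bits -> 2 bytes
    return giga_transfers * bytes_per_transfer


if __name__ == "__main__":
    for label, clock in [("most quad-core Opterons", 1.0),
                         ("hex-core Opteron", 2.4)]:
        bw = ht_bandwidth_gbs(clock)
        print(f"{label}: {clock} GHz link -> {bw:.1f} GB/s per direction")
```

Under those assumptions the 1GHz links top out at 4GB/s per direction, well under half of what the hex-core's links offer.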

In fact, we feel that apart from the very low power Opteron 2377 EE, the quad-core Opterons are of little use. If your application scales relatively badly, the X55xx series offers much better "per thread" performance. If your application scales well, a pair of 2.6GHz Opteron 2435s will offer 15% (and sometimes more) better performance than a pair of 2.9GHz Opteron 2389s at the same power consumption. Because they use relatively "old" technology such as DDR2, the hex-core Opteron based servers are very affordable, especially compared with similar Xeon servers.
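Why scaling matters so much here can be sketched with Amdahl's law. The Python snippet below is an idealized model, not a benchmark: it assumes identical work per clock on every core (the cores are the same design), that throughput scales linearly with clock speed, and that p is the fraction of the work that runs in parallel. It compares two hex-core 2.6GHz Opteron 2435s (12 cores) with two quad-core 2.9GHz Opteron 2389s (8 cores).

```python
def amdahl_time(parallel_fraction: float, cores: int, clock_ghz: float) -> float:
    """Relative run time under Amdahl's law, scaled by clock speed.

    Assumes the same work per clock on every core, so time ~ 1 / clock.
    """
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / cores) / clock_ghz


def relative_performance(parallel_fraction: float) -> float:
    """Speed of 2 x Opteron 2435 (12 cores @ 2.6GHz) relative to
    2 x Opteron 2389 (8 cores @ 2.9GHz); above 1.0 the hex-cores win."""
    quad_core_pair = amdahl_time(parallel_fraction, cores=8, clock_ghz=2.9)
    hex_core_pair = amdahl_time(parallel_fraction, cores=12, clock_ghz=2.6)
    return quad_core_pair / hex_core_pair


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(f"parallel fraction {p:.2f}: hex-core pair is "
              f"{relative_performance(p):.2f}x the quad-core pair")
```

In this toy model the hex-core pair only pulls ahead once roughly 78% or more of the work runs in parallel, and its ceiling for a perfectly parallel load is about 1.34x. Measured gains such as the 15% quoted above sit below that ceiling because the model ignores memory, interconnect and software bottlenecks.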

The Intel Dual socket platform: pricey performance and performance/watt champion

We have already tested the new dual socket "Nehalem" Xeon platform. It is the platform with the fastest interconnects, the most threads per socket (thanks to Hyper-Threading), the most bandwidth (triple-channel) and the most modern virtualization features (Intel VT-d). Even the top models are far from power hogs: at full load, the X5570 offers an excellent performance/watt ratio. The low-power L5520 at 2.26GHz was a real champion in our performance per watt tests and is available at reasonable prices.

The relatively new platform (chipset, DDR3) is still on the expensive side: a similarly configured Dell R710 (two Xeon 5550 2.66GHz, 8 x 4GB 1066MHz DDR3) costs about one third more than a Dell R805 (two Opteron 2435, 8 x 4GB 800MHz DDR2): $5047 versus $3838 (pricing at the end of September 2009). If you choose the Xeon platform, be aware that Intel's low end is much less interesting: the best Xeon 55xx CPUs have clock speeds between 2.26 and 2.93GHz. The low end models, the 5504 and 5506, are pretty crippled, with no Hyper-Threading, no Turbo Boost, and only half as much L3 cache (4MB). These crippled CPUs can keep up with the quad-core Opterons at about 2.5GHz, but they are the worst Xeons when you look at idle and full load power. The performance per watt of the Xeon E550x is pretty bad compared to the more expensive parts.
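The "about one third" premium is easy to verify, and the same ratio tells you how much faster the Xeon server has to be before it wins on performance per dollar. This Python sketch only considers the purchase prices quoted above; power, software licences and support costs, which may change the picture, are deliberately left out.

```python
def price_ratio(price: float, baseline_price: float) -> float:
    """How many times as expensive `price` is as `baseline_price`.

    This is also the relative performance the pricier server needs just to
    match the cheaper one's performance per dollar (purchase price only).
    """
    return price / baseline_price


if __name__ == "__main__":
    r710 = 5047.0  # Dell R710, 2 x Xeon 5550, 8 x 4GB DDR3 (price quoted above)
    r805 = 3838.0  # Dell R805, 2 x Opteron 2435, 8 x 4GB DDR2 (price quoted above)
    ratio = price_ratio(r710, r805)
    print(f"the R710 costs {ratio - 1:.1%} more than the R805")
    print(f"the R710 must deliver at least {ratio:.2f}x the R805's performance "
          "to match its performance per dollar")
```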

The Intel Quad socket platform

There is no quad socket version of Intel's excellent "Xeon Nehalem" platform; we will have to wait until the Nehalem-EX servers ship at the beginning of 2010. Servers with that octal-core, 24MB L3 cache CPU will almost certainly end up in a higher price class than the current quad socket servers. One indication is that Intel positions the Nehalem-EX as a RISC market killer. Then again, Intel might also bring out quad-core versions. We will have to wait and see.

So there is no Hyper-Threading, Turbo Boost, EPT, NUMA, or fast interconnect for the current Xeon "Dunnington" platform, which is still based on a "multiple independent FSB" topology. In theory it has massive amounts of bandwidth (up to 21GB/s), but unfortunately less than 10GB/s is actually available. Snooping traffic consumes lots of bandwidth and increases the latency of cache accesses. The 16MB L3 cache should lessen the impact of the relatively slow memory subsystem, but it runs at only half the core clock speed. The result is a painful 100-cycle latency, but luckily every two cores also share a fast 3MB L2 cache.
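The 21GB/s figure is consistent with a simple peak-bandwidth calculation. The Python sketch below assumes the Dunnington platform's chipset drives four FB-DIMM channels of DDR2-667, each with an 8-byte data path; that configuration is our assumption rather than a figure from our tests. Setting the result against the roughly 10GB/s that is actually available shows that well under half of the theoretical peak is usable.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: float, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    channels       -- number of independent memory channels
    mega_transfers -- data rate per channel in MT/s (DDR2-667 is ~666.67 MT/s)
    bus_bytes      -- bytes per transfer (a 64-bit DIMM data path = 8 bytes)
    """
    return channels * mega_transfers * 1e6 * bus_bytes / 1e9


if __name__ == "__main__":
    # Assumed configuration: 4 FB-DIMM channels of DDR2-667.
    peak = peak_bandwidth_gbs(channels=4, mega_transfers=2000 / 3)
    usable = 10.0  # "less than 10GB/s is really available": an upper bound
    print(f"theoretical peak: {peak:.1f} GB/s")
    print(f"usable fraction: at most {usable / peak:.0%}")
```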

When it was first launched, the Xeon MP defeated the AMD alternatives by a good margin in ERP and heavy database loads. It reigned supreme in TPC-C and broke a few new records. More importantly, it won back 9% of the quad socket market according to the IDC Worldwide Server Tracker. But at that time, the 2.66GHz hex-core had to compete with a 2.5GHz quad-core Opteron with a paltry 2MB of shared L3, and AMD has been working hard on a comeback since. The massive Intel chip (503 mm²) now faces a competitor with three times as much L3 cache and 50% more cores than that Opteron, at higher clock speeds, and that is not all: the DDR2-800 DIMMs deliver up to 42GB/s, or four times as much bandwidth, to the four AMD chips. At the same time, the Xeon behemoth has to outpace the ultra modern dual Xeon platform by a decent margin to justify its much higher price.

32 Comments

  • Photubias - Wednesday, October 7, 2009

    This is surely to be tested, but the Fiorano platform (as this AMD chipset is called) is yet to be released.
  • solori - Wednesday, October 7, 2009

    Fiorano (SR5690/SP5100, et al) are out now for Socket-F and really require an Istanbul to show their stuff (like IOV, etc). With a minor tweak on HT bus speeds, don't expect to see much improvement in memory bandwidth for Fiorano/Socket-F pairings. Where you should see improvement is in power consumption - pairing HE/EE Istanbul parts with Fiorano/Kroner should create a better performance/watt result in virtualization.

    Collin C. MacMillan
    http://blog.solori.net
  • bpdski - Tuesday, October 6, 2009

    It is pretty amazing how fast the new 55xx chips are. Personally, I am holding out on any new server purchases and deployments until the EX systems come out next year. I am pretty excited about the performance potential of a dual or quad octal-core system. I feel for AMD, but if the EX systems scale as well as they should, they are really going to crush the Opterons.
  • duploxxx - Wednesday, October 7, 2009

    Two answers to that. First of all, looking at the design, EX will be way more expensive, creating a gap between the 2 socket and 4 socket platforms; even deploying only 2 octal-cores will be a very expensive baseline due to the motherboard layout. Too expensive actually, with a lot of focus on trying to get RISC/SPARC market share.

    Second, don't you think AMD knows this? The C32/G34 platform launch is much closer than people think. AMD made a clear roadmap, and since 45nm everything looks to be in good shape. Keep in mind the CPU for the new platform is almost ready since it is based on Istanbul, and the new platform chipset was also released a few weeks ago for the Socket F platform. You will also see much more OEM activity with this platform because there is one brand supplier; there is no longer a need for the old NVIDIA/Broadcom chipsets.

    EX was delayed, delayed, delayed; if it continues like this, it will be launched more or less at the same time, so keep that feeling. BTW, even if the 55xx series had again been a badly performing server part (which it finally is not, thank you Intel), 75% of the market would still be buying it just for the brand name... :)
  • cosminliteanu - Tuesday, October 6, 2009

    Many thanks for this article!
    :)
  • BrightCandle - Tuesday, October 6, 2009

    A dual socket will easily fit in a 1U. But 1.25A is some serious extra cost within a colo.

    The 2U quad sockets on the other hand are a busting 500W+, again serious extra money in a colo.

    Colos want you using 0.5A per 1U; there is a major mismatch between these machines and the power you can actually get. Love the speed, not liking the cost of running them.
  • sonicdeth - Tuesday, October 6, 2009

    Thanks for this. Personally I can't recommend any of the quad socket systems until we see Intel's Nehalem-EX early next year. The dual core 55xx series is just fantastic for the price (especially with VMware). We've deployed several HP 380G6's and couldn't be happier.
  • Bazili - Tuesday, October 6, 2009

    Great article. Congrats!!!

    Could you please include a software price analysis? I guess it can show huge differences between a 24-core box and an 8-core box.
  • tobrien - Tuesday, October 6, 2009

    these are amazing articles, you guys do such an awesome job with these.

    thanks a ton!
  • JohanAnandtech - Wednesday, October 7, 2009

    Thanks for the kudos! Much appreciated :-)
