Expensive Quad Sockets vs. Ubiquitous Dual Sockets
by Johan De Gelas on October 6, 2009 1:00 AM EST
AMD's dual and quad platform: consistency
AMD's PR is making a lot of noise about consistency, and rightly so. The quad socket and dual socket processors are - besides the obviously different multiprocessor capabilities - exactly the same. In the case of virtualization, this allows you to optimize your virtual machines and hypervisors once and then clone them as often as you like. There are fewer worries when moving virtual machines around, and there is no fiddling with masking processor capabilities. This is also well illustrated when you check which mode the VMware ESX virtual machines run in. The table is pretty simple when you look at VMs running on top of an AMD processor: virtual machines on dual-core Opterons run with software virtualization, while those on the quad-cores will almost always run in the fastest mode (hardware virtualization combined with hardware assisted paging). The same is true for Hyper-V: it won't run on the dual-core Opterons and it will run at full speed on the quad-cores. It is remarkably simple compared to the complete mess Intel made: some of the old Pentium 4 based CPUs support VT-x, some don't. Some of the lower end Xeons launched in 2007 and 2008 don't either, and so on.
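To see which of these features a given host actually exposes, the CPU flags can be inspected. The sketch below parses text in the format of Linux's /proc/cpuinfo, where `vmx`/`svm` indicate Intel VT-x/AMD-V and `ept`/`npt` indicate hardware assisted paging. The function names and the sample strings are hypothetical, and the mode mapping is only a simplified illustration of the modes named above, not VMware's actual selection logic.

```python
def virtualization_support(cpuinfo_text):
    """Report which hardware virtualization features the CPU flags advertise."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            # Lines look like "flags : fpu vme ..."; keep what follows the colon.
            flags.update(line.split(":", 1)[1].split())
    return {
        "hardware_virtualization": bool(flags & {"vmx", "svm"}),   # VT-x / AMD-V
        "hardware_assisted_paging": bool(flags & {"ept", "npt"}),  # EPT / nested paging
    }


def expected_mode(support):
    """Simplified, illustrative mapping from features to an execution mode."""
    if support["hardware_virtualization"] and support["hardware_assisted_paging"]:
        return "hardware virtualization + hardware assisted paging"
    if support["hardware_virtualization"]:
        return "hardware virtualization"
    return "software virtualization"


# Hypothetical flag lines, not dumps from real CPUs.
SAMPLE_WITH_PAGING = "flags\t\t: fpu vme svm npt lm\n"
SAMPLE_WITHOUT = "flags\t\t: fpu vme lm\n"

if __name__ == "__main__":
    for sample in (SAMPLE_WITH_PAGING, SAMPLE_WITHOUT):
        print(expected_mode(virtualization_support(sample)))
```

On a real Linux host, the same function could be fed the contents of /proc/cpuinfo instead of the sample strings.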
There is some inconsistency on HyperTransport and L3 cache speeds, but those will only cause small performance variations and no software management troubles. Of course, AMD's very consistent dual and quad socket platform is not without flaws either. The NVIDIA MCP55 Pro chipset was at times pretty quirky when installing new virtualization software. Most of the time, a patch took care of that, and the Opteron servers were running rock solid afterwards, but in the meantime a lot of valuable time was wasted. Also, the current platform has not evolved for years and is starting to show its age: we found out that the motherboards consume a bit more power than they should. In 2010, all Opteron server platforms will use AMD chipsets only.
The core part of the new hex-core Opteron is identical to that of the quad-core, but the "uncore" part has some improvements. With the exception of the 2.8GHz 2387/8387 and 2.9GHz 2389/8389, most quad-core Opterons still connect with 1GHz HyperTransport links. The hex-core Opterons cover a range of core clock speeds (the Opteron 2435, for example, runs at 2.6GHz), but they always connect to the other CPUs in the server via 2.4GHz HyperTransport links. That makes little difference in a 2P server, but in 4P servers performance quickly gets limited by interconnect speed. Even at 2.4GHz (9.6GB/s interconnect), probe broadcasting can limit performance, and that is why you can reserve up to 1MB of cache for a snoop filter. These improvements make the hex-core Opteron a more interesting choice than the quad-core Opterons - even at lower clock speeds - for quad socket servers.
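The 9.6GB/s figure follows from the link clock. As a rough sketch, assuming a 16-bit wide link, two transfers per clock (double data rate) and bandwidth counted per direction, the arithmetic looks like this; the function name and the default link width are illustrative choices, not taken from the article:

```python
def ht_link_bandwidth_gbs(clock_ghz, link_width_bits=16):
    """Peak HyperTransport bandwidth per direction, in GB/s.

    Assumes double data rate signalling (two transfers per clock)
    and a 16-bit link unless another width is given.
    """
    transfers_per_second_g = clock_ghz * 2          # GT/s
    return transfers_per_second_g * link_width_bits / 8


if __name__ == "__main__":
    for clock in (1.0, 2.4):
        print(f"{clock}GHz link: {ht_link_bandwidth_gbs(clock):.1f} GB/s per direction")
```

Under the same assumptions, the 1GHz links of most quad-core Opterons peak at 4GB/s per direction, less than half of what the hex-core parts get.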
In fact, we feel that besides the very low power Opteron 2377 EE, the quad-core Opterons are of little use. If your application scales relatively badly, there is the Xeon X55xx series, which offers much better "per thread" performance. If your application scales well, two 2.6GHz Opteron 2435s will offer 15% better (and sometimes more) performance than two 2.9GHz Opteron 2389s at the same power consumption. Because they use relatively "old" technology such as DDR2, the hex-core Opteron based servers are very affordable, especially if you compare them with similar Xeon servers.
The Intel Dual socket platform: pricey performance and performance/watt champion
We have already tested the new dual socket "Nehalem" Xeon platform. It is the platform with the fastest interconnects, the most threads per socket (thanks to Hyper-Threading), the most bandwidth (triple-channel) and the most modern virtualization features (Intel VT-d). Even the top models are far from power hogs: at full load, the X5570 offers an excellent performance/watt ratio. The low-power L5520 at 2.26GHz was a real champion in our performance per watt tests and is available at reasonable prices.
The relatively new platform (chipset, DDR3) is still on the expensive side: a similarly configured Dell R710 (two Xeon 5550 2.66GHz, 8 x 4GB 1066MHz DDR3) costs about one third more than a Dell R805 (two Opteron 2435, 8 x 4GB 800MHz DDR2): $5047 versus $3838 (pricing at the end of September 2009). If you choose the Xeon platform, you should be aware that Intel's low end is much less interesting: the best Xeon 55xx CPUs have a clock speed between 2.26 and 2.93GHz. The low end models, the 5504 and 5506, are pretty crippled, with no Hyper-Threading, no Turbo Boost, and only half as much L3 cache (4MB). These crippled CPUs can keep up with the quad-core Opterons at about 2.5GHz, but they are the worst Xeons when you look at idle and full load power. The performance per watt of the Xeon E550x is pretty bad compared to the more expensive parts.
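The "about one third more" figure can be checked directly from the two quoted prices. The short sketch below does just that; the function name is a hypothetical helper, and the prices are the end of September 2009 figures quoted above:

```python
def price_premium(price, baseline):
    """Fraction by which `price` exceeds `baseline` (0.25 means 25% more)."""
    return (price - baseline) / baseline


DELL_R710_XEON = 5047     # two Xeon 5550, 8 x 4GB DDR3, USD
DELL_R805_OPTERON = 3838  # two Opteron 2435, 8 x 4GB DDR2, USD

if __name__ == "__main__":
    premium = price_premium(DELL_R710_XEON, DELL_R805_OPTERON)
    print(f"Xeon server premium: {premium * 100:.1f}%")  # prints 31.5%
```

A premium of roughly 31.5% is what the text rounds to "about one third".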
The Intel Quad socket platform
There is no quad socket version of Intel's excellent "Xeon Nehalem" platform. We will have to wait until the Nehalem-EX servers ship at the beginning of 2010. At that time, servers with the octal-core 24MB L3 cache CPU will almost certainly end up in a higher price class than the current quad socket servers. One indication is that Intel positions the Nehalem-EX as a RISC market killer. Then again, Intel might well bring out quad-core versions too. We will have to wait and see.
So there's no Hyper-Threading, Turbo Boost, EPT, NUMA, or fast interconnects for the current Xeon "Dunnington" platform, which is still based on a "multi independent FSB" topology. It has massive amounts of bandwidth in theory (up to 21GB/s), but unfortunately less than 10GB/s is really available. Snooping traffic consumes lots of bandwidth and increases the latency of cache accesses. The 16MB L3 cache should lessen the impact of the relatively slow memory subsystem, but it is only clocked at half the clock speed of the core. A painful 100 cycle latency is the result, but luckily every two cores also share a fast 3MB L2 cache.
When it was first launched, the Xeon MP defeated the AMD alternatives by a good margin in ERP and heavy database loads. It reigned supreme in TPC-C and broke a few records. More importantly, it took back 9% of market share in the quad socket market according to the IDC Worldwide Server Tracker. But at that time, the 2.66GHz hex-core had to compete with a 2.5GHz quad-core Opteron with a paltry 2MB of shared L3, and AMD has been working hard on a comeback. The massive Intel chip (503 mm²) now has to face a competitor with three times as much L3 cache and 50% more cores than that earlier Opteron, at higher clock speeds, and that is not all: the DDR2-800 DIMMs deliver up to 42GB/s to the four AMD chips, or four times as much bandwidth as the Xeon platform gets. At the same time, the Xeon behemoth has to outpace the ultra modern dual Xeon platform by a decent margin to justify its much higher price.