The new Xeon 5600 series, code-named “Westmere,” has arrived: essentially an improved 32nm version of the impressive “Nehalem” Xeon 5500 series. The new Xeon won’t make a big splash like the Xeon 5500 series did back in March 2009, but who cares? Each core in the Xeon 5600 is a bit faster than in its already excellent older brother, and you get a bonus on top. Take your pick: in the same power envelope you get two extra cores or a 5-10% higher clock speed. Or, if you keep the number of cores and the clock speed constant, you get lower power consumption. The thriftiest quad-core Xeon is now specced at a 40W TDP instead of 60W.

westmeredie.png
The Westmere die: an enlarged Nehalem. Trivia: notice the unused space at the top left.

Intel promises up to 40% better performance or up to 30% lower power. The Xeon 5600 can use the same servers and motherboards as the Xeon 5500 after a BIOS update, making the latter almost redundant. Promising, but nothing beats some robust independent benchmarking to check the claims.

So we plugged the Westmere EP CPUs into our ASUS server and started to work on a new server CPU comparison. Only one real problem: our two Xeon X5670s together are good for 12 cores and 24 simultaneous threads. Few applications can cope with that, so we shifted our focus even more towards virtualization. We added Hyper-V to our benchmark suite, hopefully an answer to the suggestion that we should concentrate on virtualization platforms other than VMware. For those of you looking for open-source benchmarks, we will follow up with those in April.

Platform Improvements

Westmere is more than just a die-shrunk Nehalem. In this review we're taking a look at the Xeon X5670 at 2.93GHz, the successor to the 2.93GHz Xeon X5570.

wmfull.png

The most obvious improvement is that the X5670 comes with six instead of four cores, and a 12MB L3 cache instead of an 8MB cache. But there are quite a few more subtle tweaks under the hood:

  • Virtualization: VMexit latency reductions
  • Power management: an “uncore” power gate and support for low-power DDR3
  • TLB improvements: Address Space IDs (ASIDs) and 1GB pages
  • Yet another addition to the already incredibly crowded x86 ISA: AES-NI
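Why do 1GB pages matter? A TLB can only map a fixed number of pages before misses force page-table walks, so larger pages dramatically extend its "reach." A back-of-the-envelope sketch (the entry counts below are illustrative assumptions, not Westmere's actual TLB geometry):

```python
# Back-of-the-envelope TLB "reach" calculation: how much memory a TLB
# can map without missing, for different page sizes. The entry counts
# are illustrative only.

KB, GB = 1024, 1024**3

def tlb_reach(entries: int, page_size: int) -> int:
    """Total memory covered by a fully populated TLB."""
    return entries * page_size

# 64 entries of 4 KB pages cover only 256 KB of address space...
small_pages = tlb_reach(64, 4 * KB)
# ...while even 4 entries of 1 GB pages cover 4 GB.
huge_pages = tlb_reach(4, 1 * GB)

print(f"4 KB pages: {small_pages // KB} KB of reach")
print(f"1 GB pages: {huge_pages // GB} GB of reach")
```

For workloads with large working sets (databases, hypervisors mapping guest memory), that difference translates directly into fewer TLB misses.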

Just a few years ago, many ESX-based servers used binary translation to virtualize their VMs. Binary translation used clever techniques to avoid transitions to the hypervisor. In the case of the Pentium 4 Xeons, using software instead of hardware virtualization was even considered a best practice. As we explained earlier in “Hardware virtualization: the nuts and bolts,” hardware virtualization can be faster than software virtualization as long as VM-to-hypervisor transitions happen quickly. The new Xeon 5600 “Westmere” handles these transitions about 12% faster than Nehalem.


vmexit_wm.png

Pretty impressive, if you consider that this makes Westmere switch between hypervisor and VM twice as fast as the Xeon 5400 series (based on the Penryn architecture), which was itself fast. As VM-hypervisor-VM transitions make up a shrinking share of total hypervisor overhead, however, we don’t expect to see huge gains: that overhead is probably already dominated by other factors, such as emulating I/O operations.
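The reasoning above is essentially Amdahl's law: only the transition share of hypervisor overhead benefits from faster VM exits. A quick sketch (the 20% transition share is a purely illustrative assumption; the 12% figure is from the text):

```python
def overall_speedup(transition_share: float, transition_speedup: float) -> float:
    """Amdahl-style estimate: only the VM-exit/entry share of hypervisor
    overhead benefits when transitions get faster."""
    new_time = (1 - transition_share) + transition_share / transition_speedup
    return 1 / new_time

# If transitions were, say, 20% of hypervisor overhead (illustrative),
# making them 12% faster (a 1.12x speedup on that share) yields only a
# ~2% reduction in total overhead.
print(round(overall_speedup(0.20, 1.12), 3))
```

Even if transitions were the *entire* overhead, the ceiling would be the full 12%; at realistic shares the gain shrinks quickly, which is why we don't expect huge overall wins from this tweak alone.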

The Xeon 3400 “Lynnfield” was the first to get an uncore power gate (covering primarily the L3 cache). An uncore power gate reduces leakage power to a minimum when the whole CPU is in a deep sleep state. In typical server conditions, we don’t think this will happen often: shutting down the uncore means, after all, that all your cores (even those on the other CPU) must be sleeping too. If even one core is the slightest bit active, the L3 cache and memory controller must keep working. For your information, we discussed server power management, including power gating, in detail here.

The fact that Westmere's memory controller supports low-power DDR3 might have a much larger impact on your server’s power consumption. In a server with 32GB or more of memory, it is not uncommon for the RAM to account for about a quarter of total server power consumption. Moving to 40nm low-power DDR3 drops the DRAM voltage from 1.5V to 1.35V, which can make a big dent in that quarter.

Samsung_lowpowerddr3.png

According to Samsung, 48GB of 40nm low-power DDR3-1066 should use on average about 28W (averaged over 16 hours idle and 8 hours under load per day). This compares favorably with the 66W of early 60nm DDR3 and the roughly 50W of the currently popular 50nm-based DRAM. So in a typical server configuration, you could save (roughly estimated) 22W, or about 10% of total server power consumption.
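The arithmetic behind that estimate is straightforward; here is a minimal sketch, assuming a roughly 220W server (so the 50W of 50nm DDR3 is about the quarter of total power mentioned above):

```python
def dram_savings(server_watts: float, old_dram_watts: float,
                 new_dram_watts: float) -> tuple[float, float]:
    """Return (watts saved, fraction of total server power saved)
    when swapping one DRAM configuration for another."""
    saved = old_dram_watts - new_dram_watts
    return saved, saved / server_watts

# 50 W of 50nm DDR3 replaced by ~28 W of 40nm low-power DDR3
# in an assumed ~220 W server:
saved_w, fraction = dram_savings(220, 50, 28)
print(f"saved {saved_w:.0f} W, about {fraction:.0%} of server power")
```

The 220W figure is our assumption for illustration; the 50W and 28W DRAM numbers are Samsung's, per the text.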

AMD has confirmed more than once that it would not adopt DDR3 before low-power DDR3 was available, so we expect low-power DDR3 to become quite popular.

There is more: the Xeon 5600 also supports more memory and higher clock speeds. You can now run two DIMMs per channel at 1333MHz, where the Xeon 5500 would throttle back to 1066MHz. The Xeon 5500 was also limited to 12 x 16GB, or 192GB. If you have very deep pockets, you can now cram 18 of those ultra-expensive DIMMs in there, good for 288GB of DDR3-1066!

Deeper buffers allow Westmere's memory controller to be more efficient: a dual Xeon X5670 reaches 43GB/s, while the older X5570 was stuck at 35GB/s with DDR3-1333. That will make the X5670 quite a bit faster than its older brother in bandwidth-intensive HPC software.
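Put in relative terms, those stream numbers work out to a sizable generational gain:

```python
def pct_gain(new_gbps: float, old_gbps: float) -> float:
    """Percentage bandwidth improvement of new over old."""
    return (new_gbps - old_gbps) / old_gbps * 100

# Dual X5670 (43 GB/s) vs. dual X5570 (35 GB/s), figures from the text:
print(f"{pct_gain(43, 35):.0f}% more memory bandwidth")
```

Roughly a 23% bandwidth increase, which is why bandwidth-bound HPC codes should see gains well beyond the modest clock-for-clock core improvements.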

Exotic Improvements and SKUs

  • landerf - Tuesday, March 16, 2010 - link

    I always wish with these server/workstation part reviews that we could get a gaming page just for kicks. Specifically in this case I'm thinking of the upcoming dual socket EVGA board and if it will have any effect on games or if it will be only synthetics that show a benefit. I'd also like to see a modern workstation card vs its mainstream counterpart to see if the gaming performance gap has gotten smaller or larger over the years. I think recently there's been a push to make workstation cards do better in 3D games so you can test your work on the same rig, cutting back on the number of systems.
  • GeorgeH - Tuesday, March 16, 2010 - link

    I'd also be curious to see the E5620 overclocked in a consumer board, as its ~$400 price fills the hole between the ~$300 i7-920/930 and the ~$600 i7-950 rather nicely.

    Intel's PR people would probably get pissed, but screw 'em.
  • jonup - Tuesday, March 16, 2010 - link

    I was about to post the same. There are a lot of people using Xeons in X58 and P55 boards. Some prefer the lower power consumption; others believe the Xeons overclock better. Please show us the money!
  • DigitalFreak - Tuesday, March 16, 2010 - link

    You do realize that the 55xx/56xx series Xeons only work in dual socket motherboards?!?
  • GeorgeH - Tuesday, March 16, 2010 - link

    I think you've got that backwards. A dual socket motherboard needs 5-series chips, but a 5-series should work in a single socket board just fine. In general it'd be silly to run only one (a 2.66GHz W3520 costs ~$300 while a 2.66GHz X5550 costs ~$1000) but if the cheapest 32nm LGA-1366 chip is a 5-series Xeon it might be worth it.
  • jonup - Wednesday, March 17, 2010 - link

    But you can get an E5520 @ 2.26GHz for $390 and get a faster QPI.
  • greylica - Tuesday, March 16, 2010 - link

    Blender 3D 2.50, in its Alpha 2 stage, supports 64 simultaneous threads, and it's not hard to make benchmarks. I miss Blender 3D benchmarks at every processor launch; what happened to the ''Blender 3D Character benchmarks''?
    Blender can extract blood from those ''beasts''...
  • JohanAnandtech - Wednesday, March 17, 2010 - link

    I have indeed heard more than once that Blender is getting really popular. "Alpha 2" does not sound like the software is very stable. Any suggestion as to what kind of scene I should use? The scene choice is very important, as the parallel rendering part must be long enough compared to some of the serial parts in the process. You can mail me at johan@anandtech.com if you like. I am open to suggestions.
  • MySchizoBuddy - Tuesday, March 16, 2010 - link

    Also add HPC-related benchmarks.
