The Best Server CPUs part 2: the Intel "Nehalem" Xeon X5570
by Johan De Gelas on March 30, 2009 3:00 PM EST
That means that Intel's Nehalem platform (and AMD's Shanghai/Opteron 23xx platform) has to convince people to replace their dual-core Opteron, dual-core Xeon 50xx ("Dempsey"), and Xeon "Irwindale" servers. There are two great ways to turn a much more powerful server into a moneymaking and cost-saving machine. One is to use fewer servers in a cluster, which is not applicable to all companies. The other, more popular approach is to consolidate more servers on the same physical machine by using virtualization. The most important arguments for upgrading your servers are therefore performance/watt and support for virtualization.
Intel's newest platform promises better virtualization support by adding EPT (Extended Page Tables) and lowering world switch times. However, probably the largest bottleneck in the past was the amount of available bandwidth. Bandwidth is frequently an overrated performance factor, as few applications outside the HPC world get a boost from, for example, using three instead of two memory channels. That changes dramatically when you are running tens of virtual machines on top of a physical machine: many applications with medium bandwidth demands morph into one big bandwidth-hogging monster. The challenge is thus to provide the fastest possible access to memory, lower energy consumption, and better support for virtualization. On paper, the Nehalem architecture definitely can play all those trump cards. Anand has provided a detailed description of the Nehalem architecture. The most important improvements for business applications are:
- The integrated memory controller talks to its own local memory or remote memory (NUMA). Memory access takes between 27 and 54 ns (80 to 161 cycles). Compare this to the Xeon 5450 at the same clock speed where memory access via the MC in the chipset can take up to 123 ns! The closest competitor (Opteron "Shanghai") needs between 32 and 71 ns.
- A native quad-core design with a fast 33-cycle L3 cache makes it easy for the L2 caches to exchange cache coherency information.
- Fast CPU interconnects make sure that the rest of the snoops happen very fast and do not interfere with other traffic.
- The memory controller has up to three channels. A dual CPU configuration has access to 35GB/s of memory bandwidth (measured with STREAM) if you use DDR3-1333. The latest dual Opteron achieves 19.4GB/s with DDR2-800.
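To put the latency and bandwidth figures in the list above into perspective, here is a rough back-of-the-envelope sketch. It makes several assumptions that are not stated in the list itself: a 2.93GHz core clock for the Xeon X5570, 64-bit (8 byte) memory channels, and two DDR2 channels per socket for the Opteron. The function names are purely illustrative, and the peak numbers are theoretical limits, not measurements.

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to core clock cycles."""
    # 1 GHz equals one cycle per nanosecond, so this is a plain multiplication.
    return latency_ns * clock_ghz


def peak_bandwidth_gbs(transfers_mts, channels, sockets, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes).

    Assumes 64-bit wide channels, i.e. 8 bytes per transfer.
    """
    return transfers_mts * 1e6 * bytes_per_transfer * channels * sockets / 1e9


# Assumption: the X5570 runs at 2.93GHz.
X5570_CLOCK_GHZ = 2.93
local_cycles = ns_to_cycles(27, X5570_CLOCK_GHZ)   # local memory access
remote_cycles = ns_to_cycles(54, X5570_CLOCK_GHZ)  # remote (NUMA) access

# Dual Nehalem: DDR3-1333, three channels per socket, two sockets.
nehalem_peak = peak_bandwidth_gbs(1333, channels=3, sockets=2)
# Dual Opteron: DDR2-800, assumed two channels per socket, two sockets.
opteron_peak = peak_bandwidth_gbs(800, channels=2, sockets=2)

# Share of the theoretical peak reached by the STREAM results quoted above.
nehalem_efficiency = 35.0 / nehalem_peak
opteron_efficiency = 19.4 / opteron_peak

print(f"Nehalem latency: {local_cycles:.0f} to {remote_cycles:.0f} cycles")
print(f"Nehalem peak: {nehalem_peak:.1f} GB/s, STREAM reaches {nehalem_efficiency:.0%}")
print(f"Opteron peak: {opteron_peak:.1f} GB/s, STREAM reaches {opteron_efficiency:.0%}")
```

Under these assumptions the cycle counts land close to the 80 to 161 cycles quoted above, with the small difference most likely coming from rounding of the nanosecond figures. The bandwidth part suggests that the Opteron gets closer to its theoretical limit, while the Nehalem platform delivers far more absolute bandwidth thanks to the third channel and faster DDR3.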
Basically, Nehalem is Intel's version of the improvements found in the AMD Barcelona platform, only better (or at least that's the goal). Let's see what it can do in reality.