High-End x86: The Nehalem EX Xeon 7500 and Dell R810
by Johan De Gelas on April 12, 2010 6:00 PM EST
The Uncore Power of the Nehalem EX
Feeding an octal-core is no easy feat. You cannot just add a bit of cache and call it a day, so Intel paid a lot of attention to the uncore. When you need to feed eight fast cores, L3 cache bandwidth is critical. Intel used a 32-byte-wide dual counter-rotating ring system and eight separate 3MB banks (24MB in total) to make sure that the L3 cache can deliver up to 200GB/s at a low 21ns load-to-use latency. The last level cache also acts as a snoop filter, so that cache coherency traffic does not kill performance scaling.
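Why use two counter-rotating rings instead of one? A toy hop-count model makes the benefit visible. This is a simplified sketch under hypothetical assumptions: 8 equally spaced ring stops (one per L3 bank), one hop per step, and traffic on the dual ring always taking the shorter direction. It does not model Intel's actual ring stops, arbitration, or per-hop timing.

```python
# Toy hop-count model: one unidirectional ring vs. two counter-rotating rings.
# Hypothetical simplification: 8 equally spaced ring stops, one hop per step,
# and traffic on the dual ring always takes the shorter direction.
# Real ring stops, arbitration and per-hop timing are NOT modeled.

RING_STOPS = 8     # hypothetical: one stop per L3 bank
L3_BANK_MB = 3     # from the text: eight 3MB banks


def hops_single_ring(src: int, dst: int, stops: int = RING_STOPS) -> int:
    """Hops from src to dst when traffic can only flow clockwise."""
    return (dst - src) % stops


def hops_dual_ring(src: int, dst: int, stops: int = RING_STOPS) -> int:
    """Hops when traffic may use either of two counter-rotating rings."""
    clockwise = (dst - src) % stops
    counter_clockwise = (src - dst) % stops
    return min(clockwise, counter_clockwise)


def worst_and_average(hop_fn, stops: int = RING_STOPS) -> tuple[int, float]:
    """Worst-case and mean hop count over all distinct (src, dst) pairs."""
    dists = [hop_fn(s, d, stops) for s in range(stops) for d in range(stops) if s != d]
    return max(dists), sum(dists) / len(dists)


if __name__ == "__main__":
    print("Total L3:", RING_STOPS * L3_BANK_MB, "MB")
    print("Single ring (worst, mean):", worst_and_average(hops_single_ring))
    print("Dual ring   (worst, mean):", worst_and_average(hops_dual_ring))
```

In this model the worst case falls from 7 hops to 4 and the mean from 4.0 to about 2.3, which is the intuition behind running the rings in opposite directions.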
An 8-port router regulates the traffic between the memory controllers, the caches, and the QPI links. It adds 18ns of latency and should in theory be capable of routing 120GB/s. Each memory controller has two serial interfaces (SMIs) working in lockstep, each linked to a memory buffer (SMB) that performs the same task as the AMB on an FB-DIMM (Fully Buffered DIMM). The DIMMs send out their bits in parallel (64 per DIMM), and the memory buffer has to serialize the data before sending it on to the memory controller. This allows Intel to give each CPU four memory channels. If the memory interface weren't serial, the boards would be incredibly complex, as hundreds of parallel lines would be necessary.
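To put "hundreds of parallel lines" into numbers, here is a rough back-of-the-envelope count of the DRAM data lines a board would need if every DDR3 channel were routed straight to the CPU socket. It assumes 64 data bits per channel and ignores ECC, address, command, clock, and control signals, so real counts would be higher. The constant names are illustrative only.

```python
# Back-of-the-envelope count of DRAM data lines per CPU if every DDR3
# channel were routed directly to the processor socket.
# Assumption: 64 data bits per channel only; ECC, address, command,
# clock and control signals are ignored, so real counts are higher.

DATA_BITS_PER_CHANNEL = 64
SERIAL_LINKS_PER_CPU = 4       # "four memory channels" per CPU (text)
DDR3_CHANNELS_PER_BUFFER = 2   # each SMB drives two DDR3 channels


def parallel_data_lines(cpus: int) -> int:
    """Data lines needed if all DDR3 channels were wired to the CPUs in parallel."""
    ddr3_channels = cpus * SERIAL_LINKS_PER_CPU * DDR3_CHANNELS_PER_BUFFER
    return ddr3_channels * DATA_BITS_PER_CHANNEL


for cpus in (1, 4):
    print(f"{cpus} CPU(s): {parallel_data_lines(cpus)} parallel data lines")
```

Even this lower bound gives 512 data lines per CPU and 2048 for a quad-socket board. With serial links, the wide parallel DDR3 buses only have to span the short distance between each SMB and its own DIMMs.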
Each SMI can deliver 6.4GB/s in each direction (full duplex), or 12.8GB/s of total bandwidth. Each SMB drives two DDR3-1066 memory channels, which together deliver 17GB/s half duplex. Converting this 17GB/s half-duplex data stream into a 6.4GB/s full-duplex one costs the SMB up to about 10W (its TDP). In practice, each SMB dissipates about 7W, hence the small black fans you will see on the Dell motherboard later.
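The bandwidth figures above follow from simple peak-rate arithmetic. The sketch below reproduces them, assuming a nominal DDR3-1066 rate of 1066 million transfers per second and a 64-bit (8-byte) data bus per channel, and ignoring protocol overhead and refresh. The four-SMB-per-CPU count is taken from the "four memory channels" per CPU mentioned earlier.

```python
# Peak-bandwidth bookkeeping for one SMB, using the figures quoted in the text.
# Assumption: nominal DDR3-1066 rate of 1066 MT/s and a 64-bit (8-byte) data
# bus per channel; protocol overhead and refresh are ignored.

DDR3_TRANSFERS_PER_S = 1066e6
BYTES_PER_TRANSFER = 8
DDR3_CHANNELS_PER_SMB = 2
SMI_GB_PER_S_EACH_WAY = 6.4
SMBS_PER_CPU = 4

# DRAM side: reads and writes share the same bus (half duplex).
ddr3_gb_per_channel = DDR3_TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e9
ddr3_gb_per_smb = ddr3_gb_per_channel * DDR3_CHANNELS_PER_SMB

# Link side: traffic flows both ways at once (full duplex).
smi_gb_total = 2 * SMI_GB_PER_S_EACH_WAY

print(f"DDR3 per channel : {ddr3_gb_per_channel:.1f} GB/s")
print(f"DDR3 per SMB     : {ddr3_gb_per_smb:.1f} GB/s (half duplex)")
print(f"SMI per link     : {smi_gb_total:.1f} GB/s (6.4 each way)")
print(f"SMI per CPU      : {SMBS_PER_CPU * smi_gb_total:.1f} GB/s over four links")
```

On these peak numbers, each SMB sits between roughly 17GB/s of shared DRAM bandwidth and 12.8GB/s of bidirectional link bandwidth, which is the conversion it spends its power budget on.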
So each CPU has two memory controllers, each with two serial interfaces that connect to their own SMB, and each SMB drives two DDR3 channels with up to two DIMMs per channel. Thus, each CPU supports up to 16 registered DDR3-1066 DIMMs. Limiting each DDR channel to two DIMMs allows the system to support quad-rank DIMMs. In total, a carefully designed quad-Xeon 7500 server can contain up to 64 DIMMs. As each DIMM can be a quad-rank 16GB DIMM, a quad-CPU configuration can hold up to 1TB of RAM. So Intel's Nehalem EX platform offers high bandwidth and enormous memory capacity. The flip side of the coin is increased latency and, relative to the total system, a bit of extra power consumed by the SMBs.
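The 64-DIMM and 1TB figures can be checked by multiplying out the memory topology: two memory controllers per CPU, two SMBs per controller, two DDR3 channels per SMB, and two DIMMs per channel, which works out to 16 DIMMs per CPU. The sketch assumes every slot is populated with the largest DIMM mentioned in the text, a quad-rank 16GB RDIMM.

```python
# Memory-capacity arithmetic for a fully populated quad-socket system,
# multiplying out the topology described in the text. Assumes every slot
# holds the largest DIMM mentioned there (quad-rank 16GB RDIMM).

SOCKETS = 4
MEMORY_CONTROLLERS_PER_CPU = 2
SMBS_PER_CONTROLLER = 2        # one SMB per serial interface
DDR3_CHANNELS_PER_SMB = 2
DIMMS_PER_CHANNEL = 2          # the limit that permits quad-rank DIMMs
DIMM_SIZE_GB = 16

dimms_per_cpu = (MEMORY_CONTROLLERS_PER_CPU * SMBS_PER_CONTROLLER
                 * DDR3_CHANNELS_PER_SMB * DIMMS_PER_CHANNEL)
total_dimms = SOCKETS * dimms_per_cpu
total_gb = total_dimms * DIMM_SIZE_GB

print(f"{dimms_per_cpu} DIMMs per CPU, {total_dimms} DIMMs total, "
      f"{total_gb} GB ({total_gb // 1024} TB)")
```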