The Uncore Power of the Nehalem EX

Feeding an eight-core CPU is no easy feat. You cannot just add a bit of cache and call it a day, so Intel paid a lot of attention to the uncore. When you need to feed eight fast cores, L3 cache bandwidth is critical. Intel used a system of two 32-byte-wide counter-rotating rings and eight separate 3MB banks so that the L3 cache can deliver up to 200GB/s at a low 21ns load-to-use latency. The last level cache also acts as a snoop filter, so that cache coherency traffic does not kill performance scaling.

An 8-port router regulates the traffic between the memory controllers, the caches, and the QPI links. It adds 18ns of latency and should in theory be capable of routing 120GB/s. Each memory controller has two serial interfaces (SMIs) that work in lockstep and connect to memory buffers (SMBs), which perform the same tasks as the AMBs on FB-DIMMs (Fully Buffered DIMMs). The DIMMs send their bits out in parallel (64 per DIMM), and the memory buffer has to serialize the data before sending it to the memory controller. This allows Intel to give each CPU four memory channels. If the memory interface weren't serial, the boards would be incredibly complex, as hundreds of parallel lines would be necessary.

Each SMI can deliver 6.4GB/s in each direction (full duplex), or 12.8GB/s of total bandwidth. Each SMB has two DDR3-1066 memory channels, which together can deliver 17GB/s half duplex (8.5GB/s per channel). To transform this 17GB/s half duplex data stream into a 6.4GB/s full duplex stream, the SMB needs at most about 10W (TDP). In practice, each SMB has to dissipate about 7W, hence the small black fans that you will see on the Dell motherboard later.
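The bandwidth figures above can be checked with a few lines of arithmetic. The following is only a rough sketch: the constant and function names are made up for illustration, the 8 bytes per transfer follows from the 64 bits per DIMM mentioned earlier, and treating the "four memory channels" per CPU as four SMI links is an assumption based on the description above.

```python
# Back-of-the-envelope memory bandwidth figures for one SMB and one CPU.
# Names are illustrative only; the numbers come from the article text.

DDR3_TRANSFERS_PER_SEC = 1066e6   # DDR3-1066: roughly 1066 million transfers/s
BYTES_PER_TRANSFER = 8            # 64 data bits per DIMM, sent in parallel
DDR_CHANNELS_PER_SMB = 2          # each SMB drives two DDR3 channels
SMI_GBPS_PER_DIRECTION = 6.4      # each SMI link, per direction
SMI_LINKS_PER_CPU = 4             # assumption: the "four memory channels" per CPU


def ddr_side_gbps():
    """Peak half duplex bandwidth on the DDR3 side of one SMB, in GB/s."""
    per_channel = DDR3_TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9
    return per_channel * DDR_CHANNELS_PER_SMB


def smi_total_gbps():
    """Combined read plus write bandwidth of one full duplex SMI link."""
    return 2 * SMI_GBPS_PER_DIRECTION


def cpu_smi_gbps_per_direction():
    """Peak SMI bandwidth per CPU in one direction (for example, reads only)."""
    return SMI_LINKS_PER_CPU * SMI_GBPS_PER_DIRECTION


if __name__ == "__main__":
    print(f"DDR3 side of one SMB: {ddr_side_gbps():.1f} GB/s half duplex")
    print(f"One SMI link:         {smi_total_gbps():.1f} GB/s total")
    print(f"SMI per CPU:          {cpu_smi_gbps_per_direction():.1f} GB/s per direction")
```

These are theoretical peaks; sustained bandwidth will be lower once protocol overhead and the read/write mix are taken into account.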

So each CPU has two memory controllers, each of which connects to two SMBs, and each SMB can drive two DDR3 channels with two DIMMs per channel. Thus, each CPU supports 16 registered DDR3-1066 DIMMs. Limiting each DDR channel to two DIMMs lets the system support quad-rank DIMMs. In total, a carefully designed quad-Xeon 7500 server can contain up to 64 DIMMs. As each DIMM can be a quad-ranked 16GB DIMM, a quad-CPU configuration can contain up to 1TB of RAM. So Intel's Nehalem EX platform offers high bandwidth and enormous memory capacity. The flip side of the coin is increased latency and, relative to the total system, a bit of power consumed by the SMBs.
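The DIMM count and capacity can be sanity-checked the same way. This sketch assumes the topology is read as two memory controllers per CPU, two SMBs per controller, two DDR3 channels per SMB, and two DIMMs per channel, which is the reading that yields the 64-DIMM total for four sockets. All names are hypothetical.

```python
# Memory topology arithmetic for a quad-socket Xeon 7500 server.
# The topology below is an assumed reading of the article; names are illustrative.

MEMORY_CONTROLLERS_PER_CPU = 2
SMBS_PER_CONTROLLER = 2      # one SMB per SMI link
DDR_CHANNELS_PER_SMB = 2
DIMMS_PER_CHANNEL = 2
GB_PER_DIMM = 16             # quad-rank 16GB DIMMs


def dimms_per_cpu():
    """Number of DIMM slots a single CPU can address."""
    return (MEMORY_CONTROLLERS_PER_CPU * SMBS_PER_CONTROLLER
            * DDR_CHANNELS_PER_SMB * DIMMS_PER_CHANNEL)


def total_dimms(sockets):
    """Number of DIMM slots in a server with the given socket count."""
    return sockets * dimms_per_cpu()


def total_capacity_gb(sockets):
    """Maximum RAM in GB when every slot holds a 16GB DIMM."""
    return total_dimms(sockets) * GB_PER_DIMM


if __name__ == "__main__":
    sockets = 4
    print(f"DIMMs per CPU: {dimms_per_cpu()}")
    print(f"Total DIMMs:   {total_dimms(sockets)}")
    print(f"Max capacity:  {total_capacity_gb(sockets)} GB")
```

With four sockets this works out to 64 DIMMs and 1024GB, which matches the 1TB figure.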




  • klstay - Thursday, April 15, 2010 - link

    I agree. Being able to use all the DIMM slots in the R810 with only half the CPU sockets populated is a neat trick, and I do like having up to 16 drive bays in the R910, but overall the latest IBM 3850 is much more flexible than either of those systems. It scales from a 2-socket system with 4 cores each and 32GB RAM up to an 8-socket system with 8 cores each and 3TB RAM. Barring some big surprises at HP's announcement in a couple of weeks, IBM will be the one to beat in Nehalem EX for the foreseeable future.
  • Etern205 - Thursday, April 15, 2010 - link

    The AMD Opteron 6128 isn't $523.
    It's $299.99!

    (credited to: zpdixon @ DT for providing the link)
  • yuhong - Tuesday, June 15, 2010 - link

    "but when a dual-CPU configuration outperforms quad-CPU configurations of your top-of-the-line CPU, something is wrong. "
    Remember Xeon 7100 vs Xeon 5300?
