Understanding Nehalem's Memory Architecture

Nehalem spices things up a bit in the memory department. Not only does it have an integrated memory controller (a first for an x86 Intel CPU), but that controller also has an unusual three-channel configuration. All other AMD and Intel systems use dual-channel DDR2 or DDR3 memory controllers; with each channel being 64 bits wide, you have to install memory in pairs for peak performance.

With a three-channel DDR3 memory controller, Nehalem requires three DDR3 modules to achieve peak bandwidth, which also means that memory manufacturers are going to be selling special 3-channel DDR3 kits made specifically for Nehalem. Motherboard makers will be doing one of two things to implement Nehalem's three-channel memory interface: you'll either see boards with four DIMM slots or boards with six.


Four DDR3 slots, three DDR3 channels

In the four-slot configuration the first three slots correspond to the three channels, and the fourth slot simply shares one of those channels. The downside to this approach is that bandwidth to the last part of your memory drops to single-channel performance once you fill all four slots. For example, if you have 4 x 1GB sticks, the first 3GB of memory will be interleaved between the three memory channels and you'll get 25.6GB/s of bandwidth to data stored in the first 3GB. The final 1GB, however, won't be interleaved and you'll only get 8.5GB/s of bandwidth to it. Despite the unbalanced nature of memory bandwidth in this case, your aggregate bandwidth is still greater in this configuration than in a dual-channel setup.

 


Six DDR3 slots, two slots per DDR3 channel

The more common arrangement will be six DIMM slots, where each DDR3 channel is connected to a pair of DIMM slots. In this configuration, as long as you install DIMMs in sets of three, you'll always get the full 25.6GB/s of memory bandwidth.
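The bandwidth figures for both slot layouts follow from simple arithmetic, sketched below in Python. The per-channel number comes straight from the 64-bit channel width at DDR3-1066. The region model (the controller interleaves across every channel that still has unmapped capacity) is our simplifying assumption for illustration, not a description of how Intel's controller actually maps addresses, and the function names are made up for this example.

```python
# Theoretical peak bandwidth and interleave regions for a multi-channel
# DDR3 controller. The region model is an illustrative assumption, not
# documented controller behavior.

BYTES_PER_TRANSFER = 8  # each DDR3 channel is 64 bits wide


def channel_bandwidth_gbs(mega_transfers_per_s):
    """Peak bandwidth of one 64-bit channel in GB/s (decimal)."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0


def interleave_regions(gb_per_channel, mega_transfers_per_s=1066.67):
    """Return (size_gb, bandwidth_gbs) regions, widest interleave first."""
    remaining = [gb for gb in gb_per_channel if gb > 0]
    per_channel = channel_bandwidth_gbs(mega_transfers_per_s)
    regions = []
    while remaining:
        active = len(remaining)   # channels that still have unmapped memory
        depth = min(remaining)    # how much every active channel can contribute
        regions.append((active * depth, round(active * per_channel, 1)))
        remaining = [gb - depth for gb in remaining if gb > depth]
    return regions


# Four slots, 4 x 1GB: one channel carries two DIMMs, the others one each.
four_slot = interleave_regions([2, 1, 1])
# Six slots, 6 x 1GB: two DIMMs on every channel.
six_slot = interleave_regions([2, 2, 2])

print(four_slot)  # [(3, 25.6), (1, 8.5)]
print(six_slot)   # [(6, 25.6)]
```

Under these assumptions the four-slot board gives 3GB at 25.6GB/s plus 1GB at 8.5GB/s, while the evenly populated six-slot board keeps all 6GB at the full rate.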

That discussion is entirely theoretical, however. The real question is: does Nehalem's triple-channel memory controller actually matter, or would two channels suffice? I suspect that Hyper-Threading simply improved Nehalem's efficiency, not necessarily its need for more data. The three-channel memory controller is probably far more important for servers and will be especially useful in the upcoming 8-core version of Nehalem due out sometime next year. To find out, we benchmarked Nehalem in a handful of applications with a 4GB dual-channel configuration and a 6GB triple-channel configuration. Note that none of these tests actually used more than 4GB of memory, so the size difference doesn't matter; we kept memory timings the same between all tests.

| Test | Dual Channel DDR3-1066 (9-9-9-20) | Triple Channel DDR3-1066 (9-9-9-20) |
|---|---|---|
| Everest v1547 - Read Bandwidth | 12859 MB/s | 13423 MB/s |
| Everest v1547 - Write Bandwidth | 12410 MB/s | 12401 MB/s |
| Everest v1547 - Copy Bandwidth | 16474 MB/s | 18074 MB/s |
| Everest v1547 - Latency | 37.2 ns | 44.2 ns |
| Cinebench R10 (Multi-threaded test) | 18499 | 18458 |
| x264 HD Encoding Test (First Pass / Second Pass) | 83.8 fps / 30.3 fps | 85.3 fps / 30.3 fps |
| WinRAR 3.80 - 602MB Folder | 118 seconds | 117 seconds |
| PCMark Vantage | 7438 | 7490 |
| Vantage - Memories | 6753 | 6712 |
| Vantage - TV and Movies | 5601 | 5637 |
| Vantage - Gaming | 10202 | 9849 |
| Vantage - Music | 5378 | 4593 |
| Vantage - Communications | 6671 | 6422 |
| Vantage - Productivity | 7589 | 7676 |
| WinRAR (Built in Benchmark) | 3283 | 3306 |
| Nero Recode - Office Space - 7.55GB | 131 seconds | 130 seconds |
| SuperPI - 32M (mins:seconds) | 11:55 | 11:52 |
| Far Cry 2 - Ranch Medium (1680 x 1050) | 62.1 fps | 62.4 fps |
| Age of Conan - 1680 x 1050 | 51.5 fps | 51.1 fps |
| Company of Heroes - 1680 x 1050 | 136.6 fps | 133.6 fps |

 

At DDR3-1066 speeds we found no real performance difference between the Core i7-965 running in dual-channel vs. triple-channel mode; the added bandwidth is simply not useful for most desktop applications. For some reason we were able to get better latency scores on the dual-channel configuration, but there's a good chance that is due to the early nature of the BIOSes on these boards. In benchmarks where the latency difference was noticeable we saw the dual-channel configuration pull ahead slightly, while in tests where the added bandwidth helped we saw the triple-channel configuration do better. Honestly, it's mostly a wash between the two.
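To put numbers on "mostly a wash", here is a short Python sketch that turns a few rows of the table above into percent differences. The selection of rows and the helper name are ours, and the sign convention (positive means triple-channel wins) is just a choice made for this example.

```python
# Percent advantage of triple-channel over dual-channel for a few rows of
# the results table. Each entry is (dual, triple, higher_is_better).
results = {
    "Everest read (MB/s)": (12859, 13423, True),
    "Everest copy (MB/s)": (16474, 18074, True),
    "Everest latency (ns)": (37.2, 44.2, False),
    "Cinebench R10 multi-threaded": (18499, 18458, True),
    "WinRAR 602MB folder (s)": (118, 117, False),
    "Far Cry 2 (fps)": (62.1, 62.4, True),
}


def triple_channel_gain(dual, triple, higher_is_better):
    """Percent change of triple vs. dual; positive favors triple-channel."""
    change = (triple - dual) / dual * 100.0
    return change if higher_is_better else -change


gains = {
    name: round(triple_channel_gain(*row), 1) for name, row in results.items()
}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

The synthetic memory tests move the most (roughly a 10% gain in copy bandwidth and a 19% latency penalty for triple-channel), while the application results in this sample all land within about 1% of each other.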

Our recommendation would be to stick with three channels, but if you have existing memory and can't populate the third channel yet, it's not a huge deal; two channels will do fine for the time being.

73 Comments

  • Gary Key - Monday, November 3, 2008 - link

    "The 920 to 3.6/3.8 is a nice overclock but I wonder what you mean by proper cooling and how close you came to crossing the 80C "boundary"?"

    It was actually quite easy to do with the retail cooler, in fact in our multi-task test playing back a BD title while encoding a BD title, the core temps hit 98C. Cinebench multi-core test and OCCT both had the core temps hit 100C at various points. Our tests were in a closed case loaded out with a couple of HD4870 cards, two optical drives, three hard drives, and two case fans.

    Proper cooling (something we will cover shortly) consisted of the Thermalright Xtreme120, Vigor Monsoon II, and Cooler Master V8 along with the Freezone Elite. We were able to keep temps under 70C with a full load on air and around 45C with the Freezone unit.
  • Th3Eagle - Tuesday, November 4, 2008 - link

    Wow, that's interesting. Can't wait to see the new article. Always nice to see an article about coolers.

    Thanks for the reply.
  • Anand Lal Shimpi - Monday, November 3, 2008 - link

    Gary did the i7-920 tests so I'll let him chime in there, we're also working on an overclocking guide that should help address some of these concerns.

    -A
  • whatthehey - Monday, November 3, 2008 - link

    Tom's? You might as well reference HardOCP....

    Okay, THG sometimes gets things right, but I've seen far too many "expose" articles where they talk about the end of the world to take them seriously. Ever since the i820 chipset fiasco, they seem to think everything is a big deal that needs a whistle blower.

    Anandtech got 3.8GHz with an i7-920, and I would assume due diligence in performance testing (i.e. it's not just POSTing, but actually running benchmarks and showing a performance improvement). I'm still running an overclocked Q6600, though, and the 3.6GHz I've hit is really far more than I need most of the time. I should probably run at 3.0GHz and shave 50-100W from my power use instead. But it's winter now, and with snow outside it's nice to have a little space heater by my feet!
  • The0ne - Monday, November 3, 2008 - link

    Tom's Hardware and Anandtech were the websites I visited 13 years ago during my college years. Tom's has since been pushed far down the list of "to visit sites" mainly due to their poor articles and their ad-littered, poorly designed website. If you have any type of no-script enabled there's quite a bit to allow to have the website working. The video commentary is a joke as they're not professionals to get the job done professionally...visually anyhow.

    Anandtech has stayed true to its roots and although I find some articles a bit confusing I don't mind them at all. Examples of this are camera reviews :)
  • GaryJohnson - Monday, November 3, 2008 - link

    Geez, calling a core 2 a space heater. How soon we forget prescott...
  • JarredWalton - Monday, November 3, 2008 - link

    I think overclocked Core 2 Quad is still very capable of rating as a space heater. The chips can easily use upwards of 150W when overclocked, which if memory serves is far more than any of the Prescott chips did. After all, we didn't see 1000W PSUs back in the Prescott era, and in fact I had a 350W PSU running a Pentium D 920 at 3.4 GHz without any trouble. :-)
  • Griswold - Tuesday, November 4, 2008 - link

    Funny comparison. If it was just for the space heater argument's sake (well, 150W is by far not enough to qualify as a real space heater to be honest), I could follow you, but saying the 150W of a 4 core, more-IPC-than-any-P4-can-ever-dream-of processor should or could be compared to the wattage of the infamous thermonuclear furnace AKA Prescott is a bit of a long stretch, don't you think? :p
  • Ryan Smith - Monday, November 3, 2008 - link

    Intel can call it supercalifragilisticexpialidocious until they're blue in the face, but take it from a local, it's Neh-Hay-Lem. Just see how it's pronounced in this news segment:

    http://www.katu.com/outdoors/3902731.html?video=YH...
  • mjrpes3 - Monday, November 3, 2008 - link

    Any chance we'll see some database/apache benchmarks based on Nehalem soon?
