Understanding Nehalem's Memory Architecture

Nehalem does spice things up a bit in the memory department, not only does it have an integrated memory controller (a first for an x86 Intel CPU) but the memory controller in question has an unusual three-channel configuration. All other AMD and Intel systems use dual channel DDR2 or DDR3 memory controllers; with each channel being 64-bits wide, you have to install memory in pairs for peak performance.

With a three-channel DDR3 memory controller, Nehalem requires the use of three DDR3 modules to achieve peak bandwidth - which also means that the memory manufacturers are going to be selling special 3-channel DDR3 kits made specifically for Nehalem. Motherboard makers will be doing one of two things to implement Nehalem's three-channel memory interface on boards; you'll either see boards with four DIMM slots or boards with six:


Four DDR3 slots, three DDR3 channels

In the four-slot configuration the first three slots correspond to the first three channels, the fourth slot is simply sharing one of the memory channels. The downside to this approach is that your memory bandwidth drops to single-channel performance as you start filling up your memory. For example, if you have 4 x 1GB sticks, the first 3GB of memory will be interleaved between the three memory channels and you'll get 25.6GB/s of bandwidth to data stored in the first 3GB. The final 1GB however won't be interleaved and you'll only get 8.5GB/s of bandwidth to it. Despite the unbalanced nature of memory bandwidth in this case, your aggregate bandwidth is still greater in this configuration than a dual-channel setup.

 


Six DDR3 slots, two slots per DDR3 channel

The more common arrangement will be six DIMM slots where each DDR3 channel is connected to a pair of DIMM slots. In this configuration as long as you install DIMMs in triplicate you'll always get the full 25.6GB/s of memory bandwidth.

That discussion is entirely theoretical however, the real question is: does Nehalem's triple-channel memory controller actually matter or would two channels suffice? I suspect that Hyper Threading simply improved Nehalem's efficiency not necessarily its need for more data. The three-channel memory controller is probably far more important for servers and will be especially useful in the upcoming 8-core version of Nehalem due out sometime next year. To find out we simply benchmarked Nehalem in a handful of applications with a 4GB/dual channel configuration and a 6GB/triple-channel configuration. Note that none of these tests actually used more than 4GB of memory so the size difference doesn't matter, we kept memory timings the same between all tests.

  Dual Channel DDR3-1066 (9-9-9-20) Triple Channel DDR3-1066 (9-9-9-20)
Memory Tests - Everest v1547    
Read Bandwidth 12859 MB/s 13423 MB/s
Write Bandwidth 12410 MB/s 12401 MB/s
Copy Bandwidth 16474 MB/s 18074 MB/s
Latency 37.2 ns 44.2 ns
Cinebench R10 (Multi-threaded test) 18499 18458
x264 HD Encoding Test (First Pass / Second Pass) 83.8 fps / 30.3 fps 85.3 fps / 30.3 fps
WinRAR 3.80 - 602MB Folder 118 seconds 117 seconds
PCMark Vantage 7438 7490
Vantage - Memories 6753 6712
Vantage - TV and Movies 5601 5637
Vantage - Gaming 10202 9849
Vantage - Music 5378 4593
Vantage - Communications 6671 6422
Vantage - Productivity 7589 7676
WinRAR (Built in Benchmark) 3283 3306
Nero Recode - Office Space - 7.55GB 131 seconds 130 seconds
SuperPI - 32M (mins:seconds) 11:55 11:52
Far Cry 2 - Ranch Medium (1680 x 1050) 62.1 fps 62.4 fps
Age of Conan - 1680 x 1050 51.5 fps 51.1 fps
Company of Heroes - 1680 x 1050 136.6 fps 133.6 fps

 

At DDR3-1066 speeds we found no real performance difference between the Core i7-965 running in two channel vs. three channel mode, the added bandwidth is simply not useful for most desktop applications. For some reason we were able to get better latency scores on the dual-channel configuration, but there's a good chance that may be due to the early nature of BIOSes on these boards. In benchmarks were the latency difference was noticeable we saw the dual-channel configuration pull ahead slightly, then in other tests where the added bandwidth helped we saw the triple-channel configuration do better. Honestly, it's mostly a wash between the two.

Our recommendation would be to stick with three channels, but if you have existing memory and can't populate the third channel yet it's not a huge deal, really, two is fine here for the time being.

Nehalem's Weakness: Cache What about the Impact of DDR3 Speeds?
Comments Locked

73 Comments

View All Comments

  • Clauzii - Thursday, November 6, 2008 - link

    I still use PS/2. None of the USB keyboards I've borrowed or tried out would work in 'boot'. Also I think a PS/2 keyboard/mouse don't lag so much, maybe because it has it's own non-shared interrupt line.

    But I can see a problem with PS/2 in the future, with keyboards like the Art Lebedev ones. When that technology gets more pocket friendly I'd gladly like to see upgraded but still dedicated keyboard/mouse connectors.
  • The0ne - Monday, November 3, 2008 - link

    Yes. I have the PS2 keyboard on-hand in case my USB keyboard can't get in :)
  • Strid - Monday, November 3, 2008 - link

    Ahh, makes sense. Thanks for clarifying!
  • Genx87 - Monday, November 3, 2008 - link

    After living through the hell that were ATI drivers back in 2003-2004 on a 9600 Pro AIW. I didnt learn and I plopped money down on a 4850 and have had terrible driver quality since. More BSOD from the ati driver than I have had in windows in the past 5 years combined from anything. Back to Nvidia for me when I get a chance.

    That said this review is pretty much what I expected after reading the preview article in August. They are really trying to recapture market in the 4 socket space. A place where AMD has been able to do well. This chip is designed for server work. Ill pick one up after my E8400 runs out of steam.
  • Griswold - Tuesday, November 4, 2008 - link

    You're just not clever enough to setup your system properly. I have two indentical systems sitting here side by side with the only difference being the video card (HD3870 in one and a 8800GT in the other) and the box with the nvidia cards gives me order of magnitude more headaches due to crashing driver. While that also happens on the 3870 machine now and then, its nowehere nearly as often. But the best part: none of the produces a BSOD. That is why I know you're most likely the culprit (the alternative is faulty hardware or a pathetic overclock).
  • Lord 666 - Monday, November 3, 2008 - link

    The stock speed of a Q9550 is 2.83ghz, not 2.66qhz.

    Why the handicap?
  • Anand Lal Shimpi - Monday, November 3, 2008 - link

    My mistake, it was a Q9450 that was used. The Q9550 label was from an earlier version of the spreadsheet that got canned due to time constraints. I wanted a clock-for-clock comparison with the i7-920 which runs at 2.66GHz.

    Take care,
    Anand
  • faxon - Monday, November 3, 2008 - link

    toms hardware published an article detailing that there would be a cap on how high you are allowed to clock your part before it would downclock it back to stock. since this is an integrated par of the core, you can only turn it off/up/down if they unlock it. the limit was supposedly a 130watt thermal dissipation mark. what effect did this have in your tests on overclocking the 920?
  • Gary Key - Monday, November 3, 2008 - link

    We have not had any problems clocking our 920 to the 3.6GHz~3.8GHz level with proper cooling. The 920, 940, and 965 will all clock down as core temps increase above the 80C level. We noticed half step decreases above 80C or so and watched our core multipliers throttle down to as low as 5.5 when core temps exceeded 90C and then increase back to normal as temperatures were lowered.

    This occurred with stock voltages or with the VCore set to 1.5V, it was dependent on thermals, not voltages or clock speeds in our tests. That said, I am still running a battery of tests on the 920 right now, but I have not seen an artificial cap yet. That does not mean it might not exist, just that we have not triggered it yet.

    I will try the 920 on the Intel board that Toms used this morning to see if it operates any differently than the ASUS and MSI boards.
  • Th3Eagle - Monday, November 3, 2008 - link

    I wonder how close you came to those temperatures while overclocking these processors.

    The 920 to 3.6/3.8 is a nice overclock but I wonder what you mean by proper cooling and how close you came to crossing the 80C "boundary"?

Log in

Don't have an account? Sign up now