A Quick Path to Memory

Our investigation begins with the most visibly changed part of Nehalem's architecture: the memory subsystem. Nehalem implements a very Phenom-like memory hierarchy consisting of small, fast individual L1 and L2 caches for each of its four cores and then a single, larger shared L3 cache feeding the entire chip.

 

Nehalem's L1 cache, despite being seemingly unchanged from Penryn, does grow in latency; it now takes 4 cycles to access vs. 3. The L2 cache is now only 256KB per core, whereas Penryn's L2 is 24x that size, and thus it can be accessed in only 11 cycles, down from 15 (Penryn added an additional clock cycle over Conroe to access L2).

CPU / CPU-Z Latency | L1 Cache | L2 Cache | L3 Cache
Nehalem (2.66GHz) | 4 cycles | 11 cycles | 39 cycles
Core 2 Quad Q9450 - Penryn - (2.66GHz) | 3 cycles | 15 cycles | N/A

 

The L3 cache is quite possibly the most impressive part, requiring only 39 cycles to access at 2.66GHz. The L3 cache is a very large 8MB cache, 4x the size of Phenom's L3, yet it can be accessed much faster. In our testing we found that Phenom's L3 cache takes a similar 43 cycles to access but at a much lower clock speed (2.0GHz). If we put these numbers into absolute terms, it takes 21.5 ns to get a request back from Phenom's L3 vs. 14.6 ns with Nehalem's - that's nearly 50% longer in Phenom.
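The cycles-to-nanoseconds conversion behind that comparison is simple enough to sketch. The snippet below is only an illustration using the cycle counts and clock speeds quoted above; the function name is our own, and plugging in the nominal 2.66GHz gives a result a hair above the 14.6 ns quoted, which is down to rounding of the clock speed.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds.

    One cycle at f GHz lasts 1/f nanoseconds, so latency = cycles / f.
    """
    return cycles / clock_ghz


# L3 figures quoted in the text: 39 cycles at 2.66GHz vs. 43 cycles at 2.0GHz.
nehalem_l3_ns = cycles_to_ns(39, 2.66)
phenom_l3_ns = cycles_to_ns(43, 2.0)

# How much longer Phenom's L3 takes, as a fraction of Nehalem's latency.
extra = phenom_l3_ns / nehalem_l3_ns - 1

print(f"Nehalem L3: {nehalem_l3_ns:.2f} ns")
print(f"Phenom L3:  {phenom_l3_ns:.2f} ns")
print(f"Phenom takes {extra:.0%} longer")
```

The same helper works for the L1 and L2 numbers in the table, which is a handy way to compare chips running at different clock speeds.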

While Intel did a lot of tinkering with Nehalem's caches, the inclusion of a multi-channel on-die DDR3 memory controller was the most apparent change. AMD has been using an integrated memory controller (IMC) since 2003 on its K8-based microprocessors, and for years Intel resisted doing the same, citing the complexity of choosing what memory to support among other reasons for not following in AMD's footsteps.

With clock speeds increasing and up to 8 cores (including GPUs) making their way into Nehalem-based CPUs in the coming year, the time to narrow the memory gap is upon us. You can already tell from its cache design that Nehalem was built to mask the distance between the individual CPU cores and main memory, and the IMC is a further extension of that philosophy.

The motherboard implementation of our 2.66GHz system needed some work, so our memory bandwidth/latency numbers on it were way off (slower than Core 2). Luckily, we had another platform at our disposal running at 2.93GHz, which was working perfectly. We turned to Everest Ultimate 4.50 to give us memory bandwidth and latency numbers from Nehalem.

Note that these figures are from a completely untuned motherboard and are using DDR3-1066 (dual-channel on the Core 2 system and triple-channel on the Nehalem system):

CPU / Everest Ultimate 4.50 | Memory Read | Memory Write | Memory Copy | Memory Latency
Nehalem (2.93GHz) | 13.1 GB/s | 12.7 GB/s | 12.0 GB/s | 46.9 ns
Core 2 Extreme QX9650 - Penryn - (3.00GHz) | 7.6 GB/s | 7.1 GB/s | 6.9 GB/s | 66.7 ns

 

Memory accesses on Conroe/Penryn were quick due to Intel's very aggressive prefetchers; memory accesses on Nehalem are just plain fast. Nehalem takes a little over 2/3 the time Penryn does to complete a memory request (46.9 ns vs. 66.7 ns), and although we didn't have time to run comparable Phenom numbers, I believe Nehalem's DDR3 memory controller is faster than Phenom's DDR2 controller.

Memory bandwidth is obviously greater with three DDR3 channels: Everest measured around a 70% increase in read bandwidth (13.1 GB/s vs. 7.6 GB/s). While we don't have the memory bandwidth figures here, Gary measured a 10% difference in WinRAR performance (a test that's highly influenced by memory bandwidth and latency) between single-channel and triple-channel Nehalem configurations.
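For anyone who wants to check the ratios, here is a rough sketch that works them out from the Everest table above. The theoretical peak figures are our own back-of-the-envelope addition, not something Everest reports: they assume a standard 64-bit (8-byte) DDR3 channel at 1066 MT/s and decimal gigabytes, and the helper names are made up for this example.

```python
# Everest Ultimate 4.50 results from the table above (GB/s and ns).
nehalem = {"read": 13.1, "write": 12.7, "copy": 12.0, "latency_ns": 46.9}
penryn = {"read": 7.6, "write": 7.1, "copy": 6.9, "latency_ns": 66.7}


def percent_change(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100


def peak_gbs(mega_transfers: float, channels: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s.

    Assumption: each channel is 64 bits (8 bytes) wide and moves one
    transfer per MT/s tick; real-world throughput is always lower.
    """
    return mega_transfers * bytes_per_transfer * channels / 1000


read_gain = percent_change(nehalem["read"], penryn["read"])
latency_ratio = nehalem["latency_ns"] / penryn["latency_ns"]

print(f"Read bandwidth gain: {read_gain:.0f}%")
print(f"Nehalem latency as a fraction of Penryn's: {latency_ratio:.2f}")

# Triple-channel vs. dual-channel DDR3-1066 theoretical peaks.
for name, channels, measured in (("Nehalem", 3, nehalem["read"]),
                                 ("Penryn", 2, penryn["read"])):
    peak = peak_gbs(1066, channels)
    print(f"{name}: {measured} GB/s measured of {peak:.1f} GB/s peak "
          f"({measured / peak:.0%})")
```

Both systems land well short of their theoretical peaks, which fits the note above that these numbers come from a completely untuned motherboard.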

While we didn't really expect Intel to somehow do wrong with Nehalem's memory architecture, it's important to point out that it is very well implemented. Intel managed to change the cache structure and introduce an integrated memory controller while making both significantly faster than what AMD managed despite a four-year head start.

In short: Nehalem can get data out of memory quick like bunnies.

108 Comments

  • Poepstamper - Thursday, June 05, 2008

    I'm not a fanboy but I like AMD better; I don't like big corporations anyway.
    But I'm pretty worried: if AMD has no answer to this, then we would have to pay lots more for a processor.
  • Genx87 - Thursday, June 05, 2008

    Being an AMD fan and sometimes fanboi over the past 12 years. My last major game rig build was a Core 2 Duo 6600. I did an upgrade 3 weeks ago with an E8400. I built a new computer for a friend who has had AMD chips since 1999 with an E7200.

    AMD needs to start making a show.
  • NullSubroutine - Thursday, June 05, 2008

    First off, I am not a fan of either company, just to get that out of the way.

    You do realize that Nehalem is not and will not be a mainstream product for quite some time into 2009. Enthusiasts may get a few chips in limited quantities, probably in the $1500+ range. Otherwise this is designed to be a high-end server processor. It will take some time for it to trickle down to be something most average people will buy and use.

    Intel is hitting back the same way AMD hit at Intel with the K8: making a great, scalable, high-performance server chip and letting it trickle its way down to the mainstream market.

    Trying to compare Nehalem to any AMD processor (or even most Core 2 Duo/Quad) is like trying to compare a Chevy Malibu to a Formula 1 race car; it's just not the same thing.

    What is exciting about Nehalem right now is the technological advancement of some of the stuff Intel has done, and the happiness that it will one day be available in the mainstream.

    AMD is not going to be put in a bad position (other than the one it's already in) in the mainstream desktop market with Nehalem - not for probably near another year. It will hurt it in the server market, but at first Intel won't have many of these chips available, so AMD will have a minor chance with a Shanghai or Bulldozer core - if they can actually execute a launch.

    AMD is also not trying to stay equal with Intel, it doesnt have the resources to do it. You are likely, in any near time frame, going to see AMD come out and just PWN Intel in performance numbers. You will see AMD put together what they call a good 'platform' meaning. You can buy your whole platform, MB/CPU/GPU/etc from AMD and it will be a solid platform.

    It's not going to win bragging rights with a bunch of 'nerds' running gaming websites claiming that AMD sucks so much. You will probably see exactly that, actually: people saying how a 1500 dollar processor pwns some 200 dollar one. AMD isn't currently trying to win performance crowns or win over enthusiasts who spend boatloads of money on a CPU (or GPU); they are trying to push the mainstream market, which actually has the largest number of people to sell to. However, I am sure they would like to keep their server side doing well (it makes good margins).

    I don't think you are going to see AMD come back to any pure performance crowns. You may see some crowns for price/performance/power for the whole platform.
  • NullSubroutine - Thursday, June 05, 2008

    supposed to say You are NOT likely to see AMD come PWN Intel...

    And you could compare 8 series Opteron to Nehalem...
  • AmberClad - Thursday, June 05, 2008

    That picture of the socket -- I only recall a single board with that colored PCB in the INQ's coverage of the wall of Nehalem boards. Maybe that picture is giving away more than intended, as far as the identity of the company that provided the sample? (I suppose it's possible that whoever leaked the Nehalem sample isn't the same person that provided the motherboard.)
  • RaynorWolfcastle - Thursday, June 05, 2008

    These benches are mighty impressive for such an immature platform. There had better be some serious performance and clock speed bumps in store for AMD's K10.5 or they will be dead in the water.

    Also, is there any indication as to when Intel will start transitioning Nehalem to the mobile space? I have a 1st gen (Core Duo) MacBook Pro that's getting a little long in the tooth and I'm debating whether to jump on the Montevina train or wait for Nehalem mobile. I'd love to get a mobile Nehalem if it launches any time in 1H09.
  • emboss - Friday, June 06, 2008

    Non-Extremely-Expensive-Edition single-socket Nehalems now aren't coming until sometime in 2H09, so you'll probably be lucky to see any mobile Nehalems in 2009 at all.

    As such, I'd say Intel failed to tick on time. Conroe hit mainstream July 06 (eg: E6400). We should be getting mainstream Nehalems in 1 month, not 12+ months.

    Then again, Intel has been futzing around with the release dates quite a bit, so it may get pulled forward.
  • piroroadkill - Thursday, June 05, 2008

    Because Nehalem is frankly so much faster than the already rapid Core 2, that as already said, AMD is going to be struggling for a long time to come.

    Unless some miracle occurs, all I know is right now I want Nehalem.
  • TonyB - Thursday, June 05, 2008

    can it play crysis
  • PeteRoy - Thursday, June 05, 2008

    Exactly, where is the gaming performance? It's the first thing I care about by a long way.

    I don't do all the other stuff you benched on my PC.
