A Quick Path to Memory

Our investigation begins with the most visibly changed part of Nehalem's architecture: the memory subsystem. Nehalem implements a very Phenom-like memory hierarchy consisting of small, fast individual L1 and L2 caches for each of its four cores and then a single, larger shared L3 cache feeding the entire chip.

 

Nehalem's L1 cache, despite being seemingly unchanged from Penryn, does grow in latency; it now takes 4 cycles to access vs. 3. The L2 cache is now only 256KB per core, 1/24 the size of the 6MB L2 that each pair of cores shares in Penryn, and thus can be accessed in only 11 cycles, down from 15 (Penryn added an additional clock cycle over Conroe to access L2).

CPU / CPU-Z Latency                       L1 Cache    L2 Cache     L3 Cache
Nehalem (2.66GHz)                         4 cycles    11 cycles    39 cycles
Core 2 Quad Q9450 - Penryn - (2.66GHz)    3 cycles    15 cycles    N/A

 

The L3 cache is quite possibly the most impressive, requiring only 39 cycles to access at 2.66GHz. The L3 cache is a very large 8MB cache, 4x the size of Phenom's L3, yet it can be accessed much faster. In our testing we found that Phenom's L3 cache takes a similar 43 cycles to access but at much lower clock speeds (2.0GHz). If we put these numbers into relative terms it takes 21.5 ns to get a request back from Phenom's L3 vs. 14.6 ns with Nehalem's - that's nearly 50% longer in Phenom.
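The cycle-to-nanosecond comparison above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the helper name `cycles_to_ns` is our own, and the inputs are the cycle counts and clock speeds quoted in the text.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    # One cycle at f GHz lasts 1/f nanoseconds.
    return cycles / clock_ghz


# Figures quoted in the article; nothing here is newly measured.
phenom_l3_ns = cycles_to_ns(43, 2.0)    # Phenom L3: 43 cycles at 2.0GHz
nehalem_l3_ns = cycles_to_ns(39, 2.66)  # Nehalem L3: 39 cycles at 2.66GHz

# How much longer Phenom's L3 takes, as a percentage of Nehalem's latency.
phenom_penalty_pct = 100 * (phenom_l3_ns / nehalem_l3_ns - 1)

print(f"Phenom L3:  {phenom_l3_ns:.2f} ns")
print(f"Nehalem L3: {nehalem_l3_ns:.2f} ns")
print(f"Phenom takes {phenom_penalty_pct:.0f}% longer")
```

The Nehalem result comes out at roughly 14.66 ns, which the text truncates to 14.6 ns, and the gap works out to about 47%, in line with the "nearly 50%" figure.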

While Intel did a lot of tinkering with Nehalem's caches, the inclusion of a multi-channel on-die DDR3 memory controller was the most apparent change. AMD has been using an integrated memory controller (IMC) since 2003 on its K8 based microprocessors and for years Intel has resisted doing the same, citing complexities in choosing what memory to support among other reasons for why it didn't follow in AMD's footsteps.

With clock speeds increasing and up to 8 cores (including GPUs) making their way into Nehalem based CPUs in the coming year, the time to narrow the memory gap is upon us. You can already tell that Nehalem was designed to mask the distance between the individual CPU cores and main memory with its cache design, and the IMC is a further extension of the philosophy.

The motherboard implementation of our 2.66GHz system needed some work, so our memory bandwidth/latency numbers on it were way off (slower than Core 2). Luckily we had another platform at our disposal running at 2.93GHz, which was working perfectly. We turned to Everest Ultimate 4.50 to give us memory bandwidth and latency numbers from Nehalem.

Note that these figures are from a completely untuned motherboard and are using DDR3-1066 (dual-channel on the Core 2 system and triple-channel on the Nehalem system):

CPU / Everest Ultimate 4.50                   Memory Read     Memory Write    Memory Copy     Memory Latency
Nehalem (2.93GHz)                             13.1 GB/s       12.7 GB/s       12.0 GB/s       46.9 ns
Core 2 Extreme QX9650 - Penryn - (3.00GHz)    7.6 GB/s        7.1 GB/s        6.9 GB/s        66.7 ns

 

Memory accesses on Conroe/Penryn were quick due to Intel's very aggressive prefetchers; memory accesses on Nehalem are just plain fast. Nehalem takes a little over 2/3 the time Penryn does to complete a memory request (46.9 ns vs. 66.7 ns), and although we didn't have time to run comparable Phenom numbers I believe Nehalem's DDR3 memory controller is faster than Phenom's DDR2 controller.

Memory bandwidth is obviously greater with three DDR3 channels; Everest measured around a 70% increase in read bandwidth. While we don't have the memory bandwidth figures here, Gary measured a 10% difference in WinRAR performance (a test that's highly influenced by memory bandwidth and latency) between single-channel and triple-channel Nehalem configurations.
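For readers who want to check the ratios, here is a small Python sketch that derives them from the Everest table above. The dictionary layout and helper names are ours. The theoretical peak figure is an assumption that was not measured: it takes DDR3-1066 as a nominal 1066 MT/s on a 64-bit (8-byte) channel.

```python
# Measured Everest Ultimate 4.50 figures, copied from the table above.
nehalem = {"read": 13.1, "write": 12.7, "copy": 12.0, "latency_ns": 46.9}
penryn = {"read": 7.6, "write": 7.1, "copy": 6.9, "latency_ns": 66.7}


def percent_gain(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return 100 * (new / old - 1)


def peak_gb_per_s(channels: int, mega_transfers: int = 1066,
                  bytes_per_transfer: int = 8) -> float:
    """Nominal peak bandwidth in GB/s (ASSUMPTION: 64-bit DDR3-1066 channels)."""
    return channels * mega_transfers * bytes_per_transfer / 1000


for test in ("read", "write", "copy"):
    gain = percent_gain(nehalem[test], penryn[test])
    print(f"{test:>5}: {gain:.0f}% more bandwidth on Nehalem")

# Fraction of Penryn's memory latency that Nehalem needs.
latency_ratio = nehalem["latency_ns"] / penryn["latency_ns"]
print(f"latency ratio: {latency_ratio:.2f}")

# Nominal ceilings only; measured throughput will always be lower.
print(f"triple-channel peak: {peak_gb_per_s(3):.1f} GB/s")
print(f"dual-channel peak:   {peak_gb_per_s(2):.1f} GB/s")
```

The read gain comes out at about 72% and the latency ratio at about 0.70, which is where the "around 70%" and "a little over 2/3" figures come from.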

While we didn't really expect Intel to somehow do wrong with Nehalem's memory architecture, it's important to point out that it is very well implemented. Intel managed to change the cache structure and introduce an integrated memory controller while making both significantly faster than what AMD managed despite a four-year head start.

In short: Nehalem can get data out of memory quick like bunnies.


  • SiliconDoc - Monday, July 28, 2008 - link

    Oh yeah, and we're getting the knocked down lesser pins version probably, though not set in stone they won't be able to resist bending us all over and making all the massive die and tool and cutting restructurings required to pump out the lesser pinned models... while they tell us "it's cost effective" (means they can charge 18 different rates and swirl the markets in confusion and gigantic price differences for a mere few percent in performance differences).
    They sure have a lot of time to diggle around with it all, don't they- and a lot of capacity, a lot of marketers, a lot of board makers/changers...
    Oh gawd it's a multi-tentacled monster... just realize they had their group megaspam session and have figured the most confusing, confounding, and master profiteering into it all. It's got nothing to do with practicality or delivering us the performance we desire. NOTHING.
  • gochichi - Friday, June 06, 2008 - link

    Someone mentioned breaking laws in the past (Intel did).

    Just look at the distress that AMD is under. While they had the superior products, they couldn't make deals with Dell and so on. As soon as they were finally able to make deals fairly, Intel obliterated them on performance.

    So while they should have been piling up an R&D fund during their "crown years" they hardly grew. To the extent that even though their CPUs are not competitive they are still growing in overall market share.

    I gotta balance my desire for performance now, and my ongoing desire for performance. I can't imagine how having AMD wiped out would be good for the long term. Performance is moving up surely enough but why can't we have the full rate of improvement? I mean, let's stop polluting the world with obsolete brand new equipment. I think the legal battle between Intel and AMD prevents Intel from eliminating AMD. The more they beat up on AMD, the higher the damages of their breaking the law and the higher the penalty for Intel.

    I think AMD can make a strong comeback though. They had a sloppy start with the AMD-ATI merger but ATI is actually not far at all from NVIDIA in terms of design and performance. These pendulums do swing, and perhaps AMD's chips will be better next time. I think the price-point wars are the most important. If you can deliver a nice quad-core or 3x core for about $100.00 you're gonna be in business or at least have market share.

  • BSMonitor - Friday, June 06, 2008 - link

    Giving a company incentives to exclusively sell your products is not a violation of any law. Aka, is E.A. Sports in violation of the law by signing an exclusive contract with the NFLPA? No. How many GM dealers sell more than GM lines of cars? Not many... There are many other reasons to be exclusive besides a "monopoly deal".

    Were Dell customers complaining about not having the choice of AMD processors? Not enough of them, clearly. You think for a second Dell would lose market share for Intel? Sorry, the answer is Hell No.

    When AMD did have a strong processor lineup, they also hit manufacturing capacity walls.... Quite simply, AMD does not have the capacity to fill Intel's market share. It's not like there were AMD processors on the shelves because Dell was exclusively Intel...

    Intel has more Fabs. Fabs don't get built overnight to meet demand... Now, AMD has inferior products and a couple more Fabs... Too little too late as they say...
  • Justin Case - Sunday, June 08, 2008 - link

    [quote]Giving a company incentives to exclusively sell your products is not a violation of any law.[/quote]

    Actually, it is, if you control more than a certain share (typically 50%) of the market.

    You can give volume discounts but you cannot make the cost depend on what other products your client sells.

    If you're under that "critical" market share, you can do pretty much anything you want. Above it, the rules change (and there are very good reasons for that, as anyone who's studied macroeconomics knows).

    There's really no need to come up with "examples" or ill-fitting "analogies". That's just the way the law is, and everyone who studied trade law knows that (including Intel's legal department). They've already been fined in Korea, they're on their way to being fined in the EU and Japan, and they'll probably be fined in the US too.

    Unless they bribe the right people like Microsoft did, of course.
  • SiliconDoc - Monday, July 28, 2008 - link

    I caught a couple articles on how Nvidia was hammering vendors for price structures - and how they were going to do it, a bit ahead of time of when it hit. Yeah, it hit, I saw it, eggs (hint) were broken all over the place.
    It's a kind of tyranny... lol
    Uhh, thank computers I guess, since they've made everything like that so easy to track and enforce ("private" enforcement not law enforcement)...
    Expect a lot more of it, too. Everything moves so fast in business, and courts move so slowly.
  • The Zerg - Friday, June 06, 2008 - link

    Guys... here's an example of bad luck, bad tech or both:

    I work in a corporation. A very large one, the largest in a specific industry.
    We use Intel-based CPUs. Worldwide.
    My Centrino (in its Dell Latitude incarnation) died two days ago (causes unknown - and this caused a lot of trouble). Be sure that I had some nice words for Intel in that moment.
    I use AMD at home (it was the best bang for the buck at that time). One week ago (and Hell YES, this is the bare truth) my ASUS motherboard died, together with an Athlon 3500+.
    See? Nobody's perfect. Maybe 2 strong CPU players (makers) are better than just one. Maybe I will not use an ASUS motherboard next time, because I have another 3-4 serious options...
    For the AMD/Intel fans: I am a Canon fan, but I really respect Nikon, Leica and Sony for their outstanding products. And: I can buy a 1Ds Mark III, but I currently own a 40D - "because I can play 95% of the games with it"
    And there is never too little too late for a World Press Photo award :)
  • Barack Obama - Friday, June 06, 2008 - link

    Nehalem is looking to be beastly good. Let's see if it can combo well with Windows 7 and its multi-touch capabilities.
  • Egglick - Thursday, June 05, 2008 - link

    Here is my biggest question: Will these chips work with DDR2? In my opinion, DDR3 still isn't worth the price premium by a long shot.
  • coldpower27 - Friday, June 06, 2008 - link

    This shouldn't be much of an issue by the time this thing ships for mainstream platforms ala LGA1160, sometime in Early-Mid 2009.

    DDR3 is still cost prohibitive now, you're looking at about 2x as much for the same amount of memory. However in 6-9 months prices can change a lot.

