Cache and Memory Controller Comparison

Now that you know what parts to compare, let's drill a little deeper. Since cache is a major element separating Phenom II from its predecessor, let's start there.

Phenom II, like its predecessor, maintains a 3-cycle 64KB L1 cache. With Nehalem, Intel had to move to a 4-cycle cache, so Phenom II retains the hit rate and performance benefits of a larger, faster L1. The L2 cache latency is where Phenom II and Intel’s architectures really differ.

Phenom II, like the original, has a 512KB L2 cache per core, but it is a high-latency, 15-cycle cache. Compared to the Athlon X2’s 20-cycle L2, Phenom II looks pretty good, but now look at Penryn. Penryn’s 15-cycle L2 is the same speed as Phenom’s L2, but it’s 2-6x larger. Core i7 trumps them all with a very fast 11-cycle L2, although it achieves this by having the smallest L2 cache per core out of the bunch - only 256KB in size.

AMD asserts that Phenom II’s L3 cache is now 2-cycles faster than Phenom’s L3. At 3x the size but with improved access time, Phenom II’s L3 is closer to where it should have been in the first place. Everest measures Phenom II’s L3 as having a 55-cycle latency, while Core i7 has a 35 cycle L3. Sandra puts Core i7 and the original Phenom at 55 cycles, but Phenom II at 71 cycles. I checked with Intel and AMD, and it appears neither application is reporting the correct L3 access latencies for either processor. Intel confirmed Core i7’s L3 as a 42 cycle L3 and I’m still waiting to hear back from AMD on the time to access its cache, but I suspect it will be around 50 cycles.

Processor                          | L1 Latency | L2 Latency | L3 Latency
AMD Phenom II X4 920 (2.80GHz)     | 3 cycles   | 15 cycles  | AMD won't tell me
AMD Phenom @ 2.8GHz                | 3 cycles   | 15 cycles  | AMD won't tell me
Athlon X2 5400 (2.80GHz)           | 3 cycles   | 20 cycles  | -
Intel Core 2 Quad QX9770 (3.2GHz)  | 3 cycles   | 15 cycles  | -
Intel Core 2 Quad Q9400 (2.66GHz)  | 3 cycles   | 15 cycles  | -
Intel Core i7-965 (3.2GHz)         | 4 cycles   | 11 cycles  | 42 cycles
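
Cycle counts are hard to compare directly when the chips run at different clocks. As a rough sketch, the snippet below converts a few of the table's latencies into nanoseconds. It assumes the quoted figures are core clock cycles, which the table doesn't state outright, and the function name and data layout are only illustrative.

```python
# Clock speeds (GHz) and latencies (cycles) are copied from the table above.
# Assumption: every latency is counted in core clock cycles.
CACHE_LATENCY_CYCLES = {
    "Phenom II X4 920": (2.80, {"L1": 3, "L2": 15}),
    "Core 2 Quad QX9770": (3.20, {"L1": 3, "L2": 15}),
    "Core i7-965": (3.20, {"L1": 4, "L2": 11, "L3": 42}),
}


def cycles_to_ns(cycles, clock_ghz):
    """One cycle lasts 1/clock_ghz nanoseconds, so latency_ns = cycles / GHz."""
    return cycles / clock_ghz


for name, (clock_ghz, levels) in CACHE_LATENCY_CYCLES.items():
    parts = ", ".join(
        f"{level} {cycles_to_ns(cycles, clock_ghz):.2f} ns"
        for level, cycles in levels.items()
    )
    print(f"{name}: {parts}")
```

Seen this way, a 15-cycle L2 at 2.8GHz costs a bit more wall-clock time than a 15-cycle L2 at 3.2GHz, so equal cycle counts don't mean equal latency.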

Main memory access time is more telling. A trip down memory lane will cost you 107 ns on an original Phenom processor, 100 ns on an Athlon X2, and now only 95 ns on a Phenom II. The 11% improvement in memory access performance is due to improvements AMD made when it redesigned the memory controller to include support for DDR3.
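
For anyone who wants to check the arithmetic, here is a small sketch that reproduces the 11% figure and expresses each memory latency in core cycles. It assumes all three AMD chips run at the 2.8GHz listed in the table; the helper names are made up for this example.

```python
# Main memory latencies in nanoseconds, as quoted in the text above.
MEMORY_LATENCY_NS = {"Phenom": 107, "Athlon X2": 100, "Phenom II": 95}
CLOCK_GHZ = 2.8  # assumption: all three parts at the table's 2.8GHz


def percent_improvement(old_ns, new_ns):
    """Relative reduction in latency going from old_ns to new_ns."""
    return (old_ns - new_ns) / old_ns * 100


def ns_to_cycles(latency_ns, clock_ghz):
    """Number of core cycles that elapse during latency_ns."""
    return latency_ns * clock_ghz


gain = percent_improvement(MEMORY_LATENCY_NS["Phenom"], MEMORY_LATENCY_NS["Phenom II"])
print(f"Phenom -> Phenom II: {gain:.1f}% lower memory latency")

for name, latency_ns in MEMORY_LATENCY_NS.items():
    print(f"{name}: {latency_ns} ns is about {ns_to_cycles(latency_ns, CLOCK_GHZ):.0f} cycles")
```

Even at 95 ns, a trip to main memory costs a few hundred core cycles, which is why the cache hierarchy matters so much.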

L2: It’s the New L1

I think I finally get it. When Nehalem launched I spoke with lead architect Ronak Singhal at great length about its L2 cache being too small. I even made this graph to illustrate my point:



With only 256KB per core, Core i7’s L2 cache was a large step back. Ronak argued that its 11-cycle load latency was more important than size. But it took Phenom II for me to understand why.

The original Phenom suffered because not only did it have very little L2 cache per core (512KB compared to as much as 6MB with Penryn), but it also had a very small L3 cache. Four cores sharing a 2MB L3 cache just wasn’t enough. The problem is AMD was die constrained; Phenom needed more L3 cache but AMD needed to keep the die size manageable to avoid bankruptcy. Architecturally, Phenom was ahead of its time.

If we were to live in the dual-core era forever, Intel had the right design - two cores could easily sit behind one large shared L2 cache. Move to four cores and the shared L2 design stops making sense. In some situations you’ll have cores operating on independent threads with no spatial locality, and for these scenarios each core will need its own L2 cache. In other scenarios you’ll have multiple cores working on the same data, in which case you’ll need a large cache shared by all cores. Again, Phenom was the right quad-core design, it just didn’t have enough cache (not to mention its other shortcomings).

In a way, Intel recognized that Conroe and Penryn were designed to win the dual-core race - over the life of both CPUs fewer than 5% of its desktop shipments were quad-core chips. Intel’s last tick and tock dominated the dual-core market. Nehalem and Westmere on the other hand are more interested in winning the multi-core races.

Phenom II addresses the cache deficiency. With a 6MB L3 cache, its L3 is nearly the same size as Core i7’s. The L2 caches remain larger at 512KB per core, but I suspect that’s because AMD didn’t have the time/resources to redesign its cores for Phenom II. It takes 15 cycles to access AMD’s 512KB L2; that’s the same amount of time it takes to access Penryn’s 2x6MB L2. I’ll gladly wait 15 cycles if I have the hit rate of a 6MB cache, but not for a 512KB cache. AMD too will pursue a faster L2; that will most likely come in 2011 with Bulldozer (Orochi and Llano CPUs).

With a very large L3 cache, it no longer makes sense to have a large L2. Instead the L2 needs to be as fast as possible, acting as spillover from L1. Look at what happened to L1 cache sizes as CPUs got wider and faster. The L1 cache grew from 1KB to 8KB, then 16KB, and eventually up to 32 and 64KB in today’s designs. However, L1 sizes haven’t increased beyond that point; instead we saw L2 caches grow and grow. Eventually they too hit a stopping point; for AMD that was Phenom, and for Intel that was Core i7.

With the number of cores growing, we need a large cache shared between all of the cores. Imagine a 12-core processor; would it have a massive 36MB shared L2 cache? Definitely not. It’d be too slow for starters, and the penalty for not finding something in L1 would be tremendous. Remember the point of the memory hierarchy: to hide latency between the software and the processor. A pyramid doesn’t work if the base fattens out too quickly. In the future, as we move to four, eight and more cores, L2 caches will have to be motherly figures to a core’s L1, feeding them individually, rather than a mess hall to feed everyone. That role will fall to the L3 cache. Carrying that further, we may even see future CPUs with more cores add a fourth level of cache.

With the role of the L2 cache redefined from service-all to service-one, it makes sense for it to be small and fast. The original Phenom had the right idea; it just needed a larger L3. Core i7 perfected that idea, and Phenom II took a step towards it. Cache sizes must continue to grow, but as they do, the number of levels of cache must increase as well to avoid a single, large penalty being paid as you go from one level of cache to the next.
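
To put rough numbers on that argument, here is a minimal sketch of an average load latency model. Every hit rate below is invented purely for illustration, as are the 40-cycle "big shared L2" and the 300-cycle memory latency; only the 4, 11 and 42 cycle figures come from the Core i7 row of the table. The model treats each latency as the total load-to-use time for a hit at that level, which is also an assumption.

```python
def average_load_latency(levels, memory_cycles):
    """Expected cycles per load.

    levels: (load_to_use_cycles, local_hit_rate) pairs ordered L1, L2, L3...
    memory_cycles: cost of a load that misses every cache level.
    """
    expected = 0.0
    reach = 1.0  # fraction of loads that get as far as this level
    for cycles, hit_rate in levels:
        expected += reach * hit_rate * cycles
        reach *= 1.0 - hit_rate
    return expected + reach * memory_cycles


MEMORY_CYCLES = 300  # hypothetical round number

# Small, fast private L2 backed by a large shared L3 (Core i7-style latencies,
# hypothetical hit rates).
three_level = average_load_latency([(4, 0.95), (11, 0.60), (42, 0.80)], MEMORY_CYCLES)

# One big, slow shared L2 and no L3 (entirely hypothetical design). Its hit rate
# is chosen so the same fraction of loads reaches main memory in both designs.
two_level = average_load_latency([(4, 0.95), (40, 0.92)], MEMORY_CYCLES)

print(f"small fast L2 + big L3: {three_level:.2f} cycles per load")
print(f"one big slow L2:        {two_level:.2f} cycles per load")
```

Both designs send the same share of loads to main memory here, so the difference comes entirely from how cheaply L1 misses are caught. With other hit rates the gap could shrink or reverse, so treat this as a way to reason about the tradeoff rather than a measurement.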


93 Comments


  • Walkeer - Thursday, October 15, 2009

    Super, so because MS Vista has a really bad and stupid CPU scheduler, AMD had to disable a perfectly legit and smart power-saving feature = CnC per core rather than per chip. I really love Windows! I expect that CnC per-core caused no problems under Linux, for example....
  • CuE0083 - Sunday, April 26, 2009

    I have been a reader of this site for a few years (first time commenting) and I just wanted to know how you guys determine that a particular processor is a good overclocker.

    1) Do you guys try overclocking multiple chips?
    2) Do you just walk into the store, pick a random chip, and try overclocking it?
    3) Or does AMD send you a chip?
  • v12v12 - Thursday, July 23, 2009

    All this bickering and nit-picking, when to me the solution seems simple.

    All the poor folks clamoring about numbers when they COULD NOT EVER POSSIBLY tell the difference between Intel and AMD in a double-blind test! None of you can tell the measurable diffs in FPS and temp. It's all little programs with numbers telling you there's a difference. So wtf is all the fuss about?

    Phenom-II is for people that already have an old AM2 rig and want to upgrade. But you forget that your old, slow ass mobo chipset and antiquated RAM wouldn't even come close to a newer Intel system, period.
    A Brand NEW Phenom-II would "compete," but it barely does that. And as prices drop Phenom-II is losing even more ground as someone with an intel 775 can spring for a fast Quad-core, while you're stuck with the SAME OLD MOBO and RAM DERRRRR?
    Stop all the nit-picking and bemoaning over Intel.

    Does it make sense to scrap your current AMD rig for a completely new Intel unit?

    YES = If you're doing video/AV editing and plan on getting an i7/i5 or if you’re not broke!

    NO = If you currently have an AMD and need some extra horse-power.

    But to falsely rationalize your purchase/mindset by suddenly putting the i7 into the "it's SO expensive" BS category; you're BROKE, you have no say about price. Get a real job and stop spending money on other nonsense and SAVE up like smart people do. It's YOUR own fault you cannot afford a damn $1100-1400 computer: that's NOT a lot. Just b/c YOU cannot afford it doesn't mean there's something "wrong" with i7.

    You're comparing a 2yr old Q6600 against AMD's newest unit LOL? That's like a car magazine comparing the newest lambo to a 2 year old Ferrari etc. BUT PRICE OMG... Prices steadily go DOWN, thus folks with 775 can still upgrade to 6700, 6800 and so forth.

    I'm glad AMD is "sort of" showing a rally to CATCH UP... BUT... when you buy into INTEL you're buying into a PROVEN ROADMAP OF PERFORMANCE VS AMD: you're buying into a mystery grab-bag of performance PROMISES.

    Geesh. Just get the Phenom-II if you cannot afford the i7. Nobody with sense is talking about going from a Q6600/9xxx to 2 year behind the pack Phenom. This is just sophomoric nonsense.

    Common-sense would tell you:

    1) GET A BETTER JOB (education/certs etc)

    2) Stop spending money on other hobbies and misc junk

    3) STFU already and improve your financial situation, THEN you have a say. It's YOUR fault you don't have enough for a paltry $1200 machine. WHO doesn't have $1200? If you don't you haven't EARNED the right to complain. Complain b/c it's someone else's fault - I'm betting it's mostly your own lack of saving & discipline that's the problem.

    None of you may like or agree w/me, but guess what? I don't care b/c I HAVE $1200 to spend so Fsck it I'm happy. Stop drinking, doing drugs, going out, blowing money on cable-TV and crap, for a change? Most of you are guilty of 1 or more of these frivolities.

    Honestly, THINK about what you’re saying here. You’re complaining about a superior i7 that is too expensive to do WHAT, play some damn video games? So your rationale is to do what? Buy a new CPU or an upgrade to do the same? So THUS instead of continually saving to get the best… You BLOW your loads for inferior technology… and so the cycle continues. You’re NOW BROKE AGAIN and behind. Maybe you’ll start saving once again and come out of the woodwork 2-3yrs later and STILL be complaining once again “OMG it’s TOO EXPENSIVE” “I’ll buy the cheap crap instead!”

    LMFAO NOW THAT IS Ludicrous!
  • goofbud - Tuesday, December 06, 2011

    Are you serious dude?

    It ain't the money. I know. I have money. I also have a lambo, a Porsche and an Evo. I like testing AMD because they give us "certified" techs something to tinker with and work on. AMD is a brand for builders, and true techs like to tinker with a processor and see how far it can go. Even when I was in high school I owned 486's, which were the latest and greatest at that time. I had an INTEL PC and it sucked dirt once Microsoft came out with Windows. Maybe Intel is ahead now but AMD is catching up. They can create the ultimate processor but they don't have to. Not yet.

    BTW, watch how you talk. Be considerate. It ain't the money, man. I can afford to buy as many Alienware PCs as I want. But I don't. Am I a gamer? Yes! I have a powerful system now and am happy I did not spend a lot of money on it. See, this is the thing. If you are smart you just don't want to buy the fastest CPU and fastest RAM that comes out. It's like buying a PS3 for $6,000.00 on eBay just because you want to be the first to play it. That is stupid.

    People buy AMD because they are tweakable. They try to buy the cheapest parts out there, tweak it, and see how far it can go. Makes sense?

    So what if you have the fastest computer in the world. If you don't use it everyday you just wasted money.

    Understand now kid. Now STFU and Go to your room!
  • sandstones - Wednesday, March 25, 2009

    I know that we should look at relative SYSmark scores, but I'm still puzzled by the higher scores in this batch of tests, compared to those done in April 2008.

    For example, the top performer from April, the Core 2 Duo E8400, got a score of 161 on Overall in April 2008, and 191 in Jan 2009. The X4 Phenom 9750 went from 126 to 148. Other CPUs in both tests had similar differences. That's a bigger percentage difference than what gets used to debate whether Intel or AMD is better.

    Anand - any comments on what caused such a large difference?
  • Amitjakhar - Friday, February 20, 2009

    http://www.overclockersclub.com/reviews/phenomii94...
    After overclocking it really comes near, and sometimes it gets better performance than Core i7. Which is good. AMD has done a superb job and they are headed in the right direction. The next Black Edition will make Intel so worried they'll have to go back to work again.
  • Amitjakhar - Friday, February 20, 2009

    Phenom II is showing much better performance elsewhere than here. To me it seems they have not done the testing properly. You'd better check out this link and see how it's genuinely performing:
    http://www.guru3d.com/article/amd-phenom-ii-x4-920...
  • salem80 - Tuesday, January 27, 2009

    The Q9400 is at 126W~174W, not the 95W Intel quotes?
    Even the E8600 is at 124W~157W while they say 65W?
    There's a huge difference in the numbers here.
  • pcuser123 - Saturday, January 24, 2009

    I think the new Core i7 sucks compared to Phenom II. Just look at the pricing vs performance on those two.
    Here are the benchmarks: http://www.overclockersclub.com/reviews/phenomii94...
  • gipper - Monday, January 19, 2009

    You do the overclocks but don't show us the results? Following overclocking, those stock processors have WIDELY different capabilities.

    I'd love to see those video encode charts redone with the overclocked processors. That would tell me the TRUE value of the 64x2BE, C2D, Phenom, PhenomII, and i7 relative to one another.

    Otherwise, your overclock information borders on worthless.
