Cache and Memory Controller Comparison

Now that you know what parts to compare, let's drill a little deeper. Since cache is a major element separating Phenom II from its predecessor, let's start there.

Phenom II, like its predecessor, maintains a 3-cycle 64KB L1 cache. With Nehalem, Intel had to move to a 4-cycle cache, so Phenom II retains the hit rate and performance benefits of a larger, faster L1. The L2 cache latency is where Phenom II and Intel’s architectures really differ.

Phenom II, like the original, has a 512KB L2 cache per core, but the cache is a high latency 15 cycle cache. Compared to the Athlon X2’s 20 cycle L2, Phenom II looks pretty good, but now look at Penryn. Penryn’s 15 cycle L2 is the same speed as Phenom’s L2, but it’s 2-6x larger. Core i7 trumps them all with a very fast 11 cycle L2, although it achieves this by having the smallest L2 cache per core out of the bunch - only 256KB in size.

AMD asserts that Phenom II’s L3 cache is now 2-cycles faster than Phenom’s L3. At 3x the size but with improved access time, Phenom II’s L3 is closer to where it should have been in the first place. Everest measures Phenom II’s L3 as having a 55-cycle latency, while Core i7 has a 35 cycle L3. Sandra puts Core i7 and the original Phenom at 55 cycles, but Phenom II at 71 cycles. I checked with Intel and AMD, and it appears neither application is reporting the correct L3 access latencies for either processor. Intel confirmed Core i7’s L3 as a 42 cycle L3 and I’m still waiting to hear back from AMD on the time to access its cache, but I suspect it will be around 50 cycles.

| Processor | L1 Latency | L2 Latency | L3 Latency |
|---|---|---|---|
| AMD Phenom II X4 920 (2.80GHz) | 3 cycles | 15 cycles | AMD won't tell me |
| AMD Phenom @ 2.8GHz | 3 cycles | 15 cycles | AMD won't tell me |
| Athlon X2 5400 (2.80GHz) | 3 cycles | 20 cycles | - |
| Intel Core 2 Quad QX9770 (3.2GHz) | 3 cycles | 15 cycles | - |
| Intel Core 2 Quad Q9400 (2.66GHz) | 3 cycles | 15 cycles | - |
| Intel Core i7-965 (3.2GHz) | 4 cycles | 11 cycles | 42 cycles |

Main memory access time is more telling. A trip down memory lane will cost you 107 ns on an original Phenom processor, 100 ns on an Athlon X2, and now only 95 ns on a Phenom II. The 11% improvement in memory access performance is due to improvements AMD made when it redesigned the memory controller to include support for DDR3.
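
To put the nanosecond figures on the same footing as the cycle counts in the table, here is a small conversion sketch. It assumes all three AMD parts run at the 2.8GHz listed in the table; the helper names are made up for this illustration, and the only inputs are the latencies quoted above.

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to core clock cycles."""
    return latency_ns * clock_ghz


def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core clock cycles to nanoseconds."""
    return cycles / clock_ghz


def percent_improvement(old, new):
    """Percentage by which `new` is lower than `old`."""
    return (old - new) / old * 100.0


# Main memory latencies quoted in the article, in nanoseconds.
memory_ns = {"Phenom": 107, "Athlon X2": 100, "Phenom II": 95}
CLOCK_GHZ = 2.8  # assumption: all three parts clocked as in the table

for name, ns in memory_ns.items():
    cycles = ns_to_cycles(ns, CLOCK_GHZ)
    print(f"{name}: {ns} ns is roughly {cycles:.0f} cycles at {CLOCK_GHZ}GHz")

gain = percent_improvement(memory_ns["Phenom"], memory_ns["Phenom II"])
print(f"Phenom to Phenom II: {gain:.1f}% lower memory latency")
```

Seen this way, a miss that goes all the way to main memory costs a few hundred core cycles, which is why the tens of cycles separating the cache levels matter so much.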

L2: It’s the New L1

I think I finally get it. When Nehalem launched I spoke with lead architect Ronak Singhal at great length about its L2 cache being too small. I even made this graph to illustrate my point:


With only 256KB per core, Core i7’s L2 cache was a large step back. Ronak argued that its 11-cycle load latency was more important than size. But it took Phenom II for me to understand why.

The original Phenom suffered because not only did it have very little L2 cache per core (512KB compared to as much as 6MB with Penryn), but it also had a very small L3 cache. Four cores sharing a 2MB L3 cache just wasn’t enough. The problem is AMD was die constrained; Phenom needed more L3 cache but AMD needed to keep the die size manageable to avoid bankruptcy. Architecturally, Phenom was ahead of its time.

If we were to live in the dual-core era forever, Intel had the right design - two cores could easily sit behind one large shared L2 cache. Move to four cores and the shared L2 design stops making sense. In some situations you’ll have cores operating on independent threads with no spatial locality, and for these scenarios each core will need its own L2 cache. In other scenarios you’ll have multiple cores working on the same data, in which case you’ll need a large cache shared by all cores. Again, Phenom was the right quad-core design, it just didn’t have enough cache (not to mention its other shortcomings).

In a way, Intel recognized that Conroe and Penryn were designed to win the dual-core race - over the life of both CPUs less than 5% of its desktop shipments were quad-core chips. Intel’s last tick and tock dominated the dual-core market. Nehalem and Westmere on the other hand are more interested in winning the multi-core races.

Phenom II addresses the cache deficiency. With a 6MB L3 cache, its L3 is nearly the same size as Core i7’s. The L2 caches remain larger at 512KB per core, but I suspect that’s because AMD didn’t have the time/resources to redesign its cores for Phenom II. It takes 15 cycles to access AMD’s 512KB L2; that’s the same amount of time it takes to access Penryn’s 2x6MB L2. I’ll gladly wait 15 cycles if I have the hit rate of a 6MB cache, but not for a 512KB cache. AMD too will pursue a faster L2; that will most likely come in 2011 with Bulldozer (Orochi and Llano CPUs).
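
The latency-versus-hit-rate trade-off can be roughed out with a simple expected-latency model. The sketch below is an illustration, not a measurement. It treats each quoted figure as the total load latency for a hit at that level, takes the 15-cycle and 11-cycle L2 latencies and the 95 ns memory latency (about 266 cycles at 2.8GHz) from this article, uses the guessed 50-cycle L3, and invents every hit rate purely for the sake of the example.

```python
def expected_latency(levels, memory_cycles):
    """Average load latency in cycles for a cache hierarchy.

    levels: list of (latency_cycles, local_hit_rate) tuples ordered from
            L1 outward. local_hit_rate is the fraction of the accesses
            reaching that level which hit there.
    memory_cycles: latency of an access that misses every cache level.
    """
    remaining = 1.0  # fraction of accesses not yet served
    total = 0.0
    for latency, hit_rate in levels:
        served = remaining * hit_rate
        total += served * latency
        remaining -= served
    return total + remaining * memory_cycles


MEMORY_CYCLES = 266  # 95 ns at 2.8GHz, rounded

# Hypothetical hit rates: the larger L2 is assumed to hit a bit more often.
configs = {
    "15-cycle L2 (larger)": [(3, 0.95), (15, 0.60), (50, 0.70)],
    "11-cycle L2 (smaller)": [(3, 0.95), (11, 0.55), (50, 0.70)],
}

for name, levels in configs.items():
    avg = expected_latency(levels, MEMORY_CYCLES)
    print(f"{name}: {avg:.2f} cycles per load on average")
```

Changing the invented hit rates moves the result in either direction. What the model makes visible is that a slower L2 has to earn its extra cycles with a meaningfully higher hit rate, and a large L3 behind it shrinks how much that extra hit rate is worth.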

With a very large L3 cache, it no longer makes sense to have a large L2. Instead the L2 needs to be as fast as possible, acting as spillover from L1. Look at what happened to L1 cache sizes as CPUs got wider and faster. The L1 cache grew from 1KB to 8KB, then 16KB, and eventually up to 32KB and 64KB in today’s designs. However, L1 sizes haven’t increased beyond that point; instead we saw L2 caches grow and grow. Eventually they too hit a stopping point; for AMD that was Phenom, and for Intel that was Core i7.

With the number of cores growing, we need a large cache shared between all of the cores. Imagine a 12-core processor; would it have a massive 36MB shared L2 cache? Definitely not. It’d be too slow for starters, and the penalty for not finding something in L1 would be tremendous. Remember the point of the memory hierarchy: to hide latency between the software and the processor. A pyramid doesn’t work if the base fattens out too quickly. In the future, as we move to four, eight and more cores, L2 caches will have to be motherly figures to a core’s L1, feeding them individually, rather than a mess hall to feed everyone. That role will fall to the L3 cache. Carrying that further, we may even see future CPUs with more cores add a fourth level of cache.

With the role of the L2 cache redefined from service-all to service-one, it makes sense for it to be small and fast. The original Phenom had the right idea; it just needed a larger L3. Core i7 perfected that idea, and Phenom II took a step towards it. Cache sizes must continue to grow, but as they do, the number of levels of cache must increase as well to avoid a single, large penalty being paid as you go from one level of cache to the next.


93 Comments

  • Shadowmaster625 - Friday, January 16, 2009

    Why didn't you include an overclocked E5200 in the testing?!?!?!

    omg this is horrid. How do these $230-$270 CPUs compare to an $85 E5200 coupled with a $105 Gigabyte GA-EP45-UD3R? That combo will easily overclock to 3.8GHz on stock cooling. CPU, mobo + RAM all for less than the cost of a Phenom II. And better performance too.
  • Reynod - Monday, January 12, 2009

    Another excellent article Anand.

    Would you be able to write a short piece on the AM3 socket and the "likely" impact on performance once you have some samples please?

  • R4F43LZiN - Saturday, January 10, 2009

    I wanted to see some Phenom II overclocked gaming benchmarks...
  • zagortenay - Saturday, January 10, 2009

    To correct my mistake in above post:
    "And check the link of very respectable "Guru of 3D" yourself, X4 940 beats Core i7 920 in higher resolutions." Not Core i7 940.
  • zagortenay - Saturday, January 10, 2009

    Great comments Aranthos! AMD did a great job with Phenom II, no doubt about that.
    Anandtech review is kind of fair and balanced when it comes to giving the final verdict, but the tests are deceiving and unfair as usual.
    First of all, as somebody else has already pointed out, they used an average mobo to test Phenom II, while they used expensive enthusiast level mobos for Core2 Quad and Core i7 (230 and 250 Dollars respectively). They could not find an Asus M3A79-T (which is much cheaper at $185). There is no excuse for that! Either a deliberate move not to show what Phenom II can deliver at its best, or Anand needs to learn a lot from us. Just check the below link to see the performance difference between 790GX and 790FX. Quite some performance difference in some benchmarks and consider that it is still a close competition with an average AMD motherboard.
    There IS a difference apparently: http://www.legitreviews.com/article/795/5/ and it would change some conclusions.
    And now introducing a classic: Why Core2 Quads run on DDR3 (Yeah, all Core2 Quad users definitely switched to DDR3, lol!) and Phenom IIs run on DDR2? To show the best of Core 2 Quads... So what happens if DDR2 is used also on Core 2 Quads? See yourself below. X4 940 beats Q9550 most of the time and even Q9650 in some applications.
    http://www.bit-tech.net/hardware/2009/01/08/amd-ph...
    I wonder what trick they will pull with RAM when Phenom II AM3 (with DDR3) comes this February.
    Anand says X4 940 trails Q9650 by 28.4% when it comes to Far Cry 2. Is it so? Or? Check the above link again. Even with DDR3 RAM Q9650 leads by only 4.4%. With DDR2, X4 940 leads this time, with a little margin. Same resolution! Who to believe?
    And check the link of very respectable "Guru of 3D" yourself, X4 940 beats Core i7 940 in higher resolutions. He he!
    http://www.guru3d.com/article/amd-phenom-ii-x4-920...
    So what? Don't get fooled, don't get deceived by the "big brother connections".
    Final words: Yes I am a fan boy and I don't pay a penny for Intel!
  • Aranthos - Saturday, January 10, 2009

    I wonder why so many people keep saying "What happened to the AMD from the Athlon64 era? It was whupping Intel!" etc.

    That AMD is still here. The same AMD that so long ago brought us HyperTransport, the integrated memory controller, native dual-core and the like brought us native quad-core and a three level cache hierarchy a full year before Intel did either. As it turned out, Intel did it better - a fact with which I won't even try to argue. However, AMD is still working.

    P1 flopped. It was the most hyped chip in years, and brought all sorts of false promises. All Deneb promised was better overclocking, lower power consumption and more clock-for-clock performance. It did all 3.

    I'm not going to say Intel ripped off AMD by using an IMC and a HT-esque high speed interconnect. Granted, AMD did it first, but Intel would have ended up doing it ANYWAY because it is a good idea.

    Back to the original topic - we still have the old AMD with us. They're still innovating as always. But, we have a new Intel. One that isn't peddling crappy Netburst chips. New Intel is going out guns blazing, and they have the money to make sure that another P4 doesn't happen.

    AMD got lucky back in the P3 -> P4 era. They're gonna have to either pull out a win of epic proportions, or stick to razor thin margins on their chips. Intel has seriously deep pockets, and can easily afford to destroy AMD's prices.

    i7 is epic win. But I'm buying a Deneb anyway. Yeah, people are gonna call me a fanboy, and so what. I'm buying a chip made by a company that is facing a company over 50x their size. While they're not a little family run business, I will support them to their dying breath because they need every sale they can get. I like high performance as much as the next guy, but if buying higher performance (in my case at a higher price [I have an AM2+ motherboard]) puts the business a step closer to being one-sided, then Intel can suck it. They don't need my money.
  • Mathos - Saturday, January 10, 2009

    Well, it's not quite a 4800 series equivalent release this time. More like being around the same as the 3800 series was. A good improvement over the hd2000 series cards, in this case a good improvement over the phenom 1.

    On the other hand I am wanting to see what will happen with the AM3 versions. Should improve scores quite a bit on anything that likes memory speed and bandwidth. I'm also wondering what other optimizations will come with AM3. Gonna wait for that, since I should be getting another nice quarterly bonus around the time those come out. Use a pII 945 on my old k9a2 plat till I can get an AM3 board and DDR3 memory.
  • Maroon - Saturday, January 10, 2009

    This was a great step forward by AMD. After the Phenom flop they had to transition successfully to 45nm or basically fold. They have done that. AMD doesn't have the resources to compete with Intel on the highend right now, but with this release they can compete in the mainstream market where most processors are sold.

    They're using the same strat that worked for the 4xxx series video cards, performance per dollar.

  • Megaknight - Friday, January 9, 2009

    Have the Intel fanboys noticed that they're comparing a DDR 2 Phenom II to a DDR 3 i7? I know AMD will be slower anyway, but it should close the gap a bit, right?
  • aeternitas - Friday, January 9, 2009

    Are you serious? DDR3 might be able to give AMD an advantage overall compared to the C2. But then going DDR3 and spending that much money, you might as well go i7 anyway. (unless you're a fanboy)

    Seriously guys, if you're going to go around comparing P2 to the i7, you need to focus on price! Else you'll look really bad.

    P2 is against C2. End of story. You can't sit around theorizing some magical item is going to get a P2 anywhere close to the i7.

    Stop defending AMD. They have 18 months of catchup to do and don't need pats on the back and excuses from people like you. They have a good chip and a lot of good things in the chip that they don't need to do in the future as they come up with new architecture.
