Cache and Memory Controller Comparison

Now that you know what parts to compare, let's drill a little deeper. Since cache is a major element separating Phenom II from its predecessor, let's start there.

Phenom II, like its predecessor, maintains a 3-cycle 64KB L1 cache. With Nehalem, Intel had to move to a 4-cycle cache, so Phenom II retains the hit rate and performance benefits of a larger, faster L1. The L2 cache latency is where Phenom II and Intel’s architectures really differ.

Phenom II, like the original, has a 512KB L2 cache per core, but it is a high-latency, 15-cycle cache. Compared to the Athlon X2’s 20-cycle L2, Phenom II looks pretty good, but now look at Penryn. Penryn’s 15-cycle L2 is the same speed as Phenom’s L2, but it’s 2-6x larger. Core i7 trumps them all with a very fast 11-cycle L2, although it achieves this by having the smallest L2 cache per core of the bunch: only 256KB.

AMD asserts that Phenom II’s L3 cache is now 2 cycles faster than Phenom’s L3. At 3x the size but with improved access time, Phenom II’s L3 is closer to where it should have been in the first place. Everest measures Phenom II’s L3 as having a 55-cycle latency, while Core i7 has a 35-cycle L3. Sandra puts Core i7 and the original Phenom at 55 cycles, but Phenom II at 71 cycles. I checked with Intel and AMD, and it appears neither application is reporting the correct L3 access latencies for either processor. Intel confirmed Core i7’s L3 as a 42-cycle L3. I’m still waiting to hear back from AMD on the time to access its cache, but I suspect it will be around 50 cycles.

| Processor | L1 Latency | L2 Latency | L3 Latency |
|---|---|---|---|
| AMD Phenom II X4 920 (2.80GHz) | 3 cycles | 15 cycles | AMD won't tell me |
| AMD Phenom @ 2.8GHz | 3 cycles | 15 cycles | AMD won't tell me |
| Athlon X2 5400 (2.80GHz) | 3 cycles | 20 cycles | - |
| Intel Core 2 Quad QX9770 (3.2GHz) | 3 cycles | 15 cycles | - |
| Intel Core 2 Quad Q9400 (2.66GHz) | 3 cycles | 15 cycles | - |
| Intel Core i7-965 (3.2GHz) | 4 cycles | 11 cycles | 42 cycles |
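
Cycle counts only compare directly at equal clock speeds. As a rough sketch, the latencies above can be converted to nanoseconds using each chip's rated clock. The helper name `cycles_to_ns` is my own, and the conversion assumes the cache runs at the core clock, which may not hold for an L3 that sits on a separate clock domain.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds.

    Assumes the cache is clocked at the core frequency (an assumption;
    an L3 may run on a separate, slower clock).
    """
    return cycles / clock_ghz


# L2 latencies (cycles) and clock speeds (GHz) taken from the table above.
l2_latencies = {
    "Phenom II X4 920": (15, 2.8),
    "Core 2 Quad QX9770": (15, 3.2),
    "Core i7-965": (11, 3.2),
}

if __name__ == "__main__":
    for name, (cycles, ghz) in l2_latencies.items():
        print(f"{name}: {cycles_to_ns(cycles, ghz):.2f} ns")
```

Seen this way, the same 15-cycle L2 costs a little less wall-clock time on the higher-clocked QX9770 than on the 2.8GHz Phenom II.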

Main memory access time is more telling. A trip down memory lane will cost you 107 ns on an original Phenom processor, 100 ns on an Athlon X2, and now only 95 ns on a Phenom II. The 11% improvement in memory access performance is due to improvements AMD made when it redesigned the memory controller to include support for DDR3.
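
The 11% figure is simply the relative drop from the original Phenom's 107 ns to Phenom II's 95 ns. A minimal sketch of that arithmetic follows; the function names are mine, and the cycle conversion assumes a 2.8GHz core clock as in the table.

```python
def improvement_pct(old_ns: float, new_ns: float) -> float:
    """Percentage reduction in latency going from old_ns to new_ns."""
    return (old_ns - new_ns) / old_ns * 100.0


def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Express a latency in nanoseconds as core clock cycles."""
    return latency_ns * clock_ghz


# Main memory latencies quoted in the text, in nanoseconds.
PHENOM_NS = 107.0
ATHLON_X2_NS = 100.0
PHENOM_II_NS = 95.0

if __name__ == "__main__":
    # Roughly 11.2% better than the original Phenom, 5% better than Athlon X2.
    print(f"vs Phenom:    {improvement_pct(PHENOM_NS, PHENOM_II_NS):.1f}%")
    print(f"vs Athlon X2: {improvement_pct(ATHLON_X2_NS, PHENOM_II_NS):.1f}%")
    # A trip to memory still costs a few hundred core cycles at 2.8GHz.
    print(f"Phenom II:    {ns_to_cycles(PHENOM_II_NS, 2.8):.0f} cycles")
```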

L2: It’s the New L1

I think I finally get it. When Nehalem launched I spoke with lead architect Ronak Singhal at great length about its L2 cache being too small. I even made this graph to illustrate my point:



With only 256KB per core, Core i7’s L2 cache was a large step back. Ronak argued that its 11-cycle load latency was more important than size. But it took Phenom II for me to understand why.

The original Phenom suffered because not only did it have very little L2 cache per core (512KB compared to as much as 6MB with Penryn), but it also had a very small L3 cache. Four cores sharing a 2MB L3 cache just wasn’t enough. The problem is AMD was die constrained; Phenom needed more L3 cache but AMD needed to keep the die size manageable to avoid bankruptcy. Architecturally, Phenom was ahead of its time.

If we were to live in the dual-core era forever, Intel had the right design: two cores could easily sit behind one large shared L2 cache. Move to four cores and the shared L2 design stops making sense. In some situations you’ll have cores operating on independent threads with no spatial locality, and for these scenarios each core will need its own L2 cache. In other scenarios you’ll have multiple cores working on the same data, in which case you’ll need a large cache shared by all cores. Again, Phenom was the right quad-core design; it just didn’t have enough cache (not to mention its other shortcomings).

In a way, Intel recognized that Conroe and Penryn were designed to win the dual-core race - over the life of both CPUs less than 5% of its desktop shipments were quad-core chips. Intel’s last tick and tock dominated the dual-core market. Nehalem and Westmere on the other hand are more interested in winning the multi-core races.

Phenom II addresses the cache deficiency. With a 6MB L3 cache, it has nearly the same size L3 as Core i7. The L2 caches remain larger at 512KB per core, but I suspect that’s because AMD didn’t have the time/resources to redesign its cores for Phenom II. It takes 15 cycles to access AMD’s 512KB L2; that’s the same amount of time it takes to access Penryn’s 2x6MB L2. I’ll gladly wait 15 cycles if I have the hit rate of a 6MB cache, but not for a 512KB cache. AMD too will pursue a faster L2; that will most likely come in 2011 with Bulldozer (Orochi and Llano CPUs).

With a very large L3 cache, it no longer makes sense to have a large L2. Instead the L2 needs to be as fast as possible, acting as spillover from L1. Look at what happened to L1 cache sizes as CPUs got wider and faster. The L1 cache grew from 1KB to 8KB, then 16KB, and eventually up to 32KB and 64KB in today’s designs. However, L1 sizes haven’t increased beyond that point; instead we saw L2 caches grow and grow. Eventually they too hit a stopping point; for AMD that was Phenom, and for Intel that was Core i7.

With the number of cores growing, we need a large cache shared between all of the cores. Imagine a 12-core processor; would it have a massive 36MB shared L2 cache? Definitely not. It’d be too slow for starters, and the penalty for not finding something in L1 would be tremendous. Remember the point of the memory hierarchy: to hide latency between the software and the processor. A pyramid doesn’t work if the base fattens out too quickly. In the future, as we move to four, eight and more cores, L2 caches will have to be motherly figures to a core’s L1, feeding them individually, rather than a mess hall to feed everyone. That role will fall to the L3 cache. Carrying that further, we may even see future CPUs with more cores add a fourth level of cache.

With the role of the L2 cache redefined from service-all to service-one, it makes sense for it to be small and fast. The original Phenom had the right idea; it just needed a larger L3. Core i7 perfected that idea, and Phenom II took a step towards it. Cache sizes must continue to grow, but as they do, the number of levels of cache must increase as well to avoid a single, large penalty being paid as you go from one level of cache to the next.
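
To make the argument concrete, here is a toy expected-latency model of a cache hierarchy. It is only a sketch: the L1, L2 and L3 latencies for the three-level case are Core i7's figures from the table, but every hit rate, the 40-cycle latency of the imagined big shared L2, and the 300-cycle memory cost are numbers I made up for illustration, and `expected_latency` is a name of my own.

```python
from typing import Sequence, Tuple


def expected_latency(levels: Sequence[Tuple[float, float]],
                     memory_cycles: float) -> float:
    """Expected load latency in cycles for a cache hierarchy.

    levels: (load_to_use_latency_cycles, local_hit_rate) per level, L1 first.
    The local hit rate is the fraction of accesses reaching a level that hit.
    Each latency is treated as the total cost of a hit at that level.
    """
    reach = 1.0  # probability that an access gets as far as this level
    total = 0.0
    for latency, hit_rate in levels:
        total += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    # Whatever misses every level pays the full trip to main memory.
    return total + reach * memory_cycles


MEMORY_CYCLES = 300.0  # assumed round number, not a measurement

# Small, fast private L2 backed by a large shared L3 (hit rates invented).
small_fast_l2 = [(4, 0.90), (11, 0.50), (42, 0.80)]
# Hypothetical single big shared L2 with no L3 (latency and hit rate invented).
big_shared_l2 = [(4, 0.90), (40, 0.90)]

if __name__ == "__main__":
    print(f"small L2 + L3: {expected_latency(small_fast_l2, MEMORY_CYCLES):.2f} cycles")
    print(f"big shared L2: {expected_latency(big_shared_l2, MEMORY_CYCLES):.2f} cycles")
```

With these made-up inputs both designs send the same 1% of loads to main memory, yet the three-level design has the lower average because half of the L1 misses are caught after 11 cycles instead of 40. Different hit rates could flip the result, so treat it as a way to reason about the tradeoff rather than a prediction.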


93 Comments


  • Beno - Monday, January 12, 2009

    fanboys help keep them alive.
    if more ppl started looking at AMD again, then Intel will be scared, so us the consumers will be happy because of prices.

    intel has been greedy and overpriced their c2 because there was no competition at that time.
  • garydale - Friday, January 09, 2009

    I generally buy AMD processors for two reasons. The first is that I am not a gamer so I'm looking for cost-effective business application solutions. I'd rather double the memory than increase the processor speed, so AMD works well at the price points I build to.

    Secondly, I believe in the need for competition. With the power PC processor virtually absent from the consumer market and there being little else to choose from for the desktop market, AMD is Intel's only real competitor. So long as AMD has chips that are good enough to compete with Intel's on price/performance, I prefer to buy them.

    If Via got their Cyrix processors up to a decent speed, I might be tempted to switch to them, but let's face it, they don't really compete in this market. So in a two-way race, we need to put our money behind the underdog to prevent a monopoly.

    I've been buying ATI cards too for similar reasons. Nice to see that AMD's making advances in both areas.

    To be clear, I've got nothing against Intel, at least not since the Pentium fiasco, but I think everyone will agree that having multiple firms competing is better for consumers than having one company dominate (Windows 95, 98, Millennium Edition, Vista come to mind). :)
  • aeternitas - Friday, January 09, 2009

    Much of your post should go next to the Webster definition of "AMDfanboi"

    If you want true competition, buy the better product. I got my sweet A64. I will now consider P2 over a C2D, but because of price/performance/watt alone.
  • Certified partner - Friday, January 09, 2009

    "Blender is one of the few tests that doesn't strongly favor the Core i7, in fact it does not favor them at all. Here the Core 2 Quad Q9650 is the fastest processor, followed by the Phenom II X4 940 and the Phenom II X4 920."
    http://www.techspot.com/review/137-amd-phenom2-x4-...

    "Blender shows Phenom II less competitive than the other 3D rendering tests we've seen thus far."
    http://www.anandtech.com/cpuchipsets/showdoc.aspx?...

    Both can't be true. Explanations would be highly appreciated. I suggest, that anandtech ask techspot about the test settings. Blender is capable of using several threads but I'm not sure whether the optimization is automated. Please, play with the settings. For example, 8*8 (render) tiles can benefit from 8 threads while 1*1 can't.
  • Max1 - Friday, January 09, 2009

    How much money has paid Intel to you for this "testing"? You have tested productivity of processors only on two games. In both games productivity of Core 2 is above. In one of them much more above, but it happens seldom. Other tests show, that productivity of Core 2 Quad in part of games is above. In part of games productivity Phenom II of same frequency is above. Why there is so a lot of coding and synthetic tests where Intel is faster, and as always there are no other applications? Why you continue to say lies, as earlier liars for money of Intel that Northwood is ostensibly faster, than Barton.
  • strikeback03 - Friday, January 09, 2009

    I'm surprised how long it took the fanbois to start commenting on this article. Didn't really get rolling until several pages into the comments.
  • JimmiG - Friday, January 09, 2009

    Bit disappointing that it's still slower than Core2 clock for clock. But given the performance of the original Phenom, I think the CPU performs as expected. A big leap for AMD. Unfortunately for them, Intel made an even bigger leap when they switched from Netbust to Core2.

    Also a bit concerned about this supposed "backwards compatibility". Many of the original 790FX boards, my M3A32-MVP Deluxe in particular, will not work with AM3 CPUs because Asus does not plan on releasing a BIOS update. Of course that's the fault of second-rate mobo companies like Asus, and not the fault of AMD. I'll probably end up getting a DDR2 PII-940 to replace my X4 9650, but I'll wait until the prices have dropped some.
  • anandtech02148 - Friday, January 09, 2009

    Just wait till Am3 socket comes out, Intel will have to make a slight cheaper version of x58chipset. Is that sweat i see on their forehead?
    Amd buy Via's Nano and give them a 2 prong attack.
  • RogueAdmin - Friday, January 09, 2009

    AMD has gone a long way to improving the performance of its processors, and everyone should go out and buy them. They need our support, and without it we will have to put up with whatever Intel decide to give us. And everyone here I think remembers the P4 days, let them not come again!
    The AM2 /2+ /3 platform is by far the easiest upgrade option. No need to worry if your NB chipset supports the latest FSB or RAM, because its all integrated into the CPU. A feature that Intel has copied in its new i7. Along with the monolithic quad core design, and level 3 cache. Also do not forget that AMD released the first x86-64 CPU, and intel basically complied to its x86-64 code to be compatible with the software developed for it.
    i7 is fast, very fast. But do you need that kind of performance in your everyday life? i7 is designed for workstation's hence the benchmarks of video encoding and 3D applications. Gamers would be better off getting a top of the line GPU. Is your CPU 100% utilized 24/7?
    I saw a comment about the 2 year old Core 2 Quad being faster, only in Far Cry 2, that one test. And I would rather play Crysis anyday.
    Since its release Intel has tweaked the performance of these with new cores no end.

    Sorry I digest.... lol

    Keep things competitive, buy AMD. Fanboy or no, let the price wars rage on.

  • aeternitas - Friday, January 09, 2009

    1. Most people use their everyday system to *work* too.
    2. Dont compare i7 to P2. Youll just look like youre really reaching and a fanboy.
    3. You dont -Need- anything better than a A64 for everyday tasks. Depending on how long you wanna wait though, you will go to a better system. That point about not -needing- better hardware has always been ridiculous and only applies to grandmas and people that use the computer for browsing and music. Those people dont care about this area in computer so its moot!

    P2 is great, but be realistic. Its competing against C2 right now. Comparing technicalities and -who was firsts- doesnt provide more FPs in anything. It just makes for flame fodder. The numbers speak for themselves and I think this article did a good job in putting the P2 in its place. As a great alternative for people looking to upgrade from older than C2 hardware.
