Brisbane Performance Issues Demystified: Higher Latencies to Blame

As you'll remember from Part 1, our 65nm Athlon 64 X2 5000+ inexplicably performed slower than our 90nm part. We had contacted AMD before the article was published but didn't receive a response until we were well underway with Part 2. AMD's explanation for the reduced performance? Higher memory latencies.

We wanted to find out exactly how much higher, so we turned to CPU-Z's latency benchmark for a quick indication of how things had changed.

CPU                             | CPU-Z Memory Latency (8192KB block, 128-byte stride)
AMD Athlon 64 X2 5000+ (65nm)   | 122 cycles (46.92 ns)
AMD Athlon 64 X2 5000+ (90nm)   | 121 cycles (46.54 ns)

A single-cycle increase in memory access latency, or about 0.4ns, is slight and not enough to cause the sort of performance deltas we saw in Quake 4 and Half-Life 2; something else was amiss. Fortunately, another metric reported by CPU-Z's latency test helped us understand the cause of the poor performance: L2 cache access latency.
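The nanosecond figures in the memory latency table are simply the cycle counts divided by the core clock. A minimal sketch of that conversion, assuming the 2.6GHz stock clock of the Athlon 64 X2 5000+ for both parts (the constant `CLOCK_GHZ` and the helper `cycles_to_ns` are illustrative names, not anything CPU-Z exposes):

```python
# Convert a latency measured in core clock cycles to nanoseconds.
# Assumption: both Athlon 64 X2 5000+ parts run at their 2.6 GHz stock clock.
CLOCK_GHZ = 2.6


def cycles_to_ns(cycles: float, clock_ghz: float = CLOCK_GHZ) -> float:
    """One cycle lasts 1 / clock_ghz nanoseconds."""
    return cycles / clock_ghz


mem_65nm = cycles_to_ns(122)  # Brisbane (65nm)
mem_90nm = cycles_to_ns(121)  # Windsor (90nm)
print(f"65nm: {mem_65nm:.2f} ns, 90nm: {mem_90nm:.2f} ns")
print(f"delta: {mem_65nm - mem_90nm:.2f} ns")
```

Dividing by 2.6 reproduces the 46.92 ns and 46.54 ns figures, and the one-cycle gap works out to roughly 0.38 ns, which rounds to the 0.4ns quoted in the text.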

CPU                             | CPU-Z L2 Cache Latency | ScienceMark 2.0 L2 Cache Latency
AMD Athlon 64 X2 5000+ (65nm)   | 20 cycles              | 20 cycles
AMD Athlon 64 X2 5000+ (90nm)   | 12 cycles              | 12 cycles

Updated - 1/5/07: Although AMD previously did not raise any issues with our findings, we were contacted today and informed that the latency figures produced by both ScienceMark and CPU-Z are incorrect. According to AMD, the Brisbane core's L2 latency is 14 cycles, up from 12 cycles, not 20 cycles. This would help explain the relatively small impact on application performance that we've seen across the board. We are still waiting to hear back from AMD on a handful of other Brisbane issues and will update you as soon as we have more information.

The original K8 core, in both 130nm and 90nm flavors, had a 12-cycle L2 cache. With Brisbane, as reported by both CPU-Z and ScienceMark, the 65nm K8 now has a 20-cycle L2 cache. Generally speaking, you move to a higher-latency cache if you're planning to introduce a larger cache size, but a quick glance at AMD's roadmaps doesn't show anything larger than 1MB of L2 per core for the next year. Higher clock speeds, the other usual justification for relaxing cache latency, don't explain it either, as the highest clock speed on AMD's roadmaps thus far is only 3.2GHz.

Luckily, thanks to the K8's on-die memory controller, the higher-latency L2 cache doesn't produce a noticeable performance hit in every application, but make no mistake: the new core is slower. We couldn't figure out why AMD made the change, and with most of our key AMD contacts on vacation for the holidays, we still have no official response on the matter. Rest assured that if and when we learn more, we will let you know.
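Why a longer L2 latency hurts some applications far more than others can be illustrated with the textbook average memory access time (AMAT) model. This is a hypothetical sketch, not AMD's analysis: the 3-cycle L1 latency and all miss rates are assumed values picked for illustration, while the 12-, 14- and 20-cycle L2 latencies and the 122-cycle memory latency are the figures reported in this article.

```python
# Hypothetical AMAT (average memory access time) model, in core cycles:
#   AMAT = L1 latency + L1 miss rate * (L2 latency + L2 local miss rate * memory latency)
# The L1 latency and the miss rates are illustrative assumptions, not measurements.


def amat(l1_lat: float, l1_miss: float, l2_lat: float, l2_miss: float, mem_lat: float) -> float:
    return l1_lat + l1_miss * (l2_lat + l2_miss * mem_lat)


L1_LATENCY = 3     # assumed L1 data cache load-to-use latency, cycles
MEM_LATENCY = 122  # CPU-Z memory latency for the 65nm part, cycles

# Two made-up workloads: (L1 miss rate, L2 local miss rate).
workloads = {
    "cache-friendly": (0.02, 0.10),  # rarely misses L1
    "L2-heavy": (0.10, 0.10),        # misses L1 five times as often
}

for name, (l1_miss, l2_miss) in workloads.items():
    base = amat(L1_LATENCY, l1_miss, 12, l2_miss, MEM_LATENCY)  # original 12-cycle L2
    for l2_lat in (14, 20):  # AMD's corrected figure vs. the measured one
        new = amat(L1_LATENCY, l1_miss, l2_lat, l2_miss, MEM_LATENCY)
        slowdown = (new / base - 1) * 100
        print(f"{name}: L2 {l2_lat} cycles -> AMAT {new:.2f} cycles (+{slowdown:.1f}% vs 12)")
```

With these made-up rates, the L2-heavy workload's AMAT rises by about 15% at 20 cycles but under 4% at 14 cycles, while the cache-friendly workload moves by only about 1% to 5%; real applications fall somewhere along that spectrum depending on how often they miss in L1.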

Updated: AMD has officially confirmed that L2 cache latencies have increased, and that it made the change on purpose to allow for the possibility of moving to larger cache sizes in future parts. AMD stressed that this isn't a pre-announcement of larger-cache parts to come, but rather preparation in case the need arises to move to a vastly larger L2. Thankfully, the performance delta isn't huge, at least in the benchmarks that we saw, so AMD's decision isn't too painful, especially as it comes with the benefit of a cooler-running core that draws less power. Ideally we'd like the best of all worlds, but we'll take what we can get. Note that none of AMD's current roadmaps show any larger L2 parts (other than the usual 2x1MB offerings), which tells us one of two things: either AMD has some larger L2 parts that it's planning to release, or AMD is being completely honest with the public in saying that larger L2 parts will only be released if necessary.

52 Comments

  • mino - Thursday, December 21, 2006

    RD580 is even lower than P965... NF i680 and NF 590 are both power hogs.
    They are not ideal (nor is the 8800GTX) for power comparisons, but they are BOTH pretty hot in their respective markets.
  • JackPack - Thursday, December 21, 2006

    Where did you pull that "90%" figure out of? If a PC is idling more than 90% of the time without going into standby or hibernate, the user is an idiot.

    Hardly any PCs operate at pure idle. No real-time antivirus scan, no file indexing in the background, no email autochecking, no IE7 open with at least one Flash ad, etc.
  • mino - Thursday, December 21, 2006

    Well, how would you like your PC to go into standby (not to mention hibernate) while typing or listening to MP3s???
    At these moments (the most common usage of a PC, BTW) the average CPU use is 1% to 5%.

    ... ;-)
  • mino - Thursday, December 21, 2006

    Sorry for not reading the second sentence; the first one was too crazy to keep reading at the time... So:

    What is "pure idle"? A CPU can switch between C-states in micro- to milliseconds. How fast can you type?
    AV checking? While you type? To check whether you're coding some exploit? :)
    Background file indexing? No thanks, I prefer on-MY-demand search to on-the-OS's-demand search.
    Email autocheck? Done in 0.1s at 5% CPU, once every 5 minutes...
    IE7? No thanks, not required for Windows Update...
    Flash ad open? No thanks, Flash is enabled only for reasonable sites or the ones requiring it (a few). Also, a typical Flash ad takes at most 10% of a K8 core at 1000MHz.
    etc.
    You may ask: why an X2/C2D at all, then, with no background BS? Well, for now I'm pretty happy with my Q1 install of Win2k on an A1.66/512M/R9200/dual UXGA setup, backed up by a ~2TB NAS (with a 3G P4C :). The system is more responsive than a nearby mate's X2/1G with all that "necessary" bloat you mentioned.
    Having 50+ webpages loaded and 5-10 active apps open is a common sight for me...
  • mino - Thursday, December 21, 2006

    Now I figure, maybe, maybe, the average PC has become so bloated and unmaintained that it can't even put the CPU into sleep states?
    I have not seen this yet, except on outrageously malware-infested machines. However, my sample may be a bit too unrepresentative.

    If it is so, abandoning the PC and going back to calculus at primary may be a good idea.
  • JackPack - Thursday, December 21, 2006

    That's not idling.

    Nice strawman, BTW.
  • mino - Thursday, December 21, 2006

    Well, I wrote "90% of the time"... I did not write how big the chunks of time are; they vary widely, from tens of microseconds to tens of minutes.

    P.S. That post of mine from 10:19 was written before your 10:13 one.
  • JackPack - Thursday, December 21, 2006

    ...and AMD wants to accelerate their transition to 45nm? Maybe they have a magic lamp somewhere in their Sunnyvale office.

    Seems like the increase in L2 latency might be a contingency plan for GHz or more cache, in the event Agena doesn't meet its Q3 target.
  • Locutus465 - Thursday, December 21, 2006

    I upgraded to an S939 X2 earlier this year, so I'm going to be out of the serious upgrade market for a while (I might pick up a better CPU or graphics card, but that's about it). So personally, I'm waiting for K8-L and co-processors to see how things shake out. I do have to say I had hoped for better from AMD, but after 3 years of dominance, I think a stumble like this is just what they need to get them back on the warpath of innovation.
  • peldor - Thursday, December 21, 2006

    AMD's vision of coprocessors is 2009 stuff. You'll be out of the market for a long time if you're waiting on that.
