DDR/NetBurst Memory Bandwidth and Latency

One of the most talked-about AMD advantages of the last couple of years has been its on-processor memory controller. According to popular theory, this has allowed the Athlon64 to significantly outperform Intel NetBurst processors. The fact is that NetBurst DDR2 bandwidth has recently been similar to or wider than that of Athlon64, even when the DDR is overclocked. You can see this clearly when we compare the Buffered and Unbuffered bandwidth of a NetBurst 3.46EE to an AMD 4800+ X2 (2.4GHz, 2x1MB Cache) running DDR400 2-2-2 and running overclocked memory at DDR533 3-3-3.

The green bars represent DDR memory performance, while the beige to red bars show increasing DDR2 speeds on NetBurst. Light green represents DDR400 2-2-2, while dark green is overclocked memory at the same CPU speed, DDR533 at 3-3-3.

Standard (Buffered) Memory Test


In buffered performance, fast DDR400 2-2-2 is faster only than DDR2-400; it is slower than DDR2-533, 667 and 800. Overclocked memory at DDR533 3-3-3 is faster than any of the DDR2 speeds on NetBurst.

The Sandra Unbuffered Memory Test, which turns off features that tend to artificially boost performance, is generally a better measure of how memory will behave comparatively in gaming. The same green for DDR applies here.

Unbuffered Memory Test


Without buffering, DDR400 has the smallest bandwidth of the tested memory speeds and timings. Even overclocking to DDR533 only allows the DDR to barely beat DDR2-400. DDR2-533, 667, and 800 all have greater Unbuffered bandwidth than the DDR overclocked to 533. NetBurst DDR2 memory bandwidth is generally wider than the bandwidth supplied by DDR memory on Athlon64. Despite the wider bandwidth, the deep pipelines and other inefficiencies in the NetBurst design did not allow the NetBurst processors to outperform Athlon64. Keep this in mind later, when we look at AM2 and Core 2 Duo Memory Bandwidth.
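For context on the measured numbers, the theoretical peak for each memory speed can be sketched as data rate times bus width. This is a minimal illustration, not part of the original test procedure. It assumes 64-bit (8-byte) channels and dual-channel operation, neither of which is stated above, and the function and table names are hypothetical.

```python
# Theoretical peak bandwidth: transfers per second times bytes per transfer.
# Assumptions (not stated in the article): 64-bit (8-byte) channels,
# dual-channel operation, and nominal data rates in MT/s.

BYTES_PER_TRANSFER = 8  # one 64-bit DDR/DDR2 channel


def peak_bandwidth_gbs(data_rate_mts, channels=2):
    """Peak bandwidth in GB/s (10^9 bytes/s) for a nominal data rate in MT/s."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


# Nominal data rates for the memory speeds named in the charts.
SPEEDS = {
    "DDR400": 400,
    "DDR533 (overclocked)": 533,
    "DDR2-400": 400,
    "DDR2-533": 533,
    "DDR2-667": 667,
    "DDR2-800": 800,
}

if __name__ == "__main__":
    for name, rate in SPEEDS.items():
        print(f"{name:22s} {peak_bandwidth_gbs(rate):5.1f} GB/s")
```

On paper, DDR and DDR2 at the same data rate have the same peak, so the differences in the Sandra results come from everything around the DRAM: the memory controller, the bus between it and the CPU, and the timings.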

Latency

The other area where AMD has had an advantage over NetBurst DDR2 performance is memory latency, the result of the on-processor memory controller. Comparison of the AMD DDR Memory controller and the Intel DDR2 Memory controller in the Intel chipset shows AMD DDR with latency about 35% lower than Intel NetBurst in Science Mark 2.0.

Memory Latency Comparison - DDR & NetBurst


While memory bandwidth was very similar between AMD and NetBurst, the deep pipes of the NetBurst design still behaved as if they were bandwidth starved. On the other hand the AMD architecture made use of the bandwidth available and the much lower latency to outperform NetBurst across the board.
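The timings quoted earlier (2-2-2 and 3-3-3) are counted in memory clock cycles, so comparing them across speeds means converting to nanoseconds. The sketch below does that for the CAS figure only; the function name is hypothetical, and CAS delay is just one component of the end-to-end latency that a benchmark such as ScienceMark reports.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """CAS delay in nanoseconds.

    DDR and DDR2 transfer data twice per clock, so the memory clock in MHz
    is half the nominal data rate in MT/s.
    """
    clock_mhz = data_rate_mts / 2
    return cas_cycles / clock_mhz * 1000


if __name__ == "__main__":
    # The two DDR configurations used on the Athlon64 in this section.
    for label, rate, cas in [("DDR400 2-2-2", 400, 2), ("DDR533 3-3-3", 533, 3)]:
        print(f"{label}: {cas_latency_ns(rate, cas):.2f} ns")
```

By this measure the overclocked DDR533 3-3-3 setting has a slightly longer CAS delay than DDR400 2-2-2, even though it offers more bandwidth.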


118 Comments


  • Wesley Fink - Tuesday, July 25, 2006 - link

    We purchased Everest and believed we were using the latest release version. The version we used for testing does support Core 2 Duo, but not with all the features supported in 3.01. We will update to version 3.01, rerun Everest tests, and post the new results in the review. This was an honest oversight, not a conspiracy.
  • Wesley Fink - Tuesday, July 25, 2006 - link

    Version 3.01 was just released July 16, 2006.
  • saratoga - Tuesday, July 25, 2006 - link

    quote:

    The introduction of AM2 merely increased the AMD latency advantage. AM2 latency was slightly lower than DDR latency on AMD.

    However, Core 2 Duo did what most believed was impossible in Latency. One of AMD's advantages is the on-processor memory controller, which Intel has avoided. It should not be possible to use a Memory Controller in the chipset on the motherboard instead and achieve lower latency.


    Your instincts are right, those latency numbers are impossible. The prefetcher is outsmarting your benchmark, and you're not measuring memory latency because a lot of the fetches are coming out of the L2. The reason latency numbers look so low is because L2 is quite a bit faster than memory ;)

    quote:

    Intel developed read-ahead technologies that don't really break this rule, but to the system, in most situations, the Intel Core 2 Duo appears to have lower latency than AM2, and the memory controller functions as if it were lower latency.


    I'm not sure that Intel actually developed prefetching. It's been used by the entire industry (Sun, Intel, AMD, IBM) for many, many years. What you have here is just a poor benchmark, not a breakthrough.

    Which is the problem here. The page says "memory latency" which is flat out wrong. You simply are not getting 36 ns access time out of the combined FSB, memory controller and DRAM cells. It is not possible, and it's a pretty big oversight IMO to claim otherwise.

    If you want to test prefetcher, then go for it. Just don't call it memory latency, because the two are not the same thing.
  • sld - Tuesday, July 25, 2006 - link

    So now ScienceMark is a turncoat too, after showing AMD superiority for the past year or so?

    Get a grip, people, as Everest has shown, memory latency on the Core 2 Duo is still higher than that on the K8.

    What ScienceMark shows is that the prefetchers on the Core 2 Duo are brilliant, the results bear out and the Core 2 Duo beats the K8 in most (if not all) games convincingly (if I use superlatives, even if they are true, I'll contract rabies, you know) and does that clock-for-clock (not to mention $-for-$ until AMD generates a significant price crash).

    All the while, I've been tolerating Intel fanboys crapping about non-existent benefits about their Netbust crap, and now you want me to tolerate the beginnings of AMD fanboyism too?
  • mino - Tuesday, July 25, 2006 - link

    These numbers clearly show that the ScienceMark developer does not take 4M caches into account.
    It's quite simple: at the time of its creation, 512k was considered a big cache.
    As for SuperPI 1M, the same applies: 1M was chosen as the standard not because it is a nice number, but because at the time there was no need for bigger datasets. Datasets for 1M _were_ simply big enough to also show memory perf., not just the size of the cache.
  • redpriest_ - Tuesday, July 25, 2006 - link

    Actually, ScienceMark tests up to 16 MB of memory. See, I foresaw a day where this would be a problem and sized the test appropriately. What you're seeing is that the prefetcher is clever enough to pick up all the patterns we use to "fool" hardware prefetchers. This in itself is a great indicator of performance.

    Of course, our next revision will have a harder algorithm to fool, but I think we'll also keep the old one (because it still shows an important data point).

    1 of 3 authors of SM2.0
  • mino - Tuesday, July 25, 2006 - link

    Boys give us EDIT ..
  • duploxxx - Tuesday, July 25, 2006 - link

    Last months have given drastic changes in the cpu world for performance king and price.

    But the quality of the conclusions and review headers at Anand is going further down day by day. (As a review site you should bring info to the world, not fanboyism and sponsorship.) Come on, such a statement to begin with: "Core 2 Duo (Conroe) launched about twelve days ago with a lot of fanfare. With the largest boost in real performance the industry has seen in almost a decade it is easy to understand the big splash Core 2 Duo has made in a very short time". Do you really believe what you are saying, or are you paid for each overmarketing comment you publish?

    The article concept is nice, looking at memory latency, but what CAS level do you use at each rated speed? We all know that Conroe is a better performer with any memory used, but things change when the real CAS latency is used, so a whole article without additional CAS info is just half an article, of no use. When things get really interesting on page 7, you just stop the performance comparison at DDR2 1067 and 1112. I know it's difficult because for now there is no divider to bring this memory to equal speed, but you can clearly see that K8 is much more memory dependent than Conroe. Conroe is saturated. In Half-Life and Quake, bringing the memory to 1200 would give equal performance.

    Looking at page 8, you convert the chart of page 7 to a nicer way to look :)
    But for sure you changed the CAS on the OC'ed FX down, because looking at the linear performance gain from DDR2-400 to DDR2-800, there is only gain due to the OC'ed FX and not due to the added memory bandwidth. The gain is not linear to the performance gain on page 7.

    I will hold my comments on, for example, SuperPI... we all know what it's worth.

    A review on 64bit or multithread performance would be nice to see how well the design is made for the future or are you as a review site restricted to some type of benchmarks?

    same i asked on the server woodcrest. you bring benchmarks that are unable to compare. here you bring reviews where in the first line of the review you already have a conclusion and the benchmark contains not enough info to reflect in real life what to buy for memory and cas latency for example to get the desired performance.
  • OcHungry - Tuesday, July 25, 2006 - link

    I concur with you. It is interesting that the memory timings used were not mentioned in the review.
    Using 266x11 helps but it is not enough for AMD’s powerful IMC. If 330x9 was used instead it would have given AMD a considerable boost in performance. Anandtech is trying to show Core 2 Duo is a better CPU, but fails to notice any improvement through utilization of IMC, memory latency and mem speed. What's funnier is that the data provided clearly shows a substantial increase in performance if memory speed is changed to 266MHz, 1:1 ratio. The increase in performance is at least 5% across the benchmarks. My calculations show: for every 10% increase in memory speed, performance increases by 2%. Hence, running FX62 @ 330x9 amounts to 11% increase in performance. We know AMD is releasing an improved IMC capable of utilizing memory speed in 1T. 1T will increase performance by another 3-5%. The memory latency is another crucial factor for AMD. According to the author of this review: “memory speed does not affect Conroe’s performance”. Great. But since latency tremendously affects AMD’s platforms it should have been considered to give us a meaningful comparison. Most probably the timings used were 4-4-4-12 (Corsair DDR2 800). Tightening the timings to 3-3-3-8 would have given another 7% to 10% boost in FX62's performance.
    If we add all the mentioned optimal settings for FX62: (11%+3%+7%) = 21% increase in performance should have resulted, which is fair, accurate, and represents the true comparison of FX62 against E6800.
    Furthermore, 64-bit OS and Windows Vista are reported to give AMD about 16% improvement compared to 10% for Conroe. This will give another 6% boost in performance for AMD’s platform, which brings the total increase in performance to 27%, washing clean all the superficial benchmarks (that were tilted).
    As far as gaming is concerned, we know the video card is the deciding factor. So I believe AMD has positioned itself perfectly for every corner of the market.
    To say Intel has captured the mid and high end of the market is as erroneous as when Intel was pushing Netburst to capture gaming enthusiasts. We know that marchitecture and all that PR failed then, and will fail again now.
  • IntelUser2000 - Tuesday, July 25, 2006 - link

    quote:

    I concur with you. It is interesting that the memory timings used were not mentioned in the review.
    Using 266x11 helps but it is not enough for AMD’s powerful IMC. If 330x9 was used instead it would have given AMD a considerable boost in performance. Anandtech is trying to show Core 2 Duo is a better CPU, but fails to notice any improvement through utilization of IMC, memory latency and mem speed. What's funnier is that the data provided clearly shows a substantial increase in performance if memory speed is changed to 266MHz, 1:1 ratio. The increase in performance is at least 5% across the benchmarks. My calculations show: for every 10% increase in memory speed, performance increases by 2%. Hence, running FX62 @ 330x9 amounts to 11% increase in performance. We know AMD is releasing an improved IMC capable of utilizing memory speed in 1T. 1T will increase performance by another 3-5%. The memory latency is another crucial factor for AMD. According to the author of this review: “memory speed does not affect Conroe’s performance”. Great. But since latency tremendously affects AMD’s platforms it should have been considered to give us a meaningful comparison. Most probably the timings used were 4-4-4-12 (Corsair DDR2 800). Tightening the timings to 3-3-3-8 would have given another 7% to 10% boost in FX62's performance.
    If we add all the mentioned optimal settings for FX62: (11%+3%+7%) = 21% increase in performance should have resulted, which is fair, accurate, and represents the true comparison of FX62 against E6800.
    Furthermore, 64-bit OS and Windows Vista are reported to give AMD about 16% improvement compared to 10% for Conroe. This will give another 6% boost in performance for AMD’s platform, which brings the total increase in performance to 27%, washing clean all the superficial benchmarks (that were tilted).
    As far as gaming is concerned, we know the video card is the deciding factor. So I believe AMD has positioned itself perfectly for every corner of the market.
    To say Intel has captured the mid and high end of the market is as erroneous as when Intel was pushing Netburst to capture gaming enthusiasts. We know that marchitecture and all that PR failed then, and will fail again now.


    My calculation says you are wrong. As much as my calculation is wrong, your logic/calculation is even moreso.

    Have you looked at the performance gain that Athlon 64's gain with lower latencies?? Granted it was a single core Athlon 64, but people were still spreading bullshit that Athlon 64 will gain a lot from lower latencies. Yes it was in Anandtech forums!!

    Tech-Report: http://techreport.com/etc/2005q4/mem-latency/index...

    Looking at Core 2 Duo, you can get BETTER performance with LOWER COST memory modules than A64. That means price/performance ratio goes FURTHER in Core 2's favor.

    Not very many people think super ultra low latency is important, they are not stupid enough to spend extra $200 on memory that gives MAYBE 5% performance increase.

    XS also telling you not to be stupid: http://www.xtremesystems.org/forums/showthread.php...
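The prefetcher argument in the comments above comes down to how predictable a benchmark's access pattern is. One common way to make a latency test harder to predict is pointer chasing through a random single-cycle permutation, where each load depends on the result of the previous one. The sketch below illustrates that general technique only; it is not ScienceMark's algorithm, which is not described here, and all names are hypothetical. Timing it in Python mostly measures interpreter overhead, so a real test would use the same chain layout in a compiled language.

```python
import random
import time


def single_cycle_permutation(n, seed=0):
    """Sattolo's algorithm: a random permutation forming one n-cycle,
    so following next_index visits every slot before repeating."""
    rng = random.Random(seed)
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def chase(next_index, steps):
    """Follow the chain from slot 0; each lookup depends on the previous one."""
    pos = 0
    for _ in range(steps):
        pos = next_index[pos]
    return pos


def cycle_length(next_index):
    """Count the steps until the chain returns to slot 0."""
    pos, steps = next_index[0], 1
    while pos != 0:
        pos = next_index[pos]
        steps += 1
    return steps


if __name__ == "__main__":
    n = 1 << 16
    chain = single_cycle_permutation(n)
    start = time.perf_counter()
    chase(chain, n)
    elapsed = time.perf_counter() - start
    print(f"{elapsed / n * 1e9:.1f} ns per step (mostly interpreter overhead)")
```

Because the next address is unknown until the current load completes, a stride-based prefetcher has little to work with. A fixed-stride walk over the same buffer is the kind of pattern a prefetcher can follow, which is how a "latency" figure can end up reflecting cache hits rather than trips to DRAM.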
