AMD faced a tradeoff during the development of the dual core Athlon 64 X2. In order to maintain backwards compatibility with earlier Socket-939 motherboards, they could not change the pinout of their dual core processors. While keeping the same pinout made it possible to upgrade virtually any Socket-939 platform to a dual core Athlon 64 X2, it meant that the dual core processors were left with no more memory bandwidth than their single core counterparts. The single-core Socket-939 Athlon 64s feature a 128-bit wide DDR memory controller which, when operating at DDR400 speeds, gives the A64 a maximum of 6.4GB/s of memory bandwidth. Sharing the same memory controller, the dual core Athlon 64 X2s are limited to that same 6.4GB/s, despite the fact that there are now twice as many cores vying for it.
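
As a rough illustration of where the 6.4GB/s figure comes from, the sketch below multiplies the bus width by the transfer rate. The function names are ours, and the even split between two cores is only a simplifying assumption for illustration; in practice the cores share the controller dynamically rather than each receiving a fixed half.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


def even_share_gb_s(total_gb_s: float, cores: int) -> float:
    """Hypothetical per-core share if bandwidth were split evenly (an assumption)."""
    return total_gb_s / cores


# 128-bit controller at DDR400 (400 million transfers per second)
total = peak_bandwidth_gb_s(128, 400)
print(f"Peak bandwidth: {total:.1f} GB/s")                                # 6.4 GB/s
print(f"Even share, single core: {even_share_gb_s(total, 1):.1f} GB/s")  # 6.4 GB/s
print(f"Even share, dual core:   {even_share_gb_s(total, 2):.1f} GB/s")  # 3.2 GB/s
```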

Luckily for AMD, the single core Athlon 64 was not very memory bandwidth limited, and thus, the move to dual core still allowed AMD to scale relatively well. In fact, based on the results that we saw in our Athlon 64 X2 3800+ review, AMD continues to consistently scale better from one to two cores than Intel, despite the reduction in memory bandwidth per core.

Meanwhile, AMD quietly introduced a handful of new memory dividers in the latest revisions of their Athlon 64 and Athlon 64 X2 processors. These new memory dividers allow memory clock speeds above DDR400 to be enabled without overclocking the HyperTransport bus. The beauty of these new memory dividers is that owners of faster-than-DDR400 memory can take advantage of the extra bandwidth offered by their modules, without overclocking their CPUs or the rest of their system.

Last month, we took a look at the performance benefit, or honestly, the lack thereof, of using higher bandwidth memory with Athlon 64/X2 processors. For the most part, we saw a 0 - 3% improvement in real world performance, with the vast majority of benchmarks showing a 0 or 1% increase thanks to the higher bandwidth memory. There were some isolated cases where more memory bandwidth translated into higher performance, in particular video encoding, gaming and heavy multitasking environments, but for the most part, the performance gains were negligible.

The performance gains in video encoding and gaming were to be expected, and we theorized that there would be some significant gains in multitasking environments. In a multitasking environment, particularly with an Athlon 64 X2, the overall memory bandwidth requirements of the two combined cores should be at their peak, well above and beyond the demands of a single-core Athlon 64. We saw this in our original article where one of our heavier multitasking tests yielded a 6.5% increase in performance when using DDR480 with an Athlon 64 X2 4800+. At the same time, some of our lighter multitasking tests yielded absolutely no performance increase when paired with higher bandwidth DDR memory. So, the point of this article is to find out if multitasking Athlon 64 X2 owners can benefit any more than single-core users from employing these new faster-than-DDR400 memory speeds.

Given the very specific nature of this article, we’re only going to be focusing on one processor - the Athlon 64 X2 4800+. As we found in our last piece, slower X2s weren’t impacted any differently than the fastest of the bunch, so anything we find here should be just as applicable in the real world to all other X2 processors.

We also only focused on two memory speeds: the base DDR400 and the fastest possible setting on the 4800+, DDR480. The details of how to select these speeds and the hardware we used to do so can be found in our first article.
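
One way to see why the fastest setting works out to DDR480 on this particular chip is sketched below. It rests on assumptions that are not stated in this article: that the memory clock is derived by dividing the CPU core clock by a whole-number divider, that the 4800+ runs at 2.4GHz, and that the highest selectable memory target is 250MHz (DDR500). The function names are hypothetical.

```python
import math


def pick_divider(cpu_clock_mhz: int, target_memory_mhz: int) -> tuple[int, float]:
    """Return (divider, actual memory clock in MHz).

    Assumption: the divider must be a whole number, so it is rounded up
    and the memory never runs faster than the requested target.
    """
    divider = math.ceil(cpu_clock_mhz / target_memory_mhz)
    return divider, cpu_clock_mhz / divider


def ddr_rating(memory_clock_mhz: float) -> float:
    """DDR transfers data twice per clock, so the rating is 2x the clock."""
    return 2 * memory_clock_mhz


CPU_CLOCK_MHZ = 2400  # assumed core clock of the Athlon 64 X2 4800+

for target in (200, 250):  # DDR400 and DDR500 targets
    divider, actual = pick_divider(CPU_CLOCK_MHZ, target)
    print(f"target {target}MHz -> divider {divider}, DDR{ddr_rating(actual):.0f}")
# target 200MHz -> divider 12, DDR400
# target 250MHz -> divider 10, DDR480
```

Under these assumptions, a 250MHz target cannot be hit exactly at 2.4GHz, so the memory settles at 240MHz, which is DDR480.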

23 Comments

  • Araemo - Friday, August 12, 2005 - link

    I'm curious, does Windows XP support NUMA?

    A quick google on the topic gives me conflicting info.

    People seem to think it does, if you manually turn on PAE (which has its own performance overhead, right?), but MS's website says "NUMA is supported only on Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition."

    What I've read recently suggests that in the A64 X2 CPUs, each core has one memory controller enabled, which suggests that NUMA could be useful for performance reasons. However, what I read originally when the X2s were coming out was that one core simply had both its memory controllers disabled. Does anyone know which of these two is correct?

    In any case, it sounds like memory latencies to different memory addresses will be different between the cores.

    Either one core will always have a higher latency, or each one will have low latency to some addresses and high latency to others.
  • Starglider - Friday, August 12, 2005 - link

    The Athlon64 die contains a dual-channel DDR memory controller, three HyperTransport transceivers, one or two processor cores and a crossbar switch that links them all together. Adding an extra processor core to the X2 didn't duplicate any of the other parts, so no, there aren't any disabled memory controllers on there. Both cores are connected to the memory controller through the switch, so they have equal access to both channels (which are interleaved anyway when both are active). NUMA would not be relevant because the banks aren't independently addressable by the OS and deliver exactly the same bandwidth and latency to both cores anyway. NUMA is only useful if your system has more than one processor socket, i.e. is an Opteron system.
  • Araemo - Friday, August 19, 2005 - link

    Thanks for clearing that up for me, but the # of sockets really has nothing to do with it. It is the # of independent memory controllers that matters, and AMD could have placed multiple single-channel controllers on the die if they thought the performance would be improved, but if the memory controller is 'external' to the core (accessible via HT instead of a more direct link.. not that HT isn't good), then I guess it doesn't matter. I was thinking the memory controller was part of the same HT node as the CPU core, but the method you described makes more sense anyways. If you have the memory controller logically separated from the core, it can serve DMA requests from the northbridge/southbridge without bothering the CPU at all, as DMA should be.
  • Diasper - Friday, August 12, 2005 - link

    It looks to me like future dual-core games will benefit from the extra bandwidth. The logic being that with a highly efficient dual-core engine, both cores should be demanding as much bandwidth as possible, so we might see something more akin to the multitasking with Doom3 performance numbers.

    Either way, the numbers should be above what we saw the first time, when dual-core was tested with only a single-core game, so say that's a 5%+ improvement at DDR500. I think this information is pretty significant for those going with dual-core processors.

    Now where did my high speed low latency 1GB sticks go...

    Oh yeah and first.
  • Zebo - Friday, August 12, 2005 - link

    I don't know about that. Anand didn't mention timings. I can only assume they are the same since he didn't mention them at DDR400 and DDR480 respectively... Which is faster? Who knows really... My feeling is that if he ran DDR400 at the low latency it's capable of while DDR480 ran at the higher latency it typically needs, you would see negligible differences. Again, not enough information...
  • Diasper - Friday, August 12, 2005 - link

    That's probably largely correct. I suspect they'll be running a similar setup to before (http://www.anandtech.com/cpuchipsets/showdoc.aspx?...) where they were running 2 x 512MB sticks that could do 2-2-2 timings all the way up to DDR500 or so.

    But yeah, can we get any clarification on that please - it's appalling that you didn't include your test system criteria although we can probably guess and trust it was done correctly.
  • Zebo - Friday, August 12, 2005 - link

    Yeah that VX stuff is most excellente.. The review I *really* want to see is how well DDR2 667 on M2 competes with say DDR 500 with its newfound low latency.. I have my money on "old tech" :P
  • Diasper - Friday, August 12, 2005 - link

    Interestingly, looking at the results:

    For a 20% increase in memory speed we saw up to a 10% increase in speed (approx), suggesting that the X2 is bandwidth constrained by at least 10% when running full tilt, so you'd want to be running at least DDR440 speeds or otherwise risk lessened performance.

    Of course, given the unevenness of memory requests from both processors, I guess we could presume they would benefit from more memory speed, although benefits would lessen above a certain speed (e.g. the guesstimated DDR440), as it is unlikely that you'll typically come across a scenario where both processors are demanding maximum memory bandwidth at the exact same moment.

    I guess that's speculation at best - but unless you're an engineer that's about all you can do...
  • Spacecomber - Friday, August 12, 2005 - link

    I think we can assume that it is the same set up as with the first article, as the previous poster suggested.

    From the article:

    quote:

    The details of how to select these speeds and the hardware we used to do so can be found in our first article.


    Space
  • Diasper - Friday, August 12, 2005 - link

    http://www.anandtech.com/cpuchipsets/showdoc.aspx?...

    Ah, well I probably shouldn't skip over stuff so quickly to get to the results - however, why was the memory run at DDR500 in the previous test but only at DDR480 here?

    That rather nullifies the comparative significance of the test as the same test wasn't run. :/
