XDMA: Improving Crossfire

Over the past year or so a lot of noise has been made over AMD’s Crossfire scaling capabilities, and for good reason. With the evolution of frame capture tools such as FCAT it finally became possible to easily and objectively measure frame delivery patterns. The results of course weren’t pretty for AMD, showcasing that Crossfire may have been generating plenty of frames, but in most cases it was doing a very poor job of delivering them.
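To illustrate the gap between generating frames and delivering them well, here is a minimal sketch comparing two made-up frame time sequences. The numbers and function names are hypothetical and are not FCAT output; the point is only that an identical average framerate can hide very different delivery patterns.

```python
from statistics import mean, pstdev

def average_fps(frame_times_ms):
    """Average framerate implied by a list of frame times in milliseconds."""
    return 1000.0 / mean(frame_times_ms)

def pacing_spread_ms(frame_times_ms):
    """Population standard deviation of frame times: a crude pacing metric."""
    return pstdev(frame_times_ms)

# Hypothetical data: both sequences take 120 ms to deliver six frames.
even = [20.0] * 6          # every frame arrives 20 ms after the last
uneven = [5.0, 35.0] * 3   # frames arrive in short/long pairs

for label, times in (("even", even), ("uneven", uneven)):
    print(f"{label}: {average_fps(times):.1f} fps average, "
          f"{pacing_spread_ms(times):.1f} ms spread")
```

Both sequences report the same average framerate, but only the spread exposes the short/long alternation that a frame capture tool would show.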

AMD for their part doubled down on the situation and began rolling out improvements in a plan that would see Crossfire improved in multiple phases. Phase 1, deployed in August, saw a revised Crossfire frame pacing scheme implemented for single monitor resolutions (2560x1600 and below) which generally resolved AMD’s frame pacing in those scenarios. Phase 2, which is scheduled for next month, will address multi-monitor and high resolution scaling, which faces a different set of problems and requires a different set of fixes than what went into phase 1.

The fact that there’s even a phase 2 brings us to our next topic of discussion, which is a new hardware DMA engine in GCN 1.1 parts called XDMA. Being first utilized on Hawaii, XDMA is the final solution to AMD’s frame pacing woes, and in doing so it is redefining how Crossfire is implemented on 290X and future cards. Specifically, AMD is forgoing the Crossfire Bridge Interconnect (CFBI) entirely and moving all inter-GPU communication over the PCIe bus, with XDMA being the hardware engine that makes this both practical and efficient.

But before we get too far ahead of ourselves, it would be best to put the current Crossfire situation in context before discussing how XDMA deviates from it.

In AMD’s current CFBI implementation, which itself dates back to the X1900 generation, a CFBI link directly connects two GPUs and has 900MB/sec of bandwidth. In this setup the purpose of the CFBI link is to transfer completed frames to the master GPU for display purposes, and to do so in a direct GPU-to-GPU manner to complete the job as quickly and efficiently as possible.

For single monitor configurations and today’s common resolutions the CFBI excels at its task. AMD’s software frame pacing algorithms aside, the CFBI has enough bandwidth to pass around complete 2560x1600 frames at over 60Hz, allowing the CFBI to handle the scenarios laid out in AMD’s phase 1 frame pacing fix.

The issue with the CFBI is that while it’s an efficient GPU-to-GPU link, it hasn’t been updated to keep up with the greater bandwidth demands generated by Eyefinity, and more recently 4K monitors. For a 3x1080p setup frames are now just shy of 20MB/each, and for a 4K setup frames are larger still at almost 24MB/each. With frames this large CFBI doesn’t have enough bandwidth to transfer them at high framerates – realistically you’d top out at 30Hz or so for 4K – requiring that AMD go over the PCIe bus for their existing cards.
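The bandwidth limits described above can be checked with some quick arithmetic. The sketch below assumes 3 bytes (24 bits) per pixel and decimal megabytes, neither of which is stated in the text, and it ignores any protocol overhead; the function names are illustrative only.

```python
# Assumption: completed frames are moved at 3 bytes per pixel (24-bit color).
BYTES_PER_PIXEL = 3
CFBI_BYTES_PER_SEC = 900e6  # 900MB/sec, the CFBI figure quoted in the text

def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel

def max_fps(link_bytes_per_sec, width, height):
    """Upper bound on frames per second the link could carry, ignoring overhead."""
    return link_bytes_per_sec / frame_bytes(width, height)

RESOLUTIONS = {
    "2560x1600": (2560, 1600),
    "3x1080p (5760x1080)": (5760, 1080),
    "4K (3840x2160)": (3840, 2160),
}

for name, (w, h) in RESOLUTIONS.items():
    size_mb = frame_bytes(w, h) / 1e6
    print(f"{name}: {size_mb:.1f} MB per frame, "
          f"CFBI ceiling about {max_fps(CFBI_BYTES_PER_SEC, w, h):.0f} fps")
```

Under these assumptions the theoretical CFBI ceiling is a little over 70 frames per second at 2560x1600 and roughly 36 at 4K, which lines up with the "over 60Hz" and "30Hz or so" figures once real-world overhead is taken into account.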

Going over the PCIe bus is not in and of itself inherently a problem, but pre-GCN 1.1 hardware lacks any specialized hardware to help with the task. Without an efficient way to move frames, and specifically a way to DMA transfer frames directly between the cards without involving CPU time, AMD has to resort to much uglier methods of moving frames between the cards, which are in part responsible for the poor frame pacing we see today on Eyefinity/4K setups.

CFBI Crossfire At 4K: Still Dropping Frames

For GCN 1.1 and Hawaii in particular, AMD has chosen to solve this problem by continuing to use the PCIe bus, but by doing so with hardware dedicated to the task. Dubbed the XDMA engine, the purpose of this hardware is to allow CPU-free DMA based frame transfers between the GPUs, thereby allowing AMD to transfer frames over the PCIe bus without the ugliness and performance costs of doing so on pre-GCN 1.1 cards.

With that in mind, the specific role of the XDMA engine is relatively simple. Located within the display controller block (the final destination for all completed frames) the XDMA engine allows the display controllers within each Hawaii GPU to directly talk to each other and their associated memory ranges, bypassing the CPU and large chunks of the GPU entirely. Within that context the purpose of the XDMA engine is to be a dedicated DMA engine for the display controllers and nothing more. Frame transfers and frame presentations are still directed by the display controllers as before – which in turn are directed by the algorithms loaded up by AMD’s drivers – so the XDMA engine is not strictly speaking a standalone device, nor is it a hardware frame pacing device (which is something of a misnomer anyhow). Meanwhile this setup also allows AMD to implement their existing Crossfire frame pacing algorithms on the new hardware rather than starting from scratch, and of course to continue iterating on those algorithms as time goes on.

Of course by relying solely on the PCIe bus to transfer frames there are tradeoffs to be made, both for the better and for the worse. The benefits are of course the vast increase in bandwidth (PCIe 3.0 x16 has 16GB/sec available versus 0.9GB/sec for CFBI), not to mention allowing Crossfire to be implemented without those pesky Crossfire bridges. The downside to relying on the PCIe bus is that it’s not a dedicated, point-to-point connection between GPUs, and for that reason there will be bandwidth contention, and the latency for using the PCIe bus will be higher than the CFBI. How much worse depends on the configuration; PCIe bridge chips for example can both improve and worsen latency depending on where in the chain the bridges and the GPUs are located, not to mention the generation and width of the PCIe link. But, as AMD tells us, any latency can be overcome by measuring it and thereby planning frame transfers around it to take the impact of latency into account.
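As a rough illustration of what planning transfers around a measured latency could look like, consider the sketch below. It is not AMD's algorithm, which is not described here; the latency value, the 3 bytes per pixel frame size, and the function names are all assumptions chosen for the example.

```python
PCIE3_X16_BYTES_PER_SEC = 16e9  # the 16GB/sec figure quoted in the text

def transfer_time_s(frame_size_bytes, link_bytes_per_sec):
    """Time to move one frame across the link, ignoring contention."""
    return frame_size_bytes / link_bytes_per_sec

def latest_start_s(present_deadline_s, frame_size_bytes,
                   link_bytes_per_sec, measured_latency_s):
    """Latest moment a transfer can begin and still land before the deadline."""
    budget = measured_latency_s + transfer_time_s(frame_size_bytes,
                                                  link_bytes_per_sec)
    return present_deadline_s - budget

# Hypothetical inputs: a 4K frame at 3 bytes per pixel, a 60Hz present
# deadline, and a made-up 0.5 ms measured link latency.
frame_size = 3840 * 2160 * 3
deadline = 1.0 / 60.0
latency = 0.0005

start = latest_start_s(deadline, frame_size, PCIE3_X16_BYTES_PER_SEC, latency)
print(f"transfer takes {transfer_time_s(frame_size, PCIE3_X16_BYTES_PER_SEC) * 1e3:.2f} ms")
print(f"start the transfer no later than {start * 1e3:.2f} ms into the interval")
```

The idea is simply that once latency is a known quantity it becomes part of the schedule rather than a source of jitter; a real implementation would also have to account for the bandwidth contention mentioned above.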

Ultimately AMD’s goal with the XDMA engine is to make PCIe based Crossfire just as efficient, performant, and compatible as CFBI based Crossfire, and despite the initial concerns we had over the use of the PCIe bus, based on our test results AMD appears to have delivered on their promises.

The XDMA engine alone can’t eliminate the variation in frame times, but in its first implementation it’s already as good as CFBI in single monitor setups, and being free of the Eyefinity/4K frame pacing issues that still plague CFBI, is nothing short of a massive improvement over CFBI in those scenarios. True to their promises, AMD has delivered a PCIe based Crossfire implementation that incurs no performance penalty versus CFBI, and on the whole fully and sufficiently resolves AMD’s outstanding frame pacing issues. The downside of course is that XDMA won’t help the 280X or other pre-GCN 1.1 cards, but at the very least going forward AMD finally has demonstrated that they have frame pacing fully under control.

On a side note, looking at our results it’s interesting to see that despite the general reuse of frame pacing algorithms, the XDMA Crossfire implementation doesn’t exhibit any of the distinct frame time plateaus that the CFBI implementation does. The plateaus were more an interesting artifact than a problem, but it does mean that AMD’s XDMA Crossfire implementation is much more “organic” like NVIDIA’s, rather than strictly enforcing a minimum frame time as appeared to be the case with CFBI.
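To show why strictly enforcing a minimum frame time would produce plateaus, here is a toy sketch. The frame times and the 16 ms floor are invented, and the clamp is only a guess at the kind of behavior that would create such an artifact, not a description of AMD's pacing code.

```python
def pace_with_floor(frame_times_ms, min_frame_time_ms):
    """Delay any frame that would arrive sooner than the minimum frame time."""
    return [max(t, min_frame_time_ms) for t in frame_times_ms]

# Hypothetical unpaced frame times in milliseconds.
raw = [12.0, 14.5, 18.0, 13.0, 21.0]

paced = pace_with_floor(raw, min_frame_time_ms=16.0)
print(paced)
```

Every frame faster than the floor is held to exactly the floor value, so a frame time plot flattens into a plateau at that value, while slower frames pass through unchanged.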

396 Comments

  • DMCalloway - Thursday, October 24, 2013 - link

    Once again, against the Titan it's $450 cheaper, not $100. Against the gtx 780 it is a wash on performance at a cheaper price point. Eight months late to the game I'll agree on, however it took time to get in bed with Sony and Micro$oft which was needed if they (AMD) ever hope to get to the point of being able to release 'at a competitive time'. I'm amazed that they are still viable after the financial losses they suffered with the whole Intel paying OEM's to not release their current cpu gen. alongside AMD's business. Sure, AMD won the lawsuit but the financial losses in market share were in the billions, Intel jumped ahead a gen. and the damage was done. Realistically, I believe AMD chose wisely to focus on the console market because the 7970ghz pushed hard wasn't really that far behind a stock gtx780.
  • Bloodcalibur - Thursday, October 24, 2013 - link

    Ever wonder why the Titan costs $350 more than their own GTX 780 while having only a small margin of improvement?

    Oh, right, compute performance.
  • anubis44 - Thursday, October 24, 2013 - link

    and in some cases, the R9 290X is as much as 23% faster in 4K resolution than the Titan, or in the words of HardOCP: "at Ultra HD 4K it (R9 290X) just owns the GeForce GTX TITAN."
  • Bloodcalibur - Thursday, October 24, 2013 - link

    Once again, Titan is a gaming/workstation hybrid, that's why it costs $350 more than their own GTX 780 with only a small FPS improvement in gaming.
  • TheJian - Friday, October 25, 2013 - link

    Depends on the games chosen. For instance All 4K:
    Guru3d:
    Tombraider tied 4k 40fps (they consider this BARELY playable-though advise 60fps)
    MOH Warfighter Titan wins 7%
    Bioshock Infinite Titan wins 10% (33fps to 30, but again not going to be playable min in teens?)
    BF3 TIE (32fps, again avg, so not playable)
    The only victory at 4K is Hitman absolution here. So clearly it depends on what your settings are and what games you play. Also note the fps at 4K at hardocp. They can't max settings and every game is a sacrifice of some stuff (or a lot). Even at 2560 Kyle notes all were unplayable with everything on with avg's at 22fps and mins 12fps for all 3 basically...ROFL. How useful is it to win (or even lose) at a res you can't play at?

    http://www.techpowerup.com/reviews/AMD/R9_290X/24....
    Techpowerup tests all the way to 5760x1080, quoting that unless not tested. Here we go again...LOL
    World of Warcraft domination for 780 & Titan (over 20% faster on titan 5760!)
    SKYRIM - Both Titan and 780 win 5760
    Starcraft2 only went to 2560 but again clean sweep for 780/Titan both over 10%
    Splintercell blacklist clean sweep at 2560 & 5760 for Titan AND 780 (>20% for titan both res)
    Farcry3 (titan and 780 wins at 5760 but at 22fps who cares...LOL but 10% faster than 290x)
    black ops 2 (only went to 2560, but titan wins all res)
    Metro TIE (26fps, again neither playable)
    Crysis 3 Titan over 10% win (25fps vs. 22, but neither playable...LOL)

    At hardocp, metro, tombraider, bf3, and crysis 3 were all UNDER 25fps min on both cards with most coming in at 22fps or so on both. I wish they would benchmark at what they find is PLAYABLE, but even then I'm against 4K if I have to turn all kinds of stuff off in the first place. Only farcry3 was tested at above 30fps...LOL. You need TWO cards for 4K gaming. PERIOD. If you have the money to buy a 4K monitor or two monitors you probably have the cash to do it right and buy 2 cards. Steampowered survey shows this as most have 2 cards above 1920x1200! Bragging about 4K gaming on this card (or even titan) is ridiculous as it just ends up in an exercise of turning crap off that devs wanted me to SEE. I wouldn't care if 290x was 50% faster than Titan if you're running 22fps who cares? Neither is playable. You've proven NOTHING. If we jump off a 100 story building I'll beat you to the bottom...Yeah but umm...We're both still dead right? So what's the point no matter who wins that game?

    Funfact: techspot.com tombraider comment (2560/1080p both tested-4xSSAA+16af)
    "We expected AMD to do better in Tomb Raider since they supported the title's development, but the R9 290X was 12% slower than the GTX Titan and 3% slower than the GTX 780"
    LOL. I hope they do better with BF4 AMD enhancements. Resident Evil 6 shows titan win also.
    http://www.techspot.com/review/727-radeon-r9-290x/...

    Tomshardware 4K quote:
    "In Gaming At 3840x2160: Is Your PC Ready For A 4K Display?, I concluded that you’d want at least two GeForce GTX 780s for 4K. And although the R9 290X is faster than even the $1000 Titan, I maintain that you need a pair in order to crank your settings up to where they should be."
    That was their ARMA quote...But it applies to all 4K...TWO CARDS. But their benchmarks are really low compared to everyone else for Titan in the same games. It's like they took 10-15% off Titan's scores. IE, Bioshock infinite at guru3d shows titan winning 10%, but at toms losing by 20% same game, same res...WTF? That's odd right? Skyrim shows NV domination at 4k (780 also). Almost 20% faster for Titan & 780 (they tied) over Uber. Of course they turned off ALL AA modes to get it playable. Again, you can't just judge 4K by one site's games. Clearly you can find the exact opposite at 4K and come back down to reality (a res you can actually play at above 30fps) and titan is smacking them in a ton of games (far more wins than losses). I could find a ton more if needed but you should get the point. TITAN isn't OWNED at 4K and usually when it is as toms says of Metro "the win is largely symbolic though", yeah at 30fps avg it is pointless even turned down!
  • bronopoly - Thursday, October 24, 2013 - link

    Why shouldn't one of the cards you mentioned be bought for 1080p? I don't know about you, but I prefer to get 120 FPS in games so it matches my monitor w/ lightboost enabled.
  • Bloodcalibur - Thursday, October 24, 2013 - link

    Except the Titan is a gaming/workstation hybrid due to its computing ability. Anyone who bought a Titan just for gaming is retarded and paid $350 more than they would have on a 780. Titan shouldn t be compared to 290X for gaming. Its a good card for those who do both gaming and a little bit of computing.
  • looncraz - Thursday, October 24, 2013 - link

    Install a new cooler and the last two of those problems vanish... and you've saved hundreds... you could afford to build a stand-alone water-cooling loop just for the 290x and still have money to spare for a nice dinner.
  • teiglin - Thursday, October 24, 2013 - link

    I haven't finished reading the article yet, but isn't that more than a little hyperbolic? It just means NVIDIA will have to cut back on the amount it gouges for GK110. The fact that it was able to leave the price high for so long is nearly all good for them--it's just a matter of how quickly they adjust their pricing to match.

    It will be nice to have a fair fight again at the high-end for a single card.
  • bill5 - Thursday, October 24, 2013 - link

    Heh, I'm the biggest AMD fanboy around, but these top two comments almost smell like marketing.

    It's a great card, and the Titan was deffo highly overpriced, but Nvidia can just make some adjustments on price and compete. That 780 Ti they showed will surely be something in that vein.
