XDMA: Improving Crossfire

Over the past year or so a lot of noise has been made over AMD’s Crossfire scaling capabilities, and for good reason. With the evolution of frame capture tools such as FCAT it finally became possible to easily and objectively measure frame delivery patterns. The results of course weren’t pretty for AMD, showing that while Crossfire was generating plenty of frames, in most cases it was doing a very poor job of delivering them.

AMD, for their part, owned up to the problem and began rolling out improvements in a plan that would see Crossfire fixed in multiple phases. Phase 1, deployed in August, saw a revised Crossfire frame pacing scheme implemented for single monitor resolutions (2560x1600 and below), which generally resolved AMD’s frame pacing in those scenarios. Phase 2, which is scheduled for next month, will address multi-monitor and high resolution scaling, which faces a different set of problems and requires a different set of fixes than what went into Phase 1.

The fact that there’s even a Phase 2 brings us to our next topic of discussion, which is a new hardware DMA engine in GCN 1.1 parts called XDMA. First utilized on Hawaii, XDMA is AMD’s final solution to their frame pacing woes, and in the process it redefines how Crossfire is implemented on the 290X and future cards. Specifically, AMD is forgoing the Crossfire Bridge Interconnect (CFBI) entirely and moving all inter-GPU communication over the PCIe bus, with XDMA being the hardware engine that makes this both practical and efficient.

But before we get too far ahead of ourselves, it would be best to put the current Crossfire situation in context before discussing how XDMA deviates from it.

In AMD’s current CFBI implementation, which itself dates back to the X1900 generation, a CFBI link directly connects two GPUs and offers 900MB/sec of bandwidth. In this setup the purpose of the CFBI link is to transfer completed frames to the master GPU for display purposes, and to do so in a direct GPU-to-GPU manner to complete the job as quickly and efficiently as possible.

For single monitor configurations and today’s common resolutions the CFBI excels at its task. AMD’s software frame pacing algorithms aside, the CFBI has enough bandwidth to pass around complete 2560x1600 frames at over 60Hz, allowing it to handle the scenarios laid out in AMD’s Phase 1 frame pacing fix.

The issue with the CFBI is that while it’s an efficient GPU-to-GPU link, it hasn’t been updated to keep up with the greater bandwidth demands generated by Eyefinity and, more recently, 4K monitors. For a 3x1080p setup frames are now just shy of 20MB each, and for a 4K setup frames are larger still at almost 24MB each. With frames this large the CFBI doesn’t have enough bandwidth to transfer them at high framerates – realistically you’d top out at 30Hz or so for 4K – requiring that AMD go over the PCIe bus for their existing cards.
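
To put those frame sizes in perspective, here’s a quick back-of-the-envelope sketch. The arithmetic is our own and assumes 24-bit color (3 bytes per pixel) and a raw, overhead-free link; it simply asks how many complete frames per second each bus could move at a given resolution.

    #include <stdio.h>

    /* Rough frame size and transfer rate math for the resolutions discussed
       above. Assumptions (ours): 24-bit color (3 bytes per pixel), 1MB =
       1,048,576 bytes, and no protocol overhead or compression. */
    int main(void)
    {
        const struct { const char *name; int w, h; } modes[] = {
            { "2560x1600",         2560, 1600 },
            { "3x1080p Eyefinity", 5760, 1080 },
            { "4K (3840x2160)",    3840, 2160 },
        };
        const double cfbi_bw = 900e6;  /* CFBI: ~900MB/sec        */
        const double pcie_bw = 16e9;   /* PCIe 3.0 x16: ~16GB/sec */

        for (int i = 0; i < 3; i++) {
            double bytes = (double)modes[i].w * modes[i].h * 3;
            printf("%-18s %5.1f MB/frame  CFBI: ~%4.0f frames/sec  PCIe: ~%5.0f frames/sec\n",
                   modes[i].name, bytes / (1024.0 * 1024.0),
                   cfbi_bw / bytes, pcie_bw / bytes);
        }
        return 0;
    }

Run through, a 2560x1600 frame comes out to roughly 11.7MB, so the CFBI can move on the order of 70 of them per second – comfortably above 60Hz – while 4K frames at nearly 24MB drop that to roughly 36 raw transfers per second, which lands right around the 30Hz-or-so practical ceiling once real-world overhead is factored in. Against PCIe 3.0 x16’s 16GB/sec, by comparison, even 4K frames are a rounding error.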

Going over the PCIe bus is not inherently a problem, but pre-GCN 1.1 parts lack any specialized hardware to help with the task. Without an efficient way to move frames – specifically, a way to DMA transfer frames directly between the cards without involving CPU time – AMD has to resort to much uglier methods of moving frames between the cards, which are in part responsible for the poor frame pacing we see today on Eyefinity/4K setups.

CFBI Crossfire At 4K: Still Dropping Frames

For GCN 1.1, and Hawaii in particular, AMD has chosen to solve this problem by continuing to use the PCIe bus, but doing so with hardware dedicated to the task. Dubbed the XDMA engine, the purpose of this hardware is to allow CPU-free, DMA-based frame transfers between the GPUs, thereby allowing AMD to transfer frames over the PCIe bus without the ugliness and performance costs incurred on pre-GCN 1.1 cards.

With that in mind, the specific role of the XDMA engine is relatively simple. Located within the display controller block (the final destination for all completed frames), the XDMA engine allows the display controllers within each Hawaii GPU to directly talk to each other and their associated memory ranges, bypassing the CPU and large chunks of the GPU entirely. Within that context the purpose of the XDMA engine is to be a dedicated DMA engine for the display controllers and nothing more. Frame transfers and frame presentations are still directed by the display controllers as before – which in turn are directed by the algorithms loaded up by AMD’s drivers – so the XDMA engine is not, strictly speaking, a standalone device, nor is it a hardware frame pacing device (which is something of a misnomer anyhow). Meanwhile this setup also allows AMD to implement their existing Crossfire frame pacing algorithms on the new hardware rather than starting from scratch, and of course to continue iterating on those algorithms as time goes on.
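
To make that division of labor a bit more concrete, here’s a purely conceptual sketch of the flow described above. The types and function names below are our own stand-ins – none of this reflects AMD’s actual hardware registers or driver interfaces – but it captures the key point: the driver’s pacing logic decides when a frame moves and when it’s shown, while the XDMA engine does the moving without the CPU ever touching the frame data.

    #include <stdint.h>

    /* Conceptual sketch only: invented types and functions, not AMD's
       hardware or driver interface. */
    typedef struct {
        uint64_t src_bus_addr;  /* completed frame in the slave GPU's memory,
                                   reachable over the PCIe bus                */
        uint64_t dst_bus_addr;  /* scan-out buffer owned by the master GPU's
                                   display controller                         */
        uint32_t size_bytes;    /* ~12-24MB per frame depending on resolution */
    } xdma_copy_desc;

    /* Hypothetical hardware hooks, standing in for whatever register-level
       programming AMD's display controllers and drivers actually perform. */
    void xdma_submit_copy(const xdma_copy_desc *d);
    void xdma_wait_complete(const xdma_copy_desc *d);
    void display_flip(uint64_t scanout_addr);

    /* Present a frame rendered by the slave GPU on the master GPU's display. */
    void present_remote_frame(const xdma_copy_desc *d)
    {
        xdma_submit_copy(d);       /* GPU-to-GPU DMA over PCIe; no CPU copy,
                                      no staging through system memory        */
        xdma_wait_complete(d);     /* frame is now resident on the master GPU */
        display_flip(d->dst_bus_addr);
    }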

Of course by relying solely on the PCIe bus to transfer frames there are tradeoffs to be made, both for the better and for the worse. The benefits are the vast increase in bandwidth (PCIe 3.0 x16 offers 16GB/sec versus 0.9GB/sec for the CFBI), not to mention allowing Crossfire to be implemented without those pesky Crossfire bridges. The downside to relying on the PCIe bus is that it’s not a dedicated, point-to-point connection between GPUs, so there will be bandwidth contention, and the latency of going over the PCIe bus will be higher than that of the CFBI. How much higher depends on the configuration; PCIe bridge chips, for example, can both improve and worsen latency depending on where in the chain the bridges and the GPUs are located, not to mention the generation and width of the PCIe link. But, as AMD tells us, any latency can be overcome by measuring it and planning frame transfers around it.
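
AMD hasn’t detailed how their drivers actually measure and compensate for that latency, but the general idea is simple enough to illustrate. The sketch below is our own, not AMD’s algorithm: keep a running estimate of how long a transfer takes over this particular PCIe topology, and start each transfer early enough that the frame is resident on the master GPU by the time it’s due on screen.

    /* Conceptual latency-compensated pacing; the names and the smoothing
       factor are ours, not AMD's driver internals. */
    typedef struct {
        double est_transfer_us;   /* running estimate of PCIe transfer time     */
        double frame_interval_us; /* target display interval, e.g. 16667 @ 60Hz */
        double next_present_us;   /* when the next frame is due on screen       */
    } pacing_state;

    /* Called for each slave-GPU frame: returns when its transfer should start. */
    double schedule_transfer(const pacing_state *s, double now_us)
    {
        double start = s->next_present_us - s->est_transfer_us;
        return (start > now_us) ? start : now_us;  /* never schedule in the past */
    }

    /* Called when a transfer completes: fold the measured time into the
       estimate and line up the next presentation slot. */
    void transfer_completed(pacing_state *s, double measured_us)
    {
        s->est_transfer_us = 0.9 * s->est_transfer_us + 0.1 * measured_us;
        s->next_present_us += s->frame_interval_us;
    }

The important property is the same one AMD describes: as long as the latency is known, it can be hidden by starting transfers earlier, so a slower-to-respond but much wider bus doesn’t have to mean worse pacing.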

Ultimately AMD’s goal with the XDMA engine is to make PCIe based Crossfire just as efficient, performant, and compatible as CFBI based Crossfire, and despite the initial concerns we had over the use of the PCIe bus, based on our test results AMD appears to have delivered on their promises.

The XDMA engine alone can’t eliminate the variation in frame times, but in its first implementation it’s already as good as the CFBI in single monitor setups, and being free of the Eyefinity/4K frame pacing issues that still plague the CFBI, it’s nothing short of a massive improvement in those scenarios. True to their promises, AMD has delivered a PCIe based Crossfire implementation that incurs no performance penalty versus the CFBI, and on the whole fully resolves AMD’s outstanding frame pacing issues. The downside of course is that XDMA won’t help the 280X or other pre-GCN 1.1 cards, but at the very least AMD has finally demonstrated that, going forward, they have frame pacing fully under control.

On a side note, looking at our results it’s interesting to see that despite the general reuse of frame pacing algorithms, the XDMA Crossfire implementation doesn’t exhibit any of the distinct frame time plateaus that the CFBI implementation does. The plateaus were more an interesting artifact than a problem, but their absence does mean that AMD’s XDMA Crossfire implementation is much more “organic,” like NVIDIA’s, rather than strictly enforcing a minimum frame time as appeared to be the case with the CFBI.

Comments

  • kyuu - Friday, October 25, 2013 - link

    I agree. Ignore all the complainers; it's great to have the benchmark data available without having to wait for the rest of the article to be complete. Those who don't want anything at all until it's 100% done can always just come back later.
  • AnotherGuy - Friday, October 25, 2013 - link

    What a beast
  • zodiacsoulmate - Friday, October 25, 2013 - link

    Dunno, all the GeForce cards look like sh!t in this review, and the 280X/7970 and 290X look like heaven's gods...
    but my 6990 and 7970 systems never really made me happier than my GTX 670 system...
    well, whatever
  • TheJian - Friday, October 25, 2013 - link

    While we have a great card here, it appears it doesn't always beat 780, and gets toppled consistently by Titan in OTHER games:
    http://www.techpowerup.com/reviews/AMD/R9_290X/24....
    World of Warcraft (spanked again all resolutions by both 780/titan even at 5760x1080)
    Splinter Cell Blacklist (smacked by 780 even, of course titan)
    StarCraft 2 (by both 780/titan, even 5760x1080)
    Titan adds more victories (780 also depending on res, remember 98.75% of us run 1920x1200 or less):
    Skyrim (all res, titan victory at techpowerup) Ooops, 780 wins all res but 1600p also skyrim.
    Assassins creed3, COD Black Ops2, Diablo3, FarCry3 (though uber ekes a victory at 1600p, reg gets beat handily in fc3, however hardocp shows 780 & titan winning apples-apples min & avg, techspot shows loss to 780/titan also in fc3)
    Hardocp & guru3d both show Bioshock infinite, Crysis 3 (titan 10% faster all res) and BF3 winning on Titan. Hardocp also show in apples-apples Tombraider and MetroLL winning on titan.
    http://www.guru3d.com/articles_pages/radeon_r9_290...
    http://hardocp.com/article/2013/10/23/amd_radeon_r...
    http://techreport.com/review/25509/amd-radeon-r9-2...
    Guild wars 2 at techreport win for both 780/titan big also (both over 12%).
    Also tweaktown shows lost planet 2 loss to the lowly 770, let alone 780/titan.
    I guess there's a reason why most of these quite popular games are NOT tested here :)

    So while it's a great card, again not overwhelming and quite the loser depending on what you play. In UBER mode as compared above I wouldn't even want the card (heat, noise, watts loser). Down it to regular and there are far more losses than I'm listing above to 780 and titan especially. Considering the overclocks from all sites, you are pretty much getting almost everything in uber mode (sites have hit 6-12% max for OCing, I think that means they'll be shipping uber as OC cards, not much more). So NV just needs to kick up 780TI which should knock out almost all 290x uber wins, and just make the wins they already have even worse, thus keeping $620-650 price. Also drop 780 to $500-550 (they do have great games now 3 AAA worth $100 or more on it).

    Looking at 1080p here (a res 98.75% of us play at 1920x1200 or lower remember that), 780 does pretty well already even at anandtech. Most people playing above this have 2 cards or more. While you can jockey your settings around all day per game to play above 1920x1200, you won't be MAXING much stuff out at 1600p with any single card. It's just not going to happen until maybe 20nm (big maybe). Most of us don't have large monitors YET or 1600p+ and I'm guessing all new purchases will be looking at gsync monitors now anyway. Very few of us will fork over $550 and have the cash for a new 1440p/1600p monitor ALSO. So a good portion of us would buy this card and still be 1920x1200 or lower until we have another $550-700 for a good 1440/1600p monitor (and I say $550+ since I don't believe in these korean junk no-namers and the cheapest 1440p newegg itself sells is $550 acer). Do you have $1100 in your pocket? Making that kind of monitor investment right now I wait out Gsync no matter what. If they get it AMD compatible before 20nm maxwell hits, maybe AMD gets my money for a card. Otherwise Gsync wins hands down for NV for me. I have no interest in anything but a Gsync monitor at this point and a card that works with it.

    Guru3D OC: 1075/6000
    Hardwarecanucks OC: 1115/5684
    Hardwareheaven OC: 1100/5500
    PCPerspective OC: 1100/5000
    TweakTown OC: 1065/5252
    TechpowerUp OC: 1125/6300
    Techspot OC: 1090/6400
    Bit-tech OC: 1120/5600
    Left off direct links to these sites regarding OCing but I'm sure you can all figure out how to get there (don't want post flagged as spam with too many links).
  • b3nzint - Friday, October 25, 2013 - link

    "So NV just needs to kick up 780TI which should knock out almost all 290x uber wins, and just make the wins they already have even worse, thus keeping $620-650 price. Also drop 780 to $500-550"

    we're talking about titan killer here.
    titan vs titan killer, at res 3840, at high or ultra :

    coh2 - 30%
    metro - 30%
    bio - (10%) but win 3% at medium
    bf3 - 15%
    crysis 3 - tie
    crysis - 10
    totalwar - tie
    hitman - 20%
    grid 2 - 10%+

    2816 sp, 64 rop, 176 tmu, 4gb 512bit. 780 or 780ti won't stand a chance. this is the titan killer dude, wake up. only then we're talking CF, SLI and res 5760. But for a single card i go for this titan killer. good luck with gsync, i'm not giving up my dell u2711 yet.
  • just4U - Friday, October 25, 2013 - link

    Well.. you have to put this in context. Those guys gave it their editor's choice award and an overall score of 9.3. They summed it up with this:

    "
    The real highlight of AMD's R9 290X is certainly the price. What has been rumored to cost around $700 (and got people excited at that price), will actually retail for $549! $549 is an amazing price for this card, making it the price/performance king in the high-end segment. NVIDIA's $1000 GTX Titan is completely irrelevant now, even the GTX 780 with its $625 price will be a tough sale."
  • theuglyman0war - Thursday, October 31, 2013 - link

    the flagship GTX *80 MSRP has been $499 for every upgrade I have ever made. After waiting out the 104 for the 110 chip, only to get the insult of the 780's pricing, I will be holding off to see if everything returns to normal with Maxwell. Kind of depressing when others are excited for $550? As far as I know the market still dictates pricing, and my price is $499 if AMD is offering up decent competition to keep the market healthy and respectful.
  • ToTTenTranz - Friday, October 25, 2013 - link

    How isn't this viral?
  • nader21007 - Friday, October 25, 2013 - link

    Radeon R9 290X received Tom’s Hardware’s Elite award—the first time a graphics card has received this honor. Nvidia: Why?
    Wiseman: Because it outperformed a card that is nearly double its price (your Titan).
    Do you hear me, Nvidia? Please don't gouge consumers again.
    Viva AMD.
  • doggghouse - Friday, October 25, 2013 - link

    I don't think the Titan was ever considered to be a gamer's card... it was more like a "prosumer" card for compute. But it was also marketed to people who build EXTREME! machines for maximum OC scores. The 780 was basically the gamer's card... it has 90-95% of the Titan's gaming capability, but for only $650 (still expensive).

    If you want to compare the R9 290X to the Titan, I would look at the compute benchmarks. And in that, it seems to be an apples to oranges comparison... AMD and nVIDIA seem to trade blows depending on the type of compute.

    Compared to the 780, the 290X pretty much beats it hands down in performance. If I hadn't already purchased a 780 last month ($595, yay), I would consider the 290X... though I'd definitely wait for 3rd party cards with better cooling solutions. A stock card on "Uber" setting is simply way too hot, and too loud!
