XDMA: Improving Crossfire

Over the past year or so a lot of noise has been made over AMD’s Crossfire scaling capabilities, and for good reason. With the evolution of frame capture tools such as FCAT it finally became possible to easily and objectively measure frame delivery patterns. The results of course weren’t pretty for AMD, showing that while Crossfire may have been generating plenty of frames, in most cases it was doing a very poor job of delivering them.

AMD for their part buckled down and began rolling out improvements in a plan that would see Crossfire fixed in multiple phases. Phase 1, deployed in August, implemented a revised Crossfire frame pacing scheme for single monitor resolutions (2560x1600 and below), which generally resolved AMD’s frame pacing problems in those scenarios. Phase 2, which is scheduled for next month, will address multi-monitor and high resolution scaling, which faces a different set of problems and requires a different set of fixes than what went into phase 1.

The fact that there’s even a phase 2 brings us to our next topic of discussion: a new hardware DMA engine in GCN 1.1 parts called XDMA. First utilized on Hawaii, XDMA is the final solution to AMD’s frame pacing woes, and in the process it redefines how Crossfire is implemented on the 290X and future cards. Specifically, AMD is forgoing the Crossfire Bridge Interconnect (CFBI) entirely and moving all inter-GPU communication over the PCIe bus, with XDMA being the hardware engine that makes this both practical and efficient.

But before we get too far ahead of ourselves, it would be best to put the current Crossfire situation in context before discussing how XDMA deviates from it.

In AMD’s current CFBI implementation, which itself dates back to the X1900 generation, a CFBI link directly connects two GPUs and offers 900MB/sec of bandwidth. In this setup the purpose of the CFBI link is to transfer completed frames to the master GPU for display purposes, and to do so in a direct GPU-to-GPU manner that completes the job as quickly and efficiently as possible.

For single monitor configurations and today’s common resolutions the CFBI excels at its task. AMD’s software frame pacing algorithms aside, the CFBI has enough bandwidth to pass around complete 2560x1600 frames at over 60Hz, allowing it to handle the scenarios laid out in AMD’s phase 1 frame pacing fix.

The issue with the CFBI is that while it’s an efficient GPU-to-GPU link, it hasn’t been updated to keep up with the greater bandwidth demands generated by Eyefinity and, more recently, 4K monitors. For a 3x1080p setup frames are now just shy of 20MB each, and for a 4K setup frames are larger still at almost 24MB each. With frames this large the CFBI doesn’t have enough bandwidth to transfer them at high framerates – realistically you’d top out at 30Hz or so for 4K – requiring that AMD go over the PCIe bus for their existing cards.
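
To put those frame sizes in perspective, here is a quick back-of-the-envelope sketch of the arithmetic. It assumes uncompressed 24-bit frames (3 bytes per pixel) and the full 900MB/sec of CFBI bandwidth with no overhead, which is a simplification rather than a statement of how the link actually behaves.

```python
# Back-of-the-envelope CFBI bandwidth math. Assumes uncompressed 24-bit
# frames (3 bytes per pixel) and the full 900MB/sec link with no overhead.
CFBI_BANDWIDTH = 900e6   # bytes per second
BYTES_PER_PIXEL = 3      # 24-bit color (an illustrative assumption)

resolutions = {
    "2560x1600 (single monitor)": 2560 * 1600,
    "3x1080p (Eyefinity)": 3 * 1920 * 1080,
    "3840x2160 (4K)": 3840 * 2160,
}

for name, pixels in resolutions.items():
    frame_bytes = pixels * BYTES_PER_PIXEL
    max_hz = CFBI_BANDWIDTH / frame_bytes
    print(f"{name}: {frame_bytes / 1e6:.1f}MB per frame, ~{max_hz:.0f}Hz max over CFBI")
```

Under those assumptions a single 2560x1600 monitor works out to roughly 12MB per frame and around 70Hz over the CFBI, while 4K lands near 25MB per frame and only the mid-30s in Hz before any real-world overhead – in line with the 30Hz-or-so ceiling described above.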

Going over the PCIe bus is not in and of itself a problem, but pre-GCN 1.1 hardware lacks any specialized hardware to help with the task. Without an efficient way to move frames, and specifically a way to DMA transfer frames directly between the cards without involving CPU time, AMD has to resort to much uglier methods of moving frames between the cards, which are in part responsible for the poor frame pacing we see today on Eyefinity/4K setups.

CFBI Crossfire At 4K: Still Dropping Frames

For GCN 1.1, and Hawaii in particular, AMD has chosen to solve this problem by continuing to use the PCIe bus, but doing so with hardware dedicated to the task. Dubbed the XDMA engine, the purpose of this hardware is to allow CPU-free, DMA based frame transfers between the GPUs, thereby allowing AMD to transfer frames over the PCIe bus without the ugliness and performance costs of doing so on pre-GCN 1.1 cards.

With that in mind, the specific role of the XDMA engine is relatively simple. Located within the display controller block (the final destination for all completed frames), the XDMA engine allows the display controllers within each Hawaii GPU to directly talk to each other and access each other’s associated memory ranges, bypassing the CPU and large chunks of the GPU entirely. Within that context the purpose of the XDMA engine is to be a dedicated DMA engine for the display controllers and nothing more. Frame transfers and frame presentations are still directed by the display controllers as before – which in turn are directed by the algorithms loaded up by AMD’s drivers – so the XDMA engine is not strictly speaking a standalone device, nor is it a hardware frame pacing device (which is something of a misnomer anyhow). Meanwhile this setup also allows AMD to implement their existing Crossfire frame pacing algorithms on the new hardware rather than starting from scratch, and of course to continue iterating on those algorithms as time goes on.
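
AMD hasn’t published a programming model for the XDMA engine, so the following is purely an illustrative sketch of the per-frame flow described above – the slave GPU finishes a frame, a display-controller-initiated DMA copies it straight into the master GPU’s memory over PCIe, and the master’s display controller presents it. Every name in the sketch is hypothetical.

```python
# Purely illustrative sketch of AFR Crossfire frame flow with XDMA.
# All function names are hypothetical; AMD has not published this API.
def render_frame(gpu: str, index: int) -> dict:
    """Pretend a GPU has finished rendering a frame into its local memory."""
    return {"index": index, "rendered_on": gpu, "location": f"{gpu}_memory"}

def xdma_copy_to_master(frame: dict) -> dict:
    """Display-controller-initiated DMA over PCIe: the slave's completed
    frame is written directly into the master GPU's memory, no CPU copy."""
    frame["location"] = "master_memory"
    return frame

def present(frame: dict) -> None:
    print(f"Master display controller presents frame {frame['index']} "
          f"(rendered on {frame['rendered_on']}, now in {frame['location']})")

# Alternate Frame Rendering: even frames on the master GPU, odd on the slave.
for i in range(4):
    gpu = "master" if i % 2 == 0 else "slave"
    frame = render_frame(gpu, i)
    if gpu == "slave":
        frame = xdma_copy_to_master(frame)
    present(frame)
```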

Of course, by relying solely on the PCIe bus to transfer frames there are tradeoffs to be made, both for the better and for the worse. The benefits are the vast increase in available bandwidth (PCIe 3.0 x16 offers 16GB/sec versus 0.9GB/sec for the CFBI), not to mention allowing Crossfire to be implemented without those pesky Crossfire bridges. The downside to relying on the PCIe bus is that it’s not a dedicated, point-to-point connection between GPUs, so there will be bandwidth contention, and the latency of the PCIe bus will be higher than that of the CFBI. How much higher depends on the configuration; PCIe bridge chips, for example, can both improve and worsen latency depending on where in the chain the bridges and the GPUs are located, not to mention the generation and width of the PCIe link. But, as AMD tells us, any latency can be overcome by measuring it and planning frame transfers around it.
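
AMD hasn’t detailed how its drivers actually measure and compensate for that latency, but the basic idea can be sketched as follows: if the per-frame transfer time over the bus is known (or measured), the driver simply kicks off each slave-GPU transfer early enough that the frame is sitting in the master GPU’s memory by its evenly paced presentation slot. The numbers and names below are illustrative assumptions, not AMD’s actual algorithm.

```python
# Illustrative sketch of latency-aware frame pacing over PCIe.
# All numbers are assumptions, not measurements of real hardware.
FRAME_SIZE_BYTES = 3840 * 2160 * 3   # ~25MB 4K frame at 24-bit color
PCIE_BANDWIDTH = 16e9                # PCIe 3.0 x16, bytes per second
MEASURED_LATENCY = 0.0005            # 0.5ms assumed fixed bus latency
TARGET_INTERVAL = 1 / 60             # pace frames for a 60Hz output

def transfer_time(frame_bytes: int) -> float:
    """Estimated time to DMA one frame to the master GPU over PCIe."""
    return MEASURED_LATENCY + frame_bytes / PCIE_BANDWIDTH

# Times are relative to the first frame's presentation slot.
for frame_index in range(4):
    present_at = frame_index * TARGET_INTERVAL
    start_transfer_at = present_at - transfer_time(FRAME_SIZE_BYTES)
    print(f"frame {frame_index}: start transfer at {start_transfer_at * 1000:+.2f}ms, "
          f"present at {present_at * 1000:.2f}ms")
```

The point of the sketch is simply that a known latency can be hidden by scheduling around it; the evenly spaced presentation intervals are what the frame pacing algorithms are ultimately protecting.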

Ultimately AMD’s goal with the XDMA engine is to make PCIe based Crossfire just as efficient, performant, and compatible as CFBI based Crossfire, and despite the initial concerns we had over the use of the PCIe bus, based on our test results AMD appears to have delivered on their promises.

The XDMA engine alone can’t eliminate the variation in frame times, but in its first implementation it’s already as good as the CFBI in single monitor setups, and, being free of the Eyefinity/4K frame pacing issues that still plague the CFBI, it is nothing short of a massive improvement in those scenarios. True to their promises, AMD has delivered a PCIe based Crossfire implementation that incurs no performance penalty versus the CFBI and on the whole fully resolves AMD’s outstanding frame pacing issues. The downside of course is that XDMA won’t help the 280X or other pre-GCN 1.1 cards, but at the very least AMD has finally demonstrated that, going forward, they have frame pacing fully under control.

On a side note, looking at our results it’s interesting to see that despite the general reuse of frame pacing algorithms, the XDMA Crossfire implementation doesn’t exhibit any of the distinct frame time plateaus that the CFBI implementation does. The plateaus were more an interesting artifact than a problem, but it does mean that AMD’s XDMA Crossfire implementation is much more “organic,” like NVIDIA’s, rather than strictly enforcing a minimum frame time as appeared to be the case with the CFBI.

Comments

  • TheJian - Friday, October 25, 2013

    LOL. Tell that to both their bottom lines. I see AMD making nothing while NV profits. People who bought titan got a $2500 Tesla for $1000. You don't buy a titan just to game (pretty dumb if you did) as it's for pro apps too (the compute part of the deal). It's a steal for gamers who make money on the card too. Saving $1500 is a great deal. So since you're hating on NV pricing, how do you feel about the 7990 at $1000. Nice to leave that out of your comment fanboy ;) Will AMD now reap what they sow and have to deal with all the angry people who bought those? ROFL. Is the 1K business model unsustainable for AMD too? Even the 6990 came in at $700 a ways back. Dual or single chip the $1000 price is alive and well from either side for those who want it.

    I'd bet money a titan ultra will be $1000 again shortly if they even bother as it's not a pure gamer card but much more already. If you fire up pro apps with Cuda you'll smoke that 290x daily (which covers just about all pro apps). Let me know when AMD makes money in a quarter that NVDA loses money. Then you can say NV pricing is biting them in the A$$. Until then, your comment is ridiculous. Don't forget even as ryan points out in this article (and he don't love NV...LOL), AMD still has driver problems (and has for ages) but he believes in AMD much like fools do in Obama still...LOL. For me, even as a 5850 owner, they have to PROVE themselves before I ponder another card from them at 20nm. The 290x is hot, noisy and uses far more watts and currently isn't coming with 3 AAA games either. NV isn't shaking in their boots. I'll be shocked if 780TI isn't $600 or above as it should match Titan which 290x doesn't do even with the heat, noise and watts.

    And you're correct no OC room. Nobody has hit above 1125.

    If NV was greedy, wouldn't they be making MORE money than in 2007? They haven't cracked 850mil in 5 years. Meanwhile, AMD's pricing, which you seem to love, has caused their entire business to basically fail (no land, no fabs, gave up cpu race, 8 months to catch up with a hot noisy chip etc). They have lost over $6B in the last 10yrs. AMD has idiots managing their company and they are destroying what used to be a GREAT company with GREAT products. They should have priced this card $100 higher and all other cards rebadged should be $50 higher. They might make some actual money then every quarter right? Single digit margins on console chips (probably until 20nm shrink) won't get you rich either. Who made that deal? FIRE THAT GUY. That margin is why NV said it wasn't worth it.
  • chizow - Saturday, October 26, 2013

    AMD's non-profitability goes far beyond their GPU business; it's more due to their CPU business. People who got Titan didn't get a Tesla for $1000, they got a Tesla without ECC. Compute apps without ECC would mean second-guessing every result because you're unsure whether a value was stored/retrieved from memory correctly. Regarding 7990 pricing, you can surely look it up before pulling the fanboy card, just as you can look up my comments on 7970 launch pricing. And yes, AMD will absolutely have to deal with that backlash given that card dropped even more precipitously than even Titan, going from $1K to $600 in only 4-5 months.

    I don't think Nvidia will make the same mistake with a Titan Ultra at $1K. I also don't think Nvidia fans who only bought Titan for gaming will fall for the same mistake 2x. If Maxwell comes out and Nvidia holds out on the big ASIC, I doubt anyone uninterested in compute will fall for the same trick if Nvidia launches a Titan 2 at $1K using a compute gimmick to justify the price. They will just point to Titan and say "wait 3 months and they'll release something that's 95% of its performance at 65% of its price". As they say, fool me once, shame on you; fool me twice, shame on me.

    And no, greed and profit don't go hand in hand. In 2007-2008, Nvidia posted record profits and revenue for multiple consecutive quarters as you stated, on the back of a cheap $230-$270 8800GT. With Titan, they reversed course by setting record margins, but on reduced revenue and profits. They basically covet Intel's huge profit margins, but they clearly lack the revenue to grow their bottom line. Selling $1K GPUs certainly isn't going to get them there any faster.
  • FragKrag - Thursday, October 24, 2013

    great performance, but I'll wait until I see some better thermals/noise from aftermarket coolers :p
  • Shark321 - Thursday, October 24, 2013

    As with the Titan in the beginning, no alternate coolers will be available for the time being (according to computerbase). This means even if the price is great, you will be stuck with a very noisy and hot card. 780Ti will outperform the 290x in 3 weeks. It remains to be seen how it will be priced (I guess $599).
  • The Von Matrices - Thursday, October 24, 2013

    This is the next GTX 480 or HD 2900 XT. It provides great performance for the price, that is if you can put up with the heat and noise.
  • KaosFaction - Thursday, October 24, 2013

    Work in Progress!!! Whhhhaaaattttt I want answers now!!
  • masterpine - Thursday, October 24, 2013

    Good to see something from AMD challenging the GK110s. I still find it fairly remarkable that in the fast-moving world of GPUs it's taken 12 months for AMD to build something to compete. Hopefully this puts a swift end to the above-$600 prices in the single GPU high end.

    More than a little concerned at the 95C target temp of these things. 80C is toasty enough already for the GTX780, actually had to point a small fan at the DVI cables coming out the back of my 780 SLI surround setup because the heat coming out the back of them was causing dramas. Not sure i could cope with the noise of a 290X either.

    Anyhow, this is great for consumers. Hope to see some aftermarket coolers rein these things in a bit. If the end result is both AMD and Nvidia playing hardball at the $500 mark in a few weeks' time, we all win.
  • valkyrie743 - Thursday, October 24, 2013

    HOLY TEMPS BATMAN. It's the new GTX 480 in the temps department.
  • kallogan - Thursday, October 24, 2013

    No overclocking headroom with stock cooler. That's for sure.
  • FuriousPop - Thursday, October 24, 2013

    can we please see 2x in CF mode with eyefinity!? or am i asking for too much?

    Also, Nvidia will always be better for those of you in the 30% department of having only max 1080p. For the rest of us at 1440p and 1600p and beyond (Eyefinity), AMD will be, as stated by previous comments in this thread, "King of the hill"....

    but none the less, some more testing in the CF+3x monitor department would be great to see how far this puppy really goes...

    I mean seriously, what's the point of putting an 80 year old man behind the wheel of the world's fastest car?!? Please push the specs on the gaming benchmarks (e.g. higher res)
