XDMA: Improving Crossfire

Over the past year or so a lot of noise has been made over AMD’s Crossfire scaling capabilities, and for good reason. With the evolution of frame capture tools such as FCAT it finally became possible to easily and objectively measure frame delivery patterns. The results of course weren’t pretty for AMD, showing that Crossfire may have been generating plenty of frames, but in most cases it was doing a very poor job of delivering them.

AMD, for their part, responded with a plan that would see Crossfire improved in multiple phases. Phase 1, deployed in August, implemented a revised Crossfire frame pacing scheme for single monitor resolutions (2560x1600 and below), which generally resolved AMD’s frame pacing problems in those scenarios. Phase 2, scheduled for next month, will address multi-monitor and high resolution scaling, which faces a different set of problems and requires a different set of fixes than what went into phase 1.

The fact that there’s even a phase 2 brings us to our next topic of discussion: a new hardware DMA engine in GCN 1.1 parts called XDMA. First utilized on Hawaii, XDMA is intended to be the final solution to AMD’s frame pacing woes, and it redefines how Crossfire is implemented on the 290X and future cards. Specifically, AMD is forgoing the Crossfire Bridge Interconnect (CFBI) entirely and moving all inter-GPU communication over the PCIe bus, with XDMA being the hardware engine that makes this both practical and efficient.

But before we get too far ahead of ourselves, it would be best to put the current Crossfire situation in context before discussing how XDMA deviates from it.

In AMD’s current CFBI implementation, which itself dates back to the X1900 generation, a CFBI link directly connects two GPUs and offers 900MB/sec of bandwidth. In this setup the purpose of the CFBI link is to transfer completed frames to the master GPU for display purposes, and to do so in a direct GPU-to-GPU manner to complete the job as quickly and efficiently as possible.

For single monitor configurations and today’s common resolutions the CFBI excels at its task. AMD’s software frame pacing algorithms aside, the CFBI has enough bandwidth to pass around complete 2560x1600 frames at over 60Hz, allowing the CFBI to handle the scenarios laid out in AMD’s phase 1 frame pacing fix.

The issue with the CFBI is that while it’s an efficient GPU-to-GPU link, it hasn’t been updated to keep up with the greater bandwidth demands generated by Eyefinity and, more recently, 4K monitors. For a 3x1080p setup frames are now just shy of 20MB each, and for a 4K setup frames are larger still at almost 24MB each. With frames this large the CFBI doesn’t have enough bandwidth to transfer them at high framerates – realistically you’d top out at 30Hz or so for 4K – requiring that AMD go over the PCIe bus for their existing cards.
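The arithmetic behind these figures is easy to check. Below is a minimal sketch of the back-of-the-envelope math, assuming the 24 bits per pixel that the quoted frame sizes imply and ignoring link overhead:

```python
# Frame sizes at the resolutions discussed above, and the frame rate ceiling
# a 900MB/sec CFBI link imposes. Assumes 3 bytes (24 bits) per pixel, which
# matches the frame sizes quoted above; real link overhead lowers the
# achievable rate further.

CFBI_BANDWIDTH = 900e6  # bytes/sec

def frame_size_bytes(width, height, bytes_per_pixel=3):
    return width * height * bytes_per_pixel

for name, w, h in [("2560x1600", 2560, 1600),
                   ("3x1080p", 5760, 1080),
                   ("4K", 3840, 2160)]:
    size = frame_size_bytes(w, h)
    print(f"{name:>9}: {size / 1e6:4.1f}MB/frame, "
          f"CFBI ceiling ~{CFBI_BANDWIDTH / size:4.1f} frames/sec")
```

This works out to roughly 12MB frames at 2560x1600 – a raw ceiling above 70 frames/sec, comfortably covering the phase 1 scenarios – versus ceilings of roughly 48 and 36 frames/sec for 3x1080p and 4K, before overhead brings the realistic 4K figure down to the 30Hz mark cited above.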

Going over the PCIe bus is not inherently a problem, but pre-GCN 1.1 hardware lacks any specialized hardware to help with the task. Without an efficient way to move frames – specifically, a way to DMA transfer frames directly between the cards without involving CPU time – AMD has to resort to much uglier methods of moving frames between the cards, which are in part responsible for the poor frame pacing we see today on Eyefinity/4K setups.
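The article doesn’t spell out what those uglier methods are, but a plausible fallback is staging frames through system memory under CPU control. A toy timing model – every number here is an illustrative assumption, not a measurement – shows the cost of that double hop:

```python
# Toy model: moving one ~24MB 4K frame between GPUs, with and without a
# peer-to-peer DMA engine. All values are illustrative assumptions.

FRAME_BYTES = 24e6
BUS_BYTES_PER_SEC = 16e9   # PCIe 3.0 x16
CPU_HOP_OVERHEAD_S = 1e-3  # assumed CPU scheduling/sync cost per hop

hop_s = FRAME_BYTES / BUS_BYTES_PER_SEC  # one bus crossing (~1.5ms)

staged_s = 2 * hop_s + 2 * CPU_HOP_OVERHEAD_S  # GPU -> system RAM -> GPU
direct_s = hop_s                               # GPU -> GPU via DMA

print(f"staged copy: {staged_s * 1e3:.1f}ms, direct DMA: {direct_s * 1e3:.1f}ms")
```

Worse than the raw cost, the CPU-driven hops arrive with variable delay depending on what else the system is doing, and that variability is exactly the kind of jitter frame pacing is meant to remove.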

CFBI Crossfire At 4K: Still Dropping Frames

For GCN 1.1, and Hawaii in particular, AMD has chosen to solve this problem by continuing to use the PCIe bus, but with hardware dedicated to the task. Dubbed the XDMA engine, the purpose of this hardware is to allow CPU-free, DMA-based frame transfers between the GPUs, thereby allowing AMD to transfer frames over the PCIe bus without the ugliness and performance costs seen on pre-GCN 1.1 cards.

With that in mind, the specific role of the XDMA engine is relatively simple. Located within the display controller block (the final destination for all completed frames), the XDMA engine allows the display controllers within each Hawaii GPU to directly talk to each other and their associated memory ranges, bypassing the CPU and large chunks of the GPU entirely. Within that context the purpose of the XDMA engine is to be a dedicated DMA engine for the display controllers and nothing more. Frame transfers and frame presentations are still directed by the display controllers as before – which in turn are directed by the algorithms loaded up by AMD’s drivers – so the XDMA engine is not, strictly speaking, a standalone device, nor is it a hardware frame pacing device (which is something of a misnomer anyhow). Meanwhile this setup also allows AMD to implement their existing Crossfire frame pacing algorithms on the new hardware rather than starting from scratch, and of course to continue iterating on those algorithms as time goes on.
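To make the division of labor concrete, here is a heavily simplified sketch of what a driver-side frame pacing loop might look like. It is purely illustrative – AMD’s actual algorithm is not public – but it captures the key point: software decides when to present, and the hardware (display controller plus XDMA engine) merely carries it out:

```python
# Illustrative driver-side frame pacer for alternate frame rendering (AFR).
# Hypothetical and much simplified; AMD's actual algorithm is not public.
# The DMA engine only moves frames -- deciding *when* is software's job.

import time

class FramePacer:
    def __init__(self, smoothing=0.1):
        self.smoothing = smoothing
        self.avg_interval = None
        self.last_present = None

    def present(self, flip):
        """flip: callable that displays the completed frame."""
        now = time.perf_counter()
        if self.last_present is not None:
            interval = now - self.last_present
            # Track a running average of the frame interval.
            if self.avg_interval is None:
                self.avg_interval = interval
            else:
                self.avg_interval += self.smoothing * (interval - self.avg_interval)
            # A frame arriving well ahead of the average cadence is the classic
            # AFR micro-stutter pattern: hold it briefly to even out delivery.
            shortfall = self.avg_interval - interval
            if shortfall > 0:
                time.sleep(shortfall)
        flip()
        self.last_present = time.perf_counter()
```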

Of course by relying solely on the PCIe bus to transfer frames there are tradeoffs to be made, both for the better and for the worse. The benefits are of course the vast increase in available bandwidth (PCIe 3.0 x16 offers 16GB/sec versus 0.9GB/sec for CFBI), not to mention allowing Crossfire to be implemented without those pesky Crossfire bridges. The downside to relying on the PCIe bus is that it’s not a dedicated, point-to-point connection between GPUs, and for that reason there will be bandwidth contention, and the latency of the PCIe bus will be higher than that of the CFBI. How much higher depends on the configuration; PCIe bridge chips, for example, can both improve and worsen latency depending on where in the chain the bridges and the GPUs are located, not to mention the generation and width of the PCIe link. But, as AMD tells us, any latency can be overcome by measuring it and planning frame transfers around it.
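That “measure it and plan around it” approach is easy to sketch as well. A hypothetical scheduler – all names and values below are assumptions for illustration, not AMD’s method – simply works backwards from the display deadline:

```python
# Hedged sketch of latency-compensated transfer scheduling over a shared bus.
# measured_latency_s would come from periodically timing small round-trip
# transfers; all names and values here are illustrative assumptions.

def dma_start_deadline(present_deadline_s, frame_bytes,
                       bus_bytes_per_sec, measured_latency_s,
                       safety_margin_s=0.5e-3):
    """Latest time the DMA can begin and still meet the display deadline."""
    transfer_s = frame_bytes / bus_bytes_per_sec
    return present_deadline_s - (transfer_s + measured_latency_s + safety_margin_s)

# Example: a ~24MB 4K frame over PCIe 3.0 x16 with 20us of measured latency,
# against a 60Hz (16.67ms) display deadline.
start = dma_start_deadline(1 / 60, 24e6, 16e9, 20e-6)
print(f"start DMA no later than t = {start * 1e3:.2f}ms into the 16.67ms frame")
```

Bandwidth contention from other traffic would simply show up as a longer measured transfer time, which the same arithmetic absorbs automatically.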

Ultimately AMD’s goal with the XDMA engine is to make PCIe based Crossfire just as efficient, performant, and compatible as CFBI based Crossfire, and despite the initial concerns we had over the use of the PCIe bus, based on our test results AMD appears to have delivered on their promises.

The XDMA engine alone can’t eliminate the variation in frame times, but in its first implementation it’s already as good as CFBI in single monitor setups, and being free of the Eyefinity/4K frame pacing issues that still plague CFBI, it is nothing short of a massive improvement over CFBI in those scenarios. True to their promises, AMD has delivered a PCIe based Crossfire implementation that incurs no performance penalty versus CFBI, and on the whole fully resolves AMD’s outstanding frame pacing issues. The downside of course is that XDMA won’t help the 280X or other pre-GCN 1.1 cards, but at the very least AMD has finally demonstrated that, going forward, they have frame pacing fully under control.

On a side note, looking at our results it’s interesting to see that despite the general reuse of frame pacing algorithms, the XDMA Crossfire implementation doesn’t exhibit any of the distinct frame time plateaus that the CFBI implementation does. The plateaus were more an interesting artifact than a problem, but their absence does mean that AMD’s XDMA Crossfire implementation is much more “organic,” like NVIDIA’s, rather than strictly enforcing a minimum frame time as appeared to be the case with CFBI.

Comments

  • Sandcat - Thursday, October 24, 2013

    Perhaps they knew it was unsustainable from the beginning, but short-term gains are generally what motivate managers when they develop pricing strategies, because bonus. Make hay whilst the sun shines, or when AMD is 8 months late.
  • chizow - Saturday, October 26, 2013

    Possibly, but now they have to deal with the damaged goodwill of some of their most enthusiastic, spendy customers. I can't count how many times I've seen it, someone saying they swore off company X or company Y because they felt they got burned/screwed/fleeced by a single transaction. That is what Nvidia will be dealing with going forward with Titan early adopters.
  • Sancus - Thursday, October 24, 2013

    AMD really needs to do better than a response 8 months later to crash anyone's parade. And honestly, I would love to see them put up a fight with Maxwell within a reasonable time period so Nvidia has incentive to keep prices lower. Otherwise, expect Nvidia to "overprice" things next generation as well.

    When they have no competition for 8 months it's not unsustainable to price as high as the market will bear, and there's no real evidence that Titan was economically overpriced; it's not like there was a supply glut of Titans sitting around anywhere -- in fact, they were often out of stock. So really, Nvidia is just pricing according to the market -- no competition from AMD for 8 months, fastest card with limited supply, why WOULD they price it at anything below $1000?
  • chizow - Saturday, October 26, 2013

    My reply would be that they've never had to price it at $1000 before, and we have certainly seen this level of advancement from one generation to the next in the past (7900 GTX to 8800 GTX, 8800 GTX to GTX 280, GTX 280 to GTX 480, etc.), so these aren't completely ground-breaking performance increases, even though Kepler overall outperformed historical improvements by ~20%, imo.

    Also, the concern with Titan isn't just the fact it was priced at ungodly premiums this time around, it's the fact it held its crown for such a relatively short period of time. Sure, Nvidia had no competition at the $500+ range for 8 months, but that was also the full extent of Titan's reign at the top. In the past, a flagship in that $500 or $600+ range would generally reign for the entire generation, especially one that was launched halfway through that generation's life cycle. Now Nvidia has already announced a reply with the 780 Ti, which will mean not one but TWO cards will surpass Titan at a fraction of its price before the generation goes EOL.

    Nvidia was clearly blind-sided by Hawaii and ultimately it will cost them customer loyalty, imo.
  • ZeDestructor - Thursday, October 24, 2013

    $1000 cards are fine, since the Titan is a cheap compute unit compared to the Quadro K6000 and the 690 is a dual-GPU card (Dual-GPU has always been in the $800+ range).

    What we should see is the 780 (Ti?) go down in price and match the R9-290x, much to the rejoicing of all!

    Nvidia got away with $650-750 on the 780 because they could, and THAT is why competition is important, and why I pay attention to AMD even if I have no reason to buy from them over Nvidia (driver support on Linux is a joke). Now they have to match. Much the same happens in the CPU segment.
  • chizow - Saturday, October 26, 2013

    For those that actually bought the Titan as a cheap compute card, sure Titan may have been a good buy, but I doubt most Titan buyers were buying it for compute. It was marketed as a gaming card with supercomputer guts and at the time, there was still much uncertainty whether or not Nvidia would release a GTX gaming card based on GK110.

    I think Nvidia preyed on these fears and took the opportunity to launch a $1K part, but I knew it was an unsustainable business model for them because it was predicated on the assumption that Nvidia would stay an entire ASIC ahead of AMD, able to match AMD's fastest ASIC (Tahiti) with their 2nd fastest (GK104). Clearly Hawaii has turned that idea on its head and Nvidia's premium product stack is crashing down in flames.

    Now, we will see at least 4 cards (290/290X, 780/780 Ti) that all come close to or exceed Titan performance at a fraction of the price, only 8 months after its launch. Short reign indeed.
  • TheJian - Friday, October 25, 2013

    The market dictates pricing. As they said, they sell every Titan immediately, so they could probably charge more. But that's because it has more value than you seem to understand. It is a PRO CARD at its core. Are you unaware of what a TESLA is for $2500? It's the same freaking card with 1 more SMX and driver support. $1000 is GENEROUS whether you like it or not. Gamers with PRO intentions laughed when they saw the $1000 price and have been buying them like mad ever since. No parade has been crashed. They will continue to do this pricing model for the foreseeable future as they have proven there is a market for high-end gamers with a PRO APP desire on top. The first run was 100,000 and sold out in days. By contrast the Asus ROG Ares 2 had a 1000 unit first run and didn't sell out like that. At $1500 it really was a ripoff with no PRO side.

    I think they'll merely need another SMX turned on and 50-100MHz for the next $1000 version, which likely comes before xmas :) The PRO perf is what is valued here over a regular card. Your short-lived statement makes no sense. It's been 8 months, a rather long life in GPUs when you haven't beaten the 8 month old card in much (I debunked the 4K crap already, and pointed to a dozen other games where Titan wins at every res). You won't fire up Blender, Premiere, PS CS etc. and smoke a Titan with a 290X either...LOL. You'll find out what the other $450 is for at that point.
  • chizow - Saturday, October 26, 2013

    Yes and as soon as they released the 780, the market corrected itself and Titans were no longer sold out anywhere, clearly a shift indicating the price of the 780 was really what the market was willing to bear.

    Also, there are more differences with their Tesla counterparts than just 1 SMX, Titan lacks ECC support which makes it an unlikely candidate for serious compute projects. Titan is good for hobby compute, anything serious business or research related is going to spend the extra for Tesla and ECC.

    And no, 8 months is not a long time at the top; look at the reigns of previous high-end parts and you will see they are generally longer than this. Even the 580 that preceded it held sway for 14 months before Tahiti took over its spot. Time at the top is just one part though; the amount by which Titan devalued is the bigger concern. When the 780 launched 3 months after Titan, you could maybe sell a Titan for $800. Now that Hawaii has launched, you could maybe sell it for $700? It's only going to keep going down -- what do you think it will sell for once the 780 Ti beats it outright for $650 or less?
  • Sandcat - Thursday, October 24, 2013

    I noticed your comments on the Tahiti pricing fiasco 2 years ago and generally skip through the comment section to find yours because they're top notch. Exactly what I was thinking with the $550 price point, finally a top-tier card at the right price for 28nm. Long live sanity.
  • chizow - Saturday, October 26, 2013

    Thanks! Glad you appreciated the comments, I figured this business model and pricing for Nvidia would be unsustainable, but I thought it wouldn't fall apart until we saw 20nm Maxwell/Pirate Islands parts in 2014. Hawaii definitely accelerated the downfall of Titan and Nvidia's $1K eagle's nest.
