A Brief History of Multi-GPU with Dissimilar GPUs

Before we dive into our results, let’s talk briefly about the history of efforts to render games with multiple, dissimilar GPUs. After the development of PCI Express brought about the (re)emergence of NVIDIA’s SLI and AMD’s CrossFire, both companies eventually standardized their multi-GPU rendering efforts across the same basic technology. Using alternate frame rendering (AFR), NVIDIA and AMD would have the GPUs in a multi-GPU setup each render a separate frame. With the drivers handing off frames to each GPU in an alternating manner, AFR was the most direct and most compatible way to offer multi-GPU rendering as it didn’t significantly disrupt the traditional game rendering paradigms. There would simply be two (or more) GPUs rendering frames instead of one, with much of the work abstracted by the GPU drivers.

Using AFR allowed for relatively rapid multi-GPU support, but it came with tradeoffs as well. Alternating frames meant that inter-frame dependencies needed to be tracked and handled, which in turn meant that driver developers had to add support for games on a game-by-game basis. Furthermore, the nature of distributing the work meant that care needed to be taken to ensure each GPU rendered at an even pace so that the resulting in-game motion was smooth, a problem AMD had to face head-on in 2013. Finally, because AFR had each GPU rendering whole frames, it worked best when GPUs were as identical as possible in performance; a performance gap would at best require the faster card to spend some time waiting on the slower card, and at worst exacerbate the aforementioned frame pacing issues. As a result, NVIDIA only allows identical cards to be paired up in SLI, and AMD only allows a slightly wider variance (typically cards using the same GPU).

In 2010 LucidLogix set out to do one better, leveraging their graphics expertise to develop their Hydra technology. By using a combination of hardware and software, Hydra could intercept DirectX and OpenGL calls and redistribute them to split up rendering over multiple, and for the first time, dissimilar GPUs. Long a dream within the PC gaming space (and the subject of a few jokes), the possibilities for using dissimilar GPUs via Hydra were immense – pairing up GPUs not only from different vendors, but of differing performance as well – resolving some of AFR’s shortcomings while allowing gamers to do things such as reuse old video cards and still receive a performance benefit.

However, in the long run the Hydra technology failed to catch on. The process of splitting up API calls, having multiple GPUs render them, and compositing the results back into a single frame proved to be harder than LucidLogix expected, and as a result Hydra’s compatibility was poor and performance gains were limited. Coupled with the cost of the hardware and licensing, and the fact that Hydra boards were never SLI certified (preventing typical NVIDIA SLI operation), this meant that Hydra made a quick exit from motherboards.

In the end what LucidLogix was attempting was a valiant effort, but in retrospect one that was misguided. Working at the back-end of the rendering chain and manipulating API calls can work, but it is a massive amount of effort and it has hardware developers aiming at a moving target, requiring constant effort to keep up with new games. AMD and NVIDIA’s driver-level optimizations don’t fare too much better in this respect; there are vendor-specific shortcuts such as NVAPI that simplify this some, but even AMD and NVIDIA have to work to keep up with new games. This is why they need to issue driver updates and profile updates so frequently in order to get the best performance out of CrossFire and SLI.

But what if there was a better way to manage multiple GPUs and assign work to them? Would it be possible to do a better job working from the front-end of the rendering chain? This is something DirectX 12 sets out to answer with its multi-adapter modes.

DirectX 12 Multi-GPU

In DirectX 12 there are technically three different modes for multi-adapter operation. The simplest of these modes is what Microsoft calls Implicit Multi-Adapter. Implicit Multi-Adapter is essentially the lowest rung of multi-adapter operation, intended to allow developers to use the same AFR-friendly techniques as they did with DirectX 11 and before. This model retains the same limited ability for game developers to control the multi-GPU rendering process, which limits the amount of power they have, but also limits their responsibilities. Consequently, just as with DirectX 11 multi-GPU, in implicit mode much of the work is offloaded to the drivers (and practically speaking, to AMD and NVIDIA).

While the implicit model has the most limitations, the lack of developer responsibilities also means it’s the easiest to implement. In an era where multi-platform games are common, even after developers make the switch to DirectX 12 they may not want to undertake the effort to support Explicit Multi-Adapter, as the number of PC owners with multiple high-powered GPUs is a fraction of the total PC gaming market. And in that case, with help from driver developers, implicit mode is the fastest path towards supporting multiple GPUs.

What’s truly new to DirectX 12 then are its Explicit Multi-Adapter (EMA) modes. As implied by the name, these modes require game developers to explicitly program for multi-GPU operation, specifying how work will be assigned to each GPU, how memory will be allocated, how the GPUs will communicate, etc. By giving developers explicit control over the process, they have the best chance to extract the most multi-GPU performance out of a system, as they have near-absolute control over both the API and the game, allowing them to work with more control and more information than any of the previously discussed multi-GPU methods. The cost of using explicit mode is resources: with great power comes great responsibility, and unlike implicit mode, game developers must put in a fair bit of work to make explicit mode work, and more work yet to make it work well.

Within EMA there are two different ways to address GPUs: linked mode and unlinked mode. Unlinked mode is essentially the baseline mode for EMA, and offers the bulk of EMA’s features. Linked mode, on the other hand, builds on unlinked mode by offering yet more functionality, in exchange for much tighter restrictions on what adapters can be used.

The ultimate purpose of unlinked mode is to allow developers to take full advantage of all DirectX 12 capable GPU resources in a system, at least so long as they are willing to do all of the work required to manage those resources. Unlinked mode, as opposed to linked mode and implicit multi-adapter, can work with DX12 GPUs from any vendor, providing just enough abstraction to allow GPUs to exchange data but putting everything else in the developer’s hands. Depending on what developers want to do, unlinked mode can be used for anything from pairing up two dGPUs to pairing up a dGPU with an iGPU, with the GPUs being a blank slate of sorts for developers to use as they see fit for whatever algorithms and technologies they opt to use.
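To make that concrete, the sketch below shows what the first step of an unlinked-mode implementation might look like: enumerating every DirectX 12 capable adapter through DXGI and creating an independent device on each one. This is a minimal illustration rather than code from any particular engine; the function name is ours, software adapters are skipped, and error handling is omitted.

```cpp
// Minimal sketch: enumerate all DirectX 12 capable adapters and create an
// independent ID3D12Device for each one, the starting point for unlinked
// explicit multi-adapter. Link against d3d12.lib and dxgi.lib.
#include <windows.h>
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

std::vector<ComPtr<ID3D12Device>> EnumerateDx12Devices()
{
    std::vector<ComPtr<ID3D12Device>> devices;

    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);

        // Skip the software rasterizer (WARP); we only want physical GPUs here.
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
            continue;

        // In unlinked mode each adapter becomes its own device, with its own
        // memory, queues, and command lists; nothing is pooled on our behalf.
        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device))))
        {
            devices.push_back(device);
        }
    }
    return devices;
}
```

From that point on, every queue, allocation, and fence exists per device, which is exactly the blank slate described above.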

As the base mode for DirectX 12 multi-GPU, unlinked mode presents each GPU as its own device, with its own memory, its own command processor, and more, accurately representing the layout of the physical hardware. What DirectX 12’s EMA brings to the table that’s new is that it allows developers to exchange data between GPUs, going beyond just finished, rendered images and potentially exchanging partially rendered frames, buffers, and other forms of data. It’s the ability to exchange multiple data types that gives EMA its power and its flexibility, as without it, it wouldn’t be possible to implement much more than AFR. EMA is the potential for multiple GPUs to work together, be they similar or disparate; no more and no less.
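The data exchange itself is built on shared heaps. The following is a rough sketch of the mechanism, modeled on the cross-adapter sharing pattern in Microsoft’s public DX12 samples; the function and variable names are ours, the two devices are assumed to already exist, and error handling is omitted. One device creates a heap flagged for cross-adapter sharing, the second device opens it through a shared handle, and both place a buffer in it that either GPU can then copy into or out of.

```cpp
// Hypothetical sketch: share a buffer between two independent D3D12 devices.
// deviceA and deviceB are assumed to already exist; sizeInBytes should be
// 64KB-aligned. Link against d3d12.lib.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void CreateCrossAdapterBuffer(ID3D12Device* deviceA, ID3D12Device* deviceB,
                              UINT64 sizeInBytes,
                              ComPtr<ID3D12Resource>& bufferOnA,
                              ComPtr<ID3D12Resource>& bufferOnB)
{
    // A heap on device A that is explicitly marked as shareable across adapters.
    D3D12_HEAP_DESC heapDesc = {};
    heapDesc.SizeInBytes = sizeInBytes;
    heapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;
    heapDesc.Flags = D3D12_HEAP_FLAG_SHARED | D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER;

    ComPtr<ID3D12Heap> heapA;
    deviceA->CreateHeap(&heapDesc, IID_PPV_ARGS(&heapA));

    // Hand the heap to device B through an NT shared handle.
    HANDLE sharedHandle = nullptr;
    deviceA->CreateSharedHandle(heapA.Get(), nullptr, GENERIC_ALL, nullptr, &sharedHandle);

    ComPtr<ID3D12Heap> heapB;
    deviceB->OpenSharedHandle(sharedHandle, IID_PPV_ARGS(&heapB));
    CloseHandle(sharedHandle);

    // A buffer flagged for cross-adapter use, placed in the shared heap on both devices.
    D3D12_RESOURCE_DESC bufDesc = {};
    bufDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
    bufDesc.Width = sizeInBytes;
    bufDesc.Height = 1;
    bufDesc.DepthOrArraySize = 1;
    bufDesc.MipLevels = 1;
    bufDesc.Format = DXGI_FORMAT_UNKNOWN;
    bufDesc.SampleDesc.Count = 1;
    bufDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
    bufDesc.Flags = D3D12_RESOURCE_FLAG_ALLOW_CROSS_ADAPTER;

    // COMMON is used here as a neutral starting state; a copy state may be more
    // appropriate depending on how the buffer is consumed.
    deviceA->CreatePlacedResource(heapA.Get(), 0, &bufDesc,
                                  D3D12_RESOURCE_STATE_COMMON, nullptr,
                                  IID_PPV_ARGS(&bufferOnA));
    deviceB->CreatePlacedResource(heapB.Get(), 0, &bufDesc,
                                  D3D12_RESOURCE_STATE_COMMON, nullptr,
                                  IID_PPV_ARGS(&bufferOnB));
}
```

Whatever a developer layers on top of this, the copies still travel over PCI Express, which is where the bandwidth and latency caveats discussed next come into play.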

If this sounds very vague that’s because it is, and that in turn is because the explicit API outstrips what today’s hardware is capable of. Compared to on-board memory, any operations taking place over PCI Express are relatively slow and high latency. Some GPUs handle this better than others, but at the end of the day the PCIe bus is still a bottleneck, running at a fraction of the speed of local memory. That means that while GPUs can work together, they must do so intelligently, as we’re not yet at the point where GPUs can quickly transfer large amounts of data between each other.

Because EMA is a blank slate, it ultimately falls to developers to put it to good use; DirectX 12 just supplies the tools. Traditional AFR implementations are one such option, as is splitting up workloads in other fashions such as split-frame rendering (SFR), or even methods where a single GPU doesn’t render a complete frame (or a fraction of one) at all, with frames instead being passed off at different stages to different GPUs.

But practically speaking, a lot of the early focus on EMA development and promotion is on dGPU + iGPU, and this is because the vast majority of PCs with a dGPU also have an iGPU. Relative to even a $200 dGPU, an iGPU is going to offer a fraction of the performance, but it’s also a GPU resource that otherwise goes unused. Epic Games has been experimenting with using EMA to have iGPUs handle post-processing, as finished frames are relatively small (a 1080p60 stream of FP32 frames works out to only about 2GB/sec, a fraction of PCIe 3.0 x16’s bandwidth), post-processing is fairly lightweight in its resource requirements, and it typically has a predictable processing time.
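For reference, the arithmetic behind that 2GB/sec figure is simple; the short snippet below just spells it out, assuming a 16 byte per pixel (FP32 RGBA) frame buffer and roughly 15.75GB/sec of per-direction bandwidth on a PCIe 3.0 x16 link.

```cpp
// Back-of-the-envelope check of the 1080p60 FP32 frame stream figure.
#include <cstdio>

int main()
{
    const double pixels        = 1920.0 * 1080.0;  // 1080p
    const double bytesPerPixel = 4 * 4;            // RGBA, 4 bytes (FP32) per channel
    const double fps           = 60.0;

    const double frameBytes  = pixels * bytesPerPixel;   // ~33 MB per frame
    const double gbPerSecond = frameBytes * fps / 1e9;   // ~2 GB/sec

    // PCIe 3.0 x16 offers roughly 15.75 GB/sec in each direction.
    std::printf("Frame: %.1f MB, stream: %.2f GB/s (vs ~15.75 GB/s for PCIe 3.0 x16)\n",
                frameBytes / 1e6, gbPerSecond);
    return 0;
}
```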

Moving on, building on top of unlinked mode is EMA’s linked mode. Linked mode is by and large the equivalent of SLI/CrossFire for EMA, and is designed for systems where all GPUs being used are near-identical. Within linked mode all of the GPUs are pooled and presented to applications as a single GPU, just with multiple command processors and multiple memory pools due to the limits of the PCIe bus. Because linked mode is restricted to similar GPUs, developers gain even more power and control, as linked GPUs will be from the same vendor and use the same data formats at every step.
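At the API level this shows up as a single device with multiple nodes, addressed via node masks. The sketch below is again a simplified, hypothetical illustration rather than production code: it queries the node count and creates one direct command queue per physical GPU in the link.

```cpp
// Hypothetical sketch of linked-node setup: one device exposes multiple nodes,
// and per-node command queues are selected with a node mask. Assumes `device`
// was created on a linked adapter group with the appropriate driver support.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

std::vector<ComPtr<ID3D12CommandQueue>> CreatePerNodeQueues(ID3D12Device* device)
{
    std::vector<ComPtr<ID3D12CommandQueue>> queues;

    // 1 for a single GPU; 2 or more when GPUs are linked into one logical adapter.
    const UINT nodeCount = device->GetNodeCount();

    for (UINT node = 0; node < nodeCount; ++node)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;
        desc.NodeMask = 1u << node;   // one bit per physical GPU in the link

        ComPtr<ID3D12CommandQueue> queue;
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
        queues.push_back(queue);
    }
    return queues;
}
```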

Broadly speaking, linked mode will be both easier and harder for developers to use relative to unlinked mode. Unlike unlinked mode, there are certain assumptions that can be made about the hardware and what it’s capable of, and developers won’t need to juggle the complications of using GPUs from multiple vendors at once. On the other hand, this is the most powerful mode because of all of the options it presents to developers, with more complex rendering techniques likely to be necessary to extract the full performance benefit of linked mode.

Ultimately, one point that Microsoft and developers have continually reiterated in their talks is that explicit multi-adapter, like so many other low-level aspects of DirectX 12, is largely up to the developers to put to good use. The API provides a broad set of capabilities – tempered a bit by hardware limitations and how quickly GPUs can exchange data – but unlike DirectX 11 and implicit multi-adapter, it’s developers that define how GPUs should work together. So whether a game supports any kind of EMA operation, and whether this means combining multiple dGPUs from the same vendor, multiple dGPUs from different vendors, or a dGPU and an iGPU, is a question of software more than it is of hardware.

Comments

  • andrew_pz - Tuesday, October 27, 2015

    Radeon placed in 16x slot, GeForce installed to 4x slot only. WHY?
    It's a cheat!
  • silverblue - Tuesday, October 27, 2015

    There isn't a 4x slot on that board. To quote the specs...

    "- 4 x PCI Express 3.0 x16 slots (PCIE1/PCIE2/PCIE4/PCIE5: x16/8/16/0 mode or x16/8/8/8 mode)"

    Even if the GeForce was in an 8x slot, I really doubt it would've made a difference.
  • Ryan Smith - Wednesday, October 28, 2015

    Aye. And just to be clear here, both cards are in x16 slots (we're not using tri-8 mode).
  • brucek2 - Tuesday, October 27, 2015

    The vast majority of PCs, and 100% of consoles, are single GPU (or less). Therefore developers absolutely must ensure their game can run satisfactorily on one GPU, and have very little to gain from investing extra work in enabling multi GPU support.

    To me this suggests that moving the burden of enabling multi-GPU support from hardware sellers (who can benefit from selling more cards) to game publishers (who basically have no real way to benefit at all) means that the only sane decision is to not invest any additional development or testing in multi-GPU support, and that therefore multi-GPU support will effectively be dead in the DX12 world.

    What am I missing?
  • willgart - Tuesday, October 27, 2015

    Well... you no longer need to swap your card for a bigger one; you can just upgrade your PC with a low- or mid-range card to get a good boost, and you keep your old one. From a long-term point of view we win, not the hardware resellers.
    Imagine today you have a GTX 970; in 4 years you can get a GTX 2970 and have a stronger system than a single 2980 card... especially the FPS / $ is very interesting.

    And when you compare the HD7970+GTX680 setup, which maybe costs $100 today(?), to a single GTX 980 which costs nearly $700...
  • brucek2 - Tuesday, October 27, 2015

    I understand the benefit to the user. What I'm worried is missing is the incentive for the game developer. For them the new arrangement sounds like nothing but extra cost and likely extra technical support hassle to make multi-GPU work. Why would they bother? (To use your example of a user with 7970+680, the 680 alone would at least meet the console-equivalent setting, so they'd probably just tell you to use that.)
  • prtskg - Wednesday, October 28, 2015

    It would make their game run better and thus improve their brand name.
  • brucek2 - Wednesday, October 28, 2015

    Making it run "better" implies it runs "worse" for the 95%+ of PC users (and 100% of console users) who do not have multi-GPU. That's a non-starter. The publisher has to make it a good experience for the overwhelmingly common case of single gpu or they're not going to be in business for very long. Once they've done that, what they are left with is the option to spend more of their own dollars so that a very tiny fraction of users can play the same game at higher graphics settings. Hard to see how that's going to improve their brand name more than virtually anything else they'd choose to spend that money on, and certainly not for the vast majority of users who will never see or know about it.
  • BrokenCrayons - Wednesday, October 28, 2015

    You're not missing anything at all. Multi-GPU systems, at least in the case of there being more than one discrete GPU, represent a small number of halo desktop computers. Desktops, gaming desktops in particular, are already a shrinking market and even the large majority of such systems contain only a single graphics card. This means there's minimal incentive for a developer of a game to bother soaking up the additional cost of adding support for multi GPU systems. As developers are already cost-sensitive and working in a highly competitive business landscape, it seems highly unlikely that they'll be willing to invest the human resources in the additional code or soak up the risks associated with bugs and/or poor performance. In essence, DX12 seems poised to end multi GPU gaming UNLESS the dGPU + iGPU market is large enough in modern computers AND the performance benefits realized are worth the cost to the developers to write code for it. There are, after all, a lot more computers (even laptops and a very limited number of tablets) that contain an Intel graphics processor and an NV or more rarely an AMD dGPU. Though even then, I'd hazard a guess to say that the performance improvement is minimal and not worth the trouble. Plus most computers sold contain only whatever Intel happens to throw onto the CPU die so even that scenario is of limited benefit in a world of mostly integrated graphics processors.
  • mayankleoboy1 - Wednesday, October 28, 2015

    Any idea what LucidLogix are doing these days?
    Last I remember, they had released some software solutions which reduced battery drain on Samsung devices (by dynamically decreasing the game rendering quality).
