Yesterday, after an all-day benchmarking session on Wednesday, we published our initial performance results for Civilization: Beyond Earth. As is often the case with limited testing time, we ran into a problem and were unable to find a solution before publishing. In short, while there was a lot of talk about how developer Firaxis had worked to improve latency using a custom Split-Frame Rendering (SFR) approach with Mantle on CrossFire configurations, we were unable to produce anything that corroborated that story. Emails were sent, but it took half a day before we finally had the answer: enabling SFR actually requires manual editing of the configuration file. Oops.

We could ask why manual editing of the INI file is even necessary, and there are other user interface items that would be nice to address as well, as I noted in the conclusion of the original Benchmarked article. But that's all water under the bridge at this point, so let me issue a public apology for not having the complete information yesterday.

I've updated the text of the original article (and added a discussion of minimum frame rates, in case you missed that), but since many people have probably read the article already and are unlikely to revisit it, I wanted to post a separate Pipeline to update everyone on the true performance of CrossFire with Mantle and SFR. Before we get to that, let me also take this opportunity to provide some additional information from Firaxis and AMD on why SFR matters. Firaxis has a couple of blog posts on the subject (including one highlighting the benefits of Mantle with multiple GPUs), and here's the direct quote from AMD's marketing folks:

With a traditional graphics API, multi-GPU (MGPU) arrays like AMD CrossFire are typically utilized with a rendering method called "alternate-frame rendering" (AFR). AFR renders odd frames on the first GPU, and even frames on the second GPU. Parallelizing a game’s workload across two GPUs working in tandem has obvious performance benefits.

As AFR requires frames to be rendered in advance, this approach can occasionally suffer from some issues:

  • Large queue depths can reduce the responsiveness of the user’s mouse input
  • The game’s design might not accommodate a queue sufficient for good MGPU scaling
  • Predicted frames in the queue may not be useful to the current state of the user’s movement or camera

Thankfully, AFR is not the only approach to multi-GPU. Mantle empowers game developers with full control of a multi-GPU array and the ability to create or implement unique MGPU solutions that fit the needs of the game engine. In Civilization: Beyond Earth, Firaxis designed a "split-frame rendering" (SFR) subsystem. SFR divides each frame of a scene into proportional sections, and assigns a rendering slice to each GPU in AMD CrossFire configuration. The "master" GPU quickly receives the work of each GPU and composites the final scene for the user to see on his or her monitor.

If you don’t see 70-100% GPU scaling, that is working as intended, according to Firaxis. Civilization: Beyond Earth’s GPU-oriented workloads are not as demanding as other recent PC titles. However, Beyond Earth’s design generates a considerable amount of work in the producer thread. The producer thread tracks API calls from the game and lines them up, through the CPU, for the GPU’s consumer thread to do graphics work. This producer thread vs. consumer thread workload balance is what establishes Civilization as a CPU-sensitive title (vs. a GPU-sensitive one).

Because the game emphasizes CPU performance, the rendering workloads may not fully utilize the capacity of a high-end GPU. In essence, there is no work leftover for the second GPU. However, in cases where the GPU workload is high and a frame might take a while to render (affecting user input latency), the decision to use SFR cuts input latency in half, because there is no long AFR queue to work through. The queue is essentially one frame, each GPU handling a half. This will keep the game smooth and responsive, emphasizing playability, vs. raw frame rates.

Let me provide an example. Let’s say a frame takes 60 milliseconds to render, and you have an AFR queue depth of two frames. That means the user will experience 120ms of lag between the time they move the map and that movement is reflected on-screen. Firaxis’ decision to use SFR halves the queue down to one frame, reducing the input latency to 60ms. And because each GPU is working on half the frame, the queue is reduced by half again to just 30ms.

In this way the game will feel very smooth and responsive, because raw frame-rate scaling was not the goal of this title. Smooth, playable performance was the goal. This is one of the unique approaches to MGPU that AMD has been extolling in the era of Mantle and other similar APIs.
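To make the arithmetic in AMD's example explicit, here is a minimal Python sketch of the idealized model the quote implies: lag equals queue depth times frame time, divided by the number of GPUs sharing each frame. The function name and parameters are my own, and the model assumes a perfectly even split with no compositing cost, which real hardware won't deliver.

```python
def input_latency_ms(frame_time_ms: float, queue_depth: int, gpus_per_frame: int = 1) -> float:
    """Idealized input lag: queued frames times the time to render one frame.

    gpus_per_frame > 1 models SFR, assuming the frame divides perfectly
    evenly and compositing is free (both are simplifying assumptions).
    """
    return queue_depth * frame_time_ms / gpus_per_frame


# Numbers taken from AMD's example: a 60 ms frame.
afr = input_latency_ms(60, queue_depth=2)                      # AFR, two frames queued
sfr_queue_only = input_latency_ms(60, queue_depth=1)           # SFR, one frame queued
sfr_split = input_latency_ms(60, queue_depth=1, gpus_per_frame=2)  # each GPU renders half

print(afr, sfr_queue_only, sfr_split)  # 120.0 60.0 30.0
```

The three values match the 120ms, 60ms, and 30ms steps in the quote; anything beyond that, such as uneven slices or CPU-bound frames, is outside this toy model.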

When I first read the above, my initial reaction was: "This is awesome!" I've always been a bit leery of AFR and the increase in input latency that it can create, so using SFR to avoid the issue is an excellent idea. Unfortunately, it requires more work and testing to get it working right, so most games simply stick with AFR. Ironically, while reducing input latency is never a bad thing, it honestly doesn't matter nearly as much in a turn-based strategy game like Civilization: Beyond Earth. What we'd really love to see is use of techniques like SFR to reduce input latency on games from genres where input latency is a bigger deal – first-person games like Crysis, Battlefield, Far Cry, etc. and third-person games like Batman, Shadow of Mordor, Assassin's Creed, etc. being prime examples. With that said, let's revisit the subject of Civilization: Beyond Earth and CrossFire performance, with and without Mantle:

Civilization: Beyond Earth 4K Performance

Civilization: Beyond Earth QHD Performance

Civilization: Beyond Earth 1080p Performance

Civilization: Beyond Earth 1080p High Performance

Our graphing engine doesn't allow for sorting on multiple criteria, otherwise I might try sorting by average + minimum frame rate. Regardless, you can see that across the range of options the CrossFire Mantle SFR support is now doing what we'd expect and improving frame rates. But it's not just about improving frame rates; as the above commentary notes, improving input latency is also important. We aren't really equipped to test for input latency (that would require a very high speed camera as well as additional time filming and measuring input latency), but the minimum frame rates definitely improve as well.
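For what it's worth, the combined sort I have in mind is simple to express outside the graphing engine. The Python sketch below ranks configurations by average plus minimum frame rate; the configuration names and numbers are placeholders made up for illustration, not results from our testing.

```python
# Placeholder data for illustration only; these are NOT benchmark results.
results = [
    {"config": "Config A", "avg_fps": 90.0, "min_fps": 45.0},
    {"config": "Config B", "avg_fps": 85.0, "min_fps": 65.0},
    {"config": "Config C", "avg_fps": 60.0, "min_fps": 50.0},
]

# Rank by the sum of average and minimum FPS, highest first.
ranked = sorted(results, key=lambda r: r["avg_fps"] + r["min_fps"], reverse=True)

for r in ranked:
    print(r["config"], r["avg_fps"] + r["min_fps"])
```

With these placeholder values, a configuration with a slightly lower average but a much better minimum moves ahead of one that only wins on average, which is the point of sorting on both.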

What's interesting is that CrossFire without Mantle (which uses AFR) has higher average FPS in many cases, but the minimum frame rates are worse than with a single GPU. The two images above show why this isn't necessarily a good thing. We haven't tested SLI performance, but I have at least one source that says SLI performance is similar to CrossFire AFR: higher average FPS but lower minimum FPS. It's entirely possible that driver updates will improve the situation with D3D, but for now CrossFire with Mantle SFR definitely scores a win over Direct3D AFR as it provides for a smoother gaming experience.
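A small sketch shows how a configuration can post a higher average FPS and a lower minimum at the same time. The frame times below are invented for illustration, and I'm assuming "minimum FPS" means the rate implied by the single slowest frame, which may differ from how a given benchmark tool reports it.

```python
def fps_stats(frame_times_ms):
    """Return (average FPS, minimum FPS) for a list of per-frame times in ms."""
    avg_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
    min_fps = 1000 / max(frame_times_ms)  # rate implied by the slowest frame
    return avg_fps, min_fps


# Made-up frame times in milliseconds, NOT measurements.
steady = [16.0] * 8           # evenly paced frames
erratic = [6.0, 22.0] * 4     # fast/slow alternation, as with poor frame pacing

steady_avg, steady_min = fps_stats(steady)
erratic_avg, erratic_min = fps_stats(erratic)

print(round(steady_avg, 1), round(steady_min, 1))
print(round(erratic_avg, 1), round(erratic_min, 1))
```

The erratic series finishes its eight frames sooner, so its average is higher, but its slowest frames are slower than anything in the steady series, so its minimum is lower.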

Let's look at the above charts in a different format before we continue this discussion.

We can see that even with just two GPUs splitting the workload, our CPU has apparently become a bottleneck with the R9 290X. Average frame rates still show an increase going from 4K Ultra to QHD Ultra to 1080p Ultra to 1080p High, but when we look at minimum FPS we've apparently run straight into a wall. For the R9 290X with Mantle, CrossFire effectively tops out at a minimum of roughly 65FPS, while a single GPU hits a lower minimum of around 50FPS without Mantle, and regular CrossFire on the 290X (i.e. without Mantle) has a minimum of 45FPS. Again, there are likely some optimizations that could be made in both drivers and the game to improve the situation, but it wouldn't be too surprising to find that Mantle and SFR with three or four GPUs don't show much of an increase over two GPUs.

I do have to wonder how applicable the above results are to other games. Last I checked, Mantle CrossFire rendering on Sniper Elite 3 was basically not working, but if other software developers can use Mantle to effectively implement SFR instead of AFR that would be nice to see. But didn't we have SFR way back in the early days of multiple GPUs? Of course we did! 3dfx initially called their solution SLI – Scan Line Interleave – and had each GPU rendering every other line. That approach had problems with things like anti-aliasing, but there are many other ways to divide the workload between GPUs, and both AMD (formerly ATI) and NVIDIA have done variations on SFR in the past.

The problem is that when DirectX 9 rolled around and we started getting programmable shaders and deferred rendering, at some point synchronization issues cropped up and basically developers were locked out of doing creative things like SFR (or geometry processing on one GPU and rendering on another). The only thing you can do with multiple GPUs using Direct3D right now is AFR. That may change with Direct3D 12, but we're still a ways out from that release. Basically, AFR is the easiest approach to implement, but it has various drawbacks even when it does work properly.

Of course there are other potential pitfalls with doing alternative workload splitting like SFR. They can require more work from the CPU, and as you add GPUs the CPU already creates a potential bottleneck. AMD informed us that the engine in Civilization: Beyond Earth is actually extremely scalable with CPU cores, so while we're testing with an overclocked i7-4770K, AMD said they even saw a 20% improvement in performance (with Mantle) going from hex-core Ivy Bridge-E to octal-core Haswell-E with R9 290X CrossFire. There are apparently other cases where certain hardware configurations and game settings can result in an even greater improvement in performance thanks to Mantle (e.g. the 50% increase in minimum frame rates on the R9 290X at our 1080p High settings).

The bottom line is that if you have an AMD GPU, games like Civilization: Beyond Earth can certainly benefit. Maybe Direct3D 12 will bring similar options to developers next year, but in the meantime, congrats to both AMD and Firaxis for shining the light on the latency subject once again. NVIDIA made some waves with similar discussions when they released FCAT last year, but the topic of latency and jitters is definitely important – and don't even get me started on silliness like capping frame rates at 30FPS by default (cough, The Evil Within, cough).




  • MrSpadge - Friday, October 24, 2014

    Ay Caramba!

    When reading the original article yesterday I thought: "Oh dear, this whole Mantle business seems like a huge waste of development time." The current results change this considerably to: "Wow, they're doing the right thing!"

    Improving minimum frame rates is what really counts towards making a game / application feel smooth. From my point of view the GPU turbo modes should also be used to equalize maximum and minimum frame rates: there's no point in rendering at super-high frame rates, especially with Free-/G-Sync. Better throttle the GPU a bit at light to moderate loads to have some thermal budget to spare for short bursts of high load.
  • djscrew - Sunday, October 26, 2014

    +1
  • eddman - Friday, October 24, 2014

    That Directx CF minimum FPS looks suspicious to me. You sure it's not a game bug?
  • JarredWalton - Friday, October 24, 2014

    It's just a frame rate pacing issue -- I've added two images to show what's going on. Basically the AFR frame times are all over the place compared to the Mantle and single GPU frame times.
  • eanazag - Friday, October 24, 2014

    This ends up being akin to the SSD consistency performance. Hence, there is value in Mantle. What would really be cool would be to see what the CPU was up to at the same time. The sad part is Jarred's electricity bill probably spiked this month from testing the 290X in CF.
  • whyso - Friday, October 24, 2014

    You can't compare Mantle vs. DX without looking at Nvidia as well. Where are the nvidia frame time charts?
  • JarredWalton - Friday, October 24, 2014

    I don't have an SLI configuration yet, so I can't test it. Single GPU frame times are fine, but this was specifically looking at CrossFire D3D vs. Mantle. (FWIW, a colleague at another web site is reporting rather jittery frame times on SLI -- hopefully not as bad as D3D CF, though.)
  • TheJian - Monday, October 27, 2014

    The SLI isn't that important here, heck just throw in a SINGLE 980/970. Nobody is using either SLI or CF. I say nobody when it is less than 3% of the public (according to Steam's surveys). Mind you, that is a percent of that 3% that runs above 1920x1200. So it's more important (to 97% of us anyway) to at least show the single 980/970 in the charts ALWAYS.

    As others have said, you need to show the other side (SLI or not, some may not have read the other story). IF NV+DX11 is beating them already who cares about this then? If not, show that. That is why they need to be in there, even if just single cards, to answer the question of DX11 on NV vs. everything you're showing from AMD.

    You already have the results from there, just add them. From that article the bottom 1% doesn't mean much and you saw no problems in the game. Comical that you say the bang for buck winner is 290x because it can be had for under $400. You can get the 970 for $330. Bang for buck winner should be the one who wins in many games not just this one, and the 970 topples 290x in a lot of stuff, especially where most of us play (below 1920x1200, only 3% above that). In many cases 970 beats 290x by quite a lot, but even if 290x won much of it by 10% you'd have a hard time claiming it was the bang for buck champ (costing 10% more it should be winning by 10% in EVERY game, Mantle or not). This isn't even counting the OCing that can be done on 970/980 or the noise (as even you pointed out in the link above) that 280/290 pound out.

    And FWIW, very few of us run SLI or CF as I already noted, as it's a percentage of the 3% that run above 1920x1200...LOL. Concentrating on SLI/CF is writing an article for an audience the size of a percent of 3% of the public (so like 1.5% of us overall?) ;) Judging the amount of games using Mantle that show AMD victories, I'll take a 970 to go please...ROFL. Then again, I'll be waiting for the 20nm versions which should make an already power sipping Maxwell that much better, but you get the point. That AMD portal costs you guys a lot of objective journalism IMHO.
  • Navvie - Monday, October 27, 2014

    This is comment of the year.
  • ZeDestructor - Monday, October 27, 2014

    I'm waiting for big Maxwell on 20nm, at which point I'll grab a pair for 5760x1200 gaming :)

    /me is in the 0.33% ^_^
