Yesterday, after an all-day benchmarking session on Wednesday, we published our initial performance results for Civilization: Beyond Earth. As is often the case with limited testing, we ran into a problem and were unable to find a solution at the time. In short, while there was a lot of talk about how developer Firaxis had put effort into improving latency using a custom Split-Frame Rendering (SFR) approach with Mantle on CrossFire configurations, we were unable to produce anything that corroborated that story. Emails were sent, but it took half a day before we finally had the answer: enabling SFR actually requires manually editing the configuration file. Oops.

We could ask why manually editing the INI file is even necessary, and there are other user interface items that would be nice to address as well, as I noted in the conclusion of the original Benchmarked article. But that's all water under the bridge at this point, so let me issue a public apology for not having the complete information yesterday.

I've updated the text of the original article (and added a discussion of minimum frame rates in case you missed that), but since many people have likely already read the article and are unlikely to revisit the subject, I wanted to post a separate Pipeline to update everyone on the true performance of CrossFire with Mantle and SFR. Before we get to that, let me also take this opportunity to pass along some additional information from Firaxis and AMD on why SFR matters. Firaxis has a couple of blog posts on the subject (including one highlighting the benefits of Mantle with multiple GPUs), and here's the direct quote from AMD's marketing folks:

With a traditional graphics API, multi-GPU (MGPU) arrays like AMD CrossFire are typically utilized with a rendering method called "alternate-frame rendering" (AFR). AFR renders odd frames on the first GPU, and even frames on the second GPU. Parallelizing a game’s workload across two GPUs working in tandem has obvious performance benefits.

As AFR requires frames to be rendered in advance, this approach can occasionally suffer from some issues:

  • Large queue depths can reduce the responsiveness of the user’s mouse input
  • The game’s design might not accommodate a queue sufficient for good MGPU scaling
  • Predicted frames in the queue may not be useful to the current state of the user’s movement or camera

Thankfully, AFR is not the only approach to multi-GPU. Mantle empowers game developers with full control of a multi-GPU array and the ability to create or implement unique MGPU solutions that fit the needs of the game engine. In Civilization: Beyond Earth, Firaxis designed a "split-frame rendering" (SFR) subsystem. SFR divides each frame of a scene into proportional sections, and assigns a rendering slice to each GPU in AMD CrossFire configuration. The "master" GPU quickly receives the work of each GPU and composites the final scene for the user to see on his or her monitor.

If you don’t see 70-100% GPU scaling, that is working as intended, according to Firaxis. Civilization: Beyond Earth’s GPU-oriented workloads are not as demanding as other recent PC titles. However, Beyond Earth’s design generates a considerable amount of work in the producer thread. The producer thread tracks API calls from the game and lines them up, through the CPU, for the GPU’s consumer thread to do graphics work. This producer thread vs. consumer thread workload balance is what establishes Civilization as a CPU-sensitive title (vs. a GPU-sensitive one).

Because the game emphasizes CPU performance, the rendering workloads may not fully utilize the capacity of a high-end GPU. In essence, there is no work leftover for the second GPU. However, in cases where the GPU workload is high and a frame might take a while to render (affecting user input latency), the decision to use SFR cuts input latency in half, because there is no long AFR queue to work through. The queue is essentially one frame, each GPU handling a half. This will keep the game smooth and responsive, emphasizing playability, vs. raw frame rates.

Let me provide an example. Let’s say a frame takes 60 milliseconds to render, and you have an AFR queue depth of two frames. That means the user will experience 120ms of lag between the time they move the map and that movement is reflected on-screen. Firaxis’ decision to use SFR halves the queue down to one frame, reducing the input latency to 60ms. And because each GPU is working on half the frame, the queue is reduced by half again to just 30ms.

In this way the game will feel very smooth and responsive, because raw frame-rate scaling was not the goal of this title. Smooth, playable performance was the goal. This is one of the unique approaches to MGPU that AMD has been extolling in the era of Mantle and other similar APIs.
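AMD's latency example boils down to simple arithmetic. The Python sketch below reproduces it under the same simplified assumptions the quote makes: lag equals frame time multiplied by queue depth, and SFR splits a single queued frame evenly across the GPUs with no compositing or transfer overhead. The function names are hypothetical, and real input latency also includes CPU, driver, and display time that this model ignores.

```python
def afr_latency_ms(frame_time_ms, queue_depth):
    """AMD's simplified AFR model: every queued frame must finish
    before the user's new input is reflected on-screen."""
    return frame_time_ms * queue_depth


def sfr_latency_ms(frame_time_ms, num_gpus):
    """AMD's simplified SFR model: a one-frame queue whose work is split
    evenly across the GPUs (compositing cost is assumed to be zero)."""
    queue_depth = 1
    return frame_time_ms * queue_depth / num_gpus


frame_ms = 60.0  # AMD's example: one frame takes 60 ms to render
print(f"AFR, two-frame queue: {afr_latency_ms(frame_ms, 2):.0f} ms")
print(f"SFR, two GPUs:        {sfr_latency_ms(frame_ms, 2):.0f} ms")
```

With AMD's inputs this prints 120 ms for AFR and 30 ms for SFR; any real compositing overhead or uneven split would push the SFR figure back up.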

When I first read the above, my initial reaction was: "This is awesome!" I've always been a bit leery of AFR and the increase in input latency that it can create, so using SFR to avoid the issue is an excellent idea. Unfortunately, SFR requires more work and testing to get right, so most games simply stick with AFR. Ironically, while reducing input latency is never a bad thing, it honestly doesn't matter nearly as much in a turn-based strategy game like Civilization: Beyond Earth. What we'd really love to see is the use of techniques like SFR to reduce input latency in genres where it is a bigger deal: first-person games like Crysis, Battlefield, and Far Cry, and third-person games like Batman, Shadow of Mordor, and Assassin's Creed are prime examples. With that said, let's revisit the subject of Civilization: Beyond Earth and CrossFire performance, with and without Mantle:

Civilization: Beyond Earth 4K Performance

Civilization: Beyond Earth QHD Performance

Civilization: Beyond Earth 1080p Performance

Civilization: Beyond Earth 1080p High Performance

Our graphing engine doesn't allow sorting on multiple criteria; otherwise I might try sorting by average + minimum frame rate. Regardless, you can see that across the range of settings, CrossFire with Mantle SFR is now doing what we'd expect and improving frame rates. But it's not just about improving frame rates; as the above commentary notes, reducing input latency is also important. We aren't really equipped to test input latency (that would require a very high-speed camera as well as additional time filming and measuring), but the minimum frame rates definitely improve as well.

What's interesting is that CrossFire without Mantle (which uses AFR) has higher average FPS in many cases, but its minimum frame rates are worse than with a single GPU. The two images above show why this isn't necessarily a good thing. We haven't tested SLI performance, but I have at least one source that says SLI performance is similar to CrossFire AFR: higher average FPS but lower minimum FPS. It's entirely possible that driver updates will improve the situation with D3D, but for now CrossFire with Mantle SFR definitely scores a win over Direct3D AFR, as it provides a smoother gaming experience.
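To see how a higher average can coexist with a worse minimum, here's a minimal Python sketch using made-up frame times (not our benchmark data). It treats "minimum FPS" as the rate implied by the single slowest frame, which is only one possible definition; many tools use a percentile instead.

```python
def fps_summary(frame_times_ms):
    """Return (average FPS, minimum FPS) for a list of per-frame times in ms.

    Average FPS = frames rendered / total elapsed time.
    Minimum FPS = rate implied by the slowest single frame (one simple definition).
    """
    total_seconds = sum(frame_times_ms) / 1000.0
    average_fps = len(frame_times_ms) / total_seconds
    minimum_fps = 1000.0 / max(frame_times_ms)
    return average_fps, minimum_fps


# Hypothetical runs, for illustration only.
steady = [16.0] * 10          # every frame takes 16 ms
uneven = [10.0] * 9 + [40.0]  # mostly fast frames with one 40 ms hitch

for name, run in (("steady", steady), ("uneven", uneven)):
    avg, low = fps_summary(run)
    print(f"{name}: average {avg:.1f} FPS, minimum {low:.1f} FPS")
```

The uneven run reports the higher average (about 77 FPS versus 62.5 FPS) but a far lower minimum (25 FPS versus 62.5 FPS), and that single 40 ms hitch is exactly the kind of stutter an average hides.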

Let's look at the above charts in a different format before we continue this discussion.

We can see that even with just two GPUs splitting the workload, our CPU has apparently become a bottleneck with the R9 290X. Average frame rates still increase going from 4K Ultra to QHD Ultra to 1080p Ultra to 1080p High, but when we look at minimum FPS we've apparently run straight into a wall. For the R9 290X, CrossFire with Mantle effectively tops out at a minimum of roughly 65FPS, while a single GPU without Mantle hits a lower minimum of around 50FPS, and regular CrossFire on the 290X (i.e. without Mantle) has a minimum of 45FPS. Again, there are likely some optimizations that could be made in both the drivers and the game to improve the situation, but it wouldn't be too surprising to find that Mantle and SFR with three or four GPUs don't show much of an increase over two GPUs.
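Why would minimum FPS hit a wall while averages keep climbing? One crude way to picture it is a two-stage model: each frame needs some CPU (producer-thread) time and some GPU time, extra GPUs divide only the GPU share, and the slower stage sets the frame rate. The Python sketch below encodes that assumption with made-up timings; the function and its `efficiency` parameter are hypothetical, nothing here is derived from our measurements, and it ignores SFR compositing cost and any extra CPU work that multi-GPU rendering adds.

```python
def estimated_fps(cpu_ms, gpu_ms, num_gpus, efficiency=1.0):
    """Crude bottleneck model (an illustrative assumption, not a measurement).

    cpu_ms:     CPU time per frame, unaffected by adding GPUs
    gpu_ms:     GPU time per frame on a single card
    efficiency: fraction of each extra GPU's capacity that is actually used
    """
    effective_gpus = 1 + (num_gpus - 1) * efficiency
    gpu_share_ms = gpu_ms / effective_gpus
    # Whichever stage is slower limits how often a frame can be finished.
    return 1000.0 / max(cpu_ms, gpu_share_ms)


# Hypothetical scene: 10 ms of CPU work and 16 ms of GPU work per frame.
for gpus in (1, 2, 3, 4):
    print(f"{gpus} GPU(s): {estimated_fps(10.0, 16.0, gpus):.1f} FPS")
```

Once the GPU share drops below the 10 ms of CPU work, this model stops at 100 FPS no matter how many GPUs are added, which is consistent with the kind of wall the minimum frame rates suggest.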

I do have to wonder how applicable the above results are to other games. Last I checked, Mantle CrossFire rendering in Sniper Elite 3 was basically not working, but if other developers can use Mantle to effectively implement SFR instead of AFR, that would be nice to see. But didn't we have SFR way back in the early days of multiple GPUs? Of course we did! 3dfx initially called their solution SLI (Scan-Line Interleave), with each GPU rendering every other line. That approach had problems with things like anti-aliasing, but there are many other ways to divide the workload between GPUs, and both AMD (formerly ATI) and NVIDIA have done variations on SFR in the past.
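The difference between 3dfx's interleaving and a contiguous split (the "proportional sections" AMD describes) is easy to see in a short Python sketch. Both functions are hypothetical illustrations of dividing a frame's rows between GPUs; neither is Firaxis's or AMD's actual code, and a real renderer would also handle compositing and load balancing.

```python
def scan_line_interleave(height, num_gpus):
    """3dfx-style SLI: GPU k renders rows k, k + num_gpus, k + 2*num_gpus, ..."""
    return [list(range(k, height, num_gpus)) for k in range(num_gpus)]


def split_frame(height, weights):
    """Contiguous SFR: each GPU gets one horizontal band of rows whose height is
    proportional to its weight (for example, its relative rendering speed)."""
    total = sum(weights)
    bands, start, acc = [], 0, 0.0
    for w in weights:
        acc += w
        end = round(height * acc / total)  # last band always ends at `height`
        bands.append(range(start, end))
        start = end
    return bands


print(scan_line_interleave(8, 2))   # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(split_frame(1080, [1, 1]))    # [range(0, 540), range(540, 1080)]
print(split_frame(1080, [2, 1]))    # [range(0, 720), range(720, 1080)]
```

One thing the sketch makes visible: with interleaving, every row's immediate neighbors belong to the other GPU, whereas contiguous bands only meet at a single seam.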

The problem is that when DirectX 9 rolled around and we started getting programmable shaders and deferred rendering, synchronization issues eventually cropped up, and developers were effectively locked out of doing creative things like SFR (or geometry processing on one GPU and rendering on another). The only multi-GPU approach available with Direct3D right now is AFR. That may change with Direct3D 12, but we're still a ways out from that release. Basically, AFR is the easiest approach to implement, but it has various drawbacks even when it does work properly.

Of course, there are other potential pitfalls with alternative workload-splitting approaches like SFR. They can require more work from the CPU, which is already a potential bottleneck as you add GPUs. AMD informed us that the engine in Civilization: Beyond Earth is actually extremely scalable with CPU cores; while we're testing with an overclocked i7-4770K, AMD said they even saw a 20% improvement in performance (with Mantle) going from hex-core Ivy Bridge-E to octal-core Haswell-E with R9 290X CrossFire. There are apparently other cases where certain hardware configurations and game settings can result in an even greater improvement in performance thanks to Mantle (e.g. the 50% increase in minimum frame rates on the R9 290X at our 1080p High settings).

The bottom line is that if you have an AMD GPU, games like Civilization: Beyond Earth can certainly benefit. Maybe Direct3D 12 will bring similar options to developers next year, but in the meantime, congrats to both AMD and Firaxis for shining the light on the latency subject once again. NVIDIA made some waves with similar discussions when they released FCAT last year, but the topic of latency and jitter is definitely important. And don't even get me started on silliness like capping frame rates at 30FPS by default (cough, The Evil Within, cough).


61 Comments


  • Tikcus9666 - Saturday, October 25, 2014

    The FX 8 series does not use a lot more power, though it does use more.
    Performance aside, say system A uses 50 watts more than system B, and system A is £130 cheaper than system B (the price difference between an i7 4790 and an FX 8350, assuming all other prices are the same).
    At 12p per kWh, it will take 21,667 hours to get your money back. At 4 hours per day, that is just short of 15 years.

    No home user is going to keep a computer long enough for the power to make a difference; it's different if you are running an office with hundreds of systems.
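The commenter's payback math checks out. A short Python sketch using the same inputs (50 W of extra draw, 12p per kWh, a £130 price gap, 4 hours of use per day) reproduces it; the function name is just for illustration, and the model assumes a constant 50 W difference rather than separate idle and load figures.

```python
def payback_hours(price_gap_gbp, extra_watts, pence_per_kwh):
    """Hours of use until the extra electricity cost of the cheaper but
    hungrier system equals its up-front price saving."""
    extra_pence_per_hour = (extra_watts / 1000.0) * pence_per_kwh  # kWh per hour x price
    return (price_gap_gbp * 100.0) / extra_pence_per_hour          # pounds converted to pence


hours = payback_hours(price_gap_gbp=130, extra_watts=50, pence_per_kwh=12)
years = hours / 4 / 365  # at 4 hours of use per day
print(f"{hours:,.0f} hours, or about {years:.1f} years at 4 hours per day")
```

This prints 21,667 hours, or about 14.8 years, matching the figures in the comment.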
  • CrazyElf - Friday, October 24, 2014

    Overall, it looks like Mantle split-frame rendering ("scissor mode") is a step forward.

    There is a lot less stutter with split-frame rendering than with AFR. The minimum frame rates are higher than with either a single GPU or two GPUs in AFR, and that is what matters most. I suspect that as long as Nvidia uses AFR as well, their results will be similar to the non-Mantle CrossFire performance.

    Personally, I wish that scissor mode were more widespread and that there were an emphasis on minimum frame rates rather than on maximizing average FPS.

    On that note, there is one other issue, unrelated to all of this, that makes me want to skip this title: the gameplay itself is said to be disappointing. It shares all of the drawbacks of Civ V and none of the advantages of Alpha Centauri. I suspect that the expansions will be unable to fix the problem.

    One more question: are 3- and 4-GPU setups compatible with SFR?
  • JarredWalton - Friday, October 24, 2014

    At present, Mantle and SFR only support two GPUs. Firaxis is supposedly working on a fix to support three- and four-GPU configurations, but I don't have a time frame for that. I also suspect the scaling will be a case of diminishing returns (as usual for 3-way and 4-way setups).
  • CrazyElf - Friday, October 24, 2014

    Is there a way to post a link here?

    I've got results to show you, but they got flagged as spam.
  • CrazyElf - Friday, October 24, 2014

    Anyways, it's steeply diminishing returns as you note.

    This is from Udteam's review:
    - GTX 780 Ti : 100% -> 74% -> 41% -> 34%

    - GTX 780 : 100% -> 68% -> 50% -> 14%

    - R9 290X : 100% -> 84% -> 58% -> 30%

    - R9 290 : 100% -> 82% -> 59% -> 37%

    Average FPS per GPU added. This is in AFR, so with SFR, the scaling would be even lower.
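Reading those figures as the extra FPS each added card contributes, relative to a single card, they can be turned into total scaling with a running sum. The comment doesn't spell out its methodology, so that interpretation is an assumption; the Python sketch below (with a hypothetical function name) just does the arithmetic.

```python
from itertools import accumulate


def cumulative_scaling(per_gpu_gain_pct):
    """Convert per-GPU gains (each as a % of single-GPU FPS) into total
    throughput, as a multiple of one GPU, for 1..N GPUs."""
    return list(accumulate(p / 100.0 for p in per_gpu_gain_pct))


# Figures as quoted in the comment above (AFR, average FPS).
quoted = {
    "GTX 780 Ti": [100, 74, 41, 34],
    "GTX 780": [100, 68, 50, 14],
    "R9 290X": [100, 84, 58, 30],
    "R9 290": [100, 82, 59, 37],
}
for card, gains in quoted.items():
    totals = ", ".join(f"{t:.2f}x" for t in cumulative_scaling(gains))
    print(f"{card}: {totals}")
```

Under this reading, four R9 290X cards deliver roughly 2.7 times the FPS of one, which illustrates the diminishing returns being discussed.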
  • Flunk - Friday, October 24, 2014

    Hiding the setting in a config file where even a professional reviewer can't find it is not acceptable. What it really means for most players is that they'll end up with the performance you showed in the first review. I hope they patch this into the settings menu.
  • Impulses - Friday, October 24, 2014

    No doubt. Though as a multi-screen gamer I'm kinda used to this, just an intrinsic part of PC gaming...
  • eanazag - Friday, October 24, 2014

    I love that AMD is testing with Intel CPUs. Intel can lay off some of their marketing team now since AMD is helping them sell more -E series CPUs.

    #sadbuttrue
  • The_Assimilator - Saturday, October 25, 2014

    Haha, I noticed that too. Seems even AMD's GPU division doesn't want to touch their own company's CPUs.
  • TiGr1982 - Saturday, October 25, 2014

    Their own old AM3+ platform does not even support PCIe 3.0 (with very few specific motherboard exceptions), so their own PCIe 3.0-capable GPUs can potentially be limited on AM3+, which is stuck with PCIe 2.0. This is somewhat ridiculous, but this is how it is.
    FM2+ with Kaveri APUs does support PCIe 3.0, but Kaveri's CPU part is roughly comparable to an i3 and is not up to the job of feeding, say, Hawaii GPU(s) (especially more than one).
    So AM3+ is old, FM2+ is budget-oriented, and AMD's top GPUs are left running on Intel platforms. That's how it is.
