Under The Hood: DirectX 9, Shader Caching, Liquid VR, and Power Consumption

Alongside AMD’s driver branding changes, Radeon Software Crimson Edition 15.11 also marks the first release of AMD’s major new driver branch, 15.300. Consequently this driver release comes with a number of feature improvements under the hood, several of which work in conjunction with AMD’s control panel update.

DirectX 9: Frame Pacing, CF Freesync, & Frame Rate Target Control

We’ll start off our look under the hood of Crimson with AMD’s improvements for that old standard of graphics APIs: DirectX 9. While Microsoft has moved on from DirectX 9 and the last Windows OS limited to DX9 was discontinued in 2014, DirectX 9 became firmly entrenched in development circles over its nearly decade-long run, more so than I suspect anyone really expected. Though console-quality AAA games have since switched over to DirectX 11, “lightweight” mass market games have stuck with DirectX 9, particularly team games like the MOBAs and Rocket League, where even new titles such as Dota 2 Reborn use DX9 by default.

Although AMD doesn’t share every last aspect of their internal plans with us, I get the impression that AMD expected to be done with DX9 around 2013, when even the original DX11 cards were approaching 4 years old. Since that time AMD has rolled out several features that require per-API support such as their multi-GPU frame pacing improvements, CrossFire Freesync, and frame rate target control, none of which initially supported DX9.

However with Crimson AMD is doing some backtracking and at long last is adding support for these features when used in conjunction with DirectX 9. This means that it’s now possible to use AMD’s various framerate technologies – frame pacing, CrossFire Freesync, and frame rate target control (FRTC) – with DX9 games new and old.

Truthfully, after AMD initially punted on CrossFire frame pacing support for DirectX 9 in 2013 I didn’t expect that we’d see support added at a later time (especially not two years later), so this comes as a bit of a surprise. However with DX9 games refusing to die and AMD in a position where they are heavily promoting the use of their APUs with MOBAs, AMD has a vested interest in making these games perform as well as possible. FRTC offers a more flexible alternative to v-sync for capping the frame rate, a potent option for conserving battery power on laptops. Meanwhile I suspect AMD’s CF improvements are specifically aimed at improving the Dual Graphics experience with AMD’s APUs, which has always been AMD’s ace in the hole for offering cheap GPU performance upgrades by allowing the APU to be used in conjunction with a cheap dGPU rather than having to disable it entirely. Otherwise there are a handful of games where these DX9 improvements will be applicable even to dGPU setups – 2011’s Skyrim in particular – though at this point most dGPUs should have little need for CrossFire to get good performance in legacy DX9 games.
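To illustrate the idea behind FRTC, here is a minimal sketch (in Python, purely for illustration – the real implementation lives inside AMD's driver) of a frame rate cap that works by sleeping off the remainder of each frame's time budget rather than syncing to the display refresh. The class name and structure are our own invention:

```python
import time

class FrameRateTarget:
    """Toy sketch of a frame rate target cap: instead of waiting on the
    display's refresh (v-sync), the render loop sleeps until the target
    frame interval has elapsed, letting the GPU idle and save power."""

    def __init__(self, target_fps: float):
        self.frame_time = 1.0 / target_fps
        self.next_deadline = time.perf_counter()

    def wait(self) -> None:
        # Sleep off whatever remains of the current frame's time budget.
        now = time.perf_counter()
        remaining = self.next_deadline - now
        if remaining > 0:
            time.sleep(remaining)
        # Schedule the next deadline relative to the previous one so
        # small sleep inaccuracies don't accumulate into drift.
        self.next_deadline = max(now, self.next_deadline) + self.frame_time

cap = FrameRateTarget(target_fps=60)
for _ in range(3):
    # render_frame() would go here
    cap.wait()
```

Unlike v-sync, a cap like this can be set to any value independent of the refresh rate, which is what makes it attractive for power saving.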

LiquidVR

Back at the 2015 Game Developers Conference, AMD announced their LiquidVR technology. LiquidVR, in a nutshell, is AMD’s collection of virtual reality related technologies, being rolled out in preparation for the 2016 consumer launches of the Oculus Rift and HTC Vive VR headsets. LiquidVR includes AMD’s technologies for implementing efficient (and timely) last-second time warping to cut down on latency, per-eye multi-GPU rendering (allowing for near-perfect use of two GPUs), and a series of OS and driver/stack optimizations that allow VR games to bypass parts of the OS to reduce rendering latency.

At the time of their announcement AMD was just releasing the SDK to developers, as the technology was still under active development. But as of Crimson AMD is enabling the LiquidVR feature set in their consumer drivers. This won’t have any immediate ramifications since the retail headsets still aren’t here, but this will allow developers to begin final preparations for the launch of the retail headsets next year.

Shader Caching

Another feature new to Crimson is shader caching. With shader caching AMD’s drivers can now transparently cache compiled game shader routines, reusing those shaders rather than recompiling them each time they’re needed. As games have become more advanced, so have their shaders, and this has increased the amount of time and resources required to compile them. Since DirectX lacks a universal, built-in shader caching solution, GPU vendors have been all but forced to implement shader caching at the driver level – accommodating games that make poor use of shaders – in order to avoid a very frustrating bottleneck. This is admittedly a case of AMD catching up with NVIDIA, but nonetheless it’s a welcome change.

Ultimately shader caching improves game performance in two specific areas. Games that do extensive pre-load shader compilation can now skip that compilation on future uses and reduce their overall load time, particularly on systems with slower CPUs (since shader compilation is a CPU operation). Meanwhile games that stream a large number of assets and regularly need to compile shaders on-the-fly can produce stuttering if the game needs to wait for a shader to compile before rendering the next frame, and while caching can’t eliminate the first instance of compilation, it can eliminate stuttering caused by successive loads of a shader.
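Conceptually the driver-level cache works like any content-addressed cache: key the compiled binary by a hash of the shader source, compile on a miss, and serve the stored blob on every subsequent hit. The sketch below is a simplification in Python (a real driver would also key on the GPU and driver version, since compiled binaries aren't portable across either); the class and its `_compile` stand-in are hypothetical:

```python
import hashlib

class ShaderCache:
    """Toy sketch of driver-side shader caching: compiled binaries are
    stored keyed by a hash of the shader source, so a shader only pays
    the CPU-side compilation cost the first time it is seen."""

    def __init__(self):
        self._cache = {}          # source hash -> compiled blob
        self.compile_count = 0    # how many real compilations happened

    def _compile(self, source: str) -> bytes:
        # Stand-in for the expensive CPU-side compile step.
        self.compile_count += 1
        return source.encode().hex().encode()

    def get(self, source: str) -> bytes:
        key = hashlib.sha256(source.encode()).hexdigest()
        blob = self._cache.get(key)
        if blob is None:              # first use: compile and store
            blob = self._compile(source)
            self._cache[key] = blob
        return blob                   # later uses: served from cache

cache = ShaderCache()
src = "float4 main() : SV_Target { return 1; }"
cache.get(src)
cache.get(src)   # second load hits the cache, no recompile
```

This is why the benefit shows up both at load time (pre-load compilation skipped entirely) and during streaming (a shader seen once never stalls the frame again).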

The overall performance impact of shader caching will depend on the individual game and the speed of the system’s CPU. In our checks, Battlefield 4 and Crysis 3 showed no improvement, while AMD notes that in their own testing Star Wars: Battlefront and BioShock Infinite measurably benefit from caching.

Flip Queue Size

Filing this one under the “secret sauce” category, AMD tells us that they have made some changes to reduce the size of the DirectX flip queue. The flip queue is a data structure responsible for storing rendered frames, with the idea being that a game should queue up some frames to help ensure steady frame delivery. A short flip queue makes frame pacing more susceptible to being disrupted by frames that take too long (and otherwise makes inconsistent performance more likely) while a larger queue introduces additional lag.

AMD isn’t telling us exactly what they have done with the flip queue, only that with Crimson its size has been reduced. This is a fully transparent change – there isn’t a Radeon Settings control for users to access – so everyone gets it by default. AMD tells us that their changes have been implemented to reduce lag in games (particularly MOBAs) and shouldn’t impact framerate stability. AMD’s slide does imply that the queue has been reduced from 3 frames to 1, but these slides shouldn’t be taken as technical specifications, as they’re primarily for communication/conceptual purposes.
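The lag/stability trade-off is easy to see with a toy model. In the sketch below (our own simplification, not AMD's implementation), the game renders ahead until the flip queue holds its maximum number of frames and the display consumes one per refresh; a frame's input lag is the time it spends sitting in the queue:

```python
from collections import deque

def flip_queue_latency(depth: int, frame_ms: float = 16.7, n: int = 100) -> float:
    """Toy model: the game renders ahead until the flip queue holds
    `depth` frames; the display flips one out per refresh interval.
    Returns the average delay (ms) between a frame being submitted
    and it appearing on screen."""
    queue = deque()
    lags = []
    t = 0.0
    for _ in range(n):
        queue.append(t)          # frame submitted at time t
        t += frame_ms            # next frame renders in the meantime
        if len(queue) >= depth:  # queue at its limit: flip the oldest frame
            submitted = queue.popleft()
            lags.append(t - submitted)
    return sum(lags) / len(lags)
```

In this model a 3-deep queue shows each frame roughly three refresh intervals (~50ms at 60Hz) after it was rendered, versus one interval (~17ms) for a 1-deep queue – which matches the kind of lag reduction AMD's 3-to-1 slide implies, at the cost of less slack to absorb a slow frame.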

Video Decode Power

Finally, another unexpected item in AMD’s change list was a mention that they have reduced the power consumption of video decoding on some of their cards. This came as a bit of a surprise since, as far as we’ve been aware, there hasn’t been a power consumption problem (nor would we have thought to look for one).

But sure enough, on our Radeon R9 Fury, playing back a 1080p H.264 video resulted in the card clocking up to between 500MHz and 700MHz depending on the display resolution. With the Crimson driver the Fury goes back to staying at its idle GPU clockspeed of 300MHz, with no measurable change in performance. Meanwhile power draw at the wall dropped from 83W to 78W – a small difference on our high-powered testbed, but nonetheless a measurable result, and one that should be greater on lower-powered systems.

1080p H.264 Video Playback Power Consumption (at the wall)
  Idle:              76W
  Catalyst 15.11.1:  83W
  Crimson 15.11:     78W

As best as we can gather, AMD had a UVD and/or desktop compositing bug on at least the Fiji cards that made them clock unnecessarily high for video playback. Frankly this seems like somewhat of a dumb thing to have to fix, but AMD correcting bugs is always appreciated.


146 Comments


  • looncraz - Wednesday, November 25, 2015 - link

    I only play BF4 with Mantle, and I've never noticed a single glitch (I did when it first came out (with colors), so I ran DX11 for a while).

    The resolution actually doesn't dictate how much RAM you need as much as people think. A 1080p frame buffer only weighs in at ~8MB, 4k is ~34MB. You need VRAM to store all of the textures and other game data. Your resolution has an effect on VRAM use only for certain features.
    Reply
  • i_create_bugs - Wednesday, November 25, 2015 - link

    Except that you also need room for multiple render targets. Not just RGBA. Typically diffuse, normal, stencil, etc. Plus on top of that you need stencil/Z-buffer. Those buffers can also be 64 bits per pixel, if float16 pixel formats are used.

    Additionally sometimes frame buffer width is a bit more than actual resolution due to hardware limitations. So 1920 wide buffer might actually have room for 2048 pixels in real memory layout.

    Lower end guess for 1080P is 32 x 2^ceil(log2(1920)) x 1080. So at least 32 x 2048 x 1080 bytes. 67.5 MB per frame at 1080P. For 4k (3840x2160), 32 x 4096 x 2160 = 270 MB.

    Plus op top of that you need some RGBA frames for double / triple buffering.
    Reply
  • looncraz - Thursday, November 26, 2015 - link

    You calculated it for BITS, not BYTES.

    Also, we usually end up aligning just a few pixels on the end (as in two or three).

    A 1920x1080 buffer will be allocated as a slightly wider, but no higher, buffer. FOUR bytes per pixel (not 32). That gives 7.91MB per frame buffer.

    As for the z coordinate, we usually use the last 8 bits of the above buffer. Why? Because 8-bits per color channel is what anyone usually ever uses. This is called D24S8.

    When you increase the resolution of your game yourself, you are increasing the size of the frame buffers, including the flip queue, post-processing buffers, and a few others. Basically, you can generally assume there are 15 frame-buffer linked sized buffers.

    So at 1080p, you need 118MB of VRAM for the buffers, and at 4k you need 475MB. This is why you can see VSR running so well on video cards with only 2GB of RAM. You do need more RAM, but it isn't drastic. What can make a more drastic difference is the game using resolution-specific textures. THAT can eat up an extra GB or so, depending on the game developer. Older games, or games meant for 1080p, however, will not have 4k texture packs.
    Reply
  • The_Countess - Friday, November 27, 2015 - link

    you just described the game running in directX as well. Reply
  • dsumanik - Tuesday, November 24, 2015 - link

    Nvidia drivers have been superior for the last decade, end of story. I suspect in many cases when the silicon battle was close, this is how NVIDIA kept the edge.

    If Crimson can fulfill its ambitious vision, things will get mighty interesting next year.

    Got my fingers crossed for ya AMD.
    Reply
  • Dalamar6 - Wednesday, November 25, 2015 - link

    NVidia's superior drivers are why AMD was bargain binned even during the times when their performance:price ratio was actually significantly better.

    Of course we're talking Windows, AMD literally has NO linux presence at all, and literally cripples rolling distributions, and this rebadged driver won't change that.
    Reply
  • Gigaplex - Wednesday, November 25, 2015 - link

    AMD has their open source driver presence. For a lot of their hardware, it's very stable and performs well. It's pretty slow to support brand new hardware though. Reply
  • Fallen Kell - Wednesday, November 25, 2015 - link

    You mean 2D support is pretty stable and performs well. 3D is abysmal performance. There is a reason why not a single Steam Machine configuration out there has an AMD graphic card as an option, and it is because they all have HORRIBLE 3D performance. Reply
  • Beany2013 - Wednesday, November 25, 2015 - link

    As a user of Ubuntu and Debian, and AMD GPUs, I have to agree with Fallen Kell; it's not as bad as it was, but major updates (such as GCC updates, as happened with Ubuntu 15.10) utterly, utterly break things.

    It's working now on Wiley-proposed, but jesus, what a pain in the arse.

    I'm hoping that this, and other pressure (like not having any realistic Steam Machine presence), will force them to up their game. Majorly.

    Performance when it works though, is fine - in some cases though, it's just that you have to force it to work. With hard liquor. And swearing. And Fire.
    Reply
  • FourEyedGeek - Wednesday, November 25, 2015 - link

    There is one area AMD beats NVIDIA in drivers, old cards. NVIDIA haven't paid as much attention to older cards as AMD has, though it could be because AMD use the same architecture for longer periods of time. At release the NVIDIA 680 was faster than the 7970, but on modern games with new drivers the 7950 can even beat the 680 in some games. Reply
