GCN 1.2 - Image & Video Processing

AMD’s final set of architectural improvements for GCN 1.2 focuses on the image and video processing blocks contained within the GPU. These blocks are not directly tied to GPU performance, but they are important to AMD because they enable new functionality and offer new ways to offload tasks onto fixed-function hardware to save power.

First and foremost, then, GCN 1.2 brings a new version of AMD’s video decode block, the Unified Video Decoder (UVD). It has been some time since UVD last received a significant upgrade. Apart from the addition of VC-1/WMV9 support, it has remained relatively unchanged for a couple of GPU generations.

With this newest generation of UVD, AMD is finally catching up to NVIDIA and Intel in H.264 decode capabilities. New to UVD is full support for 4K H.264 video, up to level 5.2 (4Kp60). AMD had previously intended to support 4K up to level 5.1 (4Kp30) on the previous version of UVD, but that never panned out and AMD ultimately disabled the feature. So as of GCN 1.2, hardware decoding of 4K is finally up and working. AMD GPU-equipped systems will no longer have to fall back to relatively expensive software decoding for 4K H.264 video.
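The "level 5.1 (4Kp30)" and "level 5.2 (4Kp60)" labels follow from the H.264 level limits. The sketch below is a rough check. It takes the maximum macroblock rate (MaxMBPS) and maximum frame size (MaxFS) for those two levels from Table A-1 of the H.264 specification, then divides by the number of 16x16 macroblocks in a 3840x2160 frame. It deliberately ignores the other level limits, such as bitrate and decoded picture buffer size. Treat it as an illustration of where the frame-rate ceilings come from, not as a conformance checker.

```python
import math

# Per-level limits from H.264 Table A-1, only for the two levels discussed here.
# max_mbps: macroblocks per second; max_fs: macroblocks per frame.
H264_LEVELS = {
    "5.1": {"max_mbps": 983_040, "max_fs": 36_864},
    "5.2": {"max_mbps": 2_073_600, "max_fs": 36_864},
}


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a frame (partial blocks round up)."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def max_frame_rate(level: str, width: int, height: int) -> float:
    """Highest frame rate the level's macroblock-rate limit allows at this resolution."""
    limits = H264_LEVELS[level]
    mbs = macroblocks_per_frame(width, height)
    if mbs > limits["max_fs"]:
        raise ValueError(f"{width}x{height} exceeds the maximum frame size for level {level}")
    return limits["max_mbps"] / mbs


for level in ("5.1", "5.2"):
    print(level, round(max_frame_rate(level, 3840, 2160), 1))  # 5.1 30.3, then 5.2 64.0
```

A 3840x2160 frame is 240 x 135 = 32,400 macroblocks. Level 5.1 therefore tops out just above 30fps at 4K, while level 5.2 allows up to 64fps.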

On a performance basis, this newest iteration of UVD is around 3x faster than the previous version. Using DXVA Checker, we benchmarked it playing back a 1080p video at 331fps, or roughly 27x real-time. That is enough processing power to decode multiple 1080p streams and then some, and this kind of performance is necessary for the much higher requirements of 4K decoding.

Video Decode Performance

Speaking of which, we can confirm that 4K decoding is working like a charm. Media Player Classic Home Cinema’s built-in decoder doesn’t know what to do with 4K on the new UVD, but Windows’ built-in codec has no such trouble. Playing back a 4K video using that decoder hit 152fps, more than enough to play back a 4Kp60 video or two. For the moment this also gives AMD a leg up over NVIDIA. Kepler products can handle 4Kp30, but their video decoders are too slow to sustain 4Kp60, something only Maxwell cards such as the GTX 750 Ti can currently do. So with the R9 285’s competition currently composed of Kepler cards, it’s the only enthusiast-tier card capable of sustaining 4Kp60 decoding.
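The "4Kp60 video or two" estimate is plain throughput arithmetic: divide the measured decode rate by the frame rate of the stream being played. The sketch below applies that to our 331fps (1080p) and 152fps (4K) results. It assumes, as a simplification, that decode throughput splits evenly across simultaneous streams with no per-stream overhead. The 30fps and 60fps stream rates are example targets. Real multi-stream headroom will likely be somewhat lower.

```python
import math


def sustainable_streams(decode_fps: float, stream_fps: float) -> int:
    """Whole number of stream_fps streams a decoder measured at decode_fps can keep up with.

    Simplifying assumption: throughput divides evenly across streams, with no overhead.
    """
    if stream_fps <= 0:
        raise ValueError("stream_fps must be positive")
    return math.floor(decode_fps / stream_fps)


# Measured decode rates from our testing; the stream frame rates are example targets.
print(sustainable_streams(331, 60))  # 1080p60 streams at 331fps -> 5
print(sustainable_streams(152, 60))  # 4Kp60 streams at 152fps -> 2
print(sustainable_streams(152, 30))  # 4Kp30 streams at 152fps -> 5
```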

This new version of UVD also expands AMD’s supported codec set by one with the addition of hardware MJPEG decoding. AMD has previously implemented JPEG decoding for their APUs, so MJPEG is a natural extension of that. However, MJPEG is a fairly uncommon codec for most workloads these days, so outside of perhaps pro video I’m not sure how often this feature will be used.

What you won’t find, though – and we’re surprised it’s not here – is support for H.265 decoding in any form. It is still a bit early for fully fixed-function H.265 decoders, since the specification was ratified only relatively recently. Even so, both Intel and NVIDIA have bridged the gap with a hybrid decode mode that mixes software, GPU shader, and fixed-function decoding steps. H.265 is still in its infancy, but video cards have increasingly long shelf lives, so it’s a reasonable bet that Tonga cards will still be in significant use after H.265 takes off. To give AMD some benefit of the doubt, a hybrid mode is partially software anyhow. There’s admittedly nothing stopping AMD from implementing one in a future driver, which is exactly what NVIDIA did for H.265 on Kepler.

Moving on, along with their video decode capabilities, AMD has also improved their video encode capabilities for GCN 1.2 with a new version of their Video Codec Engine (VCE). AMD’s hardware video encoder has received a speed boost to improve its encoding performance at all levels. After previously being limited to a maximum resolution of 1080p, it can now encode at resolutions up to 4K. Meanwhile, by AMD’s metrics, this new version of VCE should be capable of encoding 1080p at up to 12x real time.

A quick performance check finds that the current version of CyberLink’s MediaEspresso software isn’t handling 4K video decoding quite right. Encoding from a 1080p source, however, shows the new VCE to be roughly 40% faster than the old VCE in our test.

Video Encode Performance (1080p)
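To put the encode figures in wall-clock terms, the sketch below does two conversions. It turns a real-time multiple into an encode time, and a throughput gain into the share of time saved. The 10-minute clip is a hypothetical example. The sketch reads "roughly 40% faster" as 1.4x the frames per second, which cuts encode time by about 29% rather than 40%.

```python
def encode_seconds(clip_seconds: float, realtime_multiple: float) -> float:
    """Wall-clock encode time for a clip when the encoder runs at realtime_multiple x real time."""
    return clip_seconds / realtime_multiple


def time_reduction(throughput_gain: float) -> float:
    """Fraction of encode time saved when throughput rises by throughput_gain (0.40 = 40% faster)."""
    return 1.0 - 1.0 / (1.0 + throughput_gain)


# Hypothetical 10-minute 1080p clip at AMD's quoted 12x real time.
print(encode_seconds(10 * 60, 12))     # 50.0 seconds
# "Roughly 40% faster" interpreted as 1.4x throughput.
print(round(time_reduction(0.40), 3))  # 0.286
```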

4K video is still rather new, so there’s little to watch and even less of a reason to encode. That of course will change over time, but in the meantime the most promising use of a hardware 4K encoder would be 4K gameplay recording through the AMD Gaming Evolved Client’s DVR function.

86 Comments

  • mczak - Wednesday, September 10, 2014

    This is only partly true. AMD cards nowadays can stay at the same clocks in multimon as in single-monitor mode, though it's a bit more limited than on GeForces. Hawaii and Tonga can keep the same low clocks (and thus idle power consumption) with up to 3 monitors, as long as they are all identical (or, probably more accurately, as long as they all use the same display timings). But if they have different timings (even if it's just 2 monitors), they will always clock the memory at its maximum. This is where NVIDIA Kepler chips have an advantage: they will stay at low clocks even with 2, but not 3, different monitors.
    Actually, I believe that if you have 3 identical monitors, current Kepler GeForces won't be able to stick to the low clocks, but Hawaii and Tonga can. Unfortunately I wasn't able to find the numbers for the GeForces. The ht4u.net R9 285 review has the numbers (sorry I can't post the link, as it won't get past the AnandTech forum spam detector, which is lame).
  • Solid State Brain - Thursday, September 11, 2014

    A twin-monitor configuration where the secondary display is smaller or has a lower resolution than the primary one is a very common (and logical) usage scenario nowadays, and that's what AMD should sort out first. I'm positively surprised that on newer Tonga GPUs, frequencies remain low if both displays are identical (according to the review you pointed out). But I'm not going to purchase a different display (or limit my selection) to take advantage of that when there's no need to with equivalent NVIDIA GPUs.
  • mczak - Thursday, September 11, 2014

    Fixing this is probably not quite trivial. The problem is that if you reclock the memory, you can't honor memory requests for display scan-out for some time. So for a single monitor, what you do is reclock during vertical blank. That won't work if you have several displays with different timings, for obvious reasons. If they have identical timings, though, you can just run them essentially in sync, so they all hit vertical blank at the same time.
    I don't know how NVIDIA does it. One possibility would be a large enough display buffer (but I think it would need to be on the order of ~100kB or so, so not quite free in terms of hardware cost).
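The reclocking constraint described in mczak's comment can be illustrated with a toy model. Memory is retrained only while every active display is in vertical blank at the same moment. The sketch below finds the longest such shared window, then estimates how much scan-out data a display buffer would have to hold to ride out a memory stall instead. This is not how AMD's or NVIDIA's drivers actually schedule reclocking. The following values are all hypothetical, chosen only for illustration:

- the 500 µs vblank length
- the phase offsets
- the 200 µs reclock stall

The buffer estimate also assumes uncompressed 32-bit scan-out at 1080p60, averaged over the whole frame, with blanking ignored.

```python
from dataclasses import dataclass


@dataclass
class Display:
    """Simplified display timing: one vblank of fixed length per refresh period."""
    refresh_hz: float
    vblank_us: float = 500.0  # hypothetical vertical-blank length
    phase_us: float = 0.0     # start time of this display's first vblank

    def in_vblank(self, t_us: float) -> bool:
        period_us = 1e6 / self.refresh_hz
        return (t_us - self.phase_us) % period_us < self.vblank_us


def longest_shared_vblank(displays, horizon_us=200_000, step_us=1.0):
    """Longest run (in us) in which every display is in vblank at once, sampled every step_us."""
    best = run = 0.0
    t = 0.0
    while t < horizon_us:
        if all(d.in_vblank(t) for d in displays):
            run += step_us
            best = max(best, run)
        else:
            run = 0.0
        t += step_us
    return best


def scanout_buffer_bytes(width, height, refresh_hz, bytes_per_pixel, stall_us):
    """Bytes of scan-out data consumed during a stall, using the frame-averaged read rate."""
    bytes_per_second = width * height * refresh_hz * bytes_per_pixel
    return bytes_per_second * stall_us / 1e6


# Three identical 60 Hz displays driven in sync: their vblanks coincide every frame.
synced_window = longest_shared_vblank([Display(60), Display(60), Display(60)])
# A 60 Hz and a 75 Hz display with offset timing: vblanks overlap only partially, and only sometimes.
mixed_window = longest_shared_vblank([Display(60), Display(75, phase_us=7000.0)])
# Buffer needed to cover a hypothetical 200 us reclock stall on one 1080p60, 32-bit display.
buffer_kb = scanout_buffer_bytes(1920, 1080, 60, 4, 200) / 1000

print(synced_window)  # 500.0
print(mixed_window)   # 167.0, about a third of the synced case
print(buffer_kb)      # about 99.5
```

Under these assumptions, a 200 µs stall works out to roughly 100 kB of buffering for a single 1080p60 display. That is the same order of magnitude as the estimate in the comment. Real stall times, blanking intervals, and scan-out formats will differ.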
  • PEJUman - Thursday, September 11, 2014

    I've used multi-monitor setups with both AMD & NVIDIA cards. I would take that 30W hit if it means things work well.
    NVIDIA is too aggressive with its low-power mode. If you have video on one screen & a game on the other, the card will stay at the clock speed of whichever started first. For example, if you start the video before the game loads, it will be stuck at the video clocks.

    I currently use a 780 Ti. The R9 290X I had previously worked better, as it would always clock up...
  • hulu - Wednesday, September 10, 2014

    The conclusions section of Crysis: Warhead seems to be copy-pasted from Crysis 3. R9 285 does not in fact trail GTX 760.
  • thepaleobiker - Wednesday, September 10, 2014

    @Ryan - A small typo on the last page, last line of first paragraph - "Functionally speaking it’s just an R9 285 with more features"

    It should be R9 280, not 285. Just wanted to call it out for you! :)

    Bring on more Tonga, AMD!
  • FriendlyUser - Wednesday, September 10, 2014

    I would like to note that if memory compression is effective, it should not only improve bandwidth but also reduce the need for texture memory. Maybe 2GB with compression is closer to 3GB in practice, at least if the ~40% compression advantage holds.

    Obviously, there is no way to predict the future, but I think your conclusion concerning 2GB boards should take compression into account.
  • Spirall - Wednesday, September 10, 2014

    If GCN 1.2 (instead of a GCN 2.0) is what AMD has to offer as the new architecture for next year's cards, Maxwell will hit AMD hard, judging by the 750 Ti vs. 260X tests. It will hurt them in performance per watt and production cost (not price), and therefore in their net income.
  • shing3232 - Wednesday, September 10, 2014

    The 750 Ti uses a better 28nm process called HPM, while the rest of the 200 series uses HPL. That's the reason why Maxwell is so efficient.
  • Spirall - Wednesday, September 10, 2014

    I'm afraid this won't be enough (but I hope it is). Anyway, NVIDIA is expected to launch their 256-bit Maxwell card soon, so we'll have the answer shortly.