GCN 1.2: Geometry Performance & Color Compression

Instruction sets aside, the Radeon R9 285 is first and foremost a graphics and gaming product, so let’s talk about what GCN 1.2 brings to the table for those use cases.

Through successive generations of GPU architectures AMD has been iterating on and improving their geometry hardware, both at the base level and in the case of geometry generated through tessellation. This has alternated between widening the geometry frontends and optimizing the underlying hardware, with the most recent update coming in the GCN 1.1 based Hawaii, which increased AMD’s geometry processor count at the high end to 4 processors and implemented some buffering enhancements.

For Tonga AMD is bringing over the 4-wide geometry frontend from Hawaii, which immediately doubles Tahiti’s 2-wide geometry frontend. Not stopping there, however, AMD is also implementing a new round of optimizations to further improve performance. GCN 1.2’s geometry frontend includes improved vertex reuse (for better performance with small triangles) and improved work distribution, to better balance workloads between the geometry processors.

At the highest level Hawaii and Tonga should be tied for geometry throughput at equivalent clockspeeds, or roughly 2x faster than Tahiti. In practice, however, these optimizations make Tonga’s geometry frontend faster than Hawaii’s in at least some cases, as our testing has discovered.

Comparing the R9 290 (Hawaii), R9 285 (Tonga), and R9 280 (Tahiti) in TessMark at various tessellation factors, we have found that while Tonga trails Hawaii (and, oddly enough, even Tahiti) at low tessellation factors, at high tessellation factors the tables are turned. With x32 and x64 tessellation, the Tonga based R9 285 outperforms both cards in this raw tessellation test, and at x64 in particular it completely blows away Hawaii, delivering roughly 1.7x its tessellation performance.

At the x64 tessellation factor we see the R9 285 spit out 134fps, equivalent to roughly 1.47B polygons/second. This compares to 79fps (869M polygons/second) for the R9 290, and 68fps (748M polygons/second) for the R9 280. One of the things we noted when initially reviewing the R9 290 series was that AMD’s tessellation performance didn’t pick up much in our standard tessellation benchmark (TessMark at x64) despite the doubling of geometry processors, and it looks like AMD has finally resolved that with GCN 1.2’s efficiency improvements. As this is a test with a ton of small triangles, it looks like we’ve hit a great case for the vertex reuse optimizations.
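As a quick sanity check on those figures, the short sketch below divides each quoted polygon rate by its frame rate. It uses only the numbers from the paragraph above; the helper name polys_per_frame is ours, purely for illustration. If the quoted rates are consistent, all three cards should land on the same per-frame polygon count, since they are rendering the same scene.

```python
# TessMark x64 results as quoted in the text: (frames/second, polygons/second).
results = {
    "R9 285 (Tonga)": (134, 1.47e9),
    "R9 290 (Hawaii)": (79, 869e6),
    "R9 280 (Tahiti)": (68, 748e6),
}


def polys_per_frame(fps, polys_per_sec):
    """Polygons drawn in a single frame, given frame rate and polygon rate."""
    return polys_per_sec / fps


for card, (fps, pps) in results.items():
    millions = polys_per_frame(fps, pps) / 1e6
    print(f"{card}: {millions:.1f}M polygons/frame")

# Relative frame rates at x64, again straight from the quoted fps values.
speedup_vs_hawaii = 134 / 79
speedup_vs_tahiti = 134 / 68
print(f"R9 285 vs R9 290: {speedup_vs_hawaii:.2f}x")
print(f"R9 285 vs R9 280: {speedup_vs_tahiti:.2f}x")
```

All three cards work out to about 11M polygons per frame, so the quoted polygon rates track the frame rates directly.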

Meanwhile AMD’s other GCN 1.2 graphics-centric optimization comes at the opposite end of the rendering pipeline, where the ROPs and memory controllers lie. As we mentioned towards the start of this article, one of the notable changes between the R9 280 and R9 285 is that the latter utilizes a narrower 256-bit memory bus versus the R9 280’s wider 384-bit memory bus, and as a result has around 27% less memory bandwidth than the R9 280. Under most circumstances such a substantial loss in memory bandwidth would result in a significant performance hit, so for AMD to succeed Tahiti with a narrower memory bus, they needed a way to offset that performance loss.

The end result is that GCN 1.2 introduces a new color compression method for its ROPs, to reduce the amount of memory bandwidth required for frame buffer operations. Color compression itself is relatively old – AMD has had color compression in some form for almost 10 years now – however GCN 1.2 iterates on this idea with a color compression method AMD is calling “lossless delta color compression.”

Since AMD is only meeting us half-way here, we don’t know much more about how this works. However, the fact that they’re calling it delta compression implies that AMD has implemented a further layer of compression that works off of the changes (deltas) in frame buffers, on top of the discrete compression of the framebuffer. In that case it would not be unlike modern video compression codecs, which between keyframes encode just the differences to reduce bandwidth requirements (though in AMD’s case in a lossless manner).
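To make the general idea concrete, here is a toy sketch of lossless delta encoding between two small "frame buffers". To be clear, this is not AMD's algorithm, which has not been disclosed; it simply illustrates the codec-style scheme speculated about above, and the function names and pixel values are invented for the example.

```python
def delta_encode(previous, current):
    """Return (index, new_value) pairs for every pixel that changed.

    Hypothetical illustration only: a generic lossless delta scheme,
    not AMD's undisclosed hardware method.
    """
    if len(previous) != len(current):
        raise ValueError("frames must be the same size")
    return [
        (i, cur)
        for i, (prev, cur) in enumerate(zip(previous, current))
        if prev != cur
    ]


def delta_decode(previous, deltas):
    """Rebuild the current frame exactly from the previous frame plus deltas."""
    frame = list(previous)
    for index, value in deltas:
        frame[index] = value
    return frame


# Two made-up 8-pixel frames that differ in a single pixel.
prev_frame = [10, 10, 10, 200, 200, 10, 10, 10]
next_frame = [10, 10, 10, 200, 210, 10, 10, 10]

deltas = delta_encode(prev_frame, next_frame)
restored = delta_decode(prev_frame, deltas)

print(f"{len(deltas)} changed pixel(s) stored instead of {len(next_frame)}")
print("lossless round trip:", restored == next_frame)
```

Because only the changed pixels are stored and the reconstruction is exact, fewer values need to move across the memory bus whenever most of the buffer is unchanged. A buffer in which every pixel changes would see no benefit from a scheme like this.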

AMD’s own metrics call for a 40% gain in memory bandwidth efficiency, and if that is the average case it would more than make up for the loss of memory bandwidth from working on a narrower memory bus. We’ll see how this plays out across our individual games over the coming pages, but it’s worth noting that even our most memory bandwidth-sensitive games hold up well compared to the R9 280, never losing anywhere near the amount of performance that such a memory bandwidth reduction would imply (if they lose performance at all).
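The arithmetic behind that claim can be sketched as follows. The bus widths and the 40% figure come from the text. The per-pin memory data rates (5.0Gbps for the R9 280, 5.5Gbps for the R9 285) are not stated in this section and are our assumption here, as is treating the 40% efficiency gain as a simple multiplier on raw bandwidth.

```python
def bandwidth_gb_per_sec(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


# Bus widths are from the text; the data rates are assumed, not quoted here.
r9_280 = bandwidth_gb_per_sec(384, 5.0)
r9_285 = bandwidth_gb_per_sec(256, 5.5)

# Fraction of raw bandwidth the R9 285 gives up relative to the R9 280.
deficit = 1 - r9_285 / r9_280

# AMD's claimed 40% efficiency gain, modeled as a plain multiplier (assumption).
effective_r9_285 = r9_285 * 1.40

print(f"R9 280: {r9_280:.0f} GB/s")
print(f"R9 285: {r9_285:.0f} GB/s ({deficit:.0%} less)")
print(f"R9 285 with 40% efficiency gain: {effective_r9_285:.0f} GB/s effective")
```

Under these assumptions the R9 285's effective bandwidth edges past the R9 280's raw figure, which is in line with the "more than make up for" reading above. It only holds if the 40% gain is representative of the average case.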


86 Comments


  • felaki - Wednesday, September 10, 2014

    The article says that the Sapphire card has "1x DL-DVI-I, 1x DL-DVI-D, 1x HDMI, and 1x DisplayPort". Can you be more precise as to which versions of the spec are supported? Is it HDMI 1.4 or HDMI 2.0? I believe since this refers to MST, it's only HDMI 1.4 and a DisplayPort connection is required in MST mode for 4K@60Hz output?

    Reading the recent GPU articles, I'm very puzzled why HDMI 2.0 adoption is still lacking in GPUs and displays, even though the spec has been out there for about a year now. Is the PC industry reluctant to adopt HDMI 2.0 for some (political(?), business(?)) reason? I have heard only bad things about DisplayPort 1.2 MST to carry a 4K@60Hz signal, and I'm thinking it's a buggy hack for a transitional tech period.

    If the AMD newest next-gen graphics card only supports HDMI 1.4, that is mind-boggling. Please tell me I'm confused and this is a HDMI 2.0-capable release?
  • Ryan Smith - Wednesday, September 10, 2014

    DisplayPort 1.2 and HDMI 1.4. Tonga does not add new I/O options.
  • felaki - Wednesday, September 10, 2014

    Thanks for clarifying this!
  • Penti - Wednesday, September 10, 2014

    You can do 4K SST on both Nvidia and AMD cards as long as they are DisplayPort 1.2 capable. It depends on your screen. There is no HDMI 600MHz on any graphics processor. Nor is there much support from monitors or TVs, as most don't do 600MHz.
  • felaki - Wednesday, September 10, 2014

    Thanks! I was not actually aware that SST existed. I see here http://community.amd.com/community/amd-blogs/amd-g... that AMD is referring to SST as being the thing to fix up the 4K issue, although the people in the comments on that link report that the setup is not working properly.

    How do people generally see SST? Should one defer buying a new system now until proper HDMI 2.0 support comes along, or is SST+DisplayPort 1.2 already a glitch-free user experience for 4K@60Hz?
  • Kjella - Wednesday, September 10, 2014

    Got 3840x2160x60Hz using SST/DP and it's been fine, except UHD gaming is trying to kill my graphics card.
  • mczak - Wednesday, September 10, 2014

    DP SST 4k/60Hz should be every bit as glitch free as proper hdmi 2.0 (be careful though with the latter since some 4k TVs claiming to accept 60Hz 4k resolutions over hdmi will only do so with ycbcr 4:2:0). DP SST has the advantage that actually even "old" gear on the graphic card side can do it (such as radeons from the HD 6xxx series - from the hw side, if it could do DP MST 4k/60Hz it should most likely be able to do the same with SST too, the reason why MST hack was needed in the first place is entirely on the display side).
    But if you're planning to attach your 4k TV to your graphic card a DP port might not be of much use since very few TVs have that.
  • Solid State Brain - Wednesday, September 10, 2014

    I won't get another AMD video card until idle multimonitor consumption gets fixed. According to other websites, power consumption in such case increases substantially whereas NVidia video cards have almost the same consumption as when using a single display. In the case of the Sapphire 285 Dual-X it increases by almost 30W just by having a second display connected!!

    I think Anandtech should start measuring idle power consumption when more than one display is connected to the video card / multimonitor configurations. It's important information for many users who not only game but also have productivity needs.
  • Solid State Brain - Wednesday, September 10, 2014

    And of course, a comment editing function would be useful too.
  • shing3232 - Wednesday, September 10, 2014

    Well, AMD video cards have to run at a higher frequency with multiple screens than with a single monitor.
