History: Where GDDR5 Reaches Its Limits

To really understand HBM we'd have to go all the way back to the first computer memory interfaces, but in the interest of expediency and sanity, we'll condense that lesson down to the following: the history of computer and memory interfaces is a consistent cycle of moving between wide parallel interfaces and fast serial interfaces. Serial ports and parallel ports, USB 2.0 and USB 3.1 (Type-C), SDRAM and RDRAM – there is a continual process of developing faster interfaces, then developing wider interfaces, and switching back and forth between them as conditions dictate.

So far in the race for PC memory, the pendulum has swung far in the direction of serial interfaces. Through 4 generations of GDDR, memory designers have continued to ramp up clockspeeds in order to increase available memory bandwidth, culminating in GDDR5 and its blistering 7Gbps+ per pin data rate. GDDR5 in turn has been with us on the high-end for almost 7 years now, longer than any previous memory technology, and in the process has gone farther and faster than initially planned.

But in the cycle of interfaces, with GDDR5 the pendulum has finally reached its apex for serial interfaces. Back in 2011 at an AMD video card launch I asked then-graphics CTO Eric Demers about what happens after GDDR5, and while he expected GDDR5 to continue on for some time, it was also clear that GDDR5 was approaching its limits. High speed buses bring with them a number of engineering challenges, and while there is still headroom left on the table to do even better, the question arises of whether it's worth it.


AMD 2011 Technical Forum and Exhibition

The short answer in the minds of the GPU community is no. GDDR5-like memories could be pushed farther, both with existing GDDR5 and with theoretical differential I/O based memories (think USB/PCIe buses, but for memory), however doing so would come at the cost of greatly increased power consumption. In fact, even existing GDDR5 implementations already draw quite a bit of power; thanks to GDDR5's complicated clocking mechanisms, a lot of memory power is spent merely on distributing and maintaining its high clockspeeds. Any future GDDR5-like technology would only ratchet up the problem, along with introducing new complexities such as the need to add more logic to memory chips – a somewhat painful combination, as logic and dense memory are difficult to fab together.

The current GDDR5 power consumption situation is such that, by AMD's estimate, 15-20% of the Radeon R9 290X's (250W TDP) power consumption goes to memory. This is even after the company went with a wider, slower 512-bit GDDR5 memory bus clocked at 5GHz so as to better contain power consumption. Using an even faster, higher-power memory standard would only serve to exacerbate that problem.
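
Those figures can be sanity-checked with some quick back-of-the-envelope arithmetic: a 512-bit bus at a 5Gbps effective per-pin data rate works out to 320GB/s of peak bandwidth, and 15-20% of a 250W TDP is 37.5-50W spent on memory. A minimal sketch (the helper name `peak_bandwidth_gbs` is ours for illustration; the input numbers are the article's):

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak theoretical bandwidth in GB/s: (pins * per-pin Gbps) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

# R9 290X: 512-bit GDDR5 bus at a 5Gbps effective data rate per pin
bandwidth = peak_bandwidth_gbs(512, 5.0)                  # 320.0 GB/s

# AMD's estimate: 15-20% of the 250W board TDP goes to memory
mem_power_low, mem_power_high = 250 * 0.15, 250 * 0.20    # 37.5 W to 50.0 W

print(f"Peak bandwidth: {bandwidth:.0f} GB/s")
print(f"Estimated memory power: {mem_power_low:.1f}-{mem_power_high:.1f} W")
```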

All the while, power consumption for consumer devices has been on a downward slope as consumers (and engineers) have made power consumption an increasingly important issue. The mobile space, with its fixed battery capacity, is of course the prime example, but even in the PC space power consumption for CPUs and GPUs has peaked and since come down some. The trend is towards more energy efficient devices – the idle power consumption of a 2005 high-end GPU would be intolerable in 2015 – and that throws yet another wrench into faster serial memory technologies: their power consumption would be going up at exactly the same time as overall power consumption is expected to come down, leaving individual devices with lower power limits to work with.

Finally, coupled with all of the above have been issues with scalability. We'll get into this more when discussing the benefits of HBM, but in a nutshell, GDDR5 also ends up taking a lot of space, especially when we're talking about the 384-bit and 512-bit configurations of current high-end video cards. At a time when everything is getting smaller, there is also a need to further miniaturize memory, something that GDDR5 and potential derivatives wouldn't be well suited to resolve.

The end result is that in the GPU memory space, the pendulum has started to swing back towards parallel memory interfaces. GDDR5 has been taken to the point where going any further would be increasingly inefficient, leading to researchers and engineers looking for a wider next-generation memory interface. This is what has led them to HBM.

Comments

  • Horza - Tuesday, May 19, 2015 - link

    TW3 doesn't even get close; the highest VRAM usage I've seen is ~2.3GB @1440p with everything ultra, AA on, etc. In fact, of all the games you mentioned, Shadow of Mordor is the only one that really pushes past 4GB @1440p in my experience (without unplayable levels of MSAA). Whether that makes much difference to playability is another thing entirely; I've played Shadow on a 4GB card @1440p and it wasn't a stuttery mess or anything. It's hard to know without framerate/frametime testing whether a specific game is using VRAM because it can or because it really requires it.

    We've been through a period of rapid VRAM requirement expansion but I think things are going to plateau soon like they did with the ports from previous console generation.
  • chizow - Wednesday, May 20, 2015 - link

    I just got TW3 free with Nvidia's Titan X promotion and it doesn't seem to be pushing upward of 3GB, but the rest of the games absolutely do. Are you enabling AA? GTA5, Mordor, Ryse (with AA/SSAA), and Unity all do push over 4GB at 1440p. Also, any game that has heavy texture modding, like Skyrim, appreciates the extra VRAM.

    Honestly I don't think we have hit the ceiling yet; the consoles are the best indication of this, as they have 8GB of RAM, which is generally allocated as 2GB/6GB for CPU/GPU, so you are looking at ~6GB to really be safe, and we still haven't seen what DX12 will offer. Given many games are moving to single large resources like megatextures, being able to load the entire texture into local VRAM would obviously be better than having to stream it in using advanced methods like bindless textures.
  • przemo_li - Thursday, May 21, 2015 - link

    False.

    It would be better to stream ONLY what will be needed!

    And that is why DX12/Vulkan will allow just that: apps will tell the API which part to stream.

    Wholesale streaming will only be good if the whole resource will be consumed.

    This is the benefit of bindless: only transfer what you will use.
  • chizow - Thursday, May 21, 2015 - link

    False, streaming from system RAM or slower resources is non-optimal compared to keeping it in local VRAM cache. Simply put, if you can eliminate streaming, you're going to get a better experience and more timely data accesses, plain and simple.
  • testbug00 - Tuesday, May 19, 2015 - link

    A quick check on HardOCP shows max settings on a Titan X using just under 4GB of VRAM at 1440p. To keep it playable, you had to turn down the settings slightly.
    http://www.hardocp.com/article/2015/05/04/grand_th...

    You certainly can push VRAM usage over 4GB at 1440/1600p, but, generally speaking, it appears that it would push the game into not being fluid.

    Having at least 6GB is 100% the safe spot. 4GB is pushing it.
  • chizow - Tuesday, May 19, 2015 - link

    Those aren't max settings, not even close. FXAA is being used; turn it up to just 2xMSAA or MFAA with Nvidia and that breaks 4GB easily.

    Source: I own a Titan X and play GTA5 at 1440p.

    Also, the longer you play, the more you load, the bigger your RAM and VRAM footprint. And this is a game that launched on last-gen consoles in 2013, so to think 4GB is going to hold up for the life of this card with DX12 on the horizon is not a safe bet, imo.
  • Mark_gb - Sunday, May 24, 2015 - link

    Do not forget the color compression that AMD designed into their chips; it's in Fiji. In addition, AMD assigned some engineers to work on ways to use the 4GB of memory more efficiently. In the past AMD viewed memory as free, since capacities kept expanding, and efficiency wasn't really needed, so they had never bothered to assign anyone to make memory usage efficient. Now, with a team having worked on that issue – which will work purely through driver changes that make memory usage and allocation more efficient – 4GB will be enough.
  • xthetenth - Tuesday, May 19, 2015 - link

    The Tonga XT x2 with HBM rumor is insane if you're suggesting the one I think you are. First off the chip has a GDDR memory controller, and second if the CF profile doesn't work out a 290X is a better card.
  • chizow - Tuesday, May 19, 2015 - link

    I do think it's crazy, but the more I read the more credibility there is to that rumor lol. Btw, memory controllers can often support more than one standard – not uncommon at all. In fact, most of AMD's APUs can support HBM per their own whitepapers, and I do believe there was a similar leak last year that was the basis of the rumors that Tonga would launch with HBM.
  • tuxRoller - Wednesday, May 20, 2015 - link

    David Kanter really seemed certain that AMD was going to bring 8GB of HBM.
