History: Where GDDR5 Reaches Its Limits

To really understand HBM we’d have to go all the way back to the first computer memory interfaces, but in the interest of expediency and sanity, we’ll condense that lesson down to the following. The history of computer and memory interfaces is a consistent cycle of moving between wide parallel interfaces and fast serial interfaces. Serial ports and parallel ports, USB 2.0 and USB 3.1 (Type-C), SDRAM and RDRAM: there is a continual process of developing faster interfaces, then developing wider interfaces, and switching back and forth between them as conditions dictate.

So far in the race for PC memory, the pendulum has swung far in the direction of serial interfaces. Through 4 generations of GDDR, memory designers have continued to ramp up clockspeeds in order to increase available memory bandwidth, culminating in GDDR5 and its blistering 7Gbps+ per pin data rate. GDDR5 in turn has been with us on the high-end for almost 7 years now, longer than any previous memory technology, and in the process has gone farther and faster than initially planned.

But in the cycle of interfaces, the pendulum has finally reached its apex for serial interfaces when it comes to GDDR5. Back in 2011 at an AMD video card launch I asked then-graphics CTO Eric Demers about what happens after GDDR5, and while he expected GDDR5 to continue on for some time, it was also clear that GDDR5 was approaching its limits. High speed buses bring with them a number of engineering challenges, and while there is still headroom left on the table to do even better, the question arises of whether it’s worth it.

AMD 2011 Technical Forum and Exhibition

The short answer in the minds of the GPU community is no. GDDR5-like memories could be pushed farther, both with existing GDDR5 and theoretical differential I/O based memories (think USB/PCIe buses, but for memory), however doing so would come at the cost of greatly increased power consumption. In fact even existing GDDR5 implementations already draw quite a bit of power; thanks to the complicated clocking mechanisms of GDDR5, a lot of memory power is spent merely on distributing and maintaining GDDR5’s high clockspeeds. Any future GDDR5-like technology would only ratchet up the problem, along with introducing new complexities such as a need to add more logic to memory chips, a somewhat painful combination as logic and dense memory are difficult to fab together.

The current GDDR5 power consumption situation is such that, by AMD’s estimate, 15-20% of the Radeon R9 290X’s (250W TDP) power consumption goes to memory. This is even after the company went with a wider, slower 512-bit GDDR5 memory bus clocked at 5GHz so as to better contain power consumption. Moving to an even faster, higher-power memory standard would only serve to exacerbate that problem.
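To put those figures in perspective, here is a quick back-of-the-envelope sketch. The 512-bit bus width, 5Gbps effective per-pin data rate, 250W TDP, and 15-20% memory power share come from AMD's figures above; the arithmetic itself is ours.

```python
# Rough numbers for the Radeon R9 290X memory subsystem,
# based on the figures quoted in the article.

bus_width_bits = 512     # width of the GDDR5 memory bus
data_rate_gbps = 5       # effective per-pin data rate ("5GHz" GDDR5)
tdp_watts = 250          # board TDP

# Peak memory bandwidth: bus width x per-pin rate, converted to bytes.
bandwidth_gbs = bus_width_bits * data_rate_gbps / 8
print(f"Peak bandwidth: {bandwidth_gbs:.0f} GB/s")

# AMD's estimated memory share (15-20%) of board power.
mem_power_low = 0.15 * tdp_watts
mem_power_high = 0.20 * tdp_watts
print(f"Memory power: {mem_power_low:.1f}-{mem_power_high:.1f} W")
```

That works out to 320GB/s of peak bandwidth at a cost of roughly 37.5-50W, which is why spending still more power to push per-pin rates higher looks increasingly unattractive.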

All the while power consumption for consumer devices has been on a downward slope as consumers (and engineers) have made power consumption an increasingly important issue. The mobile space, with its fixed battery capacity, is of course the prime example, but even in the PC space power consumption for CPUs and GPUs has peaked and since come down some. The trend is towards more energy efficient devices – the idle power consumption of a 2005 high-end GPU would be intolerable in 2015 – and that throws yet another wrench into faster serial memory technologies, as power consumption would be going up exactly at the same time as overall power consumption is expected to come down, and individual devices get lower power limits to work with as a result.

Finally, coupled with all of the above are issues with scalability. We’ll get into this more when discussing the benefits of HBM, but in a nutshell GDDR5 also ends up taking a lot of space, especially when we’re talking about 384-bit and 512-bit configurations for current high-end video cards. At a time when everything is getting smaller, there is also a need to further miniaturize memory, something that GDDR5 and potential derivatives wouldn’t be well suited to resolve.

The end result is that in the GPU memory space, the pendulum has started to swing back towards parallel memory interfaces. GDDR5 has been taken to the point where going any further would be increasingly inefficient, leading to researchers and engineers looking for a wider next-generation memory interface. This is what has led them to HBM.

Comments

  • WinterCharm - Tuesday, May 19, 2015 - link

    Exactly. And science and common sense have shown again and again that if you eliminate the bottlenecks, you can get significant performance gains.

    That's why SSDs are so great.
  • AndrewJacksonZA - Thursday, May 28, 2015 - link

    What @dew111 said.
  • i7 - Tuesday, May 19, 2015 - link

    Wouldn't you see higher memory configs much like the 970 memory config 'fiasco' with greater than 4GB on another substrate or another entirely different configuration?
  • dew111 - Tuesday, May 19, 2015 - link

    No. The current HBM stacks come in a fixed capacity, and the Fiji chip will only have so many lanes. Also, it is unlikely an OEM would venture into designing (and funding) their own interposer; this probably won't happen for at least a few years (if ever).
  • akamateau - Monday, June 8, 2015 - link

    Actually an OEM cannot design an interposer with a memory controller. AMD owns that patent.

    Interposer having embedded memory controller circuitry
    US 20140089609 A1
    " For high-performance computing systems, it is desirable for the processor and memory modules to be located within close proximity for faster communication (high bandwidth). Packaging chips in closer proximity not only improves performance, but can also reduce the energy expended when communicating between the processor and memory. It would be desirable to utilize the large amount of "empty" silicon that is available in an interposer. "

    AMD has pretty much sewn up the concept of an interposer being just a substrate with vias to stack and connect silicon.

    Besides, it would also be unlikely for an OEM to be able to purchase unpackaged CPU or memory silicon for their own stacks. And why would they? Their manufacturing costs would be far higher.
  • eachus - Friday, May 22, 2015 - link

    Don't forget the HBM1 vs. HBM2 change/upgrade that is coming. Will HBM2 show up late this year? Or early next year? Your guess. AMD will then be able to ship cards with twice the bandwidth--and four times the memory. My guess is that AMD plans a "mid-life kicker" for Fiji later this year taking it to 8 GBytes but still at HBM1 clock speeds. Then Greenland comes along with 16 Gig and HBM2 speeds.

    BTW don't knock the color compression technology. It makes (slightly) more work for the GPU, but reduces memory and bandwidth requirements. When working at 4K resolutions and beyond, it becomes very significant.
  • chizow - Tuesday, May 19, 2015 - link

    GTA5 does go over 4GB at 1440p, as do a number of other next-gen games like Assassin's Creed Unity, Shadow of Mordor, and Ryse; I am sure Witcher 3 does as well. 6GB is probably safe for this gen until 14/16nm FinFET, 8GB safest, 12GB if you want no doubts. We also don't know what DX12 is going to do to VRAM requirements.

    It's not about fitting the actual frame buffer, it's about holding and storing textures locally in VRAM so that the GPU has access to them without going to system RAM or, worse, local storage. Hi-res 4K and 8K textures are becoming more common, which increases storage footprint 4-fold and 16-fold over 2K, so more VRAM is always going to be welcome.
  • silverblue - Tuesday, May 19, 2015 - link

    That compression had better be good, then.
  • testbug00 - Tuesday, May 19, 2015 - link

    According to NVIDIA, without GameWorks a 980 is the recommended card for 1440p at max settings, and with GameWorks it's a Titan X or SLI 970s.

    For 2160p without GameWorks they recommend a Titan X or SLI 980s. Even at 2160p with GameWorks they still recommend 980 SLI.

    Based on that my WAG is that TWIII uses under 4GB of VRAM at 2160. I'm guessing bringing Gameworks in pushes it just near the 4GB limit on 980. Probably in the 39xx range.
  • chizow - Tuesday, May 19, 2015 - link

    Can't say for sure as I don't have TW3 yet, but based on screenshots I wouldn't be surprised at all to see it break 4GB. In any case, games and drivers will obviously do what they can to work around any VRAM limitations, but as we have seen, it is not an ideal situation. I had a 980 and 290X long enough to know there were plenty of games dancing close enough to that 4GB ceiling at 1440p to make it too close for comfort.
