As part of this evening’s AMD Capsaicin event (more on that later), AMD’s Chief Architect and SVP of the Radeon Technologies Group has announced a new Radeon Pro card unlike anything else. Dubbed the Radeon Pro Solid State Graphics (SSG), this card includes M.2 slots for adding NAND SSDs, with the goal of vastly increasing the amount of local storage available to the video card.

Details are a bit thin and I’ll update this later this evening, but in short the card utilizes a Polaris 10 Fiji GPU and includes 2 PCIe 3.0 M.2 slots for adding flash drives to the card. These slots are then attached to the GPU (it’s unclear if there’s a PCIe switch involved or if it’s wired directly), which the GPU can then use as an additional tier of storage. I’m told that the card can fit at least 1TB of NAND – likely limited by M.2 MLC SSD capacities – which massively increases the amount of local storage available on the card.

As AMD explains it, the purpose of going this route is to offer another solution to the workset size limitations of current professional graphics cards. Even AMD’s largest card currently tops out at 32GB, and while this is a fair amount, there are workloads that can use more. This is particular the case for workloads with massive datasets (oil & gas), or as AMD demonstrated, scrubbing through an 8K video file.

Current cards can spill over to system memory, and while the PCIe bus is fast, it’s still much slower than local memory, plus it is subject to the latency of the relatively long trip and waiting on the CPU to address requests. Local NAND storage, by comparison, offers much faster round trips, though on paper the bandwidth isn’t as good, so I’m curious to see just how it compares to the real world datasets that spill over to system memory.  Meanwhile actual memory management/usage/tiering is handled by a combination of the drivers and developer software, so developers will need to code specifically for it as things stand.

For the moment, AMD is treating the Radeon Pro SSG as a beta product, and will be selling developer kits for it directly., with full availability set for 2017. For now developers need to apply for a kit from AMD, and I’m told the first kits are available immediately. Interested developers will need to have saved up their pennies though: a dev kit will set you back $9,999.

Update:

Now that AMD’s presentation is over, we have a bit more information on the Radeon Pro SSG and how it works.

In terms of hardware, the Fiji based card is outfit with a PCIe bridge chip – the same PEX8747 bridge chip used on the Radeon Pro Duo, I’m told – with the bridge connecting the two PCIe x4 M.2 slots to the GPU, and allowing both cards to share the PCIe system connection. Architecturally the prototype card is essentially a PCIe SSD adapter and a video card on a single board, with no special connectivity in use beyond what the PCIe bridge chip provides.

The SSDs themselves are a pair of 512GB Samsung 950 Pros, which are about the fastest thing available on the market today. These SSDs are operating in RAID-0 (striped) mode to provide the maximum amount of bandwidth. Meanwhile it turns out that due to how the card is configured, the OS actually sees the SSD RAID-0 array as well, at least for the prototype design.

To use the SSDs, applications need to be programmed using AMD’s APIs to recognize the existence of the local storage and that it is “special,” being on the same board as the GPU itself. Ultimately the trick for application developers is directly streaming resources from  the SSDs treating it as a level of cache between the DRAM and system storage. The use of NAND in this manner does not fit into the traditional memory hierarchy very well, as while the SSDs are fast, on paper accessing system memory is faster still. But it should be faster than accessing system storage, even if it’s PCIe SSD storage elsewhere on the system. Similarly, don’t expect to see frame buffers spilling over to NAND any time soon. This is about getting large, mostly static resources closer to the GPU for more efficient resource streaming.

To showcase the potential benefits of this solution, AMD had an 8K video scrubbing demonstration going, comparing performance between using a source file on the SSG’s local SSDs, and using a source file on the system SSD (also a 950 Pro).

The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec, while reading that same file from the system SSD was only averaging under 900MB/sec, which is lower than what we know 950 Pro can do in sequential reads. After putting some thought into it, I think AMD has hit upon the fact that most M.2 slots on motherboards are routed through the system chipset rather than being directly attached to the CPU. This not only adds another hop of latency, but it means crossing the relatively narrow DMI 3.0 (~PCIe 3.0 x4) link that is shared with everything else attached to the chipset.

Though by and large this is all at the proof of concept stage. The prototype, though impressive in some ways in its own right, is really just a means to get developers thinking about the idea and writing their applications to be aware of the local storage. And this includes not just what content to put on the SSG's SSDs, but also how to best exploit the non-volatile nature of its storage, and how to avoid unnecessary thrashing of the SSDs and burning valuable program/erase cycles. The SSG serves an interesting niche, albeit a limited one: scenarios where you have a large dataset and you are somewhat sensitive to latency and want to stay off of the PCIe bus, but don't need more than 4-5GB/sec of read bandwidth. So it'll be worth keeping an eye on this to see what developers can do with it.

In any case, while AMD is selling dev kits now, expect some significant changes by the time we see the retail hardware in 2017. Given the timeframe I expect we’ll be looking at much more powerful Vega cards, where the overall GPU performance will be much greater, and the difference in performance between memory/storage tiers is even more pronounced.

Source: AMD

Comments Locked

120 Comments

View All Comments

  • MLSCrow - Tuesday, July 26, 2016 - link

    I'm really sorry that you don't see a point to this. I'm equally sorry that you feel that working on a new and brilliantly innovative technology that can provide benefit for certain situations has no "practical applications" and is completely "pointless". If you cannot see any benefit to this, then I'm sorry for you lack of vision.

    To spend money on R&D on a new and innovative product that doesn't necessarily target a mainstream market really isn't the move of a "desperate" company. It's the move of a company with vision that is moving forward, a company that is willing to spend money on inventing new things, which realistically, no desperate company would waste money on. A desperate company would be as stringent and efficient about their bottom line as possible. The fact that they are doing these things is a sign that the company is or is getting back on track and doing well. Their new GPU's are incredible. Their new Zen CPU's are going to be incredible. Their stock is soaring. They're gaining market share and investor confidence as well as confidence in themselves, and it's showing.

    Again, this isn't to replace VRAM, as that is the fastest buffer there is, but it will provide a new and fast buffer between system storage and VRAM that can store much more information that system RAM. Lets say, that on average, the typical power use has 16GB of system RAM. 4K videos can be over 10 times that space. 8K Videos and beyond will push you to over half a Terabyte. Having two SSD's in RAID 0, allows the fastest buffer than can store the entire file, skipping many steps required to send the data to the GPU. This can provide a large benefit to video editors, reducing latency by reducing the amount of hops, potentially reducing stuttering and other related issues and is an option that I feel is a great addition to a GPU. To offer customization to a GPU is a first and honestly, who wouldn't want the option of customization if you can have it?

    I don't need an extra gas tank in my car, I have one already, but if you told me I could have an additional one that stores way more gas than not only my default tank, but more gas than a gas station itself, with better, higher octane gas than I could get from a gas station, which will then improve the performance of my vehicle as well as reduce the frequency with which I need to stop to fill up, pffft, who wouldn't want that? Be realistic. Your negative comments are excessive, leaving one to conclude that you have bias, which then discredits your position.
  • ddriver - Tuesday, July 26, 2016 - link

    I'd say there are more reasons to feel sorry about yourself:

    1 - you don't see how this is technically redundant
    2 - you can't present any meaningful scenario what will show benefit
    3 - you claim that "Their new Zen CPU's are going to be incredible", and while I say "it better be" making such claims reveals you as a fanboy, so your level of intellect is already in question.
    4 - you are awful at making analogies, a more apt analogy would be to integrate a fuel tank into the engine to "bring gas closer to the engine, because that hose to the tank is so long and it has latency" which even you should now is entirely, 100% unnecessary.
    5 - you can't discern the difference between basic concepts such as negativism vs realism, efficiency vs redundancy and innovation vs desperation.

    Once again AMD are wasting budget on unnecessary stuff, and it is not like they are in a position to afford it. This concept could work nicely with something BETTER than flash nand, like for example Optane (again, if it lives up to the hype) - this is HBM all over again, they went for HBM prematurely, and not because they were innovative, but because they were desperate, their GPUs were poor performers and power hogs, and HBM offered a tiny performance boost and TDP drop, but it wasn't a game changer, Fury did not become the king of gaming, nor even a particularly well selling product.

    Few months from now PCIe v4 will be mainstream and it will double the bandwidth, further increasing the amount of data you can stream between PCIe devices (may I add directly without having to go through the CPU), further rendering this whole idea even less beneficial than it already is.
  • MLSCrow - Tuesday, July 26, 2016 - link

    ddriver, you've discredited yourself with your trolling. If you don't like this technology from AMD, then simply don't invest in it. Those that have a need, will. Everything else you have to say is /yawn at this point. Make something better. Prove that it's worthless. Run tests, show your results, and then get back to us about how pointless it is, if you can.
  • close - Wednesday, July 27, 2016 - link

    ddriver doesn't understand this kind of stuff. He reads diagonally and imagines GPU X + Y GB VRAM + SSD must be worse than GPU X + Y GB VRAM + system storage simply because he doesn't understand tiering and thinks that if an SSD is slower than RAM than adding it as a lower tier must somehow slow down the higher tiers o_O.

    As a sidenote to better understand this sorry dude he insists that his "design" for a 5.25" hard drive is better (faster/cheaper) than anything on the market but there's a conspiracy among storage manufacturers to keep HDDs and SSDs small, slow and expensive. Keep this in mind before wasting too much time writing long replies to him. You'll just confuse him. ;)

    Yes indeed he is biased. But not necessarily towards a specific company (although AMD is always the good target) but towards the engineers that actually put products on the market while he twiddles his thumbs in frustration on Anandtech.
  • ddriver - Wednesday, July 27, 2016 - link

    You are a real nitwit aren't you? How hard is it to understand that nand flash is the bottleneck here, and that an SSD on the GPU will not be any faster than a SSD on PCIe? That PCIe v3, much less the upcoming v4 has ample bandwidth and negligible latency which would in no way impede the GPU performance, that PCIe has DMA and GPUs are asynchronous and transferring data will not take away from GPU work cycles - all you have to is implement a basic good old buffering. It is so simple and obvious, yet you fail to grasp it.

    But hey why don't you repeat a few more times how silly is my concept of independent head 5.25" HDDs? I am sure that will make you feel better, since that such a substantiated argument ;) Oh wait, that's right, you haven't and couldn't possibly substantiate it because you lack the intellect to grasp it, much less to discredit it. You know no better than what the greedy and lazy industry crams down your throat to milk you for profit. And anything better is outside of your little cozy box of mainstream conformism, and you are not a big "outside the box thinker", come to think of it, you aren't exactly an "inside the box thinker" either, you are just a repeater, who wouldn't know innovation if it took a dunp on your face :D
  • close - Thursday, July 28, 2016 - link

    No, it actually makes me sad that guys like you think that some shitty napkin calculations are always better than anybody else's, including accomplished engineers with a lot more to show for it then you.

    Which reminds me of this: http://www.commitstrip.com/en/2016/06/02/thank-god...

    Now it's my turn: does insisting with your shitty ideas that somehow never seem to pan out make you feel better about yourself? I mean you're everywhere on AT commenting, using big words but for some reason you always miss the mark. That's some pent up frustration right there.
  • jabber - Friday, July 29, 2016 - link

    Oh don't worry, he'll probably call it revolutionary when Nvidia does it with a Quadro card.
  • fanofanand - Tuesday, August 2, 2016 - link

    With how hard times have been for HDD manufacturers, if they had a silver bullet you don't think they would have used it by now? Any additional moving parts inside a HDD enclosure increases failure points. Get over yourself.
  • msroadkill612 - Saturday, April 29, 2017 - link

    Speaking from down the pike a bit, you have been prescient sir. Well said too.

    I dont get it either. How can u violently hate something they patently dont understand, and then have to bore and bad vibe everybody with it?

    Whats not to like? A GPU with onboard 4GBps+~?, immediately adjacent to the gpu & vram for extra speed, and using ~no system resources. A true graphics co-processor.
  • Samus - Tuesday, July 26, 2016 - link

    Incredibly innovative idea from AMD. I don't expect that much anymore, but I love surprises.

    And although the dev kits are unreasonably priced, at retail there isn't a good reason "SSG" cards would command more than a $200 premium over non-SSG equivalents, especially since this could all be done native within Polaris GPU, the circuitry routing and additional power requirements are negligible, and they probably won't bundle SSD m.2 drives.

    The real issue is going to be potential benefits. It's hard to believe this will be any faster than going over an ultra fast PCIe 3.0 interface to another PCIe 3.0 device (the m.2 drive) just a few inches away. It isn't like any of this stuff is saturating the bus, and NAND is just too slow to be used in latency-intensive tasks where VRAM is required.

    Can't wait for the benchmarks.

Log in

Don't have an account? Sign up now