Radeon Instinct Hardware: Polaris, Fiji, Vega

Diving deeper into matters, let’s talk about the Radeon Instinct cards themselves. The Instinct cards are for all practical purposes a successor (or spin-off) to AMD’s current FirePro S series cards, so if you are familiar with AMD’s hardware there, then you know what to expect: passively cooled cards geared for large-scale server installations, offered across a range of power and performance options.

As this is a new product line, the Instinct cards don’t have any immediate predecessors in AMD’s FirePro S lineup, but unsurprisingly, AMD has structured their new family of server cards similarly to how NVIDIA has structured their P4/P40/P100 lineup of deep learning cards. All told, AMD is announcing 3 cards today, all 3 of which tap different AMD GPUs and are (roughly) named after their expected performance levels.

AMD Radeon Instinct

                         Instinct MI6   Instinct MI8    Instinct MI25
Memory Type              16GB GDDR5     4GB HBM         "High Bandwidth Cache and Controller"
Memory Bandwidth         224GB/sec      512GB/sec       ?
Single Precision (FP32)  5.7 TFLOPS     8.2 TFLOPS      12.5 TFLOPS
Half Precision (FP16)    5.7 TFLOPS     8.2 TFLOPS      25 TFLOPS
TDP                      <150W          <175W           <300W
Cooling                  Passive        Passive (SFF)   Passive
GPU                      Polaris 10     Fiji            Vega
Manufacturing Process    GloFo 14nm     TSMC 28nm       ?

Starting things off, we have the Radeon Instinct MI6. This is a Polaris 10 card analogous to the consumer RX 480. As Polaris doesn’t have much in the way of special capabilities for deep learning (more on this in a second), AMD is pitching the card as their baseline card for neural network inference (execution). At 5.7 TFLOPS (FP16 or FP32) it will draw under 150W, and while pricing for the family hasn’t been announced, I believe it’s a safe bet that as the baseline card the MI6 will offer the best performance per dollar across the Instinct family.

Meanwhile, in an unexpected move, AMD will be keeping their 2015 Fiji GPU around for the second card, the Instinct MI8. This card is for all intents and purposes a rebranded Radeon R9 Nano, AMD’s power-tuned Fiji card that has proven quite popular with their server customers. Within the Instinct lineup, it is essentially an unusual variant of the MI6, offering higher throughput and greatly increased memory bandwidth for only a small increase in power consumption, with the drawback of Fiji’s 4GB VRAM limitation. Since it offers better performance than the MI6 and is smaller to boot, I expect we’ll see AMD pitch the MI8 as a premium alternative for inference.

The MI6 and MI8 will be going up against NVIDIA’s P4 and P40 accelerators. AMD’s cards don’t directly line up against the NVIDIA cards in power consumption or expected performance, so the competitive landscape is somewhat broad, but those are the cards AMD will need to dethrone in the inference landscape. One potential issue here, and one I’m waiting to see if and how AMD addresses closer to the launch of the Instinct family, is the lack of high-speed modes for lower precision operations. The competing Tesla cards can process 8-bit integer (INT8) operations at up to 4x speed, something the MI6 and MI8 Instinct cards can’t do. INT8 is something of a special case, but if NVIDIA’s expectations for inferencing with INT8 come to pass, then it means AMD has to compete more strongly on price than performance.
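The "up to 4x" figure is consistent with simple lane counting: four 8-bit integers occupy the same space as one 32-bit value. What follows is a minimal Python sketch of that idea, assuming hardware that multiplies four INT8 pairs and adds the products to an accumulator in a single operation. The function names are hypothetical, and the code emulates only the arithmetic, not any vendor's actual instruction.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    # "<4b" = four signed bytes; "<I" = one unsigned 32-bit integer.
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def unpack_int8x4(word):
    """Recover the four signed 8-bit lanes from a 32-bit word."""
    return struct.unpack("<4b", struct.pack("<I", word))


def dot4_accumulate(a_word, b_word, acc=0):
    """Emulate one 4-way INT8 multiply-accumulate.

    Four products are formed and summed per call, where a plain
    32-bit multiply-add would handle a single pair of operands.
    Python integers do not overflow, so accumulator wrap-around
    is deliberately not modeled here.
    """
    a = unpack_int8x4(a_word)
    b = unpack_int8x4(b_word)
    return acc + sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    weights = pack_int8x4([1, -2, 3, -4])
    inputs = pack_int8x4([5, 6, -7, 8])
    print(dot4_accumulate(weights, inputs, acc=10))
```

Each call does the work of four scalar multiply-adds on a 32-bit operand pair, which is where a peak rate of four times the FP32 figure would come from on hardware with such a mode.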

Last, but certainly not least, in the Instinct family is the most powerful card of them all, and arguably the cornerstone of what the family is meant to become: the MI25. This is based on AMD’s forthcoming Vega GPU family, and while AMD is not sharing much in the way of new details on Vega today, they are leaving no doubts that this is going to be a high performance card. The passively cooled card is rated for sub-300W operation, and based on AMD performance projections elsewhere, AMD makes it clear that they’re targeting 25 TFLOPS FP16 (12.5 TFLOPS FP32) performance.

Significantly, one of the few things AMD is saying about Vega right now is that it supports packed math formats for FP16 operations. This capability first appeared in Sony’s PlayStation 4 Pro, with a strong hint that it was a feature of a future AMD architecture, and now this has been confirmed.

With AMD pitching the MI25 as a training accelerator, offering a packed math mode for FP16 is critical to the product. Neural network training very rarely requires higher-precision FP32 math, which is otherwise the default for GPUs. Instead, FP16 is suitably precise for a process that is inherently imprecise, and as a result offering a fast FP16 mode makes the card significantly faster at its intended task. Combine this with the already high throughput of GPUs, which comes from their wide arrays of ALUs, and you have what makes GPUs so potent at neural network training.
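The 2x relationship in AMD's figures (25 TFLOPS FP16 against 12.5 TFLOPS FP32) is what one would expect if each 32-bit lane carries two FP16 values. Below is a minimal Python sketch of that layout, using the standard library's half-precision `struct` format. The function names are hypothetical, and the code emulates only the data layout and the arithmetic, not Vega's actual instructions, which AMD has not detailed.

```python
import struct
from typing import Tuple


def pack_fp16_pair(a: float, b: float) -> int:
    """Pack two half-precision floats into one 32-bit word."""
    # "<2e" = two IEEE 754 half-precision values, 4 bytes in total,
    # the same footprint as a single FP32 value.
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def unpack_fp16_pair(word: int) -> Tuple[float, float]:
    """Recover both FP16 lanes from a 32-bit word."""
    return struct.unpack("<2e", struct.pack("<I", word))


def packed_add(x: int, y: int) -> int:
    """Emulate one packed operation: add both FP16 lanes at once."""
    x0, x1 = unpack_fp16_pair(x)
    y0, y1 = unpack_fp16_pair(y)
    # Each sum is rounded back to FP16 when it is repacked.
    return pack_fp16_pair(x0 + y0, x1 + y1)


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest representable FP16 value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    a = pack_fp16_pair(1.5, -2.0)
    b = pack_fp16_pair(0.25, 4.0)
    print(unpack_fp16_pair(packed_add(a, b)))
    # FP16 keeps roughly three decimal digits, hence "suitably precise".
    print(to_fp16(0.1), abs(to_fp16(0.1) - 0.1))
```

One packed operation updates two values, so the peak FP16 rate is twice the FP32 rate for the same number of ALU lanes. The last line shows the price: a value such as 0.1 is stored with a small rounding error, which training workloads generally tolerate.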

As AMD’s sole training card, the MI25 will be going up against NVIDIA’s flagship accelerator, the Tesla P100. And as opposed to the inference cards, this has the potential to be a much closer fight. AMD has parity on packed instructions, with performance that on paper would exceed the P100. AMD has yet to fully unveil what Vega can do – we have no idea what “NCU” stands for or what AMD’s “high bandwidth cache and controller” are all about – but on the surface there’s the potential for the kind of knock-down fight at the top that makes for an interesting spectacle. And for AMD the stakes are huge; even if they can’t necessarily win, being able to price the MI25 even remotely close to the P100 would give them huge margins. More practically speaking, it means they could afford to significantly undercut NVIDIA in this space to capture market share while still making a tidy profit.
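For a rough sense of where each card sits on the efficiency curve, the table's throughput and TDP figures can be turned into throughput per watt. The Python sketch below uses only the numbers from the table above. Since AMD quotes each TDP as an upper bound ("<150W" and so on), the results are best read as lower bounds on peak efficiency, and competitor figures are left out because the article's table does not list them. The function and variable names are illustrative.

```python
# Peak throughput (TFLOPS) and TDP ceiling (watts) from the table above.
CARDS = {
    "MI6": {"fp32": 5.7, "fp16": 5.7, "tdp_w": 150},
    "MI8": {"fp32": 8.2, "fp16": 8.2, "tdp_w": 175},
    "MI25": {"fp32": 12.5, "fp16": 25.0, "tdp_w": 300},
}


def gflops_per_watt(tflops, tdp_w):
    """Convert peak TFLOPS and a TDP in watts into GFLOPS per watt."""
    return tflops * 1000.0 / tdp_w


def efficiency_table(cards):
    """Return peak GFLOPS per watt for FP32 and FP16 for each card."""
    return {
        name: {
            "fp32": gflops_per_watt(spec["fp32"], spec["tdp_w"]),
            "fp16": gflops_per_watt(spec["fp16"], spec["tdp_w"]),
        }
        for name, spec in cards.items()
    }


if __name__ == "__main__":
    for name, eff in efficiency_table(CARDS).items():
        print(f"{name}: FP32 {eff['fp32']:.1f}, FP16 {eff['fp16']:.1f} GFLOPS/W")
```

On these paper numbers, the MI25 is roughly level with the older cards per watt at FP32 and pulls well ahead only when packed FP16 is used, which fits AMD's positioning of it as the training card.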

On a final note, while AMD isn’t commenting on the future of FirePro S or other server GPU products – so it’s not clear if Instinct will be their entire server GPU backbone or only part of it – it’s interesting to note that they are pointing out that one of the ways they intend to stand out from NVIDIA is to not restrict their virtualization support to certain cards.

In other words, if Instinct does end up being AMD’s sole line of server cards, then these cards will be fully capable of serving the virtualization market just as well as the deep learning markets.


39 Comments


  • The_Assimilator - Monday, December 12, 2016 - link

    *AMD recycling products intensifies*

    Well, I guess they gotta do something with all those Fiji chips they produced and nobody wanted.
  • JoeyJoJo123 - Monday, December 12, 2016 - link

    To be fair, it's Fiji with HBM on the die, so it does at least that which the Polaris chips don't.
  • MLSCrow - Monday, December 12, 2016 - link

    Actually, lots of people wanted it, but it was a little too pricey. Aside from that, Google just bought a bunch of these actually and Alibaba may have done the same, even prior to this announcement.
  • Demiurge - Monday, December 12, 2016 - link

    Nvidia does the same thing with Kepler cards and previously Fermi when Kepler was the norm... Also, I would encourage you to try a GT 710 or 730 if you think AMD recycling is bad.
  • evilspoons - Tuesday, December 13, 2016 - link

    I bought a half-height GT 730 when the GK208 version launched, but telling it apart from the GF108 version was idiotic. I can't believe they didn't call it a GT 735 or something. I had to read the core configuration on the side of the box, the clock speeds, and so on to avoid buying the Fermi part (I wanted the Kepler video decoder for my home theatre PC). Bleargh.
  • jjj - Monday, December 12, 2016 - link

    "Naples doesn’t have an official launch date"

    Zen server is Q2.
  • doggface - Monday, December 12, 2016 - link

    Oh nelly. Vega looks to be very interesting...
  • ddriver - Monday, December 12, 2016 - link

    It looks like it won't be setting any efficiency records though. Adding the interconnect to maximize FP16 throughput guts efficiency as expected.

    The result is that for FP32 for Fiji we have 8.2 tflops in 175W budget at 28nm and for Vega 12.5 tflops in 300W budget at 14nm.

    In other words, process is scaled down twice, TDP budget is increased almost twice, but performance gains are only 52% or so. That's fairly modest. I'd expect even if not mature yet, process alone ought to result in a 40% boost at the very least, and the expanded TDP headroom another 50%, so close to 90% at the very least. But that's just the cost of maximizing FP16 throughput. For their own sake I hope this Instinct will be a different chip overall rather than just re-branding, cuz that would mean the workstation and compute workflows will needlessly suffer for the sake of a feature that is irrelevant in those fields.
  • Drumsticks - Monday, December 12, 2016 - link

    It's not too bad, I think. The P100 offers 18.7 TF of half precision performance at about 250W, so AMD in theory is ahead of Nvidia on the efficiency curve here, offering around 35% more FLOPs for 20% more power. Now, AMD TF != Nvidia TF, especially in gaming, but there's probably a chance to expect that AMD could achieve better hardware efficiency in a compute environment than in a gaming one.
  • Yojimbo - Monday, December 12, 2016 - link

    I don't think it's correct to compare the efficiency of the MI25 with the P100. Rather it should be compared efficiency-wise with the P40, as strong FP64 is not something that's been mentioned for the MI25 as far as I see. Note that the P40 uses GDDR5 and not HBM2, which reduces its efficiency. I know the P40 doesn't have FP16 support but I don't think the MI25 will really be competing much with the Pascal generation of Tesla cards except after they are offered at a lower price once the Volta generation of cards are available. These Radeon cards are not just drop-in replacements for NVIDIA's hardware. Even assuming AMD can produce the MI25 in volume in Q2 2017, it will take a bit of testing and validation before people are willing to use it en masse in servers. Users also have to think about software and middleware considerations.

    In any case, they seem to be claiming efficiency close to the P40, which is a bit surprising. What we do know is that AMD claimed strong efficiency with Polaris before it was released and they overstated their claims. For me, I am taking their claims with a grain of salt until the product is actually released.
