One of the more interesting consequences of GPUs being built on TSMC’s 28nm process for an extended period of time is that it has forced both vendors to compensate and compromise in order to have product lines that cover the nearly 5 year span. Traditional upgrade cycles got thrown out the window, and instead we saw a number of refreshes and updates, culminating in both AMD and NVIDIA taking their top GPUs right to the 28nm reticle limit of ~600mm². Such large GPUs have typically been the crossover point between graphics and compute parts, incorporating high-end features such as ECC memory and faster double precision (FP64) compute capabilities. However, for the reticle riders, AMD and NVIDIA went another route, building what are arguably the ultimate graphics GPUs with the highest FP32 performance possible.

I mention this because it puts the GPU vendors into the position of doing unconventional things with their GPUs. Nowhere is this more evident than in the new FirePro card AMD is announcing today. The FirePro S9300 X2 is the latest entry into the FirePro S series lineup, and it marks the first (and possibly only) time we’ll see AMD’s Fiji GPU used to power an HPC-grade compute card. The end result is an interesting product that at times will be wickedly powerful for a 300W card, and at other times will have to cope with the abilities and limitations of a GPU that wasn’t designed for the traditional HPC market.

AMD FirePro S Series Specification Comparison

| | FirePro S9300 X2 | FirePro S9170 | FirePro S9150 | FirePro S9000 |
|---|---|---|---|---|
| Stream Processors | 2 x 4096 | 2816 | 2816 | 1792 |
| Boost Clock | 850MHz | 930MHz | 900MHz | 900MHz |
| Memory Clock | 1Gbps HBM | 5Gbps GDDR5 | 5Gbps GDDR5 | 5.5Gbps GDDR5 |
| Memory Bus Width | 2 x 4096-bit | 512-bit | 512-bit | 384-bit |
| VRAM | 2 x 4GB | 32GB | 16GB | 6GB |
| FP32 | 13.9 TFLOPs | 5.2 TFLOPs | 5.1 TFLOPs | 3.2 TFLOPs |
| FP64 | 0.8 TFLOPs (1/16) | 2.6 TFLOPs (1/2) | 2.5 TFLOPs (1/2) | 0.8 TFLOPs (1/4) |
| Transistor Count | 2 x 8.9B | 6.2B | 6.2B | 4.31B |
| TDP | 300W | 275W | 235W | 225W |
| Cooling | Passive | Passive | Passive | Passive |
| Target Market | HPC | HPC | HPC | HPC + VDI |
| Manufacturing Process | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 28nm |
| Architecture | GCN 1.2 | GCN 1.1 | GCN 1.1 | GCN 1.0 |
| GPU | Fiji | Hawaii | Hawaii | Tahiti |
| Launch Date | Q2 2016 | 07/2015 | 08/2014 | 08/2012 |
| Launch Price | $5999 | $3999 | N/A | N/A |

As alluded to by the name, the S9300 X2 is a dual Fiji card, integrating a pair of AMD’s last and most powerful 28nm GPUs. In the interests of delivering a more efficient 300W card, AMD clocks the S9300 X2’s GPUs at 850MHz, giving the card a theoretical 13.9 TFLOPs of FP32 compute performance. Meanwhile, on the memory side, AMD leaves the card’s HBM memory untouched, with each GPU getting 512GB/sec of memory bandwidth, for an aggregate 1TB/sec of bandwidth. Like its graphics counterpart, the Radeon Pro Duo, the S9300 X2 is designed to be the fastest thing available in a single card, at least for the niche where Fiji shines.
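For readers who want to check those headline figures, both follow from the specifications in the table. The sketch below assumes the usual convention of counting one fused multiply-add as two floating-point operations per stream processor per clock; the function names are illustrative only and not part of any AMD tool.

```python
def peak_fp32_tflops(stream_processors, clock_mhz, gpus=1, flops_per_clock=2):
    """Theoretical FP32 throughput in TFLOPs.

    flops_per_clock=2 is an assumption: one fused multiply-add per
    stream processor per cycle, counted as two operations.
    """
    return gpus * stream_processors * flops_per_clock * clock_mhz * 1e6 / 1e12


def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/sec for a single GPU."""
    return bus_width_bits * data_rate_gbps / 8


# S9300 X2: two Fiji GPUs, 4096 stream processors each, 850MHz
s9300_x2_fp32 = peak_fp32_tflops(4096, 850, gpus=2)

# Per-GPU HBM: 4096-bit bus at 1Gbps per pin
per_gpu_bw = memory_bandwidth_gbs(4096, 1.0)

print(f"FP32: {s9300_x2_fp32:.1f} TFLOPs")
print(f"Bandwidth: {per_gpu_bw:.0f} GB/sec per GPU, "
      f"{2 * per_gpu_bw:.0f} GB/sec aggregate")
```

The same function reproduces the single-GPU entries as well; for example, 2816 stream processors at 930MHz works out to roughly the 5.2 TFLOPs listed for the S9170.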

Since Fiji made its consumer debut nine months ago, I have been pondering whether AMD would attempt to deploy it in a FirePro card. Fiji is arguably built for graphics first and foremost; its FP64 performance is capped at 1/16th FP32 performance, it lacks ECC memory, and it’s limited to just 4GB of memory per GPU. Given the expectations set by “traditional” HPC cards such as the FirePro S9170 – which offers 4-8x the memory and 3x the FP64 performance – Fiji seemingly can’t stack up. However, in building the ultimate graphics GPU, AMD also built the ultimate FP32 compute GPU – one that on paper delivers far more FP32 performance than any other HPC card – and this is where the company will be running with this card.
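The "4-8x the memory and 3x the FP64 performance" comparison can be reproduced from the table's figures. This is only a back-of-the-envelope sketch; the dictionary layout is made up for illustration, and the 4x versus 8x spread depends on whether the S9300 X2's memory is counted per card or per GPU.

```python
# Figures copied from the specification table; the structure is illustrative.
s9170 = {"vram_gb": 32, "fp64_tflops": 2.6}
s9300_x2 = {"vram_gb_per_gpu": 4, "gpus": 2, "fp64_tflops": 0.8}

# Memory: compare against the whole card (2 x 4GB) and against a single GPU,
# since each Fiji GPU can only address its own 4GB directly.
total_vram = s9300_x2["vram_gb_per_gpu"] * s9300_x2["gpus"]
vram_ratio_per_card = s9170["vram_gb"] / total_vram
vram_ratio_per_gpu = s9170["vram_gb"] / s9300_x2["vram_gb_per_gpu"]

# FP64: a single Hawaii GPU versus both Fiji GPUs combined.
fp64_ratio = s9170["fp64_tflops"] / s9300_x2["fp64_tflops"]

print(f"Memory: {vram_ratio_per_card:.0f}x per card, {vram_ratio_per_gpu:.0f}x per GPU")
print(f"FP64: {fp64_ratio:.2f}x")
```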

The end result is that the S9300 X2 is an interesting niche product designed for certain market segments that need strong FP32 performance above all else – and, everything else held equal, don’t use massive data sets. It’s a somewhat narrow niche as a result, but one AMD believes they can do very well in given what kind of FP32 performance the S9300 X2 is capable of, especially as NVIDIA doesn’t have an FP32 HPC-focused dual-GPU card of their own.

If you follow the HPC market then the market segments AMD is going after should sound familiar to you. Oil and gas (geosciences) has long been an FP32-centric field – something NVIDIA exploited a few years back as well with the Tesla K10 – and AMD will be chasing after this market with the S9300 X2. AMD will also be trying to push further into the neural network market, and this is an area where the S9300 X2 may be uniquely suited. Popular GPU neural network implementations don’t use FP32 math; rather, they use even lower precision FP16 math. And though the S9300 X2’s FP16 throughput is merely equal to its FP32 throughput, internally Fiji supports natively storing FP16 data types, which will significantly reduce register pressure on the card, and register pressure is almost always a concern for HPC kernel development.
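To put the FP16 storage trade-off in concrete terms, here is a rough sketch using Python's standard `struct` module, whose `e` format code is IEEE 754 half precision. It only illustrates the size and precision of the two data types and how many values fit in a 4GB budget; it does not model Fiji's register file or how a GPU compiler allocates registers. Treating 4GB as 4 x 1024^3 bytes is an assumption.

```python
import struct

# Size in bytes of one value in each format ("<" disables padding).
FP32_BYTES = struct.calcsize("<f")
FP16_BYTES = struct.calcsize("<e")

# Assumption: the 4GB per GPU is taken as 4 * 1024**3 bytes.
VRAM_BYTES = 4 * 1024**3


def values_that_fit(bytes_per_value, budget=VRAM_BYTES):
    """How many values of a given size fit in the memory budget."""
    return budget // bytes_per_value


# Round-trip 0.1 through half precision to show the cost in accuracy.
fp16_roundtrip = struct.unpack("<e", struct.pack("<e", 0.1))[0]

print(f"FP32: {FP32_BYTES} bytes/value, {values_that_fit(FP32_BYTES):,} values in 4GB")
print(f"FP16: {FP16_BYTES} bytes/value, {values_that_fit(FP16_BYTES):,} values in 4GB")
print(f"0.1 stored as FP16 reads back as {fp16_roundtrip}")
```

Halving the bytes per value doubles how much data fits in the same space, at the price of a much coarser representation, which is the bargain neural network workloads are typically willing to make.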

AMD will also be looking to exploit the products of their Boltzmann Initiative – now formally called the Radeon Open Compute Platform (ROCm) – which will be near or at production quality by the time the S9300 X2 ships. With AMD’s newest card providing the necessary muscle at the hardware level, the company is looking towards ROCm’s heterogeneous compiler to close the gap with NVIDIA on the software side, with the HIPify tools to further bridge that gap by giving developers the means to port their CUDA applications over to AMD’s platform. AMD has already seen some success with ROCm with the geosciences firm CGG, and they’re hoping to continue this trend as the ROCm platform reaches production quality.

Wrapping things up, when it’s released the S9300 X2 will take its place alongside the rest of AMD’s FirePro S series lineup. Continuing to ship alongside it will be the S9100 series cards, which are based on AMD’s Hawaii GPU and complement the S9300 X2 with traditional HPC-centric features such as ECC memory and high performance FP64. The FirePro S9300 X2 will be shipping this quarter with an MSRP of $5999.

19 Comments

  • Pork@III - Thursday, March 31, 2016

    Too small VRAM and tooooo small FP64
  • bill.rookard - Thursday, March 31, 2016

    Depends on the job at hand. Yes, FP64 is not good, but FP32/FP16 should be phenomenally good.
  • Drumsticks - Thursday, March 31, 2016

    I think you missed the point of the entire article...
  • ImSpartacus - Thursday, March 31, 2016

    He's just trolling. Anyone that reads the article gets it immediately.
  • xthetenth - Thursday, March 31, 2016

    An amazing card in a niche is worth releasing, especially for AMD because they're not the default choice in the market. So it's better for them to have something that really stands out in a niche than an also-ran generalist that can be safely ignored.
  • ChefJeff789 - Thursday, March 31, 2016

    Agreed. If you really need this, you'll be willing to pay the higher cost, which is great for AMD and for the consumer, since they get a killer card for the job.
  • Samus - Thursday, March 31, 2016

    Nobody is buying this for Solidworks, Pork. RTFA.
  • gruffi - Thursday, March 31, 2016

    If the performance is good it's irrelevant if the VRAM is small or large. And the card targets FP16 and FP32 areas. Have you even read the article?
  • Marcelo Viana - Thursday, March 31, 2016

    not gruffi, on servers memory comes first than performance, since you can put as many cards as you can, but the memory, limit the size of the job you can do on the entire cluster.
    But as the article says, have a niche where 4GVram is enough, so the performance comes very handy.
  • prtskg - Friday, April 01, 2016

    And yet it already has secured a contract -
    http://www.amd.com/en-us/press-releases/Pages/fire...
