The Fiji GPU: Go Big or Go Home

Now that we’ve had a chance to take a look at the architecture backing Fiji, let’s talk about the Fiji GPU itself.

Fiji’s inclusion of High Bandwidth Memory (HBM) technology complicates the picture somewhat when talking about GPUs. Whereas past GPUs were defined by the GPU die itself and then the organic substrate package it sits on, the inclusion of HBM requires a third layer, the silicon interposer. The job of the interposer is to sit between the package and the GPU, serving as the layer that connects the on-package HBM memory stacks with the GPU. Essentially a very large chip without any expensive logic on it, the silicon interposer allows for finer, denser signal routing than organic packaging is capable of, making the ultra-wide 4096-bit HBM bus viable for the first time.

We’ll get to HBM in detail in a bit, but it’s important to call out HBM and the interposer early, since they distinctly shape how Fiji was designed and what its capabilities are.

As for Fiji itself, it is unlike any GPU AMD has built before, and not only because of HBM. More than anything else, it’s simply huge: 596mm2, to be precise. As we mentioned in our introduction, AMD has traditionally shied away from big chips, even after the “small die” era ended, and for good reason. Big chips are expensive to produce, take longer and cost more to develop, and yield worse than small chips (this being especially the case early on for 40nm). Altogether they’re riskier than smaller chips, and while there are times when they are necessary, AMD had never reached that point until now.

The end result is that for the first time since the unified shader era began, AMD has gone toe-to-toe with NVIDIA on die size. Fiji’s 596mm2 die size is just 5mm2 (<1%) smaller than NVIDIA’s GM200, and more notably still hits TSMC’s 28nm reticle limit. TSMC can’t build chips any bigger than this; Fiji is as big a chip as AMD can order.

AMD Big GPUs
GPU (Architecture)    Die Size    Native FP64 Rate
Fiji (GCN 1.2)        596mm2      1/16
Hawaii (GCN 1.1)      438mm2      1/2
Tahiti (GCN 1.0)      352mm2      1/4
Cayman (VLIW4)        389mm2      1/4
Cypress (VLIW5)       334mm2      1/5
RV790 (VLIW5)         282mm2      N/A

Looking at Fiji relative to AMD’s other big GPUs, it becomes very clear very quickly just how significant this change is for AMD. When Hawaii was released in 2013 at 438mm2, it was already AMD’s biggest GPU ever for its time. And yet Fiji dwarfs it, coming in at 158mm2 (36%) larger. The fact that Fiji comes in the latter half of the 28nm process’s lifetime means that such a large GPU is not nearly as risky now as it would have been in 2011/2012 (NVIDIA surely took some licks internally on GK110), but still, nothing else we can show you today can really sell the significance of Fiji to AMD as much as the die size can.
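The die size comparisons above are simple arithmetic on the table's figures, and a short Python sketch can reproduce them. Note that the GM200 figure is not quoted directly in this section; it is inferred here from the statement that Fiji is 5mm2 smaller, and the helper name `growth_over` is purely illustrative.

```python
# Die sizes in mm^2. AMD values come from the "AMD Big GPUs" table.
# GM200 is an inferred value (assumption): the text says Fiji is 5mm2 smaller.
DIE_SIZE_MM2 = {
    "Fiji": 596,
    "Hawaii": 438,
    "Tahiti": 352,
    "Cayman": 389,
    "Cypress": 334,
    "RV790": 282,
    "GM200": 596 + 5,
}


def growth_over(new, old):
    """Return (difference in mm^2, difference as a percentage of `old`)."""
    delta = DIE_SIZE_MM2[new] - DIE_SIZE_MM2[old]
    return delta, 100.0 * delta / DIE_SIZE_MM2[old]


fiji_vs_hawaii = growth_over("Fiji", "Hawaii")
fiji_vs_gm200 = growth_over("Fiji", "GM200")

print("Fiji vs. Hawaii: %+d mm^2 (%+.1f%%)" % fiji_vs_hawaii)
print("Fiji vs. GM200:  %+d mm^2 (%+.1f%%)" % fiji_vs_gm200)
```

Under these inputs Fiji comes out 158mm2 (about 36%) larger than Hawaii and 5mm2 (under 1%) smaller than GM200, matching the figures quoted in the text.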

And the fun doesn’t stop there. Along with producing the biggest die they could, AMD has also more or less gone the direction of NVIDIA and Maxwell in the case of Fiji, building what is unambiguously the most gaming/FP32-centric GPU the company could build. With GCN supporting power-of-two FP64 rates between 1/2 and 1/16, AMD has gone for the bare minimum in FP64 performance that their architecture allows, leading to a 1/16 FP64 rate on Fiji. This is a significant departure from Hawaii, which implemented native support for a 1/2 rate, and on consumer parts offered a handicapped 1/8 rate. Fiji will not be an FP64 powerhouse – its 4GB of VRAM is already perhaps too large a handicap for the HPC market – so instead we get AMD’s best FP32 GPU going against NVIDIA’s best FP32 GPU.
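To put those FP64 rates in perspective, here is a rough Python sketch of theoretical peak throughput. The shader counts and clockspeeds are assumptions drawn from the published specifications of the R9 Fury X (4096 shaders, 1050MHz) and R9 290X (2816 shaders, 1000MHz) rather than from this page, and `peak_gflops` is a hypothetical helper that assumes one fused multiply-add (2 FLOPs) per shader per clock.

```python
from fractions import Fraction

FLOPS_PER_CLOCK = 2  # assumption: one fused multiply-add per shader per clock


def peak_gflops(shaders, clock_mhz, rate=Fraction(1)):
    """Theoretical peak in GFLOPS; `rate` scales FP32 throughput to FP64."""
    return float(shaders * FLOPS_PER_CLOCK * clock_mhz * rate) / 1000.0


# Assumed product specs (not stated in this section).
FIJI = {"shaders": 4096, "clock_mhz": 1050}
HAWAII = {"shaders": 2816, "clock_mhz": 1000}

fiji_fp32 = peak_gflops(**FIJI)
fiji_fp64 = peak_gflops(**FIJI, rate=Fraction(1, 16))             # native rate
hawaii_fp64_native = peak_gflops(**HAWAII, rate=Fraction(1, 2))   # native rate
hawaii_fp64_consumer = peak_gflops(**HAWAII, rate=Fraction(1, 8)) # consumer cap

print(f"Fiji FP32:              {fiji_fp32:.1f} GFLOPS")
print(f"Fiji FP64 (1/16):       {fiji_fp64:.1f} GFLOPS")
print(f"Hawaii FP64 (1/2):      {hawaii_fp64_native:.1f} GFLOPS")
print(f"Hawaii FP64 (1/8 cap):  {hawaii_fp64_consumer:.1f} GFLOPS")
```

If those assumed specs hold, even a consumer Hawaii card at its capped 1/8 rate would offer more theoretical FP64 throughput than Fiji at 1/16, which underlines how thoroughly Fiji's die has been spent on FP32.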

AMD’s final ace up their sleeve on die size is HBM. Along with HBM’s bandwidth and power benefits, HBM is also much simpler to implement, requiring less GPU space for PHYs than GDDR5 does. This is in part because HBM stacks have their own logic layer, which moves some of the logic onto each stack, and in part because the signaling logic that remains doesn’t have to be nearly as complex, since the frequencies are so much lower. 4096 bits of HBM PHYs still take up a fair bit of space – though AMD won’t tell us how much – but it’s notably less than the space AMD was losing to Hawaii’s GDDR5 memory controllers.
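The trade HBM makes, a much wider bus at a much lower per-pin frequency, can be illustrated with the usual peak bandwidth formula. In this sketch only the 4096-bit bus width comes from the text; the 1Gbps per-pin rate for Fiji's HBM and the 512-bit, 5Gbps GDDR5 configuration for Hawaii (R9 290X) are assumptions based on published specifications, and `bandwidth_gb_per_s` is an illustrative helper.

```python
def bandwidth_gb_per_s(bus_width_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s: bus width times per-pin rate, in bytes."""
    return bus_width_bits * gbps_per_pin / 8.0


# Fiji: 4096-bit HBM bus (from the text) at an assumed 1 Gbps per pin.
fiji_hbm = bandwidth_gb_per_s(4096, 1.0)
# Hawaii (R9 290X): assumed 512-bit GDDR5 bus at 5 Gbps per pin.
hawaii_gddr5 = bandwidth_gb_per_s(512, 5.0)

print(f"Fiji HBM:     {fiji_hbm:.0f} GB/s")
print(f"Hawaii GDDR5: {hawaii_gddr5:.0f} GB/s")
print(f"Bus width ratio:    {4096 / 512:.0f}x wider")
print(f"Per-pin rate ratio: {5.0 / 1.0:.0f}x slower")
```

Under these assumptions each HBM pin runs at a fifth of the GDDR5 rate while the bus is eight times wider, so total bandwidth still rises. The slow per-pin rate is what lets the remaining signaling logic on the GPU stay comparatively simple.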

The end result is that not only has AMD built their biggest GPU ever, but they have done virtually everything they can to maximize the amount of die space they get to allocate to FP32 and rendering resources. Simply put, AMD has never reached so high and aimed for parity with NVIDIA in this manner.

Ultimately this puts Fiji’s transistor count at 8.9 billion transistors, even more than the 8 billion transistors found in NVIDIA’s GM200, and, as expected, significantly more than Hawaii’s 6.2 billion. Interestingly enough, on a relative basis this is almost exactly the same increase we saw with Hawaii; Fiji packs in 43.5% more transistors than Hawaii, and Hawaii packed in 43.9% more transistors than Tahiti. So going by transistors alone, Fiji is very much to Hawaii what Hawaii was to Tahiti.
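The generational comparison can be checked with a few lines of Python. The transistor counts for Fiji, GM200, and Hawaii are the ones quoted above; Tahiti's count is not given in this section, so the sketch backs it out from the quoted 43.9% figure rather than assuming a value. The helper name `pct_more` is illustrative.

```python
# Transistor counts in billions, as quoted in the text.
TRANSISTORS_BN = {"Fiji": 8.9, "GM200": 8.0, "Hawaii": 6.2}


def pct_more(new, old):
    """Percentage by which `new` exceeds `old`."""
    return 100.0 * (new / old - 1.0)


fiji_over_hawaii = pct_more(TRANSISTORS_BN["Fiji"], TRANSISTORS_BN["Hawaii"])
fiji_over_gm200 = pct_more(TRANSISTORS_BN["Fiji"], TRANSISTORS_BN["GM200"])

# Tahiti's count implied by "Hawaii packed in 43.9% more transistors than Tahiti".
implied_tahiti_bn = TRANSISTORS_BN["Hawaii"] / (1.0 + 43.9 / 100.0)

print(f"Fiji over Hawaii: {fiji_over_hawaii:.1f}%")
print(f"Fiji over GM200:  {fiji_over_gm200:.1f}%")
print(f"Implied Tahiti:   {implied_tahiti_bn:.2f} billion transistors")
```

The Fiji-over-Hawaii increase works out to roughly 43.5%, as stated, and the quoted 43.9% figure implies a Tahiti transistor count of a little over 4.3 billion.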

Finally, as large as the Fiji GPU is, the silicon interposer it sits on is even larger. The interposer measures 1011mm2, roughly 1.7 times the size of Fiji. Since Fiji and its HBM stacks need to fit on top of it, the interposer must be very large to do its job, and in the process it pushes its own limits. The actual interposer die is believed to exceed the reticle limit of the 65nm process AMD is using to have it built, and as a result the interposer is carefully constructed so that only the areas that need connectivity receive metal layers. This allows AMD to put down such a large interposer without actually needing a fab capable of reaching such a large reticle limit.

What’s interesting from a design perspective is that the interposer and everything on it is essentially the heart and soul of the GPU. There is plenty of power regulation circuitry on the organic package and even more on the board itself, but within the 1011mm2 floorplan of the interposer, all of Fiji’s logic and memory is located. By mobile standards it’s very nearly an SoC in and of itself; it needs little more than external power and I/O to operate.

458 Comments

  • D. Lister - Thursday, July 2, 2015

    "AMD had tessellation years before nVidia, but it went unused until DX11, by which time nVidia knew AMD's capabilities and intentionally designed a way to stay ahead in tessellation. AMD's own technology being used against it only because it released it so early. HBM, I fear, will be another example of this. AMD helped to develop HBM and interposer technologies and used them first, but I bet nVidia will benefit most from them."

    AMD is often first at announcing features. Nvidia is often first at implementing them properly. It is clever marketing vs clever engineering. At the end of the day, one gets more customers than the other.
  • sabrewings - Thursday, July 2, 2015

    While you're right that Nvidia paid for the chips used in 980 Tis, they're still most likely not fit for Titan X use and are cut to remove the underperforming sections. Without really knowing what their GM200 yields are like, I'd be willing to bet the $1000 price of the Titan X was already paying for the 980 Ti chips. So, Nvidia gets to play with binned chips to sell at $650 while AMD has to rely on fully up chips added to an expensive interposer with more expensive memory and a more expensive cooling solution to meet the same price point for performance. Nvidia definitely forced AMD into a corner here, so as I said I would say they won.

    Though, I don't necessarily say that AMD lost, they just make it look much harder to do what Nvidia was already doing and making bookoo cash at that. This only makes AMD's problems worse as they won't get the volume to gain marketshare and they're not hitting the margins needed to heavily reinvest in R&D for the next round.
  • Kutark - Friday, July 3, 2015

    So basically what you're saying is Nvidia is a better run company with smarter people working there.
  • squngy - Friday, July 3, 2015

    "and they cost more per chip to produce than AMD's Fiji GPU."

    Unless AMD has a genie making it for them that's impossible.
    Not only is fiji larger, it also uses a totally new technology (HBM).
  • JumpingJack - Saturday, July 4, 2015

    "AMD had tessellation years before nVidia, but it went unused until DX11, by which time nVidia knew AMD's capabilities and intentionally designed a way to stay ahead in tessellation. AMD's own technology being used against it only because it released it so early. HBM, I fear, will be another example of this. AMD helped to develop HBM and interposer technologies and used them first, but I bet nVidia will benefit most from them."

    AMD fanboys make it sound like AMD can actually walk on water. AMD did work with Hynix, but the magic of HBM comes in the density from die stacking, for which AMD did nothing (they are no longer the actual chipmaker as you probably know). As for interposers, this is not new technology; interposers are a well-established technique for condensing an array of devices into one package.

    AMD deserves credit for bringing the technology to market, no doubt, but their actual IP contribution is quite small.
  • ianmills - Thursday, July 2, 2015

    Good that you are feeling better Ryan and thanks for the review :)
    That being said Anandtech needs to keep us better informed when things come up.... The way this site handled it though is gonna lose this site readers...
  • Kristian Vättö - Thursday, July 2, 2015

    Ryan tweeted about the Fiji schedule several times and we were also open about it in the comments whenever someone asked, even though it wasn't relevant to the article in question. It's not like we were secretive about it and I think a full article of an article delay would be a little overkill.
  • sabrewings - Thursday, July 2, 2015

    Those tweets are even featured on the site in the side bar. Not sure how much clearer it could get without an article about a delayed article.
  • testbug00 - Sunday, July 5, 2015

    Pipeline story... Dunno title, but, for text, explain it there. Have a link to THG as owned by same company now if readers want to read a review immediately.

    Twitter is non-ideal.
  • funkforce - Monday, July 6, 2015

    The problem isn't only with the delays, it is that since Ryan took over as Editor in Chief I suspect his workload is too large.
    Because this also happened with the Nvidia GTX 960 review. He told 5-6 people (including me) for 5 weeks that it would come, and then it didn’t and he stopped responding to inquiries about it.
    Now in what way is that a good way to build a good relationship and trust between you and your readers?
    I love Ryan's writing, this article was one of the best I've read in a long time. But not everyone is good at everything, maybe Ryan needs to focus on only GPU reviews and not running the site or whatever his other responsibilities are as Edit. in Chief.

    Because the Reviews are what most ppl. come here for and what built this site. You guys are amazing, but AT never used to miss releasing articles the same day NDA was lifted in the past that I can remember. And promising things and then not delivering, sticking your head in the sand and not even apologizing isn’t a way to build up trust and uphold and strengthen the large following this site has.

    I love this site, been reading it since the 1st year it came out, and that's why I care and I want you to continue and prosper.
    Since a lot of ppl. can’t read the twitter feed then what you did here: http://www.anandtech.com/show/8923/nvidia-launches...
    Is the way to go if something comes up, but then you have to deliver on your promises.
