While AMD and NVIDIA are consistently revising their GPU architectures, for the most part the changes they make are just that: revisions. It’s only once in a great while that a GPU architecture is thrown out entirely, which makes the arrival of a new architecture a monumental occasion in the GPU industry. The last time we saw this happen was in 2006/2007, when unified shaders and DirectX 10 led to AMD and NVIDIA developing brand new architectures for their GPUs. Since then there have been some important revisions such as AMD’s VLIW4 architecture and NVIDIA’s Fermi architecture, but so far nothing has quite compared to 2006/2007, until now.

At AMD’s Fusion Developer Summit 2011, AMD announced Graphics Core Next, their next-generation GPU architecture. GCN would be AMD’s Fermi moment, where AMD got serious about GPU computing and finally built an architecture that would serve as both a graphics workhorse and a computing workhorse. With the ever increasing costs of high-end GPU development, it’s not enough to merely develop graphics GPUs; GPU developers must expand into GPU computing in order to capture the market share they need to live well into the future.

At the same time, by canceling their 32nm process, TSMC has directed a lot of hype about future GPU development onto the 28nm process, where the next generation of GPUs would be developed. In an industry accustomed to rapid change and even more rapid improvement, never before have GPU developers and their buyers had to wait a full 2 years for a new fabrication process to come online.

All of this has led to a perfect storm of anticipation for what has become the Radeon HD 7970: not only is it the first video card based on a 28nm GPU, but it’s the first member of the Southern Islands family and by extension the first video card to implement GCN. As a result the Radeon HD 7970 has a tough job to do. As a gaming card it needs to deliver the next-generation performance gamers expect, and as the first GCN part it needs to prove that AMD’s GCN architecture is going to make them a competitor in the GPU computing space. Can the 7970 do all of these things and live up to the anticipation? Let’s find out…

AMD GPU Specification Comparison

| | AMD Radeon HD 7970 | AMD Radeon HD 6970 | AMD Radeon HD 6870 | AMD Radeon HD 5870 |
|---|---|---|---|---|
| Stream Processors | 2048 | 1536 | 1120 | 1600 |
| Texture Units | 128 | 96 | 56 | 80 |
| ROPs | 32 | 32 | 32 | 32 |
| Core Clock | 925MHz | 880MHz | 900MHz | 850MHz |
| Memory Clock | 1.375GHz (5.5GHz effective) GDDR5 | 1.375GHz (5.5GHz effective) GDDR5 | 1.05GHz (4.2GHz effective) GDDR5 | 1.2GHz (4.8GHz effective) GDDR5 |
| Memory Bus Width | 384-bit | 256-bit | 256-bit | 256-bit |
| Frame Buffer | 3GB | 2GB | 1GB | 1GB |
| FP64 | 1/4 | 1/4 | N/A | 1/5 |
| Transistor Count | 4.31B | 2.64B | 1.7B | 2.15B |
| Manufacturing Process | TSMC 28nm | TSMC 40nm | TSMC 40nm | TSMC 40nm |
| Price Point | $549 | $350 | $160 | - |

The Radeon HD 7970 is a card of many firsts. It’s the first video card using a 28nm GPU. It’s the first card supporting Direct3D 11.1. It’s the first member of AMD’s new Southern Islands family. And it’s the first video card implementing AMD’s Graphics Core Next architecture. All of these attributes combine to make the 7970 quite a different video card from any AMD video card before it.

Cutting right to the chase, the 7970 will serve as AMD’s flagship video card for the Southern Islands family. Based on a complete AMD Tahiti GPU, it has 2048 stream processors organized according to AMD’s new SIMD-based GCN architecture. With so many stream processors coupled with a 384-bit GDDR5 memory bus, it’s no surprise that Tahiti has the highest transistor count of any GPU yet: 4.31B transistors. Fabricated on TSMC’s new 28nm High-K process, this gives it a die size of 365mm², making it only slightly smaller than AMD’s 40nm Cayman GPU at 389mm².
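To put the die size comparison in perspective, here is a minimal sketch of the transistor density arithmetic, using the Tahiti figures above and Cayman's 2.64B transistor count from the specification table. The function name is our own, and the result is only a rough average over the whole die.

```python
def density_mtr_per_mm2(transistors_billions, die_area_mm2):
    """Average transistor density in millions of transistors per mm^2."""
    return transistors_billions * 1000 / die_area_mm2

# Tahiti (28nm): 4.31B transistors on 365 mm^2
tahiti = density_mtr_per_mm2(4.31, 365)
# Cayman (40nm): 2.64B transistors on 389 mm^2
cayman = density_mtr_per_mm2(2.64, 389)

print(f"Tahiti: {tahiti:.1f} M/mm^2")
print(f"Cayman: {cayman:.1f} M/mm^2")
print(f"Density ratio: {tahiti / cayman:.2f}x")
```

By this estimate Tahiti packs roughly 1.7 times as many transistors into each square millimeter as Cayman, which is how it fits 63% more transistors into a slightly smaller die.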

Looking at specifications specific to the 7970, AMD will be clocking it at 925MHz, giving it 3.79TFLOPs of theoretical computing performance compared to 2.7TFLOPs under the much different VLIW4 architecture of the 6970. Meanwhile the wider 384-bit GDDR5 memory bus for the 7970 will be clocked at 1.375GHz (5.5GHz data rate), giving it 264GB/sec of memory bandwidth, a significant jump over the 176GB/sec of the 6970.
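The theoretical figures above can be reproduced from the specification table with a short sketch. It assumes each stream processor retires one fused multiply-add (2 floating point operations) per clock, which is the usual convention for peak single-precision numbers but is not stated explicitly here; the function names are our own.

```python
def peak_tflops(stream_processors, core_clock_mhz, flops_per_clock=2):
    """Peak single-precision throughput in TFLOPs.

    flops_per_clock=2 assumes one fused multiply-add per stream
    processor per clock (an assumption, not an official AMD formula).
    """
    return stream_processors * flops_per_clock * core_clock_mhz * 1e6 / 1e12

def memory_bandwidth_gb_per_sec(bus_width_bits, data_rate_ghz):
    """Peak memory bandwidth: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_ghz

# Radeon HD 7970: 2048 SPs at 925MHz, 384-bit bus at 5.5GHz data rate
print(f"7970: {peak_tflops(2048, 925):.2f} TFLOPs, "
      f"{memory_bandwidth_gb_per_sec(384, 5.5):.0f} GB/sec")

# Radeon HD 6970: 1536 SPs at 880MHz, 256-bit bus at 5.5GHz data rate
print(f"6970: {peak_tflops(1536, 880):.2f} TFLOPs, "
      f"{memory_bandwidth_gb_per_sec(256, 5.5):.0f} GB/sec")
```

Note that both cards run their memory at the same data rate, so the entire bandwidth gain comes from the wider bus.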

These functional units are joined by a number of other elements, including 8 ROP partitions that can process 32 ROPs per clock, 128 texture units divided up among 32 Compute Units (CUs), and a fixed function pipeline that contains a pair of AMD’s 9th generation geometry engines. Of course all of this hardware would normally take quite a bit of power to run, but thankfully power usage is kept in check by the advancements offered by TSMC’s 28nm process. AMD hasn’t provided us with an official typical board power, but we estimate it’s around 220W, with an absolute 250W PowerTune limit. Meanwhile idle power usage is looking particularly good, as thanks to AMD's further work on power savings their typical power consumption under idle is only 15W. And with AMD's new ZeroCore Power technology (more on that in a bit), idle power usage drops to an absolutely minuscule 3W.

Overall for those of you looking for a quick summary of performance, the 7970 is quite powerful, but it may not be as powerful as you were expecting. Depending on the game being tested it’s anywhere between 5% and 35% faster than NVIDIA’s GeForce GTX 580, averaging 15% to 25% depending on the specific resolution in use. Furthermore thanks to TSMC’s 28nm process power usage is upwards of 50W lower than the GTX 580, but it’s still higher than the 6970 it replaces. As far as performance jumps go from new fabrication processes, this isn’t as big a leap as we’ve seen in the past.

In a significant departure from the launch of the Radeon HD 5870 and 4870, AMD will not be pricing the 7970 nearly as aggressively as those cards at launch. The MSRP for the 7970 will be $550, a premium price befitting a premium card, but a price based almost exclusively on the competition (e.g. the GTX 580) rather than one that takes advantage of cheaper manufacturing costs to aggressively undercut the competition. In time AMD needs to bring down the price of the card, but for the time being they will be charging a price premium reflecting the card’s status as the single-GPU king.

For those of you trying to decide whether to get a 7970, you will have some time to decide. This is a soft launch; AMD will not make the 7970 available until January 9th (the day before the Consumer Electronics Show), nearly 3 weeks from now. We don’t have any idea what the launch quantities will be like, but from what we hear TSMC’s 28nm process has finally reached reasonable yields, so AMD should be in a better position than at the 5870 launch. The price premium on the card will also help temper demand somewhat, though even at $550 this won’t rule out the first batch of cards selling out.

Beyond January 9th, AMD has an entire family of Southern Islands video cards still to launch. AMD will reveal more about those in due time, but as with the Evergreen and Northern Islands families AMD has a plan to introduce a number of video cards over the next year. So the 7970 is just the beginning.

Winter 2011 GPU Pricing Comparison

| AMD | Price | NVIDIA |
|---|---|---|
| | $750 | GeForce GTX 590 |
| Radeon HD 6990 | $700 | |
| Radeon HD 7970 | $549 | |
| | $500 | GeForce GTX 580 |
| Radeon HD 6970 | $350 | GeForce GTX 570 |
| Radeon HD 6950 2GB | $250 | |
| | $240 | GeForce GTX 560 Ti |
| Radeon HD 6870 | $160 | |


Comments

  • CeriseCogburn - Thursday, March 08, 2012

    Interesting, amd finally copied nvidia...
    " This problem forms the basis of this benchmark, and the NQueen test proves once more that AMD's Radeon HD 7970 tremendously benefits from leaving behind the VLIW architecture in complex workloads. Both the HD 7970 and the GTX 580 are nearly twice as fast as the older Radeons. "

    When we show diversity we should also show that amd radeon has been massively crippled for a long time except when "simpleton" was the key to speed. "Superior architecture" actually means "simple and stupid" - hence "fast" at repeating simpleton nothings, but unable to handle "complex tasks".
    LOL - the dumb gpu by amd has finally "evolved".
  • chizow - Thursday, December 22, 2011

    ....unfortunately it's going to be pitted against Kepler for the long haul.

    There's a lot to like about Southern Islands but I think it's going to end up a very similar situation as Evergreen vs. Fermi, where Evergreen released sooner and took the early lead, but Fermi ultimately won the generation. I expect similar with Tahiti holding the lead for the next 3-6 months until Kepler arrives, but Kepler and its refresh parts winning this 28nm generation once they hit the streets.

    Overall the performance and changes AMD made with Tahiti look great compared to Northern Islands, but compared to Fermi parts, it's just far less impressive. If you already owned an AMD NI or Evergreen part, there'd be a lot of reason to upgrade, but if you own a Fermi generation Nvidia card there's just far less reason to, especially at the asking price.

    I do like how AMD opened up the graphics pipeline with Tahiti though, 384-bit bus, 3GB framebuffer, although I wonder if holding steady with ROPs hurts them compared to Kepler. It would've also been interesting to see how the 3GB GTX 580 compared at 2560 since the 1.5GB model tended to struggle even against 2GB NI parts at that resolution.
  • ravisurdhar - Thursday, December 22, 2011

    My thoughts exactly. Can't wait to see what Kepler can do.

    Also...4+B transistors? mind=blown. I remember when we were ogling over 1B. Moore's law is crazy.... :D
  • johnpombrio - Wednesday, December 28, 2011

    Exactly. If you look at all the changes that AMD did on the card, I would have expected better results: the power consumption decrease with the Radeon 7970 is mainly due to the die shrink to 28nm. NVidia is planning on a die shrink of their existing Fermi architecture before Kepler is released:


    Another effect of the die shrink is that clock speed usually increases as there is less heat created at the lower voltage needed with a smaller transistor.

    The third change that is not revolutionary is the bump of AMD's 7970's memory bus to 384 bits (matching the 580) from the 6970's 256 bits, along with 3GB GDDR5 memory vs the GTX580's 1.5GB and the 6970's 2GB.

    The final non revolutionary change is bumping the number of stream processors by 33% from 1,536 to 2,048.

    Again, breaking out my calculator, the 33% bump in the number of stream processors ALONE accounts for the benchmark differences between the 7970 and the 6970.

    The higher benchmark, however, does not show ANY OTHER large speed bumps that SHOULD HAVE OCCURRED due to the increase in the memory bus size, the higher amount of memory, compute performance, texture fill rate, or finally the NEW ARCHITECTURE.

    If I add up all the increases in the technology, I would have expected benchmarks in excess of 50-60% over the previous generation. Perhaps I am naive in how much to expect but, hell, a doubling of transistor count should have produced a lot more than a 35% increase. Add the new architecture, smaller die size, and more memory and I am underwhelmed.
  • CeriseCogburn - Thursday, March 08, 2012

    Well, we can wait for their 50%+ driver increase package+ hotfixes - because after reading that it appears they are missing the boat in drivers by a wide margin.
    Hopefully a few months after Kepler blows them away, and the amd fans finally allow themselves to complain to the proper authorities and not blame it on Nvidia, they will finally come through with a "fix" like they did when the amd (lead site review mastas) fans FINALLY complained about crossfire scaling....
  • KaarlisK - Thursday, December 22, 2011

    What is the power consumption with multiple monitors? Previously, you could not downclock GDDR5, so the resulting consumption was horrible.
  • Ryan Smith - Thursday, December 22, 2011

    "On that note, for anyone who is curious about idle clockspeeds and power consumption with multiple monitors, it has not changed relative to the 6970. When using a TMDS-type monitor along with any other monitor, AMD has to raise their idle clockspeeds from 350MHz core and 600MHz memory to 350MHz core and the full 5.5GHz speed for memory, with the power penalty for that being around 30W. Matched timing monitors used exclusively over DisplayPort will continue to be the only way to be able to use multiple monitors without incurring an idle penalty."
  • KaarlisK - Thursday, December 22, 2011

    Thank you for actually replying :)
    I am so sorry for having missed this.
  • ltcommanderdata - Thursday, December 22, 2011

    Great review.

    Here's hoping that AMD will implement 64-bit FP support across the whole GCN family and not just the top-end model. Seeing AMD's mobile GPUs don't use the highest-end chip, settling for the 2nd highest and lower, there hasn't been 64-bit FP support in AMD mobile GPUs since the Mobility HD4800 series. I'm interested in this because I can then dabble in some 64-bit GPGPU programming on the go. It also has implications for Apple since their iMacs stick to mobile GPUs, so would otherwise be stuck without 64-bit FP support which presumably could be useful for some of their professional apps.

    In regards to hardware accelerated Megatexture, is it directly applicable to id Tech 5's OpenGL 3.2 solution? i.e. will id Tech 5 games see an immediate speed-up with no recoding needed? Or does Partially Resident Texture support require a custom AMD specific OpenGL extension? If it's the latter, I can't see it going anywhere unless nVidia agrees to make it a multivendor EXT extension.
  • Ryan Smith - Thursday, December 22, 2011

    Games will need to be specifically coded for PRT; it won't benefit any current games. And you are correct in that it will require an AMD OpenGL extension to use (it won't be accessible from D3D at this time).
