AMD Radeon HD 7970 Review: 28nm And Graphics Core Next, Together As One

Author: Ryan Smith

by Ryan Smith on December 22, 2011 12:00 AM EST

PCI Express 3.0: More Bandwidth For Compute

It may seem like it’s still fairly new, but PCI Express 2.0 is actually a relatively old addition to motherboards and video cards. AMD first added support for it with the Radeon HD 3870 back in 2008, so it’s been nearly 4 years since video cards made the jump. At the same time, PCI Express 3.0 has been in the works for some time now, and although it hasn’t been 4 years, it feels like it has been much longer. PCIe 3.0 motherboards only became available last month with the launch of the Sandy Bridge-E platform, and now the first PCIe 3.0 video cards are becoming available with Tahiti.

But at first glance it may not seem like PCIe 3.0 is all that important. Additional PCIe bandwidth has proven to be generally unnecessary when it comes to gaming, as single-GPU cards typically only benefit by a couple of percent (if at all) when moving from PCIe 2.1 x8 to x16. There will of course come a time when games need more PCIe bandwidth, but right now PCIe 2.1 x16 (8GB/sec) handles the task with room to spare.

So why is PCIe 3.0 important then? It’s not the games, it’s the computing. GPUs have a great deal of internal memory bandwidth (264GB/sec on the 7970; more with cache), but shuffling data between the GPU and the CPU is a high-latency, heavily bottlenecked process that tops out at 8GB/sec under PCIe 2.1. And since GPUs are still specialized devices that excel at parallel code execution, a lot of workloads exist that will need to constantly move data between the GPU and the CPU to maximize parallel and serial code execution. As it stands today, GPUs are really only best suited for workloads that involve sending work to the GPU and keeping it there; heterogeneous computing is a luxury there isn’t bandwidth for.

The long-term solution, of course, is to bring the CPU and the GPU together, which is what Fusion does. CPU/GPU bandwidth just in Llano is over 20GB/sec, and latency is greatly reduced due to the CPU and GPU being on the same die. But this doesn’t stop AMD from also wanting to bring some of these same benefits to discrete GPUs, which is where PCIe 3.0 comes in.

With PCIe 3.0, transport bandwidth is again being doubled, from 500MB/sec per lane in each direction to roughly 1GB/sec per lane in each direction, which for an x16 device means doubling the available bandwidth from 8GB/sec to roughly 16GB/sec. This is accomplished by increasing the signaling rate of the underlying bus from 5 GT/sec to 8 GT/sec, while decreasing encoding overhead from 20% (8b/10b encoding) to about 1.5% through the use of a highly efficient 128b/130b encoding scheme. Meanwhile latency doesn’t change, as it’s largely a product of physics and physical distances, but merely doubling the bandwidth can greatly improve performance for bandwidth-hungry compute applications.
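
To make the arithmetic behind those figures explicit, here is a minimal Python sketch that derives per-lane and x16 bandwidth from the transfer rate and the encoding scheme. The function names are ours, chosen for illustration, and the sketch assumes one bit per transfer per lane and counts only line-encoding overhead; packet and protocol overhead, which reduce real-world throughput further, are ignored.

```python
# Rough PCIe bandwidth arithmetic (illustrative only; ignores protocol overhead).

def lane_bandwidth_mb_per_s(gigatransfers_per_s, payload_bits, total_bits):
    """Usable MB/sec for one lane in one direction.

    Assumes each transfer carries one bit on the wire, of which
    payload_bits out of every total_bits are actual data.
    """
    efficiency = payload_bits / total_bits
    usable_bits_per_s = gigatransfers_per_s * 1e9 * efficiency
    return usable_bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


def link_bandwidth_gb_per_s(gigatransfers_per_s, payload_bits, total_bits, lanes=16):
    """Usable GB/sec for a whole link (default x16) in one direction."""
    per_lane = lane_bandwidth_mb_per_s(gigatransfers_per_s, payload_bits, total_bits)
    return per_lane * lanes / 1000


generations = {
    "PCIe 2.x": (5.0, 8, 10),     # 5 GT/sec, 8b/10b encoding
    "PCIe 3.0": (8.0, 128, 130),  # 8 GT/sec, 128b/130b encoding
}

for name, (rate, payload, total) in generations.items():
    overhead_pct = (1 - payload / total) * 100
    print(f"{name}: overhead {overhead_pct:.1f}%, "
          f"{lane_bandwidth_mb_per_s(rate, payload, total):.0f} MB/sec per lane, "
          f"{link_bandwidth_gb_per_s(rate, payload, total):.2f} GB/sec at x16")
```

Under these assumptions PCIe 2.x comes out at 500MB/sec per lane and 8GB/sec at x16, while PCIe 3.0 comes out at about 985MB/sec per lane and about 15.75GB/sec at x16. In other words, the 60% increase in transfer rate only adds up to a doubling because the encoding overhead falls from 20% to roughly 1.5% at the same time.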

As with any other specialized change like this, the benefit is going to depend heavily on the application being used. However, AMD is confident that there are applications that will completely saturate PCIe 3.0 (and then some), and it’s easy to imagine why.

Even among our limited selection of compute benchmarks we found something that directly benefitted from PCIe 3.0. AESEncryptDecrypt, a sample application from AMD’s APP SDK, demonstrates AES encryption performance by running it on square image files. Throwing a large 8K x 8K image at it not only creates a lot of work for the GPU, but a lot of PCIe traffic too. In our case simply enabling PCIe 3.0 improved performance by 9%, cutting the run time from 324ms to 297ms.
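
For readers who want to check the percentage, the short Python sketch below computes it two ways from the two timings quoted above. The helper names are ours, for illustration only. The 9% figure corresponds to the throughput gain (old time divided by new time); measured as a reduction in run time, the same result is a little over 8%.

```python
# Two common ways to express the same before/after timing result.

def speedup_percent(old_ms, new_ms):
    """Throughput gain: how much more work per unit time the new run does."""
    return (old_ms / new_ms - 1.0) * 100.0


def time_saved_percent(old_ms, new_ms):
    """Run-time reduction relative to the old run."""
    return (old_ms - new_ms) / old_ms * 100.0


pcie2_ms = 324.0  # AESEncryptDecrypt run time with PCIe 3.0 disabled
pcie3_ms = 297.0  # AESEncryptDecrypt run time with PCIe 3.0 enabled

print(f"Throughput gain:    {speedup_percent(pcie2_ms, pcie3_ms):.1f}%")
print(f"Run-time reduction: {time_saved_percent(pcie2_ms, pcie3_ms):.1f}%")
```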

Ultimately having more bandwidth is not only going to improve compute performance for AMD, but will give the company a critical edge over NVIDIA for the time being. Kepler will no doubt ship with PCIe 3.0, but that’s months down the line. In the meantime users and organizations with high bandwidth compute workloads have Tahiti.

292 Comments

Iketh - Thursday, December 22, 2011 - link
As mentioned several times in the article and in the comments, time was an issue. You can rest assured that follow-up articles are in the works.
Ryan Smith - Thursday, December 22, 2011 - link
Indeed it is.
Malih - Thursday, December 22, 2011 - link
dude, awesome in-depth (emphasizing on depth) review, thank you very much for the excellent work Ryan.
Esbornia - Thursday, December 22, 2011 - link
After reading a half ass misinforming review full of errors and typos, I think you didn't read it to say something like that.
Iketh - Thursday, December 22, 2011 - link
It is full of typos, but that has nothing to do with in-depth. It was certainly in-depth and a joy to read despite the typos.

I'd like to know what you believe is misinformation though.
SlyNine - Thursday, December 22, 2011 - link
He probably couldn't understand a lot of it and thought they were all typos.
WhoBeDaPlaya - Thursday, December 22, 2011 - link
Sod off you wanker. Go and read Walmart reviews for this card - they're probably more at your level ;)
Marburg U - Thursday, December 22, 2011 - link
Does Eyefinity Technology 2.0 allow me to launch an application within Windows ON WHICH MONITOR I WANT?
NikosD - Thursday, December 22, 2011 - link
It seems that nobody noticed, but where is the FP64 = 1/2 FP32 performance that AMD promised back in June when they first introduced the GCN architecture?

I copy from Ryan's June article:

"One thing that we do know is that FP64 performance has been radically improved: the GCN architecture is capable of FP64 performance up to ½ its FP32 performance. For home users this isn’t going to make a significant impact right away, but it’s going to help AMD get into professional markets where such precision is necessary."

The truth is that FP64 is 1/4 of FP32 eventually!

Big Loss in GPGPU community even if 7970 is capable of 3.79Tflops of FP32 compared to 2.7Tflops of 6970
R3MF - Thursday, December 22, 2011 - link
it says 1/2 in the architecture article, but 1/4 in the consumer product review, is this AMD taking a leaf from Nvidia's (shitty) book of using drivers to disable features in non-professional (price-tag) products?
