AMD Radeon HD 7970 Review: 28nm And Graphics Core Next, Together As One

Author: Ryan Smith

by Ryan Smith on December 22, 2011 12:00 AM EST


PCI Express 3.0: More Bandwidth For Compute

It may seem like it’s still fairly new, but PCI Express 2.0 is actually a relatively old addition to motherboards and video cards. AMD first added support for it with the Radeon HD 3870 back in late 2007, so it’s been roughly 4 years since video cards made the jump. At the same time PCI Express 3.0 has been in the works for some time now, and although it hasn’t been 4 years it feels like much longer. PCIe 3.0 motherboards only became available last month with the launch of the Sandy Bridge-E platform, and now the first PCIe 3.0 video cards are becoming available with Tahiti.

But at first glance it may not seem like PCIe 3.0 is all that important. Additional PCIe bandwidth has proven to be generally unnecessary when it comes to gaming, as single-GPU cards typically only benefit by a couple of percent (if at all) when moving from PCIe 2.1 x8 to x16. There will of course come a time when games need more PCIe bandwidth, but right now PCIe 2.1 x16 (8GB/sec) handles the task with room to spare.

So why is PCIe 3.0 important then? It’s not the games, it’s the computing. GPUs have a great deal of internal memory bandwidth (264GB/sec; more with cache) but shuffling data between the GPU and the CPU is a high latency, heavily bottlenecked process that tops out at 8GB/sec under PCIe 2.1. And since GPUs are still specialized devices that excel at parallel code execution, a lot of workloads exist that will need to constantly move data between the GPU and the CPU to maximize parallel and serial code execution. As it stands today GPUs are really only best suited for workloads that involve sending work to the GPU and keeping it there; heterogeneous computing is a luxury there isn’t bandwidth for.
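To put those two figures side by side, here is a rough back-of-the-envelope sketch. The 1GB buffer size is purely a hypothetical working set chosen for illustration, and the calculation ignores latency and protocol overhead entirely, so treat the results as best-case numbers rather than measurements.

```python
# Bandwidth figures quoted in the text above.
GPU_MEMORY_GB_S = 264.0  # GPU internal memory bandwidth
PCIE21_X16_GB_S = 8.0    # PCIe 2.1 x16, one direction


def transfer_ms(size_gb, bandwidth_gb_s):
    """Best-case time in milliseconds to move size_gb at bandwidth_gb_s."""
    return size_gb / bandwidth_gb_s * 1000.0


buffer_gb = 1.0  # hypothetical working set, not a figure from the review

over_pcie_ms = transfer_ms(buffer_gb, PCIE21_X16_GB_S)
in_vram_ms = transfer_ms(buffer_gb, GPU_MEMORY_GB_S)
ratio = GPU_MEMORY_GB_S / PCIE21_X16_GB_S

print(f"Over PCIe 2.1 x16: {over_pcie_ms:.1f} ms")
print(f"Within GPU memory: {in_vram_ms:.1f} ms")
print(f"GPU memory is {ratio:.0f}x faster than the bus")
```

Under these assumptions, every round trip across the bus costs dozens of times more than touching the same data in local memory, which is why workloads that stay on the GPU fare so much better today.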

The long term solution of course is to bring the CPU and the GPU together, which is what Fusion does. CPU/GPU bandwidth just in Llano is over 20GB/sec, and latency is greatly reduced due to the CPU and GPU being on the same die. But this doesn’t change the fact that AMD also wants to bring some of these same benefits to discrete GPUs, which is where PCIe 3.0 comes in.

With PCIe 3.0 transport bandwidth is again being doubled, from 500MB/sec per lane in each direction to roughly 1GB/sec per lane in each direction, which for an x16 device means doubling the available bandwidth from 8GB/sec to nearly 16GB/sec. This is accomplished by increasing the transfer rate of the underlying bus from 5 GT/sec to 8 GT/sec, while decreasing encoding overhead from 20% (8b/10b encoding) to about 1.5% through the use of a highly efficient 128b/130b encoding scheme. Meanwhile latency doesn’t change (it’s largely a product of physics and physical distances), but merely doubling the bandwidth can greatly improve performance for bandwidth-hungry compute applications.
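The arithmetic behind those figures can be sketched in a few lines. This is only an illustration of the encoding math: the function names are our own, and it models raw line encoding alone, ignoring packet headers and other protocol overhead that reduce real-world throughput further.

```python
def lane_bandwidth_mb_s(gt_per_s, payload_bits, total_bits):
    """Usable bandwidth of one lane in one direction, in MB/sec.

    Each transfer moves one bit per lane; the line encoding means only
    payload_bits out of every total_bits carry data.
    """
    usable_bits_per_s = gt_per_s * 1e9 * payload_bits / total_bits
    return usable_bits_per_s / 8 / 1e6


def encoding_overhead(payload_bits, total_bits):
    """Fraction of raw bits spent on encoding rather than data."""
    return 1 - payload_bits / total_bits


pcie2_lane = lane_bandwidth_mb_s(5.0, 8, 10)     # 8b/10b at 5 GT/sec
pcie3_lane = lane_bandwidth_mb_s(8.0, 128, 130)  # 128b/130b at 8 GT/sec

for name, lane, overhead in (
    ("PCIe 2.x", pcie2_lane, encoding_overhead(8, 10)),
    ("PCIe 3.0", pcie3_lane, encoding_overhead(128, 130)),
):
    print(f"{name}: {lane:.1f} MB/sec per lane, "
          f"{lane * 16 / 1000:.2f} GB/sec at x16, "
          f"{overhead:.1%} encoding overhead")
```

Note that the raw transfer rate only rises by 60%; it is the switch to the leaner encoding that carries the usable bandwidth the rest of the way to (very nearly) double.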

As with any other specialized change like this, the benefit is going to depend heavily on the application being used. However, AMD is confident that there are applications that will completely saturate PCIe 3.0 (and then some), and it’s easy to imagine why.

Even among our limited selection of compute benchmarks we found something that directly benefitted from PCIe 3.0. AESEncryptDecrypt, a sample application from AMD’s APP SDK, demonstrates AES encryption performance by running it on square image files. Throwing a large 8K x 8K image at it not only creates a lot of work for the GPU, but a lot of PCIe traffic too. In our case simply enabling PCIe 3.0 improved performance by 9%, from 324ms down to 297ms.
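For readers checking the math, a percentage like this can be read two ways, and the short sketch below computes both from the two timings above. The variable names are ours; only the 324ms and 297ms figures come from the benchmark.

```python
before_ms = 324.0  # AESEncryptDecrypt run time over PCIe 2.x
after_ms = 297.0   # same run with PCIe 3.0 enabled

# Reading 1: how much less time the run takes.
time_saved = (before_ms - after_ms) / before_ms

# Reading 2: how much more work gets done per unit of time.
throughput_gain = before_ms / after_ms - 1

print(f"Run time reduced by {time_saved:.1%}")
print(f"Throughput increased by {throughput_gain:.1%}")
```

The 9% figure corresponds to the throughput reading; measured as time saved, the same result comes out slightly lower.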

Ultimately having more bandwidth is not only going to improve compute performance for AMD, but will give the company a critical edge over NVIDIA for the time being. Kepler will no doubt ship with PCIe 3.0, but that’s months down the line. In the meantime users and organizations with high bandwidth compute workloads have Tahiti.



292 Comments


Esbornia - Thursday, December 22, 2011 - link
Fan boy much?
CeriseCogburn - Thursday, March 8, 2012 - link
Finally, piroroadkill, Esbornia - the gentleman ericore merely stated what all the articles here have done as analysis while the radeonite fans repeated it ad infinitum screaming nvidia's giant core count doesn't give the percentage increase it should considering transistor increase.
Now, when it's amd's turn, we get ericore under 3 attacks in a row...---
So you three all take it back concerning fermi ?
maverickuw - Thursday, December 22, 2011 - link
I want to know when the 7950 will come out and hopefully it'll come out at $400
duploxxx - Thursday, December 22, 2011 - link
Only the fact that ATI is able to bring a new architecture on a new process and result in such a performance increase for that power consumption is a clear winner.

looking at the past with Fermi 1st launch and even Cayman VLIW4 they had much more issues to start with.

nice job, while probably nv680 will be more performing it will take them at least a while to release that product and it will need to be also huge in size.
ecuador - Thursday, December 22, 2011 - link
Nice review, although I really think testing 1680x1050 for a $550 card is a big waste of time, which could have gone to perhaps multi-monitor testing etc.
Esbornia - Thursday, December 22, 2011 - link
Its Anand you should expect this kind of shiet.
Ryan Smith - Thursday, December 22, 2011 - link
In this case the purpose of 1680 is to allow us to draw comparisons to low-end cards and older cards, which is something we consider to be important. The 8800GT and 3870 in particular do not offer meaningful performance at 1920.
poohbear - Thursday, December 22, 2011 - link
Why do you benchmark @ 1920x1200 resolution? according to the Steam December survey only 8% of gamers have that resolution, whereas 24% have 1920x1080 and 18% use 1680x1050 (the 2 most popular). Also, minimum FPS would be nice to know in your benchmarks, that is really useful for us! just a heads up for next time u benchmark a video card! Otherwise nice review! lotsa good info at the beginning!:)
Galcobar - Thursday, December 22, 2011 - link
Page 4, comments section.
Esbornia - Thursday, December 22, 2011 - link
They dont want to show the improvements on min FPS cause they hate AMD, you should know that already.
