PCI Express 3.0: More Bandwidth For Compute

It may seem like it’s still fairly new, but PCI Express 2.0 is actually a relatively old addition to motherboards and video cards. AMD first added support for it with the Radeon HD 3870 back in late 2007, so it’s been about 4 years since video cards made the jump. At the same time, PCI Express 3.0 has been in the works for some time now, and although it hasn’t been 4 years, it feels like it has been much longer. PCIe 3.0 motherboards only became available last month with the launch of the Sandy Bridge-E platform, and now the first PCIe 3.0 video cards are becoming available with Tahiti.

But at first glance it may not seem like PCIe 3.0 is all that important. Additional PCIe bandwidth has proven to be generally unnecessary when it comes to gaming, as single-GPU cards typically only benefit by a couple of percent (if at all) when moving from PCIe 2.1 x8 to x16. There will of course come a time when games need more PCIe bandwidth, but right now PCIe 2.1 x16 (8GB/sec) handles the task with room to spare.

So why is PCIe 3.0 important then? It’s not the games, it’s the computing. GPUs have a great deal of internal memory bandwidth (264GB/sec; more with cache) but shuffling data between the GPU and the CPU is a high latency, heavily bottlenecked process that tops out at 8GB/sec under PCIe 2.1. And since GPUs are still specialized devices that excel at parallel code execution, a lot of workloads exist that will need to constantly move data between the GPU and the CPU to maximize parallel and serial code execution. As it stands today GPUs are really only best suited for workloads that involve sending work to the GPU and keeping it there; heterogeneous computing is a luxury there isn’t bandwidth for.

The long-term solution of course is to bring the CPU and the GPU together, which is what Fusion does. CPU/GPU bandwidth just in Llano is over 20GB/sec, and latency is greatly reduced due to the CPU and GPU being on the same die. But this doesn’t stop AMD from wanting to bring some of these same benefits to discrete GPUs, which is where PCIe 3.0 comes in.

With PCIe 3.0, transport bandwidth is again being doubled, from 500MB/sec per lane in each direction to roughly 1GB/sec per lane in each direction, which for an x16 device means doubling the available bandwidth from 8GB/sec to nearly 16GB/sec. This is accomplished by increasing the frequency of the underlying bus itself from 5 GT/sec to 8 GT/sec, while decreasing encoding overhead from 20% (8b/10b encoding) to about 1.5% through the use of a highly efficient 128b/130b encoding scheme. Meanwhile latency doesn’t change (it’s largely a product of physics and physical distances), but merely doubling the bandwidth can greatly improve performance for bandwidth-hungry compute applications.
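The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above; the function name `lane_bandwidth_mb_s` is our own invention, and the result is theoretical link bandwidth in one direction, before any packet or protocol overhead.

```python
def lane_bandwidth_mb_s(transfer_rate_gt_s, payload_bits, encoded_bits):
    """Usable bandwidth of a single PCIe lane, one direction, in MB/sec.

    Hypothetical helper for illustration: each transfer moves one bit per
    lane, and only payload_bits out of every encoded_bits carry data.
    """
    usable_bits_per_sec = transfer_rate_gt_s * 1e9 * payload_bits / encoded_bits
    return usable_bits_per_sec / 8 / 1e6  # bits -> bytes -> MB

# PCIe 2.x: 5 GT/sec with 8b/10b encoding (20% overhead)
pcie2_lane = lane_bandwidth_mb_s(5.0, 8, 10)
# PCIe 3.0: 8 GT/sec with 128b/130b encoding (about 1.5% overhead)
pcie3_lane = lane_bandwidth_mb_s(8.0, 128, 130)

for name, lane in (("PCIe 2.x", pcie2_lane), ("PCIe 3.0", pcie3_lane)):
    print(f"{name}: {lane:.1f} MB/sec per lane, {16 * lane / 1000:.2f} GB/sec at x16")
```

The clock only rises by 60%, so the rest of the doubling comes from the leaner encoding. That is also why PCIe 3.0 x16 lands slightly under a full 16GB/sec.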

As with any other specialized change like this, the benefit is going to depend heavily on the application being used. However, AMD is confident that there are applications that will completely saturate PCIe 3.0 (and then some), and it’s easy to imagine why.

Even among our limited selection of compute benchmarks we found something that directly benefited from PCIe 3.0. AESEncryptDecrypt, a sample application from AMD’s APP SDK, demonstrates AES encryption performance by running it on square image files. Throwing it a large 8K x 8K image not only creates a lot of work for the GPU, but a lot of PCIe traffic too. In our case simply enabling PCIe 3.0 improved performance by 9%, from 324ms down to 297ms.
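As a rough sanity check on that result, here is a back-of-the-envelope estimate of the PCIe transfer time for such an image. Several things here are assumptions on our part rather than details of the benchmark: that 8K means 8192 pixels per side, that each pixel is 4 bytes, that the image crosses the bus once in each direction, and that the link runs at its nominal 8GB/sec or 16GB/sec.

```python
def transfer_ms(num_bytes, bandwidth_gb_s):
    """Time in milliseconds to move num_bytes at a given bandwidth (GB/sec)."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3

SIDE_PIXELS = 8192       # assumption: "8K" means 8192 pixels per side
BYTES_PER_PIXEL = 4      # assumption: 32-bit pixels, not confirmed by the benchmark
image_bytes = SIDE_PIXELS * SIDE_PIXELS * BYTES_PER_PIXEL

# Assumption: one upload to the GPU plus one download of the result.
round_trip_pcie2_ms = 2 * transfer_ms(image_bytes, 8.0)
round_trip_pcie3_ms = 2 * transfer_ms(image_bytes, 16.0)
saved_ms = round_trip_pcie2_ms - round_trip_pcie3_ms

measured_saved_ms = 324 - 297  # from the benchmark result above

print(f"Image size: {image_bytes / 2**20:.0f} MiB")
print(f"Estimated round trip: {round_trip_pcie2_ms:.1f} ms vs {round_trip_pcie3_ms:.1f} ms")
print(f"Estimated saving: {saved_ms:.1f} ms, measured saving: {measured_saved_ms} ms")
```

Under these assumptions the estimated saving is in the same ballpark as the measured one, which is at least consistent with bus transfers accounting for the improvement. Real transfers never quite reach the nominal rate, so the estimate should be read as an upper bound.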

Ultimately having more bandwidth is not only going to improve compute performance for AMD, but will give the company a critical edge over NVIDIA for the time being. Kepler will no doubt ship with PCIe 3.0, but that’s months down the line. In the meantime users and organizations with high bandwidth compute workloads have Tahiti.

292 Comments

  • GenSozo - Thursday, December 22, 2011 - link

    Style? Another possibility is that he has no life, a heavily worn F5 key, and lots of angst.
  • Blaster1618 - Monday, December 26, 2011 - link

    One request when diving into acronyms (from the “quick refresher”): the first use should be followed by a definition in parentheses or a hyperlink. Your site does the best on the web at delving into and explaining the technical evolution of computing. You may even be able to teach the trolls and shills a thing or two they can regurgitate at their post-X-mas break circle jerk. Never underestimate the importance or reach of your work.
  • lordken - Friday, January 6, 2017 - link

    mmh quite far from disappointing, still running on 7950 as of today [5 years later] :)
  • Concillian - Thursday, December 22, 2011 - link

    Page 1
    Power Consumption Comparison: Columns: AMD / Price / NVIDIA

    Presumably mislabeled.
  • Anand Lal Shimpi - Thursday, December 22, 2011 - link

    Fixed, thank you!

    Take care,
    Anand
  • Penti - Thursday, December 22, 2011 - link

    Will the new video decode engine add either software-accelerated GPU or fixed-function hardware WebM/VP8 video decode? ARM SoCs basically already have those capabilities, with Rockchip including hw-decoding, the TI OMAP IVA3 DSP video processor supporting VP8/WebM, Broadcom supporting it in their video processor, and others to come. It would be odd to be able to do smooth, trouble-free 1080p WebM on a phone or tablet, but not on a desktop or laptop computer without taxing the CPU and buses like crazy. It's already there hardware-wise in popular devices if they add software/driver support for it.

    Nice to see a new generation card anyhow.
  • Ryan Smith - Thursday, December 22, 2011 - link

    It's UVD3, the same decoder that was on Cayman. So if Cayman can't do it, Tahiti can't either.
  • MadMan007 - Thursday, December 22, 2011 - link

    Pretty sure the chart on the first page should be labeled Price Comparison not Power Consumption Comparison.

    Unless perhaps this was a sly way of saying money is power :)
  • descendency - Thursday, December 22, 2011 - link

    You list the HD 6870 as 240 on the first page ("AMD GPU Specification Comparison" chart) but then list it as around 160 in the "Winter 2011 GPU Pricing Comparison" chart. 80 dollars is quite a difference.
  • Anand Lal Shimpi - Thursday, December 22, 2011 - link

    Fixed, sorry those were older numbers.

    Take care,
    Anand
