PCI Express 3.0: More Bandwidth For Compute

It may seem like it’s still fairly new, but PCI Express 2.0 is actually a relatively old addition to motherboards and video cards. AMD first added support for it with the Radeon HD 3870 back in 2008, so it’s been nearly 4 years since video cards made the jump. At the same time, PCI Express 3.0 has been in the works for some time now, and although it hasn’t been 4 years, it feels like it has been much longer. PCIe 3.0 motherboards only became available last month with the launch of the Sandy Bridge-E platform, and now the first PCIe 3.0 video cards are becoming available with Tahiti.

At first glance, though, it may not seem like PCIe 3.0 is all that important. Additional PCIe bandwidth has proven to be generally unnecessary when it comes to gaming, as single-GPU cards typically only benefit by a couple of percent (if at all) when moving from PCIe 2.1 x8 to x16. There will of course come a time when games need more PCIe bandwidth, but right now PCIe 2.1 x16 (8GB/sec) handles the task with room to spare.

So why is PCIe 3.0 important then? It’s not the games, it’s the computing. GPUs have a great deal of internal memory bandwidth (264GB/sec; more with cache) but shuffling data between the GPU and the CPU is a high latency, heavily bottlenecked process that tops out at 8GB/sec under PCIe 2.1. And since GPUs are still specialized devices that excel at parallel code execution, a lot of workloads exist that will need to constantly move data between the GPU and the CPU to maximize parallel and serial code execution. As it stands today GPUs are really only best suited for workloads that involve sending work to the GPU and keeping it there; heterogeneous computing is a luxury there isn’t bandwidth for.
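
To put that gap in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a hypothetical 1GB working set and treats both figures quoted above (8GB/sec for PCIe 2.1 x16, 264GB/sec for GPU memory) as theoretical peaks; real transfers add latency and protocol overhead, so actual numbers would be worse.

```python
def transfer_ms(size_bytes, gb_per_s):
    """Time in milliseconds to move size_bytes at a peak rate in GB/sec (1 GB = 1e9 bytes)."""
    return size_bytes / (gb_per_s * 1e9) * 1e3

# Hypothetical working set; the size is an assumption chosen for illustration.
buffer_bytes = 1_000_000_000

over_pcie2 = transfer_ms(buffer_bytes, 8)       # CPU <-> GPU over PCIe 2.1 x16
in_gpu_memory = transfer_ms(buffer_bytes, 264)  # one pass through GPU memory

print(f"PCIe 2.1 x16: {over_pcie2:.1f} ms")
print(f"GPU memory:   {in_gpu_memory:.2f} ms")
print(f"Ratio:        {over_pcie2 / in_gpu_memory:.0f}x")
```

Under these assumptions, a single trip across the bus costs about as much time as 33 full passes over the same data in GPU memory, which is why workloads that stay on the GPU fare so much better.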

The long-term solution of course is to bring the CPU and the GPU together, which is what Fusion does. CPU/GPU bandwidth just in Llano is over 20GB/sec, and latency is greatly reduced due to the CPU and GPU being on the same die. But this doesn’t change the fact that AMD also wants to bring some of these same benefits to discrete GPUs, which is where PCIe 3.0 comes in.

With PCIe 3.0, transport bandwidth is again being doubled, from 500MB/sec per lane in each direction to roughly 1GB/sec per lane in each direction, which for an x16 device means doubling the available bandwidth from 8GB/sec to nearly 16GB/sec. This is accomplished by increasing the transfer rate of the underlying bus itself from 5 GT/sec to 8 GT/sec, while decreasing encoding overhead from 20% (8b/10b encoding) to roughly 1.5% through the use of a highly efficient 128b/130b encoding scheme. Meanwhile latency doesn’t change, as it’s largely a product of physics and physical distances, but merely doubling the bandwidth can greatly improve performance for bandwidth-hungry compute applications.
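
The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration of how transfer rate and encoding combine; the function name is our own, and it ignores packet and protocol overhead beyond the line encoding.

```python
def lane_bandwidth_mb_per_s(gigatransfers_per_s, payload_bits, encoded_bits):
    """Usable bandwidth of one lane in one direction, in MB/sec.

    Each transfer carries one bit on the wire; only payload_bits out of
    every encoded_bits are actual data.
    """
    raw_bits_per_s = gigatransfers_per_s * 1e9
    usable_bits_per_s = raw_bits_per_s * payload_bits / encoded_bits
    return usable_bits_per_s / 8 / 1e6

pcie2 = lane_bandwidth_mb_per_s(5, 8, 10)      # 8b/10b encoding
pcie3 = lane_bandwidth_mb_per_s(8, 128, 130)   # 128b/130b encoding

print(f"PCIe 2.x: {pcie2:.0f} MB/sec per lane, {pcie2 * 16 / 1000:.2f} GB/sec at x16")
print(f"PCIe 3.0: {pcie3:.0f} MB/sec per lane, {pcie3 * 16 / 1000:.2f} GB/sec at x16")
print(f"128b/130b overhead: {(1 - 128 / 130) * 100:.2f}%")
```

Note that the transfer rate only rises by 60%; the rest of the doubling comes from the leaner encoding, and the result lands just shy of a full 1GB/sec per lane.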

As with any other specialized change like this, the benefit is going to depend heavily on the application being used. However, AMD is confident that there are applications that will completely saturate PCIe 3.0 (and then some), and it’s easy to imagine why.

Even among our limited selection of compute benchmarks we found something that directly benefitted from PCIe 3.0. AESEncryptDecrypt, a sample application from AMD’s APP SDK, demonstrates AES encryption performance by running it on square image files. Throwing it a large 8K x 8K image not only creates a lot of work for the GPU, but a lot of PCIe traffic too. In our case simply enabling PCIe 3.0 improved performance by 9%, from 324ms down to 297ms.
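
For readers checking the math, the two runtimes above can be read either as time saved or as throughput gained. The short Python sketch below, using only the two measured figures, shows that the 9% corresponds to the throughput view.

```python
before_ms = 324.0  # runtime with PCIe 2.x
after_ms = 297.0   # runtime with PCIe 3.0 enabled

# Fraction of the original runtime that was eliminated.
time_saved_pct = (before_ms - after_ms) / before_ms * 100

# How much more work gets done per unit of time.
throughput_gain_pct = (before_ms / after_ms - 1) * 100

print(f"Runtime reduced by {time_saved_pct:.1f}%")
print(f"Throughput increased by {throughput_gain_pct:.1f}%")
```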

Ultimately having more bandwidth is not only going to improve compute performance for AMD, but will give the company a critical edge over NVIDIA for the time being. Kepler will no doubt ship with PCIe 3.0, but that’s months down the line. In the meantime users and organizations with high bandwidth compute workloads have Tahiti.

292 Comments

  • RussianSensation - Thursday, December 22, 2011 - link

    That's not what the review says. The review clearly explains that it's the best single-GPU for gaming. There is nothing biased about not being mind-blown by having a card that's only 25% faster than GTX580 and 37% faster than HD6970 on average, considering this is a brand new 28nm node. Name a single generation where AMD's next generation card improved performance so little since Radeon 8500?

    There isn't any!
  • SlyNine - Friday, December 23, 2011 - link

    2900XT ? But I Don't remember if that was a new node and what the % of improvement was beyond the 1950XT.

    But still this is a $500 card, and I don't think it's what we have come to expect from a new node and generation of card. However, some people seem more than happy with it. Guess they don't remember the 9700PRO days.
  • takeulo - Thursday, December 22, 2011 - link

    as ive read the review this is not a disappointment infact its only a single gpu card but it toughly competing or nearly chasing with the dual gpu's graphics card like 6990 and gtx 590 performance...
    imagine that 7970 is also a dual gpu?? it will tottally dominate the rest... sorry for my bad english..
  • eastyy - Thursday, December 22, 2011 - link

    the price vs performance is the most important thing for me at the moment i have a 460 that cost me about £160 at the time and that was a few years ago...seems like the cards now for the same price dont really give that much of a increase
  • Morg. - Thursday, December 22, 2011 - link

    What seems unclear to the writer here is that in fact 6-series AMD was better in single GPU than nVidia.

    Like miles better.

    First, the stock 6970 was within 5% of the gtx580 at high resolutions (and excuse me, but if you like a 500 bucks graphics board with a 100 bucks screen ... not my problem -- ).

    Second, if you put a 6970 OC'd at GTX580 TDP ... the GTX580 is easily 10% slower.

    So overall . seriously ... wake the f* up ?

    The only thing nVidia won at with fermi series 2 (gtx5xx) is making the most expensive highest TDP single GPU card. It wasn't faster, they just picked a price point AMD would never target .. and they got it .. wonderful.

    However, AMD raped nVidia all the way in perf/watt/dollar as they did with Intel in the Server CPU space since Opteron Istanbul ...

    If people like you stopped spouting random crap, companies like AMD would stand a chance of getting the market share their products deserve (sure their drivers are made of shit).
  • Leyawiin - Thursday, December 22, 2011 - link

    The HD 7970 is a fantastic card (and I can't wait to see the rest of the line), but the GTX 580 was indisputably better than the HD 6970. Stock or OC'd (for both).
  • Morg. - Friday, December 23, 2011 - link

    Considering TDP, price and all - no.

    The 6970 lost maximum 5% to the GTX580 above full HD, and the bigger the resolution, the smaller the GTX advantage.

    Every benchmark is skewed, but you should try interpreting rather than just reading the conclusion --

    Keep in mind the GTX580 die size is 530mm² whereas the 6970 is 380mm²

    Factor that in, aim for the same TDP on both cards . and believe me .. the GTX580 was a complete total failure, and a total loss above full HD.

    Yes it WAS the biggest single GPU of its time . but not the best.
  • RussianSensation - Thursday, December 22, 2011 - link

    Your post is ill-informed.

    When GTX580 and HD6970 are both overclocked, it's not even close. GTX580 destroyed it.

    http://www.xbitlabs.com/articles/graphics/display/...

    HD6950 was an amazing value card for AMD this generation, but HD6970 was nothing special vs. GTX570. GTX580 was overpriced for the performance over even $370 factory preoverclocked GTX570 cards (such as the almost eerily similar in performance EVGA 797mhz GTX570 card for $369).

    All in all, GTX460 ~ HD6850, GTX560 ~ HD6870, GTX560 Ti ~ HD6950, GTX570 ~ HD6970. The only card that had really poor value was GTX580. Of course if you overclocked it, it was a good deal faster than the 6970 that scaled poorly with overclocking.
  • Morg. - Friday, December 23, 2011 - link

    I believe you don't get what I said :

    AT THE SAME TDP, THE HD6xxx TOTALLY DESTROYED THE GTX 5xx

    THAT MEANS : the amd gpu was better even though AMD decided to sell it at a TDP / price point that made it cheaper and less performing than the GTX 5xx

    The "destroyed it" statement is full HD resolution only . which is dumb . I wouldn't ever get a top graphics board to just stick with full HD and a cheap monitor.
  • Peichen - Friday, December 23, 2011 - link

    According to your argument, all we'd ever need is IGP because no stand-alone card can compete with IGP at the same TDP / price point.
