PCI Express 3.0: More Bandwidth For Compute

It may seem like it's still fairly new, but PCI Express 2.0 is actually a relatively old addition to motherboards and video cards. AMD first added support for it with the Radeon HD 3870 back in late 2007, so it has been roughly 4 years since video cards made the jump. PCI Express 3.0, meanwhile, has been in the works for some time now, and although it hasn't quite been 4 years it feels like it has been much longer. PCIe 3.0 motherboards finally became available last month with the launch of the Sandy Bridge-E platform, and now the first PCIe 3.0 video cards are arriving with Tahiti.

But at first glance it may not seem like PCIe 3.0 is all that important. Additional PCIe bandwidth has proven to be generally unnecessary for gaming, as single-GPU cards typically benefit by only a couple of percent (if at all) when moving from PCIe 2.1 x8 to x16. There will of course come a time when games need more PCIe bandwidth, but right now PCIe 2.1 x16 (8GB/sec) handles the task with room to spare.

So why is PCIe 3.0 important then? It's not the games, it's the computing. GPUs have a great deal of internal memory bandwidth (264GB/sec on Tahiti; more with cache), but shuffling data between the GPU and the CPU is a high-latency, heavily bottlenecked process that tops out at 8GB/sec under PCIe 2.1. And since GPUs are still specialized devices that excel at parallel code execution, there are plenty of workloads that need to constantly move data between the GPU and the CPU in order to run parallel and serial code where each is fastest. As it stands today, GPUs are really only suited for workloads that involve sending work to the GPU and keeping it there; heterogeneous computing is a luxury there isn't the bandwidth for.
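To put that bottleneck in more concrete terms, the sketch below times a single host-to-device copy, the kind of transfer that tops out at roughly 8GB/sec over PCIe 2.1 and roughly 16GB/sec over PCIe 3.0. This is our own illustrative OpenCL example rather than anything from AMD's SDK; the 256MB buffer size, the file name, and the use of a single blocking write are assumptions made purely for demonstration.

    /* Minimal sketch: time one host-to-device copy over PCIe with OpenCL.
       Illustrative only; the buffer size is arbitrary and error handling
       is reduced to asserts. Build with something like:
         gcc pcie_copy.c -lOpenCL -o pcie_copy */
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <CL/cl.h>

    int main(void)
    {
        const size_t bytes = 256UL * 1024 * 1024;   /* 256MB test buffer */
        void *host = calloc(bytes, 1);
        assert(host != NULL);

        cl_platform_id platform;
        cl_device_id device;
        cl_int err = clGetPlatformIDs(1, &platform, NULL);
        assert(err == CL_SUCCESS);
        err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        assert(err == CL_SUCCESS);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);
        cl_mem dev_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
        assert(err == CL_SUCCESS);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* Blocking write: returns only after the data has crossed the bus */
        err = clEnqueueWriteBuffer(queue, dev_buf, CL_TRUE, 0, bytes, host,
                                   0, NULL, NULL);
        assert(err == CL_SUCCESS);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("Copied %zu MB in %.1f ms (%.2f GB/sec)\n",
               bytes >> 20, sec * 1e3, bytes / sec / 1e9);

        clReleaseMemObject(dev_buf);
        clReleaseCommandQueue(queue);
        clReleaseContext(ctx);
        free(host);
        return 0;
    }

Run a copy like this in a loop alongside kernel launches, as a heterogeneous workload would, and it quickly becomes clear how the bus, rather than the GPU itself, becomes the limiting factor.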

The long-term solution of course is to bring the CPU and the GPU together, which is exactly what Fusion does. Even in Llano, CPU/GPU bandwidth is over 20GB/sec, and latency is greatly reduced since the CPU and GPU are on the same die. But that doesn't mean AMD isn't also looking to bring some of these same benefits to discrete GPUs, which is where PCIe 3.0 comes in.

With PCIe 3.0, transport bandwidth is once again being doubled, from 500MB/sec per lane to 1GB/sec per lane in each direction, which for an x16 device means doubling the available bandwidth from 8GB/sec to 16GB/sec. This is accomplished by increasing the signaling rate of the underlying bus from 5 GT/sec to 8 GT/sec, while cutting the encoding overhead from 20% (8b/10b encoding) to roughly 1.5% through the use of a far more efficient 128b/130b encoding scheme. Latency, meanwhile, doesn't change – it's largely a product of physics and physical distances – but doubling the bandwidth alone can greatly improve performance for bandwidth-hungry compute applications.
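For reference, the arithmetic behind those figures works out as follows. Since each transfer moves one bit per lane, 5 GT/sec and 8 GT/sec correspond to 5 Gb/sec and 8 Gb/sec of raw per-lane signaling, and the per-lane results below are for a single direction:

\[
\begin{aligned}
\text{PCIe 2.1:}\quad & 5\,\mathrm{Gb/s} \times \tfrac{8}{10} = 4\,\mathrm{Gb/s} = 0.5\,\mathrm{GB/s\ per\ lane} \;\Rightarrow\; 16 \times 0.5\,\mathrm{GB/s} = 8\,\mathrm{GB/s} \\
\text{PCIe 3.0:}\quad & 8\,\mathrm{Gb/s} \times \tfrac{128}{130} \approx 7.88\,\mathrm{Gb/s} \approx 0.985\,\mathrm{GB/s\ per\ lane} \;\Rightarrow\; 16 \times 0.985\,\mathrm{GB/s} \approx 15.8\,\mathrm{GB/s}
\end{aligned}
\]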

As with any specialized change like this, the benefit is going to depend heavily on the application being used. However, AMD is confident that there are applications that will completely saturate PCIe 3.0 (and then some), and it's easy to imagine why.

Even among our limited selection of compute benchmarks we found something that directly benefited from PCIe 3.0. AESEncryptDecrypt, a sample application from AMD's APP SDK, demonstrates AES encryption performance by running it on square image files. Throwing it a large 8K x 8K image not only creates a lot of work for the GPU, but generates a lot of PCIe traffic too. In our case, simply enabling PCIe 3.0 improved performance by 9%, cutting the runtime from 324ms down to 297ms.
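A rough transfer-time estimate shows why the bus matters so much here. Assuming the sample treats the image as 32 bits per pixel (an assumption on our part; we haven't inspected the sample's internal format), the numbers work out to:

\[
8192 \times 8192 \times 4\,\mathrm{B} \approx 268\,\mathrm{MB}, \qquad
\frac{268\,\mathrm{MB}}{8\,\mathrm{GB/s}} \approx 34\,\mathrm{ms}, \qquad
\frac{268\,\mathrm{MB}}{16\,\mathrm{GB/s}} \approx 17\,\mathrm{ms}
\]

With the image going out to the card and the result coming back, shaving a few tens of milliseconds off transfer time alone is roughly in line with the 27ms improvement we measured.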

Ultimately, having more bandwidth is not only going to improve compute performance for AMD, it will also give the company a critical edge over NVIDIA for the time being. Kepler will no doubt ship with PCIe 3.0 support, but that's months down the line. In the meantime, users and organizations with high-bandwidth compute workloads have Tahiti.

Comments

  • Scali - Saturday, December 24, 2011 - link

    I have never heard Jen-Hsun call the mock-up a working board.
    They DID however have working boards on which they demonstrated the tech-demos.
    Stop trying to make something out of nothing.
  • Scali - Saturday, December 24, 2011 - link

    Actually, since Crysis 2 does not 'tessellate the crap' out of things (unless your definition of that is: "Doesn't run on underperforming tessellation hardware"), the 7970 is actually the fastest card in Crysis 2.
    Did you even bother to read some other reviews? Many of them tested Crysis 2, you know. Tomshardware for example.
    If you try to make smart fanboy remarks, at least make sure they're smart first.
  • Scali - Saturday, December 24, 2011 - link

    But I know... being a fanboy must be really hard these days..
    One moment you have to spread nonsense about how Crysis 2's tessellation is totally over-the-top...
    The next moment, AMD comes out with a card that has enough of a boost in performance that it comes out on top in Crysis 2 again... So you have to get all up to date with the latest nonsense again.
    Now you know what the AMD PR department feels like... they went from "Tessellation good" to "Tessellation bad" as well, and have to move back again now...
    That is, they would, if they weren't all fired by the new management.
  • formulav8 - Tuesday, February 21, 2012 - link

    You're worse than anything he said. Grow up.
  • CeriseCogburn - Sunday, March 11, 2012 - link

    He's exactly correct. I quite understand that for amd fanboys that's forbidden; one must toe the stupid crybaby line and never deviate toward the truth.
  • crazzyeddie - Sunday, December 25, 2011 - link

    Page 4:

    " Traditionally the ROPs, L2 cache, and memory controllers have all been tightly integrated as ROP operations are extremely bandwidth intensive, making this a very design for AMD to use. "
  • Scali - Monday, December 26, 2011 - link

    Of course it isn't. More polygons are better. Pixar subdivides everything on screen to sub-pixel level.
    That's where games are headed as well, that's progress.

    Only fanboys like you cry about it.... even after AMD starts winning the benchmarks (which would prove that Crysis is not doing THAT much tessellation, both nVidia and new AMD hardware can deal with it adequately).
  • Wierdo - Monday, January 2, 2012 - link

    http://techreport.com/articles.x/21404

    "Crytek's decision to deploy gratuitous amounts of tessellation in places where it doesn't make sense is frustrating, because they're essentially wasting GPU power—and they're doing so in a high-profile game that we'd hoped would be a killer showcase for the benefits of DirectX 11
    ...
    But the strange inefficiencies create problems. Why are largely flat surfaces, such as that Jersey barrier, subdivided into so many thousands of polygons, with no apparent visual benefit? Why does tessellated water roil constantly beneath the dry streets of the city, invisible to all?
    ...
    One potential answer is developer laziness or lack of time
    ...
    so they can understand why Crysis 2 may not be the most reliable indicator of comparative GPU performance"

    I'll take the word of professional reviewers.
  • CeriseCogburn - Sunday, March 11, 2012 - link

    Give them a month or two to adjust their amd epic fail whining blame shift.
    When it occurs to them that amd is actually delivering some dx11 performance for the 1st time, they'll shift to something else they whine about and blame on nvidia.
    The big green MAN is always keeping them down.
  • Scali - Monday, December 26, 2011 - link

    Wrong, they showed plenty of demos at the introduction. Else the introduction would just be Jen-Hsun holding up the mock card, and nothing else... which was clearly not the case.
    They demo'ed Endless City, among other things. Which could not have run on anything other than real Fermi chips.
    And yea, I'm really going to go to SemiAccurate to get reliable information!
