More on PCI Express and Graphics

There is going to be a lot of fighting over the next few months about the performance of PCI Express based graphics solutions. Unfortunately, we won't really know what's what until we have hardware in our hands to test real world performance. In order to straighten out some of the madness that will surely be going on, we'll try to paint a clearer picture of where PCI Express graphics cards stand at the moment, starting with why each contender chose its path.

ATI went with a native PCI Express interface because it gives them a full 4GB/s of bandwidth upstream and downstream at the same time. This allows massive amounts of data to move between the GPU and the CPU/main memory in both directions simultaneously. They also have the advantage of not needing an extra component on the graphics card itself.
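
For reference, here is roughly where that 4GB/s figure comes from. The quick sketch below (Python, purely illustrative) assumes first generation PCI Express signaling: 2.5GT/s per lane with 8b/10b encoding, across 16 lanes, in each direction.

```python
# Rough bandwidth math for a x16 PCI Express slot (first generation signaling).
LANE_RATE_GT_S = 2.5      # gigatransfers per second per lane, each direction
ENCODING = 8 / 10         # 8b/10b encoding: 8 payload bits per 10 line bits
LANES = 16

gbits_per_direction = LANE_RATE_GT_S * ENCODING * LANES   # 32 Gbit/s
gbytes_per_direction = gbits_per_direction / 8            # 4.0 GB/s
print(f"PCIe x16, each direction: {gbytes_per_direction:.1f} GB/s")

# AGP 8x for comparison: 32-bit bus, 66 MHz base clock, 8 transfers per clock.
# The AGP bus is shared, so it only moves data one way at a time.
agp8x_gbytes = (32 / 8) * 66.67e6 * 8 / 1e9                # ~2.1 GB/s
print(f"AGP 8x, shared:           {agp8x_gbytes:.1f} GB/s")
```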

NVIDIA chose a bridged solution (which they like to call their High Speed Interconnect, or HSI) which lets them produce only one GPU for both AGP and PCI Express based solutions while the transition to the new platform is being made. This gives them the advantage of being more flexible in responding to demand for AGP and PCI Express based products, and they won't have to forecast just how many of each type of GPU they will sell in any given silicon run.

The main point of contention between the two camps is bandwidth. We know what ATI is capable of; now let's take a look at NVIDIA. Because the bridge sits in such close proximity to the AGP interface of the GPU, NVIDIA is able to run the AGP side of the bridge at 4GB/s (twice the bandwidth of AGP 8x). But because NVIDIA is bridging PCI Express's 4GB/s up and 4GB/s down onto what is effectively an AGP 16x bus, they will not be able to sustain the full bandwidth of the x16 PCI Express interface in both directions at the same time. If data is moving in only one direction, there is no bandwidth loss; if data needs to move up and down at the same time, one data stream will have to sacrifice some bandwidth to the other.
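
To make the contention concrete, here is a toy model (Python, our own simplification rather than anything from NVIDIA) of how a single shared 4GB/s link has to divide simultaneous traffic, compared with a native interface that has 4GB/s available in each direction:

```python
# Toy model of link contention. All figures in GB/s; purely illustrative.

def native_pcie_x16(up_demand, down_demand):
    """Native PCIe x16: independent 4GB/s in each direction."""
    return min(up_demand, 4.0), min(down_demand, 4.0)

def bridged_agp16x(up_demand, down_demand):
    """Bridged GPU: one shared 4GB/s AGP-side link, split proportionally
    between the two directions whenever both are busy."""
    total = up_demand + down_demand
    if total <= 4.0:
        return up_demand, down_demand        # link not saturated, no loss
    scale = 4.0 / total                      # both streams get squeezed
    return up_demand * scale, down_demand * scale

print(bridged_agp16x(0.0, 4.0))   # (0.0, 4.0): one-way traffic, no penalty
print(bridged_agp16x(4.0, 4.0))   # (2.0, 2.0): simultaneous traffic is halved
print(native_pcie_x16(4.0, 4.0))  # (4.0, 4.0): native link keeps both full
```

However the real bridge actually arbitrates between the two directions, the point is simply that the shared link is the ceiling.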

So, what kind of impact will this have? We can't really say until we have hardware. Of course, single channel DDR400 only gives us 3.2GB/s of bandwidth (6.4GB/s in dual channel mode), so transfers from main memory to the GPU will actually be limited by memory speed on any system that can't supply 8GB/s of memory bandwidth to keep both directions full. As for games, we can take a look at the history of the impact of increasing AGP bandwidth, which has rarely translated into large real world gains. It is possible that future games (and possibly games ported by lazy console developers) may want to use the CPU and main memory a great deal and therefore benefit from PCI Express, but again, we won't know until the hardware and the games are out there.
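
Spelling that memory bandwidth arithmetic out (Python again, illustrative numbers only):

```python
# System memory bandwidth vs. what a x16 PCI Express link can ask for (GB/s).
DDR400_SINGLE = (64 / 8) * 400e6 / 1e9   # 64-bit channel at 400 MT/s -> 3.2
DDR400_DUAL = 2 * DDR400_SINGLE          # dual channel              -> 6.4

PCIE_ONE_WAY = 4.0        # one direction of a x16 link
PCIE_BOTH_WAYS = 8.0      # both directions saturated at once

print(f"Single channel DDR400: {DDR400_SINGLE:.1f} GB/s "
      f"(can't even fill one {PCIE_ONE_WAY} GB/s direction)")
print(f"Dual channel DDR400:   {DDR400_DUAL:.1f} GB/s "
      f"(fills one direction, but not both at {PCIE_BOTH_WAYS} GB/s)")
```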

NVIDIA's NV5x line is slated to be native PCI Express, but they also plan on continuing AGP support through that line by flipping their bridge chip around and translating the native PCI Express interface back to AGP. In fact, we were told that if demand persists into the future, it is very possible that NVIDIA will bridge its NV6x GPUs back to AGP as well. This bus transition is a tough one to call. ISA took a long time to disappear, and PCI graphics cards are still selling (generally to those who want a second graphics card, but the point is that they are still out there). The difference here is that Intel is pushing really hard for this transition, and AGP is a much more narrowly targeted bus than either ISA or PCI.

From a business standpoint, NVIDIA has made a safer choice than ATI. Of course, if performance ends up suffering, or if ATI shows effective, affordable applications for the PCI Express bus that NVIDIA can't take advantage of, NVIDIA will suffer. In this business, performance is everything. Obviously, having the full bandwidth of PCI Express is a desirable thing, and eventually everyone will be making native PCI Express GPUs. But when is eventually? And when will that available bandwidth be tapped, let alone necessary? Only time will tell, but hopefully we've filled in some of the gaps until that time comes.

9 Comments

  • TrogdorJW - Tuesday, February 24, 2004 - link

    Ugh... IPS was supposed to be IPC.

    IPS has been proposed as an alternative to MHz as a processor speed measurement (Instructions Per Second = IPC * MHz), but figuring out the *average* number of instructions per clock is likely to bring up a whole new set of problems.
  • TrogdorJW - Tuesday, February 24, 2004 - link

    The AMD people will probably love this quote:

    "We still need to answer the question of how we are going to get from here to there. As surprising as it may seem, Intel's answer isn't to push for ever increasing frequencies. With some nifty charts and graphs, Pat showed us that we wouldn't be able to rely on increases in clock frequency giving us the same increases in performance as we have had in the past. The graphs showed the power density of Intel processors approaching that of the sun if it remains on its current trend, as well as a graph showing that the faster a processor, the more cycles it wastes waiting for data from memory (since memory latency hasn't decreased at the same rate as clock speed has increased). Also, as chips are fabbed with smaller and smaller processes, increasing clock speeds will lead to problems with moving data across around a chip in less than one clock cycle (because of interconnect RC delays)."

    Of course, this is nothing new. Intel has been pursuing clock speed with P4 and parallelism with P-M and Itanium. In an ideal world, you would have Pentium M/Athlon IPS with P4 clock speeds. Anyway, it looks like programmers (WOOHOO - THAT'S ME!) are going to become more important than ever in the future processor wars. Writing software to properly take advantage of multiple threads is still an enormously difficult task.

    Then again, if game developers for example would give up on the "pissing contest" of benchmarks and code their games to just run at a constant 100 FPS max, it might be less of an issue. If CPUs get fast enough that they can run well over 100 fps on games, then they could stop being "Real Time Priority" processes.

    It really irks me that most games suck up 100% of the processor power. If I could get by with 30% processor usage and let the rest be multi-tasked out to other threads while maintaining a good frame rate, why should the game not do so? This is especially annoying on games that aren't real-time, like the turn-based strategy games.
  • TrogdorJW - Tuesday, February 24, 2004 - link

    "As for an example of synthesis, we were shown a demo of realtime raytracing. Visualization being the infinitely parallelizable problem that it is, this demo was a software renderer running on a cluster of 23 dual 2.2GHz Xeon processors. The world will be a beautiful place when we can pack this kind of power into a GPU and call it a day."

    Heheheh.... I like that. It's a real-time raytracing demo! Woohoo! I've heard people talk about raytracing being a future addition to graphics cards. If you assume that the GPU with specialized hardware could do raytracing ten times faster than the software on the Xeons, we'll still need 5 GHz graphics chips to pull it off. Or two chips running at 2.5 GHz? Still, the thought of being able to play a game with Toy Story quality graphics is pretty cool. Can't wait for 2010!
  • Shuxclams - Tuesday, February 24, 2004 - link

    Oops, no comment before. Am I seeing things or do I see a southbridge, northbridge and memory controller?
    SHUX
  • Shuxclams - Tuesday, February 24, 2004 - link

  • HammerFan - Tuesday, February 24, 2004 - link

    Intel probably won't use an onboard mem controller for a long time... I've heard that their first experiences with them weren't good. Also, the northbridges are way too big to not have a mem controller on board.
    *new topic*
    That BTX case looks wacky to me...why such a big heatsink for the CPU?
    *new topic*
    I have the same question Cygni had: Are there any CTs in these pictures, or are there none out-and-about yet?
  • Ecmaster76 - Tuesday, February 24, 2004 - link

    I counted eight DIMMs on the first board and either six or eight on the second one. Dual core memory controller? If so, it would help Intel keep the Xeon from being spanked by Opteron as they scale.
  • capodeloscapos - Tuesday, February 24, 2004 - link

    Quote: " It is possible that future games (and possibly games ported by lazy console developers) may want to use the CPU and main memory a great deal and therefore benefit from PCI Express"

    cough!, Halo, Cough!, Colin McRae 3, cough!...
    :)
  • Cygni - Tuesday, February 24, 2004 - link

    I like the attempt to hide the number of DIMM slots... but I think it's still pretty easy to tell how many are there, because the tops of the slots are still showing, as well as a little of the bottom of the last slot.

    So, is Intel trying to hide that Lindenhurst is 64bit (XeonCE) compatible, or am I off base here?
