Original Link: http://www.anandtech.com/show/2019
NVIDIA Single Card, Multi-GPU: GeForce 7950 GX2
by Derek Wilson on June 5, 2006 12:00 PM EST
Today marks the launch of the first GPU maker sanctioned single card / multi-GPU solution for the consumer market in quite some time. Not since Quantum3D introduced the Obsidian X24 have we seen such a beast (which, interestingly enough, did actual Scan Line Interleaving on a single card). This time around NVIDIA's flavor of SLI and PCIe are being used to connect two boards together for a full featured multi-GPU solution that works like a single card as far as the end user is concerned. No special motherboard is required, the upcoming 90 series driver will support the card, and there is future potential for DIY quad SLI. There is still a ways to go until NVIDIA releases drivers that will support quad SLI without the help of a system vendor, but they are working on it.
For now, we will take a look at the card and its intended use: a card using a single PCIe connection designed to be the fastest NVIDIA graphics board available. While there are some drawbacks of SLI still associated with the 7950 GX2 (certain games scale less than others), the major issues are quite nicely resolved: there is no need for an SLI motherboard, and it's much easier to make sure everything is hooked up correctly (with only one power connector, no SLI bridge needed, and only one card to plug in). The drivers start up and automatically configure support for multi-GPU rendering, and (after our motherboard's BIOS was flashed) we had no problem with the system recognizing the new technology.
While the potential for quad SLI is a reality, the usefulness is still fairly limited - only users with ultrahigh resolution monitors will see the benefits of four GPUs. At lower resolutions, CPU overhead becomes a factor, and some limitations of DX9 come into play. We certainly want to test quad SLI on the 7950 GX2, but we will have to wait until we get the equipment together and track down a driver that will support it. In this article, we will compare the 7950 GX2 with other high end NVIDIA and ATI cards, and we'll also take a look at how well it scales compared to its close relative: the 7900 GT / 7900 GT SLI. But before we get to the benchmarks, let's take a look at how NVIDIA puts it all together in a way that avoids the necessity of an SLI motherboard or an external power supply.
There is a very significant distinction to be made between NVIDIA's implementation of multi-GPU on a single card and previous attempts. In the past, solutions that drop two GPUs on one PCB (printed circuit board) have relied on the capability of an SLI motherboard to configure a single physical X16 PCIe connection into two X8 data paths. While this solution works, it is not optimal in bringing multi-GPU performance to the masses. Requiring not only a chipset that will allow dynamic PCIe lane configuration, but also restricting NVIDIA based graphics boards to NVIDIA core logic based motherboards really cuts down on the potential market.
With its first in-house multi-GPU design, NVIDIA has lifted the requirement for an SLI chipset and enabled the use of their 7950 GX2 on any motherboard with an X16 PCIe slot (provided the manufacturer has proper BIOS support, but more on that later). This chipset agnostic implementation works by incorporating a PCIe switch which acts as a bridge between the system's X16 interface and the two GPUs. Because of the way PCIe works, the operating system is able to see the two graphics cards as if they were independent parts. You can think of this as being similar to connecting a USB hub to a single USB port in order to plug in multiple devices. Only in this case, the devices and switch are all in one neat little package.
The PCIe switch itself is a 48 lane device, capable of routing each of the three x16 connections to any one of the other two depending on its intended destination. On their 7900 GX2, NVIDIA takes full advantage of this, but for the 7950 GX2, only 8 lanes are routed from the switch to each GPU. The end result is that what the chipset would have had to manage, NVIDIA's 7950 GX2 moves on board.
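To put those lane counts in perspective, here is a rough sketch of per-direction link bandwidth. It assumes first-generation PCIe signaling (2.5 GT/s per lane with 8b/10b encoding), which is our assumption rather than something NVIDIA specified to us, and the function and constant names are purely illustrative.

```python
# Rough per-direction bandwidth of a PCIe link.
# Assumption (not stated in the article): first-generation PCIe,
# i.e. 2.5 GT/s per lane with 8b/10b line encoding.

GT_PER_SEC_PER_LANE = 2.5e9    # raw transfers per second on one lane
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 payload bits per 10 line bits


def link_bandwidth_mb_s(lanes):
    """Usable MB/s in one direction, before packet/protocol overhead."""
    bits_per_sec = lanes * GT_PER_SEC_PER_LANE * ENCODING_EFFICIENCY
    return bits_per_sec / 8 / 1e6


upstream = link_bandwidth_mb_s(16)  # motherboard slot to switch
per_gpu = link_bandwidth_mb_s(8)    # switch to each GPU on the 7950 GX2

print(f"x16 upstream: {upstream:.0f} MB/s per direction")
print(f"x8 per GPU:   {per_gpu:.0f} MB/s per direction")
print(f"both x8 links combined: {2 * per_gpu:.0f} MB/s per direction")
```

Under these assumptions the two x8 downstream links together add up to the same figure as the x16 upstream link, so traffic from the host is not squeezed any further at the switch than it already is at the slot.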
We mentioned BIOS compatibility, which can be a potential problem. The reason we could see some issues here is that, while PCI Express switches are perfectly valid and useful devices, we haven't seen any real commercial attempt that takes advantage of them on an add-in board. Combine this with the fact that many motherboard makers only recognize graphics hardware in their x16 PCIe slots, and we end up with some wrinkles which need to be smoothed. The system BIOS must be able to handle finding a PCIe switch, and furthermore it must be able to recognize that a graphics card is beyond the switch in order to load the video BIOS.
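The discovery problem can be sketched as a toy model. This is an illustration only, not real BIOS code: the topology is our simplified guess at how a switch-based card might appear (an upstream port and two downstream ports, each modeled as a bridge), and every name in it is hypothetical. The only values taken from the PCI specification are the base class codes for display controllers (0x03) and bridges (0x06).

```python
# Toy model of firmware walking the PCI hierarchy to find graphics devices.
# Illustration only: the topology and all names are hypothetical.

from dataclasses import dataclass, field

CLASS_DISPLAY = 0x03  # PCI base class code: display controller
CLASS_BRIDGE = 0x06   # PCI base class code: bridge device


@dataclass
class Device:
    name: str
    base_class: int
    children: list = field(default_factory=list)


def find_display_devices(dev, bridges_crossed=0):
    """Depth-first search; returns (name, bridges crossed) per display device."""
    found = []
    if dev.base_class == CLASS_DISPLAY:
        found.append((dev.name, bridges_crossed))
    if dev.base_class == CLASS_BRIDGE:
        # Only bridges have devices behind them, so only bridges are descended.
        for child in dev.children:
            found.extend(find_display_devices(child, bridges_crossed + 1))
    return found


# Simplified guess at a switch-based card sitting in an x16 slot.
slot = Device("x16 slot root port", CLASS_BRIDGE, [
    Device("switch upstream port", CLASS_BRIDGE, [
        Device("switch downstream port 0", CLASS_BRIDGE,
               [Device("GPU 0", CLASS_DISPLAY)]),
        Device("switch downstream port 1", CLASS_BRIDGE,
               [Device("GPU 1", CLASS_DISPLAY)]),
    ]),
])

found = find_display_devices(slot)
# A firmware that only expects a display device directly behind the slot
# (one bridge crossed) would see nothing here.
directly_in_slot = [name for name, depth in found if depth == 1]

print("full walk:", found)
print("directly in slot only:", directly_in_slot)
```

In this model a full recursive walk finds both GPUs, while a search that stops at the slot comes back empty, which is the kind of wrinkle a BIOS update has to smooth out.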
NVIDIA has been working hard with the rest of the industry to help get BIOS updates ready and available for launch. The list is relatively long at this point, and we can confirm that the 7950 GX2 will actually run in many ATI based motherboards right now with the proper BIOS update. Inevitably, there will be some systems which will not run the 7950 GX2 at launch. Just how large a problem this is remains to be seen, but we can't put too much of the burden on NVIDIA's shoulders for this problem. Motherboard makers do need to support more than just graphics devices in their X16 slots, and the proper handling of PCIe switches is important as well. It just so happens that NVIDIA has become the catalyst for vendors to roll out support for this type of device. While we do worry about some customers being left out in the cold, often this is the price of admission to the high-tech bleeding edge of computing. To be safe, we strongly recommend interested buyers confirm that their motherboard has proper support before purchasing.
This is also the first NVIDIA product line that will fully and completely support HDCP over DVI. This means that, when combined with a monitor or TV that also supports HDCP over DVI, content which requires HDCP to play will not have any problem. While the entire lineup of NVIDIA and ATI GPUs has been capable of supporting HDCP, no full product lines have actually implemented the required solution.
The reason this is a first is due to the requirements of HDCP. Not only must the hardware be capable of transmitting HDCP content, but it also must provide a vendor specific key. These keys are only provided to vendors after paying a hefty fee. Until now, with the lack of protected content and compatible display devices, graphics board makers have not wanted to shell out the cash for HDCP keys. These keys are actually stored on a chip that must be integrated on the graphics card, so even though older cards have the potential for HDCP, the lack of the HDCP chip means that they cannot support the feature.
While we could take a few thousand words here to editorialize the wastefulness of content "protection" in consumer markets, we'll keep our thoughts brief. Real pirates will always find a way to make their money by selling stolen content. Cost or technical barriers are not sufficient deterrents to people who make their living through illegal distribution of content. If it can be seen or heard in a decrypted format, it will always be possible to copy. Until it is mandatory that decryption hardware and software with a private key for everyone be implanted into our brains, media designed for mass distribution can never really have full protection from copying. Content protection is a flaming pit into which an industry terrified of change is demanding hardware designers, programmers and governments toss as much money as possible.
That being said, the inclusion of HDCP support on the 7950 GX2 is a good thing. There's no reason to make it more difficult on the end user who just wants to watch or listen to the media they paid for. If content providers are going to go down this route either way, then it is certainly better to be prepared. While we have not spoken with every vendor, NVIDIA assures us that every 7950 GX2 will have HDCP key hardware onboard.
The Card and The Test
This is one of the most unique consumer level graphics boards we have seen in quite a long time. While there have been a few card makers that have dropped two GPUs on one board, this is the first product where we have seen PCIe switch technology used to actually allow the connection of two independent PCIe devices in one slot. The 7950 GX2 is also the first consumer level graphics add-in product we've had in our labs to be built using two separate PCBs.
If we take a close look at the card itself, we can see the PCIe connection between the two boards. This is an X8 PCIe connection, which saves a bit on board routing requirements and physical connector width. Even though PCIe is a serial bus and each new lane only requires two additional differential pairs (four signal wires), NVIDIA had a lot on their hands when designing this product. The first incarnation in the 7900 GX2 currently being used in Quad SLI systems routes all 16 lanes to each graphics card, but the board itself is much larger than the 7950 GX2. It doesn't seem like just cutting down the PCIe lanes would make enough difference to cut out so much board space, so it is likely that NVIDIA spent further time optimizing board features and layout.
The board does have an open SLI connector. Unfortunately, at this time, we have not been able to test DIY Quad SLI. NVIDIA has made it clear that they won't try to stop people from building their own Quad SLI systems, but they also won't actively support such activity. This is in line with their current stance on overclocking. At the moment, NVIDIA tells us that there are many roadblocks to configuring a working Quad SLI system with the 7950 GX2.
On top of BIOS issues with a single PCIe bridge in one X16 slot, supporting multiple bridges and four graphics cards might be beyond the capability of most motherboards at this point. Again, the physical ability is there, but BIOS support may still be lacking. Driver support is also a problem, as the drivers that support the 7950 GX2 do not support Quad SLI and vice versa. NVIDIA assures us that driver support should eventually come along, but until then we will be working on hacking our way around these issues.
As for multiple display capabilities, the 7950 GX2 is just like single GPU boards. There are only two DVI outputs onboard which both support dual-link bandwidth and are driven by only one of the GPUs. NVIDIA has confirmed that it is technically possible to provide 4 dual-link DVI outputs on one 7950 GX2 board, but it doesn't look like any board makers are going down that route. There may also be additional driver support necessary to make this happen, and demand might not ever get high enough to entice anyone to actually build a 7950 GX2 with all the necessary components. Still, it is nice to know that the only thing stopping someone from going down the quad output route is simply the cost of the connectors.
As we are still working on getting the 7950 GX2 set up in SLI with a second card, we will be focusing on a comparison with other single card solutions. The exception, of course, will be 7900 GT SLI. The specifications of the 7950 GX2 indicate that it should perform very similarly to a 7900 GT SLI setup without the hassle. Each GPU on the 7950 GX2 is essentially a higher clocked 7900 GT. Additionally, the 7950 GX2 incorporates 512MB of 1200MHz (effective data rate) GDDR3 per GPU for a total of 1GB of on board RAM. Here is a quick reference table NVIDIA provided showing the differences between the 7950 GX2, the 7900 GTX and the 7900 GT. We do take some issue with reporting memory bandwidth, fill rate, and verts/sec as simple aggregates of the two GPUs' capabilities, as these quantities don't scale perfectly linearly in either an SLI setup or in the case of the 7950 GX2, but the data is still interesting.
For the rest of our comparisons, we will be looking at the solutions shown here:
AMD Athlon 64 FX-57
ASUS A8N32 SLI Deluxe NVIDIA nForce 4 X16 Motherboard
2GB 3:3:2:8 OCZ PC4000 EB
Seagate 7200.7 160GB HD
700W GameXStream PSU
NVIDIA GeForce 7950 GX2
NVIDIA GeForce 7900 GTX
NVIDIA GeForce 7800 GTX 512
NVIDIA GeForce 7900 GT (and SLI)
ATI Radeon X1900 XT
ATI Radeon X1900 GT
We will lead off with a side by side comparison of the 7950 GX2 and the 7900 GT SLI, and then we will take a look at how the 7950 GX2 stacks up against other high end parts. Before we get to that, here's a quick look at power.
Our power tests at idle show that the Radeon X1900 XT comes in at the prime spot, but the tables turn when we flip the switch on Splinter Cell. Under load, the 7950 GX2 drops in behind the X1900 XT in power consumption. This difference between the NVIDIA and ATI high end would be even more exaggerated if we tested the X1900 XTX, which expends even more energy to increase performance only slightly. While the GeForce 7950 GX2 draws more power than its NVIDIA brethren, we are still looking at a manageable power draw (especially when considering the incredibly high power requirements of 7900 GTX SLI or X1900 XT CrossFire in comparison).
One Card, or Two?
Our first look at the 7950 GX2 will be a direct comparison to its closest SLI relative: the 7900 GT SLI. The purpose of this investigation is to attempt to answer some questions about how differences in this single-card/multi-GPU implementation affect performance relative to the two card approach. The 7950 GX2 employs faster core and slower memory clock speeds than the 7900 GT, but these differences should produce fairly consistent performance deltas. Each GPU on the 7950 GX2 has twice as much RAM as the 7900 GT cards, but in past investigations we haven't seen memory size make any difference at resolutions below 2048x1536. The attribute we are really interested in is the performance differences created by the onboard PCIe switch.
From our side by side comparison, we can only attribute a maximum of 7% performance gain to the increased core clock of the 7950 GX2 over the 7900 GT SLI. At the same time, with a 10% higher memory clock on the 7900 GT SLI, we should see better performance in memory bandwidth limited situations on the 7900 GT SLI. Under games and settings with a balanced compute and memory load, these differences should come out in the wash.
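The memory side of that trade-off is simple arithmetic, sketched below. The 1200MHz effective data rate and the 10% gap come from this article; the 256-bit memory bus per GPU is an assumption on our part that is not stated in the text, so treat the absolute bandwidth numbers as illustrative.

```python
# Per-GPU memory data rate and theoretical peak bandwidth.
# From the article: 1200 MHz effective on the 7950 GX2, 10% higher on the 7900 GT.
# Assumption (not in the article): a 256-bit memory bus on each GPU.

GX2_EFFECTIVE_MHZ = 1200.0
GT_EFFECTIVE_MHZ = GX2_EFFECTIVE_MHZ * 1.10
BUS_WIDTH_BITS = 256  # assumed


def peak_bandwidth_gb_s(effective_mhz, bus_bits=BUS_WIDTH_BITS):
    """Theoretical peak in GB/s: transfers per second times bytes per transfer."""
    return effective_mhz * 1e6 * (bus_bits / 8) / 1e9


print(f"7900 GT effective rate: {GT_EFFECTIVE_MHZ:.0f} MHz")
print(f"difference: {GT_EFFECTIVE_MHZ - GX2_EFFECTIVE_MHZ:.0f} MHz")
print(f"7950 GX2 per GPU: {peak_bandwidth_gb_s(GX2_EFFECTIVE_MHZ):.2f} GB/s")
print(f"7900 GT per card: {peak_bandwidth_gb_s(GT_EFFECTIVE_MHZ):.2f} GB/s")
```

Whatever the real bus width, the ratio between the two stays at 10%, which is the part that matters for bandwidth limited tests.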
There are quite a few tests in which both the 7950 GX2 and the 7900 GT SLI configurations are CPU limited or perform very similarly, but at higher resolutions we do see some differentiation. Everything gets magnified at high resolution: shaders must be run on more pixels and memory is hit harder. We could see that performance is fairly similar at CPU limited resolutions, but as we push the limit up above 3 megapixels and enable AA, we do see an advantage in favor of the 7950 GX2. For example, Quake 4 at 2048x1536 with 4xAA shows an absolutely gigantic 32% performance advantage over 7900 GT SLI. This isn't the norm, but even BF2 indicates a 7950 GX2 advantage of 13% (which is more than simple clock speed advantage can account for). Even if we haven't seen it before, memory size could be contributing to this advantage, but it also seems likely that the onboard PCIe switch could be reducing the latency involved in sending the frame data from one GPU to another.
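For reference, the pixel counts behind the resolutions we test can be worked out in a few lines. The helper names are ours; the resolutions are the three used in this review.

```python
# Pixel counts for the three test resolutions, relative to 1280x1024.

RESOLUTIONS = [(1280, 1024), (1600, 1200), (2048, 1536)]


def pixel_count(width, height):
    return width * height


def megapixels(width, height):
    return pixel_count(width, height) / 1e6


def relative_load(width, height, base=(1280, 1024)):
    """How many times more pixels per frame than the base resolution."""
    return pixel_count(width, height) / pixel_count(*base)


for w, h in RESOLUTIONS:
    print(f"{w}x{h}: {megapixels(w, h):.2f} MP, "
          f"{relative_load(w, h):.2f}x the pixels of 1280x1024")
```

Only 2048x1536 crosses the 3 megapixel mark, and every frame at that resolution carries 2.4 times the pixels of 1280x1024 before antialiasing is even considered.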
In order to test this theory, we went back and retested the 7950 GX2 with multi-GPU mode disabled. Thus, we are able to bring you a comparison of the performance scaling between the 7900 GT (single card versus SLI) and the 7950 GX2 (single versus multi-GPU mode). This should give us a better idea as to whether the performance advantage of the GX2 is due to memory size or the PCIe switch.
From the data we collected, it looks like the 7900 GT scales better at low resolutions in most cases. It doesn't look like there is a significant scaling advantage for the 7950 GX2 in any game but Quake 4 at 2048x1536 with 4xAA. At this point, we would say that Quake 4 appears to require more than 256MB of RAM when running HQ settings at 2048x1536 with 4xAA, resulting in the huge performance increase with 7950 GX2 over 7900 GT. In some cases, scaling does make a difference in where the performance falls between the 7950 GX2 and the 7900 GT SLI solutions, but it does look like the majority of the performance differences between the two solutions is due to clock speeds and other features which are constant between single and multi-GPU arrangements.
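For readers who want to run the same kind of comparison on their own numbers, scaling can be expressed as a simple ratio. The sketch below is one reasonable way to define it, not necessarily the exact method behind our graphs, and the frame rates in it are made-up placeholders rather than results from this review.

```python
# Multi-GPU scaling expressed as a speedup ratio and as efficiency.
# All frame rates below are hypothetical placeholders, NOT measured results.


def scaling_factor(single_gpu_fps, multi_gpu_fps):
    """Speedup from enabling the second GPU (2.0 would be perfect for two GPUs)."""
    return multi_gpu_fps / single_gpu_fps


def scaling_efficiency(single_gpu_fps, multi_gpu_fps, gpus=2):
    """Fraction of ideal linear speedup achieved (1.0 = perfect scaling)."""
    return scaling_factor(single_gpu_fps, multi_gpu_fps) / gpus


# Placeholder case 1: GPU limited, the second GPU helps a lot.
gpu_bound = scaling_factor(40.0, 72.0)
# Placeholder case 2: CPU limited, the second GPU barely matters.
cpu_bound = scaling_factor(100.0, 105.0)

print(f"GPU limited example: {gpu_bound:.2f}x "
      f"({scaling_efficiency(40.0, 72.0):.0%} efficiency)")
print(f"CPU limited example: {cpu_bound:.2f}x "
      f"({scaling_efficiency(100.0, 105.0):.0%} efficiency)")
```

Comparing this ratio between the 7900 GT pair and the 7950 GX2 is what separates gains that come from the multi-GPU arrangement from gains that come from clock speed or memory size, since the latter are present in single-GPU mode as well.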
At the end of the day, regardless of how these two cards scale, the 7950 GX2 is consistently faster than a stock 7900 GT SLI setup. Even if there aren't any clear benefits in terms of efficiency on the 7950 GX2 platform, there aren't any drawbacks either. Let's take a closer look at performance.
Battlefield 2 Performance
At the lowest resolution with and without AA, both the highest end ATI card we tested (the X1900 XT) and the NVIDIA 7900 GTX lead the 7950 GX2 in performance. In most other (non CPU limited) cases, we will see the new NVIDIA flagship part come out on top, but this is one case where the added overhead of multi-GPU management gets in the way. Hopefully anyone who has one of these cards will also own a display that does much higher resolutions than 1280x1024.
Again, at 1600x1200 without AA we see the 7950 GX2 running into a CPU limitation. When 4xAA gets enabled, we see the 7950 GX2 jump to the front of the class. In fact, enabling 4xAA only causes the 7950 GX2 to drop an average of 2.6 frames per second from the non-antialiased benchmark.
Running at our highest resolution, both with and without AA leaves the 7950 GX2 solidly in the performance lead under BF2. It isn't surprising that the closest competitor is the 7900 GT SLI setup, followed by the X1900 XT. At the maximum quality setting, the 7900 GTX falls a stunning 36% (or 19.1 average fps) behind the 7950 GX2. Not every game delivers results this impressive, but BF2 is certainly a good title to perform well under.
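A percentage gap like this can be read two ways, depending on which card is taken as the baseline. The sketch below backs out the frame rates implied by each reading from the two figures quoted above (a 36% gap and 19.1 fps). These are implied values from arithmetic, not measurements, and the function names are ours.

```python
# Two readings of "card B is 36% (19.1 fps) behind card A".
# Outputs are implied by arithmetic only; they are not measured results.

GAP_FPS = 19.1
GAP_FRACTION = 0.36


def rates_if_gap_relative_to_leader(gap_fps, fraction):
    """Reading 1: trailer = leader * (1 - fraction)."""
    leader = gap_fps / fraction
    return leader, leader - gap_fps


def rates_if_gap_relative_to_trailer(gap_fps, fraction):
    """Reading 2: leader = trailer * (1 + fraction)."""
    trailer = gap_fps / fraction
    return trailer + gap_fps, trailer


for label, fn in [("relative to leader", rates_if_gap_relative_to_leader),
                  ("relative to trailer", rates_if_gap_relative_to_trailer)]:
    leader, trailer = fn(GAP_FPS, GAP_FRACTION)
    print(f"{label}: leader {leader:.1f} fps, trailer {trailer:.1f} fps")
```

The same pair of numbers describes quite different absolute frame rates under the two readings, so it is worth checking the baseline whenever a percentage lead is quoted.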
Black & White 2 Performance
Unlike BF2, we see the 7950 GX2 and 7900 GT SLI solutions neck and neck for the top performance spot under B&W2 at low resolutions with and without AA. As the resolution increases, we see the situation change slightly.
Even if we increase the resolution a little bit, it still appears that the 7950 GX2 and 7900 GT are a little CPU limited. No reason to waste time here today, let's move on to the heavy impact test.
In this instance, we see a huge performance difference between the 7950 GX2 and its closest competitor (which is no longer the 7900 GT SLI). The X1900 XT drops into second place this time around and I don't even get so much as a nod.
F.E.A.R. Performance
These benchmarks stand to show that F.E.A.R. is in no way a CPU limited game. Even at 1280x1024, the 7950 GX2 shows about a 28% performance lead over the 7900 GTX. Enabling AA pushes that performance lead up over 46%. The point proved here is that there will come a time when more games will demand 7950 GX2 level power at even common desktop resolutions. F.E.A.R. is currently one of the exceptions to the rule, so we still don't recommend that gamers who commonly play at resolutions below 1600x1200 invest in this level of hardware.
The multi-GPU solutions continue to excel under F.E.A.R. as resolution increases. While the 7900 GT and X1900 GT start to become borderline playable at 1600x1200 with 4xAA, the 7900 GT SLI and 7950 GX2 are still butter smooth.
Even at 2048x1536 with 4xAA, NVIDIA's new high end flagship sails on at an enjoyable 45 fps. The gap between the 7950 GX2 and the X1900 XT starts to close at this resolution, with the lead dropping back down to only about 30%, but this is understandable considering the volume of data that needs to be sent from one GPU to another. Added stress on memory bandwidth could also be the reason we see the 7900 GT SLI closing the performance gap between itself and the 7950 GX2 (which has a 120MHz lower effective data rate off of each GPU).
Half-Life 2 Episode One Performance
Even the newest installment of HL2 is CPU limited at the high end under 1600x1200. These tests do show that, in spite of the CPU limitation, the multi-GPU overhead isn't incredibly damaging under the Source engine.
The performance advantage of 7950 GX2 increases moving up in resolution and adding AA. While there is a benefit due to the hardware at this level of quality, framerates this high are just not necessary for playing HL2. Those with a 1600x1200 resolution limit who play HL2 style games won't need to drop $600 on hardware to get a great experience.
At the top end of our performance tests, we don't see any surprises. The 7950 GX2 is the king of HL2 as far as single board solutions go. Getting this baby in SLI for a quad GPU solution will be quite interesting indeed.
Quake 4 Performance
With Quake 4, low resolutions are very CPU limited, but we can start to see the advantage of 7950 GX2 when AA is enabled. Quake 4 scales very well on multi-GPU solutions, so it's expected that the 7950 GX2 should come out on top in this benchmark.
Moving on, our theories are confirmed: the 7950 GX2 remains at the top of the list in terms of average frame rate under Quake 4. Enabling AA is still necessary for the advantage to become significant, but the gap between the new fastest single card and the rest of the pack is definitely increasing.
At 2048x1536, even without AA the 7950 GX2 shows an 18% advantage over the next fastest single PCIe slot solution. Enabling 4xAA instantly boosts that lead to over 43%. In fact, running at 2048x1536 with 4xAA rather than one of the CPU limited resolutions on a 7950 GX2 will only cost you about 28% in performance overall.
Splinter Cell: Chaos Theory Performance
For this test, we once again used the built in lighthouse benchmark and scripts originally created at Beyond3d. For this benchmark, it is important to note that the non-AA numbers enable PS3.0 features, while the benchmarks with AA do not (as these two options are not supported simultaneously in-game). At low resolutions, SC3 absolutely benefits from multi-GPU solutions, and so shows a nice performance advantage for 7950 GX2. With either the HDR features available under PS3.0 rendering or 4xAA for smoothing the jaggies, the new NVIDIA card is the king of the hill. Like Half-Life 2, though, we would suggest that the performance increase at low resolutions isn't yet enough to warrant a $600 graphics solution if you are monitor limited to 1280x1024.
Increasing resolution shows us no change in the performance characteristics between the 7950 GX2 and the rest of the pack.
At the highest resolution we tested today, it's clear that 7950 GX2 is the way to go in order to get the best SM3.0 and HDR experience. Interestingly, the Radeon X1900 XT closes the gap a little bit under 4xAA settings as resolution increases. The 7950 GX2 still comes out on top, but SC3 is a game that really shines on ATI hardware as well.
X3 Performance
For our final test, the 7950 GX2 comes out on top once again. While we are a little bit CPU limited at the very high end, the 7950 GX2 still pulls away from the pack.
High resolution increases the lead the 7950 GX2 has over the rest of the contenders. With X3, all these cards remain playable up to 2048x1536, and 4xAA is even an option as well. The power of the 7950 GX2 isn't really necessary for X3 unless resolutions over 2048x1536 are required.
Price and flexibility are really the key factors in the success of the 7950 GX2. NVIDIA has set their MSRP at a range of $600 - $650 USD. This is actually right on par for the cost of two overclocked 7900 GT boards (which generally run between $300 and $330). For those who prefer a stock 7900 GT SLI solution like the one we tested for this comparison, the setup can be put together for between $550 and $600. However, it is important to remember that 7900 GT SLI requires an SLI motherboard while the 7950 GX2 will work just fine in a board with only a single X16 PCIe slot (with proper BIOS support). Those who will be running at 2048x1536 and higher with AA enabled will benefit more from the 7950 GX2 for its scaling capabilities and the fact that Quad SLI will likely be a future option.
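The price comparison boils down to a few ranges, which the sketch below lines up. All dollar figures are the ones quoted in the paragraph above; the variable and function names are ours, and street prices will of course drift from these.

```python
# Price ranges (USD, low-high) quoted in this article.

GX2_MSRP = (600, 650)
OVERCLOCKED_7900GT_EACH = (300, 330)
STOCK_7900GT_SLI_PAIR = (550, 600)


def pair_cost(each_range):
    """Cost range for two identical cards."""
    low, high = each_range
    return (2 * low, 2 * high)


def ranges_overlap(a, b):
    """True if two (low, high) price ranges share at least one price point."""
    return a[0] <= b[1] and b[0] <= a[1]


oc_pair = pair_cost(OVERCLOCKED_7900GT_EACH)

print(f"7950 GX2:                ${GX2_MSRP[0]}-${GX2_MSRP[1]}")
print(f"two overclocked 7900 GT: ${oc_pair[0]}-${oc_pair[1]}")
print(f"stock 7900 GT SLI:       ${STOCK_7900GT_SLI_PAIR[0]}-${STOCK_7900GT_SLI_PAIR[1]}")
print("GX2 overlaps overclocked pair:", ranges_overlap(GX2_MSRP, oc_pair))
```

None of these figures include the cost of an SLI motherboard, which the two card setups need and the 7950 GX2 does not.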
We haven't been able to test Quad SLI for this review, but those who want the potential to scale their system up to extremely high resolutions will certainly be attracted to the 7950 GX2. There is added incentive when noting that a pair of Radeon X1900XT in CrossFire will draw more power than Quad SLI with a pair of 7950 GX2 cards. Those who want the ultra high end in graphics won't be fazed by the price, but those without monitors that support 4 or 5 megapixel resolutions might want to consider the CPU limitations apparent at resolutions below 2048x1536.
At lower resolutions, it will still be possible to enable the advanced AA features and achieve performance more or less at the 7900 GT level with twice the antialiasing. Certainly, lower resolutions gain more from increasing AA levels, but in our experience some of these incredibly high AA modes are a bit overrated. Smaller pixels, provided there aren't any performance or monitor restrictions, generally produce better image quality than increasing levels of antialiasing.
With such a hefty price tag and the extreme settings required to see a significant benefit, it is difficult to recommend the 7950 GX2 to the average enthusiast or gamer. For those who really want 7900 GT SLI, the 7950 GX2 is a better solution for the money, with Gigabyte's flavor up for preorder on Zipzoomfly the day before launch at $599. This part is faster in most cases than the 7900 GTX, and in cases where performance can't compete, image quality can be improved. For those who live on the bleeding edge, this lower power, higher performing alternative to ATI's X1900 XT is a solid way to go.