Original Link: http://www.anandtech.com/show/1698
ATI's Multi-GPU Solution: CrossFire
by Anand Lal Shimpi & Derek Wilson on May 30, 2005 9:00 PM EST
Introduction
Ever since the introduction of NVIDIA's SLI, the world has anticipated the release of ATI's competing solution. Many questions and rumors have circulated over the past few months. Could ATI release a multi GPU solution that can stand up to SLI? We remember ATI's previous dual GPU solution with the Rage Fury Maxx, and the fact that 3rd party developers built a quad 9800 solution a few years ago. Would ATI launch a single card multi GPU solution, or a two-card solution that paralleled NVIDIA's offering?
Well, we have all the answers here.
In many ways, ATI's CrossFire launch parallels NVIDIA's SLI launch. ATI is bringing together the launch of a graphics technology and a motherboard platform to support it. Motherboards will support 2 x16 PCI Express slots for two cards. These cards will be linked together, and one will send its data to the other for final compositing and display. Some of the same multi GPU rendering modes are implemented as well.
These similarities aside, CrossFire is a very different solution by necessity. ATI is in a position where they need to augment their GPUs in order to support this technology. At the same time, the solution that ATI produces needs to have a distinct edge over SLI in order to fight its way into the market. Coming out more than 6 months behind SLI (a virtual eternity in the graphics industry), CrossFire has some ground to make up.
Can they do it?
ATI's Answer to SLI: CrossFire (The Motherboard)
As with NVIDIA's multi-card graphics solution launch, ATI is bringing to market their Radeon Xpress 200 CrossFire-Edition chipset. Motherboards based around CrossFire will feature 2 physical PCI Express x16 slots with x8 electrical connections. If the motherboard manufacturer implements it, the second PCI Express slot does not require a selector card and can be used with any other x8 or lower device. When only one graphics card is installed, the BIOS is capable of dynamically reconfiguring the number of lanes that the PCI Express slots run, provided that the motherboard includes support for this feature. Some vendors will sell motherboards set up just like NVIDIA's solutions (more on that later).
Limiting the cards to only 2 x8 PCI Express connections may be a bottleneck if a game makes heavy use of the bus. Of course, we will run into similar issues with NVIDIA's solutions when paired with games that are PCI Express intensive. Unfortunately, as none yet exist, testing of this category of application is very difficult.
The Xpress 200 CrossFire-Edition is also capable of supporting integrated graphics. One excellent feature of the Xpress 200 CrossFire system is that, on boards where the OEM has included integrated graphics and two display outputs, 6 displays can be driven simultaneously (two from the integrated graphics, two from the standard Radeon, and two from the CrossFire card). Having the ability to support so many displays while also offering the speedup of multiple GPUs is quite compelling, especially if ATI allows multiple GPU operation alongside multiple display support.
The need to buy a new motherboard in order to upgrade to a multiple GPU solution will likely keep some people from upgrading, but NVIDIA's solutions have the same problem. The fact that CrossFire is only being offered for the X800 and X850 series does limit the upgrade potential at this point. We have been recommending against using multi GPU solutions as an upgrade path option, but offering that freedom is still a plus.
ATI has given us the indication that CrossFire should work on Intel Chipsets as well as their own. This could give new life to those Intel designs originally targeted at SLI. Though not explicitly stating that CrossFire will work in an NVIDIA SLI board, it definitely seems possible. From an adoption/compatibility standpoint, ATI is certainly "evaluating other options".
There is also no physical reason why SLI cards could not work in an ATI Xpress CrossFire-Edition motherboard. The only thing that should stand in the way of this combination is NVIDIA's driver support. With the price premium that NVIDIA charges for their SLI chipset, it is clear that they want to discourage users from going another route. As they had owned the market on multi GPU solutions until now, that was an option. Now that ATI is throwing some competition in the ring, it would not be smart to exclude potential customers from using SLI just because of their motherboard choice.
ATI's Answer to SLI: CrossFire (The Card)
ATI does not have dedicated silicon on their GPUs for chip-to-chip communications as NVIDIA does. ATI bills this as a positive aspect of their solution, as their CrossFire solution is capable of running cards with two different (and even different speed) GPUs. NVIDIA's SLI solution is restricted to running on not only the same model of card, but cards with the same video BIOS. Timing is crucial when hooking the two GPUs together for SLI. In fact, an out-of-spec SLI bridge can even cause problems. We will only be able to see which solution performs better when we get hardware in our hands, but if all things are equal, ATI will have the advantage here.
In order to make up for their lack of a chip-to-chip interconnect, ATI includes a Compositing Engine chip on their CrossFire card. Because of this, the CrossFire card can be paired with any Radeon X800 or X850 (there will be a CrossFire card for each flavor). The driver controls clock speeds of each card automatically and manages synchronization as necessary. Synchronizing boards can be done on a general scale and doesn't need to be clock for clock. All CrossFire cards have 16 pixel pipelines, but disable 4 pipelines when running in tandem with a 12 pipe Radeon. This is what allows ATI to provide a limited number of CrossFire cards to work with multiple Radeons. Each card does need its own x16 PCI Express slot, and the boards communicate through an external cable.
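The pipeline matching described above amounts to running both cards at the smaller of the two pipeline counts. A minimal sketch of that rule follows; the function name is our own illustration and not anything from ATI's driver.

```python
def active_pipelines(crossfire_pipes: int, partner_pipes: int) -> int:
    """Pixel pipelines the CrossFire card keeps enabled for a given partner.

    Hypothetical helper: models the stated behavior that a 16 pipe
    CrossFire card disables 4 pipelines next to a 12 pipe Radeon.
    """
    return min(crossfire_pipes, partner_pipes)


print(active_pipelines(16, 12))  # 12
print(active_pipelines(16, 16))  # 16
```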
This may look more like the older 3dfx SLI solution, but in reality, the Radeon X800 or X850 sends its data digitally from the DVI output to the input on the CrossFire card, which then handles the data and forwards the final frame to the display device. Under alternate frame rendering (AFR), the data is simply sent on unchanged, but the Compositing Engine handles the combination of a split or supertiled frame, as well as the final rendering of ATI's super AA modes (more on these later).
In order to function properly, the standard and CrossFire cards share some system RAM. This allows each card access to all necessary data that doesn't need to be unique for each frame. ATI's driver handles splitting the workload and configuring a unique command queue for each card based on the application and the rendering mode selected. Rendering modes are not user selectable, and are predetermined through Catalyst AI. Each card also has access to its own system memory as usual.
What Happened to the Selector Card?
One of ATI's most interesting claims with their CrossFire solution is that you no longer need the selector card that is seen on nForce4 SLI motherboards. How is it that ATI is able to get around this requirement? Contrary to popular belief, it's not magic.
By default, the nForce4 SLI chipset sends all 16 PCI Express lanes from the North Bridge to the first PCI Express x16 slot on an nForce4 SLI motherboard.
Flipping the selector card the other way divides the PCI Express lanes and sends the first 8 lanes to the first slot and the remaining 8 lanes to the second slot.
ATI's MVP chipset works slightly differently; by default, the North Bridge sends 8 PCI Express lanes to each of the two PCI Express slots. If you have two cards installed, then this is the desired configuration. However, if you only have one card installed, in order to get a full PCI Express x16 slot, one of the following methods must be implemented by the motherboard manufacturer:
Option 1 - Terminator Card
The first option is a terminator card that is installed into the second PCI Express x16 slot. This card simply reroutes the 8 PCI Express lanes going into the slot back to the first PCI Express slot, giving it all 16 PCI Express lanes.
Obviously, the downside to this approach is that you have to use a terminator card; we haven't heard of many manufacturers doing this.
Option 2 - SLI Selector Card
The second option is to implement the same selector card that is used in nForce4 SLI motherboards. If you want a single x16 slot, flip the card one way. If you want two x8 slots, just change its orientation and you are good to go.
The obvious downside here is that the user has to play with the selector card, something that ATI wants to avoid. Despite ATI's desire to avoid this card, some manufacturers are implementing it.
Option 3 - Selector ICs
The third (and most expensive) option is for the motherboard manufacturer to place a series of selector ICs (Integrated Circuits) on the motherboard itself, which will allow for the user to switch between one x16 slot or two x8 slots from within the BIOS. This is the most desired implementation from the end user's standpoint, but it is the most expensive option.
Either 4 or 5 chips have to be placed on the motherboard in order to allow for this software-selection of PCI Express configuration. Each chip costs the motherboard manufacturer approximately $1, which isn't really a problem for high end motherboards.
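The lane routing behind the three options can be summarized in a few lines. This is only a sketch of the behavior described above, with a hypothetical function name; `can_reroute` stands in for whichever mechanism the board provides (terminator card, selector card, or selector ICs).

```python
from typing import Tuple

TOTAL_LANES = 16  # lanes the North Bridge provides for graphics


def lane_split(cards_installed: int, can_reroute: bool) -> Tuple[int, int]:
    """Return (slot 1 lanes, slot 2 lanes) for a CrossFire-style board.

    Hypothetical model: the default is 8 lanes per slot; with a single
    card, slot 1 only reaches x16 if the board can reroute slot 2's lanes.
    """
    if cards_installed not in (1, 2):
        raise ValueError("expected one or two graphics cards")
    if cards_installed == 1 and can_reroute:
        return (TOTAL_LANES, 0)
    return (TOTAL_LANES // 2, TOTAL_LANES // 2)


print(lane_split(2, can_reroute=True))   # (8, 8)
print(lane_split(1, can_reroute=True))   # (16, 0)
print(lane_split(1, can_reroute=False))  # (8, 8)
```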
Note that there is also no reason why the same technology can't be implemented on nForce4 SLI motherboards; in fact, ASUS has already implemented it on their latest nForce4 SLI solution.
Rendering Modes
The task of getting two separate video cards to handle drawing frames efficiently for a single application transparent to the programmer and end user is quite an undertaking. When life was fillrate limited, 3dfx solved the problem by having each card render odd and even scanlines, which were then combined in analog. This solution is no longer viable, but ATI and NVIDIA have both come up with ways to accomplish the goal.
The absolute most desirable mode of operation that both companies have come up with is alternate frame rendering (AFR). As the name implies, each card renders an entire, separate frame. The major advantage to allowing each card to render an entire frame is that each card is able to handle the geometry processing required as well as the pixel processing.
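As a rough illustration of AFR scheduling, each frame can simply be handed to the next card in turn. The sketch below assumes a plain round-robin assignment and a hypothetical function name; the real driver logic is not public in this article.

```python
def afr_assign(frame_index: int, num_cards: int = 2) -> int:
    """Return the card that renders a given frame under simple round-robin AFR."""
    return frame_index % num_cards


# Card index for the first six frames.
schedule = [afr_assign(i) for i in range(6)]
print(schedule)  # [0, 1, 0, 1, 0, 1]
```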
Alternate frame rendering can't always be used for various reasons (such as the case when one frame depends on the previous result). When alternate frame rendering cannot be used, splitting the current frame vertically is an alternative that both ATI and NVIDIA have implemented. When the work for a single frame is split between two cards, the geometry pipeline can't be divided as easily as the pixel pipeline. As a scene is being rendered, it is not easy to assign objects to different cards as all objects in a scene can affect any of the pixels.
After geometry is sorted, a guess can be made as to how much pixel power will be required for different areas of the screen, and NVIDIA takes advantage of this to distribute the workload more evenly across the two cards. If the top half of the screen isn't as difficult to render, more than half the screen is given to the card assigned to the top. This method definitely helps to keep cards rendering split frames evenly loaded. ATI is capable of splitting the rendering work 60/40 or 70/30 under their scissor mode, but the split is determined per application.
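To make the 60/40 and 70/30 splits concrete, here is a small sketch that divides a frame's scanlines between two cards by a fixed percentage. The function name, and the choice to give the larger share to the card handling the top of the screen, are our assumptions for illustration only.

```python
from typing import Tuple


def scissor_split(height: int, top_percent: int) -> Tuple[range, range]:
    """Split scanlines 0..height-1 into a top and a bottom region.

    Hypothetical helper: top_percent is the share of lines given to the
    card that renders the top of the frame.
    """
    if not 0 < top_percent < 100:
        raise ValueError("top_percent must be between 1 and 99")
    boundary = height * top_percent // 100
    return range(0, boundary), range(boundary, height)


for share in (60, 70):
    top, bottom = scissor_split(1200, share)
    print(f"{share}/{100 - share}: {len(top)} lines on top, {len(bottom)} on bottom")
```

At 1600x1200, a 60/40 split works out to 720 and 480 scanlines, and a 70/30 split to 840 and 360.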
Evenly dividing work is a very important task, and ATI has taken it a step further with CrossFire. They are introducing a rendering mode, which they call Supertiling. This mode splits the entire screen up into 32x32 pixel tiles and hands out a checkerboard pattern to each graphics card for pixel processing. Doing this effectively takes the guess work out of load balancing pixel processing between two cards. The workload averages itself out when the cards share the pixels in areas so near each other.
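The checkerboard assignment is easy to sketch. The code below is our own illustration of the idea (function names are hypothetical, and which card gets the top-left tile is an assumption): each 32x32 tile goes to one of two cards based on the parity of its tile coordinates.

```python
TILE = 32  # tile edge length in pixels, per ATI's description


def tile_owner(x: int, y: int, tile: int = TILE) -> int:
    """Return 0 or 1: the card that shades the pixel at (x, y)."""
    return ((x // tile) + (y // tile)) % 2


def pixel_counts(width: int, height: int, tile: int = TILE) -> list:
    """Count how many pixels each card shades for a given resolution."""
    counts = [0, 0]
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Tiles on the right and bottom edges may be partial.
            w = min(tile, width - tx)
            h = min(tile, height - ty)
            counts[tile_owner(tx, ty, tile)] += w * h
    return counts


print(pixel_counts(1024, 768))  # [393216, 393216]
```

At 1024x768, each card ends up with exactly half of the 786,432 pixels, and because neighboring tiles alternate, expensive regions of the screen tend to be shared rather than landing on one card.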
The caveat of Supertiling is compatibility. It has come to our attention that the "small number of applications" for which Supertiling does not work includes all OpenGL based titles. This means that OpenGL has either AFR or split frame rendering options available. AFR is the most desirable mode, but it would be nice to have a middle ground with more effective load balancing.
In addition to these multi card render modes, ATI has gone a step further to include enhanced AA modes. This is made possible by taking advantage of programmable sample points and their hardware compositing engine.
Super AA Modes
There are some older games that wouldn't see any benefit from a multi-GPU solution, as these titles may not be GPU limited. In order to provide some benefit to these games (while at the same time offering higher image quality), ATI has devised four multi-card display modes. These modes are user selectable from the control panel and can help add smoothness and clarity to any title.
The compatibility of ATI's Super AA modes is not limited to any subset of titles because there is no workload split involved - each card renders the entire scene, each with a unique set of sample points. Before display, the compositing engine takes the output of each card and prepares a final image for display.
Two of the new modes simply make use of different sample points. 8xAA and 12xAA employ either 4x or 6x AA modes on each card. Of course, MSAA is limited in its ability to antialias certain aspects of a 3D scene. Multisample only works along polygon edges, while the slower supersample method works across the entire scene (including textures). SSAA has fallen out of use due to the rather large performance impact that it has on a single card. The modus operandi for SSAA is to render a scene at a higher resolution and then resample the image to the desired resolution. Of course, there are other ways of performing SSAA.
ATI is able to handle SSAA by rendering the entire scene at the desired resolution on each card with a half pixel diagonal shift. They combine this method with either their 8x or 12x MSAA modes in order to produce 10xAA (4x + 4x + 2xSS) and 14xAA (6x + 6x + 2xSS). These quality modes should prove to be phenomenal.
This 2xSS mode shouldn't be confused with a normal 2x vertical and 2x horizontal resolution mode. In that case, each pixel has 4 ordered sample points that scale down to one pixel. In ATI's mode, 2 sample points are used per pixel in a rotated grid fashion.
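The naming of the four Super AA modes follows directly from the arithmetic above: the per-card MSAA level is counted once per card, and the supersampled modes add 2 for the 2xSS component. A small sketch of that bookkeeping, using a hypothetical table and function name of our own:

```python
# (MSAA samples per card, whether 2x supersampling is added)
SUPER_AA_MODES = {
    "8x": (4, False),
    "10x": (4, True),
    "12x": (6, False),
    "14x": (6, True),
}


def mode_label(msaa_per_card: int, supersample: bool, cards: int = 2) -> str:
    """Rebuild the marketing label, e.g. 4x + 4x + 2xSS -> '10x'."""
    total = msaa_per_card * cards + (2 if supersample else 0)
    return f"{total}x"


for name, (msaa, ss) in SUPER_AA_MODES.items():
    print(name, "=", mode_label(msaa, ss))
```

This only reproduces how the mode names add up; it says nothing about how the compositing engine actually blends the two cards' output.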
These modes add life to games that would not benefit otherwise from multiple graphics cards, as well as provide a compatibility mode to titles for which alternating or splitting frames is not an option. This is a key feature of ATI's CrossFire that separates it from NVIDIA, and we are very eager to get our hands on hardware and test it first hand.
Now that we know what ATI's CrossFire solution is and what it can do, let's take a look at how it stacks up to the competition.
CrossFire vs. SLI
So, the question everyone wants answered is: how does CrossFire compare to SLI?
Well, it's very difficult to answer this question with no performance numbers. Obviously, if one solution outperforms the other in any significant way, pluses and minuses based on feature set fade into the background. But feature set is all we have to go on right now as we don't have final hardware in hand for a proper comparison.
One highly debated issue is ATI's claim of broader compatibility than NVIDIA. Our understanding of "compatibility" is that any title will be able to run in at least one CrossFire mode. This includes not only the performance enhancing modes, but the quality enhancing modes as well.
It doesn't seem plausible to us that ATI has found a way to split the graphics work between two cards in a more compatible way than NVIDIA. But enabling ATI's Super AA modes eliminates the need to split the work. With each card rendering the complete scene (only using different AA sample points), ATI can effectively offer something to all titles where NVIDIA cannot. Those who choose not to enable AA for these titles will likely see a trend similar to NVIDIA's performance - more than one card won't help performance.
As it is really difficult to tell from briefings, presentations, and white papers exactly where the lines of compatibility are drawn, we will simply have to wait until we get our hands on the cards before we finalize our conclusions.
Looking at all the features, if performance ends up equal or in ATI's favor, we have to consider CrossFire the more interesting solution. The flexibility of easily using multiple displays alongside multi-GPU performance combined with the option of enabling higher quality AA (including rotated grid SSAA) is impossible to ignore. Add to that the ability to upgrade existing hardware without needing an exact match and we are sold.
Here's to hoping the performance of CrossFire lives up to the potential of its featureset.
The Problematic South Bridge
While it's hardly talked about outside of Taiwan, ATI's South Bridge is quite buggy. The chip that is responsible for providing the motherboard's SATA and USB ports, as well as PCI slots, is nowhere near final, and many manufacturers are skeptical of ATI's ability to finish their own South Bridge in time. Note that ATI's own South Bridge does not support SATA-II or NCQ, regardless of actual bugs with the chip.
Luckily, ATI has partnered with ULi to offer working South Bridges that are compatible with ATI's CrossFire North Bridge. We've tested ULi's South Bridges and they seem to be problem-free, and our sentiments are echoed by many motherboard manufacturers who have decided to use ULi South Bridges with their ATI CrossFire motherboards.
However, ATI is pushing most of their partners to use ATI's own South Bridge despite its problems and is convinced that the problems will be sorted out in time. So a number of manufacturers at Computex are showing off CrossFire solutions with ATI's South Bridge, despite their complaints to us about the South Bridge.
At least this time around, it may be better for motherboard manufacturers to use ULi's South Bridge until ATI has had more time to get all of the kinks out of their solution. ULi's South Bridges have been in use for the past generation of ATI's chipsets, thanks to issues with ATI's South Bridges, and so far, we have not heard of any complaints.
ATI should be focused on the overall platform, not necessarily building up support for their South Bridge. Although, we do think that it is a bit embarrassing to have to turn to another chipset vendor to provide working South Bridges for your motherboard partners. It would be one thing if this were ATI's first chipset, but it most definitely is not.
Performance
ATI outfitted three motherboard manufacturers with fully functional CrossFire demo systems to show off at the show. The systems featured an ATI CrossFire reference board and a pair of graphics cards: a Radeon X850 XT and a CrossFire Radeon X850 XT.
The CrossFire X850 XT had a DVI dongle with two ports; one connected to the monitor, the second connected to a DVI cable, which was fed into the DVI output of the regular X850 XT card.
Even in CrossFire mode, the two graphics cards appear independently in device manager, which may allow for multi-monitor operation while in CrossFire mode.
Enabling CrossFire is done from within the ATI control panel, and unlike NVIDIA's SLI, no reboot is required.
With CrossFire enabled, the new AA modes are available for user selection.
Armed with one of these machines that ATI sent to their partners, we managed to get some benchmark time with CrossFire. Unfortunately, we didn't have much time to test, nor did we have a full suite of benchmarks, so all we could run was Doom 3 (it was either Doom 3 or 3DMark05).
The system that we used for testing featured an Athlon 64 FX-53, 512MB of memory and the two X850 XT graphics cards running under Windows XP Professional.
We ran all Doom 3 tests with 4X AA enabled at the High Quality presets in the unpatched retail version of Doom 3.
Even at this early stage, performance and stability were both impressive. The system that we were running had just been assembled hours earlier and didn't crash at all during our testing. In fact, the system was so new that the motherboard manufacturer who let us test with their hardware hadn't even seen it running - it was their first time as well as ours.
The performance of the solution was equally impressive; at 1024x768, the dual GPU CrossFire setup improved performance by 49%. At 1280x1024 and 1600x1200, the performance went up by 72% and 86% respectively. We had our doubts that ATI would be able to offer performance scaling on par with what we've seen on NVIDIA's SLI, but these initial numbers, despite being run on early hardware/drivers, are quite promising.
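To put those percentages in context, an improvement of N percent corresponds to a speedup of 1 + N/100 over a single card, and dividing by the number of cards gives the fraction of ideal (2x) scaling achieved. A quick sketch using only the figures quoted above; the function name is our own:

```python
def scaling(improvement_pct: float, cards: int = 2):
    """Return (speedup over one card, fraction of ideal linear scaling)."""
    speedup = 1 + improvement_pct / 100
    return speedup, speedup / cards


# Doom 3 improvements reported above, by resolution.
results = {"1024x768": 49, "1280x1024": 72, "1600x1200": 86}

for resolution, pct in results.items():
    speedup, efficiency = scaling(pct)
    print(f"{resolution}: {speedup:.2f}x speedup, {efficiency * 100:.1f}% of ideal")
```

The trend matches what we would expect: the higher the resolution, the more GPU limited the test becomes and the closer the second card gets to doubling performance.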
Pricing and Availability
What ATI has here at Computex is a very early sample of what CrossFire can do. Much like NVIDIA has had, ATI will encounter growing pains of their own with CrossFire. Most motherboard manufacturers are telling us to expect CrossFire motherboards by the end of July at the earliest, but more realistically, we can expect retail availability sometime in August.
The price point will be competitive with nForce4 Ultra (not SLI) motherboards, thanks to more aggressive chipset pricing on behalf of ATI. Also note that not all manufacturers will be producing both AMD and Intel CrossFire solutions. For example, MSI is producing an AMD CrossFire motherboard, while ASUS is currently only producing an Intel CrossFire solution. The problem is that most CrossFire manufacturers are also nForce4 SLI manufacturers and they have to be careful not to confuse their customers by offering two products that compete with one another. Choice is a good thing, but from a sales standpoint, it can sometimes be a difficult pill to swallow.
Despite the strong showing at Computex, most motherboard manufacturers have stated that they don't expect ATI's CrossFire chipsets to really make a dent in this year's shipments. ATI's high-end chipset market share will still remain very low in comparison to NVIDIA, but the long term outlooks are definitely positive. Much like they have done in the graphics industry, ATI will provide good balance to NVIDIA in the chipset business now that NVIDIA is king of the high-end AMD market.
The pricing of CrossFire X800 and X850 cards is listed in the following chart along with the existing products that each CrossFire part supports.
On the high end, we are definitely looking at an expensive upgrade. Those who want the ultimate in performance can be expected to shell out the cash. The X800 versions of CrossFire are a little more compelling in terms of affordability.
It will be interesting to see how these CrossFire parts move in price as they are very targeted in application as opposed to NVIDIA's parts, which are marketed as standalone graphics cards that could be used in SLI.
Final Words
Anyone need a quick recap?
On the hardware side, ATI is launching a multi-GPU solution called CrossFire that can be added to any existing Radeon X800 or X850 graphics card when run in (to be verified) any motherboard with 2 physical x16 PCI Express slots. Also being announced is ATI's push further into the high end chipset space with their Radeon Xpress 200 CrossFire-Edition. This chipset can be used on motherboards for Intel or AMD solutions and will provide 2 x16 physical, x8 electrical PCI Express slots for CrossFire support. ATI CrossFire-Edition motherboards could also support NVIDIA's SLI cards if NVIDIA's drivers were properly adapted.
Setting up CrossFire on a system that uses selector ICs to allow BIOS control of PCI Express slots makes hardware installation easier than SLI. All that is required is to insert the graphics cards and then connect them with the external dongle. Of course, not all ATI solutions will include this feature. Using SLI, a bridge must be installed inside the case. This is a more elegant solution than a dongle, as it is out of sight, but switching from CrossFire mode to a 4-monitor setup is as easy as changing the way cables are plugged into the back of the computer. In order to use multi-monitor configurations on an NVIDIA SLI board, the SLI bridge must be removed. We don't consider the SLI selector card on the motherboard to be an advantage or disadvantage, as some of ATI's partners will be implementing selector cards or terminator cards rather than the BIOS configurable selector ICs. Really, both companies have pluses and minuses, and we leave it to the end user to decide whether the internal bridge or external dongle fits their needs better.
As far as software support goes, CrossFire will offer AFR, split frame, and supertiling rendering modes. Two new types of AA (called Super AA modes) will also be enabled by CrossFire: 8x and 12x, which combine the MSAA output of both cards, and 10x and 14x, which add SSAA on top of that. We expect the same games (or types of games) that run well under SLI to run well using one of the three performance modes (AFR, split frame, or supertiling), while all games will be accelerated under any Super AA mode.
Had enough yet? Our initial performance test on prerelease hardware and drivers shows roughly 50 to 85 percent improved performance under Doom 3 from CrossFire. This indicates that we could see very good performance from CrossFire when it is finally released. Our initial tests aren't enough to draw any firm conclusions, especially in comparison to SLI performance, but we are looking forward to running a full suite of tests on the hardware.
The downsides of CrossFire mainly stem from the motherboard chipset. Either adding cost through the selector ICs or limiting convenience with a terminator card or SLI-style selector card is a tough call for vendors. Supporting two x8 electrical PCI Express slots does limit potential bandwidth and, therefore, the possibilities open to software developers. This isn't any worse than what NVIDIA has to offer, so ATI need not worry much here. With vendors either using a ULi South Bridge or the (currently) buggy ATI South Bridge, we may want to pay close attention to whose hardware is on the board. As far as the CrossFire card itself, we would prefer to see the 16 pipe CrossFire card not drop to 12 pipes when paired with a 12 pipe card (at least in split frame rendering). ATI's thinking is that the 16 pipe card would always be waiting on the 12 pipe, but in split frame rendering, giving more work to the 16 pipe card would help balance the performance. We just believe in people getting what they pay for.
If ATI can get CrossFire out to the market in good volume (for its potential demand), we could have an excellent alternative to SLI on our hands. ATI is also working on licensing CrossFire to SiS, so we may see SiS based boards with CrossFire support early next year as well. Exhaustive performance tests remain to be run, but from a feature standpoint, CrossFire looks good. We would like to see CrossFire offerings for Radeon cards slower than the X800, but other than that, we will have to sit back and wait for hardware to draw more conclusions.