It's Really Not Scanline Interleaving

So, how does this thing actually work? Well, when NVIDIA was designing NV4x, they decided it would be a good idea to include a section on the chip designed specifically to communicate with another GPU in order to share rendering duties. Through a combination of this block of transistors, the connection on the video card, and a bit of software, NVIDIA is able to leverage the power of two GPUs at a time.




NV40 core with SLI section highlighted.


As the title of this section should indicate, NVIDIA SLI is not Scanline Interleaving. The choice of this moniker by NVIDIA is due to ownership and marketing. When they acquired 3dfx, the rights to the SLI name went along with it. In its day, SLI was very well known for combining the power of two 3D accelerators. The technology had to do with rendering even scanlines on one GPU and odd scanlines on another. The analog output of both GPUs was then combined (generally via a network of pass-through cables) to produce a final signal to send to the monitor. Love it or hate it, it's a very interesting marketing choice on NVIDIA's part, and the new technology has nothing to do with its namesake. Here's what's really going on.

First, software (presumably in the driver) analyses what's going on in the scene currently being rendered and divides the work between the GPUs. The goal of this (patent-pending) load balancing software is to split the work 50/50 based on the amount of rendering power it will take. It might not be that each card renders 50% of the final image, but it should be that it takes each card the same amount of time to finish rendering its part of the scene (be it larger or smaller than the part the other GPU tackled). In the presentation that NVIDIA sent us, they diagrammed how this might work for one frame of 3DMark's Nature scene.




This shows one GPU rendering the majority of the less complex portion of a scene.
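
To make the load balancing goal concrete, here is a minimal Python sketch of one way a frame could be divided so that both GPUs finish at about the same time. It is not NVIDIA's algorithm, which has not been disclosed. The per-row cost estimates, the single horizontal split line, and the function names `balanced_split` and `speedup` are all assumptions made for illustration.

```python
def balanced_split(row_costs):
    """Pick the row at which to divide the frame between two GPUs.

    row_costs is a hypothetical per-row estimate of rendering cost.
    Rows [0, split) go to one GPU and rows [split, end) go to the other.
    """
    total = sum(row_costs)
    best_split, best_gap = 0, total  # split 0 gives everything to the second GPU
    running = 0
    for i, cost in enumerate(row_costs):
        running += cost
        gap = abs(running - (total - running))  # cost imbalance at this split
        if gap < best_gap:
            best_split, best_gap = i + 1, gap
    return best_split


def speedup(row_costs, split):
    """Ideal speedup over one GPU, ignoring transfer and driver overhead.

    Assumes the total cost is greater than zero.
    """
    total = sum(row_costs)
    top = sum(row_costs[:split])
    slowest = max(top, total - top)  # the frame is done when the slower GPU is done
    return total / slowest


# Hypothetical frame: a cheap sky in the top six rows, costly terrain in the bottom two.
costs = [1, 1, 1, 1, 1, 1, 3, 3]
split = balanced_split(costs)
print(split)                  # 6
print(speedup(costs, split))  # 2.0 with the balanced split
print(speedup(costs, 4))      # 1.5 with a fixed half-screen split
```

In this made-up frame, the balanced split hands one GPU three quarters of the image (the cheap part) and still keeps both GPUs busy for the same amount of time, while a fixed half-screen split leaves one GPU idle for half of the frame time.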


Since the work is split on the way from the software to the hardware, everything from geometry and vertex processing to pixel shading and anisotropic filtering is divided between the GPUs. This is a step up from the original SLI, which just split the pixel pushing power of the chips.

If you'll remember, Alienware was working on a multiple graphics card solution that, to this point, resembles what NVIDIA is doing. But rather than scan out and use pass-through connections or some sort of signal combiner (as is the impression that we currently have of the Alienware solution), NVIDIA is able to send the rendered data digitally over the SLI (Scalable Link Interface) from the slave GPU to the master for compositing and final scan out.




Here, the master GPU has the data from the slave for rendering.


For now, as we don't have anything to test, this is mostly academic. But unless the SLI link has extremely high bandwidth, half of a 2048x1536 scene rendered into a floating point framebuffer will be tough to handle. More normally used resolutions and pixel formats will most likely not be a problem, especially as scenes increase in complexity and rendering time (rather than the time it takes to move pixels) dominates the time it takes to get from software to the monitor. We are really anxious to get our hands on hardware and see just how it responds to these types of situations. We would also like to learn (though testing may be difficult) whether the load balancing software takes into account the time it would take to transfer data from the slave to the master.
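
For a rough sense of scale, the transfer arithmetic can be sketched as follows. The pixel sizes (8 bytes for a 16-bit-per-channel floating point RGBA target, 4 bytes for ordinary 32-bit color), the 60 frames per second target, and the idea that the slave sends exactly half of every frame are assumptions for illustration. NVIDIA has not published the bandwidth of the link.

```python
MIB = 1024 * 1024

def half_frame_bytes(width, height, bytes_per_pixel):
    """Bytes in half of one rendered frame."""
    return width * height * bytes_per_pixel // 2

def link_rate_mib_per_s(width, height, bytes_per_pixel, fps):
    """MiB per second needed to ship half of every frame across the link."""
    return half_frame_bytes(width, height, bytes_per_pixel) * fps / MIB

print(half_frame_bytes(2048, 1536, 8) / MIB)   # 12.0 MiB per frame, FP16 RGBA (assumed format)
print(link_rate_mib_per_s(2048, 1536, 8, 60))  # 720.0 MiB/s at an assumed 60 FPS
print(link_rate_mib_per_s(2048, 1536, 4, 60))  # 360.0 MiB/s with 32-bit color
```

Under these assumptions, the floating point case doubles the traffic of 32-bit color at the same resolution, which is why the pixel format matters as much as the resolution here.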


40 Comments


  • SpeekinSfear - Tuesday, June 29, 2004 - link

    I also prefer NVIDIA 6800s over ATI X800s (especially the GT model), but requiring two video cards to get the best performance is an inconsiderate progression. They're even encouraging devs to design stuff specially for this. It almost makes it seem like they can't make better video cards anymore, or else like they don't care enough to try hard. Almost like they wanna slow down the video card performance pace, get everyone to buy two cards and make money from quantity over quality. NVIDIA better ease up if they know what's good for them. They're already pushing us hard enough to get PCIe x16 mobos. If they get their heads too high up in the clouds, they may start to lose business because no one will be willing to pay for their stuff. Or maybe I'm just reading too much into this. :)
  • Jeff7181 - Tuesday, June 29, 2004 - link

    I thought it was a really big deal when they started combining VGA cards and 3D accelerator cards into an "all-in-one" package. Now to get peak performance you're going to have two cards again... sounds like a step back to me... not to mention a HUGE waste of hardware. If they want the power of two NV4x GPUs, make a GeForce 68,000 Super Duper Ultra Extreme that's a dual GPU configuration.
  • NFactor - Tuesday, June 29, 2004 - link

    NVIDIA's new series of chips is, in my opinion, more impressive than ATI's. ATI may be faster, but NVIDIA is adding new technology like an on-chip video encoder/decoder or this SLI technology. I look forward to seeing it in action.
  • SpeekinSfear - Tuesday, June 29, 2004 - link

    DerekWilson

    I get what you're sayin'. I just think it's crazy! I try to stay somewhat up to pace but this is just too much.
  • DerekWilson - Tuesday, June 29, 2004 - link

    SpeekinSfear --

    If you've got the money to grab a workstation board and 2x 6800 Ultras, I think you can spring for a couple hundred dollar workstation power supply. :-)
  • SpeekinSfear - Tuesday, June 29, 2004 - link

    I'm sorry, but I thought lots of people were having a hard enough time powering up one 6800 Ultra. Either this is absurd or I don't know something. What kind of PSU are we gonna need to pull this off?
  • TrogdorJW - Monday, June 28, 2004 - link

    The CPU is already doing a lot of work on the triangles. Doing a quick analysis that determines where to send a triangle shouldn't be too hard. The only difficulty is the overlapping triangles that need to be sent to both cards, and even that isn't very difficult. The load balancing is going to be of much greater benefit than the added computation, I think. Otherwise, you risk instances where 75% of the complexity is in the bottom or top half of the screen, so the actual performance boost of a solution like Alienware's would only be 33% instead of 100%.

    At one point, the article mentioned the bandwidth necessary to transfer half of a 2048x1536 frame from one card to the other. At 32-bit color, it would be 6,291,456 bytes, or 6 MB. If you were shooting for 100 FPS rates, then the bandwidth would need to be 600 MB/s - more than X2 PCIe but less than X4 PCIe if it were run at the same clockspeed as PCIe.

    If the connection is something like 16 bits wide (looking at the images, that seems like a good candidate - there are 13 pins on each side, I think, so 26 pins with 10 being for grounding or other data seems like a good estimate), then the connection would need to run at 300 MHz to manage transferring 600 MB/s. It might simply run at the core clockspeed, then, so it would handle 650 MB/s on the 6800, 700 MB/s on the GT, and 750+ MB/s on the Ultra and Ultra Extreme. Of course, how many of us even have monitors that can run at 2048x1536 resolution? At 1600x1200, you would need to be running at roughly 177 FPS or higher to max out a 650 MB/s connection.

    With that in mind, I imagine benchmarks with older games like Quake 3 (games that run at higher frame rates due to lower complexity) aren't going to benefit nearly as much. I believe we're seeing well over 200 FPS at 1600x1200 with 4xAA in Q3 with high-end systems, and somehow I doubt that the SLI connection is going to be able to push enough information to enable rates of 400+ FPS. (Half of a 1600x1200x32 frame at 400 FPS would need roughly 1,465 MB/s of bandwidth between the cards just for the frames, not to mention any other communications.) Not that it really matters, though, except for bragging rights. :) More complex, GPU-bound games like Far Cry (and presumably Doom 3 and Half-life 2) will probably be happy to reach even 100 FPS.
  • glennpratt - Monday, June 28, 2004 - link

    Uhh, there's still the same number of triangles. If this is to be transparent to the games, then the cards themselves will likely split up the information.

    You come to some pretty serious conclusions based on exactly zero fact or logic.
  • hifisoftware - Monday, June 28, 2004 - link

    How much CPU load does it add? As I understand it, every triangle is analyzed as to where it will end up (top or bottom). Then this triangle is sent to the appropriate video card. This will add a huge load on the CPU. Is this thing going to be faster at all?
  • ZobarStyl - Monday, June 28, 2004 - link

    I completely agree with the final thought that if someone can purchase a dual-PCI-E board and a single SLI-enabled card with the thought of grabbing an identical card later on, then this will definitely work out well. Plus, once a system gets old and is relegated to other purposes (secondary rigs), you could still separate the two and have 2 perfectly good GPUs. I seriously hope this is what nV has in mind.
