Lightspeed Memory Architecture

If you've been keeping on top of the slew of NV20 rumors that have appeared over the past year, you're probably expecting NVIDIA's next buzzword to be Hidden Surface Removal. We're sorry to disappoint, but that particular term isn't in their vocabulary this time around; however, there is something much better.

The GeForce3's memory controller is drastically changed from the GeForce2 Ultra's. Instead of a single 128-bit interface to memory, there are four fully independent memory controllers within the GPU, in what NVIDIA likes to call their crossbar-based memory controller.

Each of the four memory controllers is 32 bits wide, and they are essentially interleaved, so together they add up to the 128-bit memory interface we're used to; all four support DDR SDRAM. The crossbar architecture also load balances these four independent controllers with respect to the bandwidth they share with the rest of the GPU.

The point of having four independent, load-balanced memory controllers is increased parallelism in the GPU (is anyone else picking up on the fact that this is starting to sound like a real CPU?). The four narrower memory controllers come in quite handy when dealing with a lot of small datasets. If the GPU requests 64 bits of data, the GeForce2 Ultra uses a full 256-bit transfer (128-bit DDR) to fetch it from local memory, wasting quite a bit of bandwidth. In the case of the GeForce3, however, the same request can be handled by a single 32-bit DDR controller, leaving much less bandwidth unused. Didn't your mother ever tell you that it's bad to waste food? It looks like NVIDIA is finally listening to their mother.
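The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope model, not a description of the actual hardware: it assumes a monolithic 128-bit DDR interface moves 256 bits per transaction, while each 32-bit DDR crossbar controller moves 64 bits, and simply counts the bits fetched but never used.

```python
# Illustrative model of wasted bandwidth per fetch: a monolithic
# 128-bit DDR controller vs. one of four independent 32-bit DDR
# controllers. Granule sizes are assumptions for illustration.

def wasted_bits(request_bits: int, granule_bits: int) -> int:
    """Bits fetched but unused when a request is rounded up to whole granules."""
    granules = -(-request_bits // granule_bits)  # ceiling division
    return granules * granule_bits - request_bits

MONOLITHIC = 256  # 128-bit interface, DDR: 256 bits per transaction
CROSSBAR = 64     # one 32-bit controller, DDR: 64 bits per transaction

for req in (64, 96, 192):
    print(req, wasted_bits(req, MONOLITHIC), wasted_bits(req, CROSSBAR))
```

For a 64-bit request the monolithic controller fetches 256 bits and throws 192 of them away, while a single 32-bit DDR controller satisfies it exactly, which is the efficiency win the crossbar design is after.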

The next part of the Lightspeed Memory Architecture is the Visibility Subsystem. This is the closest thing to "HSR" that exists in the GeForce3. As we are all aware, when drawing a 3D scene there is something called overdraw: a pixel or polygon that will never be seen on the monitor is rendered anyway, only to be overwritten later. ATI managed to combat this through the use of what they call Hierarchical-Z, and NVIDIA's Visibility Subsystem is functionally identical to it.

What the feature does is simple: it performs an extremely fast comparison of values in the z-buffer (which stores how "deep" pixels are in the scene, and thus whether they are visible or not) in order to discard occluded values, and their associated pixels, before they are sent to the frame buffer.
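The comparison itself is the ordinary depth test, just performed early and in bulk. A minimal sketch, assuming a "less-than" depth test (names and structure here are illustrative, not NVIDIA's implementation):

```python
# Sketch of the z-buffer comparison behind the Visibility Subsystem:
# a fragment that is not closer than the stored depth is rejected
# before it ever reaches the frame buffer, avoiding overdraw.

def z_test(zbuffer, framebuffer, x, y, depth, color):
    """Write the fragment only if it is closer than what is already stored."""
    if depth < zbuffer[y][x]:        # closer to the viewer than current pixel
        zbuffer[y][x] = depth
        framebuffer[y][x] = color
        return True
    return False                     # occluded: discarded, no write occurs

zbuf = [[1.0] * 4 for _ in range(4)]   # 1.0 = far plane
fbuf = [[0] * 4 for _ in range(4)]
z_test(zbuf, fbuf, 1, 1, 0.5, 0xFF0000)  # visible fragment: written
z_test(zbuf, fbuf, 1, 1, 0.8, 0x00FF00)  # behind the first: rejected
```

The hardware win comes from doing this rejection hierarchically over blocks of pixels at once, so entire occluded regions are thrown out with a single coarse comparison.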

This technology isn't perfect; there will be a number of cases in which the Visibility Subsystem fails to reject the appropriate z-values, leaving some remaining overdraw. If you recall our Radeon SDR review, we actually measured the usefulness of Hierarchical-Z in UnrealTournament, and enabling it increased performance by a few percent.

The remaining two features that make up ATI's HyperZ are also mirrored in the GeForce3. The GeForce3 features the same sort of lossless z-buffer compression as the Radeon, with compression savings as great as 4:1, identical to what ATI claims for the Radeon's z-buffer compression.
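Neither vendor discloses the actual compression scheme, so as a stand-in, a simple run-length encoding shows the principle: a z-buffer is full of large flat or smoothly varying regions (sky, walls, floors), and a lossless encoder can shrink those regions dramatically while guaranteeing an exact round trip.

```python
# Illustrative lossless compression of depth samples via run-length
# encoding. This is NOT the hardware's scheme, just a demonstration
# of how redundant z data compresses without any loss.

def rle_encode(values):
    """Collapse consecutive equal values into [value, count] runs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] runs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

tile = [0.5] * 12 + [0.7] * 4           # 16 depth samples, two flat runs
encoded = rle_encode(tile)              # 16 samples -> 2 runs
assert rle_decode(encoded) == tile      # lossless round trip
```

The "lossless" property is the important part: unlike texture compression, z values must decompress bit-exactly or visibility tests would break.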

Finally, NVIDIA claims an equivalent to ATI's Fast Z-Clear. This function very quickly clears all of the data in the z-buffer, which we found quite useful in our UnrealTournament performance tests not too long ago: enabling Fast Z-Clear alone resulted in a hefty performance increase. Interestingly enough, NVIDIA downplayed the usefulness of this feature. It wasn't until we asked, during our meeting with them, whether they had anything similar that they mentioned it existed in the GeForce3; they added that they did not see any major performance gains from the feature, contrary to what we saw with the Radeon.
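One plausible way such a clear can be fast is by not touching every z value at all: keep a per-tile "cleared" flag, and have reads return the clear value until the tile is written again. The tile granularity and class below are assumptions for illustration, not the GeForce3's or Radeon's actual mechanism.

```python
# Sketch of a tile-flag "fast clear": clearing flips one flag per tile
# instead of rewriting every depth value in memory.

class TiledZBuffer:
    def __init__(self, tiles, tile_size, far=1.0):
        self.far = far                          # depth value meaning "cleared"
        self.tile_size = tile_size
        self.cleared = [True] * tiles           # one flag per tile
        self.data = [[far] * tile_size for _ in range(tiles)]

    def fast_clear(self):
        """O(number of tiles) instead of O(number of pixels)."""
        self.cleared = [True] * len(self.cleared)

    def read(self, tile, i):
        return self.far if self.cleared[tile] else self.data[tile][i]

    def write(self, tile, i, depth):
        if self.cleared[tile]:                  # lazily materialize the tile
            self.data[tile] = [self.far] * self.tile_size
            self.cleared[tile] = False
        self.data[tile][i] = depth

zb = TiledZBuffer(tiles=4, tile_size=16)
zb.write(0, 3, 0.25)
zb.fast_clear()
assert zb.read(0, 3) == 1.0   # cleared without rewriting the stored values
```

Since the z-buffer is cleared every frame, turning a full-buffer memory write into a handful of flag updates saves real bandwidth, which is exactly why we saw the gains we did on the Radeon.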

Remember that the Radeon has significantly less peak memory bandwidth than the GeForce2 Ultra, and thus the GeForce3. The GeForce3 is still able to benefit from the aforementioned features, although perhaps not as much as the Radeon did, and possibly in different areas, as NVIDIA seemed to indicate by downplaying the importance of a Fast Z-Clear-like function on the GeForce3.
