To reduce the bandwidth wasted on texturing occluded pixels, Z-occlusion culling has been implemented. Efforts at understanding GeForce3 Z-occlusion culling were thwarted by NVIDIA's reluctance to reveal what it regards as proprietary information. In the absence of an official explanation, we can only make a few deductions. An important principle is that early Z-testing, i.e. depth testing prior to texture mapping, is a tradeoff between bandwidth reduction and overall rendering time: the more extensive the early Z-testing, the more time it consumes and the longer the overall rendering time. It also helps if the application sends objects for rendering in approximately front-to-back order, so that near occluders reach the depth buffer before the geometry they hide. With that in mind, three considerations for early Z-testing are discussed.
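
To make the front-to-back requirement concrete, the sketch below shows how an application might sort its opaque objects by distance from the camera before issuing draw calls. The Object structure and Submit() stub are hypothetical stand-ins for engine-specific code, not part of any NVIDIA or Direct3D interface.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    struct Vec3 { float x, y, z; };

    struct Object {
        Vec3 center;   // approximate world-space center used for sorting
        int id;        // placeholder for the object's actual geometry
    };

    static float DistanceSquared(const Vec3& a, const Vec3& b) {
        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return dx * dx + dy * dy + dz * dz;
    }

    // Hypothetical draw call; a real engine would submit vertex and
    // texture data to the graphics API here.
    static void Submit(const Object& obj) {
        std::cout << "drawing object " << obj.id << '\n';
    }

    void RenderFrontToBack(std::vector<Object>& scene, const Vec3& camera) {
        // Nearest objects first, so their depths land in the Z-buffer early
        // and farther pixels drawn later can be rejected before texturing.
        std::sort(scene.begin(), scene.end(),
                  [&](const Object& a, const Object& b) {
                      return DistanceSquared(a.center, camera) <
                             DistanceSquared(b.center, camera);
                  });
        for (const Object& obj : scene)
            Submit(obj);
    }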

First, does early Z-testing evaluate whole polygons or the depth value of each pixel? Evaluating whole polygons can take place even earlier in the graphics pipeline, i.e. prior to triangle setup. However, triangular artifacts may result if the criteria for rejecting polygons are overly aggressive, and intersecting occluding polygons may not be resolved satisfactorily. Evaluating individual pixels, on the other hand, is less prone to visual artifacts. From a reading of the GeForce3 whitepapers, one may infer that a per-pixel process is being used.
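
For illustration only, here is a minimal software model of a per-pixel early Z test: the depth comparison happens before any texture work, so an occluded fragment never generates texture traffic. The Fragment layout and ShadeFragment() routine are assumptions made for this sketch, not a description of the GeForce3's actual hardware.

    #include <cstdint>
    #include <vector>

    struct Fragment {
        int x, y;
        float depth;   // interpolated Z for this pixel
        float u, v;    // texture coordinates, only needed if the pixel survives
    };

    // Stand-in for the expensive part of the pipeline: texture sampling and
    // blending, i.e. the memory traffic that early Z-testing tries to avoid.
    static uint32_t ShadeFragment(const Fragment& frag) {
        (void)frag;
        return 0xFFFFFFFFu;  // flat white; a real shader would sample textures here
    }

    void RasterizeWithEarlyZ(const std::vector<Fragment>& fragments,
                             std::vector<float>& zbuffer,
                             std::vector<uint32_t>& framebuffer,
                             int width) {
        for (const Fragment& frag : fragments) {
            int index = frag.y * width + frag.x;
            // Early Z: reject the fragment before texturing if something
            // nearer has already been written at this pixel.
            if (frag.depth >= zbuffer[index])
                continue;
            zbuffer[index] = frag.depth;
            framebuffer[index] = ShadeFragment(frag);
        }
    }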

Second, early Z-testing must be applied very selectively, only where there is a reasonably good chance that the pixels or polygons being tested are in fact occluded. This holds especially true for a per-pixel process. One way this may be achieved is by choosing pixels that are in a foremost position. NVIDIA holds a patent on greatly accelerating the generation of pixels with high pixel-to-texel ratios during triangle setup. A pixel with a high pixel-to-texel ratio would seem like a good candidate for use as an occluder, and being able to generate such pixels a few orders of magnitude faster than the usual triangle setup is a big plus.
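
As a rough illustration of why a high pixel-to-texel ratio suggests a good occluder, the sketch below flags a triangle as an occluder candidate when its projected screen area is large relative to the texture area it maps to. The Triangle2D fields and the 4.0 cutoff are arbitrary assumptions for this sketch, not values taken from NVIDIA's patent.

    // A triangle reduced to the two quantities relevant to the ratio test.
    struct Triangle2D {
        float screenArea;  // projected area in pixels
        float texelArea;   // area of the mapped texture region, in texels
    };

    // A large, close-up triangle covers many pixels per texel; such geometry
    // tends to hide whatever is drawn behind it, making it a useful occluder.
    bool IsOccluderCandidate(const Triangle2D& tri) {
        if (tri.texelArea <= 0.0f)
            return false;
        float pixelToTexelRatio = tri.screenArea / tri.texelArea;
        return pixelToTexelRatio > 4.0f;  // illustrative threshold only
    }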

Third, is the early Z-testing performed on-chip or off-chip? Keeping the operation on-chip would obviously be tremendously fast, but there are limits to the amount of depth information that can be stored on-chip. An off-chip repository would be more flexible and more scalable as far as storage space is concerned, and the bandwidth overhead could be amortized by a crossbar memory controller that load-balances itself to service early Z checks.
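
NVIDIA has not said how much depth information, if any, the GeForce3 keeps on-chip. One way a designer might bound on-chip storage is a coarse, per-tile summary of the depth buffer, with the full-resolution Z-buffer remaining off-chip; the sketch below, which keeps one conservative farthest-depth value per 8x8 tile, is purely a thought experiment along those lines and not a description of the actual chip.

    #include <vector>

    // Hypothetical coarse Z store: one conservative "farthest depth" per
    // 8x8 pixel tile, small enough to imagine holding on-chip, while the
    // full-resolution Z-buffer stays in off-chip memory. Assumes a standard
    // less-than depth comparison.
    class CoarseZBuffer {
    public:
        CoarseZBuffer(int width, int height)
            : tilesX_((width + 7) / 8),
              tileMaxDepth_(tilesX_ * ((height + 7) / 8), 1.0f) {}  // 1.0 = far plane

        // True when a fragment at this depth is at least as far as every pixel
        // already written in its tile, so it can be rejected before texturing.
        bool TileRejects(int x, int y, float depth) const {
            return depth >= tileMaxDepth_[TileIndex(x, y)];
        }

        // After a depth write to the off-chip Z-buffer, the caller recomputes
        // the farthest depth remaining in the affected tile and stores it here.
        void Update(int x, int y, float recomputedTileMax) {
            tileMaxDepth_[TileIndex(x, y)] = recomputedTileMax;
        }

    private:
        int TileIndex(int x, int y) const { return (y / 8) * tilesX_ + (x / 8); }
        int tilesX_;
        std::vector<float> tileMaxDepth_;
    };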
