ATI’s Radeon 9700 (R300) – Crowning the New King
by Anand Lal Shimpi on July 18, 2002 5:00 AM EST
ATI was the first to market with its occlusion culling technology, which it calls HyperZ. The third generation of HyperZ is present in the R300, and it basically contains faster versions of the three components present in HyperZ II on the Radeon 8500.
ATI has not shed too much light on the improvements, but at this point it seems that HyperZ III is faster for three reasons: the higher clock rate of the R300 GPU, the inclusion of "Early Z" (explained below) and possibly an increase in the number of pixels Hierarchical Z can discard at one time. There’s nothing wrong with that; there’s just not much new about it.
In case you’ve forgotten, here’s a quick explanation of the role of HyperZ III:
ATI's HyperZ technology is essentially composed of three features that work in conjunction with one another to provide for an "increase" in memory bandwidth. In reality, the increase is simply a more efficient use of the memory bandwidth that is there. The three features are: Hierarchical Z, Z-Compression and Fast Z-Clear. Before we explain these features and how they impact performance, you have to first understand the basics of conventional 3D rendering.
As we briefly mentioned before, the Z-buffer is a portion of memory dedicated to holding the z-values of rendered pixels. These z-values dictate what pixels and eventually what polygons appear in front of one another when displayed on your screen, or, if you're thinking about it in a mathematical sense, the z-values indicate position along the z-axis.
A traditional 3D accelerator processes each polygon as it is sent to the hardware, without any knowledge of the rest of the scene. Since there is no knowledge of the rest of the scene, every forward facing polygon must be shaded and textured. The z-buffer, as we just finished explaining, is used to store the depth of each pixel in the current back buffer. Each pixel of each polygon rendered must be checked against the z-buffer to determine if it is closer to the viewer than the pixel currently stored in the back buffer.
Checking against the z-buffer must be performed after the pixel is already shaded and textured. If a pixel turns out to be in front of the current pixel, the new pixel replaces (or is blended with, in the case of transparency) the current pixel in the back buffer and the z-buffer depth is updated. If the new pixel ends up behind the current pixel, the new pixel is thrown out and no changes are made to the back buffer (unless transparency calls for blending). When pixels are drawn only to be thrown out or covered up later, this is known as overdraw. Drawing the same pixel three times is equivalent to an overdraw of 3, which in some cases is typical. Once the scene is complete, the back buffer is flipped to the front buffer for display on the monitor.
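To make the cost concrete, here is a minimal Python sketch of the conventional order of operations described above: shade first, depth-test afterwards. It is only an illustration, not how any particular chip is built. The function name, the fragment tuples and the convention that a smaller z means closer to the viewer are all assumptions made for the example.

```python
def render_late_z(fragments, width, height, far=1.0):
    """Shade every fragment first, then depth-test it (the conventional order)."""
    z_buffer = [[far] * width for _ in range(height)]
    back_buffer = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z, color in fragments:
        shaded += 1  # shading/texturing cost is paid before visibility is known
        if z < z_buffer[y][x]:  # assumption: smaller z means closer to the viewer
            z_buffer[y][x] = z
            back_buffer[y][x] = color
    # Overdraw = fragments shaded per pixel that actually ends up on screen.
    covered = sum(1 for row in back_buffer for c in row if c is not None)
    overdraw = shaded / covered if covered else 0.0
    return back_buffer, shaded, overdraw


# Three full-screen layers on a 2x2 target, submitted back to front.
layers = [(0.9, "far"), (0.5, "mid"), (0.1, "near")]
fragments = [(x, y, z, c) for z, c in layers for y in range(2) for x in range(2)]
image, shaded, overdraw = render_late_z(fragments, 2, 2)
```

In this toy scene 12 fragments are shaded to produce 4 visible pixels, which is the overdraw of 3 mentioned above. Submitting the layers front to back would not help here, because the shading work is done before the depth test either way.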
What we've just described is known as "immediate mode rendering" and has been used since the 1960s for still frame CAD rendering, architectural engineering, film special effects, and now, in most 3D accelerators found inside your PC. Unfortunately, this method of rendering results in quite a bit of overdraw, where objects that are not visible are being rendered.
One method of attacking this problem is to implement a Tile Based Rendering architecture, such as what we saw with the PowerVR based KYRO II graphics accelerator from ST Micro. While that may be the ideal way of handling it, developing such an architecture requires quite a bit of work; it took years for Imagination Technologies (the creator of the PowerVR chips) to get to the point where they are today with their Tile Based Rendering architecture.
Although the R300 doesn't implement a Tile Based Rendering architecture, it does borrow some deferred rendering features to increase efficiency in memory requests. From the above example of how conventional 3D rendering works, you can guess that quite a bit of memory bandwidth is spent on accesses to the Z-buffer in order to check to see if any pixels are in front of the one being currently rendered. ATI's HyperZ increases the efficiency of these accesses, so instead of attacking the root of the problem (overdraw), ATI went after the results of it (frequent Z-buffer accesses).
The first part of the HyperZ technology is the Hierarchical Z feature. This feature basically allows the pixel being rendered to be checked against the z-buffer before the pixel actually hits the rendering pipelines. This allows useless pixels to be thrown out early, before the R300 has to render them. The R300 also adds what ATI is calling Early Z, which further sub-divides the Z-buffer down to the pixel level so that the card can achieve close to 100% efficiency in discarding occluded (hidden) pixels.
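The idea of rejecting pixels before they reach the pipelines can be sketched in a few lines of Python. ATI has not published the internals, so treat this as a rough model under stated assumptions: the 4x4 tile size, the per-tile "farthest depth" table and the function name are all hypothetical, and smaller z is assumed to mean closer to the viewer. The coarse per-tile test stands in for Hierarchical Z, and the per-pixel test that follows stands in for the pixel-level check attributed to Early Z.

```python
TILE = 4  # hypothetical tile size, chosen only for the example


def render_hier_z(fragments, width, height, far=1.0):
    """Reject hidden fragments before shading: per-tile test, then per-pixel."""
    z_buffer = [[far] * width for _ in range(height)]
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    # Farthest depth currently stored anywhere in each tile.
    tile_max = [[far] * tiles_x for _ in range(tiles_y)]
    stats = {"shaded": 0, "tile_rejects": 0, "pixel_rejects": 0}

    for x, y, z in fragments:
        tx, ty = x // TILE, y // TILE
        if z >= tile_max[ty][tx]:
            # Behind everything in the tile: discard without a per-pixel lookup.
            stats["tile_rejects"] += 1
            continue
        if z >= z_buffer[y][x]:
            # Hidden at this exact pixel: still discarded before shading.
            stats["pixel_rejects"] += 1
            continue
        stats["shaded"] += 1  # only visible fragments pay the shading cost
        z_buffer[y][x] = z
        # Keep the coarse table conservative by recomputing the tile's maximum.
        tile_max[ty][tx] = max(
            z_buffer[yy][xx]
            for yy in range(ty * TILE, min((ty + 1) * TILE, height))
            for xx in range(tx * TILE, min((tx + 1) * TILE, width))
        )
    return stats


near = [(x, y, 0.1) for y in range(4) for x in range(4)]
far_layer = [(x, y, 0.9) for y in range(4) for x in range(4)]
front_to_back = render_hier_z(near + far_layer, 4, 4)
back_to_front = render_hier_z(far_layer + near, 4, 4)
```

With the near layer drawn first, all 16 far fragments are thrown out by the coarse test and only 16 fragments are shaded. Drawn back to front, nothing can be rejected and all 32 are shaded, which is why submission order still matters with this kind of scheme.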
Next we have Z-Compression. As the name implies, this feature compresses the data in the Z-buffer, and the compression is lossless, so no data is lost in the process. Compressed data takes up less space, which in turn conserves memory bandwidth during accesses to the Z-buffer.
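ATI has not disclosed the compression algorithm, so the following Python sketch uses a generic stand-in (a base value plus small per-pixel deltas) purely to show how a lossless scheme can shrink a tile of depth values. The function names, the one-byte format flag and the 16-value tile are all assumptions made for the illustration, not ATI's format.

```python
import struct


def compress_tile(depths):
    """Losslessly pack a tile of unsigned 32-bit depth values."""
    base = min(depths)
    deltas = [d - base for d in depths]
    if max(deltas) < 256:
        # Flag 1: 4-byte base followed by one byte per pixel.
        return b"\x01" + struct.pack("<I", base) + bytes(deltas)
    # Flag 0: values vary too much, so store the tile uncompressed.
    return b"\x00" + struct.pack(f"<{len(depths)}I", *depths)


def decompress_tile(blob):
    """Recover exactly the depth values passed to compress_tile."""
    if blob[0] == 1:
        (base,) = struct.unpack_from("<I", blob, 1)
        return [base + delta for delta in blob[5:]]
    count = (len(blob) - 1) // 4
    return list(struct.unpack_from(f"<{count}I", blob, 1))


# A 16-pixel tile covering a gently sloped surface: depths change slowly.
smooth_tile = [1_000_000 + 3 * i for i in range(16)]
packed = compress_tile(smooth_tile)
restored = decompress_tile(packed)
```

Here the 16 depth values occupy 21 bytes instead of 64 and decompress to exactly the original values. A tile with widely scattered depths falls back to the uncompressed path, so nothing is ever lost, only less is saved.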
The final piece of the HyperZ puzzle is the Fast Z-Clear feature. Fast Z-Clear is nothing more than a feature that allows for the quick clearing of all data in the Z-buffer after a scene has been rendered. Apparently, ATI's method of clearing the Z-buffer is dramatically faster than other conventional methods.
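ATI does not say how the clear is accelerated. One common way to get this effect, shown here strictly as an assumption, is to keep a "cleared" flag per tile and flip the flags instead of rewriting every depth value. The class name, tile size and method names in this Python sketch are hypothetical.

```python
class TiledZBuffer:
    """Depth buffer with one 'cleared' flag per tile (hypothetical layout)."""

    def __init__(self, width, height, tile=4, clear_value=1.0):
        self.width, self.height, self.tile = width, height, tile
        self.clear_value = clear_value
        self.tiles_x = (width + tile - 1) // tile
        self.tiles_y = (height + tile - 1) // tile
        self.depth = [[clear_value] * width for _ in range(height)]
        self.cleared = [[True] * self.tiles_x for _ in range(self.tiles_y)]

    def fast_clear(self):
        """Mark every tile as cleared; return how many flags were written."""
        for row in self.cleared:
            for i in range(len(row)):
                row[i] = True
        return self.tiles_x * self.tiles_y

    def read(self, x, y):
        if self.cleared[y // self.tile][x // self.tile]:
            return self.clear_value  # stale contents are never looked at
        return self.depth[y][x]

    def write(self, x, y, z):
        tx, ty = x // self.tile, y // self.tile
        if self.cleared[ty][tx]:
            # First write after a clear: overwrite the tile's stale values.
            for yy in range(ty * self.tile, min((ty + 1) * self.tile, self.height)):
                for xx in range(tx * self.tile, min((tx + 1) * self.tile, self.width)):
                    self.depth[yy][xx] = self.clear_value
            self.cleared[ty][tx] = False
        self.depth[y][x] = z


zb = TiledZBuffer(8, 8)
zb.write(1, 1, 0.25)
flags_written = zb.fast_clear()  # 4 flag writes instead of 64 depth writes
```

On this 8x8 buffer the clear touches 4 flags rather than 64 depth values, and the gap widens with resolution. The cost is deferred to the first write into each tile, and tiles that are never drawn to never pay it.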
As we mentioned before, ATI’s HyperZ III comes in handy for improving AA performance, courtesy of its Z-Compression.