ATI's Radeon 8500 & 7500: A Preview
by Anand Lal Shimpi on August 14, 2001 2:54 AM EST
One of the biggest features the Radeon 8500 boasts is a function of its R200 chip: HyperZ II. ATI claims that HyperZ II gives the Radeon 8500 an effective 12GB/s of peak memory bandwidth in spite of the fact that its memory subsystem can only offer 8.8GB/s. Here's a quick look back at what HyperZ is:
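To put ATI's claim in perspective, here is a quick back-of-the-envelope calculation using only the two figures quoted above. The variable names are ours, and the "traffic avoided" reading is our interpretation of what an effective bandwidth figure implies, not something ATI has spelled out.

```python
# Figures quoted in the article
raw_bandwidth_gbs = 8.8         # what the memory subsystem can physically deliver
effective_bandwidth_gbs = 12.0  # ATI's claimed peak "effective" figure with HyperZ II

# How much more work gets done per byte actually transferred
gain = effective_bandwidth_gbs / raw_bandwidth_gbs - 1.0

# Equivalent view (our interpretation): the share of memory traffic that
# would have to be avoided for 8.8GB/s to behave like 12GB/s
traffic_avoided = 1.0 - raw_bandwidth_gbs / effective_bandwidth_gbs

print(f"Implied efficiency gain: {gain:.1%}")
print(f"Implied traffic avoided: {traffic_avoided:.1%}")
```

In other words, the claim amounts to roughly a 36% improvement in peak efficiency, or a little over a quarter of memory traffic never having to happen.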
ATI's HyperZ technology is essentially composed of three features that work in conjunction with one another to provide for an "increase" in memory bandwidth. In reality, the increase is simply a more efficient use of the memory bandwidth that is there. The three features are: Hierarchical Z, Z-Compression and Fast Z-Clear. Before we explain these features and how they impact performance, you have to first understand the basics of conventional 3D rendering.
As we briefly mentioned before, the Z-buffer is a portion of memory dedicated to holding the z-values of rendered pixels. These z-values dictate what pixels and eventually what polygons appear in front of one another when displayed on your screen, or, if you're thinking about it in a mathematical sense, the z-values indicate position along the z-axis.
A traditional 3D accelerator processes each polygon as it is sent to the hardware, without any knowledge of the rest of the scene. Since there is no knowledge of the rest of the scene, every forward facing polygon must be shaded and textured. The z-buffer, as we just finished explaining, is used to store the depth of each pixel in the current back buffer. Each pixel of each polygon rendered must be checked against the z-buffer to determine if it is closer to the viewer than the pixel currently stored in the back buffer.
In a traditional pipeline, the check against the z-buffer is performed after the pixel has already been shaded and textured. If the new pixel turns out to be in front of the current pixel, it replaces the current pixel in the back buffer (or is blended with it in the case of transparency) and the z-buffer depth is updated. If the new pixel ends up behind the current pixel, it is thrown out and no changes are made to the back buffer (or, in the case of transparency, it is blended). Rendering pixels that end up overwritten or discarded is known as overdraw. Drawing the same pixel three times is equivalent to an overdraw of 3, which in some cases is typical.
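The process just described can be sketched in a few lines of Python. This is a minimal software model, not how any particular chip is built: the function name, the fragment format, and the convention that a smaller z means "closer to the viewer" are all our assumptions, and transparency is left out.

```python
def render_conventional(width, height, fragments, far=1.0):
    """Software model of a conventional z-buffered renderer.

    fragments: iterable of (x, y, z, color) tuples.
    Assumption: smaller z means closer to the viewer; all pixels are opaque.
    """
    color = [[None] * width for _ in range(height)]  # back buffer
    depth = [[far] * width for _ in range(height)]   # z-buffer
    shaded = 0
    for x, y, z, c in fragments:
        shaded += 1              # shading/texturing cost is paid before the depth test
        if z < depth[y][x]:      # new pixel is in front: replace and update depth
            depth[y][x] = z
            color[y][x] = c
        # otherwise the pixel is thrown away after the work was already done
    covered = sum(1 for row in color for c in row if c is not None)
    overdraw = shaded / covered if covered else 0.0
    return color, shaded, overdraw


# Three opaque layers covering a 2x2 screen, drawn back to front
layers = [(0.9, "far"), (0.5, "mid"), (0.1, "near")]
frags = [(x, y, z, name) for z, name in layers
         for y in range(2) for x in range(2)]
color, shaded, overdraw = render_conventional(2, 2, frags)
print(shaded, overdraw)
```

Twelve pixels are shaded to fill a four-pixel screen, which is exactly the overdraw of 3 mentioned above.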
Once the scene is complete, the back buffer is flipped to the front buffer for display on the monitor.
What we've just described is known as "immediate mode rendering" and has been used since the 1960s for still-frame CAD rendering, architectural engineering, film special effects, and now, in most 3D accelerators found inside your PC. Unfortunately, this method of rendering results in quite a bit of overdraw, where objects that aren't visible are being rendered.
One method of attacking this problem is to implement a Tile Based Rendering architecture, such as what we saw with the PowerVR Series 3 based KYRO graphics accelerator from ST Micro. While that may be the ideal way of handling it, developing such an algorithm requires quite a bit of work; it took years for Imagination Technologies (the creator of the PowerVR chips) to get to the point they are today with their Tile Based Rendering architecture.
Although the Radeon 8500 doesn't implement a completely Tile Based Rendering architecture, it does borrow some deferred rendering features to increase efficiency in memory requests. From the above example of how conventional 3D rendering works, you can guess that quite a bit of memory bandwidth is spent on accesses to the Z-buffer in order to check to see if any pixels are in front of the one being currently rendered. ATI's HyperZ increases the efficiency of these accesses, so instead of attacking the root of the problem (overdraw), ATI went after the results of it (frequent Z-buffer accesses).
The first part of the HyperZ technology is the Hierarchical Z feature. This feature basically allows for the pixel being rendered to be checked against the z-buffer before the pixel actually hits the rendering pipelines. This allows useless pixels to be thrown out early, before the Radeon has to render them.
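To illustrate the idea of an early check, here is a rough sketch. ATI hasn't detailed its implementation, so treat everything here as an assumption: the class name, storing the farthest depth per tile, and recomputing that value after every write are simply one straightforward way to model a coarse test, with smaller z again meaning closer.

```python
class HierZBuffer:
    """Toy z-buffer with a coarse per-tile test (hypothetical design)."""

    def __init__(self, width, height, tile=4, far=1.0):
        self.width, self.height, self.tile = width, height, tile
        self.depth = [[far] * width for _ in range(height)]
        tiles_x = -(-width // tile)   # ceiling division
        tiles_y = -(-height // tile)
        # Farthest depth currently stored anywhere in each tile
        self.tile_max = [[far] * tiles_x for _ in range(tiles_y)]

    def _refresh_tile(self, tx, ty):
        t = self.tile
        ys = range(ty * t, min((ty + 1) * t, self.height))
        xs = range(tx * t, min((tx + 1) * t, self.width))
        self.tile_max[ty][tx] = max(self.depth[y][x] for y in ys for x in xs)

    def submit(self, x, y, z):
        """Return 'early', 'late', or 'pass' for one incoming pixel."""
        tx, ty = x // self.tile, y // self.tile
        if z >= self.tile_max[ty][tx]:
            return "early"            # rejected before shading, no per-pixel z read
        if z >= self.depth[y][x]:
            return "late"             # needed the full per-pixel test
        self.depth[y][x] = z
        self._refresh_tile(tx, ty)
        return "pass"


# Draw a near layer, then a far layer, over a single 4x4 tile
hz = HierZBuffer(4, 4, tile=4)
results = {"early": 0, "late": 0, "pass": 0}
for z in (0.1, 0.9):
    for y in range(4):
        for x in range(4):
            results[hz.submit(x, y, z)] += 1
print(results)
```

Once the near layer has filled the tile, all 16 pixels of the far layer are rejected by the coarse test alone, so none of them would ever reach the rendering pipelines.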
Next we have Z-Compression. As the name implies, this feature compresses the data in the Z-buffer, and it does so with a lossless algorithm (no data is lost during the compression). The compressed data takes up less space, which in turn conserves memory bandwidth during accesses to the Z-buffer.
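ATI hasn't published the algorithm it uses, so the sketch below is only a stand-in that shows why depth data tends to compress well without loss. It stores one base value per tile plus small per-pixel offsets; the function names, the 24-bit depth format, and the 5-bit width field are our assumptions.

```python
def compress_tile(zs, depth_bits=24):
    """Store the tile's minimum depth plus a small offset per pixel.

    Returns the packed representation and the number of bits it would
    occupy if the offsets were bit-packed (they stay in a list here).
    """
    base = min(zs)
    offsets = [z - base for z in zs]
    bits = max(offsets).bit_length()   # width needed for the largest offset
    # base value + 5-bit field recording the offset width + packed offsets
    size_bits = depth_bits + 5 + bits * len(zs)
    return (base, bits, offsets), size_bits


def decompress_tile(packed):
    base, _bits, offsets = packed
    return [base + o for o in offsets]


# A 4x4 tile lying on a gently sloping surface (integer depth values)
tile = [1_000_000 + 3 * i for i in range(16)]
packed, size_bits = compress_tile(tile)
raw_bits = 24 * len(tile)

restored = decompress_tile(packed)
print(size_bits, raw_bits, restored == tile)
```

Neighboring pixels on the same polygon have very similar depths, so the offsets are tiny and the tile shrinks to about a third of its raw size, while decompression returns exactly the original values.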
The final piece of the HyperZ puzzle is the Fast Z-Clear feature. Fast Z-Clear is nothing more than a feature that allows for the quick clearing of all data in the Z-buffer after a scene has been rendered. Apparently, ATI's method of clearing the Z-buffer is dramatically faster than other conventional methods of doing so.
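ATI hasn't explained how its clear works, so the following is just one plausible way to make a clear cheap: keep a "cleared" flag per tile and flip the flags instead of rewriting every depth value. The class, its methods, the tile size, and the write counter used as a stand-in for memory traffic are all hypothetical.

```python
class DepthBuffer:
    """Depth buffer with one 'cleared' flag per tile (hypothetical layout)."""

    def __init__(self, width, height, tile=8, far=1.0):
        self.width, self.height, self.tile, self.far = width, height, tile, far
        self.depth = [[far] * width for _ in range(height)]
        tiles_x = -(-width // tile)   # ceiling division
        tiles_y = -(-height // tile)
        self.cleared = [[True] * tiles_x for _ in range(tiles_y)]
        self.writes = 0               # memory writes performed, a rough traffic proxy

    def slow_clear(self):
        # Conventional clear: rewrite every depth value
        for y in range(self.height):
            for x in range(self.width):
                self.depth[y][x] = self.far
                self.writes += 1

    def fast_clear(self):
        # Flag-based clear: one write per tile, stale depth data is left in place
        for row in self.cleared:
            for i in range(len(row)):
                row[i] = True
                self.writes += 1

    def read(self, x, y):
        if self.cleared[y // self.tile][x // self.tile]:
            return self.far           # no need to look at the stale data
        return self.depth[y][x]

    def write(self, x, y, z):
        t, tx, ty = self.tile, x // self.tile, y // self.tile
        if self.cleared[ty][tx]:
            # First write after a clear: overwrite the tile's stale values
            for yy in range(ty * t, min((ty + 1) * t, self.height)):
                for xx in range(tx * t, min((tx + 1) * t, self.width)):
                    self.depth[yy][xx] = self.far
                    self.writes += 1
            self.cleared[ty][tx] = False
        self.depth[y][x] = z
        self.writes += 1


buf = DepthBuffer(16, 16, tile=8)
buf.write(3, 3, 0.2)

buf.writes = 0
buf.slow_clear()
slow_cost = buf.writes

buf.write(3, 3, 0.2)
buf.writes = 0
buf.fast_clear()
fast_cost = buf.writes

print(slow_cost, fast_cost, buf.read(3, 3))
```

On this tiny 16x16 buffer the conventional clear touches every one of the 256 depth values, while the flag-based clear touches four tile flags. The cost of resetting a tile is deferred until something is actually drawn into it.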
Now that you know how HyperZ works, let's have a look at how HyperZ II improves on that. Hierarchical Z works by dividing the screen into a number of blocks (or tiles) and discarding overdrawn pixels one block at a time. The original Radeon used 8x8 blocks while the Radeon 8500 uses smaller blocks (4x4). The benefit here is mainly one of efficiency, provided by smaller blocks (much like how NVIDIA uses smaller memory accesses to increase memory bandwidth efficiency).
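One way to picture the efficiency benefit of smaller blocks, under our own simplifying assumption that a block can only be discarded wholesale when an occluder covers it completely, is to count how much of an occluded region falls inside fully covered blocks. The function name and the example occluder are hypothetical.

```python
def pixels_in_fully_covered_blocks(occ_w, occ_h, block):
    """Pixels of a block-aligned occ_w x occ_h occluder (anchored at the
    screen origin) that lie inside blocks it covers completely."""
    return (occ_w // block) * (occ_h // block) * block * block


occluder_pixels = 12 * 12
covered_8x8 = pixels_in_fully_covered_blocks(12, 12, 8)   # original Radeon block size
covered_4x4 = pixels_in_fully_covered_blocks(12, 12, 4)   # Radeon 8500 block size
print(occluder_pixels, covered_8x8, covered_4x4)
```

For a 12x12 occluder, 8x8 blocks capture only 64 of the 144 hidden pixels at block granularity, while 4x4 blocks capture all 144. Real scenes won't line up this neatly, but the trend is the same: finer blocks waste less around the edges of occluding geometry.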
The next improvement is that the Radeon 8500 is capable of discarding 64 pixels per clock instead of 8 on the original Radeon. For comparison purposes, the GeForce3 can discard 16 pixels per clock.
ATI also implemented an improved Z-Compression algorithm that, according to their spec sheets, gives them a 20% increase in Z-Compression performance.