The Parhelia Pipeline (continued)

Moving on down the pipeline, we come to the four pixel rendering pipelines of the Parhelia-512. This quad-pipeline approach is similar to what NVIDIA introduced with the GeForce 256, ATI used in the Radeon 8500 and 3DLabs in the P10; it's a choice that makes sense, and thus Matrox has stuck to it. Where Matrox does differ from the competition is in the Parhelia's ability to process four textures per pipeline per clock, as opposed to two in all competing products. That lets the Parhelia-512 offer significantly higher performance in next-generation games that make heavy use of multiple textures. This is a safe bet by Matrox: it's much easier for a developer to use more texture layers than to write pixel shader programs, given the complexity of writing those programs and the very few cards in users' hands with full DX8 pixel shader compliance. This will change in the future, but for now it makes a lot of sense.
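To put numbers on the four-versus-two comparison, here is a minimal back-of-the-envelope sketch. It uses only the per-clock figures quoted above; the function names are our own, and the `clocks_per_pixel` model is an idealized assumption (it ignores memory bandwidth, filtering cost and clock speed), not a description of how any of these chips actually schedules work.

```python
import math

def texels_per_clock(pipelines: int, textures_per_pipeline: int) -> int:
    """Peak number of textures sampled per clock across all pixel pipelines."""
    return pipelines * textures_per_pipeline

def clocks_per_pixel(texture_layers: int, textures_per_pipeline: int) -> int:
    """Idealized clocks one pipeline needs to apply every texture layer to a pixel.

    Assumption: layers beyond what the pipeline handles in one clock simply
    cost additional whole clocks.
    """
    return math.ceil(texture_layers / textures_per_pipeline)

# Figures from the text: four pipelines, four textures each (Parhelia-512)
# versus four pipelines, two textures each (competing parts).
parhelia = texels_per_clock(pipelines=4, textures_per_pipeline=4)
competitor = texels_per_clock(pipelines=4, textures_per_pipeline=2)
print(parhelia, competitor)  # 16 8

# A quad-textured pixel under the idealized model above.
print(clocks_per_pixel(4, 4), clocks_per_pixel(4, 2))  # 1 2
```

Under these assumptions the advantage only shows up once a game layers more than two textures per pixel; with two or fewer, both designs finish in a single clock.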

Each one of these "quad texturing units" is flexible enough to allocate processing resources depending on the application at hand. For example, in a predominantly dual-textured game such as Quake III Arena, the Parhelia-512 can put the otherwise idle texturing resources to work on 8-tap anisotropic and trilinear filtering at virtually no performance hit, although this is admittedly more of a feature for today's games than tomorrow's.

In each of the quad texturing units, the texture coordinates are calculated and the textures are loaded and filtered; the pixels are then sent on to the pixel shaders of the Parhelia-512.

These pixel shaders are no more programmable than those in the GeForce4, meaning that they are still effectively register combiners and not fully programmable. They also work on integer data rather than the 32-bit floating-point values required for DX9 compliance. The reason the Parhelia cannot claim these two key features is, once again, a lack of die space. As the chip is built on a 0.15-micron process with 80 million transistors, Matrox had to make a number of tradeoffs in order to deliver excellent performance in current and future DX8 applications; one of those tradeoffs happens to be pixel shader programmability. Just as 3DLabs mentioned to us during our P10 briefing, making the 3D pipeline entirely floating-point requires at least a 0.13-micron process, which won't be mature enough (at TSMC at least) to use for a mass-production GPU until this fall.

Compared to a GeForce4, the Parhelia-512's pixel shading stage is superior in that it has five pixel shader stages in each rendering pipeline (compared to the GeForce4's two). This lets the Parhelia-512 resort to multipass rendering much less often than the competition: not only can each pipeline process 5 pixel shader operations in a single pass, but two pixel pipelines can also be combined to process 10 pixel shader operations in a single pass if necessary. And as you know, the fewer passes made, the more bandwidth and resources are conserved.
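The pass-count argument can be sketched with a simple ceiling division. This is a rough model under a stated assumption: an effect needing more pixel shader operations than the hardware has stages is split evenly into extra passes. The `passes_needed` helper is hypothetical, and real multipass behavior depends on far more than stage count.

```python
import math

def passes_needed(shader_ops: int, stages_per_pass: int) -> int:
    """Rendering passes required when each pass can run `stages_per_pass` operations.

    Assumption: operations split cleanly across passes with no other limits.
    """
    if stages_per_pass < 1:
        raise ValueError("stages_per_pass must be at least 1")
    return max(1, math.ceil(shader_ops / stages_per_pass))

ops = 10  # an example effect that needs 10 pixel shader operations

print(passes_needed(ops, 5))   # one Parhelia pipeline, 5 stages: 2 passes
print(passes_needed(ops, 10))  # two Parhelia pipelines combined, 10 stages: 1 pass
print(passes_needed(ops, 2))   # 2 stages, the GeForce4 figure quoted above: 5 passes
```

Every additional pass means writing out and re-reading the frame buffer, which is where the bandwidth savings come from.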

That brings the basic 3D pipeline of the Parhelia to an end, before the data is finally sent out over the 256-bit DDR memory bus (256 x 2 equals that magic 512 number again). But there are two very important parts of the extended pipeline that we haven't mentioned yet, so let's tackle those next.
