Let's talk precision (Stage 5 continued)

Historically, vertex engines have been built out of floating point units for much longer than the pixel pipes have; remember, we're thinking about these stages not in generic marketing terms but as what they actually are - functional units.

With the transition to DX9 hardware came the move to floating point units in the shading/texturing stages. FPUs provide greater precision (try representing 1.04542 in an integer or fixed-point format while retaining full precision), but at the cost of much larger and more expensive hardware - primarily an increase in transistor count and die size. We've discussed the merits of FP precision in previous articles; for more information on what you can do with FP support, hop over here.
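To make that fixed-point limitation concrete, here's a minimal C sketch; the 8-bit fractional fixed-point format is our own illustrative choice, not any specific piece of hardware:

```c
#include <stdio.h>

int main(void) {
    /* Fixed point with 8 fractional bits: every value must round to
       the nearest multiple of 1/256. */
    int fixed = (int)(1.04542 * 256.0 + 0.5);   /* rounds to 268 */
    double from_fixed = fixed / 256.0;          /* 1.046875 */

    /* A floating point unit carries far more significant digits. */
    float from_float = 1.04542f;

    printf("8.8 fixed point: %.6f (error %+.6f)\n",
           from_fixed, from_fixed - 1.04542);
    printf("FP32 float:      %.9f (error %+.9f)\n",
           from_float, from_float - 1.04542);
    return 0;
}
```

The fixed-point result is already off in the third decimal place, while the float is accurate to roughly seven significant digits.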

Once again, here's where the marketing folks do the industry a severe injustice when it comes to talking about the "precision" of these FPUs. Precision, when referring to any sort of floating point number, is given as the number of bits allocated to representing that number. In the case of DirectX 9, there are two supported modes - full and partial precision.

Full precision, according to Microsoft's spec, calls for a 24-bit representation of each color component, for a total of 96 bits (four RGBA components) to represent each pixel. The DX9 specification also calls for partial precision support with 16-bit components, for a 64-bit representation of every pixel. ATI's R3xx hardware supports both of these modes and helped define them in the original specification.

NVIDIA does things a little differently: they support the 16-bit partial precision mode, but their NV3x hardware does not support the 24-bit full precision mode. Instead, NVIDIA implements the IEEE-754 spec, which calls for 32 bits of precision; this mode is not part of the DirectX 9 specification.
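Since all three formats come up here, it's worth relating bit count to actual precision. The sketch below assumes the commonly cited mantissa widths - FP16 and FP32 per IEEE-754, and a 16-bit mantissa for FP24, which is the layout usually attributed to ATI's R3xx:

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* Mantissa widths, excluding the implicit leading 1 bit. FP16 and
       FP32 follow IEEE-754; the 16-bit FP24 mantissa is the layout
       commonly attributed to ATI's R3xx (an assumption here). */
    const char *name[] = { "FP16 (partial precision)",
                           "FP24 (DX9 full precision)",
                           "FP32 (IEEE-754 single)" };
    const int mant[] = { 10, 16, 23 };

    for (int i = 0; i < 3; i++) {
        /* Worst-case relative rounding error is half a unit in the
           last place: 2^-(mantissa bits + 1). */
        double rel_err = pow(2.0, -(mant[i] + 1));
        printf("%-26s rel. error <= %g (~%.1f decimal digits)\n",
               name[i], rel_err, (mant[i] + 1) * log10(2.0));
    }
    return 0;
}
```

Each step up in mantissa width buys roughly two more significant decimal digits, which is the real meaning behind the 16/24/32-bit labels.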

So when ATI mentions that NVIDIA does not support "full precision," they are, in some respects, correct - NVIDIA does not support the 24-bit precision mode as defined by Microsoft; they support a mode with greater precision. The question is: if a game doesn't request partial precision (done in DX9 shaders via the _pp instruction modifier, or the half data type in HLSL), what does NVIDIA's hardware do? According to NVIDIA, the NV3x will default to 32-bit precision unless asked to render in partial precision (16-bit) mode.

Now that we've got the precision issue squared away, let's talk about speed. Is executing in full precision any slower than executing in partial precision? The design of both ATI's and NVIDIA's hardware dictates that there is only one set of FPUs for the shading/texturing stage, and all operations, regardless of their precision, go through this set of FPUs. On ATI's R3xx GPUs this means clusters of 4 x 24-bit FPUs, while on NVIDIA's NV3x GPUs there are clusters of 4 x 32-bit FPUs. All operations on these FPUs complete with the same latency, regardless of precision; whether you're doing a 16-bit add or a 24/32-bit add, it occurs at the same rate.

The only penalty to using higher precision is, of course, with regard to memory bandwidth and memory footprint: wider pixels mean more data to move and store. So while it has been claimed in the past that full precision can only be had at the expense of speed, this isn't true from a computational standpoint - only from the perspective of the memory subsystem.
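To put a rough number on that memory cost, here's a back-of-the-envelope sketch in C; the resolution, frame rate, and the single-write-per-pixel assumption are ours, purely for illustration:

```c
#include <stdio.h>

int main(void) {
    /* Illustrative assumptions: a 1024x768 render target, each pixel
       written once per frame, at 60 frames per second. */
    const double pixels = 1024.0 * 768.0;
    const double fps = 60.0;

    const char *label[] = { "4 x FP16 (64-bit pixels) ",
                            "4 x FP32 (128-bit pixels)" };
    const int bits_per_pixel[] = { 64, 128 };

    for (int i = 0; i < 2; i++) {
        /* bits -> bytes -> megabytes of framebuffer write traffic */
        double mb_per_s = pixels * fps * bits_per_pixel[i] / 8.0 / 1e6;
        printf("%s ~%.0f MB/s of write traffic\n", label[i], mb_per_s);
    }
    return 0;
}
```

Doubling the per-pixel precision doubles the framebuffer traffic (here from roughly 377 MB/s to 755 MB/s), which is exactly where the real cost of higher precision shows up.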
