Stage 1: "The Front End"

The first stage of the graphics pipeline, as we mentioned before, is where the application (in most cases a game) communicates with the API and driver, which in turn talk to the GPU.

Instructions from the game being run are sent to the API, telling it what the game wants the hardware to do. The API then interfaces with the graphics driver and passes on the API code. A real-time compiler in the graphics driver takes the API code and essentially maps it to instructions that the particular GPU can understand. This is the first place where ATI and NVIDIA are divided: while both take the same approach, ATI's real-time compiler generates instructions that only ATI GPUs can understand, while NVIDIA's compiler is tailored to NVIDIA GPUs only; makes sense, no?
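To make the idea concrete, here is a minimal sketch of one vendor-neutral program being mapped to two different instruction sets. Everything in it is an assumption for illustration: the opcode names, the `compile_for_gpu` helper and the "vendor A/B" tables are made up and do not reflect either company's actual hardware or driver.

```python
# Hypothetical opcode tables; real GPU instruction sets are not published in this form.
VENDOR_A_ISA = {"tex": "A_TEXLD", "mul": "A_MUL", "add": "A_ADD"}
VENDOR_B_ISA = {"tex": "B_TEX", "mul": "B_FMUL", "add": "B_FADD"}


def compile_for_gpu(api_program, isa):
    """Translate vendor-neutral (op, dest, sources) tuples into one vendor's instructions."""
    native = []
    for op, dest, sources in api_program:
        native.append(f"{isa[op]} {dest}, {', '.join(sources)}")
    return native


# The same API-level program is handed to both (hypothetical) driver compilers.
api_program = [
    ("tex", "r0", ["t0", "s0"]),  # sample a texture
    ("mul", "r1", ["r0", "c0"]),  # modulate by a constant
    ("add", "r2", ["r1", "c1"]),  # add a second constant
]

vendor_a_code = compile_for_gpu(api_program, VENDOR_A_ISA)
vendor_b_code = compile_for_gpu(api_program, VENDOR_B_ISA)
```

The application only ever produces `api_program`; which native instructions come out the other end depends entirely on whose driver is installed.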

What is contained in these instructions that are sent to the GPU? For all pre-DX8 ("non-programmable") code, you can expect to see vertices, indices, commands that tell the GPU where to fetch vertices and indices from, and state settings (e.g. z-buffer enable/disable, texture filtering mode = trilinear, etc.). As we mentioned in our overview of the graphics pipeline, this stage mainly tells all of the other stages what to do, which is exactly what's happening here.
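A rough way to picture this kind of command stream is as a list of state changes and fetch commands followed by a draw. The sketch below is illustrative only; the command names (`SET_Z_ENABLE`, `SET_TEXTURE_FILTER`, and so on) and the `FrontEndState` class are assumptions, not real API or hardware commands.

```python
from dataclasses import dataclass, field


@dataclass
class FrontEndState:
    # State the front end hands to the rest of the pipeline (hypothetical fields).
    z_buffer_enabled: bool = False
    texture_filter: str = "point"
    vertex_buffer: list = field(default_factory=list)
    index_buffer: list = field(default_factory=list)
    draws: list = field(default_factory=list)


def run_commands(commands):
    """Consume (name, argument) commands in order, updating state and recording draws."""
    state = FrontEndState()
    for name, arg in commands:
        if name == "SET_Z_ENABLE":
            state.z_buffer_enabled = arg
        elif name == "SET_TEXTURE_FILTER":
            state.texture_filter = arg
        elif name == "SET_VERTICES":
            state.vertex_buffer = arg
        elif name == "SET_INDICES":
            state.index_buffer = arg
        elif name == "DRAW":
            # Resolve indices into vertices and record the state in effect for this draw.
            triangle = [state.vertex_buffer[i] for i in state.index_buffer]
            state.draws.append((triangle, state.z_buffer_enabled, state.texture_filter))
        else:
            raise ValueError(f"unknown command: {name}")
    return state


commands = [
    ("SET_Z_ENABLE", True),
    ("SET_TEXTURE_FILTER", "trilinear"),
    ("SET_VERTICES", [(0, 0, 0), (1, 0, 0), (0, 1, 0)]),
    ("SET_INDICES", [0, 1, 2]),
    ("DRAW", None),
]
final_state = run_commands(commands)
```

Note that nothing here is a program in the usual sense: the commands only select among behaviors the hardware already has.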

For DX8/DX9 code things are a bit different, as there are actual vertex/pixel programs that are sent to the driver and then mapped to GPU machine code. The difference between this approach and pre-DX8 code is that pre-DX8 code would trigger a fixed series of events in the pipeline (e.g. look up this texture, apply it to these pixels, repeat, etc.), whereas DX8/DX9 code offers much more flexibility for the programmer. Now developers can have the vertex shader FPUs perform any task they like, even to the point of coding non-graphics programs to run on the GPU. When DX8/DX9 code is sent to the GPU, the processor acts much more like a conventional CPU; a GPU running DX8/DX9 code is a lot like a CPU running any application you would run on your computer.
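The programmable model can be pictured as the developer supplying a small function that the hardware runs once per vertex. The following is only a sketch under that assumption; `run_vertex_program`, `transform` and `squared_length` are hypothetical names, and real vertex programs are written in shader assembly or a shading language rather than Python.

```python
def run_vertex_program(program, vertices, constants):
    """Apply a developer-supplied program to every vertex, as a vertex shader unit would."""
    return [program(vertex, constants) for vertex in vertices]


def transform(vertex, constants):
    # A typical graphics task: scale and offset each vertex position.
    x, y, z = vertex
    s = constants["scale"]
    ox, oy, oz = constants["offset"]
    return (x * s + ox, y * s + oy, z * s + oz)


def squared_length(vertex, constants):
    # A non-graphics style computation run on the same "hardware".
    x, y, z = vertex
    return x * x + y * y + z * z


constants = {"scale": 2.0, "offset": (1.0, 0.0, -1.0)}
moved = run_vertex_program(transform, [(1.0, 2.0, 3.0)], constants)
lengths = run_vertex_program(squared_length, [(1.0, 2.0, 2.0)], constants)
```

The point of the second function is that the same per-vertex machinery does not care whether the arithmetic has anything to do with graphics.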

Both ATI and NVIDIA are very tight-lipped about the details of these front-end stages of their pipelines, so as you can guess it is quite difficult to compare the two here. The efficiency and performance of the driver's real-time compiler is extremely important to how well this first set of operations performs.

The one thing you have to keep fresh in your mind about the graphics pipeline is that many parts of the pipeline are extremely parallel in nature, and thus maximizing the parallelism in the code the compiler generates is key to extracting the full performance potential out of any GPU. The compiler dictates which registers data will be stored in, handles any and all bundling of instructions and, as we just mentioned, attempts to extract as much Instruction Level Parallelism (ILP) from the code as possible.
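As a toy illustration of instruction bundling, the sketch below greedily packs instructions that do not depend on each other into the same bundle. It is not how either vendor's compiler works; the `bundle_instructions` function, the bundle width of 2 and the register names are all assumptions, and it assumes every register is written only once so that only read-after-write dependencies matter.

```python
def bundle_instructions(instructions, width=2):
    """Pack (dest, sources) instructions into bundles of at most `width` independent ops.

    Returns a list of bundles, each a list of instruction indices. Assumes each
    destination register is written exactly once (hypothetical simplification).
    """
    bundles = []
    ready_at = {}  # register -> first bundle index in which its value can be read
    for idx, (dest, sources) in enumerate(instructions):
        # An instruction cannot issue before all of its inputs have been produced.
        slot = max((ready_at.get(src, 0) for src in sources), default=0)
        # Skip over bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        if slot == len(bundles):
            bundles.append([])
        bundles[slot].append(idx)
        ready_at[dest] = slot + 1
    return bundles


program = [
    ("r0", ["v0", "c0"]),  # 0: independent
    ("r1", ["v1", "c1"]),  # 1: independent
    ("r2", ["r0", "r1"]),  # 2: needs results of 0 and 1
    ("r3", ["v2", "c2"]),  # 3: independent
    ("r4", ["r2", "r3"]),  # 4: needs results of 2 and 3
]

paired = bundle_instructions(program, width=2)
serial = bundle_instructions(program, width=1)
```

With a width of 2 the five instructions fit into three bundles, while a width of 1 needs five; the more independent work the compiler can find, the fewer issue slots go to waste.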

If you've ever wondered how newer drivers can increase performance, improvements in the driver's real-time compiler are often the reason. The beauty of having a real-time compiler in the driver is that we don't have to wait for applications to take advantage of the hardware (for the most part) for us to get a performance boost from a brand new architecture. This is in sharp contrast to how things work in the CPU world, where it takes a number of revisions before an application is recompiled with optimizations for a brand new CPU architecture. The graphics world will continue to use real-time compilers until shader programs grow long enough that compiling at runtime is impossible, but it seems like that's going to be some time from now.

Along these lines, a good deal of the performance improvement with the new Detonator FX drivers is due to compiler optimizations for the new NV3x architecture. The next question is who makes a better compiler: ATI or NVIDIA? Compiler research and development is definitely a new stomping ground for both companies, but one thing is for sure: the more experience a driver team has with a new architecture, the better its compiler will be. In the case of ATI, they have been working on the R3xx compiler on real hardware since July of last year, whereas NVIDIA hasn't had nearly as much time with NV3x. The other factor, aside from raw experience, is which driver team is more talented. We'll leave that one alone as we're not here to review resumes...
