Of the GPU and Shading

This is my favorite part, really. After the CPU has started sending draw calls to the graphics card, the GPU can begin work on actually rendering the frame containing the input that was generated somewhere in the vicinity of 3ms to 21ms ago depending on the software (and it would be an additional 1ms to 7ms for a slower mouse). Modern, complex, games will tend push up to the long end of that spectrum, while older games (or games that aren't designed to do a lot of realistic simulation like twitch shooters) will have a lower latency.

Again, the actual latency during this stage depends greatly on the complexity of the scene and the techniques used in the game.

These days, geometry processing and vertex shading tend to be pretty fast (geometry shading is slower but less frequently used). With features like instancing and the fact that the majority of detail is introduced via the pixel shader (which is really a fragment shader, but we'll dispense with the nit picking for now). If the use of tessellation catches on after the introduction of DX11, we could see even less actual time spent on geometry as the current level of detail could be achieved with fewer triangles (or we could improve quality with the same load). This step could still take a millisecond or two with modern techniques.

When it comes to actual fragment generation from the geometry data (called rasterization), the fixed function hardware and early z / z culling techniques used make this step pretty fast (yet this can be the limiting factor in how much geometry a GPU can realistically handle per frame).

Most of our time will likely be spent processing pixel shader programs. This is the step where every pin point spot on every triangle that falls behind the area of a screen space pixel (these pin point spots are called fragments) is processed and its color determined. During this step, texture maps are filtered and applied, work is done on those textures based on things like the fragments location, the angle of the underlying triangle to the screen, and constants set for the fragment. Lighting is also part of the pixel shading process.

Lighting tends to be one of the heaviest loads in a heavily loaded portion of the pipeline. Realistic lighting can be very GPU intensive. Getting into the specifics is beyond the scope of this article, but this lighting alone can take a good handful of milliseconds for an entire frame. The rest of the pixel shading process will likely also take multiple milliseconds.

After it's all said and done, with the pixel shader as the bottleneck in modern games, we're looking at something like 6ms to 25ms. In fact, the latency of the pixel shaders can hide a lot of the processing time of other parts of the GPU. For instance, pixel shaders can start executing before all the geometry is processed (pixel shaders are kicked off as fragments start coming out of the rasterizer). The color/z hardware (render outputs, render backends or ROPs depending on what you want to call them) can start processing final pixels in the framebuffer while the pixel shader hardware is still working on the majority of the scene. The only real latency that is added by the geometry/vertex processing portion of the pipeline is the latency that happens before the first pixels begin processing (which isn't huge). The only real latency added by the ROPs is the processing time for the last batch of pixels coming out of the pixel shaders (which is usually not huge unless really complicated blending and stencil technique are used).

With the pixel shader as the bottleneck, we can expect that the entire GPU pipeline will add somewhere between 10ms and 30ms. This is if we consider that most modern games, at the resolutions people run them, produce something between 33 FPS and 100 FPS.

But wait, you might say, how can our framerate be 33 to 100 FPS if our graphics card latency is between 10ms and 30ms: don't the input and CPU time latencies add to the GPU time to lower framerate?

The answer is no. When we are talking about the total input lag, then yes we do have to add these latencies together to find out how long it has been since our input was gathered. After the GPU, we are up to something between 13ms and 58ms of input lag. But the cool thing is that human response happens in parallel to input gathering which happens in parallel to CPU time spent processing game logic and draw calls (which can happen in parallel to each other on multicore CPUs) which happens in parallel to the GPU rendering frames. There is a sequential path from input to the screen, but we can almost look at this like a heavily pipelined path where each stage operates in parallel on a different upcoming frame.

So we have the GPU rendering the previous frame while simulation and game logic are executing and input is being gathered for the next frame. In this way, the CPU can be ready to send more draw calls to the GPU as soon as the GPU is ready (provided only that we are not CPU limited).

So what happens after the frame is finished? The easy answer is a buffer swap and scanout. The subtle answer is mounds of potential input lag.

Parsing Input in Software and the CPU Limit Scanout and the Display
Comments Locked

85 Comments

View All Comments

  • Zolcos - Thursday, July 16, 2009 - link

    The article is logically inconsistent. On page 1 it states "input lag is defined as the delay between the when a user does something with an input device and when that action is reflected on the monitor" and on page two it has "Input lag starts from before we even react".
  • DerekWilson - Thursday, July 16, 2009 - link

    i'll fix that...

    "The impact of input lag is compounded by what goes on before we even react."
  • yacoub - Thursday, July 16, 2009 - link

    The input lag everyone's most concerned with is the amount the display adds, because while all the rest is consistent, displays add a variable amount depending on which one you get. The ones that add more than ~20 ms add a NOTICEABLE amount (for most people) which takes input lag to the point that it becomes frustrating.
  • DerekWilson - Thursday, July 16, 2009 - link

    Part of the point was to explain that there is a lot at the end of the chain that can significantly impact performance and it's all about the display.

    If we do consider a 100ms threshold as valid, then based on our numbers from TF2 it is clear that we would end up in the >100ms input lag range with a monitor that adds more than 20ms of lag.

    And if we can't expect a twitch shooter to come in under the mark, how is everything else going to do? Not well I would imagine.

    I did think about looking at a wide array of monitors, but I feel like that might be better suited to a more focused review of monitor performance rather than an exploration of input lag in general.
  • yacoub - Thursday, July 16, 2009 - link

    Sure but for whatever reason, all of the lag prior to the display's lag is essentially transparent because it doesn't add up to be enough to be perceptible. This would equate to your threshold.

    When using a display with little or no noticeable display lag, any FPS game will feel very responsive and without discernible latency (assuming your GPU hardware is up to the task of rendering the frames quickly enough and you're not using one of the early optical mice from a decade ago that had terrible tracking refresh rates, etc etc).

    Yet simply switching to a display with higher latency is enough to make input latency noticeable and frustrating for FPS gamers. So the key issue is finding a TN or IPS display since those panel technologies have the least input lag. Of course most panels out there are -VA based panels because they are cheaper to produce than IPS, and TN may be snappy in display response but they have a number of other downsides.

    What matters most is getting panel makers focused on IPS-based displays (or new panel technologies that significantly reduce the input lag most non-TN displays presently suffer. And hey, the more they produce and sell, the lower the production cost per unit so the better the pricing can be and the more opportunities for improved technology to be added to the IPS design.
  • ocyl - Friday, July 17, 2009 - link

    @ yacoub
    Did you read the article at all?
  • yacoub - Friday, July 17, 2009 - link

    Yes. I must not be explaining myself well, so forget it.
  • DDuckMan - Saturday, December 18, 2010 - link

    While this article was great, I'm still not sure if I am better off disabling SLI to eliminate the syncronization lag or having the higher framerates with SLI enabled in twitch games. It seems to me that with 120Hz monitors, vsync (which I need for 3D) and SLI lag would not be as important as keeping the framerate above the monitor refresh rate. I don't have the equipment to properly test, so I am looking forward the the next article.

    http://hardforum.com/showthread.php?t=1569281
  • burner1980 - Thursday, March 10, 2011 - link

    Quote: "Input lag with multiGPU systems is something we will want to explore at a later time."

    I`m still waiting patiently and looking forward to a follow up investigation. The topic of input lag is VERY important to gamers who play FPS. I do notice it in racing games, too.
    I suggest to use true 120Hz monitors in the follow up article. They of course won`t reduce input lag, but help to reduce screen tearing and thus allowing to optimize one`s settings to reduce input lag while keeping screen tearing at a low enough level.
    I´m also courious if using a 3 screen setup a la Eyefinity oder Vision Surround using two GPUs will have an impact.
  • dmnwlv - Thursday, April 28, 2011 - link

    Impressive report.

    Regarding mouse polling rate (I may have missed it out):

    1) I believe the actual mouse input into the CPU is already calculated and the end result (of that action) already registered before you get to see it on screen. It does not wait for the GPU/monitor to finish processing before determining the end result. Hence the influence of mouse response is even more substantial if we take out the whole chunk of lag times that were included in the total lag calculation here - Derek Wilson, pls correct me if I am wrong.

    Coupled with the predictive ability of human (also reported here) to react accordingly in advance from the existing state of game situation, it seems to match and explain why it is hard to imagine a few milliseconds of difference in mouse lag can have an impact to the overall gaming experience. The brain and reaction is (trying its best) interpolating and working in tandem with the CPU than the monitor.

    2) And another scenario where the user already intended to do a series of / continuous / extended action (eg, drawing a long curve line), does the response rate of the mouse play a part in drawing the most accurate curve that the person input/intended? - Maybe Derek can help on this as well.

    Thanks for the great report.

Log in

Don't have an account? Sign up now