Of the GPU and Shading

This is my favorite part, really. After the CPU has started sending draw calls to the graphics card, the GPU can begin work on actually rendering the frame containing the input that was generated somewhere in the vicinity of 3ms to 21ms ago depending on the software (and it would be an additional 1ms to 7ms for a slower mouse). Modern, complex, games will tend push up to the long end of that spectrum, while older games (or games that aren't designed to do a lot of realistic simulation like twitch shooters) will have a lower latency.

Again, the actual latency during this stage depends greatly on the complexity of the scene and the techniques used in the game.

These days, geometry processing and vertex shading tend to be pretty fast (geometry shading is slower but less frequently used). With features like instancing and the fact that the majority of detail is introduced via the pixel shader (which is really a fragment shader, but we'll dispense with the nit picking for now). If the use of tessellation catches on after the introduction of DX11, we could see even less actual time spent on geometry as the current level of detail could be achieved with fewer triangles (or we could improve quality with the same load). This step could still take a millisecond or two with modern techniques.

When it comes to actual fragment generation from the geometry data (called rasterization), the fixed function hardware and early z / z culling techniques used make this step pretty fast (yet this can be the limiting factor in how much geometry a GPU can realistically handle per frame).

Most of our time will likely be spent processing pixel shader programs. This is the step where every pin point spot on every triangle that falls behind the area of a screen space pixel (these pin point spots are called fragments) is processed and its color determined. During this step, texture maps are filtered and applied, work is done on those textures based on things like the fragments location, the angle of the underlying triangle to the screen, and constants set for the fragment. Lighting is also part of the pixel shading process.

Lighting tends to be one of the heaviest loads in a heavily loaded portion of the pipeline. Realistic lighting can be very GPU intensive. Getting into the specifics is beyond the scope of this article, but this lighting alone can take a good handful of milliseconds for an entire frame. The rest of the pixel shading process will likely also take multiple milliseconds.

After it's all said and done, with the pixel shader as the bottleneck in modern games, we're looking at something like 6ms to 25ms. In fact, the latency of the pixel shaders can hide a lot of the processing time of other parts of the GPU. For instance, pixel shaders can start executing before all the geometry is processed (pixel shaders are kicked off as fragments start coming out of the rasterizer). The color/z hardware (render outputs, render backends or ROPs depending on what you want to call them) can start processing final pixels in the framebuffer while the pixel shader hardware is still working on the majority of the scene. The only real latency that is added by the geometry/vertex processing portion of the pipeline is the latency that happens before the first pixels begin processing (which isn't huge). The only real latency added by the ROPs is the processing time for the last batch of pixels coming out of the pixel shaders (which is usually not huge unless really complicated blending and stencil technique are used).

With the pixel shader as the bottleneck, we can expect that the entire GPU pipeline will add somewhere between 10ms and 30ms. This is if we consider that most modern games, at the resolutions people run them, produce something between 33 FPS and 100 FPS.

But wait, you might say, how can our framerate be 33 to 100 FPS if our graphics card latency is between 10ms and 30ms: don't the input and CPU time latencies add to the GPU time to lower framerate?

The answer is no. When we are talking about the total input lag, then yes we do have to add these latencies together to find out how long it has been since our input was gathered. After the GPU, we are up to something between 13ms and 58ms of input lag. But the cool thing is that human response happens in parallel to input gathering which happens in parallel to CPU time spent processing game logic and draw calls (which can happen in parallel to each other on multicore CPUs) which happens in parallel to the GPU rendering frames. There is a sequential path from input to the screen, but we can almost look at this like a heavily pipelined path where each stage operates in parallel on a different upcoming frame.

So we have the GPU rendering the previous frame while simulation and game logic are executing and input is being gathered for the next frame. In this way, the CPU can be ready to send more draw calls to the GPU as soon as the GPU is ready (provided only that we are not CPU limited).

So what happens after the frame is finished? The easy answer is a buffer swap and scanout. The subtle answer is mounds of potential input lag.

Parsing Input in Software and the CPU Limit Scanout and the Display
POST A COMMENT

84 Comments

View All Comments

  • PrinceGaz - Friday, July 17, 2009 - link

    Just to add

    "For input lag reduction in the general case, we recommend disabling vsync"

    It is rather ironic that you used that phrase, when in the your previous article you were strongly stating the case for v-sync always being used (preferably with triple-buffering).

    Unless you are certain that nVidia's OpenGL implementation of triple-buffering works how you think it does (and not how most people think it does), posting articles may be unwise.
    Reply
  • DerekWilson - Sunday, July 19, 2009 - link

    In the "general case" we mean /when triple buffering is not available/ (real triple buffering is not available in the majority of games), in order to reduce input lag, vsync should be disabled.

    Here's a better break down of what we recommend:

    Above 60FPS:

    triple buffering > no vsync > vsync > flip queue (any at all)

    Always EXACTLY 60 FPS (very unlikely to be perfectly consistent naturally)

    triple buffering == vsync == flip queue (1 frame) > no vsync

    Below 60FPS (non-twitch shooter):

    triple buffering == flip queue (1 frame) > no vsync > vsync

    Below 60FPS (twitch shooter or where lag is a big issue):

    no vsync >= triple buffering == flip queue (1 frame) > vsync

    ... to us the fact that at less than 60 FPS you start on the exact same frame with either triple buffering, no vsync or with a 1 frame flip queue is enough -- you will get less than 1 frame of difference in the lag and the difference you get with no vsync is only on part of the frame anyway.

    ...

    And ...

    NVIDIA has absolutely CONFIRMED that they do OpenGL triple buffering in the way that I described it in the previous article. This confirmation is directly from their OGL driver team.

    I was just able to sit down with AMD two days ago and, while they don't do things in exactly the way I describe them, their OpenGL triple buffering does do something quite interesting at > 60FPS ... which I'll talk about in an upcoming article. They also said they are looking into what it would take to do what I was talking about -- but that they haven't yet because their workstation customers tend to care more about what happens at < 60fps (which is a case where a 1 frame flip queue == triple buffering).

    The situation with DirectX is a little more restrictive with both NVIDIA and AMD... I will also address this issue in an upcoming article.
    Reply
  • BoFox - Friday, July 17, 2009 - link

    Anandtech does yet another ground-breaking article, thanks to Derek Wilson and the use of a high-speed camera this time!

    It is rather interesting how you mention the human reaction time! To detect and process the image signal in your brain, and to react to it (usually faster if through reflex), are all part of the entire reaction time of 200-300 ms, correct? Well, we should be testing it on the professional gamers like Jonathan 'Fatal1ty' Wendel, to see how much he has tuned his reaction time in making it so reflexive that it's as low as 150ms or even lower. Why not try interviewing him and use your high-speed camera to test that on him, and on an average gamer?!? That would be a SUPER-awesome article here to read!!! :D

    About the rendering ahead, some games are designed to render ahead by as much as 3 frames, and changing that to 0 could incur a large performance hit--sometimes by as much as 50%. I have experienced this with some games a while back, cannot remember which ones exactly, and that was on a single video card (not when I was using SLI). The performance penalty would usually bring down the frame rates to below the refresh rate, thus making it very undesirable. There are some other articles cautioning against it, but as long as it does not make a difference in the game that you are playing, then great. At least, as long as it does not drop the frame rates to below the refresh rate, then great!

    Reply
  • DerekWilson - Sunday, July 19, 2009 - link

    If your framerate is above refresh rate (except for the multiGPU case) you should definitely not use anything more than 0 frames of render ahead. Triple buffering as I described in my previous article (and as NVIDIA does in OpenGL) will be a good option, but double buffered w/ vsync or no vsync will definitely be better than any sort of render ahead.

    If your performance is below refresh rate, you'll want to either use no vsync, exactly 1 frame render ahead, or triple buffering. double buffered vsync (0 flip queue/pre-rendered frames with vsync) will give you a significant drop in performance and increase in input lag when performance is below refresh. This is as you say -- by as much as 50%. But only when performance is already below refresh.
    Reply
  • nvmarino - Friday, July 17, 2009 - link

    Hey Derek, thanks for the great article. Nice job addressing many of the BS theories and comments about lag going on in the threads of your triple buffering article...

    And now I've got even more fuel for the fire of my lust for someone to release a 1920x1080 resolution display with support for 120Hz refresh rate at the INPUT and good black levels. If only I could set my PC to spit out 120Hz to my display, my 24Fps Blu-ray and 60Fps TV content would play without judder, AND my games would have zero tearing (well as long as they don't run above 120Fps) and minimal lag. Not only that, but I'd also have an ideal setup for shutter-based steroscopic 3d, and it would open the door for some of the new stuff like smoothing video via frame interpolation of 24P content to 120P currently going on in the PC videophile community. Can you say HTPC nirvana! Where's my 120Hz!!!!
    Reply
  • DerekWilson - Friday, July 17, 2009 - link

    yeah, i want a true 120hz 1920x1080 display too :-) mmmm that would be tasty. Reply
  • Belinda fox - Friday, July 17, 2009 - link

    If someones reaction times are up to 220ms and your results are around the 100ms mark surely this says something about the validity of results and conclusion.
    Sorry i never read all the article just first and last one, due to reaction times being mentioned as part of lag.
    Reply
  • BoFox - Friday, July 17, 2009 - link

    Just read the article first.

    If you had better common sense, you would figure out that the human response time of >200ms comes AFTER seeing what is on the monitor, which takes 100ms to display in the first place.

    Let's say that I appear just around the corner right in front of you, in a FPS game, like Unreal Tournament 3, for example. First, it takes like 50ms ping time for your computer to receive that information. Second, it takes your monitor 100ms to display the information. Third, it takes you 200ms to react to what you're seeing.
    Reply
  • lyeoh - Sunday, July 19, 2009 - link

    And AFTER you react to what you are seeing you press a key/button or move your mouse and the relevant portion of the input lag takes into effect (input device, game, network).

    With a 100+ms input lag handicap even on the LAN you could be "dead" before you even see anything or be able to do anything about it.

    So would be good if have some figures on wireless keyboards and mice vs wired (usb and ps2).

    I personally think wireless stuff will add a lot more lag due to the modulation/demodulation, and other "tricks" they have to do.
    Reply
  • DerekWilson - Friday, July 17, 2009 - link

    That is definitely an issue ...

    i didn't get into network play as it gets really sticky though ... the local client does do limited prediction based on last known info about other players -- if this prediction data is accurate enough at small intervals and if ping isn't bad then it isn't a huge issue ... but at the same time, servers may resolve hits (as apparent to the client) as hits even if they are misses (in reality as per what the player did) or it may not (the difference results in either increased or reduced effectiveness of techniques like bunny hopping). but it is way more complex and depends on how the client predicts things between server updates and how the server resolves differences between clients and all sorts of funky stuff.

    that's why i wanted to stick with mouse-to-display issues here ...

    but you are right that there are other factors (including human reaction time) that add up to make it so input lag can be an important slice of a whole chain of events and can affect the gaming experience.
    Reply

Log in

Don't have an account? Sign up now