Scanout and the Display

Alright. So depending on the game, we are up to somewhere between 13ms and 58ms after our mouse was moved. The GPU just finished rendering and swapped the finished frame to the front buffer. What happens next is called scanout: the frame is sent out the DVI-I port over the cable and to the monitor.

If our monitor's refresh rate is 60Hz (as is typical these days), it will actually take something like 16ms to send the full frame to the monitor (plus there's about half a millisecond of "blanking" between frames being sent) giving us 16.67ms of transmission delay. In this case we are limited by the bandwidth capabilities of DVI, HDMI and DisplayPort and the timing standards put forth by VESA. So to send a full frame of anything to the display we will have 16.67ms of input lag added. Some monitors will display this data as it is received, but others will latch input meaning the full frame must be sent before it can be displayed (but let's not get too far ahead of ourselves). Either way, we will consider the latency of this step to be at least one frame (as the monitor will still take 16ms to draw the image either way).

So now we need to talk about vsync. Let's pretend we aren't using it. Let's pretend our game runs at a rock solid exact 60 FPS and our refresh rate is 60Hz, but the buffer swap happens half way between each vertical sync. This means every frame being scanned out would be split down the middle. The top half of the frame will be an additional 16.67ms behind (for a total of 33.3ms of lag). Of course, the bottom half, while 16.67ms newer than the top, won't have it's own top half sent until the next frame 16.67ms later.

In this particular case, the way the math works out if we average the latency of all the pixels on a split frame we would get the same average latency as if we enabled vsync. Unfortunately, when framerate is either higher or lower than refresh rate, vsync has the potential to cause tons of problems and this equivalence doesn't carry in the least.

If our frametime is just longer than 16.67ms with vsync enabled, we will add a full additional frame of latency (with no work being done on the GPU) before we are able to swap the finished buffer to the front for scanout. The wasted work can cause our next frame not to come in before the next vsync, giving us up to two frames of latency (one because we wait to swap and one because of the delay in starting the next frame). If our framerate is higher than 60 FPS, our GPU will have to stop working after rendering until the next vsync. This is a waste of resources and decreases overall performance, but definitely not by as much as if we use vsync at less than the monitor refresh. The upper limit of additional delay is 16.67ms minus frametime (less than one frame) rather than two full frames.

When framerate is lower than refresh rate, using either a 1 frame flip queue with vsync or triple buffering will allow the graphics hardware to continue doing rendering work while adding between 0 and 16.67ms of additional latency (the average will be between the two extremes). So you get the potential benefits of vsync (no tearing and synchronization) without the additional decrease in performance that occurs when no work gets done on the GPU. At framerates higher than refresh rate, when using a render queue, we do end up adding an additional frame of latency per number of frames we render ahead, so this solution isn't a very good one for mitigating input latency (especially in twitch shooters) in high framerate games.

Once the data is sent to the monitor, we've got more delay in store.

We've already mentioned that some LCDs latch the entire frame before display. Beyond this delay, some displays will perform image processing on the input (including scaling if this is not done on the graphics hardware). In some cases, monitors will save two frames to overdrive LCD cells to get them to respond faster. While this can improve the speed at which the picture on the monitor changes, it can add another 16.67ms to 33.3ms of latency to the input (depending on whether one frame is processed or two). Monitors with a game mode or true 120Hz monitors should definitely add less input lag than monitors that require this sort of processing.

Add, on top of all this, the fact that it will take between 2ms and 16ms for the pixels on the LCD to actually switch (response time varies between panels and depending on what levels the transition is between) and we are done: the image is now on the screen.

So what do we have total after the image is flipped to the front buffer?

One frame of lag for transmission (to display a full frame), up to 1 frame of lag if we enable triple buffering (or 1 frame render ahead and we run at less than refresh rate), up to two frames of lag if we just turn on vsync, at framerates higher than the refresh rate we we'll add an additional frame of lag for every frame we render ahead with vsync on, and zero to 2 frames of lag for the monitor to display the image (if it does extensive image processing).

So after crazy speed from the mouse to the front buffer, here we are waiting ridiculous amounts of time to get the image to appear on the screen. We add at the very very least 16.67ms of lag in this stage. At worst we're taking on between 66.67ms and 83.3ms which is totally unacceptable. And that's after the computer is completely done working on the image.

This brings our totals up to about 33ms to 80ms input lag for typical cases. Our worst case for what we've outlined, however, is about 135ms of latency between mouse movement and final display which could be discernible and might start to feel mushy. Sometimes game developers stray a bit and incur a little more input lag than is reasonable. Oblivion and Fallout 3 come to mind.

But don't worry, we'll take a look at some specific cases next.

Of the GPU and Shading Realworld Testing w/ High Speed Video
POST A COMMENT

84 Comments

View All Comments

  • psilencer - Tuesday, August 18, 2009 - link

    First time poster, so be gentle!

    For each of the cases you analyze the bandwidth and take the lag to be the inverse of the bandwidth. This is incorrect. Lag and bandwidth not related as such. Consider a road with a constant speed limit. Lag would be related to the length of the road (the time it takes for some signal starting at A to reach it's destination B). Bandwidth is related to number of lanes (how many signals you can send from A to B within some time). Although there is some relationship between the two, it is not the inverse.

    With this in mind, everything analyzed by this article is incorrect.

    Consider a mouse that has 500 reports/second. Taking the inverse gives 2ms, which is the average time between completed reports. However, you don't consider that multiple "reports" may be pipelined in the mouse. Say for example, your mouse has a camera, some simple processing logic to decipher the data from the camera, and then the usb interface. For simplicity, assume that these units process one and only report at a time (and bandwidth/latency would have the inverse relationship). In that case, each section works at 500 reports/second, and would have a latency of 2ms. However the total latency of the mouse would be at 6ms, since each report needs to go through each section.


    This also applies to the CPU and GPU.

    Sorry, if I'm completely wrong, just ignore this =P

    Reply
  • siberx - Thursday, July 30, 2009 - link

    Fantastic article - I smile each time AnandTech posts one of these groundbreaking articles that just cuts straight through the BS and gets to the truth behind issues that have been muddled in hearsay and rumours for years.

    I am personally particularly sensitive to input lag, and with my current LCD even in a fast game like TF2 or UT I find the lag intolerable if vsync is enabled - I have to run with it disabled in just about any game demanding fast response.

    My question, however, is the effect that multi-gpu solutions have on input lag. I have never seen something describing exactly how both ATI and nVidia's multi-gpu solutions affect lag, as well as how different multi-gpu rendering modes (AFR, SFR, etc...) affect lag. I would assume that using a multi-gpu solution would, in most cases incur at least an extra frame of delay to mix or move frames between cards, etc... but an actual analysis of this would be very useful. It may, in fact, be worthwhile to disable multi-gpu when running an older twitch game to improve latency...

    Additionally, testing with a couple other LCDs to see how they compare latency-wise would be interesting - I get the feeling your Dell panel is a fair step faster than your standard-issue modern panel doing overdriving to reduce switching times...
    Reply
  • race2 - Saturday, August 01, 2009 - link

    When you say that all non-Nvidia driver Triple Buffering for OpenGL programs are simply one frame flip queues, do you mean that D3DOverrider's forced Triple Buffering is a one frame flip queue as well? Reply
  • race2 - Saturday, August 01, 2009 - link

    Sorry, first time posting here. Previous comment was not meant to be a reply. Reply
  • arcsign - Sunday, July 26, 2009 - link

    It's nice to know that the whole input lag issue is finally getting some attention. I've been trying to find ways to improve it, without buying new hardware, for a little while now, and came across some options that might be of interest for future articles. (I don't have access to much in terms of equipment to measure these things, so my testing hasn't been so much empirical as it has "well, that seems a bit better... maybe.")

    -- The two that stick out in my mind as far as software options go are (at least for WinXP) the boot.ini options "/INTAFFINITY," and "/TIMERES= xxxxx." The former assigns all interrupts to the highest numbered core, and the latter changes the resolution of the Windows kernel timer.

    -- It would also be interesting to see what sort of effects overclocking might have on various latencies, as I've noticed that Windows doesn't always agree with the BIOS/CPU-Z as to the processor's speed, and in cases where a game uses Windows Performance Counters to calculate time deltas for networking/inputs/etc, if there are any counters that depend on an accurate cpu speed, this could present a problem. (Although this isn't directly related to input lag, it is related to the interaction between the game and the player...)

    -- AHCI multimedia timers versus TSC's (more of an issue in XP than more recent OS's, as I believe Vista and 7 both require the use of the AHCI timers) may also have a significant effect on gameplay.

    Anyways, nice article, and keep up the good work.
    Reply
  • William Gaatjes - Saturday, July 25, 2009 - link

    Hello, you might find something interesting on the website of Avago .

    Avago technologies manufactures optical mouse chips.
    Another manufacturer is SGS thomson or st electronics.

    Here is a link to avago chips.

    http://www.avagotech.com/pages/en/navigation_inter...">http://www.avagotech.com/pages/en/navig.../navigat...

    You might find some information you seek there.




    I noticed you where writing about 3 keynumbers but you mention 4 on the page : "Reflexes and Input Generation".
    Reply
  • William Gaatjes - Saturday, July 25, 2009 - link

    And a very nice article i forgot to add.

    Reply
  • camylarde - Tuesday, July 21, 2009 - link

    Now all that remains is to incorporate a multiplayer fps game and dissect how network comunication affects it, and how that knowledge can be used to clearly select wallhackers and aimbotters from the regular pack, just by watching a demo of them, and doing basic math counts of their reported network lag. Reply
  • DerekWilson - Monday, July 20, 2009 - link

    This is something we would love to do, and while it is on the table we may not have the time in the near term to get something like that up right now.

    But trust me, we've been thinking of many cool ways to use high speed footage :-)
    Reply
  • JimboMahoney - Monday, July 20, 2009 - link

    I also found Fallout 3 extremely laggy until I edited the Fallout.ini file from this

    iPresentInterval=1

    to this:

    iPresentInterval=0

    (Thanks to TweakGuides.com for this tip).

    It seems that Fallout 3 has VSync enabled at all times, even if you disable it in the menu, unless you make this change. The game was pretty unpleasant to play before I did this (I never use VSync).
    Reply

Log in

Don't have an account? Sign up now