Parsing Input in Software and the CPU Limit

Before we get into software, for the sake of sanity, we are going to ignore context switching. We'll pretend that only the operating system kernel and the game are running, and that each always gets processor time exactly when it needs it, for as long as it's needed (and that they never need it at the same time). In real-life desktop operating systems, especially on single-core processors, there will be added delay from the operating system scheduling our game alongside other tasks and OS background work. These delays (in extreme cases called starvation) can range from a handful of nanoseconds to the microsecond level on modern systems, depending on process prioritization, what else is happening, and how the scheduler is implemented.

Once the mouse has sent its report over USB to the PC and the USB root hub receives the data, it is up to the OS (for our purposes, MS Windows) to handle the data next. Our report travels from the USB root hub over the system bus (south bridge through the north bridge to the CPU, which takes some nanoseconds, more or less depending on load), is put on an input stack (in this case the HID (Human Interface Device) stack), and a Windows OS message (WM_INPUT) is generated to let any user space software monitoring raw mouse input know that new data has arrived. Software written to take full advantage of the hardware will handle the WM_INPUT message by reading the appropriate data directly from the HID stack as soon as it is told that data is waiting.

This particular part of the process (checking Windows messages and handling the WM_INPUT message) happens pretty fast and should be on the order of microseconds at worst. This is a hard delay to track down, as the real time it takes depends on what the programmer actually does. Latencies here are not guaranteed by either the motherboard chipset or Windows.

Once the software has the data (after at least 1ms plus some microseconds), it needs to do something with it. This is hugely variable, as developers can choose to process input at any of a number of points while updating the game state for the next frame. The approach that makes the most sense to me would be to run the AI based on the previous input data, step through any scripted actions, update physics per object based on the last state and the AI decisions, and only then get user data and update player state/physics based on the previous state and the current input.
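As a rough illustration of that ordering, here is a minimal Python sketch. Everything in it (the `GameState` fields, `update_frame`, and the `read_input` callback) is hypothetical and stands in for real engine code; it only shows input being sampled after AI, scripts, and physics have run, so the player update works from the freshest data available.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical, heavily simplified game state for illustration only."""
    frame: int = 0
    player_x: float = 0.0
    last_input_dx: float = 0.0  # input seen on the previous frame
    log: list = field(default_factory=list)  # records the order of the steps


def update_frame(state, read_input):
    """Advance one frame, sampling user input as late as possible."""
    # 1. AI reacts to what the player did on the previous frame.
    state.log.append(("ai", state.last_input_dx))
    # 2. Scripted actions, then per-object physics (placeholders here).
    state.log.append(("scripts", None))
    state.log.append(("physics", None))
    # 3. Only now read the newest input.
    dx = read_input()
    state.log.append(("input", dx))
    # 4. Update the player from the previous state plus the current input.
    state.player_x += dx
    state.last_input_dx = dx
    state.frame += 1
    return state
```

Calling `update_frame` once per frame with a function that returns the latest mouse delta keeps the gap between reading input and using it as short as this simple model allows. A real engine may have good reasons to read input earlier.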

There are cases or design decisions that may require getting user input before doing some of these other tasks, so the way I would want to do it might not be practical. This whole part of the pipeline can be quite long, as highly intelligent AI and immersive physics (along with other game scripting and state updates) can require massive amounts of work. At the least we have lots of sorting, branching, and necessarily serial computations to worry about.

Depending on when input is collected and the depth and breadth of the simulation, we could see input lag increase by up to several milliseconds. This is highly game dependent, and it isn't something the end user has any control over outside of getting the fastest possible CPU (and even that likely won't change things in a perceptible way, as there are memory and system latencies to consider and the GPU is largely the bottleneck in modern games). Some games are designed to be highly responsive and some games are designed to be highly accurate. While always having both cranked up to 11 would be great, there are trade-offs to be made.

Unfortunately, that leaves us with a highly variable situation. The only way to really determine the input lag caused by game code itself is to profile the code (which requires access to the source to be done right) or ask a developer. But knowing the specifics isn't as necessary as knowing that there's not much the gamer can do to mitigate this issue. For the purposes of this article, we will consider game logic to typically add somewhere between 1ms and 10ms of input lag in modern games. This takes into account things like decoupling simulation and AI threads from rendering and having work done in parallel. If everything were done linearly, things would very likely take longer.

When we've got our game state updated, we then set up graphics for rendering. This involves using our game state to update geometry and display lists on the CPU side before the GPU can start work on the next frame. The speed of this step is again dependent on the implementation and can take up a good bit of time, depending on the complexity of the scene and the number of triangles required. Again, while this is highly dependent on the game and what's going on, we can typically expect something between 1ms and 10ms for this part of the process as well if we include the time it takes to upload geometry and other data to the GPU.

Now, all the issues we've covered on this page go into making up a key element of game performance: CPU time. The total latency from front to back in this stage of a game engine creates a CPU limit on performance. When what comes after this (rendering on the GPU) takes less time than everything up to this point, we have hit the CPU limit. We can typically see the CPU limit when we drop resolution down to something ridiculously low on a high-end card without seeing any real performance gain between that and the next-highest resolution.
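One way to put that observation into practice is to compare the frame rate at the lowest resolution with the frame rate one step up. The Python sketch below does this; the function name, the 3% tolerance, and the sample numbers in the usage note are assumptions chosen for illustration, not measurements.

```python
def looks_cpu_limited(fps_low_res, fps_next_res, tolerance=0.03):
    """Return True when dropping resolution gained almost nothing.

    fps_low_res:  frame rate at the lowest tested resolution
    fps_next_res: frame rate at the next-highest resolution
    tolerance:    fractional gain below which we call it "no real gain"
                  (0.03 is an arbitrary illustrative threshold)
    """
    if fps_low_res <= 0 or fps_next_res <= 0:
        raise ValueError("frame rates must be positive")
    gain = (fps_low_res - fps_next_res) / fps_next_res
    return gain < tolerance
```

With made-up numbers, `looks_cpu_limited(121.0, 120.0)` reports a CPU limit because the gain is under 1%, while `looks_cpu_limited(150.0, 120.0)` does not, since a 25% gain suggests the GPU was still the bottleneck at the higher resolution.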

From the examples I've given here, if both the game logic and the graphics/geometry setup come in at the minimum latencies I've suggested should be typical, we could be CPU limited at as much as 500 frames per second. On the flip side, if both portions of this process push up to the 10ms level, we would never see a frame rate over 50 FPS no matter how fast the GPU rendered anything.
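The arithmetic behind those two figures can be checked in a few lines of Python. The function name is ours, and the model assumes game logic and graphics setup run back to back on the CPU with nothing overlapped, which is a simplification.

```python
def cpu_limited_fps(game_logic_ms, graphics_setup_ms):
    """Highest frame rate the CPU side allows, given per-frame costs in ms.

    Assumes the two stages run serially, so their times simply add.
    """
    cpu_frame_ms = game_logic_ms + graphics_setup_ms
    if cpu_frame_ms <= 0:
        raise ValueError("per-frame CPU time must be positive")
    return 1000.0 / cpu_frame_ms


best_case = cpu_limited_fps(1, 1)     # 2 ms of CPU work per frame
worst_case = cpu_limited_fps(10, 10)  # 20 ms of CPU work per frame
print(best_case, worst_case)          # prints "500.0 50.0"
```

Any mix in between falls out the same way: 1ms of logic plus 10ms of setup, for example, is 11ms per frame, or roughly 91 FPS.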

Obviously there is variability in games, and sometimes we see a CPU limit at less than 60 FPS even at the lowest resolution on the highest end hardware. Likewise, we can see frame rates hit over 2000 FPS when drawing a static image (where game logic and display lists don't need to be updated) with a menu in front of it (like when a user hits escape in Oblivion with vsync off). And, again, multi-threaded software design on multi-core CPUs really muddies up the situation. But this is near enough to illustrate the point.

And now it's on to the portion of realtime 3D graphics that typically incurs the most input lag before we leave the computer: the graphics hardware.


85 Comments

  • psilencer - Tuesday, August 18, 2009 - link

    First time poster, so be gentle!

    For each of the cases you analyze the bandwidth and take the lag to be the inverse of the bandwidth. This is incorrect. Lag and bandwidth are not related as such. Consider a road with a constant speed limit. Lag would be related to the length of the road (the time it takes for some signal starting at A to reach its destination B). Bandwidth is related to the number of lanes (how many signals you can send from A to B within some time). Although there is some relationship between the two, it is not the inverse.

    With this in mind, everything analyzed by this article is incorrect.

    Consider a mouse that has 500 reports/second. Taking the inverse gives 2ms, which is the average time between completed reports. However, you don't consider that multiple "reports" may be pipelined in the mouse. Say, for example, your mouse has a camera, some simple processing logic to decipher the data from the camera, and then the USB interface. For simplicity, assume that each of these units processes one and only one report at a time (so that within a unit, bandwidth and latency do have the inverse relationship). In that case, each section works at 500 reports/second and has a latency of 2ms. However, the total latency of the mouse would be 6ms, since each report needs to go through each section.

    This also applies to the CPU and GPU.

    Sorry, if I'm completely wrong, just ignore this =P

  • siberx - Thursday, July 30, 2009 - link

    Fantastic article - I smile each time AnandTech posts one of these groundbreaking articles that just cuts straight through the BS and gets to the truth behind issues that have been muddled in hearsay and rumours for years.

    I am personally particularly sensitive to input lag, and with my current LCD even in a fast game like TF2 or UT I find the lag intolerable if vsync is enabled - I have to run with it disabled in just about any game demanding fast response.

    My question, however, is the effect that multi-GPU solutions have on input lag. I have never seen anything describing exactly how ATI's and nVidia's multi-GPU solutions affect lag, or how the different multi-GPU rendering modes (AFR, SFR, etc...) affect it. I would assume that using a multi-GPU solution would, in most cases, incur at least an extra frame of delay to mix or move frames between cards, etc... but an actual analysis of this would be very useful. It may, in fact, be worthwhile to disable multi-GPU when running an older twitch game to improve latency...

    Additionally, testing with a couple other LCDs to see how they compare latency-wise would be interesting - I get the feeling your Dell panel is a fair step faster than your standard-issue modern panel doing overdriving to reduce switching times...
  • race2 - Saturday, August 1, 2009 - link

    When you say that all non-Nvidia driver Triple Buffering for OpenGL programs are simply one frame flip queues, do you mean that D3DOverrider's forced Triple Buffering is a one frame flip queue as well?
  • race2 - Saturday, August 1, 2009 - link

    Sorry, first time posting here. Previous comment was not meant to be a reply.
  • arcsign - Sunday, July 26, 2009 - link

    It's nice to know that the whole input lag issue is finally getting some attention. I've been trying to find ways to improve it, without buying new hardware, for a little while now, and came across some options that might be of interest for future articles. (I don't have access to much in terms of equipment to measure these things, so my testing hasn't been so much empirical as it has "well, that seems a bit better... maybe.")

    -- The two that stick out in my mind as far as software options go are (at least for WinXP) the boot.ini options "/INTAFFINITY," and "/TIMERES= xxxxx." The former assigns all interrupts to the highest numbered core, and the latter changes the resolution of the Windows kernel timer.

    -- It would also be interesting to see what sort of effects overclocking might have on various latencies, as I've noticed that Windows doesn't always agree with the BIOS/CPU-Z as to the processor's speed, and in cases where a game uses Windows Performance Counters to calculate time deltas for networking/inputs/etc, if there are any counters that depend on an accurate cpu speed, this could present a problem. (Although this isn't directly related to input lag, it is related to the interaction between the game and the player...)

    -- AHCI multimedia timers versus TSC's (more of an issue in XP than more recent OS's, as I believe Vista and 7 both require the use of the AHCI timers) may also have a significant effect on gameplay.

    Anyways, nice article, and keep up the good work.
  • William Gaatjes - Saturday, July 25, 2009 - link

    Hello, you might find something interesting on the website of Avago.

    Avago Technologies manufactures optical mouse chips.
    Another manufacturer is SGS-Thomson, or ST Electronics.

    Here is a link to avago chips.

    http://www.avagotech.com/pages/en/navigation_inter...

    You might find some information you seek there.

    I noticed you were writing about 3 key numbers, but you mention 4 on the page "Reflexes and Input Generation".
  • William Gaatjes - Saturday, July 25, 2009 - link

    And a very nice article, I forgot to add.

  • camylarde - Tuesday, July 21, 2009 - link

    Now all that remains is to incorporate a multiplayer FPS game and dissect how network communication affects it, and how that knowledge can be used to clearly separate wallhackers and aimbotters from the regular pack, just by watching a demo of them and doing some basic math on their reported network lag.
  • DerekWilson - Monday, July 20, 2009 - link

    This is something we would love to do, and while it is on the table we may not have the time in the near term to get something like that up right now.

    But trust me, we've been thinking of many cool ways to use high speed footage :-)
  • JimboMahoney - Monday, July 20, 2009 - link

    I also found Fallout 3 extremely laggy until I edited the Fallout.ini file from this

    iPresentInterval=1

    to this:

    iPresentInterval=0

    (Thanks to TweakGuides.com for this tip).

    It seems that Fallout 3 has VSync enabled at all times, even if you disable it in the menu, unless you make this change. The game was pretty unpleasant to play before I did this (I never use VSync).
