Exploring Input Lag Inside and Out

Name: Exploring Input Lag Inside and Out
Item: Exploring Input Lag Inside and Out
Author: Derek Wilson

by Derek Wilson on July 16, 2009 12:45 PM EST

Posted in
GPUs

85 Comments | Add A Comment

85 Comments

Parsing Input in Software and the CPU Limit

Before we get into software, for the sake of sanity, we are going to ignore context switching and we'll pretend that only the operating system kernel and the game are running and always get processor time exactly when they need it for as long as its needed (and never need it at the same time). In real life desktop operating systems, especially on single core processors, there will be added delay due to process scheduling between our game and other tasks (which is handled by the operating system) and OS background tasks. These delays (in extreme cases called starvation) can be somewhere between a handful of nanoseconds or on the microsecond level on modern systems depending on process prioritization, what else is happening, and how the scheduler is implemented.

Once the mouse has sent its report over USB to the PC and the USB root hub receives the data, it is up to the OS (for our purposes, MS Windows) to handle the data next. Our report travels from the USB root hub over the system bus (southbridge through the north bridge to the CPU takes +/- some nanoseconds depending on load), is put on an input stack (in this case the HID (Human Interface Device) stack), and a Windows OS message (WM_INPUT) is generated to let any user space software monitoring raw mouse input know that new data has arrived. Software written to take full advantage of hardware will handle the WM_INPUT message by reading the appropriate data directly from the HID stack after it gets the message that data is waiting.

This particular part of the process (checking windows messages and handling the WM_INPUT message) happens pretty fast and should be on the order of microseconds at worst. This is a hard delay to track down, as the real time this takes is dependent on what the programmer actually does. Latencies here are not guaranteed by either the motherboard chipset or Windows.

Once the software has the data (after at least 1ms and some microseconds in change), it needs to do something with it. This is hugely variable, as developers can choose to implement doing something with input at any of a number of points in the process of updating the game state for the next frame. The thing that makes the most sense to me would be to run your AI based on the previous input data, step through any scripted actions, update physics per object based on last state and AI decisions, then get user data and update player state/physics based on previous state and current input.

There are cases or design decisions that may require getting user input before doing some of these other tasks, so the way I would want to do it might not be practical. This whole part of the pipeline can be quite long as highly intelligent AI and immersive physics (along with other game scripting and state updates) can require massive amounts of work. At the least we have lots of sorting, branching, and necessarily serial computations to worry with.

Depending on when input is collected and the depth and breadth of the simulation, we could see input lag increase up to several milliseconds. This is highly game dependent, but it isn't something the end user has any control over outside of getting the fastest possible CPU (and this still won't likely change things in a perceivable way as there are memory and system latencies to consider and the GPU is largely the bottleneck in modern games). Some games are designed to be highly responsive and some games are designed to be highly accurate. While always having both cranked up to 11 would be great, there are trade offs to be made.

Unfortunately, that leaves us with a highly variable situation. The only way to really determine the input lag caused by game code itself is profile the code (which requires access to the source to be done right) or ask a developer. But knowing the specifics aren't as necessary as knowing that there's not much that can be done by the gamer to mitigate this issue. For the purposes of this article, we will consider game logic to typically add somewhere between 1ms and 10ms of input lag in modern games. This considers things like decoupling simulation and AI threads from rendering and having work done in parallel among other things. If everything were done linearly things would very likely take longer.

When we've got our game state updated, we then setup graphics for rendering. This will involve using our game state to update geometry and display lists on the CPU side before the GPU can start work on the next frame. The speed of this step is again dependent on the implementation and can take up a good bit of time. This will be dependent on the complexity of the scene and the number of triangles required. Again, while this is highly dependent on the game and what's going on, we can typically expect something between 1ms and 10ms for this part of the process as well if we include the time it takes to upload geometry and other data to the GPU.

Now, all the issues we've covered on this page go into making up a key element of game performance: CPU time. The total latency from front to back in this stage of a game engine creates a CPU limit on performance. When what comes after this (rendering on the GPU) takes less time than everything up to this point, we have hit the CPU limit. We can typically see the CPU limit when we drop resolution down to something ridiculously low on a high end card without seeing any real performance gain between that and the next highest resolution.

From the examples I've given here, if both the game logic and the graphics/geometry setup come in at the minimum latencies I've suggested should be typical, we could be CPU limited at as much as 500 frames per second. On the flip side, if both portions of this process push up to the 10ms level, we would never see a frame rate over 50 FPS no matter how fast the GPU rendered anything.

Obviously there is variability in games, and sometimes we see a CPU limit at less than 60 FPS even at the lowest resolution on the highest end hardware. Likewise, we can see framerates hit over 2000 FPS when drawing a static image (where game logic and display lists don't need to be updated) with a menu in front of it (like when a user hits escape in Oblivion with vsync off). And, again, multi-threaded software design on multi-core CPUs really middies up the situation. But this is near enough to illustrate the point.

And now it's on to the portion of realtime 3D graphics that typically incurs the most input lag before we leave the computer: the graphics hardware.

Reflexes and Input Generation Of the GPU and Shading

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

85 Comments

View All Comments

PrinceGaz - Friday, July 17, 2009 - link
Just to add

"For input lag reduction in the general case, we recommend disabling vsync"

It is rather ironic that you used that phrase, when in the your previous article you were strongly stating the case for v-sync always being used (preferably with triple-buffering).

Unless you are certain that nVidia's OpenGL implementation of triple-buffering works how you think it does (and not how most people think it does), posting articles may be unwise.
DerekWilson - Sunday, July 19, 2009 - link
In the "general case" we mean /when triple buffering is not available/ (real triple buffering is not available in the majority of games), in order to reduce input lag, vsync should be disabled.

Here's a better break down of what we recommend:

Above 60FPS:

triple buffering > no vsync > vsync > flip queue (any at all)

Always EXACTLY 60 FPS (very unlikely to be perfectly consistent naturally)

triple buffering == vsync == flip queue (1 frame) > no vsync

Below 60FPS (non-twitch shooter):

triple buffering == flip queue (1 frame) > no vsync > vsync

Below 60FPS (twitch shooter or where lag is a big issue):

no vsync >= triple buffering == flip queue (1 frame) > vsync

... to us the fact that at less than 60 FPS you start on the exact same frame with either triple buffering, no vsync or with a 1 frame flip queue is enough -- you will get less than 1 frame of difference in the lag and the difference you get with no vsync is only on part of the frame anyway.

...

And ...

NVIDIA has absolutely CONFIRMED that they do OpenGL triple buffering in the way that I described it in the previous article. This confirmation is directly from their OGL driver team.

I was just able to sit down with AMD two days ago and, while they don't do things in exactly the way I describe them, their OpenGL triple buffering does do something quite interesting at > 60FPS ... which I'll talk about in an upcoming article. They also said they are looking into what it would take to do what I was talking about -- but that they haven't yet because their workstation customers tend to care more about what happens at < 60fps (which is a case where a 1 frame flip queue == triple buffering).

The situation with DirectX is a little more restrictive with both NVIDIA and AMD... I will also address this issue in an upcoming article.
BoFox - Friday, July 17, 2009 - link
Anandtech does yet another ground-breaking article, thanks to Derek Wilson and the use of a high-speed camera this time!

It is rather interesting how you mention the human reaction time! To detect and process the image signal in your brain, and to react to it (usually faster if through reflex), are all part of the entire reaction time of 200-300 ms, correct? Well, we should be testing it on the professional gamers like Jonathan 'Fatal1ty' Wendel, to see how much he has tuned his reaction time in making it so reflexive that it's as low as 150ms or even lower. Why not try interviewing him and use your high-speed camera to test that on him, and on an average gamer?!? That would be a SUPER-awesome article here to read!!! :D

About the rendering ahead, some games are designed to render ahead by as much as 3 frames, and changing that to 0 could incur a large performance hit--sometimes by as much as 50%. I have experienced this with some games a while back, cannot remember which ones exactly, and that was on a single video card (not when I was using SLI). The performance penalty would usually bring down the frame rates to below the refresh rate, thus making it very undesirable. There are some other articles cautioning against it, but as long as it does not make a difference in the game that you are playing, then great. At least, as long as it does not drop the frame rates to below the refresh rate, then great!
DerekWilson - Sunday, July 19, 2009 - link
If your framerate is above refresh rate (except for the multiGPU case) you should definitely not use anything more than 0 frames of render ahead. Triple buffering as I described in my previous article (and as NVIDIA does in OpenGL) will be a good option, but double buffered w/ vsync or no vsync will definitely be better than any sort of render ahead.

If your performance is below refresh rate, you'll want to either use no vsync, exactly 1 frame render ahead, or triple buffering. double buffered vsync (0 flip queue/pre-rendered frames with vsync) will give you a significant drop in performance and increase in input lag when performance is below refresh. This is as you say -- by as much as 50%. But only when performance is already below refresh.
nvmarino - Friday, July 17, 2009 - link
Hey Derek, thanks for the great article. Nice job addressing many of the BS theories and comments about lag going on in the threads of your triple buffering article...

And now I've got even more fuel for the fire of my lust for someone to release a 1920x1080 resolution display with support for 120Hz refresh rate at the INPUT and good black levels. If only I could set my PC to spit out 120Hz to my display, my 24Fps Blu-ray and 60Fps TV content would play without judder, AND my games would have zero tearing (well as long as they don't run above 120Fps) and minimal lag. Not only that, but I'd also have an ideal setup for shutter-based steroscopic 3d, and it would open the door for some of the new stuff like smoothing video via frame interpolation of 24P content to 120P currently going on in the PC videophile community. Can you say HTPC nirvana! Where's my 120Hz!!!!
DerekWilson - Friday, July 17, 2009 - link
yeah, i want a true 120hz 1920x1080 display too :-) mmmm that would be tasty.
Belinda fox - Friday, July 17, 2009 - link
If someones reaction times are up to 220ms and your results are around the 100ms mark surely this says something about the validity of results and conclusion.
Sorry i never read all the article just first and last one, due to reaction times being mentioned as part of lag.
BoFox - Friday, July 17, 2009 - link
Just read the article first.

If you had better common sense, you would figure out that the human response time of >200ms comes AFTER seeing what is on the monitor, which takes 100ms to display in the first place.

Let's say that I appear just around the corner right in front of you, in a FPS game, like Unreal Tournament 3, for example. First, it takes like 50ms ping time for your computer to receive that information. Second, it takes your monitor 100ms to display the information. Third, it takes you 200ms to react to what you're seeing.
lyeoh - Sunday, July 19, 2009 - link
And AFTER you react to what you are seeing you press a key/button or move your mouse and the relevant portion of the input lag takes into effect (input device, game, network).

With a 100+ms input lag handicap even on the LAN you could be "dead" before you even see anything or be able to do anything about it.

So would be good if have some figures on wireless keyboards and mice vs wired (usb and ps2).

I personally think wireless stuff will add a lot more lag due to the modulation/demodulation, and other "tricks" they have to do.
DerekWilson - Friday, July 17, 2009 - link
That is definitely an issue ...

i didn't get into network play as it gets really sticky though ... the local client does do limited prediction based on last known info about other players -- if this prediction data is accurate enough at small intervals and if ping isn't bad then it isn't a huge issue ... but at the same time, servers may resolve hits (as apparent to the client) as hits even if they are misses (in reality as per what the player did) or it may not (the difference results in either increased or reduced effectiveness of techniques like bunny hopping). but it is way more complex and depends on how the client predicts things between server updates and how the server resolves differences between clients and all sorts of funky stuff.

that's why i wanted to stick with mouse-to-display issues here ...

but you are right that there are other factors (including human reaction time) that add up to make it so input lag can be an important slice of a whole chain of events and can affect the gaming experience.

Exploring Input Lag Inside and Out

Parsing Input in Software and the CPU Limit

Post Your Comment

85 Comments

View All Comments

PrinceGaz - Friday, July 17, 2009 - link

DerekWilson - Sunday, July 19, 2009 - link

BoFox - Friday, July 17, 2009 - link

DerekWilson - Sunday, July 19, 2009 - link

nvmarino - Friday, July 17, 2009 - link

DerekWilson - Friday, July 17, 2009 - link

Belinda fox - Friday, July 17, 2009 - link

BoFox - Friday, July 17, 2009 - link

lyeoh - Sunday, July 19, 2009 - link

DerekWilson - Friday, July 17, 2009 - link

Log in

Don't have an account? Sign up now