Real-World Testing with High-Speed Video

Inspired by Mythbusters, we wanted to use a high speed camera to really measure what's happening with millisecond resolution. We were disappointed when we first looked into this, as "real" high speed cameras cost in excess of $10,000. But then we stumbled upon the Casio Exilim EX-F1 and its horrific quality but hugely fast video capability (actually, quality isn't that bad in VERY high light situations). At 1200 frames per second, we get video output at a resolution of 336x96, which is freaking tiny. But it's enough. All we need to do is count the frames between two events and multiply by 0.833 (the number of milliseconds per frame) and we can assess the duration of incredibly short events.
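For reference, the conversion is trivial. Here is a quick sketch of the arithmetic in Python (the frames_to_ms helper is just for illustration; the 45-frame count in the example is the figure from our TF2 no-vsync test below):

```python
# Each frame from the 1200 fps camera represents 1/1200 of a second.
CAMERA_FPS = 1200
MS_PER_FRAME = 1000.0 / CAMERA_FPS  # ~0.833 ms per captured frame

def frames_to_ms(frame_count):
    """Duration (in ms) of an event spanning frame_count camera frames."""
    return frame_count * MS_PER_FRAME

# Example: 45 frames between the mouse click and the result on screen
print(round(frames_to_ms(45), 1))  # 37.5 ms
```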

First up, we're looking at response time for the Dell 3007WFP display. Rather than relying on the manufacturer-reported response time (which looks at a limited number of cases and likely doesn't include worst-case performance), we're using our camera to watch a few frames go by and observe how long it takes for pixels to transition from one color to the next. As we can see in the video, it takes nearly a full frame (~16ms) for some colors to change (pay attention to the yellow and black stripes with the orange lettering at the bottom of the screen). We timescaled this video down another 10x over the already very slow 2.5% realtime speed, to 0.25% realtime, to help show what's going on.

Getting to the actual game testing, we wanted to look at two titles: one a twitch shooter, and another with notoriously bad input lag. We chose Team Fortress 2 for our twitch shooter and Fallout 3 for our laggy game. We also did further testing on a CRT with TF2 just to see how low we could actually get input latency (with some pretty impressive results). Our test methodology was to set up the camera to capture both input generation (either a mouse click or move) and the display. The resolution of our camera was such that we had to really work to cram everything in the frame, but we got enough to be useful. We ran multiple tests and counted frames between when the hand hit the mouse and when we could see the result on the monitor.

For Team Fortress 2 we looked at three scenarios: no vsync, vsync enabled, and vsync enabled with our flip queue (render ahead / pre-rendered frames) set to zero. Our frametime at 2560x1600 was typically 9ms, give or take a small amount (we never dropped below 100 FPS, let alone below the refresh rate), with our GTX 280 remaining the performance bottleneck (CPU time was still significantly less than frametime).

First we'll look at the case with no vsync. Watch from when the finger stops moving to when the shotgun blast appears on the wall (Valve makes sure to calculate the hit before it even starts the gun firing animation; sometimes the hit and gun fire happen in the same frame, sometimes they are one frame apart). This certainly isn't frame by frame, but you should be able to download the clip from YouTube and step through it yourself if you are so inclined.

This test took about 45 frames: between 37ms and 38ms from input generation to display. This is very good considering what we predicted as a best case. Average over multiple runs was a little higher, resulting in 51ms of typical input lag plus or minus about 12ms (our maximum being 63ms). This fluctuation is due to how all the factors we talked about either line up or don't.

When we turn on vsync, we see a lot more delay.

We can again see how the hit registers before the gunfire animation, which is key in any twitch shooter. The lag is significantly longer with vsync enabled: this example shows about 94 or 95 frames (~79ms) of input lag, which was our lowest input lag time in this test. Our average was about 89ms, ranging up to 96ms at maximum. In this case, we lose the input latency advantage of no vsync, but we gain more consistent input lag times and no tearing.

But we also decided to check and make sure we weren't getting stuck with any penalties due to flip queueing, as the average latency increase seemed a little high. So we set maximum pre-rendered frames to zero in the NVIDIA control panel (formerly maximum render ahead) to see what happened. When your framerate is always higher than your refresh rate, you never want any flip queueing going on (unless you are using a multi-GPU configuration). So let's check it out.

And we see 70ms as our new best. Our worst case is 85ms, which is better than our previous average. Average latency here is 76ms +/- about 6ms (with the one 85ms exception). From this data, it seems as though Valve uses a one-frame flip queue (one frame of render ahead) when vsync is enabled unless it is forced off in the driver. When the game runs at framerates below the refresh rate this is fine, but when framerate is always higher than refresh rate it will absolutely incur a latency penalty along the lines of what we are seeing here.
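To put rough numbers behind that conclusion, here is a deliberately simplified sketch of the render-to-display portion of the pipeline under vsync (it ignores mouse, CPU, and display processing time, and the one-frame flip queue is our inference from the data, not something Valve has confirmed):

```python
# With vsync on, a finished frame waits for the next refresh, and every frame
# already sitting in the flip queue adds one more full refresh period.
REFRESH_MS = 1000.0 / 60  # 60Hz display

def render_to_display_ms(frame_time_ms, flip_queue_depth):
    wait_for_vsync = REFRESH_MS - (frame_time_ms % REFRESH_MS)  # time left until the next refresh
    queued = flip_queue_depth * REFRESH_MS                      # frames queued ahead of ours
    return frame_time_ms + wait_for_vsync + queued

print(round(render_to_display_ms(9.0, 1), 1))  # ~33.3 ms with a one-frame flip queue
print(round(render_to_display_ms(9.0, 0), 1))  # ~16.7 ms with the queue forced to zero
```

The measured drop in average latency (from about 89ms down to 76ms) is in the same ballpark as the one refresh period this simple model says a one-frame queue should cost.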

For those who want vsync in TF2 and have consistently >60fps performance without multiGPU, absolutely set the flip queue to zero in the driver.

Next up is our Fallout 3 test, where we compare a notoriously laggy game against a notoriously responsive one. Our framerate in this game was consistently between 38 and 45 frames per second during testing, meaning that frametime will play a bigger role, as it alone adds somewhere between 22ms and 26ms of latency.
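Those frametime figures are just the reciprocal of the framerate:

```python
# Frametime (ms) is simply 1000 divided by the framerate.
for fps in (38, 45):
    print(fps, "FPS ->", round(1000.0 / fps, 1), "ms per frame")
# 38 FPS -> 26.3 ms per frame, 45 FPS -> 22.2 ms per frame
```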

And we have 136ms. This test was also much more consistent, with our average latency being 136ms as well. Our low was 130ms and our high was 142ms, giving us the same tight spread of +/- 6ms we saw with TF2 and vsync, only vsync was not enabled here (which suggests some internal rate monitoring to me). Of course, this variability is now a much lower percentage of the overall input lag; not that that offers much consolation.

The number of complaints we've seen online about Fallout 3 and input lag (though notably a subjective improvement over Oblivion), combined with our own experience, leads us to believe that somewhere between TF2's snappy 89ms worst-case input latency (which we still couldn't feel) and Fallout 3's roughly 50% longer latency we cross the point of perceptibility and distraction. That 100ms mark certainly looks like a good ceiling for developers to keep input latency below.

Let's take a look at what happens when we turn on vsync.

Once again, 136ms. Our input lag stats were actually the same, and yes, we rechecked to make sure vsync was enabled here and disabled there. Since Oblivion benefited from reducing the maximum number of frames rendered ahead, we tested this as well. When set to 1 or 0 we saw no change in performance with Fallout 3, at least in this test. It's unclear whether different settings or performance levels would benefit, but for now it seems like Fallout 3 does really well at producing consistently high input lag.

So that does it for our LCD testing. But we did want to learn whether or not the LCD panel added any extra delay over a CRT display, so we pulled out a Sony GDM F-500 encased in the purple and grey of a Sun monitor from an age gone by. We tested at lower resolutions, as the display couldn't muster anything close to 2560x1600, but at 1600x1200@60Hz our numbers still made sense comparatively: we would expect them to be a little lower, but not by much.

Our average here is about 43ms, which is lower than on the Dell LCD, with a minimum of 35ms and a maximum of 52ms. In the best case, the CRT's 35ms isn't much better than the LCD's 37ms at all. Our worst case (and our average) in this test was much better than on the LCD, but it's clear that the LCD is capable of input latency as low as this CRT test showed. We could (and likely do) have an advantage on the CRT side due to resolution: framerate was higher for the CRT test, and it is likely that this difference improved our worst-case and average numbers.

But let's take a look at the CRT with vsync.

In this test our best case is 78ms, our worst case is 88ms, and the average is 84ms. This is, again, very similar in worst case to the LCD test, but the improvement in the average is smaller this time around. That makes sense, as vsync normalizes performance to 60 FPS, though there may still be a slight advantage from the lower resolution. It seems the Dell 3007WFP isn't significantly disadvantaged when compared to a CRT.

But the CRT has one more advantage over LCD panels: refresh rate. We tested at 120Hz, though we did have to lower the resolution once more to make it work. The lower resolution will increase performance on its own, but the results are impressive.

With an average input latency of 25ms, a minimum of 17ms, and a maximum of 29ms, the results show an incredible impact from running at a higher framerate and refresh rate. These numbers come in really low. The 1152x864 resolution does provide about 200 FPS, which will also have an impact on latency, but given that 1600x1200 had a higher framerate than 2560x1600 yet showed similar latencies, we suspect that the limit was refresh rate. Then again, it could be that the higher refresh rate simply allowed the advantages of the higher framerate to shine through.
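Part of the refresh rate advantage is simple scanout math: the screen is drawn top to bottom over one refresh period, so the spot you care about shows up, on average, roughly half a refresh after the flip. A back-of-the-envelope comparison (this is only one contributor to total latency, not the whole story):

```python
# Average scanout delay is roughly half a refresh period.
for hz in (60, 120):
    period_ms = 1000.0 / hz
    print(hz, "Hz:", round(period_ms, 1), "ms refresh period, ~", round(period_ms / 2, 1), "ms average scanout delay")
# 60 Hz -> ~8.3 ms on average, 120 Hz -> ~4.2 ms
```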

Either way, this is incredibly low input lag.

Comments

  • DerekWilson - Sunday, July 19, 2009 - link

    It was bound to happen wasn't it?

    This has been around for a few years now, but (for obvious reasons) never made it into the mainstream gaming community. And, really, now that high performance mice are much more available it isn't as much of an issue.
  • Kaihekoa - Saturday, July 18, 2009 - link

    From the conclusion this point wasn't clear to me.
  • DerekWilson - Sunday, July 19, 2009 - link

    at present triple buffering in DirectX == a 1 frame flip queue in all cases ...

    so ... it is best to disable triple buffering in DirectX if you are over refresh rate in performance (60FPS generally) ...

    and it is better to enable triple buffering in DirectX if you are under 60 FPS.
  • Squall Leonhart - Wednesday, March 30, 2011 - link

    This is not always the case, actually. There are some DirectX engines, the Age of Empires 3 engine being one example, that exhibit hitching when moving around the map unless triple buffering is forced on for the game.
  • billythefisherman - Saturday, July 18, 2009 - link

    First of all I'd like to say well done on the article; you're probably the first person outside of game industry developers to have looked at this rather complex topic, and certainly the first to take into account the whole hardware pipeline as well.

    Sadly, though, there are some gaping holes in your analysis, mainly focused around the CPU stage. Your CPU isn't going to run any faster than your GPU (and the same is true in reverse), as one is dependent on the other (the GPU is dependent on the CPU). As such, the CPU may finish all of its tasks faster than the GPU, but it will have to wait for the GPU to finish rendering the last frame before it can start on the next frame of logic.

    No game team in the world developing for a console is going to triple buffer their GPU command list.

    I intentionally added 'developing for a console' as this is also an important factor: I'd say around 75% (being very conservative) of mainstream PC games are now based on cross-platform engines. As such, developers will more than likely gear their engines to the consoles, as these make up the largest market segment by far.

    The consoles all have very limited memory capacities in comparison to their computational power, and so developers will more than likely try to save memory over computation; thus a double-buffered command list is the norm. Some advanced console-specific engines actually drop down to a single command buffer and use CPU-GPU synchronisation techniques because the CPUs are faster than the GPUs. This kind of thing isn't going to happen on the PC because the GPU is invariably faster than the CPU.

    When porting a game to PC a developer is very unlikely to spend the money re-engineering the core pipeline because of the massive problems that can cause. This can be seen in most 'DirectX 10' games, as they simply add a few more post-processing effects to soak up the extra power. You may call it lazy coding; I don't. It's just commercial reality: these are businesses at the end of the day.

    So both your diagrams on the last page are wrong with regard to the CPU stage, as it will take roughly the same amount of time as the GPU stage in the vast majority of frames because of frame locality, i.e. one frame differs little from the next since the player tends not to jump around in space, so neighbouring frames take similar amounts of time to render.

    On to my next complaint:
    "If our frametime is just longer than 16.67ms with vsync enabled, we will add a full additional frame of latency (with no work being done on the GPU) before we are able to swap the finished buffer to the front for scanout. The wasted work can cause our next frame not to come in before the next vsync, giving us up to two frames of latency (one because we wait to swap and one because of the delay in starting the next frame)."

    What are you talking about, man!?! You don't drop down to 20fps (i.e. two more frames of latency) because you take 17ms to render your frame; you drop down to 30fps! With vsync enabled your graphics processor will be stalled until the next frame, but that's all, and you could possibly kick off your CPU to calculate the next frame to take advantage of that time. Not that that's going to make the slightest jot of difference if you're GPU bound, because you have to wait for the GPU to finish with the command buffer it's rendering (as you don't know where in the command buffer the GPU is).

    As I've said, on the consoles there are tricks you can do to synchronise the GPU with the CPU, but you don't have that low-level control of the GPU on the PC as Nvidia/ATI don't want the internals of their drivers exposed to one another.

    And as I've said, not that you'd want to do such a thing on the PC, as the CPU is usually going to be slower than the GPU and would cause the GPU to stall constantly; hence the reason to double buffer the command buffer in the first place.

    I've also tried to explain in my posts to your triple buffering article why there's a lot of cobblers in the next few paragraphs.
  • DerekWilson - Sunday, July 19, 2009 - link

    Fruit pies? ... anyway...

    Thanks for your feedback. On the first issue, the console development is one of growing importance as much as I would like for it not to be. At some point, though, I expect there will be an inflection point where it will just not be possible to build certain types of games for consoles that can be built on PCs ... and we'll have this before the next generation of consoles. Maybe it's a pipedream, but I'm hoping the development focus will shift back to the PC rather than continue to pull away (I don't think piracy is a real factor in profitability though I do believe publishers use the issue to take advantage of developers and consumers).

    And I get that with GPU as bottleneck you have that much time to use the CPU as well ... but you /could/ decouple CPU and GPU and gain performance or reduce lag. Currently, it may make sense that if we are GPU limited the CPU stage will effectively equal the GPU stage in latency -- and likewise that if we are CPU limited, the GPU stage effectively equals the CPU stage (because of stalling) in input latency.

    Certainly it is a more complex topic than I illustrated, and if I didn't make that clear then I do apologize. I just wanted to get across the general idea rather than a "this is how it always is" kind of thing ... clearly Fallout 3 has even more input lag than any of my worst case scenarios account for, even with 2 frames of image processing on the monitor ... I have no idea what they are doing ...

    ...

    As for the second issue -- you can get up to two frames of INPUT LAG with vsync enabled and 17ms GPU time.

    you will get up to these two frames (60Hz frames) of input lag at 30FPS ...

    I'm not talking about the frame rate dropping to 2 frames then 1 frame (20 FPS) ... I'm talking about the fact that, at best, your input is gathered 17ms before your frame completes on the GPU (1 frame of input lag) and (because it missed vsync) it will take another frame for that to hit the screen (for a total of two).
  • billythefisherman - Monday, July 20, 2009 - link

    I have to re-iterate: well done on tackling this rather complex issue, I applaud you! (I just wish you hadn't whipped up your punters so much in the benefits of triple buffering!)
  • Gastra - Saturday, July 18, 2009 - link

    For information (quite a lot, if you follow the links) on what an optical mouse sees:
    http://hackedgadgets.com/2008/10/15/optical-mouse-...
  • DerekWilson - Sunday, July 19, 2009 - link

    That's pretty cool stuff ... And it lines up pretty well with our guess at mouse sensor resolution for the G9x.

    It'd still be a lot nicer if we could get the specs straight from the manufacturer though ...
  • PrinceGaz - Friday, July 17, 2009 - link

    "For input lag reduction in the general case, we recommend disabling vsync. For NVIDIA card owners running OpenGL games, forcing triple buffering in the driver will provide a better visual experience with no tearing and will always start rendering the same frame that would start rendering with vsync disabled."

    I'm going to ask this again I'm afraid :) Are you sure, Derek? Does nVidia's triple-buffer OpenGL driver implementation do that, or is it just the same as what most people take triple-buffer rendering to be, that is, having one additional back buffer to render to so as to provide a steady supply of frames when the framerate dips below the refresh rate? Have you got confirmation, either from screenshots or something else (like nVidia saying that is how it works), that OpenGL triple-buffering is any different from Direct3D rendering, or how AMD handles it?

    Because if you don't, then all you are saying is that triple-buffering is a second back-buffer which is filled to prevent lags when the framerate falls below the refresh rate. Do you know for sure that nVidia OpenGL drivers render constantly when in triple-buffer mode or are you only assuming they do so?
