Digging Deeper: Galloping Horses Example

Rather than pulling out a bunch of math and traditional timing diagrams, we've decided to put together a more straightforward presentation. The diagrams we will use show the frames of an actual animation that would be generated over time, as well as what would be seen on the monitor for each method. Hopefully this will help illustrate the quantitative and qualitative differences between the approaches.

Our example is a fabricated one (based on an animation courtesy of Wikipedia) of a "game" rendering a horse galloping across the screen. The basics of this timeline are that our game is capable of rendering at 5 times our refresh rate (it can render 5 different frames before a new one gets swapped to the front buffer). The perfectly consistent frame rate is not realistic either, as some frames will take longer than others to render; we cut down on these and other variables for simplicity's sake. We'll talk about timing and lag in more detail based on a 60Hz refresh rate and 300 FPS performance, but we didn't want to clutter the diagram with too many times and labels. Obviously this is a theoretical example, but it does a good job of showing the idea of what is happening.
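To keep the numbers straight, here's a quick back-of-the-envelope sketch in Python of the timing assumptions above (60Hz refresh, 300 FPS rendering); the variable names are purely illustrative:

```python
# Timing assumptions for the galloping horse example (illustrative only).
refresh_hz = 60                   # monitor refresh rate
render_fps = 300                  # how fast the game can render
refresh_ms = 1000 / refresh_hz    # ~16.67 ms between vertical refreshes
frame_ms = 1000 / render_fps      # ~3.33 ms to render one frame

# The game completes 5 frames in the time the monitor takes to refresh once.
frames_per_refresh = render_fps // refresh_hz

print(f"refresh interval: {refresh_ms:.2f} ms")              # 16.67 ms
print(f"render time:      {frame_ms:.2f} ms")                # 3.33 ms
print(f"frames rendered per refresh: {frames_per_refresh}")  # 5
```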

First up, we'll look at double buffering without vsync. Here, the buffers are swapped as soon as the game finishes drawing a frame, immediately preempting whatever is being sent to the display at the time. Here's what that looks like:

Good performance but with quality issues.
The timeline is labeled 0 to 15, and for those keeping count, each step is 3 1/3 milliseconds. The timeline for each buffer has a picture on it in the 3.3 ms interval during which a frame is completed, corresponding to the position of the horse and rider at that time in realtime. The large pictures at the bottom of the image represent the image displayed at each vertical refresh on the monitor. The only images we actually see are the frames that get sent to the display; the benefit of all the other frames, in this case, is to minimize input lag.

We can certainly see, in this extreme case, what bad tearing could look like. For this quick and dirty example, I chose to composite only three frames of animation, but in reality there could be more or fewer tears. The number of different frames drawn to the screen corresponds to the length of time it takes for the graphics hardware to send the frame to the monitor. This will happen in less time than the entire interval between refreshes, but I'm not well versed enough in monitor technology to know exactly how long. For the purposes of this illustration, I estimated about half the interval being spent sending the frame (and thus parts of three completed frames are displayed). If I had to guess, I think I overestimated the time it takes to send a frame to the display.

For the above, the FRAPS-reported framerate would be 300 FPS, but the actual number of full images that get flashed up on the screen can never exceed the refresh rate (in this example, 60 frames every second). The latency between when a frame finishes rendering and when it starts to appear on screen (this contributes to input latency) is less than 3.3ms.
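The tear count per refresh can be sketched from these numbers. This is a rough model only: unlike the half-interval guess above, it assumes scanout takes the whole refresh interval and that frames finish at a perfectly steady 3.33 ms pace:

```python
# Rough model: with swap-on-completion (vsync off), every buffer swap that
# lands mid-scanout produces one visible tear line. Integer microseconds
# are used to avoid floating-point boundary issues.
frame_us = 1_000_000 // 300    # a new frame finishes every ~3333 us
scanout_us = 1_000_000 // 60   # assume scanout takes the full ~16.7 ms refresh

swaps_mid_scanout = scanout_us // frame_us   # swaps during one scanout

print(f"parts of different frames on screen: {swaps_mid_scanout + 1}")  # 6
print(f"tears per refresh:                   {swaps_mid_scanout}")      # 5
```

With the article's half-interval guess, only half as many swaps land mid-scanout, which is why the diagram composites three frames instead of six.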

When we turn on vsync, the tearing goes away, but real performance goes down and input latency goes up. Here's what we see:

Good quality, but bad performance and input lag.

If we consider each of these diagrams to be systems rendering the exact same thing starting at the exact same time, we can see how far "behind" this rendering is. There is none of the tearing that was evident in our first example, but we pay for that with outdated information. In addition, both the actual framerate and the reported framerate are 60 FPS. The computer ends up doing a lot less work, of course, but that comes at the expense of realized performance, even though we cannot actually see more than the 60 images the monitor displays every second.

Here, the price we pay for eliminating tearing is an increase in latency from a maximum of 3.3ms to a maximum of 13.3ms. With vsync on a 60Hz monitor, the maximum latency between when a rendering is finished and when it is displayed is a full 1/60 of a second (16.67ms), and the effective latency that can be incurred will be higher: since no more drawing can happen after the next frame to be displayed is finished until it is swapped to the front buffer, the real effect of latency when using vsync will be more than a full vertical refresh when rendering takes longer than one refresh to complete.
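That wait-for-refresh behavior is easy to model. In this hypothetical sketch, a finished frame sits in the back buffer until the next refresh boundary, so its display latency depends entirely on when it happens to finish:

```python
import math

REFRESH_MS = 1000 / 60  # 60Hz monitor

def vsync_display_latency(finish_ms):
    """ms a finished frame waits in the back buffer until the refresh
    that displays it (double buffering with vsync)."""
    next_refresh = math.ceil(finish_ms / REFRESH_MS) * REFRESH_MS
    return next_refresh - finish_ms

# Finish just after a refresh and you wait almost a full interval:
print(f"{vsync_display_latency(0.1):.2f} ms")   # 16.57 ms
# Finish just before one and you barely wait at all:
print(f"{vsync_display_latency(16.5):.2f} ms")  # 0.17 ms
```

And, as the paragraph above notes, this only captures the display wait; when rendering itself takes longer than a refresh, the stall before the next frame can even begin adds further delay.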

Moving on to triple buffering, we can see how it combines the best advantages of the two double buffering approaches.

The best of both worlds.

And here we are. We are back down to a maximum of 3.3ms of input latency, but with no tearing. Our actual performance is back up to 300 FPS, but this may not be reported correctly by a frame counter that only monitors front buffer flips. Again, only 60 frames actually get pasted up to the monitor every second, but in this case, those 60 frames are the most recent frames fully rendered before the next refresh.
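The "most recent frame before the refresh" behavior can be sketched the same way. In this toy simulation (a simplification: perfectly steady 3.33 ms renders and an arbitrary 1 ms start offset), the latency at each refresh never exceeds one render time:

```python
REFRESH_MS = 1000 / 60   # 60Hz monitor
FRAME_MS = 1000 / 300    # 300 FPS rendering
OFFSET_MS = 1.0          # arbitrary time the first frame finishes

# Times at which frames finish rendering
finish_times = [OFFSET_MS + i * FRAME_MS for i in range(200)]

for refresh in range(1, 5):
    t = refresh * REFRESH_MS
    # Page-flip triple buffering displays the most recently completed frame
    newest = max(f for f in finish_times if f <= t)
    latency = t - newest
    assert latency < FRAME_MS  # never more than one render time behind
    print(f"refresh {refresh}: latency {latency:.2f} ms")  # 2.33 ms each
```

Shift the offset and the exact latency changes, but it always stays under the 3.33 ms render time, matching the diagram.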

While there may be parts of the frames in double buffering without vsync that are "newer" than corresponding parts of the triple buffered frame, the price that is paid for that is potential visual corruption. The real kicker is that, if you don't actually see tearing in the double buffered case, then those partial updates are not different enough from the previous frame(s) to have really mattered visually anyway. In other words, only when you see the tear are you really getting any useful new information. But how useful is that new information if it only comes with tearing?

184 Comments

  • vegemeister - Tuesday, August 6, 2013 - link

    >in fact, at higher framerates there is always a higher chance of tearing than at lower frame rates (at 300 FPS tears will happen every frame, whereas at 40 FPS, tears cannot possibly happen every frame -- the lower the frame rate, the less likely or often tearing occurs).

    With vsync off, effectively every rendered frame tears. The only time you render a frame and don't get tearing is if you get lucky and accidentally swap during vblank.
  • Schmide - Friday, June 26, 2009 - link

    Actually, if you read PrinceGaz's and my discussion:

    When vsync is on, the rendering of the next frame actually starts immediately after the previous frame and would add no delay as long as the rendering time was less than the refresh interval.

    The only real cost is memory.
  • DerekWilson - Friday, June 26, 2009 - link

    When rendering time is more than refresh time, double buffering with vsync can incur up to almost two full frames of lag (~33ms) in addition to the frame time.

    with triple buffering, this will be reduced to at most one frame of lag (~16.7ms) but this is the worst case scenario for triple buffering. average case will absolutely be less than this. average case for double buffering without vsync will be equal to this for the first frame that started being drawn to the screen (before any tear that may or may not happen). average case for double buffering with vsync will always be higher than triple buffering.
  • SonicIce - Friday, June 26, 2009 - link

    if you think vsync with triple buffering has the same performance as double buffering then i feel sorry for you
  • PrinceGaz - Friday, June 26, 2009 - link

    Your explanation of double-buffering and enabling vertical-sync are certainly correct, but your explanation of how triple-buffering works is not how I understand it works (I've read a few articles on such things over the years, yeah... sad).

    I believe triple-buffering is as follows:

    You have three buffers: A (front-buffer), B (back-buffer), C (third-buffer or second back-buffer). In my explanation I'm going to refer to any buffer swapping as copying from one buffer to another; how it is implemented by the hardware is irrelevant.

    Your explanation is correct up until the point that all three buffers have a frame written to them, so A is currently being displayed, B has the next frame to be displayed, and C has just been filled with another frame. At that point, you say C is moved to B, and a new frame starts being rendered into C; in other words the card is constantly rendering frames to the two back-buffers as fast as possible and updating B at every opportunity. So that a recently completed frame is available in B to be moved to A at the vertical-refresh.

    The way I understand triple-buffering works is that once B and C both have frames rendered to them waiting to be displayed, the graphics-card then pauses until the vertical-refresh, at which point B is copied to A to be displayed, C is moved to B, and the card is free to start work on rendering a new frame to fill the now empty C. No frames are thrown away, and the card is not constantly churning out frames which won't be displayed.

    The whole point behind triple-buffering was to prevent framerate slowdowns caused by stalling when using double-buffering with vsync at framerates BELOW the refresh rate, NOT to minimise lag caused by vsync with double-buffering at framerates ABOVE the refresh rate (slight lag when the card was churning out frames faster than the refresh-rate was not seen as a problem with vsync, but big framerate drops like from 55fps to 30fps when it couldn't keep up were a major problem worth fixing).

    It should be noted that there is no difference between the two methods (constantly updating the back-buffers like you say, or stalling once both are filled like I've read elsewhere) at framerates below the refresh-rate as the two back-buffers are never both filled; a frame will always be moved from B to A to be displayed, before the one being drawn to C is completed (which means it can be immediately be moved to B and work continued on another one to fill C).

    The difference is when the framerate is considerably higher than the refresh-rate. In your scenario, when the refresh occurs, the last frame the card has just rendered is displayed. At 100fps, that would be a frame completed no more than 0.01 seconds ago (because that's how quickly the card is churning out frames and pushing them into buffer B), meaning there is negligible lag (between 0 and 0.010 seconds).

    In my scenario, the new frame is one which began rendering two refreshes previous (it was rendered into C very quickly two refreshes back, moved to B at the last refresh, and at this refresh is finally moved to A and displayed). The lag is therefore always exactly two frames provided the card is capable of rendering a frame faster than the refresh-rate. At 60hz refresh the lag will therefore be a constant 0.033 seconds regardless of the framerate the card is capable of (provided it can maintain at least 60fps).

    Whilst the longer lag (0.033 vs 0-0.010) would be a disadvantage in some cases (your best option there is to use double-buffering with no vsync), it is a consistent lag which in most games will feel better. It also means your graphics-card isn't constantly producing frames many of which will never be seen.

    The only problem is I don't know who is right. What I've said happens is what I've read on several other sites over quite a few years. Your article today Derek is the first time I've heard of a triple-buffering which involves the card continually updating the back-buffers.
  • Touche - Friday, June 26, 2009 - link

    I agree. Every site and topic I've read about triple buffering said that it works like you've explained. That's why most people hate it. It does resolve framerate drop issues of DB+vsync, but introduces too much lag. I would really like Anandtech to check this and get back to us.
  • DerekWilson - Saturday, June 27, 2009 - link

    The problem and discrepancy come from the fact that MS implements render ahead in DX, and because the default is 3 frames people took this to be "triple buffering", but you could do 2 frame render ahead and no one is going to call it "double buffering" ...

    It's really a render queue rather than a page flipping method.

    This article describes what, when people are talking about page flipping, "triple buffering" should refer to. This is also the way OpenGL works when triple buffering is enabled.
  • Touche - Sunday, June 28, 2009 - link

    Have you seen this?

    http://msdn.microsoft.com/en-us/library/ms796537.a...
    http://msdn.microsoft.com/en-us/library/ms893104.a...
  • DerekWilson - Wednesday, July 1, 2009 - link

    What they are showing is 1 frame render ahead with vsync. In MS DX terms, this is a flip chain with 2 back buffers and a present interval of one.

    They call it triple buffering if it uses three total buffers, but this is still a flip queue and should be referred to as such to avoid confusion.
  • DerekWilson - Saturday, June 27, 2009 - link

    Actually, I need to clarify and say that this is my understanding of the way triple buffering with OpenGL works under windows at this time.
