The Start: The Rendering Pipeline In Detail

Before we can even discuss the concept of stuttering and other frame timing anomalies, we need to first take a look at a high-level overview of the Windows rendering pipeline. The pipeline isn’t particularly complex, but understanding where various stages of the process are in the hands of Windows, the CPU, the driver, and the video card is necessary to understand where bottlenecks and delays can occur.

At its most fundamental level, rendering a frame is a 3 part process. An application needs to pass data to Windows, Windows needs to manage the process and interface with the drivers, and finally once Windows and driver preparation is complete, a frame can be passed off to the GPU for final rendering and display.

At the top of the chain is the application itself. This is where user input is being handled and where in the context of a game the simulation is being executed. From a technical perspective, it is the application that is the first arbitrator for game smoothness; applications are responsible for adjusting the simulation rate in order to keep the flow of frames smooth. If the application cannot ensure an even rate, then nothing else that follows will really matter.

The reality of course is that this is harder than it sounds. It is not an insurmountable problem, but PCs are devices with a wide spectrum of performance and capabilities. A dual-core processor with an iGPU performs very different from a hex-core processor with a small army of GPUs, and an application needs to be able to accommodate this so that the simulation operates as evenly as possible in both CPU and GPU-bottlenecked scenarios.

Ultimately any timing model is going to be reactive, adjusting itself in response to prior events and how long previous frames took to render. Though another option is to shortcut this process entirely and operate at a fixed (or capped) simulation rate, either basing a game around 30Hz/60Hz operation, or decoupling rendering from the simulation entirely. Anyone who has uncapped id Software’s Rage for example will find that the game simply does not behave correctly without its 60Hz cap.

Static or dynamic, once a simulation has a suitable timing model in place we can then begin to look further down the chain, which is where we first encounter Direct3D, Windows’ primary 3D rendering API. Direct3D is nothing short of an enormous, complex structure of API calls and features. We tend to reduce it to version numbers and marque features for the sanity of ourselves and our readers – as we will here – but it goes without saying that Direct3D takes years to master; and for a GPU manufacturer it’s made all the more complex by the simultaneous existence of the modern iteration of Direct3D (DX10+), and the classic iteration that is DX9 and its predecessors.

For the purpose of the rendering pipeline Direct3D has a few different jobs. First and foremost, it is collecting draw calls from the application, combining them, and processing them for further work. Once a complete frame’s worth of draw calls has been collected, Direct3D passes its processed work over to the first component of the video card driver stack, the User Mode Driver (UMD).

It’s the UMD that is primarily responsible for taking the output of Direct3D and turning it into work batches the GPU can handle. These work batches, command buffers (aka Display Lists), are collections of instructions and data suitable for processing by the target GPU. Among other things, the UMD is responsible for shader compilation and assigning rendering elements to the correct (and best) surface formats for the GPU.


A logical view of single command buffer; from Microsoft's Direct3D documentation

When the UMD’s work is complete, it passes its command buffer back over to Direct3D. Direct3D in turn passes that command buffer to the context queue, our first real bottleneck. We’ll get back to why this is a bottleneck in a bit, but briefly, the context queue is responsible for queuing the individual command buffers in order to smooth out the rendering process. Queuing command buffers at this stage increases frame rendering latency, but by providing a buffer of buffers it allows the rendering pipeline to absorb any variances in rendering time or simulation time to more smoothly render frames.

The context queue has also gone by other names over the years, such as the flip queue and the pre-rendered frames queue. This is the source of the 3 frame render-ahead limit in Windows that is sometimes exposed in games and drivers, as Windows will by default queue up to 3 frames in this manner. This can be controlled by application developers, but most will leave it at 3 so long as a game is smoothly moving along.

Beyond the context queue we have Windows’ GPU scheduler, which is what regulates the popping of command buffers off of the context queue to be fed to the kernel mode GPU driver (KMD). Beyond this point the rest of the pipeline is rather simple, with the KMD taking the command buffer and feeding it to the GPU, all the while the KMD and GPU work together to manage the operation of the GPU. When a frame is finally completed, the GPU generates an interrupt to inform the KMD and OS about the completion.

At the end of this process we have a rendered frame sitting in the GPU’s back-buffer, but the frame itself is not displayed automatically. At the end of a batch of command buffers – effectively making the beginning and ends of frames – is the Direct3D Present() call. Present is the command that is responsible for telling the GPU to flip the back buffer to the front and to present the rendered frame to the user. Only once the Present call executes does a frame get displayed. The Present call, though not a command buffer object, still follows the same rendering path as the command buffers, including queuing up in the Context Queue.

Introduction Just What Is Stuttering?
Comments Locked

103 Comments

View All Comments

  • JPForums - Tuesday, March 26, 2013 - link

    Their stance wasn't "Everyone else stutters so why should we bother with it." It was closer to "Everyone else stutters and we have no more control over it that they do ... Wait, you're saying they don't stutter as bad as we do ... and fixing our stutter would've actually helped our performance. Aw, nuts." I agree with Spoelie, it was ineptitude, not ill-will.
  • extide - Wednesday, March 27, 2013 - link

    You make absolutely no sense.
  • Galidou - Saturday, March 30, 2013 - link

    Lots of people seem to think AMD does so much mistakes that everytime it happens the world speaks about it more intensely because they actually admit it instead of trying to HIDE things, they even let the sites like anandtech make reviews on their problem. Right now, at home, I have two very comparable systems based on overclocked SNB. One with a 7950, and the other with a gtx 660 ti. Both play games extremely well but I have more trouble with my 660 ti. I get lots of ''Display adapter has stopped responding'' while surfing the internet, when waking up from long idle states and when playing League of Legends. I switched drivers, did clean install and I can't get rid of it totally. No my card is not overclocked and I can make it furmark all day long without a problem.

    I decided to play on my girlfriend's computer(which has the 7950) when I play league of legends because you just can't be interrupted in this kind of game. It even froze in LoL a couple times, at first I thought it was my SSD but after reading the dump files, found out it was the 660 ti. But hey, Nvidia is perfect(we just don't see them speaking of their problems that's it)... 4 months of it now, thanks alot... I wrote on nvidia forums did my research did everything they told me. Good thing it's only with LoL, internet and long idles, everything else runs flawlessly.
  • HisDivineOrder - Tuesday, March 26, 2013 - link

    Actually, I think you should reread the article. It's true that fixing these stuttering issues has given them some frame rate improvements, but the article seemed relatively clear on the point that they were focused on frame rates and not frame latency or frame interval or whatever they're choosing to call it today.

    They had all their focus on one aspect of driver development and that actually cost them in that area because they weren't considering a more well-rounded approach.

    I'll grant you AMD was pretty inept though. This has been problem they've had for years and it's taken them this long to suss it out...
  • piroroadkill - Tuesday, March 26, 2013 - link

    Hey, what about PC Perspective: http://www.pcper.com/reviews/Graphics-Cards/Frame-...

    Looks to me as being the best way to measure it, as it uses a DVI capture card to actually capture what the user sees, forgoing any overhead software may have..
  • Rick83 - Tuesday, March 26, 2013 - link

    Yes, that is the correct way to measure frame time.
  • KikassAssassin - Tuesday, March 26, 2013 - link

    Yeah, their method looks like the best of both worlds. It gets around the limitations of FRAPS without the complexity and difficulty of using GPUView.
  • Anand Lal Shimpi - Tuesday, March 26, 2013 - link

    The ideal method would be something that gives us timestamps at both ends of the pipeline, but that's a tall order. The PCPer method is very interesting indeed... ;)
  • Mopar63 - Tuesday, March 26, 2013 - link

    First very informative article. The issue at hand is that this so called concern is based on an individuals perception. Remember we are not talking about a stuttering that was so bad as to be noticeable to all gamers. Scott basically had to make a video specifically designed to point out the issue for others to see it originally.

    Because there is no real way to quantify the personal experience we have an issue in the fact that we now have a measurement craze that is being treated as fact when it is based in the end on subjection for the final result.

    Having access to various levels of AMD and NVidia based machines I can tell you that my gaming experience across them has been pretty uniform in most cases. The cases when I had a bad experience, probably a wash as they are on both platforms.

    I think the biggest issue is we sometimes get to caught up in the technology. We let benchmarks and measurement programs dictate to us what we will get the most enjoyment from with our gaming experience. A game is not the frame rates but the play that matters. While frame rates might play a roll it is not the measurement of them that makes that part of the fun.

    At the end of the day the single best test of a video card is not a benchmark suite or tool to measure frame rendering time. The best tool is to play the games you want and see if you get the game experience you desire. Turn off the benchmark and turn on the game, that is the ONLY true test of what is best.
  • HisDivineOrder - Tuesday, March 26, 2013 - link

    Speak for yourself. I've noticed this problem between nVidia vs AMD for years. For many years, gamers have said that nVidia cards are "smoother." People didn't listen because they didn't want to hear the truth or because they were likely stuck with one high end from AMD and a low end or medium end from nVidia.

    But comparing equivalent cards, I can tell you my experience has always led inescapably to the "feeling" that the nVidia card is smoother at the same or even slightly lower frame rate.

    This just proves what I "felt" was the case was in fact really the case. If you didn't see it, then that's a fail on the part of your visual acuity or perhaps you had a bias you wanted to see, so you saw less than everything present.

    But the stutter was always there. Now even AMD admits it.

Log in

Don't have an account? Sign up now