AMD Comments on GPU Stuttering, Offers Driver Roadmap & Perspective on Benchmarking

Author: Ryan Smith

by Ryan Smith on March 26, 2013 2:28 AM EST

Posted in
GPUs
AMD


The Tools of the Trade: FRAPS & GPUView

Now that we have a basic understanding of the rendering pipeline and just what stuttering is, it’s time to talk about the tools that are commonly used to measure these issues. We’ll start with FRAPS, both because FRAPS is well understood by many of our readers and because FRAPS is what brought stuttering to the forefront of review sites in the first place.

AMD, quite bluntly, has a problem with how FRAPS is being used in some cases. To be clear here, FRAPS is a wonderful tool, and without it we would be unable to include a number of different games in our hardware reviews. AMD’s problem with FRAPS is not its existence, what it does, or even how it does things. AMD’s problem with FRAPS comes down to how it’s interpreted.

To get to that problem, we’re going to have to take a look at how FRAPS measures framerates. Going back to our diagram of the rendering pipeline, FRAPS hooks into the pipeline very early, at the application stage.

By injecting its DLL into the application, FRAPS intercepts the Direct3D Present call as it’s being made to Direct3D. From here FRAPS can then delay the call for a split second to insert the draw commands to draw its overlay, or FRAPS can simply move on. When it comes to measuring framerates and frametimes, what FRAPS is doing is measuring the Present calls. Every time it sees a new Present call get pushed out, it counts that as a new frame, does any necessary logging, and then passes that Present call on to Direct3D.

This method is easy to accomplish and works with almost any application, which is what makes FRAPS so versatile. When it comes to measuring the average FPS over a benchmark run for example, FRAPS is great because every Present call it sees will eventually end up triggering a frame to be displayed. The average framerate is merely the number of Present calls FRAPS sees, divided by how long FRAPS was running for.
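As a rough illustration of that arithmetic, the Python sketch below computes an average framerate from a list of Present timestamps. The function name and the sample timestamps are made up for this example; they are not taken from FRAPS or its log format.

```python
def average_fps(present_times_ms):
    """Average framerate: Present calls counted, divided by elapsed time."""
    if len(present_times_ms) < 2:
        raise ValueError("need at least two Present timestamps")
    elapsed_ms = present_times_ms[-1] - present_times_ms[0]
    # The first timestamp only starts the clock, so count the calls after it.
    frames = len(present_times_ms) - 1
    return frames * 1000.0 / elapsed_ms


# Two hypothetical logs covering the same 120 ms of wall-clock time.
smooth = [0, 20, 40, 60, 80, 100, 120]   # a Present every 20 ms
uneven = [0, 10, 40, 50, 80, 90, 120]    # Presents alternate 10 ms / 30 ms

print(average_fps(smooth))
print(average_fps(uneven))
```

Both logs work out to the same average of 50 frames per second, which is why an average over a whole run is insensitive to how the individual Present calls are spaced.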

The problem here is not in using FRAPS to measure average framerates over the run of a benchmark, but rather when it comes to using FRAPS to measure individual frames. FRAPS is at the very start of the rendering pipeline; it’s before the GPU, it’s before the drivers, it’s even before Direct3D and the context queue. As such FRAPS can tell you all about what goes into the rendering pipeline, but FRAPS cannot tell you what comes out of the rendering pipeline.

So to use FRAPS in this manner as a way of measuring frame intervals is problematic. Considering in particular that the application can only pass off a new frame when the context queue is ready for it, what FRAPS is actually measuring is the very start of the rendering pipeline, which, not unlike a true pipe, is limited by what comes after it. If the pipeline is backed up for whatever reason (context queue, drivers, etc.), then FRAPS is essentially reporting on what the pipeline is doing, and not the frame interval on the final displayed frames. Simply put, FRAPS cannot tell you the frame interval at the end of the pipeline; it can only infer it from what it’s seeing.
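To make the difference between the two ends of the pipe concrete, here is a deliberately simplified Python model of a GPU-bound game. Everything in it is an assumption for illustration: the function names, the fixed per-frame CPU and GPU costs, the idea that a frame is "shown" the moment the GPU finishes it (no vsync), and the rule that a Present call blocks until the frame submitted `queue_limit` frames earlier has completed. It is a sketch of the idea, not a description of how Direct3D or any driver actually schedules work.

```python
def simulate(cpu_ms, gpu_ms, queue_limit=3):
    """Return (present_times, shown_times) for a toy render pipeline."""
    present, shown = [], []
    t = 0  # application clock, in ms
    for i, (cpu, gpu) in enumerate(zip(cpu_ms, gpu_ms)):
        t += cpu              # game simulates and builds the frame
        present.append(t)     # the moment a Present hook would log the frame
        if i >= queue_limit:
            # Queue is full: block until the oldest queued frame is done.
            t = max(t, shown[i - queue_limit])
        # The GPU starts this frame once it is submitted and the GPU is free.
        start = max(t, shown[i - 1]) if i > 0 else t
        shown.append(start + gpu)
    return present, shown


def intervals(times):
    return [b - a for a, b in zip(times, times[1:])]


# Hypothetical workload: 2 ms of CPU work and 10 ms of GPU work per frame.
present, shown = simulate([2] * 8, [10] * 8)
print("Present intervals:", intervals(present))
print("Shown intervals:  ", intervals(shown))
```

With these made-up numbers the Present intervals come out as 2, 2, 2, 6, 10, 10, 10 ms while the intervals between finished frames are a steady 10 ms. The uneven spacing at the start is just the queue filling up; a tool watching only the Present calls would log it even though nothing uneven reaches the end of this toy pipeline.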

AMD’s problem then is twofold. Going back to our definitions of latency versus frame intervals, FRAPS cannot measure “latency”. The context queue in particular will throw off any attempt to measure true frame latency. The amount of time between present calls is not the amount of time it took a frame to move through the pipeline, especially if the next Present call was delayed for any reason.
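A small worked example may help separate the two quantities. The timestamps below are hypothetical, and the "shown" times in particular are something FRAPS has no access to; they are included only to show what latency would be if we could measure it.

```python
# Hypothetical per-frame timestamps in milliseconds.
present_ms = [0, 10, 20, 30, 40]   # when each Present call was made
shown_ms = [30, 40, 50, 60, 70]    # when each frame reached the end of the pipeline

# What a Present hook can compute: time between successive Present calls.
present_intervals = [b - a for a, b in zip(present_ms, present_ms[1:])]

# What "latency" means here: time for one frame to travel through the pipeline.
latencies = [s - p for p, s in zip(present_ms, shown_ms)]

print("Present intervals:", present_intervals)
print("Frame latencies:  ", latencies)
```

Here every Present interval is 10 ms, yet every frame takes 30 ms to get through the pipeline because frames are queued up behind one another. The interval says nothing about the latency, and vice versa.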

AMD’s second problem then is that even when FRAPS is being used to measure frame intervals, due to the issues we’ve mentioned earlier it’s simply not an accurate representation of what the user is seeing. Not only can FRAPS sometimes encounter anomalies that don’t translate to the end of the rendering pipeline, but FRAPS is going to see stuttering that the user cannot. It’s this last bit that is of particular concern to AMD. If FRAPS is saying that AMD cards are having more stuttering – even if the user cannot see it – then are AMD cards worse?

To be clear here, the goal is to minimize stuttering throughout, and in a bit we’ll see how AMD is doing that and why it was a problem for them in the first place. But AMD is concerned about FRAPS being used in this manner because it can present data that makes stuttering look worse than it is. And in what’s a very human reaction, people pay more attention to bad news than good news; bad data more than good data. Or more simply put, it’s very easy to look at the data FRAPS produces and to see a problem that does not exist. Not only does FRAPS lack a good view of the rendering pipeline, but FRAPS data alone doesn’t provide the context to decide what data matters and what does not.

Ultimately, due to its mechanisms, FRAPS is too coarse-grained. It doesn’t have a complete picture of the rendering pipeline, and it’s taking readings from the wrong point in the rendering pipeline. In an ideal world we would like to be able to watch a frame in flight from the start to the end; to see what millisecond of a game simulation a frame is from, and to compare that against the frame intervals of successive frames. Barring that, we would at least like to see the frame interval at the end of the rendering pipeline where the user is seeing the results, and unfortunately FRAPS can’t do that either.
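As a sketch of what that ideal comparison might look like, suppose a tool could record, for each frame, both the game-clock time the frame depicts and the time it was actually shown. No tool discussed here provides both; the lists below are invented purely to illustrate the comparison.

```python
# Hypothetical per-frame data, in milliseconds.
sim_time_ms = [0, 16, 32, 48, 64]      # game-simulation time each frame depicts
shown_time_ms = [40, 50, 78, 88, 116]  # when each frame appeared on screen

sim_steps = [b - a for a, b in zip(sim_time_ms, sim_time_ms[1:])]
shown_intervals = [b - a for a, b in zip(shown_time_ms, shown_time_ms[1:])]

# Positive: the frame stayed away longer than the game time it advanced by.
# Negative: it arrived sooner than the game time it advanced by.
mismatch = [shown - step for step, shown in zip(sim_steps, shown_intervals)]

print("Simulation steps:", sim_steps)
print("Shown intervals: ", shown_intervals)
print("Mismatch:        ", mismatch)
```

In this invented case the simulation advances by an even 16 ms per frame, but the frames are shown 10 ms and 28 ms apart in alternation, so the mismatch swings between -6 and +12 ms. A perfectly smooth result would have a mismatch of zero for every frame.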

Adding weight to the whole matter is the fact that FRAPS is one of the few things both AMD and NVIDIA can agree on. In our talks with NVIDIA and in past statements made to the press, NVIDIA dislikes FRAPS being used in this manner for roughly the same reason. The fact that it’s measuring Present calls instead of the time a frame is actually shown to the user impacts them just as much, and muddles the picture when it comes to trying to differentiate themselves from AMD. Again, this is not to say that NVIDIA thinks FRAPS is a bad tool, but there seems to be a general agreement with AMD’s stance that beyond a certain point it’s the wrong tool for measuring stuttering.

For our part, when we first went into our meeting with AMD we were expecting something a little more standoffish on the matter of FRAPS. Instead what we found was that we were in agreement on the same issues for the same reasons. As you, our readers, are quick to point out, we do not currently do frame interval measurements. We do not do that because we do not currently have any meaningful tools to do so beyond FRAPS, whose workings and limitations we have known about for years now. There are tools in development that will change this, and this is something we’re hopefully going to be able to talk about soon. But in the meantime what we will tell you is the same thing AMD and NVIDIA will tell you: FRAPS is not the best way to measure frame intervals. There is a better way.

Finally, though we’ve just spent a great deal of time talking about FRAPS’ shortfalls when it comes to measuring frame intervals, we’re not going to dismiss it entirely. FRAPS may be a coarse tool, but even a coarse tool is going to catch big problems. And this is exactly what Scott Wasson and other reviewers have seen. At the very start of this odyssey AMD’s single-GPU frame interval problem was so bad that even FRAPS could see it. FRAPS did in fact “bust” AMD as it were, and for that AMD is even grateful. But as AMD resolves their problems and moves on to finer grained problems, the tools need to become finer grained too. And FRAPS as it currently is cannot make that jump.

GPUView

While we’ve spent most of our discussion on tools discussing FRAPS and why both AMD and NVIDIA find it insufficient, there are other tools out there. AMD and NVIDIA of course have access to far better tools than we do, and people with the knowledge to use them. This includes their internal tools, tools that are part of their respective SDKs, and other 3rd party tools.

AMD’s tool of choice here actually comes from Microsoft, and it’s called GPUView.

GPUView is a GPU performance profiling tool, and it gives very nearly a top-to-bottom overview of the rendering pipeline. GPUView can see the command buffers, the Present calls, the context queue, the CPU utilization of various threads, the drivers, and more. In fact, short of being able to tell us the simulation time, GPUView provides the kind of massive data dump that is everything a GPU developer, programmer, or even reviewer could ever want.

The only problem with GPUView is that it’s incredibly complex. We’ve tried to use it before and we were simply overwhelmed with the data it provides. Furthermore it still doesn’t show us when a GPU buffer swap actually takes place and the user sees a new frame, and that remains the basis of any kind of fine-grained look into stuttering. Ultimately GPUView is a tool meant for seasoned professionals and it shows.

So why bring up GPUView at all? First and foremost, it’s one of the same tools AMD is using. Understanding something about the tool they use will bring us closer to understanding how they are (or are not) identifying problems in order to fix them. The second reason is that GPUView can show us in practice what up until now we’ve discussed only in theory: where some of the bottlenecks are in the GPU rendering process that lead to stuttering.

AMD’s presentation to us included two slides on GPUView, which in turn we’re including in this article. The first slide is of Crysis 3, and in it we can see a number of frames in flight. Notably we can also see the periods where there are several idle CPU threads, showing us there is some GPU bottlenecking going on.

The second slide is of GPUView with Unigine Heaven, presenting us with a textbook situation of where the GPU is the bottleneck, as Heaven is designed from the start to be a GPU benchmark and has limited CPU usage as a result. Of note, we can see the behavior of Heaven as it waits for the context queue to open up to take another frame. Heaven runs with the standard context queue limit of 3, and we can clearly see the 3 Presents, representing the 3 frames in the queue.

Ultimately GPUView is just one of many tools, but it does give us a better idea of what’s occurring in the middle of the rendering pipeline. And in AMD’s case it’s one of the better ways to break down the rendering pipeline and track down the issues that have led to their stuttering problems.


103 Comments


Tuvok86 - Tuesday, March 26, 2013
This is a great victory for all of the tech press.
When people started complaining about stuttering years ago we were only dreaming of getting so much attention from gpu brands.
I still remember someone constantly saying "micro-stuttering doesn't exist", I wonder how they feel now that they enjoy the fps and smoothness benefits.
In any case I praise constructive journalism that triggered a significant leap in the technology.
BrightCandle - Tuesday, March 26, 2013
One important fact I feel is missing in your treatment of what it is fraps is measuring and why it's more representative of problems than you and AMD think it is. For some reason everyone who makes this argument that fraps isn't very useful seems to skip this one, but it's really, really important.

Fraps measures at the present call and that isn't a random choice. The present call has a few different modes of operation, but all games use blocking mode. What that means is that if the context queue is full (which it normally is) then the game thread is held up waiting for that present call to complete. Subsequent present calls are regulated by the GPU's driver in this case, as the thread is held, and only when the driver chooses to accept the completion of that frame can the game's thread continue. Since Fraps is measuring this it can see when the driver is accepting frames in an uneven fashion, so while you might see even frames presented to the monitor due to the buffering there is still a knock-on effect.

Game simulations produce particular frames of their simulation, sometimes in the same thread as the present call and sometimes in a different thread. Regardless, they use the release of the present call as the end of their rendering step and that allows another frame to be started or delivered. So if the present calls are coming back unevenly the game simulation itself will stutter as it tries to produce as many simulation steps as the rendering is producing. If the present calls are stuttering there is a feedback loop into the game simulation that is causing it to stutter too.

It's this feedback loop on the rendering and game simulation which causes much of the problem, and it starts in the GPU driver. It might very well be caused by Windows, but the big difference we see in the manufacturers' solutions tells us that it's almost entirely the manufacturer's fault when it happens and impacts on gameplay.

So quite rightly fraps does not measure stuttering out to the screen, it measures the GPUs regulation of the frame rate of the game rendering and its simulation and that does cause real stuttering, both of the subsequent present calls and the game simulation.

Of course pcperspective have now shown that AMD's CrossFire stuttering out the DVI port is considerably worse than Fraps shows, so much so that they considered what AMD is doing a cheat, as the frames aren't real. But you need both perspectives, the output and the input to the pipeline, to see the impact on the game. It's not just the frames themselves that have to be regulated to be smooth; it's also the game simulation that must run smoothly, and it is regulated by the handling of the context queue.
JPForums - Tuesday, March 26, 2013
There are two things you need to keep in mind:
1) Nvidia also agrees with the limitation of FRAPS. In fact, IIRC they were the first to voice the issue that FRAPS recordings are in the wrong place and can only infer what actually needs to be recorded. The author is correct, when Ati and Nvidia agree, we should at least pay attention.

2) Though your points are AFAIK correct and well articulated, they still point to the issue of FRAPS inferring, rather than recording, the targeted information. The difference is, rather than consistency of output frames, you are looking for consistency of simulation steps. I agree that this is a metric that really needs to be covered. In fact, I would even go as far as matching simulation steps to their corresponding frame times to expose issues when short steps are accompanied by long frames or vice versa.

Unfortunately, FRAPS can't measure any of this directly and even for your points proves to be limited to inference. That said, until a reviewer gets tools that can reveal this information, inference via FRAPS is better than no information at all. Pcperspective's comments on AMD's stuttering issues are related (as they state) to crossfire setups. I could see the differences between CF and SLI in blind tests (though SLI also has some microstutter) and this only confirms it. The runt frames only add fuel to the fire. I'm open to using AMD in single GPU builds, but only use Nvidia for multiGPU builds. Perhaps this will change in July, but I'm guessing there will still be plenty of work to do.
JPForums - Tuesday, March 26, 2013
I should probably expand a little on what I consider a limitation of FRAPS for stutter caused by simulation steps. FRAPS inserts itself at the output of the render and is therefore subject to a variable delay between the simulator time step through the render. Important information can still be inferred, like simulation stutter in AMD's heartbeat waveform. However, I'd still rather get a timestamp directly at the output of the simulator rather than at the output of the renderer, if it ever becomes an option. Unfortunately, that would probably require cooperation with the game developer, so I'm not sure that will ever happen.
tipoo - Tuesday, March 26, 2013
The third page makes me wonder, just how much would a real time operating system improve performance? QNX on BB10 is real time, the PS4 OS may be too.
juampavalverde - Tuesday, March 26, 2013
Time to update the GPU review template guys... At least copy&paste PCPer and TechReport methods.
cjb110 - Tuesday, March 26, 2013
Sounds like there's a market for a tool then, something that does what GPUView does but in a simpler manner (like Fraps presents).
drbaltazar - Tuesday, March 26, 2013
Sadly, the issue they find isn't exactly caused by the GPU! It is at the OS end! Data fragmentation at various levels is often the cause, and this happens everywhere, from the processor cache level to the server cache level! MS says it doesn't matter! They're wrong! It affects everything related to image quality. Bufferbloat is also the main problem: MTU, UDP fragmentation, multithreading and RSS fragmentation, etc. etc. etc.! Oh, they say they can reconstruct the data in the proper manner that won't impact performance or quality! Again, MS is either wrong or unaware of the problems these various issues cause. I haven't even started on the GPU side yet! All that data manipulation etc. is the main issue! How to fix it? Mm! Probably use an official standard limit like 1460 for MTU, and apply it to UDP as well so that it is also at 1460 (just a random example, because these will need to be tweaked). Why? So that packets don't get fragmented anywhere in the computer or the server. Or they tell people how to make it happen, because right now not many have 1080p quality even though most have a 1080p monitor. So imagine if AMD is using Windows ideas to tweak their GPU, like .NET 4 etc.! (Yep, it becomes a nightmare.) Hopefully they'll fix this, but all sides have been in a race for performance. (They wouldn't want to sell an equally performing W8 instead of W7; it wouldn't sell!) I am all for getting better performance, but not at the expense of subpixel quality of graphics. Nvidia is probably better because they noticed MS's error and have worked to avoid the OS mistake by using standard and proper ways. I ain't saying MS is wrong; maybe they can really fragment packets and have everything being fine and dandy looking in 1080p. But I will tell you this: in most areas of computing it feels like this: the OS is saying 255.0.0 and at the other end, for some reason, it's like our old phone game; at the other end what is being done isn't at all what the OS said at the beginning (and vice versa). Hopefully these ideas of new data mining and testing tools will go deeper and test what is actually going on in our computer, network and server datapaths so they all can work together. Cause right now? Our games look 1080i even though we are all set at 1080p.
mi1stormilst - Tuesday, March 26, 2013
I love you guys, but this article comes off a bit like sour grapes. The Tech Report dove into this issue head first and admitted from the beginning the testing methods may not be perfect. They have continued to be clear on this and you made no mention of the high speed video tests that they performed. The bottom line is The Tech Report is primarily responsible for getting AMD to get on the ball with this issue. Regardless of AMD's bag of excuses and their sudden clarity on the best methods for testing, we would not be where we are without the solid work of The Tech Report. I feel that if the FRAPS method of testing was sufficient for bringing these issues to light then a job well done. The situation will only improve from there and Scott Wasson and company deserve more praise than this sour attempt of an article to discredit the good work they have done. If that was not your intention then I apologize, but it comes off as such.
brybir - Tuesday, March 26, 2013
I did not see it this way at all. Instead, I read it as TechReport started a trend in evaluating stuttering that most were not looking for, and that while there is some merit to their methods, there are other better ways of evaluating the issue. I did not see any effort to hide, obscure, or otherwise show "sour grapes" to them for their testing.

As to the merit of the article, if AMD, Nvidia, and Anandtech folks all agree that the methods used by TechReport are okay but could be improved upon with better tools, then the end result will be better for everyone. Much as standard bench-marking software has evolved a lot over the last decade, the bench-marking for this type of testing will change dramatically as people find interesting and new ways to really get in depth with the issue and generate data that is easy to aggregate and report. I think that is a net benefit for all of us!
