AMD Comments on GPU Stuttering, Offers Driver Roadmap & Perspective on Benchmarking

Author: Ryan Smith

by Ryan Smith on March 26, 2013 2:28 AM EST

The Tools of the Trade: FRAPS & GPUView

Now that we have a basic understanding of the rendering pipeline and just what stuttering is, it’s time to talk about the tools that are commonly used to measure these issues. We’ll start with FRAPS, both because FRAPS is well understood by many of our readers and because FRAPS is what brought stuttering to the forefront of review sites in the first place.

AMD, quite bluntly, has a problem with how FRAPS is being used in some cases. To be clear here, FRAPS is a wonderful tool, and without it we would be unable to include a number of different games in our hardware reviews. AMD’s problem with FRAPS is not its existence, what it does, or even how it does things. AMD’s problem with FRAPS comes down to how it’s interpreted.

To get to that problem, we’re going to have to take a look at how FRAPS measures framerates. Going back to our diagram of the rendering pipeline, FRAPS hooks into the pipeline very early, at the application stage.

By injecting its DLL into the application, FRAPS then serves to intercept the Direct3D Present call as it’s being made to Direct3D. From here FRAPS can then delay the call for a split second to insert the draw commands to draw its overlay, or FRAPS can simply move on. When it comes to measuring framerates and frametimes, what FRAPS is doing is measuring the Present calls. Every time it sees a new Present call get pushed out, it counts that as a new frame, does any necessary logging, and then passes that Present call on to Direct3D.

This method is easy to accomplish and works with almost any application, which is what makes FRAPS so versatile. When it comes to measuring the average FPS over a benchmark run for example, FRAPS is great because every Present call it sees will eventually end up triggering a frame to be displayed. The average framerate is merely the number of Present calls FRAPS sees, divided by how long FRAPS was running for.
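To make the arithmetic concrete, here is a minimal Python sketch of that calculation. This is not FRAPS’ actual code: the function names and the timestamps are hypothetical, and it assumes we already have a list of the times at which Present calls were seen. The sketch counts frames between the first and last Present call rather than over the whole time the tool was running, which is a simplification.

```python
def average_fps(present_times):
    """Average framerate: frames counted, divided by elapsed time in seconds."""
    if len(present_times) < 2:
        raise ValueError("need at least two Present timestamps")
    elapsed = present_times[-1] - present_times[0]
    # The first timestamp only opens the measurement window,
    # so it is not counted as a completed frame.
    return (len(present_times) - 1) / elapsed


def present_intervals(present_times):
    """Time between consecutive Present calls, in the unit of the input."""
    return [b - a for a, b in zip(present_times, present_times[1:])]


# Hypothetical timestamps in seconds, not data from a real capture.
timestamps = [0.000, 0.016, 0.033, 0.050, 0.100, 0.116]
fps = average_fps(timestamps)
intervals = present_intervals(timestamps)
```

With these made-up numbers the average works out to roughly 43 FPS, even though one of the gaps between Present calls is 50 ms. The average hides that gap, which is why people turn to the per-call intervals in the first place.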

The problem here is not in using FRAPS to measure average framerates over the run of a benchmark, but rather when it comes to using FRAPS to measure individual frames. FRAPS is at the very start of the rendering pipeline; it’s before the GPU, it’s before the drivers, it’s even before Direct3D and the context queue. As such FRAPS can tell you all about what goes into the rendering pipeline, but FRAPS cannot tell you what comes out of the rendering pipeline.

So to use FRAPS in this method as a way of measuring frame intervals is problematic. Considering in particular that the application can only pass off a new frame when the context queue is ready for it, what FRAPS is actually measuring is the very start of the rendering pipeline, which not unlike a true pipe is limited by what comes after it. If the pipeline is backed up for whatever reason (context queue, drivers, etc), then FRAPS is essentially reporting on what the pipeline is doing, and not the frame interval on the final displayed frames. Simply put, FRAPS cannot tell you the frame interval at the end of the pipeline, it can only infer it from what it’s seeing.

AMD’s problem then is twofold. Going back to our definitions of latency versus frame intervals, FRAPS cannot measure “latency”. The context queue in particular will throw off any attempt to measure true frame latency. The amount of time between present calls is not the amount of time it took a frame to move through the pipeline, especially if the next Present call was delayed for any reason.
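The gap between what goes into the pipeline and what comes out can be illustrated with a toy simulation. This is a sketch under loud assumptions, not a model of any real driver: `simulate_pipeline` and its parameters are hypothetical, the context queue is assumed to hold 3 frames, a queue slot is assumed to free up the moment the GPU finishes a frame, and a frame is assumed to reach the display as soon as the GPU finishes it (no v-sync).

```python
def simulate_pipeline(cpu_ms, gpu_ms, queue_limit=3):
    """Toy model: return (present_times, display_times) in milliseconds."""
    present, done = [], []
    for i, (cpu, gpu) in enumerate(zip(cpu_ms, gpu_ms)):
        # The application prepares frame i after its previous Present returned.
        ready = (present[-1] if present else 0.0) + cpu
        # Present blocks until the frame queue_limit places earlier
        # has been finished by the GPU and has left the queue.
        if i >= queue_limit:
            ready = max(ready, done[i - queue_limit])
        present.append(ready)
        # The GPU works on one frame at a time, in submission order.
        start = max(ready, done[-1] if done else 0.0)
        done.append(start + gpu)
    return present, done


def intervals(times):
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(times, times[1:])]


# Hypothetical workload: 2 ms of CPU work and 10 ms of GPU work per frame.
present, display = simulate_pipeline([2.0] * 8, [10.0] * 8)
present_gaps = intervals(present)    # what a Present-based tool would log
display_gaps = intervals(display)    # what the user would see
latency = [d - p for p, d in zip(present, display)]
```

In this toy run the displayed frames are a steady 10 ms apart from start to finish, but the first few Present calls are only 2 ms apart while the queue fills, so a tool watching Present calls logs unevenness the user never sees. And once the queue is full, the Present interval settles at 10 ms while each frame actually spends 30 ms between its Present call and its completion, so the interval says nothing about the latency.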

AMD’s second problem then is that even when FRAPS is being used to measure frame intervals, due to the issues we’ve mentioned earlier it’s simply not an accurate representation of what the user is seeing. Not only can FRAPS sometimes encounter anomalies that don’t translate to the end of the rendering pipeline, but FRAPS is going to see stuttering that the user cannot. It’s this last bit that is of particular concern to AMD. If FRAPS is saying that AMD cards are having more stuttering – even if the user cannot see it – then are AMD cards worse?

To be clear here the goal is to minimize stuttering throughout, and in a bit we’ll see how AMD is doing that and why it was a problem for them in the first place. But AMD is concerned about FRAPS being used in this manner because it can present data that makes stuttering look worse than it is. And in what’s a very human reaction, people pay more attention to bad news than good news; bad data more than good data. Or more simply put, it’s very easy to look at the data FRAPS produces and to see a problem that does not exist. FRAPS doesn’t just lack a good view of the rendering pipeline, but FRAPS data alone doesn’t provide context to decide what data matters and what does not.

Ultimately due to its mechanisms FRAPS is too coarse grained. It doesn’t have a complete picture of the rendering pipeline, and it’s taking readings from the wrong point in the rendering pipeline. In an ideal world we would like to be able to watch a frame in flight from the start to the end; to see what millisecond of a game simulation a frame is from, and to compare that against the frame intervals of successive frames. Barring that we would at least like to see the frame interval at the end of the rendering pipeline where the user is seeing the results, and unfortunately FRAPS can’t do that either.

Adding weight to the whole matter is the fact that FRAPS is one of the few things both AMD and NVIDIA can agree on. In our talks with NVIDIA and in past statements made to the press, NVIDIA dislikes FRAPS being used in this manner for roughly the same reason. The fact that it’s measuring Present calls instead of the time a frame is actually shown to the user impacts them just as well, and muddles the picture when it comes to trying to differentiate themselves from AMD. Again, not to say that NVIDIA thinks FRAPS is a bad tool, but there seems to be a general agreement with AMD’s stance that beyond a certain point it’s the wrong tool for measuring stuttering.

For our part, when we first went into our meeting with AMD we were expecting something a little more standoffish on the matter of FRAPS. Instead what we found was that we were in agreement on the same issues for the same reasons. As you, our readers, are quick to point out, we do not currently do frame interval measurements. We do not do that because we do not currently have any meaningful tools to do so beyond FRAPS, whose workings and limitations we have known about for years now. There are tools in development that will change this, and this is something we’re hopefully going to be able to talk about soon. But in the meantime what we will tell you is the same thing AMD and NVIDIA will tell you: FRAPS is not the best way to measure frame intervals. There is a better way.

Finally, though we’ve just spent a great deal of time talking about FRAPS’ shortfalls when it comes to measuring frame intervals, we’re not going to dismiss it entirely. FRAPS may be a coarse tool, but even a coarse tool is going to catch big problems. And this is exactly what Scott Wasson and other reviewers have seen. At the very start of this odyssey AMD’s single-GPU frame interval problem was so bad that even FRAPS could see it. FRAPS did in fact “bust” AMD as it were, and for that AMD is even grateful. But as AMD resolves their problems and moves on to finer grained problems, the tools need to become finer grained too. And FRAPS as it currently is cannot make that jump.

GPUView

While we’ve spent most of our discussion on tools discussing FRAPS and why both AMD and NVIDIA find it insufficient, there are other tools out there. AMD and NVIDIA of course have access to far better tools than we do, and people with the knowledge to use them. This includes their internal tools, tools that are part of their respective SDKs, and other 3rd party tools.

AMD’s tool of choice here actually comes from Microsoft, and it’s called GPUView.

GPUView is a GPU performance profiling tool, and it gives very near a top-to-bottom overview of the rendering pipeline. GPUView can see the command buffers, the Present calls, the context queue, the CPU utilization of various threads, the drivers, and more. In fact short of being able to tell us the simulation time, GPUView is the kind of massive data dump a GPU developer, programmer, or even reviewer could ever want.

The only problem with GPUView is that it’s incredibly complex. We’ve tried to use it before and we’re simply overwhelmed with the data it provides. Furthermore it still doesn’t show us when a GPU buffer swap actually takes place and the user sees a new frame, and that remains the basis of any kind of fine-grained look into stuttering. Ultimately GPUView is a tool meant for seasoned professionals and it shows.

So why bring up GPUView at all? First and foremost, it’s one of the same tools AMD is using. Understanding something about the tool they use will bring us closer to understanding how they are (or are not) identifying problems in order to fix them. The second reason is that GPUView can show us in practice what up until now we’ve discussed only in theory: where some of the bottlenecks are in the GPU rendering process that lead to stuttering.

AMD’s presentation to us included two slides on GPUView, which in turn we’re including in this article. The first slide is of Crysis 3, and in it we can see a number of frames in flight. Notably we can also see the periods where there are several idle CPU threads, showing us there is some GPU bottlenecking going on.

The second slide is of GPUView with Unigine Heaven, presenting us with a textbook situation of where the GPU is the bottleneck, as Heaven is designed from the start to be a GPU benchmark and has limited CPU usage as a result. Of note, we can see the behavior of Heaven as it waits for the context queue to open up to take another frame. Heaven runs with the standard context queue limit of 3, and we can clearly see the 3 Presents, representing the 3 frames in the queue.
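The "3 Presents in the queue" picture can be reproduced from timestamps alone. The sketch below is purely illustrative: `frames_in_flight` is a hypothetical helper, the numbers are made up rather than taken from AMD’s slide, and it assumes a frame leaves the queue at the moment the GPU finishes it. For each Present call it counts how many submitted frames the GPU has not yet finished.

```python
def frames_in_flight(present_ms, done_ms):
    """For each Present call, count frames submitted but not yet finished."""
    counts = []
    for i, t in enumerate(present_ms):
        # Frames 0..i have been presented by time t; a frame is still
        # in the queue if the GPU finishes it strictly after t.
        counts.append(sum(1 for d in done_ms[: i + 1] if d > t))
    return counts


# Hypothetical GPU-bound run: the GPU needs 10 ms per frame, while the
# application could submit a frame every 2 ms if it were never blocked.
present_ms = [2, 4, 6, 12, 22, 32, 42, 52]
done_ms = [12, 22, 32, 42, 52, 62, 72, 82]
occupancy = frames_in_flight(present_ms, done_ms)
```

With these numbers the occupancy climbs 1, 2, 3 and then stays pinned at 3 for every later Present call, which is the signature of an application waiting on a full context queue.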

Ultimately GPUView is just one of many tools, but it does give us a better idea of what’s occurring in the middle of the rendering pipeline. And in AMD’s case it’s one of the better ways to break down the rendering pipeline and track down the issues that have led to their stuttering problems.

103 Comments

mi1stormilst - Tuesday, March 26, 2013 - link
All of us will benefit from the light shed on the subject with better testing and companies paying closer attention to issues and workarounds related to the subject. Still we would not even be talking about better testing methods right now without the attention it got from The Tech Report. I look forward to more sites implementing some type of real world testing methods that results in a true user experience evaluation. I reread the article and still stand by my original conclusion. The Tech Report gets credit, but rather than stopping there this article seems to attack their methodology when they themselves had already admitted that it was less than perfect. To date there are still not better tools being used for reviews and The Tech Report still got the point across with what was available. I am a huge fan for what they did over there as I could not pinpoint why my AMD experience was less than optimal. It forced me to early retire my 6950, grab a very affordable 660 OC, and enjoy a much smoother game experience. This is my first nVidia card since my trusty 4200ti and I am not looking back until AMD is on par with nVidia in the stuttering department...it was literally making me motion sick )-:
SPBHM - Tuesday, March 26, 2013 - link
"holding back one frame but not another can sometimes make the frame display evenly, but from a simulation step only a few milliseconds after the previous step"

wouldn't this also happen with the single GPU "heartbeat stuttering"?
BrightCandle - Tuesday, March 26, 2013 - link
Yes it would, which is exactly the problem with the heartbeat pattern that AMD's problem causes. You can deliver the frames evenly out to the monitor but their contents has a noticeable stutter due to the graphics driver accepting the frames unevenly. The heartbeat is a sign of a real problem without a doubt, all non smooth frame time captures are. What they are not is a sign that the DVI monitor is seeing frames at those periods, but then no one ever said that was what was being measured anyway.

The best way to think about it is that this is the problem going into the pipeline, measuring the output also needs to be done to get the smoothness on the output. Only with both can you understand the impact. We have half the picture, and that half is accurately measured by fraps.
Gunbuster - Tuesday, March 26, 2013 - link
Design and launch a product. Ignore user feedback.

Did we forget about those people with $2000 laptops sporting AMD mobile card drivers that didn't work correctly for over a year due to some bug with the graphics switching MUX? This seems to be a pattern that revolves around AMD software people being wholly out of their depth, overworked, or just not caring. They don’t even seem to be able to figure out when they have a fix. The laptop GPU story here on AT was presented as AMD sending over beta drivers and asking “Did we fix it this time?”
rootheday - Tuesday, March 26, 2013 - link
One minor correction to the description of the submission of commands through the stack - the DirectX runtime under Windows Vista and later does NOT accumulate a frame's worth of draw calls before sending them to the UMD. I believe it sends state and draw calls to the UMD immediately.

The UMD accumulates commands in the command buffer and flushes them to the KMD either when a present call occurs, when the command buffer is full, or when the application requests to read back the results of enqueued rendering (Map/Lock/read Query result).

It used to be true under Windows XP that the dx runtime accumulated calls and dispatched them to the driver - but that is because in XP, the driver ran in kernel mode and it was too expensive to make the user mode->kernel mode transition on every "SetState", etc call.
tynopik - Tuesday, March 26, 2013 - link
"frame latter than it would have" -> later (pg 3)
cactusdog - Tuesday, March 26, 2013 - link
As a long time ATI/AMD fan this report doesn't fill me with confidence. It appears AMD is using anandtech for their public relations spin on the stuttering issue. I don't blame anandtech for running the story, AMD's comments are newsworthy and anandtech deserves credit for being honest about AMD's intentions. On the negative side, the explanation about fraps not being an effective tool only need to be said once, it seems (by the number of times it was mentioned) that AMD's message is to make sure everyone knows Fraps its not accurate, but doesn't explain why Nvidia performs better.

On the issue, it sounds like AMD is conceding and preparing us for much of the same. No where in the explanation do they mention why Nvidia performs better in the latency tests, other than to say its not what the end user is seeing. Well I disagree, users have been complaining about stuttering for years. I just don't believe that AMD have never looked into this issue before. Also with the multi-gpu stuttering. It has been an issue since crossfire/SLI first appeared and nothing has really happened there.

Im a fan of AMD cards but I use both brands and personally I have noticed Nvidia do a better job with latency and general responsiveness in game, whereas ATI/AMD has the edge with image quality. Its subtle, and probably not something the average user notices but a lot of people do notice.. If AMD can solve this issue they would sell many more cards but by the sounds of this article, its too big and complex for them to solve completely without major work. Hence the excuses. Nvidia has to play by the same rules, the same OS etc and they do a better job at latency/stuttering, hopefully AMD can fix it enough to at least perform as good as a NVidia card.
WaltC - Tuesday, March 26, 2013 - link
"NVIDIA made a big deal about moving away from timedemos and average frame rates during the early GeForce FX (NV30) days, when its cards might have delivered a decent gaming experience but were slaughtered in most benchmarks."

Well, that's not really what happened at all...;) The chip "slaughtering" everything nVidia made in those days was the ATi R300. Seems rather strange to tell just half of that story. And the problem nVidia had with benchmarks wasn't technical--it was that nVidia was found to be actively cheating in 3dMark (camera on rails), among other cheats/shortcuts/optimizations in their drivers. The benchmarks told a story nVidia couldn't abide, and that was how much better the R300 was than anything nVidia had at the time. R300 was in every sense a revolution in the 3d gpu markets, blowing everything else away. All gpus on the market today are descended from R300 (just as all Intel and AMD x86 cpus are descended from AMD's original 64-bit Opterons.) nVidia did eventually own up to all of it, right before cancelling the nV30 after a month or two in production, however. People kept publishing proof after proof of what nVidia was doing until finally the company said "uncle." nVidia has been a better company since, imo. At least, its products are certainly better.

I'm using a single ATi gpu and over the last few years I have to say that I haven't seen any stuttering worth mentioning. Whenever I have seen stuttering it is usually due to some software condition or other, and rectified by the appropriate patch. I do appreciate your pointing out that Fraps isn't perfect and I think TR should stop pretending that it is. Fraps as you point out was never intended to measure this kind of latency and so using it to produce data other than frame-rate data is an "off-label" use of the program, imo. And also as you point out, I use vsync more often than not.

Really, though, I would loathe seeing AMD optimizing its drivers just to look better in TR's off-label Fraps usage...!...;) Let's hope that doesn't happen as I got quite a belly full of that sort of thing back in the nV30 days--enough to last me a lifetime.
beginner99 - Tuesday, March 26, 2013 - link
How can FRAPS detect any vendor-specific stuttering if it injects itself before the gpu-driver is called?
The second thing is that v-sync is just crap. I'm not a professional gamer, not even close but in certain games turning it off made me a much better player and the difference is huge. even more annoyingly it was not directly noticeable. I did not "feel" anything changed. Except that my stats were better. Tearing and stuttering: no issue for me so far.
DanNeely - Tuesday, March 26, 2013 - link
The timing at the point it's measuring is normally blocked until the queue the GPUs feeding from has an open slot?
