AMD & Single-GPU Stuttering: Causes & Solutions

Thus far we’ve discussed stuttering and the rendering pipeline in theory, and taken a look at an example of the rendering pipeline in practice through GPUView. With a basic understanding of those principles, we can finally get into explaining AMD’s specific situation. Why did AMD have a single-GPU stuttering problem?

The shortest answer is also the bluntest answer: AMD had a stuttering problem because AMD wasn’t looking for a stuttering problem. AMD does a great deal of competitive analysis (read: seeing what NVIDIA is doing) on overall performance, but AMD was never doing competitive analysis for stuttering.

Because stuttering is such a complex issue and AMD had such deep knowledge of their own drivers, AMD assumed that stuttering was occurring due to the applications and the OS, things that were out of their control. Furthermore, because those things were out of their control, AMD assumed that they were happening to NVIDIA and Intel GPUs too. There wasn’t any kind of competitive analysis to scientifically test that assumption, so AMD never saw that NVIDIA cards weren’t experiencing as much stuttering, and consequently never saw that they did in fact have more control over stuttering than they first thought.

Ultimately it wasn’t until Scott Wasson and other journalists went to work with FRAPS and kept at it that it became obvious to AMD that they had a problem. FRAPS may be a coarse tool, but even it could see some of AMD’s stuttering issues.

Since that time AMD has been hard at work on fixing the issue, producing new driver builds late last year and in the first part of this year to address it. AMD’s latest drivers have been fixing bugs, engaging workarounds, and otherwise taking care of the problem so that they can be competitive with NVIDIA when it comes to stuttering.

There is still work to do – AMD quickly fixed their DX9 issues, while DX10 fixes are in the process of being rolled out – but in many ways this is a post-mortem on the issue rather than being an explanation of what AMD will do in the future. Not every game is fixed yet, but many are. Scott Wasson’s most recent results show an incredible improvement for AMD compared to where they were even 6 months ago.

The biggest changes AMD has made from here on out are that they’re now doing competitive analysis on stuttering and they’re explicitly looking for it in their tools, to ensure that stuttering issues don’t return (at least inasmuch as they are able to control them). With many of the bugs and issues that led to stuttering in the first place already fixed, AMD can use what they’ve learned to analyze future games and try to catch issues before a game is released, or at the very least fix them as quickly as possible.

AMD’s gameplan aside, there are two remaining questions on the subject that need to be dealt with. The first is what happened at a technical level to cause the stuttering in the first place, and the second is what, if anything, can be done about the remaining stuttering.

The answer to the first question is that what went wrong depends on the game. AMD did not go into specific detail about individual games, but they did lay out the types of issues they ran across. For example, resource limits may occur in the application or the driver, triggering a stall that in turn triggers stuttering. Discarding the constant or vertex buffers too often was one such cause of this, as it would mean the driver would need to wait for one of the buffers to actually become freed up before the job could proceed.
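To make the buffer-discard case concrete, here is a minimal sketch of our own rather than anything AMD showed us. It assumes the driver keeps a small, fixed pool of backing copies for a dynamic buffer, hands out a fresh copy on every discard, and can only reuse a copy once the GPU has finished the draw that reads it. The function name, the pool size, and the timings are all hypothetical; real drivers are far more involved.

```python
from collections import deque


def simulate_discards(num_discards, pool_size, cpu_interval_ms, gpu_busy_ms):
    """Return the total time (ms) the CPU spends stalled waiting for a free copy.

    Hypothetical model: every discard needs a fresh backing copy, and a copy
    is only released when the GPU finishes the draw that uses it.
    """
    in_flight = deque()   # release times of copies the GPU still owns
    now = 0.0             # CPU clock
    gpu_free_at = 0.0     # time at which the GPU finishes its current work
    stall_total = 0.0

    for _ in range(num_discards):
        now += cpu_interval_ms  # application work before the next discard

        # Reclaim every copy the GPU has already finished with.
        while in_flight and in_flight[0] <= now:
            in_flight.popleft()

        # Pool exhausted: the CPU has to wait for the oldest copy to free up.
        if len(in_flight) == pool_size:
            release = in_flight.popleft()
            stall_total += release - now
            now = release

        # The GPU handles draws in submission order.
        start = max(now, gpu_free_at)
        gpu_free_at = start + gpu_busy_ms
        in_flight.append(gpu_free_at)

    return stall_total


if __name__ == "__main__":
    # The GPU keeps up with the discards versus the GPU falling behind.
    print(simulate_discards(100, 2, cpu_interval_ms=1.0, gpu_busy_ms=0.5))
    print(simulate_discards(100, 2, cpu_interval_ms=1.0, gpu_busy_ms=2.0))
```

In this toy model the stall only appears once discards arrive faster than the GPU can retire them, and a larger pool merely delays the point at which the CPU has to wait.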

Other times the issue was the driver itself misbehaving in a way AMD never expected. In one such case AMD’s driver was sporadically consuming far more CPU time than AMD intended, something AMD never even realized was possible. The end result was yet another block that triggers a stall that triggers stuttering.

Yet still other problems are in the application and the OS itself. As we’ve mentioned before, AMD cannot fix these issues because they’re not under AMD’s control, but as it turns out AMD can effectively trick the OS and applications into behaving better. So AMD has implemented workarounds in their drivers for these application/OS issues, which don’t strictly fix the problems but will mitigate them.

A recurring theme in all of these issues was that they were easy to fix. It may only take AMD an hour to find the cause of a stuttering instance, and then even less time to make a driver change to deal with it. Stuttering on the whole is still a complex issue, but in AMD’s case they were easy fixes once AMD started looking for the problem.

Perhaps the most interesting thing about this entire process – and the most embarrassing thing for AMD – is not just that stuttering was occurring and they weren’t looking for it, but that by not looking for stuttering they were leaving performance on the table. Stuttering doesn’t just impact frame intervals; many of the stalls that caused stuttering were also stalling the GPU entirely, reducing overall performance. One figure AMD threw around was that when they fixed their stuttering issue in Borderlands 2, overall performance increased by nearly 13%, a very significant gain of the kind AMD would normally have to fight for, but which was instead exposed by an easy fix for stuttering. So fixing their stuttering has not only resolved that issue, but in certain cases it has helped AMD’s performance too.

This isn’t to say that AMD can fix all forms of stuttering. As we’ve already discussed, Windows isn’t a real time operating system and the PC platform itself is highly variable. Especially in resource constrained scenarios it’s simply not going to be possible to fix all forms of stuttering. If the CPU gets busy and the Present call from the application gets held up, then there’s nothing AMD can do other than to process it once it does arrive. This is the purpose of the context queue, to help smooth things out at the cost of some latency.
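The latency trade-off can be put into rough numbers. The sketch below is our own back-of-the-envelope arithmetic, assuming the GPU retires frames at a steady rate and that a given number of frames are already waiting in the context queue; the function name and the 16 ms frame time are purely illustrative.

```python
def queue_budget_ms(queued_frames: int, gpu_frame_ms: float) -> float:
    """Time the GPU needs to work through the frames already queued.

    Read one way, this is the extra latency a new frame picks up by waiting
    behind them. Read the other way, it is how late the next Present can be
    before the GPU has nothing left to render.
    """
    return queued_frames * gpu_frame_ms


if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(depth, "queued frame(s):", queue_budget_ms(depth, 16.0), "ms")
```

Under these assumptions the same figure is both the cost and the benefit: a deeper queue can hide a longer hiccup, but every frame pays for it in added latency.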

Moving on, though it’s outside the scope of this article for both a lack of time and a lack of tools, we will be looking at stuttering on AMD cards and NVIDIA cards as the necessary tools become available. AMD hasn’t fixed all of their issues yet and they waste no time admitting to it, so we will want to track their progress and see just how far along they are in bringing this issue under control.

Finally, we wanted to spend a bit more time talking about FRAPS in relation to what AMD discovered, and why FRAPS may still see issues that are not present.

The above is what AMD is calling the heartbeat pattern, and it’s something FRAPS is reporting even in some of the games they’ve fixed. This highlights one of the problems with trying to monitor frame intervals based on Present calls, as the context queue is absorbing the uneven frame dispatch, but FRAPS doesn’t realize it.

In a heartbeat situation the next Present gets delayed coming out of the application for whatever reason, which results in the rendering pipeline feeding from the context queue for a bit while nothing new comes in. Eventually the block is cleared and the application submits the next Present, at which point FRAPS records the Present as having come relatively later. Furthermore, since the context queue has been at least partially drained, there’s still room for one more frame, so rather than idling for a bit the application immediately gets to work on the next frame. As a result the next Present hits the context queue sooner than average, resulting in the early frame as picked up by FRAPS.

In this scenario, at the end of the rendering pipeline every frame could be displayed at an even pace despite the unevenness at the input, but FRAPS would never know. This doesn’t mean it’s not an issue, as uneven Presents will cause the gap in time between the simulation steps to become uneven as well. But unless the heartbeat pattern occurs with high regularity or the size of the beat is enough to let the context queue drain completely, the impact from this scenario is far less than having the frames come out of the end of the rendering pipeline unevenly. Ultimately it’s another form of stuttering, but one that looks far worse in FRAPS than it would if we were measuring the end of the rendering pipeline and what the user was actually seeing.
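The heartbeat scenario can be reproduced with a toy model. The sketch below is our own illustration under simplifying assumptions: the application spends a fixed amount of CPU time per frame, Present blocks while the context queue is full, the GPU takes a fixed time per frame, and GPU completion time stands in for what reaches the display (no vsync or buffer flipping is modeled). All names and numbers are hypothetical.

```python
def simulate_heartbeat(num_frames, cpu_ms, gpu_ms, queue_depth,
                       hiccup_frame, hiccup_ms):
    """Return (present_times, gpu_done_times) for a GPU-bound toy pipeline.

    present_times is what a Present-based tool such as FRAPS would log;
    gpu_done_times is a stand-in for the end of the rendering pipeline.
    """
    presents, dones = [], []
    for i in range(num_frames):
        # The application starts on the next frame once its last Present returns.
        ready = (presents[-1] if presents else 0.0) + cpu_ms
        if i == hiccup_frame:
            ready += hiccup_ms  # one-off delay on the application/CPU side

        # Blocking Present: a slot only opens up once frame i - queue_depth is done.
        if i >= queue_depth:
            present = max(ready, dones[i - queue_depth])
        else:
            present = ready

        # The GPU renders frames strictly in order.
        gpu_start = max(present, dones[-1] if dones else 0.0)
        presents.append(present)
        dones.append(gpu_start + gpu_ms)
    return presents, dones


def intervals(timestamps):
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


if __name__ == "__main__":
    presents, dones = simulate_heartbeat(
        num_frames=30, cpu_ms=2.0, gpu_ms=10.0,
        queue_depth=3, hiccup_frame=20, hiccup_ms=15.0)
    print("Present intervals: ", intervals(presents)[17:23])
    print("GPU done intervals:", intervals(dones)[17:23])
```

With these made-up numbers the Present intervals around the hiccup come out as 10, 10, 17, 3, 10 and 10 ms, a late frame followed by an early one, while the GPU completion intervals stay at 10 ms throughout because the queue never runs dry. Raising hiccup_ms far enough to drain the queue makes the unevenness show up at the output as well.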


103 Comments

  • Tuvok86 - Tuesday, March 26, 2013 - link

    This is a great victory for all of the tech press.
    When people started complaining about stuttering years ago we were only dreaming of getting so much attention from gpu brands.
    I still remember someone constantly saying "micro-stuttering doesn't exist", I wonder how they feel now that they enjoy the fps and smoothness benefits.
    In any case I praise constructive journalism that triggered a significant leap in the technology.
  • BrightCandle - Tuesday, March 26, 2013 - link

    One important fact I feel is missing in your treatment of what it is fraps is measuring and why it's more representative of problems than you and AMD think it is. For some reason everyone who makes this argument that fraps isn't very useful seems to skip this one, but it's really really important.

    Fraps measures at the present call and that isn't a random choice. The present call has a few different modes of operation, but all games use blocking mode. What that means is that if the context queue is full (which it normally is) then the game thread is held up waiting for that present call to complete. Subsequent present calls are regulated by the GPU's driver in this case, as the thread is held and only when the driver chooses to accept the completion of that frame can the game's thread continue. Since Fraps is measuring this it can see when the driver is accepting frames in an uneven fashion, so while you might see even frames presented to the monitor due to the buffering there is still a knock-on effect.

    Game simulations produce particular frames of their simulation, sometimes in the same thread as the present call and sometimes in a different thread. Regardless, they use the release of the present call as the end of their rendering step, and that allows another frame to be started or delivered. So if the present calls are coming back unevenly the game simulation itself will stutter as it tries to produce as many simulation steps as the rendering is producing. If the present calls are stuttering there is a feedback loop into the game simulation that is also causing it to stutter.

    It's this feedback loop on the rendering and game simulation which causes much of the problem, and it starts in the GPU driver. It might very well be caused by Windows, but the big difference we see in the manufacturers' solutions tells us that it's almost entirely the manufacturer's fault when it happens and impacts on gameplay.

    So quite rightly fraps does not measure stuttering out to the screen, it measures the GPUs regulation of the frame rate of the game rendering and its simulation and that does cause real stuttering, both of the subsequent present calls and the game simulation.

    Of course pcperspective have now shown that AMD's CrossFire stuttering out the DVI port is considerably worse than what Fraps shows, so much so that they consider what AMD is doing a cheat, as the frames aren't real. But you need both perspectives, the output and the input to the pipeline, to see the impact on the game. It's not just the frames themselves that have to be regulated to be smooth, it's also the game simulation that must run smoothly, and it is regulated by the handling of the context queue.
  • JPForums - Tuesday, March 26, 2013 - link

    There are two things you need to keep in mind:
    1) Nvidia also agrees with the limitation of FRAPS. In fact, IIRC they were the first to voice the issue that FRAPS recordings are in the wrong place and can only infer what actually needs to be recorded. The author is correct, when Ati and Nvidia agree, we should at least pay attention.

    2) Though your points are AFAIK correct and well articulated, they still point to the issue of FRAPS inferring, rather than recording, the targeted information. The difference is, rather than consistency of output frames, you are looking for consistency of simulation steps. I agree that this is a metric that really needs to be covered. In fact, I would even go as far as matching simulation steps to their corresponding frame times to expose issues when short steps are accompanied by long frames or vice versa.

    Unfortunately, FRAPS can't measure any of this directly and even for your points proves to be limited to inference. That said, until a reviewer gets tools that can reveal this information, inference via FRAPS is better than no information at all. Pcperspective's comments on AMD's stuttering issues are related (as they state) to crossfire setups. I could see the differences between CF and SLI in blind tests (though SLI also has some microstutter) and this only confirms it. The runt frames only add fuel to the fire. I'm open to using AMD in single GPU builds, but only use Nvidia for multiGPU builds. Perhaps this will change in July, but I'm guessing there will still be plenty of work to do.
  • JPForums - Tuesday, March 26, 2013 - link

    I should probably expand a little on what I consider a limitation of FRAPS for stutter caused by simulation steps. FRAPS inserts itself at the output of the render and is therefore subject to a variable delay from the simulator time step through the render. Important information can still be inferred, like simulation stutter in AMD's heartbeat waveform. However, I'd still rather get a timestamp directly at the output of the simulator rather than at the output of the renderer, if it ever becomes an option. Unfortunately, that would probably require cooperation with the game developer, so I'm not sure that will ever happen.
  • tipoo - Tuesday, March 26, 2013 - link

    The third page makes me wonder, just how much would a real time operating system improve performance? QNX on BB10 is real time, the PS4 OS may be too.
  • juampavalverde - Tuesday, March 26, 2013 - link

    Time to update the GPU review template guys... At least copy&paste PCPer and TechReport methods.
  • cjb110 - Tuesday, March 26, 2013 - link

    Sounds like there's a market for a tool then, something that does what GPUView does but in a simpler manner (like Fraps presents).
  • drbaltazar - Tuesday, March 26, 2013 - link

    Sadly, the issue they found isn't exactly caused by the GPU! It is at the OS end! Data fragmentation at various levels is often the cause, and this happens everywhere, from the processor cache level to the server cache level! MS says it doesn't matter! They're wrong! It affects everything related to image quality. Bufferbloat is also a main problem: MTU, UDP fragmentation, multithreading and RSS fragmentation, etc.! Oh, they say they can reconstruct the data in the proper manner so that it won't impact performance or quality! Again, MS is either wrong or unaware of the problems these various issues cause. I haven't even started on the GPU side yet! All that data manipulation etc. is the main issue! How to fix it? Hmm! Probably use an official standard limit like 1460 for MTU, and apply it to UDP as well so that it is also at 1460 (just a random example, because these will need to be tweaked). Why? So that packets don't get fragmented anywhere in the computer or the server. Or they tell people how to make it happen, because right now not many have 1080p quality even though most have a 1080p monitor. So imagine if AMD is using Windows ideas to tweak their GPU, like .NET 4 etc.! (Yep, it becomes a nightmare.) Hopefully they'll fix this, but all sides have been in a race for performance. (They wouldn't want to sell an equally performing W8 instead of W7. It wouldn't sell!) I am all for getting better performance, but not at the expense of the subpixel quality of graphics. NVIDIA is probably better because they noticed MS's error and have worked to avoid the OS mistake by using standard and proper ways. I ain't saying MS is wrong; maybe they can really fragment packets and have everything looking fine and dandy in 1080p. But I will tell you this: in most areas of computing it feels like the OS is saying 255.0.0 and at the other end, for some reason, it's like our old phone game: what is being done at the other end isn't at all what the OS said at the beginning (and vice versa). Hopefully these ideas of new data mining and testing tools will go deeper and test what is actually going on in our computer, network and server datapaths so they all can work together. Because right now? Our games look 1080i even though we are all set at 1080p.
  • mi1stormilst - Tuesday, March 26, 2013 - link

    I love you guys, but this article comes off a bit like sour grapes. The Tech Report dove into this issue head first and admitted from the beginning the testing methods may not be perfect. They have continued to be clear on this and you made no mention of the high speed video tests that they performed. The bottom line is The Tech Report is primarily responsible for getting AMD to get on the ball with this issue. Regardless of AMD's bag of excuses and their sudden clarity on the best methods for testing, we would not be where we are without the solid work of The Tech Report. I feel that if the FRAPS method of testing was sufficient for bringing these issues to light then a job well done. The situation will only improve from there, and Scott Wasson and company deserve more praise than this sour attempt of an article to discredit the good work they have done. If that was not your intention then I apologize, but it comes off as such.
  • brybir - Tuesday, March 26, 2013 - link

    I did not see it this way at all. Instead, I read it as TechReport started a trend in evaluating stuttering that most were not looking for, and that while there is some merit to their methods, there are other better ways of evaluating the issue. I did not see any effort to hide, obscure, or otherwise show "sour grapes" to them for their testing.

    As to the merit of the article, if AMD, Nvidia, and Anandtech folks all agree that the methods used by TechReport are okay but could be improved upon with better tools, then the end result will be better for everyone. Much as standard benchmarking software has evolved a lot over the last decade, the benchmarking for this type of testing will change dramatically as people find interesting and new ways to really get in depth with the issue and generate data that is easy to aggregate and report. I think that is a net benefit for all of us!
