Original Link: http://www.anandtech.com/show/6862/fcat-the-evolution-of-frame-interval-benchmarking-part-1
FCAT: The Evolution of Frame Interval Benchmarking, Part 1
by Ryan Smith on March 27, 2013 9:00 AM EST
In the last year, stuttering, micro-stuttering, and frame interval benchmarking have become a very big deal in the world of GPUs, and for good reason. Through the hard work of the Tech Report’s Scott Wasson and others, significant stuttering issues were uncovered involving AMD’s video cards, breaking long-standing perceptions on stuttering, where the issues lie, and which GPU manufacturer (if anyone) does a better job of handling the problem. The end result of these investigations has seen AMD embarrassed and rightfully so, as it turned out they were stuttering far worse than they thought, and more importantly far worse than NVIDIA.
The story does not stop there however. As AMD has worked on fixing their stuttering issues, the methodologies pioneered by Scott have gone on to gain wide acceptance across the reviewing landscape. This has the benefit of putting more eyes on the problem and helping AMD find more of their stuttering issues, but as it turns out it has also created some problems. As we laid out in detail yesterday in a conversation with AMD, the current methodologies rely on coarse tools that don’t have a holistic view of the entire rendering pipeline. And as such while these tools can see the big problems that started this wave of interest, their ability to see small problems and to tell apart stuttering from other issues is very limited. Too limited.
In their conversation AMD laid out their argument for a change in benchmarking. A rationale for why benchmarking should move from using tools like FRAPS that can see the start of the rendering pipeline, and towards other tools and methods that can see the end of the rendering pipeline. And AMD was not alone in this; NVIDIA too has shown concern about tools like FRAPS, and has wanted to see testing methodologies evolve.
That brings us to this week. Often evolution is best left to occur naturally. But other times evolution needs a swift kick in the pants. This week NVIDIA has decided to give evolution that swift kick in the pants. This week NVIDIA is introducing FCAT.
FCAT, the Frame Capture Analysis Tool, is NVIDIA’s take on what the evolution of frame interval benchmarking should look like. By moving the measurements of frame intervals from the start of the rendering pipeline to the end of the pipeline, FCAT evolves the state of benchmarking by giving reviewers and consumers alike a new way to measure frame intervals. A year and a half ago the use of FRAPS brought a revolution to the 3D game benchmarking scene, and today NVIDIA seeks to bring about that revolution all over again.
FCAT is a powerful, insightful, and perhaps above all else labor-intensive tool. For these reasons we are going to be splitting up our coverage on FCAT into two parts. Between trade shows and product launches we simply have not had enough time to put together a complete and proper dataset for FCAT, so rather than do this poorly, we’re going to hold back our results until we’ve had a chance to run all of the FCAT tests and scenarios that we want to run.
In part one of our series on FCAT, today we will be taking a high-level overview of FCAT. How it works, why it’s different from FRAPS, and why we are so excited about this tool. Meanwhile next week will see the release of part two of our series, in which we’ll dive into our FCAT results, utilizing FCAT to its full extent to look at where FCAT sees stuttering and under what conditions. So with that in mind, let’s dive into FCAT.
Reprise: When FRAPS Isn’t Enough
Since we covered the subject of FRAPS in great detail yesterday, we’re not going to completely rehash it. But for those of you who have not had the time to read yesterday’s article, here’s a quick rundown of how FRAPS measures frame intervals, and why at times this can be insufficient.
Direct3D (and OpenGL) uses a complex rendering pipeline that spans several different mechanisms and stages. When a frame is generated by an application, it must travel through the pipeline to Direct3D, the video drivers, a frame queue (the context queue), a GPU scheduler, the video drivers again, the GPU, and finally after that a frame can be displayed. The pipeline analogy is used here because that’s exactly what it is, with the added complexity of the context queue sitting in the middle of that pipeline.
FRAPS for its part exists at almost the very beginning of this pipeline. It interfaces with individual applications and intercepts the Present calls made to Direct3D that mark the end of each frame. By counting Present calls FRAPS can easily tell how many frames have gone into the pipeline, making it a simple and effective tool for measuring average framerates.
The problem with FRAPS, as it were, is that while it can also be used to measure the intervals between frames, it can only do so at the start of the rendering pipeline, by measuring the time between Present calls. This, while better than nothing, is far removed from the end of the pipeline where the actual buffer swaps take place, and ultimately is equally removed from the end-user experience. Furthermore, because FRAPS is so far up the rendering pipeline, it’s insulated from what’s going on elsewhere; the context queue in particular can hold up to 3 frames, which means the rate of flow into the context queue can at times be very different from the rate of flow out of the context queue.
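To make the distinction concrete, here is a minimal sketch of the same interval calculation applied at both ends of the pipeline. The timestamps are invented for illustration and are not measurements from any real card; the helper name `frame_intervals` is our own.

```python
from typing import List


def frame_intervals(timestamps_ms: List[float]) -> List[float]:
    """Return the time between consecutive events, in milliseconds."""
    return [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


# Hypothetical timestamps for the same five frames at two points in the pipeline.
present_calls_ms = [0.0, 5.0, 33.0, 38.0, 66.0]    # what a Present-level tool would log
buffer_swaps_ms = [20.0, 36.5, 53.0, 69.5, 86.0]   # when the frames reach the display

present_intervals = frame_intervals(present_calls_ms)
swap_intervals = frame_intervals(buffer_swaps_ms)

print(present_intervals)  # uneven at the start of the pipeline
print(swap_intervals)     # even at the end of the pipeline
```

In this made-up case the Present calls arrive in an uneven pattern while the buffer swaps are evenly spaced, which is the kind of disagreement a queue sitting between the two points can produce. The reverse case is just as possible.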
As a result FRAPS is best described as a coarse tool. It can see particularly egregious stuttering situations, like what AMD has been experiencing as of late, but it cannot see everything. It cannot see stuttering issues the context queue hides, and it’s particularly blind to what’s going on in multi-GPU scenarios.
In our comprehensive look at stuttering and FRAPS, we laid out what our ideal method would be for measuring frame intervals. Ideally we would like to be able to tag a frame from the start of the rendering pipeline to the end, comparing frames as they come in and out of the rendering pipeline by time stamping frames and then comparing the intervals in those time stamps to the intervals between the frames at the end of the rendering pipeline when they are displayed. Ideally, these two intervals would match up (or be close enough), with the simulation time between frames coming at an even pace, and the frame interval itself coming at an even pace.
Of course in the real world this isn’t quite impossible, but it’s highly impractical due to the fact that it requires the participation and assistance of the application itself to write the time stamps (by the time draw calls are being made, it’s too late). In lieu of that, simply being able to look at the end of the rendering pipeline would be a major benefit. After all, the end of the rendering pipeline is where frame swaps actually happen, and it is the position in the rendering pipeline that best describes what the user is seeing. If FRAPS isn’t enough because it can only see the start of the rendering pipeline, then the logical next step is to look at the end of the rendering pipeline instead.
This brings us to the subject of today’s article, FCAT, the Frame Capture Analysis Tool.
As we mentioned in our look at stuttering yesterday, as it turns out both NVIDIA and AMD agree that trying to judge frame intervals from the start of the rendering pipeline is fundamentally problematic. For the past couple of years NVIDIA has been working on an alternative tool to measure frame latency at the end of the rendering pipeline, and at long last they are releasing this tool to reviewers and the public. This tool is FCAT.
So what is FCAT? FCAT is essentially a collection of tools, but at its most fundamental level FCAT is a simple, yet ingenious method to measure frame latency at the end of the rendering pipeline. Rather than attempting to tap into the video drivers themselves – a process inherently fraught with problems if you’re intending to do it in a vendor-neutral manner that works across all video cards – through FCAT NVIDIA can do true frame analysis, capturing individual frames and looking at them to determine when a buffer swap occurred, and in turn using that to measure the frame interval.
How FCAT Works
So how does FCAT work? FCAT is essentially a two-part solution. We’ll dive into greater detail on this in part 2 of our FCAT article, but in summary, both monitors and PC capture cards work at fixed intervals. Regardless of the frame rate an application is running at, most PC LCD monitors operate at a 60Hz refresh rate. With v-sync enabled, buffer swaps are synchronized with the refresh interval (which among other things caps the framerate at 60fps), but when v-sync is disabled, buffer swaps can occur in the middle of a refresh. As a result any given refresh interval can be composed of parts of multiple frames, which is what makes it possible to display well over 60fps on what’s otherwise a 60Hz monitor.
PC capture cards work on the same principle: just as a monitor refreshes at 60Hz, a PC capture card captures at 60Hz. The end result is that while a PC capture card can’t see more than 60 whole frames per second, it can see parts of those frames, and being able to see parts of frames is good enough. In fact it sees the same parts of those frames that a user would see, since the 60Hz refresh rate on a monitor causes the same effect.
Ultimately by capturing frames and analyzing them, it is possible to tell how many frames were delivered in any given refresh interval, and furthermore by counting the time between those partial frames and comparing it to the refresh interval, it is possible to compute just how long the frame interval was and how long any individual frame was visible.
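As a rough sketch of that arithmetic: if a captured refresh is a fixed number of scanlines tall and is drawn over one refresh interval, then the share of scanlines a frame occupies gives the time it was on screen. The 1080-line capture height is an assumption, and this simplification ignores vertical blanking.

```python
REFRESH_HZ = 60            # capture and display rate discussed above
LINES_PER_CAPTURE = 1080   # assumed capture height; blanking lines are ignored


def scanlines_to_ms(scanlines: int,
                    lines_per_capture: int = LINES_PER_CAPTURE,
                    refresh_hz: float = REFRESH_HZ) -> float:
    """Convert the scanlines a frame occupies into on-screen time (ms)."""
    refresh_ms = 1000.0 / refresh_hz
    return scanlines / lines_per_capture * refresh_ms


# A frame filling half of one captured refresh was visible for about 8.3 ms.
half_refresh_ms = scanlines_to_ms(540)
print(round(half_refresh_ms, 2))
```

A frame that spans several captured refreshes would simply have its scanline counts summed before the conversion.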
Of course doing this on a raw game feed would be difficult in the best of situations. As a simple thought experiment, consider a game where the player isn’t moving. If nothing changes in the image, how is one to be able to tell if a new frame has been delivered or not?
The solution to this is in the first half of FCAT, the overlay tool. The overlay tool at its most basic level is a utility that color-codes each frame entering the rendering pipeline. By tagging frames with color bars, it is possible to tell apart individual frames by looking at the color bars. Regardless of the action on the screen (or lack thereof), the color bars will change with each successive frame, making each frame clear and obvious.
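The tagging idea can be sketched in a few lines. The four-color palette below is hypothetical and chosen only to keep the example short; it is not FCAT’s actual color sequence, and `tag_frames` is our own name.

```python
from itertools import cycle, islice
from typing import List, Sequence

# Hypothetical palette for illustration; not FCAT's actual color sequence.
PALETTE = ["white", "lime", "blue", "red"]


def tag_frames(frame_count: int, palette: Sequence[str] = PALETTE) -> List[str]:
    """Assign each successive frame the next color in a repeating sequence."""
    return list(islice(cycle(palette), frame_count))


tags = tag_frames(6)
print(tags)  # consecutive frames never share a color
```

Because the sequence only needs to change from one frame to the next, it works even when the game image itself is completely static.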
On a technical level, the FCAT overlay tool ends up working almost identically to video game overlays as we see with FRAPS, MSI Afterburner, and other tools that insert basic overlays into games. In all of these cases, these tools are attaching themselves to the start of the rendering pipeline, intercepting the Present call, adding their own draw commands for their overlay, and then finally passing on the Present call. The end result is that much like how FRAPS is able to quickly and simply monitor framerates and draw overlays, the FCAT overlay tool is able to quickly insert the necessary color bars, and to do so without ever touching the GPU or video drivers.
With the frames suitably tagged, the other half of the FCAT solution comes into play, the extractor tool. By using a PC capture card, the entire run of a benchmark can be captured and recorded to video for analysis. The extractor tool in turn is what’s responsible for looking at the color bars the overlay tool inserts, parsing the data from a video file to find the individual frames and calculate the frame intervals. Though not the easiest thing to code, conceptually this process is easy; the tool is merely loading a frame, analyzing each line of the color bar, finding the points where the color bar changes, and then recording those instances.
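Conceptually, the per-frame scan looks something like the sketch below. It assumes the color bar has already been read out of one captured video frame as a list of color names, one per scanline. A real extractor would work on pixel values and tolerate capture noise; this is not FCAT’s actual code.

```python
from typing import List, Tuple


def find_segments(bar_column: List[str]) -> List[Tuple[str, int, int]]:
    """Split one captured frame's color bar into runs of identical color.

    Returns (color, first_scanline, scanline_count) for each run, top to bottom.
    """
    segments = []
    start = 0
    for line in range(1, len(bar_column) + 1):
        # Close the current run at the end of the column or where the color changes.
        if line == len(bar_column) or bar_column[line] != bar_column[start]:
            segments.append((bar_column[start], start, line - start))
            start = line
    return segments


# Hypothetical 10-scanline capture containing parts of three game frames.
column = ["red"] * 3 + ["blue"] * 5 + ["lime"] * 2
segments = find_segments(column)
print(segments)
```

Each point where the color changes marks a buffer swap that happened partway through the refresh.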
This ultimately results in a tab-separated values (TSV) file that contains a list of frames, when they occurred, the color bar they were attached to, and more. From here it is possible to then further process the data to calculate the frame intervals.
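That further processing can be as simple as the sketch below. The column names (`frame`, `time_ms`, `color`) and the sample rows are assumptions made for illustration; the real extractor output has its own layout.

```python
import csv
import io
from typing import List

# Hypothetical extractor output; the real file's columns will differ.
SAMPLE_TSV = (
    "frame\ttime_ms\tcolor\n"
    "0\t0.0\twhite\n"
    "1\t16.5\tlime\n"
    "2\t21.0\tblue\n"
    "3\t50.0\tred\n"
)


def intervals_from_tsv(text: str) -> List[float]:
    """Read frame start times from TSV text and return the frame intervals (ms)."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    times = [float(row["time_ms"]) for row in rows]
    return [later - earlier for earlier, later in zip(times, times[1:])]


intervals = intervals_from_tsv(SAMPLE_TSV)
print(intervals)
```

For a file on disk, the same function would work on the file’s contents read as text.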
The end result of this process is that through the use of marking frames, capturing the output of a video card, and then analyzing that output, it is possible to objectively and quantitatively measure the output of a video card as an end-user would see it. This process doesn’t answer the subjective questions for us – mainly, how much stutter is enough to be noticed – but it gives us numbers that we can use to determine those answers ourselves.
Finally, for the purposes of this article we’ll be glossing over the analysis portion of FCAT, but we’ll quickly mention it. Along with the overlay and extractor tools, FCAT also includes a tool to analyze the output of the extractor tool, from which it can generate graphs, identify so-called “runt” frames, and more. The analysis tool is not strictly necessary to use FCAT – one can always do their own analysis – but the analysis tool does simplify the use of the suite by quickly and conveniently handling that last step of the process. We’ll get into the analysis tool in much greater detail in part 2 of our article, where we can apply it to our full suite of test results to better understand what it looks for and what it’s representing.
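For those doing their own analysis, a runt check might look like the sketch below. It assumes a runt is a frame that occupies only a small number of scanlines on screen; the 20-scanline cutoff is an arbitrary illustrative value, not the analysis tool’s setting, and part 2 will cover what the tool actually looks for.

```python
from typing import List, Tuple


def flag_runts(frames: List[Tuple[str, int]],
               min_scanlines: int = 20) -> List[Tuple[str, int, bool]]:
    """Mark frames whose on-screen height falls below an assumed cutoff.

    frames: (color, scanline_count) pairs, e.g. taken from extractor output.
    min_scanlines: illustrative threshold only, not a documented FCAT value.
    """
    return [(color, count, count < min_scanlines) for color, count in frames]


# Hypothetical frames from one captured refresh of 1080 scanlines.
flagged = flag_runts([("white", 500), ("lime", 8), ("blue", 572)])
print(flagged)
```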
More To Come
While we were unable to complete our work with FCAT ahead of NVIDIA’s embargo, we wanted to provide an article that at least gives a brief overview of FCAT, as FCAT is in many ways itself part two of a process we started yesterday with our article and analysis of stuttering on AMD cards.
FCAT, we believe, is the next evolution of frame interval benchmarking. Where FRAPS' coarse nature does not suffice, FCAT provides a clear picture of what’s happening at the end of the rendering pipeline, giving us for the first time an automated, quantitative look at frame intervals, stuttering, and more. To be clear it is by no means a perfect tool, but as we have taken the time to lay out yesterday and today, compared to the beginning of the rendering pipeline, it is the end of the rendering pipeline that is more meaningful both for quantitative analysis, and ultimately for the users.
Speaking more directly however, FCAT is quite simply the frame interval analysis tool we have long wanted. It is the tool that will enable us to analyze stuttering, micro-stuttering, and more, in a manner consistent with our benchmarking methods and core beliefs in the scientific method. It’s exceedingly rare that we say this, but we haven’t been this excited by a new benchmarking tool in a very long time.
Wrapping things up, we will be following up this article next week with part 2 in our look at FCAT. In part 2 we will go into further detail about how to analyze the results FCAT generates, and what we’re finding across a range of video cards and games, both in single-GPU and multi-GPU configurations. So until then, stay tuned.