Better AA: Dynamic Super Resolution & Multi-Frame Sampled Anti-Aliasing

On a personal note, the subject of anti-aliasing has always been near and dear to my heart. When you review video cards for a living you start to see every minor defect, and this is especially the case for jaggies and other forms of aliasing. So when new anti-aliasing modes are being introduced it is always a time of great interest.

Dynamic Super Resolution

With Maxwell 2, NVIDIA is introducing two new anti-aliasing technologies. The first of these is called Dynamic Super Resolution, and it is a sort of brute force anti-aliasing method targeted at games that do not support real anti-aliasing or do not support it well.

In the case of Dynamic Super Resolution (DSR), NVIDIA achieves anti-aliasing by rendering a frame at a resolution higher than the user’s monitor (the Super Resolution of DSR), and then scaling the image back down to the monitor’s native resolution. This process of rendering at a higher resolution and then blending pixels together when the image is scaled down results in a higher quality image that is less aliased than an image rendered at a native resolution, owing to the additional detail attained from rendering at a higher resolution.
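The underlying math is simple enough to sketch out. The snippet below is a minimal illustration of the idea, using a plain box average (as we'll get to, NVIDIA's actual filter is more involved), with the resolutions chosen purely as an example:

```python
import numpy as np

def downsample_2x(frame_hi):
    """Average each 2x2 block of a frame rendered at twice the target
    resolution into one output pixel (a simple box filter)."""
    h, w, c = frame_hi.shape
    return frame_hi.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

# A 3840x2160 render collapses to 1920x1080, so every output pixel is
# built from four rendered samples instead of one.
frame_4k = np.random.rand(2160, 3840, 3).astype(np.float32)
print(downsample_2x(frame_4k).shape)  # (1080, 1920, 3)
```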

Although NVIDIA is first introducing DSR with Maxwell 2 GPUs, the technique is actually much older than that. For enthusiasts this process is better known as downsampling, and while it has been around for years it has been relatively inaccessible to the masses due to the hacky nature of unsupported downsampling, which among other things requires tweaking settings for monitors, drivers, and games alike. As a result, while NVIDIA can't lay claim to the idea of downsampling, promoting it to a first-class feature is still a significant improvement to the process, as it brings with it the full development backing of NVIDIA and the wider accessibility that comes with that backing.

Of course it should also be noted that NVIDIA and enthusiasts aren't the only parties who have been engaging in downsampling, as game developers have also periodically added the feature directly to their games. Among our benchmarking suite, Battlefield 4, Company of Heroes 2, and Thief all support the equivalent of downsampling; BF4 and CoH2 allow the game to be internally rendered at a higher resolution, and Thief has SSAA modes that do the same thing. As a result there are already some games on the market that utilize downsampling/DSR, with the difference/advantage of NVIDIA's implementation being that it makes the technique accessible to games that do not implement it on their own.

Digging a bit deeper, the image quality advantage of downsampling/DSR is that it’s fundamentally a form of Super Sample Anti-Aliasing (SSAA). By rendering an image at a higher resolution and then scaling it down, DSR is essentially sampling each pixel multiple times, improving the resulting image quality by removing geometry, texture, and shader aliasing. And like true SSAA, DSR is going to be very expensive from a rendering standpoint – you’re potentially increasing your frame resolution by 4x – but if you have the performance to spare then DSR will be worth it, and this is the basis of NVIDIA’s inclusion of DSR as a first-class feature.

Meanwhile from an image quality standpoint DSR should be a decent but not spectacular form of SSAA. Because it's simply rendering an image at a larger size, DSR functionally uses an ordered pixel grid. For anti-aliasing purposes ordered grids are suboptimal because near-vertical and near-horizontal geometry doesn't get covered well, which is why true AA techniques will use rotated grids or sparse grids. Nonetheless, while DSR's resulting sample pattern isn't perfect, it is going to be much better than the alternative of forgoing anti-aliasing entirely.



Anti-Aliasing Example: Ordered Grid vs. Rotated Grid (Images Courtesy Beyond3D)
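To make the difference concrete, consider the sub-pixel sample positions below (made-up offsets, not any vendor's actual pattern): the ordered grid reuses just two distinct offsets per axis, while the rotated grid spreads its four samples across four distinct offsets on each axis, which is exactly what helps with near-vertical and near-horizontal edges:

```python
# 4x sample positions within a single pixel (0..1 on each axis).
ORDERED_GRID_4X = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_GRID_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

# A near-vertical edge only cares about horizontal offsets: the ordered
# grid offers 2 distinct gradations of coverage, the rotated grid 4.
print(len({x for x, _ in ORDERED_GRID_4X}))  # 2
print(len({x for x, _ in ROTATED_GRID_4X}))  # 4
```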

DSR to that end can be considered a sort of last-resort method of SSAA. For games that support proper RG/SG SSAA, those anti-aliasing methods will produce superior results. However as a number of games do not support native anti-aliasing of any kind due to the use of deferred renderers, DSR provides a way to anti-alias these games that is compatible with their rendering methods.

Moving on, under the hood NVIDIA is implementing DSR as a form of high resolution rendering combined with a 13-tap Gaussian filter. In this process NVIDIA's drivers present a game with a fake resolution higher than the actual monitor (e.g. 3840x2160 for a true 1080p monitor), have the game render to that higher resolution, and then use the Gaussian filter to blend the results down to the lower resolution. The fact that NVIDIA is using a Gaussian filter here as opposed to a simple box filter definitely raises a few eyebrows due to the potential for unwanted blurring, and this is something we will be taking a look at next week in our image quality analysis of the GTX 980.
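NVIDIA has not published the exact tap layout or weights, but a Gaussian-weighted resolve along these lines is a reasonable mental model; the 13-tap pattern and sigma below are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 13-tap layout: the centre texel, its 8 immediate neighbours,
# and 4 texels two steps away on each axis. NVIDIA has not disclosed the
# real tap positions or weights, so this is illustrative only.
TAPS = [(0, 0),
        (-1, -1), (0, -1), (1, -1),
        (-1,  0),          (1,  0),
        (-1,  1), (0,  1), (1,  1),
        (-2, 0), (2, 0), (0, -2), (0, 2)]

def gaussian_weights(sigma):
    """Gaussian weight per tap, normalised to sum to 1. A larger sigma
    blends more aggressively (i.e. a 'smoother' but blurrier result)."""
    d2 = np.array([dx * dx + dy * dy for dx, dy in TAPS], dtype=np.float32)
    w = np.exp(-d2 / (2.0 * sigma * sigma))
    return w / w.sum()

def resolve_pixel(img_hi, x, y, scale, sigma=1.0):
    """Resolve one native-resolution pixel (x, y) by Gaussian-weighting
    taps around its corresponding texel in the high-resolution render."""
    h, w, ch = img_hi.shape
    cx, cy = x * scale, y * scale
    out = np.zeros(ch, dtype=np.float32)
    for (dx, dy), wt in zip(TAPS, gaussian_weights(sigma)):
        px = min(max(cx + dx, 0), w - 1)
        py = min(max(cy + dy, 0), h - 1)
        out += wt * img_hi[py, px]
    return out
```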

In the meantime the use of downsampling in this fashion means that DSR will have a high but less-than-perfect compatibility rate. Games that can’t render at very high resolutions will not be usable with DSR, and games that render incorrectly at those resolutions will similarly be problematic. In practice many games should be able to render at 4K-like resolutions, but some fraction of those games will not know how to scale up the UI accordingly, resulting in a final UI that is too small after the image is scaled down.


Looking at the broader picture, from a marketing and product perspective DSR is another tool for NVIDIA for dealing with console ports. Games that are ported from current-gen and last-gen consoles and don't make significant (if any) use of newer GPU features will, as a rule of thumb, look little if any better on the PC than they do on their original console. This in turn leaves more powerful GPUs underutilized and provides little incentive to purchase a PC (and an NVIDIA GPU) over said consoles. But by implementing DSR, NVIDIA and NVIDIA users can gain a leg up on consoles by improving image quality through SSAA. And while this can't make up for a lack of texture or model quality, it can convincingly deal with the jaggies that would otherwise be present on both the PC and the console.

With that in mind, it should be noted that DSR is primarily geared towards users of low DPI monitors: 1080p, 900p, 1200p, etc. High DPI monitor users can simply run a game natively at 4K, at which point they likely won't have much performance left over for any further anti-aliasing anyhow. Meanwhile DSR for its part will support resolution factors of between 1.2x (1.1 x 1.1) and 4x (2 x 2), allowing the resolution used to vary depending on the desired quality level and resulting performance. From a quality perspective 4x will in turn be the best factor to use, as this is the only factor that allows for potentially clean integer scaling (think Retina display); because the other factors cannot scale cleanly, DSR also offers a smoothness control, which allows the user to control the intensity of the Gaussian filter used.
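Since the factor applies to the total pixel count, the per-axis scale is its square root. A quick back-of-the-envelope sketch for a 1080p monitor follows; note that only the 1.2x and 4x endpoints come from NVIDIA's stated range, and the intermediate factors listed are assumptions for illustration:

```python
import math

NATIVE_W, NATIVE_H = 1920, 1080           # a 1080p monitor
FACTORS = [1.20, 1.50, 2.00, 3.00, 4.00]  # endpoints per NVIDIA; middle values assumed

for f in FACTORS:
    axis = math.sqrt(f)  # factor applies to total pixels, so each axis scales by sqrt(f)
    w, h = round(NATIVE_W * axis), round(NATIVE_H * axis)
    print(f"{f:.2f}x DSR -> render at {w}x{h}, resolve to {NATIVE_W}x{NATIVE_H}")
# 1.20x -> 2103x1183 ... 4.00x -> 3840x2160 (the only clean integer scale)
```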

Meanwhile for end users NVIDIA will be exposing DSR at two points. DSR is currently implemented in the NVIDIA control panel, which allows for direct control of the scaling factor and the smoothness on a per-game basis. DSR will also be exposed in GeForce Experience, which can enable DSR for games that NVIDIA has vetted to work with the technology and that are running on computers fast enough to render at these higher resolutions.

Finally, while DSR is currently limited to Maxwell 2 video cards, NVIDIA has not-so-subtly been hinting that DSR will in time be ported to NVIDIA’s previous generation cards. The technique itself does not require any special Maxwell 2 hardware and should easily work on Kepler hardware as under the hood it’s really just a driver trick. However whether Kepler cards are fast enough to use DSR with an adequate resolution factor will be another matter entirely.

Multi-Frame Sampled Anti-Aliasing

NVIDIA’s other new anti-aliasing technology for Maxwell 2 is the unfortunately named Multi-Frame Sampled Anti-Aliasing. Whereas DSR is targeted at the quality segment of the market as a sort of last resort AA method for improving image quality, Multi-Frame Sampled Anti-Aliasing is targeted at the opposite end of the spectrum and is designed to be a more efficient form of MSAA that achieves similar results with half as many samples and half of the overhead.

Unlike DSR, Multi-Frame Sampled Anti-Aliasing requires new Maxwell 2 hardware: specifically, NVIDIA's new programmable MSAA sample pattern capability in the ROPs. This feature allows NVIDIA to dynamically alter their MSAA sample patterns, which is a key part of Multi-Frame Sampled Anti-Aliasing, and it is the reason the technique cannot easily be backported to existing hardware.

In any case, Multi-Frame Sampled Anti-Aliasing is based on the concept of changing the MSAA sample pattern every frame, in practice using a 2x (2 sample) MSAA pattern and combining the results from multiple frames to mimic a 4x (4 sample) MSAA pattern. If it's done right then you should receive results comparable to 4x MSAA at the cost of 2x MSAA.
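A heavily simplified sketch of that concept follows (this is not NVIDIA's implementation, and the sub-pixel offsets are made up): alternate between two 2-sample patterns on successive frames, then average the current frame's resolve with the previous frame's. For a static pixel the blended result is built from four distinct sample positions, just like 4x MSAA:

```python
# Two alternating 2-sample patterns; together they cover four distinct
# sub-pixel positions. Offsets are illustrative, not NVIDIA's actual patterns.
PATTERN_A = [(0.25, 0.25), (0.75, 0.75)]  # even frames
PATTERN_B = [(0.75, 0.25), (0.25, 0.75)]  # odd frames

def resolve_2x(frame_index, shade):
    """Resolve one pixel with 2x MSAA using this frame's pattern;
    shade(x, y) returns coverage/color at a sub-pixel position."""
    pattern = PATTERN_A if frame_index % 2 == 0 else PATTERN_B
    return sum(shade(x, y) for x, y in pattern) / len(pattern)

def mfaa_resolve(frame_index, shade, previous):
    """Blend this frame's 2x resolve with the previous frame's resolve;
    for a static pixel this equals a 4-sample resolve."""
    return 0.5 * (resolve_2x(frame_index, shade) + previous)

# Example: an edge covering the left half of the pixel.
shade = lambda x, y: 1.0 if x < 0.5 else 0.0
prev = resolve_2x(0, shade)            # frame N
print(mfaa_resolve(1, shade, prev))    # frame N+1: 0.5, matching a 4x resolve
```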

Once you grasp the concept of changing sample patterns, the idea is actually relatively simple. And in fact, like DSR, it has been done before in a lesser form by none other than AMD (or at the time, ATI). In 2004, with the X800 series of cards, ATI launched its Temporal Anti-Aliasing technology, which was based on the same sampling concept but importantly without any kind of frame combining/blending. Over the years Temporal AA never did see much use, and it was ultimately discontinued by AMD.


Compare & Contrast: AMD's Discontinued Temporal AA

What sets Multi-Frame Sampled Anti-Aliasing apart from Temporal AA and similar efforts – and why NVIDIA thinks they will succeed where AMD failed – is the concept of temporal reprojection, or as NVIDIA calls it, their temporal synthesis filter. By reusing pixels from a previous frame as pseudo-MSAA samples, the resulting frame can more closely match true 4x MSAA thanks to the presence of multiple samples. The trick is that you can't simply reuse the entire last frame, as this would produce a much less jagged image that also suffers from incredible motion blur. For this reason the proper/best form of temporal reprojection requires figuring out which specific pixels to reproject and which to discard.
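NVIDIA hasn't detailed how its temporal synthesis filter makes that decision, but engine-side temporal AA commonly handles it along these lines: reproject each pixel into the previous frame using motion vectors, then reject the history sample wherever it no longer resembles the current frame (disocclusions, fast motion). A rough sketch, with an arbitrarily chosen rejection threshold:

```python
import numpy as np

def temporal_resolve(curr, prev, motion, threshold=0.1):
    """curr/prev: HxWxC frames; motion: HxWx2 per-pixel motion vectors
    (in pixels). Reproject the previous frame, reject mismatched history,
    and blend what survives with the current frame."""
    h, w, _ = curr.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Where was each pixel last frame?
    px = np.clip((xs - motion[..., 0]).round().astype(int), 0, w - 1)
    py = np.clip((ys - motion[..., 1]).round().astype(int), 0, h - 1)
    history = prev[py, px]
    # Discard history samples that differ too much from the current frame.
    mismatch = np.abs(history - curr).max(axis=-1, keepdims=True) > threshold
    history = np.where(mismatch, curr, history)
    return 0.5 * (curr + history)
```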

From an image quality standpoint, in the ideal case of a static image this would correctly result in image quality rivaling 4x MSAA. As a lack of camera motion means that the pixels being sampled never change, the samples would line up perfectly and would fully emulate 4x MSAA. However once in motion, the overall image quality is going to be heavily reliant on the quality of the temporal reprojection. In the best case scenario for motion, Multi-Frame Sampled Anti-Aliasing still will not perfectly match 4x MSAA, and in the worst case scenario it could result in 2x MSAA-like anti-aliasing, significant blurring, or both.

Multi-Frame Sampled Anti-Aliasing also has one other catch that has to be accounted for, and that's frame rates. At low framerates – below 30fps – the time between frames grows so large that temporal reprojection becomes increasingly inaccurate and the human eye picks up on the sample pattern changes, which means that this anti-aliasing technique is only usable at high frame rates. Conveniently, this plays to one of the benefits of Multi-Frame Sampled Anti-Aliasing, as the lower overhead of a 2x sample pattern makes it easier to maintain higher framerates.
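The arithmetic behind that constraint is straightforward; the 30fps cutoff itself is NVIDIA's figure, the rest is just frame-time math:

```python
# Time between the two alternating sample patterns at a given framerate.
for fps in (30, 60, 120):
    print(f"{fps} fps -> {1000 / fps:.1f} ms between pattern changes")
# 30 fps -> 33.3 ms, 60 fps -> 16.7 ms, 120 fps -> 8.3 ms
```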

For what it’s worth, while NVIDIA is the first GPU vendor to implement temporal AA with temporal reprojection in their drivers, they are not the first party to do so overall. Over the years a few different game engines have implemented AA with temporal reprojection, the most notable of which is Crytek’s CryEngine 3. In Crysis 3 temporal reprojection was implemented as part of the SMAA anti-aliasing technique. The results were effective at times, but SMAA does introduce some blurring, though this is difficult to separate from the effects of the morphological filtering in SMAA. In any case the point is that while we will reserve our final comments for our evaluation of Multi-Frame Sampled Anti-Aliasing, we are expecting that it will result in some degree of blurring compared to the 4x MSAA it is emulating.

Moving on, while Multi-Frame Sampled Anti-Aliasing can potentially be used in a number of scenarios, there are two specific scenarios NVIDIA will be targeting with the technology, both of which are performance-critical situations. The first is 4K gaming, where the strain of 8 million pixels alone leaves little room for anti-aliasing. In this case Multi-Frame Sampled Anti-Aliasing can be enabled for a relatively low performance penalty. Meanwhile NVIDIA's other usage scenario is VR headset gaming, where frame latency is critical and yet jaggies are highly visible. 4x MSAA is fully usable here, however the increase in frame rendering time may not be desirable, so Multi-Frame Sampled Anti-Aliasing would allow for similar quality without as large an increase in frame rendering times.

In both cases Multi-Frame sampled Anti-Aliasing could be enabled at the driver level, with NVIDIA’s drivers intercepting the call for MSAA and instead providing their new anti-aliasing technique. At this point we don’t know for sure what compatibility will be like, so it remains to be seen what games it will work with. NVIDIA for their part is noting that they “plan to support […] a wide range of games” with the technology.

Wrapping things up, while NVIDIA is publicly announcing Multi-Frame Sampled Anti-Aliasing at this time and has shown it to the press, it is not in shipping condition yet and is unavailable in NVIDIA's current driver set. NVIDIA is still classifying it as an upcoming technology, so there is currently no set date or ETA for when it will finally be shipped to GTX 900 series owners.

Comments

  • garadante - Sunday, September 21, 2014 - link

    What might be interesting is doing a comparison of video cards for a specific framerate target to (ideally, perhaps it wouldn't actually work like this?) standardize the CPU usage and thus CPU power usage across greatly differing cards. And then measure the power consumed by each card. In this way, couldn't you get a better example of
  • garadante - Sunday, September 21, 2014 - link

    Whoops, hit tab twice and it somehow posted my comment. Continued:

    couldn't you get a better example of the power efficiency for a particular card and then meaningful comparisons between different cards? I see lots of people mentioning how the 980 seems to be drawing far more watts than its rated TDP (and I'd really like someone credible to come in and state how heat dissipated and energy consumed are related. I swear they're the exact same number as any energy consumed by transistors would, after everything, be released as heat, but many people disagree here in the comments and I'd like a final say). Nvidia can slap whatever TDP they want on it and it can be justified by some marketing mumbo jumbo. Intel uses their SDPs, Nvidia using a 165 watt TDP seems highly suspect. And please, please use a nonreference 290X in your reviews, at least from a comparison standpoint. Hasn't it been proven that having cooling that isn't garbage and runs the GPU closer to high 60s/low 70s can lower power consumption (due to leakage?) something on the order of 20+ watts with the 290X? Yes there's justification in using reference products but let's face it, the only people who buy reference 290s/290Xs were either launch buyers or people who don't know better (there's the blower argument but really, better case exhaust fans and nonreference cooling destroy that argument).

    So basically I want to see real, meaningful comparisons of efficiencies for different cards at some specific framerate target to standardize CPU usage. Perhaps even monitoring CPU usage over the course of the test and reporting average, minimum, peak usage? Even using monitoring software to measure CPU power consumption in watts (as I'm fairly sure there are reasonably accurate ways of doing this already, as I know CoreTemp reports it, and it's probably just voltage*amperage, but correct me if I'm wrong) and reporting average, minimum, and peak usage again would be handy. It would be nice to see if Maxwell is really twice as energy efficient as GCN1.1 or if it's actually much closer. If it's much closer all these naysayers prophesying AMD's doom are in for a rude awakening. I wouldn't put it past Nvidia to use marketing language to portray artificially low TDPs.
  • silverblue - Sunday, September 21, 2014 - link

    Apparently, compute tasks push the power usage way up; stick with gaming and it shouldn't.
  • fm123 - Friday, September 26, 2014 - link

    Don't confuse TDP with power consumption, they are not the same thing. TDP is for designing the thermal solution to maintain the chip temperature. If there is more headroom in the chip temperature, then the system can operate faster, consuming more power.

    "Intel defines TDP as follows: The upper point of the thermal profile consists of the Thermal Design Power (TDP) and the associated Tcase value. Thermal Design Power (TDP) should be used for processor thermal solution design targets. TDP is not the maximum power that the processor can dissipate. TDP is measured at maximum TCASE"

    https://www.google.com/url?sa=t&source=web&...
  • NeatOman - Sunday, September 21, 2014 - link

    I just realized that the GTX 980 has a TDP of 165 watts, so my Corsair CX430 watt PSU is almost overkill! That's nuts. That's even enough room to give the whole system a very good stable overclock. Right now I have a pair of HD 7850's @ stock speed and an FX-8320 @ 4.5GHz, good thing the Corsair puts out over 430 watts perfectly clean :)
  • Nfarce - Sunday, September 21, 2014 - link

    While a good power supply, you are leaving yourself little headroom with 430W. I'm surprised you are getting away with it with two 7850s and not experiencing system crashes.
  • ET - Sunday, September 21, 2014 - link

    The 980 is an impressive feat of engineering. Fewer transistors, fewer compute units, less power and better performance... NVIDIA has done a good job here. I hope that AMD has some good improvements of its own up its sleeve.
  • garadante - Sunday, September 21, 2014 - link

    One thing to remember is they probably save a -ton- of die area/transistors by giving it only what, 1/32 double precision rate? I wonder how competitive in terms of transistors/area an AMD GPU would be if they gutted double precision compute and went for a narrower, faster memory controller.
  • Farwalker2u - Sunday, September 21, 2014 - link

    I am looking forward to your review of the GTX 970 once you have a compatible sample in hand.
    I would like to see the results of the Folding @Home benchmarks. It seems that this site is the only one that consistently uses that benchmark in its reviews.

    As a "Folder" I'd like to see any indication that the GTX 970, at a cost of $330 and drawing fewer watts than a GTX 780, may outproduce both the 780 ($420 - $470) and the 780Ti ($600). I will be studying the Folding @ Home: Explicit, Single Precision chart which contains the test results of the GTX 970.
  • Wolfpup - Monday, September 22, 2014 - link

    Wow, this is impressive stuff. 10% more performance from 2/3 the power? That'll be great for desktops, but of course even better for notebooks. Very impressed they could pull off that kind of leap on the same process!

    They've already managed to significantly bump up the top end mobile part from GTX 680 -> 880, but within a year or so I bet they can go quite a bit higher still.

    Oh well, it was nice having a top of the line mobile GPU for a while LOL

    If 28nm hit in 2012 though, doesn't that make 2015 its third year? At least 28nm seems to be a really good process, vs all the issues with 90/65nm, etc., since we're stuck on it so long.

    Isn't this Moore's Law hitting the constraints of physical reality though? We're taking longer and longer to get to progressively smaller shrinks in die size, it seems like...

    Oh well, 22nm's been great with Intel and 28's been great with everyone else!
