Software, Cont: ShadowPlay and "Reason Flags"

Along with providing the game optimization service and SHIELD’s PC client, GeForce Experience is scheduled to gain another service this summer. That service is called ShadowPlay, and not unlike SHIELD it’s intended to serve as a novel piece of software built around some of the hardware functionality present in NVIDIA’s latest GPUs.

ShadowPlay will be NVIDIA’s take on video recording, with the novel aspect being that NVIDIA is basing the utility around Kepler’s hardware H.264 encoder. To be straightforward, video recording software is nothing new; FRAPS, Afterburner, Precision X, and other utilities all do basically the same thing. However, all of those utilities work entirely in software, fetching frames from the GPU and then encoding them on the CPU. The overhead from this is not insignificant, especially due to the CPU time required for video encoding.

With ShadowPlay NVIDIA is looking to spur on software developers by getting into video recording themselves, and to provide superior performance by using hardware encoding. Notably, this isn’t something that was impossible prior to ShadowPlay, but for whatever reason recording utilities that use NVIDIA’s hardware H.264 encoder have been few and far between. Regardless, the end result should be that most of the overhead is removed by relying on the hardware encoder: the GPU is minimally affected, the CPU is freed up, less time is spent transferring data back to the CPU, and the resulting recordings are much smaller, all at the same time.
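
As a rough illustration of the difference between the two approaches (and not a description of ShadowPlay’s actual implementation, which NVIDIA hasn’t detailed), the sketch below pipes raw frames to FFmpeg and lets a GPU encoder do the H.264 work. The h264_nvenc encoder and an FFmpeg build with NVENC support are assumptions about the local tooling; swapping in libx264 gives the CPU-bound path the existing utilities take.

```python
# Illustrative only: pipe raw frames to FFmpeg and let it hand H.264
# encoding to the GPU (h264_nvenc) instead of the CPU (libx264).
# Assumes an FFmpeg build with NVENC support is on the PATH.
import subprocess

WIDTH, HEIGHT, FPS = 1920, 1080, 60
ENCODER = "h264_nvenc"   # swap in "libx264" to see the CPU-bound path

ffmpeg = subprocess.Popen(
    ["ffmpeg", "-y",
     "-f", "rawvideo", "-pix_fmt", "bgra",          # raw frames arrive on stdin
     "-s", f"{WIDTH}x{HEIGHT}", "-r", str(FPS),
     "-i", "-",
     "-c:v", ENCODER, "capture.mp4"],
    stdin=subprocess.PIPE,
)

# Feed a couple of seconds of dummy frames; a real capture utility would
# feed frames grabbed from the game here instead.
blank_frame = bytes(WIDTH * HEIGHT * 4)
for _ in range(FPS * 2):
    ffmpeg.stdin.write(blank_frame)

ffmpeg.stdin.close()
ffmpeg.wait()
```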

ShadowPlay will feature multiple modes. Its manual mode will be analogous to FRAPS, recording whenever the user desires it. The second mode, shadow mode, is the more peculiar of the two. Because the overhead of recording with the hardware H.264 encoder is so low, NVIDIA wants to simply record everything in a very DVR-like fashion. In shadow mode the utility keeps a rolling window of the last 20 minutes of footage, the goal being that should something happen that the user decides they want to record after the fact, they can simply pull it out of the ShadowPlay buffer and save it. It’s perhaps a bit odd from the perspective of someone who doesn’t regularly record their gaming sessions, but it’s definitely a novel use of NVIDIA’s hardware H.264 encoder.
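
Conceptually, shadow mode amounts to a rolling buffer of encoded video that continuously discards its oldest data. The sketch below is purely illustrative; the 20-minute window is the only figure taken from NVIDIA’s description, while the segment granularity and class layout are assumptions.

```python
# Illustrative sketch of a DVR-style rolling window: keep only the most
# recent WINDOW_SECONDS of encoded video segments, dropping the oldest
# as new ones arrive. Not NVIDIA's implementation.
from collections import deque

WINDOW_SECONDS = 20 * 60      # 20-minute shadow buffer
SEGMENT_SECONDS = 2           # assumed granularity of encoded chunks

class ShadowBuffer:
    def __init__(self):
        max_segments = WINDOW_SECONDS // SEGMENT_SECONDS
        self.segments = deque(maxlen=max_segments)   # oldest chunks fall off

    def push(self, encoded_chunk: bytes) -> None:
        """Called every SEGMENT_SECONDS with freshly encoded video."""
        self.segments.append(encoded_chunk)

    def save(self, path: str) -> None:
        """User hits the hotkey: dump whatever is currently buffered."""
        with open(path, "wb") as f:
            for chunk in self.segments:
                f.write(chunk)
```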

NVIDIA hasn’t begun external beta testing of ShadowPlay yet, so for the moment all we have to work from is screenshots and descriptions. The big question right now is what the resulting quality will be like. NVIDIA’s hardware encoder does have some limitations that come with real-time encoding, and as we’ve seen in past qualitative comparisons between NVIDIA’s encoder and offline H.264 encoders like x264, there is a quality tradeoff when everything has to be done in hardware in real time. As such ShadowPlay may not be the best tool for reference-quality productions, but for the YouTube/Twitch.tv generation it should be more than enough.

Anyhow, ShadowPlay is expected to be released sometime this summer. But since 95% of the software ShadowPlay requires is also required for the SHIELD client, we wouldn’t be surprised if ShadowPlay were released shortly after a release-quality version of the SHIELD client is pushed out, which may come as early as June alongside the SHIELD release.

Reasons: Why NVIDIA Cards Throttle

The final software announcement from NVIDIA to coincide with the launch of the GTX 780 isn’t a software product in and of itself, but rather an expansion of NVIDIA’s third-party hardware monitoring API.

One of the common questions/complaints about GPU Boost that NVIDIA has received over the last year is why a card isn’t boosting as high as it should be, or why it suddenly drops down a boost bin or two for no apparent reason. For technically minded users who know the various cards’ throttle points and specifications this isn’t too complex (just check power consumption, GPU load, and temperature), but that’s a bit much to ask of most users. So starting with the recently released 320.14 drivers, NVIDIA is exposing a selection of flags through their API that indicate which throttle point is causing throttling or otherwise holding back the card’s clockspeed. There isn’t an official name for these flags, but “reasons” is as good as anything else, so that’s what we’re going with.

The reason flags are a simple set of five binary flags that NVIDIA’s driver uses to indicate why it isn’t increasing the clockspeed of the card further. These flags are as follows (a short decoding sketch follows the list):

  • Temperature Limit – The card is at its temperature throttle point
  • Power Limit – The card is at its global power/TDP limit
  • Voltage Limit – The card is at its highest boost bin
  • Overvoltage Max Limit – The card is at its absolute maximum voltage limit (“if this were to occur, you’d be at risk of frying your GPU”)
  • Utilization Limit – The current workload is not high enough that boosting is necessary
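
As a rough idea of how a third-party utility might consume such a bitfield, here is a minimal sketch. The flag names, bit positions, and decode helper below are invented for illustration; they are not NVIDIA’s actual API identifiers.

```python
# Hypothetical decoding of a five-bit "reasons" bitfield, as a monitoring
# utility might present it. Flag names and bit positions are illustrative,
# not NVIDIA's actual API identifiers.
from enum import IntFlag

class LimitReason(IntFlag):
    TEMPERATURE     = 1 << 0   # at the temperature throttle point
    POWER           = 1 << 1   # at the global power/TDP limit
    VOLTAGE         = 1 << 2   # at the highest boost bin
    OVERVOLTAGE_MAX = 1 << 3   # at the absolute maximum voltage limit
    UTILIZATION     = 1 << 4   # workload too light to need boosting

def describe(reasons: int) -> str:
    """Turn the raw bitfield into a human-readable list of active limits."""
    active = [flag.name for flag in LimitReason if reasons & flag]
    return ", ".join(active) if active else "no limit active"

# Example: a card sitting at both its temperature and power limits.
print(describe(LimitReason.TEMPERATURE | LimitReason.POWER))
# -> "TEMPERATURE, POWER"
```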

As these are simple flags, it’s up to third-party utilities to decide how they want to present them. EVGA’s Precision X, NVIDIA’s utility of choice for demonstrating new features to the press, simply logs the flags alongside the rest of its hardware monitoring data, and this is likely what most programs will do.

With the reason flags NVIDIA is hoping to help users better understand why their card isn’t boosting as high as they’d like. At the same time, the prevalence of GPU Boost 2.0 and its much heavier reliance on temperature makes exposing this data all the more helpful, especially for overclockers who would like to know which limit they need to raise to unlock more performance.

Comments (155)

  • just4U - Thursday, May 23, 2013 - link

    I love the fact that they’re using the cooler they used for the Titan. While I plan to wait (no need to upgrade right now), I’d like to see more of that. It’s a feature I’d pay for from both Nvidia and AMD.
  • HalloweenJack - Thursday, May 23, 2013 - link

    No compute with the GTX 780 - the DP is similar to a GTX 480 and way, way down on a 7970. No folding on these then.
  • BiffaZ - Friday, May 24, 2013 - link

    Folding doesn’t use DP currently, it’s SP, same for most @home type compute apps, the main exception being Milkyway@Home which needs DP a lot.
  • boe - Thursday, May 23, 2013 - link

    Bring on the DirectCU version and I'll order 2 today!
  • slickr - Thursday, May 23, 2013 - link

    At $650 it’s way too expensive. Two years ago this card would have been $500 at launch, and within 4-5 months it would have been $400, with the slower cut-down version at $300 and mid-range cards at $200.

    I hope people aren’t stupid enough to buy this overpriced card that only brings about 5fps more than AMD’s top end single card.
  • chizow - Thursday, May 23, 2013 - link

    I think if it had launched last year, its price would have been more justified, but Nvidia sat on it for a year while they propped up the mid-range GK104 as flagship. Very disappointing.

    Measured on its own merits, GTX 780 is very impressive and probably worth the increase over previous flagship price points. For example, it’s generally 80% faster than the GTX 580 and almost 100% faster than the GTX 480, its predecessors. In the past the increase might only be ~60-75%, improving somewhat with driver gains. It also adds some bling and improvements with the cooler.

    It's just too late imo for Nvidia to ask those kinds of prices, especially after lying to their fanbase about GK104 always slotted as Kepler flagship.
  • JPForums - Thursday, May 23, 2013 - link

    I love what you are doing with frame time deltas. Some sites don’t quite seem to understand that you can maintain low maximum frame times while still introducing stutter (especially in the simulation time counter) by having large deltas between frames. In the worst case, your simulation time can slow down (or speed up) while your frame time moves back in the opposite direction, exaggerating the result.

    Admittedly I may be misunderstanding your method, as I’m much more accustomed to seeing algebraic equations describing the method, but assuming I get it, I’d like to suggest a further modification to your method to deal with performance swings that occur expectedly (transition to/from cut-scenes, arrival/departure of graphically intense elements, etc.). Rather than compare the average of the delta between frames against an average frame time across the entire run, you could compare instantaneous frame time against a sliding window average. The window could be large for games with consistent performance and smaller for games with mood swings.

    Using percentages when comparing against the average frame time for the entire run can result in situations where two graphics solutions with the exact same deltas would show the one with better performance having worse deltas. As an example, take any video card’s frame time graph, subtract 5ms from each frame time, and compare the two resulting delta percentages. A sliding window accounts for natural performance deviations while still giving a baseline to compare frame time swings from. If you are dead set on percentages, you can take them from there, as the delta percentages from local frame time averages are more relevant than the delta percentage from the run’s overall average.

    Given my love of number manipulation, though, I’d still prefer to see the absolute frame time difference from the sliding window average. It would make it much easier for me to see whether the difference to the windowed average is large (let’s say >15ms) or small (say <4ms). Of course, while I’m being demanding, it would be nice to get an xls, csv, or some other format of file with the absolute frame times so I can run whatever graph I want to see myself. I won’t hold my breath. Take some of my suggestions, all of them, or none of them. I’m just happy to see where things are going.
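
For what it’s worth, the sliding-window comparison described above can be sketched in a few lines. This is only an illustration of the commenter’s suggestion (the window size and example data are arbitrary), not the methodology used in the article:

```python
# Illustrative: compare each frame time against a local (sliding window)
# average instead of the run-wide average, as the comment suggests.
# Window size and input data are arbitrary.

def windowed_deltas(frame_times_ms, window=21):
    """Return |frame time - local average| for each frame, in ms."""
    half = window // 2
    deltas = []
    for i, ft in enumerate(frame_times_ms):
        lo = max(0, i - half)
        hi = min(len(frame_times_ms), i + half + 1)
        local_avg = sum(frame_times_ms[lo:hi]) / (hi - lo)
        deltas.append(abs(ft - local_avg))
    return deltas

# Example: a steady 16.7 ms run with one 40 ms hitch stands out clearly,
# even though the run-wide average barely moves.
times = [16.7] * 30 + [40.0] + [16.7] * 30
print(max(windowed_deltas(times)))
```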
  • Arnulf - Thursday, May 23, 2013 - link

    The correct metric for this comparison would be die size (area) and complexity of manufacturing rather than the number of transistors.

    RAM modules contain far more transistors (at least a couple of transistors per bit, with common 4 GB = 32 Gb = 64+ billion transistor sticks selling for less than $30 on Newegg), yet cost peanuts compared to this overpriced abomination that is the 780.
  • marc1000 - Thursday, May 23, 2013 - link

    and GTX 760 ??? what will it be? will it be $200??

    or maybe the 660 will be rebranded as 750 and go to $150??
  • kilkennycat - Thursday, May 23, 2013 - link

    Fyi: EVGA offers "Superclocked" versions of the GTX 780 with either an EVGA-designed "ACX" dual-open-fan cooler or the NVIDIA-designed "Titan" blower. Both, at $659, are ~$10 more than the default-speed version. The overclocks are quite substantial: 941MHz base, 993MHz boost (vs. default 863/902) for the "Titan" blower version, and 967/1020 for the ACX-cooler version. The ACX cooler is likely to be noisier than the "Titan", plus it will dump some exhaust heat back into the computer case. Both of these EVGA Superclocked types were available for a short time on Newegg this morning, now "Auto Notify" :-( :-(
