NVIDIA has always been the underdog when it comes to video processing features on its GPUs. For years ATI had dominated the market, being the first of the two to really take video decode quality and performance into account on its GPUs. ATI also maintained a significant lead over NVIDIA when it came to bringing TV to your PC: its All-in-Wonder series, although now defunct, offered a much better time shifting/DVR experience than anything NVIDIA managed to muster up, and NVIDIA's answers usually arrived too late anyway. Obviously these days most third party DVR applications have been made obsolete by the advent of Microsoft's Media Center 10-foot UI, but when the competition was tough, ATI was truly on top.

While NVIDIA eventually broadened its focus beyond 3D performance, it always seemed to be one step behind ATI when it came to video processing and decoding features. More recently, ATI was the first to offer H.264 decode acceleration on its GPUs, at the end of 2005.

NVIDIA has remained mostly quiet throughout much of ATI's dominance of the video market, but for the first time in recent history, NVIDIA actually beat ATI to the punch in implementing a new video-related feature. With the launch of its GeForce 8600 and 8500 GPUs, NVIDIA became the first to offer 100% GPU-based decoding of H.264 content. While we can assume that ATI will offer the same in its next-generation graphics architecture, the fact of the matter is that NVIDIA was first, and you can actually buy these cards today with full H.264 decode acceleration.

We've taken two looks at the 3D gaming performance of NVIDIA's GeForce 8600 series and come away relatively unimpressed, but for those interested in watching HD-DVD/Blu-ray content on their PCs, does NVIDIA's latest mid-range offering have any redeeming qualities?

Before we get to the performance tests, it's important to understand what the 8600/8500 are capable of and what they aren't. You may remember this slide from our original 8600 article:

The blocks in green illustrate which stages in the H.264 decode pipeline are now handled completely by the GPU, and you'll note that this overly simplified decode pipeline indicates that the GeForce 8600 and 8500 do everything. Adding CAVLC/CABAC decode acceleration was the last major step in offloading H.264 processing from the host CPU, and it simply wasn't done in the past because of die constraints and transistor budgets. As you'll soon see, without CAVLC/CABAC decode acceleration, high-bitrate H.264 streams can still eat up close to 100% of a Core 2 Duo E6320; with the offload, things get far more reasonable.
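
If you want a rough picture of that CPU impact on your own machine, the short Python sketch below simply samples overall CPU utilization once per second while a player is running, similar in spirit to watching Task Manager or perfmon during playback. It is only an illustration of the measurement, not our test methodology; it assumes the third-party psutil package, and the sample interval and duration are arbitrary.

```python
# Minimal CPU-utilization logger: run this while playing back an H.264 clip
# to see roughly how much headroom the decode path leaves on your CPU.
# Assumes the third-party "psutil" package (pip install psutil).
import time
import psutil

def log_cpu_usage(duration_s: int = 60, interval_s: float = 1.0) -> None:
    samples = []
    end = time.time() + duration_s
    while time.time() < end:
        # cpu_percent blocks for interval_s and returns the average
        # utilization across all cores over that window.
        pct = psutil.cpu_percent(interval=interval_s)
        samples.append(pct)
        print(f"CPU: {pct:5.1f}%")
    if samples:
        print(f"avg {sum(samples) / len(samples):.1f}%  peak {max(samples):.1f}%")

if __name__ == "__main__":
    log_cpu_usage()
```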

The GeForce 8600 and 8500 have a new video processor (which NVIDIA simply calls VP2) that runs at a higher clock rate than its predecessor. Couple that with a new bitstream processor (BSP) to handle CAVLC/CABAC decoding, and these two GPUs can now handle the entire H.264 decode pipe. A third unit that wasn't present in previous GPUs also makes an appearance in the 8600/8500: the AES128 engine. The AES128 engine is simply used to decrypt the content sent from the CPU, as per the AACS specification, which helps further reduce CPU overhead.
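
For a sense of the kind of work the AES128 engine takes off the CPU, here is a hedged Python sketch that just decrypts a buffer with AES-128. It is purely illustrative: the key, IV, and CBC mode are placeholders rather than the actual AACS key ladder or the bus-encryption scheme NVIDIA uses, and it assumes the third-party cryptography package.

```python
# Illustrative only: the sort of AES-128 work the AES128 engine offloads.
# The key, IV, and CBC mode are placeholders, NOT the real AACS scheme.
# Assumes the third-party "cryptography" package (pip install cryptography).
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def decrypt_block(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Decrypt a 16-byte-aligned buffer with AES-128 in CBC mode."""
    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    return decryptor.update(ciphertext) + decryptor.finalize()

if __name__ == "__main__":
    key = os.urandom(16)   # placeholder 128-bit content key
    iv = os.urandom(16)    # placeholder initialization vector
    plaintext = b"compressed video payload bytes.."  # exactly 32 bytes
    # Encrypt once here just so the demo has something to decrypt.
    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    assert decrypt_block(key, iv, ciphertext) == plaintext
```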

Note that the offload NVIDIA has built into the G84/G86 GPUs is hardwired for H.264 decoding only; you get none of the benefit for MPEG-2 or VC1 encoded content. Admittedly H.264 is the most strenuous of the three, but given that VC1 content is still quite prevalent among HD-DVD titles, the acceleration would be nice to have there as well. Also note that as long as your decoder supports NVIDIA's VP2/BSP, any H.264 content will be accelerated. For MPEG-2 and VC1 content, the 8600 and 8500 can only handle inverse transform, motion compensation and in-loop deblocking, and the rest of the pipe is handled by the host CPU; older VP1-based NVIDIA hardware only handles motion compensation and in-loop deblocking. ATI's current GPUs can handle inverse transform, motion compensation and in-loop deblocking, so they should in theory have lower CPU usage than the older NVIDIA GPUs on this type of content.
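
The comparison above boils down to a small lookup of which stages each part offloads for each codec. The Python sketch below records only what the paragraph states as plain data (stage names are shorthand for the deliberately simplified pipeline), with a helper that lists what remains on the host CPU.

```python
# Decode stages offloaded to the GPU, per the description above. Stage names
# are shorthand; "entropy decode" means CAVLC/CABAC bitstream decoding.
PIPELINE = ["entropy decode", "inverse transform",
            "motion compensation", "in-loop deblocking"]

OFFLOADED = {
    ("GeForce 8600/8500", "H.264"):      set(PIPELINE),  # full pipe on GPU
    ("GeForce 8600/8500", "MPEG-2/VC1"): {"inverse transform",
                                          "motion compensation",
                                          "in-loop deblocking"},
    ("NVIDIA VP1 hardware", "MPEG-2/VC1"): {"motion compensation",
                                            "in-loop deblocking"},
    ("Current ATI GPUs", "MPEG-2/VC1"):  {"inverse transform",
                                          "motion compensation",
                                          "in-loop deblocking"},
}

def cpu_stages(part: str, codec: str) -> list[str]:
    """Return the pipeline stages the host CPU still has to run."""
    return [s for s in PIPELINE if s not in OFFLOADED[(part, codec)]]

if __name__ == "__main__":
    for (part, codec) in OFFLOADED:
        print(f"{part:22s} {codec:11s} CPU does: {cpu_stages(part, codec)}")
```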

It's also worth noting that the new VP2, BSP and AES128 engines are only present in NVIDIA's G84/G86 GPUs, which are currently only used on the GeForce 8600 and 8500 cards. GeForce 8800 owners are out of luck, but NVIDIA never promised this functionality for the 8800 series, so there are no broken promises. The next time NVIDIA re-spins its high-end silicon we'd expect to see similar functionality there, but we're guessing that won't be for quite some time.


64 Comments


  • Spoelie - Sunday, April 29, 2007 - link

    Hi,
    I've been intrigued by the impact on video playback for a while now, and there are some questions that've been bothering me. Some of these I think can only be answered by the NVIDIA/ATi driver teams, but here goes anyway.

    Is the GPU-assisted decoding H.264 spec compliant? In essence, does it have bit-identical output to the reference decoder? I was under the impression (from reading doom9) that currently no GPU-assisted decoding supports deblocking, an essential part of the spec. However, this was before the release of the 8000 series, so that may have changed.

    Also, with YV12 being the colorspace of all mpeg codecs, what is the best way to proceed for best quality? What are the YV12->RGB32 colorspace conversion algorithms of the video card, and how do they compare to e.g. ffdshow's high precision conversion? Converting colorspace as late as possible improves cpu performance, since there are fewer bits to move around and process. More of this stuff: http://forum.doom9.org/showthread.php?t=106111

    Lastly, can't something be done about resizing quality of overlays? This should be a driver thingy of our videocards. Current resizing is some crude bicubic form that produces noticeable artifacts (stairstepping in lines and blocks in gradients and uniform colors). Well, noticeable on lcd screens, crts have a tendency to hide them. There are a lot better algorithms like spline and lanczos. Again, you can do this in ffdshow in software, but this bumps up cpu usage from ~10% to ~80%, just for having a decent resizer. Supporting this in hardware would be nice.
  • othercents - Sunday, April 29, 2007 - link

    Did you all test this on Windows XP? I have business reasons why I can't upgrade yet, and I wanted to know what the performance difference is between XP and Vista with HD-DVD, and whether it actually works. I already have the computer connected to my TV and a TV tuner card, so getting the 8600 is next on my list if it works with Windows XP.

    I am also going to be interested to see if AMD/ATI has the same results with their new video cards, especially since I'm not impressed with the gaming side of the 8600 cards.

    Other
  • Bladen - Saturday, April 28, 2007 - link

    On page 5 under the second picture is this text:
    "Maximum CPU utilization is a bit higher but still less than 30%. Again, note how the 8600 GTS is slightly faster than the 8600 GT in PowerDVD."

    Yet the picture shows the GTS as having a higher CPU usage (in the first and second pics).
  • JarredWalton - Saturday, April 28, 2007 - link

    Sorry, bad edit by me. I made a text addition, and after the results in Yozakura my brain didn't register that the GTS had higher CPU usage this time.
  • kelmerp - Saturday, April 28, 2007 - link

    Will there be, or is there now, an AGP version of this board? I have an old Athlon XP 3200 with a GeForce 6600GT card acting as my main HTPC. I can play most HD material, though it can get a little choppy every now and then, and I can't play back 1080p material. I'm curious how upgrading to this card would affect my setup.

    Thank you, and keep up the good work.
  • irusun - Sunday, April 29, 2007 - link

    I'd also like to see more testing with older CPUs with the upcoming review of the 8500.

    Many people use this kind of card with an *older* PC for their HTPC setup. With one of these cards, how low can you go on the CPU and still be able to play back HDTV smoothly? Could you be watching an HDTV movie and recording HDTV programming at the same time? That kind of info would be really informative to see in a review!

    Thanks, and keep up the good work.
  • phusg - Saturday, April 28, 2007 - link

    Any chance this technology will work over the AGP bus? Can you tell us how much bandwidth is being used during the H.264 decoding?

    I've been looking at the ATI/AMD X1950 for my AGP HTPC, but in several reviews of the AGP cards there were problems with the H.264 decoding.

    I've also tried a legal copy of the CoreAVC software codec, but that isn't the solution I was hoping it would be; it seems fairly buggy (at least on my aging 2GHz Athlon XP system).

    I think an AGP 8500 would be a very popular upgrade amongst AGP HTPC owners like myself.
  • DerekWilson - Saturday, April 28, 2007 - link

    NVIDIA has confirmed to us that AGP8x does not have enough bandwidth to handle H.264 content.

    The problem isn't total bandwidth, or even upstream/downstream bandwidth as I understand it.

    The problem is that there is a need for two-way communication between the GPU and the CPU during H.264 playback, and the AGP bus must stop downstream communication in order to allow upstream communication. The frequency with which the GPU needs to talk upstream causes too much latency and reduces effective bandwidth.

    At least, this is what I got from my convo with NVIDIA ...
  • JarredWalton - Saturday, April 28, 2007 - link

    At this point it's all hypothetical. Until NVIDIA or AMD releases an AGP card/GPU with full H.264 decoding support, we really can't say how it will perform. It seems possible that the technology makes heavier use of upstream (i.e. GPU to RAM/CPU) bandwidth, in which case PCIe might actually be a requirement for acceptable performance.
  • kmmatney - Saturday, April 28, 2007 - link

    Funny how ATI does worse with the hardware decode. Is the CPU utilization a truly meaningful figure? For instance, I can run SETI on my system and it will use 100% of the CPU, but it will also give up any CPU power as soon as any other program needs it, as it's all low priority. It seems like the best way to figure out the true resources needed would be to run another benchmark while decoding a movie.
