Original Link: http://www.anandtech.com/show/2220
NVIDIA GeForce 8600: Full H.264 Decode Acceleration
by Anand Lal Shimpi on April 27, 2007 4:34 PM EST
NVIDIA has always been the underdog when it comes to video processing features on its GPUs. For years ATI had dominated the market, being the first of the two to really take video decode quality and performance into account on its GPUs. Although now defunct, ATI maintained a significant lead over NVIDIA when it came to bringing TV to your PC. ATI's All-in-Wonder series offered a much better time shifting/DVR experience than anything NVIDIA managed to muster up, and NVIDIA's efforts usually arrived too late on top of that. Obviously these days most third party DVR applications have been made obsolete by the advent of Microsoft's Media Center 10-ft UI, but when the competition was tough, ATI was truly on top.
While NVIDIA eventually focused on more than just 3D performance with its GPUs, NVIDIA always seemed to be one step behind ATI when it came to video processing and decoding features. More recently, ATI was first to offer H.264 decode acceleration on its GPUs at the end of 2005.
NVIDIA has remained mostly quiet throughout much of ATI's dominance of the video market, but for the first time in recent history, NVIDIA actually beat ATI to the punch on implementing a new video related feature. With the launch of its GeForce 8600 and 8500 GPUs, NVIDIA became the first to offer 100% GPU based decoding of H.264 content. While we can assume that ATI will offer the same in its next-generation graphics architecture, the fact of the matter is that NVIDIA was first and you can actually buy these cards today with full H.264 decode acceleration.
We've taken two looks at the 3D gaming performance of NVIDIA's GeForce 8600 series and came away relatively unimpressed, but for those interested in watching HD-DVD/Blu-ray content on their PCs, does NVIDIA's latest mid-range offering have any redeeming qualities?
Before we get to the performance tests, it's important to have an understanding of what the 8600/8500 are capable of doing and what they aren't. You may remember this slide from our original 8600 article:
The blocks in green illustrate what stages in the H.264 decode pipeline are now handled completely by the GPU, and you'll note that this overly simplified decode pipeline indicates that the GeForce 8600 and 8500 do everything. Adding CAVLC/CABAC decode acceleration was the last major step in offloading H.264 processing from the host CPU, and it simply wasn't done in the past because of die constraints and transistor budgets. As you'll soon see, without CAVLC/CABAC decode acceleration, high bitrate H.264 streams can still eat up close to 100% of a Core 2 Duo E6320; with the offload, things get far more reasonable.
The GeForce 8600 and 8500 have a new video processor (that NVIDIA is simply calling VP2) that runs at a higher clock rate than its predecessor. Couple that with a new bitstream processor (BSP) to handle CAVLC/CABAC decoding, and these two GPUs can now handle the entire H.264 decode pipe. A third unit that wasn't present in previous GPUs also makes an appearance in the 8600/8500: an AES128 engine. The AES128 engine is simply used to decrypt the content sent from the CPU as per the AACS specification, which helps further reduce CPU overhead.
Note that the offload NVIDIA has built into the G84/G86 GPUs is hardwired for H.264 decoding only; you get none of the benefit for MPEG-2 or VC1 encoded content. Admittedly H.264 is the more strenuous of the three, but given that VC1 content is still quite prevalent among HD-DVD titles it would be nice to have. Also note that as long as your decoder supports NVIDIA's VP2/BSP, any H.264 content will be accelerated. For MPEG-2 and VC1 content, the 8600 and 8500 can only handle inverse transform, motion compensation and in-loop deblocking and the rest of the pipe is handled by the host CPU; VP1 NVIDIA hardware only handles motion compensation and in-loop deblocking. ATI's current GPUs can handle inverse transform, motion compensation and in-loop deblocking, so they should in theory have lower CPU usage than the older NVIDIA GPUs on this type of content.
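The division of labor described above can be summarized as a small lookup. This is only an illustrative sketch of what the text states: the stage names follow the article's simplified pipeline, and the hardware labels and the `cpu_stages` helper are hypothetical names chosen for this example, not driver or API identifiers.

```python
# Decode stages in the article's simplified pipeline, in order.
STAGES = (
    "bitstream decode",
    "inverse transform",
    "motion compensation",
    "in-loop deblocking",
)

# Everything except bitstream decode.
_PARTIAL = {"inverse transform", "motion compensation", "in-loop deblocking"}
# What the text says older NVIDIA (VP1) hardware handles.
_VP1 = {"motion compensation", "in-loop deblocking"}

# Stages offloaded to the GPU, as described in the text.
# Keys are illustrative labels (assumptions), not real identifiers.
GPU_OFFLOAD = {
    ("GeForce 8600/8500", "H.264"): set(STAGES),
    ("GeForce 8600/8500", "VC1"): _PARTIAL,
    ("GeForce 8600/8500", "MPEG-2"): _PARTIAL,
    ("NVIDIA VP1", "VC1"): _VP1,
    ("NVIDIA VP1", "MPEG-2"): _VP1,
    ("ATI current", "VC1"): _PARTIAL,
    ("ATI current", "MPEG-2"): _PARTIAL,
}


def cpu_stages(hardware, codec):
    """Return the pipeline stages left to the host CPU, in pipeline order."""
    offloaded = GPU_OFFLOAD[(hardware, codec)]
    return [stage for stage in STAGES if stage not in offloaded]


if __name__ == "__main__":
    for key in GPU_OFFLOAD:
        print(key, "-> CPU handles:", cpu_stages(*key) or "nothing")
```

Combinations the text does not describe are deliberately left out of the table, so looking them up raises a `KeyError` rather than guessing.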
It's also worth noting that the new VP2, BSP and AES128 engines are only present in NVIDIA's G84/G86 GPUs, which are currently only used on the GeForce 8600 and 8500 cards. GeForce 8800 owners are out of luck, but NVIDIA never promised this functionality to 8800 owners so there are no broken promises. The next time NVIDIA re-spins its high end silicon we'd expect to see similar functionality there, but we're guessing that it won't be for quite some time.
The last time we looked at Blu-ray/HD-DVD playback on PCs we were sorely disappointed in software support, mostly because we needed to use a separate application for Blu-ray and HD-DVD playback despite similarities in the standards. Thankfully both Cyberlink and Intervideo have since introduced universal versions of their applications that support both Blu-ray and HD-DVD. Cyberlink's PowerDVD Ultra and Intervideo's WinDVD 8 support both standards through a single UI; unfortunately neither application appears to be quite ready for prime time.
Cyberlink's PowerDVD Ultra 7.3 gave us the most problems, especially with ATI hardware. The application was simply far more prone to random crashes than WinDVD 8, which was unfortunate given that it was the only one of the two that properly enabled hardware acceleration on ATI GPUs.
WinDVD 8 didn't crash nearly as much as PowerDVD Ultra 7.3, but it did give us its fair share of problems. Complete application crashes were fairly rare, but on NVIDIA hardware we'd sometimes be greeted with a green version of whatever movie we were trying to watch. There was no rhyme or reason to why it would happen, but it just did. When things worked, they worked just fine though.
If you're running 64-bit Vista, you'll probably want to avoid installing either application, as the problems we encountered were only amplified under that OS. Enabling hardware acceleration for ATI hardware under 64-bit Vista caused PowerDVD to crash any time it attempted to play back an H.264 stream, while VC1 content was totally fine. WinDVD 8 gave us the wonderful problem of throwing an error whenever we hovered over a program menu item for too long. As much as we appreciated the improvement to our reflexes, we much preferred using WinDVD under 32-bit Vista, where we could spend as much time as we wanted in the menu without running into an error.
A quick perusal of Cyberlink's and Intervideo's forums reveals that we aren't the only ones that have had issues with their software. Do keep these issues in mind if you are planning on turning your PC into a Blu-ray/HD-DVD playing powerhouse, as we're not yet at the point where you can get a truly CE experience on your PC with these applications.
It's a shame that we could only get ATI's hardware acceleration to work under PowerDVD and it's equally unfortunate that PowerDVD was so unstable because it was actually the faster of the two applications when it came to menu rendering/interaction time. Clearly both applications need work, but for our benchmarking purposes they sufficed to give us an initial look at what will be available once the bugs are fully vanquished.
We chose to test with four NVIDIA GPUs and two ATI GPUs. From NVIDIA we used the GeForce 8800 GTX, 8600 GTS, 8600 GT and the 7950 GT. The 8800 GTX and 7950 GT have the same VP as the rest of the GeForce 7 line, so they should offer fairly similar performance to everything else in NVIDIA's lineup that runs above 400MHz (remember that NVIDIA's VP stops working at core clocks below 400MHz). We included both 8600 cards to confirm NVIDIA's claim that the two 8600s would perform identically when it comes to H.264 decoding.
ATI uses its shader units to handle video decode, so there's more performance variance between GPUs. ATI only guarantees 720p or above decode acceleration on X1600 or faster GPUs and thus we included two parts in this review: a Radeon X1600 XT and a Radeon X1950 XTX; in theory the latter should be a bit better at its decode acceleration.
For our host CPU we chose the recently released Intel Core 2 Duo E6320, running at 1.86GHz with a 4MB L2 cache. As always, we reported both average and maximum CPU utilization figures. There will be some variability between numbers since we're dealing with manual measurements of CPU utilization, but you should be able to get an idea of basic trends.
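As a rough illustration of how the two reported figures relate to the raw readings, the sketch below reduces a list of CPU utilization samples to an average and a maximum. The sample values and the `summarize_cpu` helper are made up for this example; they are not measurements from the review.

```python
def summarize_cpu(samples):
    """Return (average, maximum) CPU utilization for a list of percentages."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(samples) / len(samples), max(samples)


# Hypothetical readings taken at intervals during playback (percent).
samples = [18.0, 22.0, 25.0, 19.0, 21.0]

avg, peak = summarize_cpu(samples)
print(f"average {avg:.1f}%, maximum {peak:.1f}%")
```

Because the maximum depends on a single reading, it is the more sensitive of the two to when the manual measurements happen to be taken.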
We chose three HD-DVD titles for our performance test: Yozakura (H.264), The Interpreter (H.264) and Serenity (VC1). Yozakura is a Japanese HD-DVD that continues to be the most stressful test we've encountered; even on some of the fastest Core 2 systems it will still peak at 100% CPU utilization. Keep in mind that the NVIDIA GPUs don't handle bitstream decoding for VC1, as the new BSP is hardwired for H.264 CAVLC/CABAC decode; thus our VC1 test shouldn't show any tremendous improvement thanks to the new GPUs.
We used the Microsoft Xbox 360 HD-DVD drive for all of our tests.
|System Test Configuration|
|CPU:||Intel Core 2 Duo E6320 (1.86GHz/4MB)|
|Motherboard:||ASUS P5B Deluxe|
|Chipset Drivers:||Intel 188.8.131.520|
|Hard Disk:||Seagate 7200.7 160GB SATA|
|Memory:||Corsair XMS2 DDR2-800 4-4-4-12 (1GB x 4)|
|Video Card:||NVIDIA GeForce 8800 GTX, NVIDIA GeForce 8600 GTS, NVIDIA GeForce 8600 GT, NVIDIA GeForce 7950 GT, ATI Radeon X1950 XTX, ATI Radeon X1600 XT|
|Video Drivers:||ATI Catalyst 7.4, NVIDIA ForceWare 158.16|
|Desktop Resolution:||1920 x 1080 - 32-bit @ 60Hz|
|OS:||Windows Vista Ultimate 32-bit|
The Yozakura test isn't the highest bitrate test we have, but it is the most stressful we've encountered due to how it uses the H.264 codec. Our benchmark starts at the beginning of chapter 1 and continues until the 1:45 mark.
We start off with PowerDVD and immediately we see the tremendous difference that NVIDIA's new video decode engine offers. While even the previous generation NVIDIA hardware still eats up more than a single CPU core, the 8600s average in the low 20% for CPU utilization.
All of the steps that happen outside of the green box are responsible for any remaining CPU utilization seen when playing back H.264 content on a GeForce 8600.
Why isn't the CPU utilization down to 0%? The entire H.264 decode pipeline is handled on the GPU, but NVIDIA claims that the extra 20% is simply related to processing and decrypting data off of the disk before it's passed on to the GPU. If you had an unencrypted disk, the CPU utilization should be in the single digits.
The maximum CPU utilization for these two cards is still significant, but obviously much better than the 70%+ of the competitors. Surprisingly enough, ATI's hardware actually does worse than NVIDIA's in these tests despite offloading more of the decode pipeline than the GeForce 7 or 8800.
To confirm our findings we also ran the tests under WinDVD 8, which as we mentioned before doesn't support ATI hardware acceleration so the only GPUs compared here are from NVIDIA.
NVIDIA's older hardware actually does worse under WinDVD 8 than under PowerDVD, but the 8600 does a lot better.
Maximum CPU utilization is particularly improved on the 8600s under WinDVD 8; the two never even break 24%.
Looking at the PowerDVD and WinDVD scores, it's interesting to note that while the 8600 GTS is clearly faster in PowerDVD, the two cards are basically tied under WinDVD. There is definitely room for further optimizations in PowerDVD at present, so hopefully we will get that along with bug fixes in a future update.
The Interpreter (H.264)
Our second H.264 test is The Interpreter which we've used in the past. Although it's not nearly as stressful as Yozakura, it still eats up almost all of our Core 2 Duo CPU at peak.
The BSP engine of the 8600 proves its worth once more, as average CPU utilization again drops to around 20%.
Maximum CPU utilization is a bit higher but still less than 30%. In a reversal from Yozakura, note how the 8600 GTS now has a slightly faster CPU utilization than the 8600 GT in PowerDVD.
WinDVD 8 tells a similar story: H.264 offload is absolutely necessary for good Blu-ray/HD-DVD playback.
Our final test is a VC1 test, meaning the new BSP engine remains idle in the GeForce 8600 while running this test as it is hardcoded to H.264 CAVLC/CABAC bitstreams. When decoding VC1 content, the new 8600 (and the 8500) are essentially the same as the GeForce 8800 GTX or the GeForce 7 series GPUs. While they do include support for inverse transform, that doesn't appear to make any significant difference to the strain on the CPU.
For some reason ATI's offerings continue to give us much higher CPU utilization figures. In this case it's as if hardware assist isn't working at all. We haven't been following ATI's AVIVO over the past several Catalyst revisions, so it is possible that somewhere along the line ATI broke compatibility. It could also be just one more software bug that needs to be fixed by PowerDVD. ATI's hardware is supposed to handle inverse transform while the older NVIDIA hardware does not, so in theory ATI should be producing lower CPU utilization numbers than those older GPUs in these VC1 tests.
Under WinDVD the story is no different; the new GPUs (as expected) do the same amount of decode work as the old ones and CPU utilization remains unchanged. Given that VC1 is predominantly an HD-DVD codec, the CPU utilization figures we're seeing here aren't terrible.
While NVIDIA has stated that it will look into adding a VC1 compatible BSP in future GPU revisions, it's not absolutely necessary today.
A handful of execution engines within a $150 graphics card can be faster than even some of the most powerful desktop microprocessors because they use specialized logic designed specifically for the task at hand. NVIDIA took this approach to an even greater degree by effectively making its BSP engine useful for exactly one thing: CAVLC/CABAC bitstream decoding for H.264 encoded content. Needless to say, NVIDIA's approach is not only faster than the general purpose microprocessor approach, but it should also be more power efficient.
To measure the improvement in power efficiency, we outfitted our test bed with a GeForce 8600 GT and ran the Yozakura benchmark with hardware acceleration enabled and disabled. With it enabled, the 8600 GT is handling 100% of the H.264 decode process; with it disabled the host CPU (an Intel Core 2 Duo E6320) is responsible for decoding the video stream. We measured total system power consumption at the wall outlet and reported the average and max values in Watts.
At idle, our test bed consumed 112W, and when decoding the most stressful H.264 encoded HD-DVD we've got, power consumption rose to 124.8W. Relying on the CPU alone to handle the decoding required about 8% more power, bringing the average system power usage up to 135.1W.
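The arithmetic behind those figures can be checked with a few lines. This sketch uses only the three wall-power numbers quoted above; the variable names are ours.

```python
# Average system power at the wall, in watts, as quoted in the text.
IDLE_W = 112.0
GPU_DECODE_W = 124.8   # hardware acceleration enabled (8600 GT decodes)
CPU_DECODE_W = 135.1   # hardware acceleration disabled (CPU decodes)

# Extra power needed for CPU decoding relative to GPU decoding.
extra_pct = (CPU_DECODE_W - GPU_DECODE_W) / GPU_DECODE_W * 100

# Power added on top of idle in each case.
gpu_over_idle = GPU_DECODE_W - IDLE_W
cpu_over_idle = CPU_DECODE_W - IDLE_W

print(f"CPU decode draws {extra_pct:.1f}% more than GPU decode")
print(f"Over idle: GPU decode +{gpu_over_idle:.1f}W, CPU decode +{cpu_over_idle:.1f}W")
```

Measured against total system power, the gap is roughly 8%. Measured as the increase over idle, CPU decoding adds about 23.1W versus about 12.8W with the GPU doing the work.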
Surprisingly enough, the difference in power consumption isn't as great as we'd expect. Obviously system performance is a completely different story as the 8600's hardware acceleration makes multitasking while watching H.264 content actually feasible, but these numbers show the strength of Intel's 65nm manufacturing process. We do wonder what the power consumption difference would look like if a CPU manufacturer was able to produce a CPU and a GPU on the very same process. With AMD's acquisition of ATI, we may very well know the answer to that question in the coming years.
Although we haven't been terribly impressed with the gaming performance of the GeForce 8600, it is currently the best option for anyone looking to watch Blu-ray or HD-DVD on their PCs. The full H.264 offload onto the GPU makes HD movie playback not only painless but also possible on lower speed systems.
Even more interesting isn't the GeForce 8600, but the $100 GeForce 8500 that we'll be looking at in the coming weeks. According to NVIDIA, the GeForce 8500 will have the same H.264 decoding power as the 8600, so if you don't need the added 3D gaming performance then the 8500 will be an even better solution for HTPCs.
Honestly, the only downside to H.264 decoding with these cards isn't the cards themselves but rather the state of decoding software. WinDVD appears to be ahead in the fit and finish department, while hardware support is better with PowerDVD. WinDVD also performs better on the new 8600 GPUs for H.264 decoding, while PowerDVD is faster on other hardware configurations. Both applications also need serious work before they are useful in Vista 64-bit. We'd expect at least one or two more revisions of the software to go by before these problems really get taken care of.
Kudos to NVIDIA on being first to deliver full H.264 decode assist not only in any GPU but moreover in a mainstream GPU. Now it's a matter of how quickly NVIDIA can extend this functionality to the rest of its product line, and how quickly ATI can respond.