Original Link: http://www.anandtech.com/show/2284
HD Video Decode Quality and Performance Summer '07
by Derek Wilson on July 23, 2007 5:30 AM EST
The current generation of graphics hardware is capable of delivering high definition video with lower CPU utilization and better quality than ever. Armed with the most recent drivers from AMD and NVIDIA we have spent quite a bit of time testing and analyzing the current state of HD playback on the GPU. And we have to say that while there are certainly some very high points here, we have our concerns as well.
Since the last time we tested HD playback performance on the 8600 line, we have seen software support improve dramatically. PowerDVD, especially, has come quite a long way: it now supports both AMD and NVIDIA hardware with full hardware acceleration and is quite stable. Drivers from both camps have also added HD video quality improvements in the form of post processing. HD deinterlacing and noise reduction now (mostly) work as we would expect. This is in contrast to the across the board scores of 0 under HD HQV we saw earlier this year.
This will be the first time we test AMD's new R600 and RV6xx based graphics cards using our video decode tests. Our RV6xx based Radeon HD 2600 and 2400 hardware features AMD's UVD video decode pipeline that accelerates 100% of the HD video decode process on all codecs supported by HD-DVD and Blu-ray. NVIDIA's hardware falls short of AMD's offering in the VC-1 bitstream decoding department, as it leaves this task up to the CPU. We will try to evaluate just how much of an impact this difference will really offer end users.
Here's a breakdown of the decode features for the hardware we will be testing:
While the R600 based Radeon HD 2900 XT only supports the features listed as "Avivo", G84 and G86 based hardware matches the Avivo HD feature set (100% GPU offload) for all but VC-1 decoding (where decode support is the same as the HD 2900 XT, lacking only bitstream processing).
With software and driver support finally coming up to speed, we will begin to be able to answer the questions that fill in the gaps with the quality and efficacy of AMD and NVIDIA's mainstream hardware. These new parts are sorely lacking in 3D performance, and we've been very disappointed with what they've had to offer. Neither camp has yet provided a midrange solution that bridges the gap between cost effective and acceptable gaming performance (especially under current DX10 applications).
Many have claimed that HTPC and video enthusiasts will be able to find value in low end current generation hardware. We will certainly address this issue as well.
Our test setup consisted of multiple processors including a high end, low end, and previous generation test case. Our desire was to evaluate how much difference hardware decode makes for each of these classes of CPU and to determine how much value video offload really brings to the table today.
Performance Test Configuration:
|CPU:||Intel Core 2 Extreme X6800 (2.93GHz/4MB); Intel Core 2 Duo E4300 (1.8GHz/2MB); Intel Pentium 4 560 (3.6GHz)|
|Chipset Drivers:||Intel 184.108.40.2064|
|Hard Disk:||Seagate 7200.7 160GB SATA|
|Memory:||Corsair XMS2 DDR2-800 4-4-4-12 (1GB x 2)|
|Video Drivers:||ATI Catalyst 220.127.116.11-rc2; NVIDIA ForceWare 163.11|
|Desktop Resolution:||1920 x 1080 - 32-bit @ 60Hz|
|OS:||Windows Vista x86|
We are using PowerDVD Ultra 7.3 with patch 3104a applied. This patch fixed a lot of our issues with playback and brought PowerDVD up to the level we wanted and expected. We did, however, have difficulty disabling GPU acceleration with this version of PowerDVD, so we will be unable to present CPU only decoding numbers. From our previous experience though, only CPUs faster than an E6600 can guarantee smooth decoding in the absence of GPU acceleration.
As for video tests, we have the final version of Silicon Optix HD HQV for HD-DVD, and we will be scoring these subjective tests to the best of our ability using the criteria provided by Silicon Optix and the examples they provide on their disc.
For performance we used perfmon to record average CPU utilization over 100 seconds (the default loop time). Our performance tests will include three different clips: The Transporter 2 trailer from The League of Extraordinary Gentlemen Blu-ray disc (H.264), Yozakura (H.264), and Serenity (VC-1). All of these tests proved to be very consistent in performance under each of our hardware configurations. Therefore, for readability's sake, we will only be reporting average CPU overhead.
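The averaging itself is simple arithmetic. As a minimal sketch (this is not the perfmon workflow itself), assuming one utilization reading per second and a hypothetical helper named `average_cpu_utilization`, the reported figure could be computed like this:

```python
from statistics import mean


def average_cpu_utilization(samples, window=100):
    """Return the mean of the first `window` CPU utilization samples (percent).

    `samples` is assumed to hold one reading per second, as a counter log
    might. The helper name and the one-second interval are illustrative
    assumptions, not details taken from the test setup.
    """
    clipped = list(samples)[:window]
    if not clipped:
        raise ValueError("no samples recorded")
    return mean(clipped)


# Hypothetical readings, not measured data: 50 s at 20% followed by 50 s at 30%.
example_log = [20.0] * 50 + [30.0] * 50
print(average_cpu_utilization(example_log))  # 25.0
```

A single average like this is easy to compare across cards, which is why it suits tests that behave consistently from run to run.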
HD HQV Image Quality Analysis
We have already explored Silicon Optix HD HQV in detail. The tests and what we are looking for in them have not changed since our first round. Fortunately, the ability of NVIDIA and AMD hardware to actually perform the tasks required of HD HQV has changed quite a bit.
Both AMD and NVIDIA told us to expect scores of 100 out of 100 using their latest drivers and hardware. We spent quite a bit of time and effort in fully evaluating this test. We feel that we have judged the performance of these solutions fairly and accurately despite the fact that some subjectivity is involved. Here's what we've come up with.
|Silicon Optix HD HQV Scores|
|Card||Noise Reduction||Video Res Loss||Jaggies||Film Res Loss||Stadium||Total|
|AMD Radeon HD 2900 XT||15||20||20||25||10||90|
|AMD Radeon HD 2600 XT||15||20||20||25||10||90|
|AMD Radeon HD 2600 Pro||15||20||20||25||10||90|
|AMD Radeon HD 2400 XT||0||20||0||25||10||55|
|NVIDIA GeForce 8800 GTX||25||20||20||25||10||100|
|NVIDIA GeForce 8600 GTS||25||20||20||25||10||100|
|NVIDIA GeForce 8600 GT||25||20||20||25||10||100|
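Each total is just the sum of the five per-test scores. A short sketch that transcribes the table and recomputes the totals (the `SCORES` dictionary and `total_scores` helper are our own illustrative names, not part of HD HQV):

```python
# Per-test scores transcribed from the table above, in column order:
# noise reduction, video res loss, jaggies, film res loss, stadium.
SCORES = {
    "AMD Radeon HD 2900 XT": (15, 20, 20, 25, 10),
    "AMD Radeon HD 2600 XT": (15, 20, 20, 25, 10),
    "AMD Radeon HD 2600 Pro": (15, 20, 20, 25, 10),
    "AMD Radeon HD 2400 XT": (0, 20, 0, 25, 10),
    "NVIDIA GeForce 8800 GTX": (25, 20, 20, 25, 10),
    "NVIDIA GeForce 8600 GTS": (25, 20, 20, 25, 10),
    "NVIDIA GeForce 8600 GT": (25, 20, 20, 25, 10),
}


def total_scores(scores):
    """Sum the per-test scores for every card."""
    return {card: sum(parts) for card, parts in scores.items()}


for card, total in total_scores(SCORES).items():
    print(f"{card}: {total}")
```

The recomputed totals agree with the Total column: 90 for the HD 2900 XT and HD 2600 cards, 55 for the HD 2400 XT, and 100 for the three NVIDIA cards.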
The bottom line is that NVIDIA comes out on top in terms of quality. We've seen arguments for scoring these cards differently, but we feel that this is the most accurate representation of the capabilities offered by each camp.
On the low end, both AMD and NVIDIA hardware begin to stumble in terms of quality. The HD 2400 XT posts quite a lackluster performance, failing in noise reduction and HD deinterlacing (jaggies). But at least it deinterlaces video at full resolution, if poorly. We excluded tests of NVIDIA's 8500 series, as their video drivers have not yet been optimized for their low end hardware. Even so, we have been given indications not to expect the level of performance we see from the 8600 series. We would guess that the 8500 series will perform on par with the AMD HD 2400 series, though we will really have to wait and see when NVIDIA releases a driver for this.
With video decode hardware built in as a separate block of logic and post processing being handled by the shader hardware, it's clear that the horrendous 3D performance of low end parts has bled through to their video processing capability as well. This is quite disturbing, as it removes quite a bit of potential value from low cost cards that include video decode hardware.
Both AMD and NVIDIA perform flawlessly and identically in every test but the noise reduction test. AMD uses an adaptive noise reduction algorithm that the user is unable to disable or even adjust in any way. NVIDIA, on the other hand, provides an adjustable noise reduction filter. In general, we prefer having the ability to adjust and tweak our settings, but simply having this ability is irrelevant in HQV scores.
The major issue that resulted in our scoring AMD down in noise reduction was that noise was not reduced significantly enough to match what we expected. In addition to the tests, Silicon Optix provides a visual explanation of the features tested, including noise reduction. They show a side by side video of a yellow flower (a different flower than the one presented in the actual noise reduction test). The comparison shows a noisy video on the left and a video with proper noise reduction applied on the right. The bottom line is that there is almost no noise at all in the video on the right.
During the test, although noise is reduced using AMD hardware, it is not reduced to the level of expectation set by the visual explanation of the test. Based on this assessment, we feel that AMD noise reduction deserves a score of 15 out of 25. Silicon Optix explains a score of 15 as: "The level of noise is reduced somewhat and detail is preserved." In order to achieve a higher score, we expect the noise to be reduced to the point where we do not notice any "sparkling" effect in the background of the image at all.
By contrast, with NVIDIA, setting the noise reduction slider anywhere between 51% and 75% gave us a higher degree of noise reduction than AMD with zero quality loss. At 75% and higher we noticed zero noise in the image with no detail loss until noise reduction was set very high. Tests done with the noise reduction slider at 100% show some detail loss, but there is no reason to crank it up that high unless your HD source is incredibly noisy (which will not likely be the case). In addition, at such high levels of noise reduction, we noticed banding and artifacts in some cases. This was especially apparent in the giant space battle near the end of Serenity. It seems to us that computer generated special effects suffer from this issue more than other aspects of the video.
While, ideally, we would like to see artifacts avoided at all cost, NVIDIA has provided a solution that offers much more flexibility than their competition. With a little experimentation, a higher quality experience can be delivered on NVIDIA hardware than on AMD hardware. In fact, because NVIDIA sets noise reduction to default off, we feel that the overall experience provided to consumers will be better.
Transporter 2 Trailer (High Bitrate H.264) Performance
This is our heaviest hitting benchmark of the bunch. Nestled into the recesses of the Blu-ray version of The League of Extraordinary Gentlemen (a horrible movie if ever there was one) is a very aggressively encoded trailer for Transporter 2. This ~2 minute trailer is encoded with an average bitrate of 40 Mbps. The bitrate actually peaks at nearly 54 Mbps by our observation. This pushes up to the limit of H.264 bitrates allowed on Blu-ray movies, and serves as an excellent test for a decoder's ability to handle the full range of H.264 encoded content we could see on Blu-ray discs.
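To put those bitrates in perspective, here is a rough back-of-the-envelope sketch. It assumes the trailer runs exactly 120 seconds (the text only says roughly two minutes) and that 1 Mbps means 10^6 bits per second; `stream_megabytes` is a hypothetical helper name.

```python
def stream_megabytes(bitrate_mbps, seconds):
    """Approximate data volume in megabytes (10^6 bytes) for a video stream.

    Assumes a constant bitrate given in megabits per second (10^6 bits/s).
    """
    return bitrate_mbps * seconds / 8


# Assumed 120 s runtime at the 40 Mbps average bitrate.
print(stream_megabytes(40, 120))  # 600.0
# One second of video at the observed ~54 Mbps peak.
print(stream_megabytes(54, 1))    # 6.75
```

Under those assumptions the decoder has to chew through roughly 600 MB of compressed video in two minutes, with bursts of nearly 7 MB in a single second.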
First up is our high performance CPU test (X6800):
Neither the HD 2900 XT nor the 8800 GTX features bitstream decoding on any level. They are fairly representative of older generation cards from AMD and NVIDIA (respectively) as we've seen in past articles. Clearly, a lack of bitstream decoding is not a problem for such a high end processor, and because end users generally pair high end processors with high end graphics cards, we shouldn't see any problems.
Lower CPU usage is always better. By using an AMD card with UVD, or an NVIDIA card featuring VP2 hardware (such as the 8600 GTS), we see a significant impact on CPU overhead. While AMD does a better job at offloading the CPU (indicating less driver overhead on the part of AMD), both of these solutions enable users to easily run CPU intensive background tasks while watching HD movies.
Next up is our look at an affordable current generation CPU (E4300):
While CPU usage goes up across the board, we still have plenty of power to handle HD decode even without H.264 bitstream decoding on our high end GPUs. The story is a little different when we look at older hardware, specifically our Pentium 4 560 (with Hyper-Threading) processor:
Remember that these are average CPU utilization figures. Neither the AMD nor the NVIDIA high end parts are able to handle decoding in conjunction with the old P4 part. Our NetBurst architecture hardware just does not have what it takes even with heavy assistance from the graphics subsystem and we often hit 100% CPU utilization without one of the GPUs that support bitstream decoding.
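An average can hide how often the CPU is actually pegged. As a small illustration using made-up numbers rather than our measurements, a hypothetical `saturation_fraction` helper shows how a log can average well under 100% while still spending a meaningful share of its time saturated:

```python
def saturation_fraction(samples, limit=100.0):
    """Return the share of samples at or above `limit` percent utilization."""
    readings = list(samples)
    if not readings:
        raise ValueError("no samples recorded")
    return sum(1 for value in readings if value >= limit) / len(readings)


# Hypothetical log, not measured data: 80 s at 70% and 20 s pegged at 100%.
log = [70.0] * 80 + [100.0] * 20
print(sum(log) / len(log))       # 76.0 (the average looks tolerable)
print(saturation_fraction(log))  # 0.2 (yet one second in five is saturated)
```

Those saturated seconds are where dropped frames and stuttering show up, even when the average suggests headroom.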
Of course, bitstream decoding delivers in a HUGE way here, not only making HD H.264 movies watchable on older CPUs, but even giving us quite a bit of headroom to play with. We wouldn't expect people to pair the high end hardware with these low end CPUs, so the lack of bitstream decoding on the high end cards isn't much of a problem.
Clearly offloading CABAC and CAVLC bitstream processing for H.264 was the right move, as the hardware has a significant impact on the capabilities of the system on the whole. NVIDIA is counting on bitstream processing for VC-1 not really making a difference, and we'll take a look at that in a few pages. First up is another H.264 test case.
Yozakura (High Complexity H.264) Performance
H.264 offers quite a range of options, and we haven't seen everyone taking advantage of some of the more advanced features. Yozakura is encoded in 1080i at 25 Mbps. This is fairly low for H.264 maximums, but this is still very CPU intensive because the video is encoded using macroblock adaptive frame/field (MBAFF) coding. MBAFF is a high quality technique to ensure maximum visual fidelity in interlaced video by adaptively selecting frame or field encoding per macroblock based on a motion threshold.
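To illustrate the idea of a per-block frame/field choice, here is a toy sketch. It is not how any real H.264 encoder makes the decision (production encoders typically weigh rate and distortion, and the standard applies MBAFF to vertical macroblock pairs). The heuristic below is our own assumption: it uses interlace "combing" as a stand-in for motion by comparing how much adjacent rows differ against how much rows from the same field differ. All names and the threshold are hypothetical.

```python
def mean_abs_row_diff(block, step):
    """Mean absolute luma difference between rows that are `step` rows apart."""
    total = 0
    count = 0
    for upper, lower in zip(block, block[step:]):
        for a, b in zip(upper, lower):
            total += abs(a - b)
            count += 1
    return total / count


def choose_coding_mode(block, threshold=0.0):
    """Pick 'field' or 'frame' coding for one block of luma rows.

    `block` is a list of equal-length rows of luma values (at least 3 rows).
    Rows 1 apart belong to opposite fields; rows 2 apart share a field.
    Strong combing (adjacent rows differ far more than same-field rows)
    suggests motion between fields, so field coding is chosen.
    """
    frame_cost = mean_abs_row_diff(block, 1)
    field_cost = mean_abs_row_diff(block, 2)
    return "field" if frame_cost - field_cost > threshold else "frame"


# A 16x32 block with alternating dark and bright rows (heavy combing).
combed = [[0] * 16 if row % 2 == 0 else [100] * 16 for row in range(32)]
# A static block with a smooth vertical gradient (no combing).
static = [[row * 4] * 16 for row in range(32)]

print(choose_coding_mode(combed))  # field
print(choose_coding_mode(static))  # frame
```

The point of the sketch is only that the decision is made block by block, so still regions keep the efficiency of frame coding while moving regions avoid combing artifacts.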
While 1080p is clearly Hollywood's choice of resolution, there is 1080i encoded content out there now and more likely to come. As TV shows transition to HD, we will likely see 1080i as the choice format due to the fact that this is the format in which most HDTV channels are broadcast (over-the-air and otherwise), 720p being the other option. It's nice to know that H.264 offers high quality interlaced HD encoding options, and we hope content authors who decide to release their creations in 1080i will take advantage of things like MBAFF.
Additionally, good deinterlacing is essential for getting a good experience with movies like this. Poorly deinterlaced HD content is not only sad to watch, but gives this author quite a headache. Jaggies and feathering are horrible distractions at this resolution. As long as you stick with an HD 2600 or GeForce 8600 series or higher you should be fine here. Any slower just won't cut it when trying to watch 1080i on a progressive scan display.
Our high end CPU is able to cope fairly well, with the 8800 GTX besting the 2900 XT in performance, while UVD leads VP2, putting the 2600 XT ahead of the 8600 GTS.
For our cheap yet current processor, we do see utilization go up, but the hardware with bitstream decoding maintains very low overhead. All of our GPUs maintain good performance when paired with this level of processor. Of course, we would likely not see the high end GPUs matched with such a CPU (unless we are looking at notebooks, but that's a whole other article).
For our older hardware, Yozakura is simply not watchable without bitstream decoding. With the numbers for the high end AMD and NVIDIA GPUs even worse than under our Transporter 2 trailer test, it's clear that NetBurst does not like whatever Yozakura is doing. It may be that decoding the bitstream when MBAFF is used is branch heavy, causing lots of stalls. All we can say for sure is that, once again, GPU accelerated bitstream decoding is necessary to watch H.264 content on older/slower hardware.
Serenity (VC-1) Performance
We haven't yet found a VC-1 title to match either of the H.264 titles we tested in complexity or bitrate, so we decided to stick with our tried and true test of Serenity. The main event here is in determining the real advantage of including VC-1 bitstream decoding on the GPU. NVIDIA's claim is that this is not as complex as it is under H.264 so it isn't necessary. AMD is pushing their solution as more complete, but does it really matter? Let's take a look.
Our HD 2900 XT has the highest CPU utilization, while the 8600 GTS and 8800 GTX share roughly the same performance. The HD 2600 XT leads the pack with an incredibly low CPU overhead of just 5 percent. This is probably approaching the minimum overhead of AACS handling and disk accesses through PowerDVD, which is very impressive. At the same time, the savings with GPU bitstream decode are not as impressive under VC-1 as on H.264 on the high end.
Dropping down in processor power doesn't heavily impact CPU overhead in the case of VC-1.
Moving all the way down to a Pentium 4 based processor, we do see higher CPU utilization across the board. The difference isn't as great as under H.264, and, not only that, but VC-1 movies appear to remain very playable on this hardware even without bitstream decoding on the GPU. This is not the case for our H.264 movies. While we wouldn't recommend it with the HD 2900 XT, we could even consider looking at a (fairly fast) single core CPU with the other hardware, with or without full decode acceleration.
While noise reduction can be a good thing, when viewing well mastered and high quality compressed HD video, noise should be kept at a minimum anyway. We've seen our fair share of early HD releases where noise is simply atrocious, however, and we expect that it will take some studios a little time to adjust to the fact that higher resolution movies not only look better, but reveal flaws more readily as well. For now (especially for movies like X-Men 3), noise reduction is highly appreciated. But down the line we hope that studios will put a bit more effort into delivering a polished product.
There are cases where blending effects require a bit of added noise to give scenes a more natural feel. Noise can even be cranked way up by a director to provide an artistic or dated effect. In these cases (which will hopefully be most cases where noise is evident in the future), we want to view HD material as it was delivered. When presented with poor post processing from a studio it is nice to have the ability to make our own decisions on how we want to view the content. These facts make it clear to us that the ability to enable or disable noise reduction is an imperative feature for video processors. While fully adjustable noise reduction might not be as necessary, it is absolutely appreciated and offers those who know what they are doing the highest potential image quality across every case.
Those who choose to stick with very well produced 1080p content may not need post processing noise reduction or deinterlacing, but they might miss out on imported content or HD releases of some TV series (depending on what studios choose to do in that area). For now, we're going to recommend that users interested in HTPC setups stick with the tools that can get the job done best no matter what the source material is. The only options for HD video intensive systems today are the Radeon HD 2600 and GeForce 8600 series cards. For its better handling of noise reduction (and especially the fact that it can be turned off) we recommend the 8600 GT/GTS above the other options in spite of the fact that the 2600 XT provided better CPU offloading.
We have to stress here that, in spite of the fact that NVIDIA and AMD expect the inclusion of video decode hardware on their low end hardware to provide significant value to end users, we absolutely cannot recommend current low end graphics cards for use in systems where video decode is important. In our eyes, with the inability to provide a high quality HD experience in all cases, the HD 2400, GeForce 8500, and lower end hardware are all only suitable for use in business class or casual computing systems where neither games nor HD video play a part in the system's purpose.
AMD's UVD does beat out NVIDIA's VP2 in both H.264 and VC-1 decode performance. However, it isn't really enough to make a tangible difference in the viewing of movies. Performance is important, and UVD performance is certainly impressive. But we still have to favor the 8600 for its superior image quality.
VC-1 bitstream decoding doesn't have as large an impact as H.264 bitstream decoding. We would have to drop down to a significantly slower CPU in order for the difference to offer AMD an advantage. In the scenarios we tested, we feel that NVIDIA didn't make a serious blunder in skipping the inclusion of hardware to handle VC-1 bitstreams. At least, they didn't make as serious a blunder as AMD did by not including UVD in their HD 2900 XT.
In the future, we won't "need" H.264 or VC-1 decode on our GPUs either (just as we don't "need" MPEG-2 acceleration for current CPUs), but we don't see this as a valid excuse not to provide a full range of functionality for end users. And need is a relative term at best. We can do good realtime 3D on CPUs these days, but we don't see graphics card companies saying "this card will be paired with a high end CPU so we decided not to implement [insert key 3D feature] in hardware." We want to see AMD and NVIDIA include across the board support for video features in future product lineups. Saving CPU cycles isn't an exclusive desire of owners of low end hardware, and when we buy higher end hardware we expect higher performance.