LAV CUVID can be benchmarked using GraphStudio's built-in benchmark to check video decoder performance. Unfortunately, GraphStudio can't use madVR in this process. Since our intent was to determine the performance of the GPU both with and without madVR enabled, it was essential that madVR be a part of the benchmark. The developer of madVR, Mathias Rauen, created a special benchmarking build which was used to generate the figures in this section.

The picture below shows the madVR benchmark build working in the decode-only mode on the GT 430 for a 1080i60 H.264 clip.

[Screenshot: madVR benchmark build in decode-only mode, with GPU-Z readings]

LAV CUVID does the actual decoding (not visible in the picture) and sends the frames over to the madVR filter, but the filter just keeps track of the decode frame rate and doesn't render them. All the driver post processing steps are enabled. The interlaced clip being played back uses around 76% of the VPU. Decoding proceeds at 91 fps, much more than the clip's 60 fps rate. The GPU load is 79%, which is due to the deinterlacing being performed on the shaders. This shows there is some headroom available in the GPU for further post processing. Is there enough for madVR? The picture below shows the benchmark build working in the decode + post processing mode.

[Screenshot: madVR benchmark build in decode + post processing mode, with GPU-Z readings]

Note that the frame rate falls below the real time requirement. At 52 fps, the renderer drops approximately 8 frames every second. The VPU load falls to 38% because the pipeline is now limited by how fast the processing steps in madVR can execute. GPU-Z shows that madVR pushes the GPU load up to 97%, making the GPU the bottleneck in the chain.
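
As a quick sanity check on these numbers, the dropped frame count follows directly from the source frame rate and the achieved throughput. A minimal sketch (the 60 fps and 52 fps figures are the ones reported above):

```python
# Shortfall when the renderer can't keep up with the source frame rate.
# Figures from the GT 430 / 1080i60 H.264 run described above.
source_fps = 60.0     # real time requirement of the 1080i60 clip
achieved_fps = 52.0   # throughput reported by the madVR benchmark build

dropped_per_second = source_fps - achieved_fps
print(f"Dropped frames per second: {dropped_per_second:.0f}")       # ~8
print(f"Dropped frames per minute: {dropped_per_second * 60:.0f}")  # ~480
```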

Another interesting aspect to note in the GPU-Z screenshots above is that madVR increases the load on the GPU's memory controller from 23% to 36%. This is to be expected, as madVR makes multiple passes over the frame and needs to move data back and forth between the shaders and the GPU's DRAM.

The extent of the drop in frame rate (and whether the renderer fails to meet the real time requirement) depends on the options enabled in the madVR settings. We ran the benchmarks with various madVR configurations and various codecs to get an idea of the performance of LAV CUVID, madVR and, of course, the GPUs.

Before moving on to the benchmarking results, we have some more notes about the upsampling algorithms in madVR. The human eye is much less sensitive to chroma resolution than to luma resolution. This is why chroma is stored at a lower resolution in 4:2:0 content. Due to the low chroma resolution, chroma often tends to look blocky with visible aliasing (especially noticeable with, say, red fonts on a black background). Usually, the best way to upsample chroma is to use a very soft interpolator that removes all the aliasing, but that comes at the cost of chroma sharpness. A less soft chroma upsampling algorithm retains sharpness. Basically, one can't have one's cake and eat it too, so it is a matter of taste whether one prefers removal of aliasing or a sharper picture.
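
To make the resolution difference concrete, here is a minimal sketch of the plane sizes in a 4:2:0 frame (standard NV12/YV12-style layout, not anything madVR-specific); the renderer has to upsample each chroma plane by 2x in both dimensions before the frame can be converted to RGB:

```python
# Plane dimensions for a 1080p frame stored as 4:2:0.
# Chroma is subsampled 2x both horizontally and vertically.
width, height = 1920, 1080

luma_samples = width * height                  # full resolution Y plane
chroma_samples = (width // 2) * (height // 2)  # each of the U and V planes

print(f"Luma plane:   {width}x{height} = {luma_samples} samples")
print(f"Chroma plane: {width // 2}x{height // 2} = {chroma_samples} samples each")
# The chroma upsampler has to scale 960x540 planes back up to 1920x1080,
# which is why the chroma algorithm matters even when no luma scaling
# is needed at all.
```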

The default luma algorithm used by madVR is Lanczos. The default chroma algorithm is SoftCubic 100 (which is very soft). Setting chroma upsampling to Lanczos or Spline is not recommended, as they are very sharp and their performance cost is too big to be worth the gain for chroma. SoftCubic, Bicubic or Mitchell-Netravali are suggested for chroma upsampling as they need fewer GPU resources. In any case, it is hard to spot differences between the various chroma algorithms in most real life images.

For luma upsampling, the situation is very different. Most people prefer sharp results, and the luma algorithm has a much bigger impact on overall image quality than the chroma upsampling algorithm. Some users prefer the nice, sharp Lanczos 4 or Spline 4 for luma upscaling. Some prefer SoftCubic 50 because it does a better job of hiding source artifacts. Others prefer Mitchell-Netravali or Bicubic as a more all-round solution. There is no hard recommendation for this.
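
For readers curious about what these interpolators actually compute, below is a minimal sketch of the textbook Lanczos kernel (madVR's default for luma). This is the standard formula, not madVR's actual shader code, and the mapping of madVR's 'tap' counts to kernel lobes is our assumption:

```python
import math

def lanczos_kernel(x: float, a: int = 2) -> float:
    """Textbook Lanczos kernel; a=2 yields a 4-tap filter per axis,
    which we assume corresponds to madVR's 'Lanczos 4-tap' setting.
    A wider kernel (larger a) is sharper but reads more input pixels
    per output pixel, and hence costs more GPU time."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    # sinc(x) * sinc(x / a), with sinc(x) = sin(pi * x) / (pi * x)
    return a * math.sin(px) * math.sin(px / a) / (px * px)

# Weights applied to the 4 neighbouring samples for an output pixel
# that falls exactly halfway between two input pixels:
offsets = [k + 0.5 for k in range(-2, 2)]
weights = [lanczos_kernel(o) for o in offsets]
print([round(w, 4) for w in weights])  # small negative lobes = sharpening
```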

The madVR settings used for benchmarking were classified broadly into three categories (a rough cost comparison follows the list):

  1. Low Quality: Bilinear luma and chroma scaling
  2. Medium Quality: Bicubic (sharpness 50) luma scaling and Bilinear chroma scaling
  3. High Quality: Lanczos (4-tap) luma scaling and SoftCubic (softness 70) chroma scaling
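
A rough way to see why these presets stress the GPU differently is to compare how many input samples each scaler reads per output pixel. This is a back-of-the-envelope illustration using standard kernel widths (our assumption for the exact madVR implementations); actual shader cost also depends on the pass structure and memory traffic:

```python
# Approximate tap counts for the separable scalers in the three presets.
# Per-axis values are the standard kernel supports.
taps_per_axis = {
    "Bilinear (Low)": 2,
    "Bicubic 50 (Medium, luma)": 4,
    "Lanczos 4-tap (High, luma)": 4,
    "SoftCubic 70 (High, chroma)": 4,
}

for name, taps in taps_per_axis.items():
    print(f"{name:28s}: {taps}x{taps} = {taps * taps} samples/output pixel")
# Bilinear reads 4 samples per output pixel; the cubic and Lanczos
# variants read 16, a 4x difference that shows up directly in GPU load.
```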

Scaling is one of the core functions in madVR, but it is not needed if the display resolution matches that of the video. In the 1080p and 1080i videos presented below, there is no scaling of luma, but chroma still needs to be upsampled. The 'trade quality for performance' madVR options didn't seem to improve performance much, and all of them were kept unchecked for benchmarking.

In the graphs below, 'Full VPP' refers to all the video post processing options as set in the NVIDIA Control Panel. The other entries refer to the madVR settings described above. The top row in each graph indicates the performance of the LAV CUVID decoder. When compared with the benchmarks of the DXVA2 decoders (presented in an earlier section), we see that the LAV CUVID decoder has almost no performance penalty.

In the graphs below, we try to identify what causes the throughput to fall below 60 fps. First, let us take a look at the 1080p H.264 clip.

1080p H.264

In the above graph, we see that the GT 520's lack of shader horsepower hurts madVR performance; the madVR steps become the bottleneck in this case. On the GT 430, the VPU remains the bottleneck until the more complicated scaling algorithms (of theoretical interest, and not presented in the graph above) are enabled.

1080p MPEG-2

1080p VC-1

We see the same trends continuing for MPEG-2 and VC-1. Now, we move on to a first glimpse at the extent of hardware acceleration available for MPEG-4 streams.

1080p MPEG-4 [DivX]

1080p MPEG-4 [Xvid]

As expected, we get decent hardware acceleration for MPEG-4 and the post processing impact is the same as that for the other codecs.

Interlaced streams don't seem to alter the trend. The maximum decode frame rates are slightly lower in the high stress cases due to the overhead of deinterlacing. The GT 430 is now limited by shader power rather than by the VPU.

1080i H.264

1080i MPEG-2

1080i VC-1

How do things change when we try to upscale non-1080p content to a 1080p display? This is probably where madVR's algorithms are needed most. To test this out, we put some non-1080i/p H.264 clips through the same benchmark.
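
For reference, the amount of upscaling each of these clips needs to fill a 1080p display is simple arithmetic on the standard resolutions:

```python
# Linear upscaling factors needed to bring common source heights to 1080p.
target_height = 1080
sources = {"720p": 720, "480p": 480, "480i": 480}

for name, src_height in sources.items():
    factor = target_height / src_height
    print(f"{name}: {factor:.2f}x per axis")
# 720p needs 1.5x per axis and 480p/480i need 2.25x. Unlike the 1080p
# clips earlier, every one of these sources forces a full luma scaling
# pass over all ~2 million output pixels (and 480i adds deinterlacing
# on top), which is where madVR's scalers earn their keep.
```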

720p H.264

480p H.264

480i H.264

An interesting result in the above benchmark is that, with madVR disabled, the 480i H.264 stream is processed faster on the GT 430 than on the GT 520. It is quite obvious here that deinterlacing on the GT 520's shaders becomes the bottleneck once the VPU reaches 300 fps.

In all of the above non-1080i/p benchmarks, the GT 520's lack of shader power really hurts it. At 720p60, the High Quality frame rate is uncomfortably close to 60 fps, so that configuration can't be recommended. The GT 430 holds up pretty decently in all the cases.

The takeaway from this section is that the GT 520 is not entirely suitable for madVR processing if you deal with a lot of SD material. The GT 430 is quite suitable for madVR processing as long as you keep the settings sane.

madVR is still an advanced HTPC user's tool. However, it should gain further traction now that it can be paired with hardware decoding and the other driver supplied post processing options. We have covered a solution for NVIDIA GPU based HTPCs in this section. Let us see how this plays out for the AMD and Intel GPU platforms in the future.

Comments

  • jwilliams4200 - Monday, June 13, 2011 - link

    All the numbers add up correctly now. Thanks for monitoring the comments and fixing the errors!
  • Samus - Monday, June 13, 2011 - link

    Honestly, my Geforce 210 has been chillin' in my HTPC for 2+ years, and works perfectly :)
  • josephclemente - Monday, June 13, 2011 - link

    If I am running a Sandy Bridge system with Intel HD Graphics 3000, do these cards have any benefit over integrated graphics? What is Anandtech's HQV Benchmark score?

    I tried searching for scores, but people say this is subjective and one reviewer may differ from another. One site says 196 and another in the low 100's. What does this reviewer say?
  • ganeshts - Monday, June 13, 2011 - link

    Give me a couple of weeks. I will be getting a test system soon with the HD 3000, and I will do detailed HQV benchmarking in that review too.
  • dmsher99@gmail.com - Tuesday, June 14, 2011 - link

    I recently built an HTPC with a Core i5-2500K on an ASUS P8H67 EVO with a Ceton InfiniTV cable card. Note that the Intel driver is fundamentally flawed and will destroy a system if patched. See the Intel communities thread 20439 for more details.

    Besides causing BSODs over HDMI output when patched, the stable versions have their own sets of bugs, including a memory leak when watching some premium content on HD channels that crashes WMC. Intel appears to have one part time developer working on this problem, but every test driver he puts out breaks more than it fixes. Watch the same content on a system running an NVIDIA GPU and the memory leak goes away.

    In my opinion, second gen SB chips are just not ready for prime time in a fully loaded HTPC.
  • jwilliams4200 - Monday, June 13, 2011 - link

    "The first shot shows the appearance of the video without denoising turned on. The second shot shows the performance with denoising turned off. "

    Heads I win, tails you lose!
  • ganeshts - Monday, June 13, 2011 - link

    Again, sorry for the slip-up, and thanks for bringing it to our notice. Fixed it. Hopefully, the gallery pictures cleared up the confusion (particularly the Noise Reduction entry in the NVIDIA Control Panel)
  • stmok - Monday, June 13, 2011 - link

    Looking through various driver release README files, it appears the mobile Nvidia Quadro NVS 4200M (PCI Device ID: 0x1056) also has this feature set.

    The first stable Linux driver (x86) to introduce support for Feature Set D is 270.41.03 release.
    => ftp://download.nvidia.com/XFree86/Linux-x86/270.41...

    It shows only the Geforce GT 520 and Quadro NVS 4200M support Feature Set D.

    The most recent one confirms that they are still the only models to support it.
    => ftp://download.nvidia.com/XFree86/Linux-x86/275.09...
  • ganeshts - Monday, June 13, 2011 - link

    Thanks for bringing it to our notice. When that page was being written (around 2 weeks back), the README indicated that the GT 520 was the only GPU supporting Feature Set D. We will let the article stand as-is, and I am sure readers perusing the comments will become aware of this new GPU.
  • havoti97 - Monday, June 13, 2011 - link

    So basically the app store's purpose is to attract submissions of ideas for features of their next OS, uncompensated of course. All the other crap/fart apps not worthy are approved, and people make pennies off those.
