LAV CUVID can be benchmarked using GraphStudio's inbuilt benchmark to check the video decoder performance. Unfortunately, GraphStudio can't use madVR in this process. Since our intent was to determine the performance of the GPU with and without madVR enabled, it was essential that madVR be a part of the benchmark. The developer of madVR, Mathias Rauen, created a special benchmarking build which was used to generate the figures in this section.

The picture below shows the madVR benchmark build working in the decode-only mode on the GT 430 for a 1080i60 H264 clip.


LAV CUVID performs the actual decoding (this is not visible in the picture) and sends the frames to the madVR filter, but the filter only tracks the decode frame rate and doesn't render the frames. All the driver post processing steps are enabled. The interlaced clip being played back uses around 76% of the VPU. Decoding runs at 91 fps, well above the clip's 60 fps rate. The GPU load is 79% because deinterlacing is performed on the shaders. This shows that there is some headroom left in the GPU for further post processing. Is there enough for madVR? The picture below shows the benchmark build working in the decode + post processing mode.


Note that the frame rate falls below the real-time requirement. At 52 fps, the renderer drops approximately 8 frames every second (60 - 52). The VPU load falls to 38% because the process is now limited by how fast madVR's processing steps can execute. GPU-Z shows that madVR pushes the GPU load up to 97%, and the GPU becomes the bottleneck in the chain.
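The real-time arithmetic behind these two screenshots can be written out explicitly. The sketch below uses only the figures quoted above (91 fps decode-only, 52 fps with madVR, 60 fps target) and assumes a simplified renderer model in which any shortfall against the 60 fps presentation rate shows up one-for-one as dropped frames; real renderers may repeat or drop frames in more complex patterns.

```python
# Simplified real-time check against a 60 fps target.
# Assumption: every frame per second below the target is dropped by the renderer.
TARGET_FPS = 60.0


def realtime_margin(measured_fps: float, target_fps: float = TARGET_FPS) -> float:
    """Ratio of measured throughput to the real-time requirement (>= 1.0 means real time)."""
    return measured_fps / target_fps


def dropped_frames_per_second(measured_fps: float, target_fps: float = TARGET_FPS) -> float:
    """Frames per second that must be dropped when throughput is below target (0 otherwise)."""
    return max(0.0, target_fps - measured_fps)


# Figures quoted in the text for the 1080i60 H.264 clip on the GT 430.
for label, fps in [("decode only", 91.0), ("decode + madVR", 52.0)]:
    print(f"{label}: {realtime_margin(fps):.2f}x real time, "
          f"{dropped_frames_per_second(fps):.0f} frames dropped per second")
```

A margin above 1.0 (about 1.52x for decode-only) is the headroom referred to earlier; the madVR case lands at roughly 0.87x, which corresponds to the 8 dropped frames per second.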

Another interesting aspect to note in the GPU-Z screenshots above is that madVR increases the load on the GPU's memory controller from 23% to 36%. This is to be expected, as madVR makes multiple passes over the frame and needs to move data back and forth between the shaders and the GPU's DRAM.

The extent of the drop in frame rate (and whether it fails to meet real-time requirements) is decided by the options enabled in the madVR settings. We ran the benchmarks with various madVR configurations and for various codecs to get an idea of the performance of LAV CUVID, madVR and, of course, the GPUs.

Before moving on to the benchmarking results, we have some more notes about the upsampling algorithms in madVR. Human eyes are much less sensitive to chroma resolution than to luma resolution. This is why chroma is stored at a lower resolution with 4:2:0 chroma subsampling. Due to the low chroma resolution, chroma often looks blocky, with visible aliasing (especially noticeable with, e.g., red text on a black background). Usually, the best way to upsample chroma is to use a very soft interpolator, which removes the aliasing. However, that comes at the cost of chroma sharpness. A less soft chroma upsampling algorithm retains more sharpness, but also more of the aliasing. Basically, one can't have one's cake and eat it too, so it is a matter of taste whether one prefers the removal of aliasing or a sharper picture.
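To make the 4:2:0 point concrete, the sketch below computes the plane sizes for a frame. In 4:2:0, each of the two chroma planes (Cb and Cr) has half the luma width and half the luma height, which is why a renderer must upsample chroma 2x in each direction even when the video already matches the display resolution. The helper name `plane_sizes_420` is hypothetical, used only for illustration.

```python
def plane_sizes_420(width: int, height: int) -> dict:
    """Luma and per-channel chroma plane dimensions for 4:2:0 subsampled video (hypothetical helper)."""
    # Each chroma plane is half width and half height; round up for odd sizes.
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    luma_samples = width * height
    chroma_samples = chroma_w * chroma_h
    return {
        "luma": (width, height),
        "chroma": (chroma_w, chroma_h),
        # Share of all stored samples that belong to the two chroma planes.
        "chroma_fraction": 2 * chroma_samples / (luma_samples + 2 * chroma_samples),
    }


print(plane_sizes_420(1920, 1080))
```

For a 1920x1080 frame, each chroma plane is 960x540, so chroma makes up only a third of the stored samples; that reduced chroma resolution is what the chroma upsampler has to reconstruct.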

The default luma algorithm used by madVR is Lanczos, and the default chroma algorithm is SoftCubic 100 (which is very soft). Setting chroma upsampling to Lanczos or Spline is not recommended, as these are very sharp, and their performance cost is too high to be worth the gain for chroma. SoftCubic, Bicubic or Mitchell-Netravali are suggested for chroma upsampling, as they are all 2-tap and need fewer GPU resources. In any case, it is hard to spot differences between the various chroma algorithms in most real-life images.
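The cubic filters mentioned above belong to a well-known two-parameter family, the Mitchell-Netravali BC-splines, which all have a support radius of 2 (our assumption is that this is what "2-tap" refers to here). The sketch below implements the generic kernel and compares three standard (B, C) choices. It does not reproduce madVR's own SoftCubic or Bicubic parameterization ("softness 70", "sharpness 50"); that mapping is not assumed.

```python
def bc_cubic(x: float, b: float, c: float) -> float:
    """Mitchell-Netravali BC-spline cubic kernel, support radius 2."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x ** 3
                + (-18 + 12 * b + 6 * c) * x ** 2
                + (6 - 2 * b)) / 6.0
    if x < 2.0:
        return ((-b - 6 * c) * x ** 3
                + (6 * b + 30 * c) * x ** 2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6.0
    return 0.0


# Standard (B, C) pairs from the literature, not madVR's internal settings.
KERNELS = {
    "B-spline (very soft)": (1.0, 0.0),
    "Mitchell-Netravali": (1 / 3, 1 / 3),
    "Catmull-Rom (sharper)": (0.0, 0.5),
}


def cubic_weights(t: float, b: float, c: float) -> list:
    """Weights for the 4 source samples around an output point at fractional offset t in [0, 1)."""
    return [bc_cubic(t - k, b, c) for k in (-1, 0, 1, 2)]


for name, (b, c) in KERNELS.items():
    w = cubic_weights(0.5, b, c)
    print(f"{name:24s}", " ".join(f"{v:+.4f}" for v in w))
```

Softer settings (larger B) spread weight across neighbouring samples and avoid negative lobes, which blurs away aliasing; sharper settings such as Catmull-Rom concentrate weight on the nearest samples and introduce negative lobes, which preserves edges but also more of the blockiness.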

For luma upsampling, the situation is very different. Most people prefer sharp results, and the luma algorithm has a much bigger impact on overall image quality than the chroma upsampling algorithm. For luma upscaling, some users prefer the sharp Lanczos 4 or Spline 4. Others prefer SoftCubic 50 because it does a better job of hiding source artifacts, while still others prefer Mitchell-Netravali or Bicubic as a more all-round solution. There is no hard recommendation here.
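The sharpness of Lanczos comes from its windowed-sinc shape, whose negative lobes emphasise edges. The sketch below uses the standard textbook definition, L(x) = sinc(x) * sinc(x / a) for |x| < a, and assumes that a "4-tap" Lanczos corresponds to a = 4; madVR's exact implementation details are not assumed. The weights are renormalised because the raw Lanczos taps do not sum exactly to 1.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    px = math.pi * x
    return math.sin(px) / px


def lanczos(x: float, a: int) -> float:
    """Lanczos kernel with support radius a (textbook windowed-sinc definition)."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)


def lanczos_weights(t: float, a: int) -> list:
    """Normalized weights for the 2a source samples around fractional offset t in [0, 1)."""
    raw = [lanczos(t - k, a) for k in range(-a + 1, a + 1)]
    total = sum(raw)
    return [w / total for w in raw]


# Assumption: "Lanczos (4-tap)" means a = 4, i.e. 8 source samples per output sample per axis.
print(" ".join(f"{w:+.4f}" for w in lanczos_weights(0.5, 4)))
```

Compared with the 4 samples per axis of the cubic filters, a = 4 reads 8 samples per axis, which is one reason Lanczos costs noticeably more shader time.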

The madVR settings used for benchmarking were classified broadly into three categories:

  1. Low Quality: Bilinear luma and chroma scaling
  2. Medium Quality: Bicubic (sharpness 50) luma scaling and Bilinear chroma scaling
  3. High Quality: Lanczos (4-tap) luma scaling and SoftCubic (softness 70) chroma scaling

Scaling is one of the core functions of madVR, but luma scaling is not needed if the display resolution matches that of the video. In the 1080p and 1080i videos presented below, luma is not scaled, but chroma still needs to be upsampled. The 'trade quality for performance' options in madVR didn't improve performance much, so all of them were left unchecked for benchmarking.
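The interaction between the three presets and the source resolution can be summarised in a few lines. This is a hypothetical model of the reasoning above, not madVR's actual control flow: chroma is always upsampled for 4:2:0 sources, while the luma algorithm only comes into play when the source and display resolutions differ. A 1080i source is treated as 1920x1080 after deinterlacing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Preset:
    name: str
    luma: str    # algorithm used when luma has to be scaled
    chroma: str  # algorithm used for chroma upsampling


# The three benchmark presets listed above.
PRESETS = [
    Preset("Low Quality", "Bilinear", "Bilinear"),
    Preset("Medium Quality", "Bicubic (sharpness 50)", "Bilinear"),
    Preset("High Quality", "Lanczos (4-tap)", "SoftCubic (softness 70)"),
]


def active_steps(preset: Preset, src: Tuple[int, int], display: Tuple[int, int]) -> List[str]:
    """Scaling steps that run for a 4:2:0 source (hypothetical model)."""
    # 4:2:0 chroma is half resolution in each direction, so it is always upsampled.
    steps = [f"chroma upsampling: {preset.chroma}"]
    # Luma is only resampled when the source size differs from the display size.
    if src != display:
        steps.append(f"luma scaling: {preset.luma}")
    return steps


for p in PRESETS:
    print(p.name, active_steps(p, (1920, 1080), (1920, 1080)))
```

For the 1080p and 1080i clips, this is why the presets differ only in their chroma algorithm, and why the luma choice only starts to matter in the upscaling tests.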

In the graphs below, 'Full VPP' refers to all the video post processing options as set in the NVIDIA Control Panel. The other entries refer to the madVR settings described above. The top row in each graph indicates the performance of the LAV CUVID decoder. When compared with the benchmarks of the DXVA2 decoders (presented in an earlier section), we see that the LAV CUVID decoder has almost no performance penalty.

In the graphs below, we try to identify what causes the throughput to fall below 60 fps. First, let us take a look at the 1080p H.264 clip.

1080p H.264

In the above graph, we see that the smaller number of shaders in the GT 520 affects madVR performance: the madVR steps become the bottleneck in this case. On the GT 430, the VPU remains the bottleneck until the more complex scaling algorithms (of theoretical interest, and not presented in the graph above) are enabled.
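The bottleneck reasoning used throughout these graphs follows a simple pipeline model: when decoding on the VPU and post processing on the shaders run concurrently, the end-to-end frame rate is bounded by the slower stage. The sketch below encodes that model; the per-stage rates are made-up placeholders for illustration, not measurements from this article, and the model ignores interactions such as shared memory bandwidth.

```python
from typing import Dict, Tuple

TARGET_FPS = 60.0


def pipeline_throughput(stage_fps: Dict[str, float]) -> Tuple[float, str]:
    """Return (end-to-end fps, slowest stage) for a simple pipelined chain."""
    bottleneck = min(stage_fps, key=stage_fps.get)
    return stage_fps[bottleneck], bottleneck


# Hypothetical per-stage rates, for illustration only.
chains = {
    "shader-limited chain": {"VPU decode": 150.0, "madVR shader processing": 45.0},
    "VPU-limited chain": {"VPU decode": 90.0, "madVR shader processing": 130.0},
}

for label, stages in chains.items():
    fps, stage = pipeline_throughput(stages)
    status = "real time" if fps >= TARGET_FPS else "drops frames"
    print(f"{label}: {fps:.0f} fps, limited by {stage} ({status})")
```

This matches the earlier observation on the GT 430: once madVR's shader work became the slowest stage, the VPU load fell (to 38%) because the decoder simply waited on the renderer.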

1080p MPEG-2

1080p VC-1

The same trends continue for MPEG-2 and VC-1. Now, we move on to a first glimpse of the extent of hardware acceleration available for MPEG-4 streams.

1080p MPEG-4 [DiVX]

1080p MPEG-4 [XViD]

As expected, we get decent hardware acceleration for MPEG-4 and the post processing impact is the same as that for the other codecs.

Interlaced streams don't seem to alter the trend. The maximum decode frame rates are slightly lower in the high-stress cases due to the deinterlacing overhead. The GT 430's throughput is now limited by shader power rather than by the VPU.

1080i H.264

1080i MPEG-2

1080i VC-1

How do things change when we upscale non-1080p content onto a 1080p display? This is probably where madVR's algorithms are needed most. To test this, we put some non-1080i/p H.264 clips through the same benchmark.
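The amount of upscaling involved explains why these clips stress the scalers more. The sketch below computes luma and chroma scale factors onto a 1920x1080 display. The source sizes are assumptions: 1280x720 for 720p, and a 720x480 (NTSC) frame stretched to fill a 16:9 display for 480p/480i, ignoring pillarboxing of 4:3 material.

```python
from fractions import Fraction
from typing import Tuple

Size = Tuple[int, int]


def scale_factors(src: Size, dst: Size):
    """Horizontal and vertical scale factors for luma and for 4:2:0 chroma."""
    (sw, sh), (dw, dh) = src, dst
    luma = (Fraction(dw, sw), Fraction(dh, sh))
    # 4:2:0 chroma planes start at half resolution, so they need a further 2x.
    chroma = (luma[0] * 2, luma[1] * 2)
    return luma, chroma


DISPLAY = (1920, 1080)
# Assumed frame sizes for each source format.
for label, src in [("1080p", (1920, 1080)), ("720p", (1280, 720)), ("480p", (720, 480))]:
    luma, chroma = scale_factors(src, DISPLAY)
    print(f"{label}: luma {float(luma[0]):.3f} x {float(luma[1]):.3f}, "
          f"chroma {float(chroma[0]):.3f} x {float(chroma[1]):.3f}")
```

Under these assumptions, 720p luma is scaled 1.5x per axis and 480p luma about 2.67x by 2.25x, while chroma is scaled twice as much again, so the choice of luma and chroma algorithms has far more visible (and computational) impact than in the 1080p tests.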

720p H.264

480p H.264

480i H.264

An interesting result in the above benchmark is that, with madVR disabled, the GT 430 processes the 480i H.264 stream faster than the GT 520. It is quite obvious here that deinterlacing on the GT 520's shaders becomes the bottleneck once the VPU reaches 300 fps.

In all of the above non-1080i/p benchmarks, the smaller number of shaders in the GT 520 really hurts it. At 720p60, its High Quality frame rate is very close to 60 fps, so that setting can't be recommended on the GT 520. The GT 430 holds up pretty decently in all the cases.

The takeaway from this section is that the GT 520 is not entirely suitable for madVR processing if you deal with a lot of SD material. The GT 430 is quite suitable for madVR processing as long as you keep the settings sane.

madVR is still an advanced HTPC user's tool. However, it should gain further traction with support for integrated hardware decoding and other driver-supplied post processing options. We have covered a solution for NVIDIA GPU based HTPCs in this section. Let us see how this plays out for the AMD and Intel GPU platforms in the future.


  • ganeshts - Thursday, June 16, 2011

    PotPlayer apparently doesn't have support for hardware deinterlacing, and has a host of other issues [Search for PotPlayer in this page and then read the next set of posts about it: http://www.avsforum.com/avs-vb/showthread.php?p=20...].

    Of course, if it works for you, it is great :) (probably it is a good solution for people watching progressive material only).

    The author of LAV CUVID talks in that thread about how renderless DXVA mode works with madVR at the cost of deinterlacing.

    Btw, there is no decode of DTS-HD in any open source software now. Both ffdshow and PotPlayer can decode only the core DTS soundtrack. DTS decode has been around for a long time, though.
  • NikosD - Saturday, June 18, 2011

    Indeed, I was referring to progressive material only - interlaced material is rare - but the page you mentioned says PotPlayer has CPU deinterlacing.

    I don't see where the problem is.

    Hardware Deinterlacing is less important - for most users - than Hardware Decoding (DXVA) and less important than the UNIQUE capability of using DXVA + madVR at the same time.

    The cost of hardware deinterlacing is nothing compared to the cost of DXVA and madVR.

    For the audio part of your answer, I have to say that, because of my AVR (Pioneer VSX-920), decoding multi-channel audio inside a PC, Blu-ray player, media player or any other decoding-capable device is never an option for me.

    I always prefer bitstreaming solutions for multi-channel audio - as most AVR owners do - like those provided by FFDshow and PotPlayer, both of which are more than capable of providing them.

    That's why I wrote "decoding and pass-through"; I should have written "splitting and pass-through".

    One last word.

    For every piece of software out there, there is always a list of changes, bugs, things to do.

    That doesn't mean we don't use it or like it.
  • PR3ACH3R - Friday, June 17, 2011

    @Ganesh T S,
    This is some NICE work.
    In fact, I cannot recall the last time I saw such an in-depth article on the HTPC GPU subject on Anandtech.

    The balance between the technical issues, the background, & the effort to honestly report all issues known to you in this article is spot on.

    If it misses anything in the issues report, it is the ATI/AMD DPC latency spiking issue.

    As this still remains unnoticed on Anandtech, even in this excellent article, here is a link to the AVS post describing it.

    http://www.avsforum.com/avs-vb/showthread.php?t=12...

    (Ignore some of the discredit attempt posts in this thread, this problem exists to this very day.)
  • NikosD - Thursday, June 23, 2011

    Well, I did some further tests and found out that PotPlayer does have hardware deinterlacing.

    Have you done any tests by yourself to see if the player supports Hardware Deinterlacing ?
  • ganeshts - Saturday, June 25, 2011

    NikosD, I will definitely try PotPlayer out in the next GPU review. Till now, my knowledge is limited to what is there in the AVSForum thread.
  • flashbacck - Friday, June 24, 2011

    I know HTPCs are even more of a niche these days than ever, so I appreciate you still doing these tests very much.
  • wpoulson - Thursday, July 28, 2011

    I really appreciate this guide and have been stepping through it.

    I just registered the ASVid.ax file from TMT5 but the filter is not showing up in the External Filter section of MPC-HC. At first I thought it might be because I registered it on the 32 bit side and I'm using 64 bit MPC-HC, so I unregistered the file from System 32 and registered it on the 64 bit side.

    I registered it by going to Start>CMD>Ctrl-Shift-Enter and using the "Regsvr32" command to register the file. I put the file along with the checkactivate dll in a folder in the root directory of my C drive and pointed the Regsvr command to the ASVid.ax file. After hitting enter, I received a "dll successfully registered" message.

    Can someone help me to get the filter visible for MPC-HC?

    A question...While it's considered beta, will the new LAV video decoder do the same thing the arcsoft video decoder does?

    Thanks

    Warren
  • stuartm - Friday, January 20, 2012

    I am aware the GT 430 is a good choice to work around the infamous WMC 29/59 framerate bug. Can you comment on whether the 6570 will stutter when playing content with 29/59 framerate problems? This is a very important consideration for those of us using Ceton or HDHR Primes (or the new Hauppauge box) for live cable TV viewing and record/replay.

    Thank You
  • MichaelSan1980 - Saturday, January 21, 2012

    I'd use my HTPC only for DVDs and BDs with a Full-HD TV. Since I have a rather strong CPU and wouldn't use hardware deinterlacing for DVDs, I wonder if the GT520 is ~that~ bad in terms of image quality?
  • drizzo4shizzo - Wednesday, June 20, 2012

    Old guy here.

    In the market but I need confirmation that these cards can do component output to "old guy" HDTV.

    NONE of the marketing materials suggest that any recent card can.

    Meaning they either come with a component video breakout or at least are compatible with a known 3rd party product, and that they can do the RGB -> YUV thing.

    This ancient EVGA 7600 GT I have does it... with an "svideo lookalike" 7 pin -> component breakout.

    Anyone? Bueller?
