We ran the DXVA Checker benchmark on all the cards, but our graphing engine can present only four series per graph. This meant that we had to choose between the GDDR5-based AMD 6450 and the DDR3-based MSI 6450. Keeping in mind our focus on passively cooled GPUs, we went with the latter. The 'No VPP (Video Post Processing)' frame rates were similar for both candidates. However, as post processing algorithms were enabled, the MSI 6450 began to perform a bit worse than the AMD 6450. We will analyze the probable cause later. We were able to get DXVA2 acceleration with the EVR renderer for all codecs except the MPEG-4 variants.

First, we look at a 1080p H.264 clip. The MPC Video Decoder v1.5.2.3134 was able to play back the clip without issues on all the GPUs. There were a couple of surprises in store when the DXVA Checker benchmark (as described in the previous section) was run.

1080p H.264

While the GT 430 was unable to reach the magical 60 fps benchmark figure (I expect any GPU worth its salt to be able to decode 1080p60 H.264 clips), the GT 520 sprang a surprise with some insane decoding speeds. Even allowing for the shortcuts the GT 520 took by skimping on the post processing, it comfortably beats every other GPU in the race. The 430's benchmark result was even more puzzling, since all the 1080p60 AVCHD and re-encoded broadcast clips that we threw at it played back flawlessly. We talked to NVIDIA about this, and it looks like the culprit in this case was the bitrate. Our sample was a 40 Mbps clip at 1080p30. Decoding it at 60 fps means the VPU engine has to process the stream at an effective 80 Mbps, and apparently, the VP4 engine in the GT 430 is simply not capable of that. We are willing to cut NVIDIA some slack here, because I have personally not seen any real 1080p60 content at 80 Mbps. We will cover both of the above aspects in detail in the next section.
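The bitrate arithmetic behind that explanation can be sketched in a few lines of Python. This is only an illustration of the scaling, assuming the bits are spread evenly across frames; the function name is ours and is not part of DXVA Checker or any NVIDIA tool.

```python
def effective_bitrate_mbps(clip_bitrate_mbps, native_fps, decode_fps):
    """Bitrate the decoder must sustain when a clip authored at
    native_fps is pushed through the decoder at decode_fps.

    Assumes the average number of bits per frame stays constant.
    """
    # Multiply first, then divide, so the example values stay exact.
    return clip_bitrate_mbps * decode_fps / native_fps


# The 40 Mbps 1080p30 sample from the text, decoded at 60 fps.
print(effective_bitrate_mbps(40, 30, 60))  # 80.0
```

By the same reasoning, a 1080p60 clip authored at 40 Mbps only asks for 40 Mbps at its native rate, which is consistent with real 1080p60 content playing back fine on the GT 430.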

With the exception of the 6450, we find that enabling various post-processing options doesn't bring down the decode frame rate. This shows that the latency of the post processing steps is completely hidden by the time spent in the UVD / VPU engines to obtain the decoded frame. For the 6450, we infer that the lower core clock for the stream processors slows down the post processing steps a bit too much.
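One way to picture this is as a two-stage pipeline whose steady-state throughput is set by the slower stage. The sketch below is a simplified model under that assumption, not a description of the actual hardware; the function name and the per-frame timings are hypothetical and are not measurements from our benchmarks.

```python
def pipeline_fps(decode_ms, post_ms):
    """Steady-state frames per second of a two-stage pipeline in which
    decode and post processing overlap on different frames.

    The slower stage sets the pace; the faster one is 'hidden'.
    """
    return 1000.0 / max(decode_ms, post_ms)


# Hypothetical per-frame timings in milliseconds, for illustration only.
print(pipeline_fps(decode_ms=10.0, post_ms=4.0))   # 100.0: decode-bound
print(pipeline_fps(decode_ms=10.0, post_ms=8.0))   # 100.0: extra VPP still hidden
print(pipeline_fps(decode_ms=10.0, post_ms=12.5))  # 80.0: post processing now limits
```

In this model, enabling more post processing costs nothing until its per-frame time exceeds the decode time, which is the pattern a card with slower stream processors would hit first.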

For the 1080p VC-1 clip, we again use MPC Video Decoder v1.5.2.3134 for flawless playback.

1080p VC-1

We find that the NVIDIA GPUs hide their post processing latency in the time taken by the VPU engine. However, the 6570 shows a gradual decline in throughput as various options are enabled. The decline is not as alarming as the 6450's, and the 6570 manages to stay comfortably above 60 fps.

VLD acceleration for MPEG-2 was only recently introduced in the UVD 3 engine by AMD. The Microsoft DTV-DVD Video Decoder is able to provide DXVA2 acceleration for MPEG-2 clips.

1080p MPEG-2

It is not clear why turning on deinterlacing / cadence detection should affect the decode throughput of a progressive clip, but that is what we observe for all the candidates except the 6570. Unlike VC-1 and H.264, where decoding determined the throughput of the video pipeline, MPEG-2 is much easier on the UVD / VPU engine. This is reflected in the fact that the video post processing brings down the throughput quite a bit on all the GPUs.

Moving on to interlaced streams, we will consider a 1080i H.264 clip first.

1080i H.264

As expected, deinterlacing kicks in and lowers the frame throughput. Unlike the 1080p H.264 decode performance, we find that all the GPUs are now limited by how fast the post-processing can be done. This makes sense, since for interlaced content the UVD / VPU engine handles fields with only half the usual vertical resolution. Note that the 'frames per second' figure presented for the interlaced streams is actually 'fields per second' (a 1080i clip showing 29.97 fps with MediaInfo actually has 59.94 fields per second).
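The frames-versus-fields conversion is simple enough to show directly. The snippet below is a small illustration; the helper name is ours, and it uses the exact NTSC rate of 30000/1001 that tools round to 29.97.

```python
from fractions import Fraction


def fields_per_second(frame_rate):
    """Each interlaced frame carries two fields (top and bottom)."""
    return 2 * frame_rate


# 29.97 fps is the rounded form of 30000/1001.
ntsc_frame_rate = Fraction(30000, 1001)
print(round(float(fields_per_second(ntsc_frame_rate)), 2))  # 59.94
```

A benchmark figure of 60 on an interlaced chart therefore corresponds to roughly real-time playback of a 29.97 fps interlaced clip, not twice real time.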

The interlaced MPEG-2 performance is as below:

1080i MPEG-2

Results are very similar to what we got for the interlaced H.264 clip. One can conclude that interlaced clips spend more time getting post-processed compared to the progressive clips, but that is hardly surprising.

We had noted earlier that DXVA2 / EVR wasn't enabled for interlaced VC-1 streams on any of the GPUs. However, with the checkactivate.dll hack (described in the LAV Splitter section), we were able to make Arcsoft Video Decoder appear in the list of codecs when the 'Check DirectShow / MediaFoundation Decoders' option was used for interlaced VC-1 clips. Though it wasn't explicitly indicated that the support was DXVA2 using EVR, we did find that playing back the stream using EVR consumed almost no CPU resources and kept the GPU / VPU engine quite busy. Presented below is the interlaced VC-1 performance using the Arcsoft Video Decoder in Total Media Theater v5.0.187.

1080i VC-1

The takeaway from this section is that cards which run too close to the 60 fps limit with all post processing steps enabled should be avoided, unless there are convincing reasons to choose them. The results also need to be taken in conjunction with the day-to-day usage experience. As mentioned before, the 6450 fails on both counts. The GT 520 fails the day-to-day usage test (deinterlacing performance). The GT 430 gets a recommendation despite weighing in at less than 60 fps for the 1080p H.264 stress stream. The 6570 is the hands-down winner in this section. It is able to carry out all the post processing steps even when it is forced to process very stressful video streams.
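The 'too close to 60 fps' rule of thumb can be written as a simple headroom check. This is a sketch only: the 10% margin is an assumption of ours rather than a threshold used in the review, and the frame rates in the example are made-up values, not benchmark results.

```python
def has_headroom(measured_fps, target_fps=60.0, margin=0.10):
    """True if the benchmark rate clears the target by the given margin.

    margin=0.10 (10%) is an assumed safety factor, not a measured one.
    """
    return measured_fps >= target_fps * (1.0 + margin)


# Hypothetical benchmark figures with all post processing enabled.
print(has_headroom(62.0))  # False: above 60 fps, but with little to spare
print(has_headroom(90.0))  # True
```

As the GT 430 recommendation shows, a check like this is only one input; day-to-day playback experience still has to be weighed alongside it.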


70 Comments


  • enki - Monday, June 13, 2011 - link

    How about a short conclusion section for those who just use a Windows 7 box with a Ceton tuner card to watch hdtv in Windows Media Center? (i.e. will just be playing back WTV files recorded directly on the box)

    What provides the best quality output?

    What can stream better than stereo over HDMI? On my old 3400 ATI card it either streams the Dolby Digital directly (the computer doesn't do any processing of the audio) or can output stereo (doesn't think there can be more than 2 speakers connected)

    Thanks
  • BernardP - Monday, June 13, 2011 - link

    The inability to create and scale custom resolutions within AMD graphics drivers is, for me, a deal-breaker that keeps me from even considering AMD graphics. It will also keep me from Llano, Trinity and future AMD Fusion APU's. I'll stay with NVidia as long as they keep allowing for custom resolutions.

    My older eyes are grateful for the custom 1536 X 960 desktop resolution on my 24 inch 16:10 monitor. I couldn't create this resolution with AMD graphics drivers.
  • bobbozzo - Tuesday, June 14, 2011 - link

    In your case, you should just increase the size of the fonts and widgets instead of lowering the screen res.
  • Assimilator87 - Tuesday, June 14, 2011 - link

    I wish there was a section dedicated to the silent stream bug. I have a GTX 470 hooked up to an Onkyo TX-SR805 and this issue is driving me insane. For instance, does this issue only plague certain cards or do all nVidia suffer from it? I was hoping the latest WHQL driver (275.33) would fix this, but sadly, no. Otherwise, the article was amazing and I'll definitely have to check out LAV Splitter.
  • ganeshts - Tuesday, June 14, 2011 - link

    The problem with the silent stream bug is that one driver version has it, the next one doesn't and then the next release brings it back. It is hard to pinpoint where the issue is.

    Amongst our candidates, even with the same driver release, the GT 520 had the bug, but the GT 430 didn't. I am quite confident that the GT 520 issue will get resolved in a future update, but then, I can just hope that it doesn't break the GT 430.
  • JoeHH - Tuesday, June 14, 2011 - link

    This is simply one of the best articles I have ever seen about HTPC. Congrats Ganesh and thank you. Very informative and useful.
  • bobbozzo - Tuesday, June 14, 2011 - link

    Hi, Can you please compare hardware de-interlacing, etc., vs software?

    e.g. many players/codecs can do de-interlacing, de-noise, etc. in software, using the CPU.

    How does this compare with a hardware implementation?

    thanks
  • ganeshts - Tuesday, June 14, 2011 - link

    This is a good suggestion. Let me try that out in the next HTPC / GPU piece.
  • CiNcH - Wednesday, June 15, 2011 - link

    Hey guys,

    here is how I understand the refresh rate issue. It does not matter whether it is 0.005 Hz off. You can't calculate frame drops/repeats from that. In DirectShow, frames are scheduled with the graph reference clock. So the real problem is how much the clock which the VSync is based on and the reference clock in the DirectShow graph drift from each other. And here comes ReClock into play. It derives the DirectShow graph clock from the VSync, i.e. synchronizes the two. So it does not matter whether your VSync is off as long as playback speed is adjusted accordingly. A problem here is synchronizing audio which is not too easy if you bitstream it...
  • NikosD - Thursday, June 16, 2011 - link

    Nice guide but you missed something.
    It's called PotPlayer, it's free and has built-in almost everything.
    CPU & DXVA (partial, full) codecs and splitters for almost every container and every video file out there.
    The same is true for audio, too.
    It has even Pass through (S/PDIF, HDMI) for AC3/TrueHD/DTS, DTS-HD. Only EAC3 is not working.
    It has also support for madVR and a unique DXVA-renderless mode which combines DXVA & madVR!
    I think it's close to perfect!
    BTW, the article says that there is no free audio decoder for DTS, DTS-HD.
    That's not correct.
    FFDShow is capable of decoding and pass through (S/PDIF, HDMI) both DTS and DTS-HD.
    And PotPlayer of course!
