HTPC Aspects : Decoding & Rendering Benchmarks

Our decoding and rendering benchmarks consist of standardized test clips (varying codecs, resolutions and frame rates) played back through MPC-HC v1.7.3 (which bundles LAV Filters 0.60.1.5). GPU usage is tracked through GPU-Z logs, and power consumption at the wall is also reported. The former provides hints on whether frame drops could occur, while the latter is an indicator of the efficiency of the platform for the most common HTPC task: video playback. Starting with this review, we have added two new streams to our benchmark suite. The first is a 1080p24 H.264 clip (the type of content that most HTPC users watch), while the second is a 2160p30 (4Kp30) H.264 clip (which gives us a way to test the downscaling performance of various codec / renderer combinations).
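As a rough illustration of how a GPU-Z sensor log can be reduced to a single load figure per clip, the sketch below averages one column of a comma-separated log. The sample rows, the exact column names and the trailing-comma layout are assumptions made for illustration; check the header row of your own log file before relying on them.

```python
import csv
import io

# Hypothetical excerpt of a GPU-Z sensor log. The column names and the
# trailing commas are assumptions; real logs contain many more columns.
SAMPLE_LOG = """Date , GPU Core Clock [MHz] , GPU Load [%] ,
2014-02-18 10:00:01 , 135.0 , 14 ,
2014-02-18 10:00:02 , 135.0 , 17 ,
2014-02-18 10:00:03 , 135.0 , 16 ,
"""


def summarize_load(log_text, column="GPU Load [%]"):
    """Return (average, peak) of one numeric column in a comma-separated log."""
    reader = csv.reader(io.StringIO(log_text))
    # Column names are padded with spaces, so strip them before matching.
    header = [name.strip() for name in next(reader)]
    index = header.index(column)
    values = []
    for row in reader:
        if len(row) <= index:
            continue  # skip blank or truncated rows
        cell = row[index].strip()
        if cell:
            values.append(float(cell))
    if not values:
        raise ValueError("no samples found for column: " + column)
    return sum(values) / len(values), max(values)


average, peak = summarize_load(SAMPLE_LOG)
print(f"average load: {average:.2f}%, peak load: {peak:.0f}%")
```

The peak is reported alongside the average because a short spike can hide behind a comfortable-looking mean.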

In the course of our testing, we found that our standard 1080p60 H.264 clip played with lots of artifacts on the GTX 750 Ti. This happened with both MPC-HC and CyberLink PowerDVD 13. Using the same drivers on the GT 640 resulted in perfect playback. [Update: NVIDIA got back to us indicating that this is a Maxwell-related driver issue. We are waiting for new drivers]

It will be interesting to determine the reason behind this issue. Not all 1080p60 clips had this problem, though. On the positive side, both the GTX 750 Ti and the GT 640 (as expected) were able to decode UHD / 4K streams using the GPU. The HD 7750 fell back to software decode (avcodec) for those streams despite the relevant setting being ticked in the LAV Video Decoder configuration.

Before proceeding to the renderer benchmark numbers, it is important to explain the GPU load numbers in the tables below. The GPU load of the NVIDIA cards must not be compared directly to that of the AMD card. Even amongst the NVIDIA cards, the load numbers don't signify the same thing, because the GPU load reported by GPU-Z doesn't take the core clock into consideration. Maxwell GPUs have more fine-grained clock control. For example, when playing back 4Kp30 material, the GTX 750 Ti's core clock is around 824 MHz, but when playing 1080p24 material, it scales down to 135 MHz. Kepler, on the other hand, seems to use 824 MHz when playing back both 4Kp30 and 1080p24 material. For 480i, it goes down to 324 MHz. In terms of GPU load on the GTX 750 Ti, we find 4Kp30 playback reporting a load of 2.65%, while 1080p60 reports about 46% under EVR. The 2.65% figure is reported at a much higher core clock than the one used for 1080p60 playback. For the GT 640, this 'disconnect' is much harder to observe, since the clocks are the same for most HD material. However, in the GT 640 segment of the screenshot below, it is possible to observe a higher GPU load of 34% for 480i60 material (the third part) compared to a lower value at higher clocks for 1080p24 material.

GPU-Z 0.7.7 Sensor Readings - Fine-grained clock control in Maxwell (4Kp30 and 1080p24 playback) compared to Kepler (4Kp30, 1080p24 and 480i60 playback). Core-clock / Load numbers 'disconnect' can be observed in both cases for Maxwell, but only in the 480i60 case for Kepler.
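One rough way to put loads measured at different core clocks on a common footing is to rescale each reading to a reference clock. The sketch below does this for the two GTX 750 Ti cases quoted above, using the EVR load averages and the approximate clocks mentioned in the text. It assumes that the work done scales linearly with the core clock and that the clock stays constant throughout playback; both are simplifications, and the reference clock is an arbitrary choice.

```python
# GTX 750 Ti figures under EVR: average load from the EVR table,
# approximate core clock as quoted in the text.
samples = {
    "4Kp30 H264": {"load_pct": 2.65, "clock_mhz": 824},
    "1080p24 H264": {"load_pct": 15.69, "clock_mhz": 135},
}

# Assumption: normalize everything to the highest clock observed.
REFERENCE_CLOCK_MHZ = 824


def clock_normalized_load(load_pct, clock_mhz, reference_mhz=REFERENCE_CLOCK_MHZ):
    """Rescale a load percentage to what it would be at the reference clock.

    Assumes throughput is proportional to core clock (a simplification).
    """
    return load_pct * clock_mhz / reference_mhz


for name, s in samples.items():
    normalized = clock_normalized_load(s["load_pct"], s["clock_mhz"])
    print(
        f"{name}: {s['load_pct']:.2f}% at {s['clock_mhz']} MHz "
        f"is roughly {normalized:.2f}% at {REFERENCE_CLOCK_MHZ} MHz"
    )
```

Under these assumptions the 1080p24 reading shrinks to a few percent at the reference clock, which illustrates why the raw percentages cannot be compared across clips without knowing the clock.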

In any case, if the GPU usage is hovering above 95%, it is likely that the playback suffered from dropped frames. In terms of apples-to-apples comparison for efficiency purposes, the power consumption at the wall reigns supreme.
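The 95% rule of thumb can be expressed as a simple filter over per-clip load averages. This is only a sketch: the clip names and load values below are made up for illustration, and a high average load hints at dropped frames rather than proving them.

```python
# Rule of thumb from the text, not a hard limit.
DROP_RISK_THRESHOLD_PCT = 95.0


def at_risk_of_drops(load_pct, threshold=DROP_RISK_THRESHOLD_PCT):
    """Return True when the average GPU load is above the threshold."""
    return load_pct > threshold


# Hypothetical average loads keyed by clip name (illustrative values only).
example_loads = {"clip_a": 38.0, "clip_b": 94.9, "clip_c": 96.8}

flagged = [clip for clip, load in example_loads.items() if at_risk_of_drops(load)]
print("clips worth checking for dropped frames:", flagged)
```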

Enhanced Video Renderer (EVR)

The Enhanced Video Renderer is the default renderer made available by Windows 8.1. It is lean in its use of system resources, since most of the work is offloaded directly to the GPU drivers. EVR is mostly used in conjunction with native DXVA2 decoding. The GPU is not taxed much by EVR despite hardware decoding also taking place. In our evaluation, all video post processing steps were left for MPC-HC to decide (except for the explicit activation of inverse telecine). In all our tests, we used the native DXVA2 decoder provided by MPC-HC's internal LAV Video Decoder. Deinterlacing mode was set to aggressive in the LAV Video Decoder settings. The GTX 750 Ti's VPU load barely went above 40% even when decoding 1080p60 or 4Kp30 clips.

Enhanced Video Renderer (EVR) Performance
Stream          GTX 750 Ti               GT 640                   HD 7750
                GPU Load (%)  Power      GPU Load (%)  Power      GPU Load (%)  Power
480i60 MPEG2    44.67         57.15 W    20.92         68.74 W    14.76         68.42 W
576i50 H264     55.57         57.25 W    19.28         69.37 W    12.16         69.01 W
720p60 H264     38.91         56.75 W    36.05         61.08 W    9.90          68.16 W
1080i60 MPEG2   80.92         59.53 W    32.76         71.27 W    15.06         69.03 W
1080i60 H264    55.87         63.34 W    35.79         73.11 W    18.78         71.21 W
1080i60 VC1     79.29         60.69 W    35.07         72.63 W    18.91         70.97 W
1080p60 H264    45.53         57.67 W    39.29         61.91 W    11.87         69.02 W
1080p24 H264    15.69         55.06 W    15.61         58.26 W    4.62          67.47 W
4Kp30 H264      2.65          63.89 W    24.21         67.33 W    11.36         76.90 W

Enhanced Video Renderer - Custom Presenter (EVR-CP)

EVR-CP is the default renderer used by MPC-HC. It is slightly more resource intensive compared to EVR, as some explicit post processing steps are done on the GPU without going through DXVA post processing API calls provided by the driver.

Enhanced Video Renderer - Custom Presenter (EVR-CP) Performance
Stream          GTX 750 Ti               GT 640                   HD 7750
                GPU Load (%)  Power      GPU Load (%)  Power      GPU Load (%)  Power
480i60 MPEG2    61.58         58.99 W    18.97         69.22 W    11.99         69.93 W
576i50 H264     55.45         57.93 W    17.97         68.81 W    9.93          69.85 W
720p60 H264     54.18         58.88 W    47.97         63.17 W    12.54         70.93 W
1080i60 MPEG2   17.69         68.38 W    39.84         73.85 W    22.82         72.01 W
1080i60 H264    16.92         70.14 W    42.62         74.35 W    21.97         73.43 W
1080i60 VC1     17.45         69.77 W    41.79         73.99 W    22.03         73.56 W
1080p60 H264    56.5          60.07 W    19.80         70.64 W    13.36         71.61 W
1080p24 H264    25.61         56.83 W    23.80         60.36 W    9.68          69.20 W
4Kp30 H264      5.52          67.11 W    27.51         70.76 W    26.10         84.03 W

Experimenting with madVR

madVR provides plenty of options to tweak. For our evaluation, we considered two main scenarios. Our first run was with the default settings (Chroma upscaling: Bicubic with Sharpness 75, Image upscaling: Lanczos 3-tap and Image downscaling: Catmull-Rom). With these settings, both the GT 640 and the GTX 750 Ti processed all our test clips without dropping frames. The HD 7750 failed with the 720p60 and 1080p60 clips.

madVR (Default Settings) Performance
Stream          GTX 750 Ti               GT 640                   HD 7750
                GPU Load (%)  Power      GPU Load (%)  Power      GPU Load (%)  Power
480i60 MPEG2    76.02         62.27 W    28.77         73.68 W    20.91         74.76 W
576i50 H264     73.21         62.10 W    30.93         74.24 W    20.88         75.40 W
720p60 H264     19.34         69.89 W    35.18         75.42 W    25.11         78.46 W
1080i60 MPEG2   23.16         71.08 W    49.53         77.78 W    27.74         78.22 W
1080i60 H264    24.87         71.79 W    52.27         78.26 W    28.13         79.67 W
1080i60 VC1     24.47         71.06 W    51.48         77.74 W    27.88         79.18 W
1080p60 H264    20.49         70.43 W    42.30         76.45 W    29.72         79.16 W
1080p24 H264    41.70         59.20 W    43.98         63.41 W    14.03         72.08 W
4Kp30 H264      27.51         73.24 W    66.72         81.54 W    23.06         100.94 W

The second run was with our stress settings (Chroma and image upscaling: Jinc 3-tap with anti-ringing filter activated, Image downscaling: Lanczos 3-tap with anti-ringing filter activated). With these settings, the GTX 750 Ti was able to process all test clips without dropping frames. However, the GT 640 failed the 576i50 / 720p60 / 1080i60 / 4Kp30 clips. The HD 7750 failed the 720p60, 1080p60 and 4Kp30 clips.

madVR (Stress Settings) Performance
Stream          GTX 750 Ti               GT 640                   HD 7750
                GPU Load (%)  Power      GPU Load (%)  Power      GPU Load (%)  Power
480i60 MPEG2    50.53         76.35 W    90.48         88.77 W    70.38         89.99 W
576i50 H264     55.08         76.92 W    95.09         92.75 W    80.21         91.65 W
720p60 H264     63.65         84.37 W    96.82         93.72 W    92.64         95.85 W
1080i60 MPEG2   51.29         76.43 W    95.93         89.86 W    63.32         88.58 W
1080i60 H264    52.65         77.06 W    94.9          90.63 W    64.26         89.64 W
1080i60 VC1     51.71         77.33 W    96.86         90.31 W    64.28         89.09 W
1080p60 H264    54.43         77.92 W    96.63         91.71 W    73.20         92.09 W
1080p24 H264    76.58         62.23 W    38.04         75.26 W    24.82         77.68 W
4Kp30 H264      77.52         99.33 W    99            101.13 W   95.71         117.07 W
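Since power at the wall is the apples-to-apples metric here, one way to read the four tables together is to compute how much extra power each renderer costs relative to plain EVR for the same clip. The sketch below does this for the 4Kp30 H264 row, with the wattages copied from the tables above. The helper name and the choice of EVR as the baseline are ours, and the numbers are whole-system figures, so the deltas include any change in CPU load as well.

```python
# 4Kp30 H264 power at the wall in watts, copied from the four tables above.
power_w = {
    "GTX 750 Ti": {"EVR": 63.89, "EVR-CP": 67.11, "madVR default": 73.24, "madVR stress": 99.33},
    "GT 640": {"EVR": 67.33, "EVR-CP": 70.76, "madVR default": 81.54, "madVR stress": 101.13},
    "HD 7750": {"EVR": 76.90, "EVR-CP": 84.03, "madVR default": 100.94, "madVR stress": 117.07},
}


def overhead_vs_baseline(readings, baseline="EVR"):
    """Return the extra watts each renderer draws relative to the baseline."""
    base = readings[baseline]
    return {
        renderer: round(watts - base, 2)
        for renderer, watts in readings.items()
        if renderer != baseline
    }


for card, readings in power_w.items():
    deltas = overhead_vs_baseline(readings)
    summary = ", ".join(f"{name}: +{delta:.2f} W" for name, delta in deltas.items())
    print(f"{card}: {summary}")
```

The same helper can be pointed at any other row of the tables by swapping in that row's wattages.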

As entry level HTPC GPUs become more and more powerful, madVR keeps pushing the bar higher too. Recently, NNEDI3 was added as an upscaling algorithm option. In our experiments with a 1080p display output, NNEDI3 and Jinc 3-tap (for chroma and luma upscaling) work for 1080p24 or lower resolution / frame rate clips on the GTX 750 Ti and HD 7750, but not on the GT 640. With NNEDI3, the NVIDIA driver is a bit buggy, with a greenish tinge throughout, and any higher resolution / frame rate chokes immediately. Jinc 3-tap works fine, though. 4K to 1080p downscaling results in intermittent greenish screens, finally ending with a 'resetting Direct3D device' failure. The downscaling path seems to be buggy, either due to driver issues or bugs in madVR v0.87.4.


177 Comments


  • Harry Lloyd - Tuesday, February 18, 2014 - link

    20 nm Maxwell will be epic. Gimme.
  • TheinsanegamerN - Tuesday, February 18, 2014 - link

    Imagine. OCed Geforce 690 level performance, out of a single chip, with 8 GB of RAM on a 512 bit bus, pulling the same amount of power as a geforce 770. One can dream....
  • ddriver - Tuesday, February 18, 2014 - link

    LOL, epic? Crippling FP64 performance further from 1/24 to 1/32 - looks like yet another nvidia architecture I'll be skipping due to abysmal compute performance per $ ratio...
  • JDG1980 - Tuesday, February 18, 2014 - link

    This card is designed for gaming and HTPC. Only a tiny fraction of users need FP64.
  • nathanddrews - Tuesday, February 18, 2014 - link

    So I guess we'll have to wait for the 750TIB before we can see SLI benchmarks. Two of these would be within reach of 770 while using considerably less power. Hypothetically, that is.
  • ddriver - Tuesday, February 18, 2014 - link

    You do realize the high end GPUs on the same architecture will have the same limitation?
  • Morawka - Tuesday, February 18, 2014 - link

    I thought the higher end Maxwell cards will have Denver/aRM cores on the PCB as well.
  • Mr Perfect - Wednesday, February 19, 2014 - link

    It might be a software/firmware limitation though. From what the compute enthusiasts have said, the only difference between the Titan's full compute and 780Ti's cut down compute is firmware based. They've got the same chip underneath, and some people hack their 780s for full compute. They're probably doing the same thing with the Maxwell stack.
  • chrnochime - Wednesday, February 19, 2014 - link

    Got link for the hack? Sounds interesting.
  • Mr Perfect - Thursday, February 20, 2014 - link

    I don't myself, but if you're interested look up IvanIvanovich over at bit-tech.net. He was talking about vbios mods and resistor replacement tweaks that can do that.
