HTPC Decoding and Rendering Benchmarks: EVR / EVR-CP

In our Ivy Bridge HTPC review, we covered CPU / GPU utilization during playback of various types of clips, and in the Vision3D 252B review, we presented graphs of CPU and GPU loading with various renderers and codecs. Unfortunately, AMD doesn't provide similar data / sensors for use with their APUs, so the Trinity HTPC review had to rely on power consumed at the wall along with GPU loading. To keep benchmarking consistent across all HTPC reviews, we have used the Trinity HTPC review methodology ever since the review of the ASRock Vision HT.

The tables below present the results of running our HTPC rendering benchmark samples through various decoders when using the Enhanced Video Renderer / Enhanced Video Renderer (Custom Presenter) (EVR / EVR-CP). Entries in bold would indicate dropped frames, meaning that the unit wasn't up to the task for those types of streams. Fortunately, none of the streams presented any problem to the system, and there were no dropped frames. The recorded values include the GPU loading and the power consumed by the system at the wall when playing back the streams using MPC-HC v1.6.5.6366 and LAV Filters 0.54.

Enhanced Video Renderer (EVR)

The Enhanced Video Renderer is the default renderer made available by Windows 8. It is a lean renderer in terms of system resource usage, since most of the processing is offloaded directly to the GPU drivers. EVR is mostly used in conjunction with native DXVA2 decoding.

LAV Video Decoder (DXVA2 Native) + EVR
Stream            GPU Usage (%)    Power Consumption (W)
480i60 MPEG-2     24.05            35.04
576i50 H.264      21.38            36.06
720p60 H.264      26.13            36.6
1080i60 H.264     28.9             39.95
1080i60 MPEG-2    28.19            37.06
1080i60 VC-1      31.23            45.57
1080p60 H.264     30.11            37.09

The GPU is not taxed much by the EVR, even though hardware decoding is also taking place. Deinterlacing and other post-processing options were left at their default settings in the Intel HD Graphics Control Panel (these settings apply when EVR is chosen as the renderer).

Enhanced Video Renderer - Custom Presenter (EVR-CP)

EVR-CP is the default renderer used by MPC-HC. It is usually used in conjunction with MPC-HC's video decoders, some of which are DXVA-enabled. However, for our tests, we used the DXVA2 mode provided by the LAV Video Decoder.

LAV Video Decoder (DXVA2 Native) + EVR-CP
Stream            GPU Usage (%)    Power Consumption (W)
480i60 MPEG-2     26.69            38.78
576i50 H.264      24.43            37.88
720p60 H.264      32.76            40.4
1080i60 H.264     40.16            42.02
1080i60 MPEG-2    39.75            41.62
1080i60 VC-1      40.99            48.45
1080p60 H.264     41.33            42
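
To put the EVR and EVR-CP tables side by side, the short Python sketch below averages the per-stream differences between the two renderers (both with LAV DXVA2 Native decoding). The numbers are copied from the tables; the dictionary and function names are ours and purely illustrative, and a plain mean over seven clips is only a rough summary, not part of the benchmark methodology.

```python
from statistics import mean

# (GPU usage %, wall power W) per stream with LAV DXVA2 Native decoding,
# copied from the EVR and EVR-CP tables in this section.
EVR = {
    "480i60 MPEG-2": (24.05, 35.04),
    "576i50 H.264": (21.38, 36.06),
    "720p60 H.264": (26.13, 36.60),
    "1080i60 H.264": (28.90, 39.95),
    "1080i60 MPEG-2": (28.19, 37.06),
    "1080i60 VC-1": (31.23, 45.57),
    "1080p60 H.264": (30.11, 37.09),
}
EVR_CP = {
    "480i60 MPEG-2": (26.69, 38.78),
    "576i50 H.264": (24.43, 37.88),
    "720p60 H.264": (32.76, 40.40),
    "1080i60 H.264": (40.16, 42.02),
    "1080i60 MPEG-2": (39.75, 41.62),
    "1080i60 VC-1": (40.99, 48.45),
    "1080p60 H.264": (41.33, 42.00),
}


def mean_overhead(base, other):
    """Return the mean (GPU %, power W) increase of `other` over `base`."""
    gpu = mean(other[s][0] - base[s][0] for s in base)
    power = mean(other[s][1] - base[s][1] for s in base)
    return gpu, power


gpu_delta, power_delta = mean_overhead(EVR, EVR_CP)
print(f"EVR-CP vs EVR: {gpu_delta:+.2f} GPU points, {power_delta:+.2f} W")
```

On these seven clips, EVR-CP comes out roughly 8 percentage points higher in GPU usage and about 3.4 W higher at the wall than EVR, with the largest gaps on the 1080i60 and 1080p60 streams.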

In addition to DXVA2 Native, we also used the QuickSync decoder developed by Eric Gur (an Intel applications engineer) and made available to the open source community. It makes use of the specialized decoder blocks available as part of the QuickSync engine in the GPU.

LAV Video Decoder (QuickSync / DXVA2 Copy-Back) + EVR-CP
Stream            GPU Usage (%)    Power Consumption (W)
480i60 MPEG-2     27.16            38.42
576i50 H.264      25.26            38.05
720p60 H.264      36.84            41.6
1080i60 H.264     44.2             43.41
1080i60 MPEG-2    44.32            43.02
1080i60 VC-1      43.56            43.26
1080p60 H.264     48.28            45.13

In general, using the QuickSync decoder results in higher power consumption because the decoded frames are copied back to DRAM before being sent to the renderer. With native DXVA2 decoding, the frames are passed directly to the renderer without the copy-back step. The odd man out in the power numbers is the interlaced VC-1 clip, where QuickSync decoding consumes around 5 W less than 'native DXVA2'. This is because the open source native DXVA2 decoders currently don't support interlaced VC-1, and hence, it is decoded in software [Clarification: This restriction is only on Intel GPUs. On both AMD and NVIDIA cards, DXVA2 native decode acceleration is supported for all VC-1 streams]. On the other hand, the QuickSync decoder is able to handle it with the VC-1 bitstream decoder in the GPU.
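
The per-stream comparison behind these statements can be reproduced with a few lines of Python. This is only a sketch: the wall-power values are copied from the two EVR-CP tables above, while the dictionary and function names are our own and not part of any tool used in the review.

```python
# Wall power (W) with EVR-CP as the renderer, copied from the tables above.
DXVA2_NATIVE = {
    "480i60 MPEG-2": 38.78,
    "576i50 H.264": 37.88,
    "720p60 H.264": 40.40,
    "1080i60 H.264": 42.02,
    "1080i60 MPEG-2": 41.62,
    "1080i60 VC-1": 48.45,
    "1080p60 H.264": 42.00,
}
QUICKSYNC = {
    "480i60 MPEG-2": 38.42,
    "576i50 H.264": 38.05,
    "720p60 H.264": 41.60,
    "1080i60 H.264": 43.41,
    "1080i60 MPEG-2": 43.02,
    "1080i60 VC-1": 43.26,
    "1080p60 H.264": 45.13,
}


def power_deltas(native, quicksync):
    """QuickSync minus DXVA2 Native, in watts; positive means QuickSync draws more."""
    return {s: round(quicksync[s] - native[s], 2) for s in native}


deltas = power_deltas(DXVA2_NATIVE, QUICKSYNC)
for stream, delta in deltas.items():
    print(f"{stream:<16} {delta:+.2f} W")
```

Five of the seven clips draw more power with QuickSync (by up to about 3 W for 1080p60 H.264), the 480i60 MPEG-2 clip is within a few tenths of a watt either way, and the interlaced VC-1 clip is the clear exception at roughly 5.2 W in favor of QuickSync.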

 


138 Comments


  • HighTech4US - Sunday, January 20, 2013 - link

    Agree, I see no other overall complete platform that would be better (or even equal) for a 4-OTA Tuner DVR with unlimited storage (only limited by disk size) with free EPG that Windows 7 Media Center provides.

    And by tricking out 7MC with MediaBrowser, MediaControl, SHARK007 Codecs I have a complete on demand system that can play any type of media.

    I use MediaCenterMaster to get program meta information, backdrops and thumbnails for MediaBrowser.

    I also use MakeMKV to rip my DVD's and VideoReDo TVSuite h.264 to edit recorded TV shows and convert them to H.264 MKV's.

    Oh and 7MC can show your digital pictures as a slide show on your big screen with background music.

    I also love the screen saver where it shows random pictures from your picture library then zooms to one (or more) from a folder. When I first got this enabled the wife spent 45 minutes just watching the screen saver.
  • powerarmour - Monday, January 21, 2013 - link

    Agreed, WMC is the only EPG-based tuner app that can correctly use Freeview HD DVB-T2 tuners in the UK; there are no other usable HTPC alternatives.
  • psuedonymous - Sunday, January 20, 2013 - link

    Question: why was the obsolete 2-pass method used instead of the faster (and more common) CRF? Was the encoding benchmark intended as an artificial CPU-stressing benchmark rather than a 'real world' encoding benchmark?
  • ganeshts - Sunday, January 20, 2013 - link

    Hmm.. that is what Graysky's benchmark does, and it keeps the setting consistent across different systems when you want to see how much better or worse your system is, when compared to someone else's.

    FWIW, pass 1 stresses the memory subsystem, while pass 2 stresses the CPU.
  • ganeshts - Sunday, January 20, 2013 - link

    Thanks for the info. I was looking at the FAQ hosted by TechARP here: http://www.techarp.com/showarticle.aspx?artno=442&...

    Also, look at Ian's test with various memory speeds here using the same processor (last section on this page):

    http://www.anandtech.com/show/6372/memory-performa...

    There is definitely an impact on pass 1 performance using different memory speeds and the impact is more than on pass 2.
  • Iketh - Sunday, January 20, 2013 - link

    Why is Prime95 v25.9 used? That is grossly outdated. The latest official 27.7 is needed to tax Ivy Bridge with AVX instructions. All those temps and watts you got will increase significantly. Please revise your Prime95. An oversight like this is unacceptable.

    Not to mention the latest Intel compilers have been implementing AVX instructions for like 6+ months now even if the programmer didn't specifically write for it. AND Handbrake has been using AVX in about that same timeframe and is only increasing.....
  • ganeshts - Sunday, January 20, 2013 - link

    I will definitely do some experiments with the new Prime95 and report back.
  • ganeshts - Monday, January 21, 2013 - link

    I repeated the CPU loading with the latest Prime95 (v27.7):

    http://i.imgur.com/lK0zqjR.png

    The readings didn't go up significantly, but, yes, there is an increase. The power consumption at the wall increased from 58.25 to 62.56 W.

    Thanks for bringing this to our attention, and we will make sure future reviews use the updated Prime95.
  • ganeshts - Monday, January 21, 2013 - link

    Oh, but, with full GPU and CPU loading (using Furmark 1.10.3 - latest), the power at the wall is only 89.77 W (compared to 88.75 W earlier). The ~40 W / ~15W TDP distribution between the CPU and the GPU still remains the same.

    http://i.imgur.com/soCGAyk.jpg

    I don't expect the steady state temperatures to be that different because the power increase at the wall is only 1 W.
  • ganeshts - Sunday, January 20, 2013 - link

    Yes, the scaling algorithms affect the performance a lot.

    That is why I mentioned that we used the default settings: Bicubic with sharpness 75 for chroma (no anti-ringing filter), Lanczos 3-tap for image upscaling, and Catmull-Rom for image downscaling (no anti-ringing filter or linear light scaling).

    We will look at other scaling algorithms and their performance on the HD 4000 / GT 640 / AMD 7750 in the third part of the HTPC series.

    Also, a note that if you are using HD 4000 (or any other Intel HD Graphics), I would strongly suggest looking at DXVA Scaling. Users might be surprised at the quality delivered without taxing the GPU too much.
