QuickSync Gets Open Source Support, Regresses in Quality

I have traditionally avoided touching upon QuickSync in any of my HTPC reviews. The main reason was that support existed only in commercial software such as MediaEspresso, and even that functionality was spotty at best. Limited source file type support and limited configuration options rendered these applications unusable for power users. While full x264 acceleration using QuickSync is out of the question, the developers of HandBrake have come forward with support for QuickSync in their transcoding application.

The feature is still in beta (for example, only H.264 files are allowed as input right now, and cropping isn't working properly), but we took it out for a test drive. We took an m2ts file from a Blu-ray and compressed it with a target bitrate of 10 Mbps using x264 single pass (everything at default) as well as QuickSync. The transcoding rate (in frames per second) and the average power consumption over the course of the process are tabulated below. Numbers are also provided for the same process using our passive Ivy Bridge HTPC (which has the HD4000 GPU).

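H.264 Transcoding Performance

Transcoding Configuration               Engine                   Power (W)   FPS
--------------------------------------  -----------------------  ---------  ------
1080p @ 36.2 Mbps to 1080p @ 10 Mbps    QuickSync on HD4600          41.81   90.41
1080p @ 36.2 Mbps to 1080p @ 10 Mbps    x264 on Core i7-4765T        67.93   51.66
1080p @ 36.2 Mbps to 1080p @ 10 Mbps    QuickSync on HD4000          50.32  127.64
1080p @ 36.2 Mbps to 1080p @ 10 Mbps    x264 on Core i3-3225         53.63   25.99
1080p @ 36.2 Mbps to 720p @ 7 Mbps      QuickSync on HD4600          44.02  166.91
1080p @ 36.2 Mbps to 720p @ 7 Mbps      x264 on Core i7-4765T        65.37   32.88
1080p @ 36.2 Mbps to 720p @ 7 Mbps      QuickSync on HD4000          59.67  206.65
1080p @ 36.2 Mbps to 720p @ 7 Mbps      x264 on Core i3-3225         53.85   16.31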
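
One way to read the power and FPS columns together is to divide average power by frame rate, which gives the energy spent per encoded frame (a watt is a joule per second, so W / FPS = J per frame). The Python sketch below does this for the 1080p @ 10 Mbps rows. The helper name joules_per_frame is our own, and we are assuming the power figures are averages over the same interval as the FPS figures. Since the platforms differ in more than the encoder, treat the output as a rough comparison, not as a measurement of the encoders in isolation.

```python
def joules_per_frame(avg_power_w: float, fps: float) -> float:
    """Energy per encoded frame: watts (J/s) divided by frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return avg_power_w / fps


# (engine, average power in W, FPS) for the 1080p @ 10 Mbps target,
# copied from the transcoding performance table.
RESULTS_1080P = [
    ("QuickSync on HD4600", 41.81, 90.41),
    ("x264 on Core i7-4765T", 67.93, 51.66),
    ("QuickSync on HD4000", 50.32, 127.64),
    ("x264 on Core i3-3225", 53.63, 25.99),
]

if __name__ == "__main__":
    for engine, power_w, fps in RESULTS_1080P:
        print(f"{engine:<24} {joules_per_frame(power_w, fps):.2f} J/frame")
```

By this measure, both QuickSync runs come in under half a joule per frame, while both x264 runs need more than one joule per frame.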

Fast and power-efficient transcoding is not the only requirement in the market. Video output quality is also very important. Encoder companies may present whitepapers with cherry-picked frame captures to show their efforts in a good light. For all we know, the company's selected frame might be an I-frame, while the competitor's samples might be P- or B-frames. PSNR is also often presented as a metric indicating better quality. However, this can be misleading, because an encoder tuned specifically for PSNR may score well and still look worse than encoders tuned for, say, structural similarity (SSIM).
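
To make the PSNR point concrete, here is a minimal Python sketch of the metric as it is commonly defined for 8-bit samples: 10 * log10(255^2 / MSE). The sample values are made-up toy data, not output from any encoder, and psnr is simply our own helper name. Because the score depends only on mean squared error, an error spread thinly across a frame and an error concentrated in one spot can score identically even though they would not look alike.

```python
import math


def psnr(reference, encoded, peak=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(encoded) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 4-sample "frames" (illustrative values only).
reference = [100, 100, 100, 100]
spread_error = [105, 105, 105, 105]  # every sample off by 5
single_error = [100, 100, 100, 110]  # one sample off by 10

if __name__ == "__main__":
    # Both distortions have the same mean squared error, so the same PSNR.
    print(psnr(reference, spread_error))
    print(psnr(reference, single_error))
```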

QuickSync is usually pretty fast, but the choice of bitrates in HandBrake seems to force it into one of the new modes in Haswell, which actually regressed in both performance and image quality. This explains why the FPS on the HD4000 is much higher than on the HD4600. However, Haswell remains very power efficient. Anand mentioned the image quality degradation in QuickSync on Haswell in passing in yesterday's review, and I was able to replicate it. Given below are 10 consecutive raw frames from the various encoders. Take a look and judge for yourself how the encoders handle movement and whether there are any image artifacts in the results.

In our opinion, the QuickSync results on HD4600 appear to be worse than what is obtained on the HD4000. With Haswell, Intel introduced seven levels of quality/performance settings that application developers can choose from. According to Intel, even the lowest quality Haswell QSV settings should be better than what we had with Ivy Bridge. In practice, this simply isn't the case. There's a widespread regression in image quality with Haswell compared to Ivy Bridge, ranging from appreciably worse to equal at best. I'm not sure what's going on here, but QuickSync remains one of the biggest missed opportunities for Intel over the past few years. The fact that it has taken this long to get HandBrake support going is a shame. Now that we have it, the fact that Intel seems to have broken image quality is the icing on a really terrible cake.

For users looking for the best quality transcodes, software-based x264 can deliver better output with tweaked options and two-pass encodes (such flexibility is just not available with the QuickSync encoder). The big attraction of QuickSync remains low CPU utilization (< 10% in many cases) while you transcode. The image quality produced by Haswell's seemingly broken QSV implementation is still good enough for use on smartphones and tablets; it's just a step in the wrong direction.


95 Comments

  • HisDivineOrder - Tuesday, June 04, 2013

    I've heard this song and dance before. It never happens. Plus, limiting people to GDDR5 of pre-determined amounts for a HTPC seems like an exercise in being stupid.
  • Spunjji - Tuesday, June 04, 2013

    Yeah, I'm not buying that rumour. Doesn't make much sense.
  • JDG1980 - Sunday, June 02, 2013

    It's good to see that Intel finally got around to fixing the 23.976 fps bug, which was the biggest show-stopper for using their integrated graphics in a HTPC.

    Regarding MadVR, I'd be interested to see more benchmarks. How good can you run the settings before hitting a wall with GPU utilization? How about on the GT3e - if this ever shows up in an all-in-one Mini-ITX board or NUC, it might be a great choice for HTPCs. Can it handle the good scaling algorithms?

    My own experience is that anti-ringing doesn't add that much GPU load. I recently upgraded to a Radeon HD 7750, and it can handle anti-ringing filters on both luma and chroma with no problem. Chroma upscaling works fine with 3-tap Jinc, and luma also can do this with SD content (even interlaced), but for the most demanding test clip I have (1440x1080 interlaced 60 fields per second) I have to downgrade luma scaling to either Lanczos 3-tap or SoftCubic 80 to avoid dropping frames. (The output destination is a 1080p TV.) I suspect a 7790 or 7850 could handle 3-tap Jinc for both chroma and luma at all resolutions and frame rates up to full HD.

    By the way, I found a weird problem with madVR - when I ran GPU-Z in the background to monitor load, all interlaced content dropped frames. Didn't matter what settings I used. Closing GPU-Z ended the problem. I was still able to monitor GPU load with Microsoft's "Process Explorer" application and this did not cause any problems.

    Regarding 4K output, did you test whether DisplayPort 60 Hz 4K works properly? This might be of interest to some users, especially if the upcoming Asus 4K monitor is released at a reasonable price point. I know people have had to use some odd tricks to get the Sharp 4K monitor to do native resolution at 60 Hz with existing cards.
  • ganeshts - Monday, June 03, 2013

    This is very interesting. What version of GPU-Z were you using? I will check whether my Jinc / anti-ringing dropped frames were due to GPU-Z running in the background. I did do the initial setup when GPU-Z wasn't active, but obviously the benchmark runs were run with GPU-Z active in the background. Did you see any difference in GPU load between GPU-Z and Process Explorer when playing interlaced content with dropped frames?
  • JDG1980 - Monday, June 03, 2013

    I was using the latest version (0.7.1) of GPU-Z. The strange part is that the GPU load calculation was correct - it was just dropping frames for no reason, it wasn't showing the GPU as being maxed out. For the video card, I was using the newest stable Catalyst driver (13.4, I believe) from AMD's website. The OS is Windows 7 Ultimate (64-bit).

    The only reason I suspected GPU-Z is because after searching a bunch of forums to try to find out why interlaced content (even SD with low madVR settings) wouldn't play properly, I found one other user who said he had to turn off GPU-Z. I cannot say if this is a widespread issue and it's possible it may be limited to certain system configurations or certain GPUs. Still worth trying, though. Thanks for the follow-up!
  • tential - Sunday, June 02, 2013

    I don't understand the H.264 Transcoding Performance chart at all. Can someone help?

    QuickSync does more FPS at 720p than 1080p. This makes sense.

    The x264 on the Core i3 and Core i7 post higher FPS in 1080p but lower in 720p. Why is this?
  • ganeshts - Monday, June 03, 2013

    Maybe the downscaling of the frame from 1080p to 720p sucks up more resources, causing the drop in FPS? Remember that the source is 1080p...
  • tential - Monday, June 03, 2013

    OK, so if I'm downscaling to 720p, why does FPS increase with QuickSync, but decrease with the processor?

    They go in OPPOSITE directions: one increases (QuickSync), one decreases (CPU). Wouldn't it be the same both ways?
  • ganeshts - Monday, June 03, 2013

    Downscaling is also hardware accelerated in QS mode. Hardware transcode is faster for 720p decoded frames rather than 1080p decoded frames. The time taken to downscale is much lower than the time taken to transcode the 'extra pixels' in a 1080p version.
  • elian123 - Monday, June 03, 2013

    Ganesh, you mention "The Iris Pro 5200 GPUs are reserved for BGA configurations and unavailable to system builders". Does that imply that there won't be motherboards for sale with the 4770R integrated? Will the 4770R only be available in complete systems?
