Quick Sync Image Quality & Performance

Intel obviously focused on increasing GPU performance with Ivy Bridge, but a side effect of that increased GPU performance is more compute available for Quick Sync. As you may recall, Sandy Bridge's secret weapon was an on-die hardware video transcode engine (Quick Sync), designed to keep Intel's CPUs competitive when faced with the onslaught of GPU computing applications. At the time, video transcode seemed to be the most likely candidate for significant GPU acceleration so the move made sense. Plus it doesn't hurt that video transcoding is an extremely popular activity to do with one's PC these days.

The power of Quick Sync was how it leveraged fixed function decode (and some encode) hardware with the on-die GPU's EU array. The combination of the two resulted in some pretty incredible performance gains not only over traditional software based transcoding, but also over the fastest GPU based solutions as well.

Intel put to rest any concerns about image quality when Quick Sync launched, and thankfully the situation hasn't changed today with Ivy Bridge. In fact, you get a bit more flexibility than you had a year ago.

Intel's latest drivers now allow for a selectable tradeoff between image quality and performance when transcoding using Quick Sync. The option is exposed in Media Espresso and ultimately corresponds to an increase in average bitrate. To test image quality and performance, I took the last Harry Potter Blu-ray, stripped it of its DRM and used Media Espresso to make it playable on an iPad 2 (1024 x 768 preset).

In the case of our Harry Potter transcode, selecting the Better Quality option increased average bitrate from 3.86Mbps to 5.83Mbps. The resulting file size for the entire movie increased from 3.78GB to 5.71GB. Both options produced a good quality transcode; picking one over the other really depends on how much time (and space) you have, as well as the screen size of the device you'll be watching it on. For most phone/tablet use I'd say the faster performing option is ideal.
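As a rough sanity check on those figures, file size is approximately average bitrate times duration, divided by 8. The sketch below assumes decimal units (1 Mbps = 10^6 bits/s, 1 GB = 10^9 bytes) and ignores container overhead; the function names are hypothetical and not part of Media Espresso.

```python
# Rough bitrate <-> file size arithmetic for the two Quick Sync settings.
# Assumptions (not from the article): decimal units, no container overhead.

def implied_duration_s(size_gb: float, bitrate_mbps: float) -> float:
    """Running time in seconds implied by a file size and an average bitrate."""
    total_bits = size_gb * 1e9 * 8
    return total_bits / (bitrate_mbps * 1e6)


def estimated_size_gb(bitrate_mbps: float, duration_s: float) -> float:
    """File size in GB for a given average bitrate and running time."""
    return bitrate_mbps * 1e6 * duration_s / 8 / 1e9


# Figures quoted in the article: (file size in GB, average bitrate in Mbps).
faster = implied_duration_s(3.78, 3.86)
better = implied_duration_s(5.71, 5.83)

print(f"Faster setting implies {faster / 60:.1f} minutes of video")
print(f"Better setting implies {better / 60:.1f} minutes of video")
print(f"Bitrate ratio: {5.83 / 3.86:.2f}x, size ratio: {5.71 / 3.78:.2f}x")
```

Both settings work out to roughly 130 minutes of video, and the file size grows by about the same factor as the bitrate (roughly 1.5x), so the quoted sizes and bitrates are consistent with each other. The same relation can be used to estimate how large a transcode would be at a lower target bitrate.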

[Image quality comparison gallery: Intel Core i7 3770K (x86) | Intel Quick Sync (SNB) | Intel Quick Sync (IVB) | Intel Quick Sync, Better (IVB) | NVIDIA GeForce GTX 680 | AMD Radeon HD 7970]

While AMD has yet to enable VCE in any publicly available software, NVIDIA's hardware encoder built into Kepler is alive and well. Cyberlink Media Espresso 6.5 will take advantage of the 680's NVENC engine which is why we standardized on it here for these tests. Once again, Quick Sync's transcoding abilities are limited to applications like Media Espresso or ArcSoft's Media Converter—there's still no support in open source applications like Handbrake.

Compared to the output from Quick Sync, NVENC appears to produce a softer image. However, if you compare the NVENC output to what we got from the software/x86 path you'll see that the two are quite similar. It seems that Quick Sync, at least in this case, is sharpening/adding more noise beyond what you'd normally expect. I'm not sure I'd call it bad, but I need to do some more testing before I know whether or not it's a good thing.

The good news is that NVENC doesn't pose any of the horrible image quality issues that NVIDIA's CUDA transcoding path gave us last year. For getting videos onto your phone, tablet or game console I'd say the output of either of these options, NVENC or Quick Sync, is good enough.

Unfortunately AMD's solution hasn't improved. The washed out images we saw last year, particularly in dark scenes prior to a significant change in brightness, are back again. While NVENC delivers acceptable image quality, AMD does not.

The performance story is unfortunately not much different from last year either. The chart below is average frame rate over the entire encode process.

CyberLink Media Espresso 6.5—Harry Potter 8 Transcode

Just as we saw with Sandy Bridge, Quick Sync continues to be an incredible way to get video content onto devices other than your PC. One thing I wanted to make sure of was that Media Espresso wasn't somehow holding x86 performance back to make the GPU accelerated transcodes seem much better than they actually are. I asked our resident video expert, Ganesh, to clone Media Espresso's settings in a Handbrake profile. We took the profile and performed the same transcode; the result is listed above as the Core i7 3770K (Handbrake). You will notice that the Handbrake x86/x264 path is definitely faster than Cyberlink's software path, by over 50% to be exact. However, even using Handbrake as a reference, Quick Sync transcodes over 2x faster.

In the tests below I took the same source and varied the output quality with some custom profiles. I targeted 1080p, 720p and 480p at decreasing average bitrates to illustrate the relationship between compression demands and performance:

CyberLink Media Espresso 6.5—Harry Potter 8 Transcode (three charts, one each for the 1080p, 720p and 480p targets)

Unfortunately NVENC performance does not scale like Quick Sync. When asked to preserve a good amount of data, NVENC and Quick Sync perform similarly, as in our 1080p/13Mbps test. However, ask for more aggressive compression at lower resolution/bitrate targets, and the Intel solution quickly distances itself from NVIDIA. One theory is that NVIDIA's entropy encode block could be the limiting factor here.

Ivy Bridge's improved Quick Sync appears to be aided both by an improved decoder and the HD 4000's faster/larger EU array. The graph below helps illustrate:

If we rely on software decoding but use Intel's hardware encode engine, Ivy Bridge is 18% faster than Sandy Bridge in this test (1080p 13Mbps output from BD source, same as above). If we turn on both hardware decode and encode, the advantage grows to 29%. More than half of the performance advantage in this case is due to the faster decode engine on Ivy Bridge.


173 Comments


  • Shadowmaster625 - Monday, April 23, 2012

    I would like to start using quicksync, but 2 mbps for a tablet is way too much for me. I just want to quickly take a video and transcode it. There is nothing quick about copying a 1+ gigabyte file onto a tablet or phone. It does no good to be able to transcode faster than you can even copy it LOL. Can quicksync go lower? I want no more than 800 kbps, 400-600 ideally.

    Also, is it possible to transcode and copy at the same time? Is anyone doing that?
  • BVKnight - Tuesday, April 24, 2012

    When you mention "2 mbps," I think you are referring to the bitrate, which is generally synonymous with the quality of the encoding.

    "It does no good to be able to transcode faster than you can even copy" <---I think this is completely false. The transcoding is a separate file conversion step that creates the final version which you will move to your device. Your machine won't even start copying until transcoding is complete, which means that every little bit of speed you can add to the transcoding process will directly reduce the amount of time it takes to get your file on your device.

    Getting quicksync will make a huge difference for your encoding.
  • ncrubyguy - Monday, April 23, 2012

    "Features like VT-d and Intel TXT are once again reserved for regular, non-K-series parts alone."

    Why do they keep doing that?
  • JarredWalton - Monday, April 23, 2012

    Because those are mostly for business users, and business users don't overclock and thus don't need K-series.
  • Old_Fogie_Late_Bloomer - Monday, April 23, 2012

    I have a feeling that the real reason is that, if business users could get those features on a K-series processor, it would largely obviate the need/demand for SB-E. A 2600K/2700K overclocked up to, say, 4.5 GHz--which seems consistently achievable, even conservative--would compare very favorably to the 3930K, given the prices of both.

    Yes, I know you can overclock the 3930K, and yes, I know it has six cores and four memory controllers and more cache. But I bet that overclocked SB or IB with VT-d, &c., would make a lot of sense for a lot of applications, given price/performance considerations.
  • piroroadkill - Monday, April 23, 2012

    I'd be very interested in seeing overclocked 2500K and 2600K benchmarks tossed in, because let's be honest, one of those is the most popular CPU at the high end right now, and anyone with one has bumped it to at least 4.3GHz, often about 4.4-4.5.

    I think it would be nice to have a visual aid to see how that fares, but I understand the impracticality of doing so.
  • Rasterman - Monday, April 23, 2012

    Thank you for including this section, it is great. I think it would be more relevant for people though if it were a much smaller test. I think pretty much anyone is going to know that a project of that size is going to be faster with more cores and speed. What isn't so obvious though are smaller projects, where you are compiling only a few files and debugging. A typical cycle for almost all developers is: making changes, compiling, debugging to test them out. Even though you are only talking times of a few seconds, add this up to 100s-1000s of iterations per day and it makes a difference. I base my entire computer hardware selection around this workflow. For now I use the single threaded benchmarks you post as a guide.
  • iGo - Monday, April 23, 2012

    The features table has put me in a great dilemma. I'm very much interested in running multiple virtual machines on my desktop, for debugging and testing purposes. Although I won't be running these virtual boxes 24x7, it would be great to have processor support for any kind of hardware acceleration that I can get, whenever I fire up these for testing. On the other hand, ability to overclock the K series processor is really tempting, and yes, a decent/modest overclock of say, 4.2-4.5GHz sounds lovely for 24x7 use.

    Can anyone using SNB/Intel processors with VT-d share whether it's worth going for a non-K processor to get better virtualization performance? To be more clear, my primary job involves web-application development with UX development, for which I require varied testing under different browsers. Currently I've set up 4 different virtual machines on my desktop with different browsers installed on different Windows OS versions, although these machines will never run 24x7 and never all at once (max 2 at once when testing). Apart from that, I also do a lot of photo editing (RAW files, Lightroom and works) and a bit of video editing/encoding stuff on my desktop (mostly personal projects, rarely commercial work). Is it better to opt for the 3770 for better virtual machine performance or the 3770K with the chance to boost overall performance by overclocking?
  • dcollins - Monday, April 23, 2012

    At the moment, VT-d will not give you any additional performance on your VMs using desktop virtualization programs like VMware Workstation or VirtualBox. Neither supports VT-d right now. Based on progress this year, I expect VT-d support is still a year away in VirtualBox, which is what I use.

    VT-d doesn't help performance in general; instead, VT-d allows VMs to directly access computer hardware. This is essential for high performance networking on servers or for accessing certain hardware like sound cards where low latency is crucial. For your workload, the only advantage will be slightly higher network speeds using native drivers versus a bridged connection. It may facilitate testing GPU accelerated browsers in the future as well.

    If you plan on overclocking, the K series is worth losing VT-d.
  • iGo - Monday, April 23, 2012

    Thanks, that helps a lot. I've been reading about VT-d and your comment confirms where my thinking was going. I guess, 3770K it is then. :)
