Quick Sync Image Quality & Performance

Intel obviously focused on increasing GPU performance with Ivy Bridge, but a side effect of that increased GPU performance is more compute available for Quick Sync. As you may recall, Sandy Bridge's secret weapon was an on-die hardware video transcode engine (Quick Sync), designed to keep Intel's CPUs competitive when faced with the onslaught of GPU computing applications. At the time, video transcode seemed to be the most likely candidate for significant GPU acceleration so the move made sense. Plus it doesn't hurt that video transcoding is an extremely popular activity to do with one's PC these days.

The power of Quick Sync was how it combined fixed function decode (and some encode) hardware with the on-die GPU's EU array. The combination of the two resulted in some pretty incredible performance gains, not only over traditional software based transcoding but also over the fastest GPU based solutions.

Intel put to rest any concerns about image quality when Quick Sync launched, and thankfully the situation hasn't changed today with Ivy Bridge. In fact, you get a bit more flexibility than you had a year ago.

Intel's latest drivers now allow for a selectable tradeoff between image quality and performance when transcoding using Quick Sync. The option is exposed in Media Espresso and ultimately corresponds to an increase in average bitrate. To test image quality and performance, I took the last Harry Potter Blu-ray, stripped it of its DRM and used Media Espresso to make it playable on an iPad 2 (1024 x 768 preset).

In the case of our Harry Potter transcode, selecting the Better Quality option increased average bitrate from 3.86Mbps to 5.83Mbps. The resulting file size for the entire movie increased from 3.78GB to 5.71GB. Both options produced a good quality transcode; picking one over the other really depends on how much time (and space) you have as well as the screen size of the device you'll be watching it on. For most phone/tablet use I'd say the faster performing option is ideal.
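
As a rough sanity check on those numbers, file size should track average bitrate for a fixed running time. The sketch below assumes decimal units (1Mbps = 10^6 bits per second, 1GB = 10^9 bytes) and that the reported average bitrate covers everything in the file; the function name is purely illustrative and has nothing to do with Media Espresso itself.

```python
# Rough sanity check: for a fixed running time, file size tracks average bitrate.
# Assumptions (not from the test itself): decimal units, and the reported
# average bitrate covers everything in the file (video plus audio).

def implied_duration_s(size_gb: float, bitrate_mbps: float) -> float:
    """Running time in seconds implied by a file size and an average bitrate."""
    size_bits = size_gb * 1e9 * 8
    return size_bits / (bitrate_mbps * 1e6)

faster = implied_duration_s(3.78, 3.86)   # default (faster) option
better = implied_duration_s(5.71, 5.83)   # Better Quality option

print(f"faster: {faster / 60:.1f} min, better: {better / 60:.1f} min")
print(f"bitrate ratio: {5.83 / 3.86:.2f}x, size ratio: {5.71 / 3.78:.2f}x")
```

Under those assumptions both settings imply a running time of a little over two hours, and the bitrate and file size ratios come out essentially identical at about 1.5x, so the two sets of figures are consistent with each other.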

[Image quality comparison: Intel Core i7 3770K (x86), Intel Quick Sync (SNB), Intel Quick Sync (IVB), Intel Quick Sync, Better (IVB), NVIDIA GeForce GTX 680, AMD Radeon HD 7970]

While AMD has yet to enable VCE in any publicly available software, NVIDIA's hardware encoder built into Kepler is alive and well. Cyberlink Media Espresso 6.5 will take advantage of the 680's NVENC engine which is why we standardized on it here for these tests. Once again, Quick Sync's transcoding abilities are limited to applications like Media Espresso or ArcSoft's Media Converter—there's still no support in open source applications like Handbrake.

Compared to the output from Quick Sync, NVENC appears to produce a softer image. However, if you compare the NVENC output to what we got from the software/x86 path you'll see that the two are quite similar. It seems that Quick Sync, at least in this case, is sharpening/adding more noise beyond what you'd normally expect. I'm not sure I'd call it bad, but I need to do some more testing before I know whether or not it's a good thing.

The good news is that NVENC doesn't pose any of the horrible image quality issues that NVIDIA's CUDA transcoding path gave us last year. For getting videos onto your phone, tablet or game console I'd say the output of either of these options, NVENC or Quick Sync, is good enough.

Unfortunately AMD's solution hasn't improved. The washed out images we saw last year, particularly in dark scenes prior to a significant change in brightness, are back again. While NVENC delivers acceptable image quality, AMD does not.

The performance story is unfortunately not much different from last year either. The chart below is average frame rate over the entire encode process.

CyberLink Media Espresso 6.5—Harry Potter 8 Transcode

Just as we saw with Sandy Bridge, Quick Sync continues to be an incredible way to get video content onto devices other than your PC. One thing I wanted to make sure of was that Media Espresso wasn't somehow holding x86 performance back to make the GPU accelerated transcodes seem much better than they actually are. I asked our resident video expert, Ganesh, to clone Media Espresso's settings in a Handbrake profile. We took the profile and performed the same transcode; the result is listed above as the Core i7 3770K (Handbrake). You will notice that the Handbrake x86/x264 path is definitely faster than Cyberlink's software path, by over 50% to be exact. However, even using Handbrake as a reference, Quick Sync transcodes over 2x faster.

In the tests below I took the same source and varied the output quality with some custom profiles. I targeted 1080p, 720p and 480p at decreasing average bitrates to illustrate the relationship between compression demands and performance:

[Charts: CyberLink Media Espresso 6.5, Harry Potter 8 Transcode, at the 1080p, 720p and 480p targets]

Unfortunately NVENC performance does not scale like Quick Sync. When asked to preserve a good amount of data, both NVENC and Quick Sync perform similarly in our 1080p/13Mbps test. However, ask for more aggressive compression ratios at lower resolution/bitrate targets, and the Intel solution quickly distances itself from NVIDIA. One theory is that NVIDIA's entropy encode block could be the limiting factor here.

Ivy Bridge's improved Quick Sync appears to be aided both by an improved decoder and the HD 4000's faster/larger EU array. The graph below helps illustrate:

If we rely on software decoding but use Intel's hardware encode engine, Ivy Bridge is 18% faster than Sandy Bridge in this test (1080p 13Mbps output from BD source, same as above). If we turn on both hardware decode and encode, the advantage grows to 29%. Going by those two figures, the faster decode engine on Ivy Bridge accounts for roughly 11 of the 29 percentage points, with the remainder coming from the encode side.
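
For anyone who wants to play with the split, here is a minimal sketch that uses only the two percentages quoted above. It assumes both figures are speedups over the matching Sandy Bridge configuration, and it shows two simple ways of reading them (additive and multiplicative); neither is an official breakdown, and the variable names are my own.

```python
# Split Ivy Bridge's Quick Sync gain over Sandy Bridge into encode-side and
# decode-side parts, using only the two percentages quoted in the text.
# Assumption: both figures are speedups over the matching Sandy Bridge setup.

encode_only = 1.18   # software decode + hardware encode: IVB 18% faster
full_hw = 1.29       # hardware decode + hardware encode: IVB 29% faster

# Additive view: how many of the 29 percentage points appear only once
# hardware decode is switched on, and what share of the total that is.
decode_points = (full_hw - encode_only) * 100
decode_share = (full_hw - encode_only) / (full_hw - 1.0)

# Multiplicative view: extra speedup factor from enabling hardware decode.
decode_factor = full_hw / encode_only

print(f"decode adds {decode_points:.0f} points ({decode_share:.0%} of the total gain)")
print(f"decode-side factor: {decode_factor:.3f}x")
```

Under the additive reading, hardware decode is responsible for somewhat more than a third of the total gain; under the multiplicative one it is worth a bit over 9% on top of the encode-side improvement.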

173 Comments

  • frozentundra123456 - Monday, April 23, 2012 - link

    According to the Asus review just out by Anand, the Intel HD4000 and AMD HD6620 are essentially even in the mobile space, where it really matters. I don't know where you are getting the "soundly trounces" description, unless you are talking about the desktop. I don't really care about integrated graphics on the desktop; it is just too easy to add a discrete card that soundly trounces either Intel or AMD integrated. I have no doubt that AMD will regain the lead in the mobile space when Trinity comes out. I just question that they will make the kind of improvements that are being speculated about.

    I also find it ironic that so many people are criticizing IVB for lack of cpu improvement while in the same breath saying bulldozer is OK because it is "good enough" already.
  • DanNeely - Monday, April 23, 2012 - link

    Primarily Einstein@Home.
  • fastman696 - Monday, April 23, 2012 - link

    Thanks for the review, but this is new Tech, why use old Tech chipset?
  • JarredWalton - Monday, April 23, 2012 - link

    You're being deliberately obtuse in order to set up a straw man.

    Me: "As I note in the mobile IVB article, mobile Llano GPU performance isn't nearly as impressive relative to IVB as on the desktop."

    You: "The mobile variant of the part that launched last year isn't as dominant over the part that just launched today as the desktop variant is?"

    In other words, you want us to compare to a product that's not out because the current product doesn't look good. I mention Trinity already, but you act as though I miss it. Then you throw out stuff like, "Thanks for resorting to namecalling" when you've already been insulting with your comments since the get go. "Sad to see this kind of crap coming from Anandtech." "I guess Anandtech's standards have drastically lowered." Put another way, you're already calling me an idiot but doing it indirectly. But let's continue....

    How much faster can you do Flash video when it's already accelerated and working properly in Sandy Bridge? Web browsers are basically in the same boat, unless you can name major web sites that a lot of users visit where HD 3000/4000 is significantly worse than the competition.

    Does Photoshop benefit from GPUs? Sure, and lots of people use that, including me, but the same people that use Photoshop are also the people who need more than Llano CPU performance, and more than HD 4000 or Llano or Trinity GPU performance. I'm running Bloomfield with a GTX 580, which is more than 95% of users out there. Most serious Photoshop users that I know use quad-core Intel with some form of NVIDIA graphics for a reason. But even running on straight Sandy Bridge with HD 3000, Photoshop runs faster than on Llano with HD 6620G.

    Vegas, naturally, is in the same category as video transcoding. I suppose I could have said "video editing/transcoding" just to be broader. There are tons of people that don't do video editing/transcoding. Even for those that do, NVIDIA GPUs are doing far better than AMD GPUs, and NVIDIA + Intel CPU is still the platform to beat. If you want quality, though, encoding is still done in software running on the CPU; Premiere for instance really just leverages the GPU to help with the "quick preview" videos, not for final rendering (unless something has changed since the last time I played with it).

    So let's try again: what exactly are the areas where Intel's Ivy Bridge and HD 4000 fall short, where AMD's Llano (or the upcoming Trinity) are going to be substantially better? All without adding a discrete GPU. Llano is equal to HD 4000 for gaming, and seriously behind on the CPU department. There are still areas where AMD's drivers are much better than Intel's drivers, and there are certain tasks (shader and geometry) where AMD is better. Really, though, the only area where Intel doesn't compete is in strictly budget laptops.
  • chizow - Monday, April 23, 2012 - link

    Yes I have heard of a "tick", and IVB has manifested itself as a tick+ as indicated in the article which means we are basically on the 3rd generation of the same architecture introduced with Nehalem in late 2008 with some minor bumps in clockspeed/Turbo modes and overclocking headroom.

    Both Conroe and Nehalem were pretty huge jumps in performance only 2.5 years apart on one of Intel's Tick Tock cadence cycles and since then, nothing remotely as interesting.

    Maybe you should be asking yourself why you aren't expecting bigger performance gains? Or maybe you're still reveling and ogling over Tahiti's terrible price:performance gains in the GPU space? :D
  • JarredWalton - Monday, April 23, 2012 - link

    Yes, because that extra 10W TDP makes all the difference, doesn't it? 45W Llano parts aren't shipped in very many laptops because the OEMs aren't interested. Just look at Newegg as an example:
    http://www.newegg.com/Product/ProductList.aspx?Sub...

    There is one current A8 APU faster than the A8-3520M for sale at Newegg, and it has an A8-3510MX. AMD's own list isn't much better (http://shop.amd.com/us/All/Search?NamedQuery=visio...); there's one more notebook there with an A8-3530MX. So that's why we looked at A8-3520M, but if I had an MX chip I would certainly run the same tests -- no one has been willing to send us such a laptop, unfortunately.

    But even if we got an MX chip, their GPUs are still clocked the same as the A8-3500M/A8-3520M. We might be CPU limited in a couple games, but while there are Llano parts with 20% higher CPU clocks, that just means Intel is "only" ahead by 60-70% instead of 100% faster on CPU performance.
  • Joepublic2 - Monday, April 23, 2012 - link

    Because stock temperatures are irrelevant (much like your posting) to the end user as long as the chip isn't throttling.
  • samal90 - Monday, April 23, 2012 - link

    You people over-analyzed my comment. All I wanted to say is that they are bragging about HD 4000 when it doesn't come close to the current competition.
    A couple of years down the road, people won't want dedicated graphics cards in their laptops anymore; it's too bulky and consumes too much power. We will all have integrated GPUs. The AMD APU is the way to go. To be honest, CPU power is already way more than enough for a lot of things most people use their laptops for (browsing the web, writing documents, playing web-based games a.k.a. angry birds on chrome). The extra GPU is for people that either want to do some graphics processing or play some more graphics intensive games. So yes, it is important for the future to have a good and strong integrated GPU and a good CPU. Therefore, I think AMD will win this round. I hope they continue to compete at each other's throats so we see better and cheaper products from both sides.
    So as I understand it right now: Go for AMD if you want better GPU, go for Intel if CPU is more important for you. Trinity might narrow the CPU gap however and greatly increase the GPU one. Only time will tell.
  • chaos215bar2 - Tuesday, April 24, 2012 - link

    "Ivy Bridge is hotter, so if you're paying for the AC, it should be a negative impact."

    Where do you think the dissipated power is going? TDP and overall thermal output are roughly equivalent.

    IVB may get hotter, but without measuring TDP overclocked and under load, that could easily be because the die is smaller and doesn't dissipate heat quite as well.
  • DanNeely - Tuesday, April 24, 2012 - link

    "I don't understand this. We're talking about power consumption, not TDP. Heat-wise, Ivy Bridge is hotter, so if you're paying for the AC, it should be a negative impact."

    Power consumption is TDP. 100W of power is 100 joules/second of heat to be dissipated; it doesn't matter if the heat's coming off a large warm die, or a small hot one. 100W is 100W.

    My current i7-9xx boxes are 130W chips; so just looking at TDP somewhere between 60 and 90W less power at stock (~50 just from the CPU TDP, the higher number the chipset's a theoretical 18 more, probably a lot less in practice, and then whatever cut of the IB's TDP is for the GPU). Probably a wider gap when OCed, but I don't have any stock vs OC power numbers to look at. With AC costs added, cost savings would probably be between $100 and $200/year per box.

    Up front costs would be ~$400-550 for CPU + mobo pairs depending on how high up the feature chain I went; probably fairly high for my main box and more bang for the buck on the 2nd.

    Looking on ebay for successful auctions it looks like I could get ~$250 for my existing cpu/mobo pairs less whatever ebay's fee is. The very rough guess would be a 2 yearish payback time which is somewhat better than I thought (closer to 3 years).

    Not sure I'll do it since I have a few other PC related purchases on the wishlist too: replacing my creaky Core One Duo laptop with a light/medium gaming model or swapping out my netbook for a new ultra portable after Win8 launches might give better returns for my dollar. The latter's battery isn't really lasting as long as I'd like any more. Also, my WHSv1 box is scheduled for retirement this winter.

    I am going to have to give it some serious thought though. Part of me still wants to wait for Haswell even though preliminary indications are that it won't be a huge step up; the much bigger GPU and remaining at dual channel memory makes a mainstream hex core part unlikely.
