AMD’s Heterogeneous Computing with Trinity

It’s not just about raw CPU or GPU performance, though, or at least that’s what we’ve been hearing from various parties for a while now. The real question is how a platform performs as a whole. There are some tasks where pure CPU performance is what really matters, and there are other tasks where the parallel nature of GPUs pays serious dividends. AMD (along with NVIDIA) has been pushing for more applications to use the GPU for tasks where it can provide a lot of number-crunching prowess.

With Trinity, AMD provided us with a selection of applications that now leverage—to varying degrees—AMD’s App Acceleration, OpenCL, OpenGL, or other tools. For some of these applications, we don’t have any good way of measuring performance across a wide selection of hardware, and for some of those where benchmarks are possible I’ve run out of time to try to put anything concrete together. I don’t want to skip this section entirely, so what follows is a list of the applications, how they benefit from heterogeneous compute, and some general impressions of the application. We also have graphs for a few of the applications where performance seemed to matter the most.

Adobe Flash 11.2—The latest version of Flash continues to add GPU acceleration features, and now there are 3D hooks in addition to the video offload acceleration we first saw with Flash 10.x. There’s not too much of note here, as NVIDIA and Intel also support the latest features of Flash 11.2. Flash works fine on Trinity, but the same goes for Ivy Bridge and various NVIDIA GPUs. If you never saw the Epic Citadel demo for iOS or Android, there’s now a Flash-based version of the same demo that will run in your browser. (Warning: that link can take 10-15 minutes on a decent connection to download all the textures and other data!) Epic Citadel looks just as nice as it did on iOS, but now we need some actual games to take advantage of the tools. Then perhaps we can start looking into benchmarks of browser games or something….

Adobe Photoshop CS6—Photoshop started to take advantage of GPU acceleration back with the CS4 release, using OpenGL to improve performance on certain filters and features. With CS6, Adobe has begun using OpenCL. Fundamentally, I’m not sure how big of a change this represents, but there are quite a few functions in Photoshop that are now supposed to be faster/better with an OpenCL compatible graphics card. There are also two new features that leverage OpenCL; one is Iris Blur, which allows you to mimic depth of field using Photoshop instead of your camera, and the other is Liquify. Unfortunately, I’m by no means a Photoshop expert, so I’m not sure how much the features really help “power users”. I did try doing a benchmark of general Photoshop CS6 performance using the Photoshop Retouch benchmark with and without GPU acceleration enabled; unfortunately, it looks like most of the filters in that action script don’t benefit from the GPU acceleration, as the scores I got were essentially unchanged with or without GPU/OpenCL enabled. Overall, I’ll take the GPU acceleration, but for most of what I do in Photoshop it doesn’t appear to benefit; if you’re interested, you can read more about AMD’s work with Adobe.

GNU Image Manipulation Program (GIMP)—Going along with Photoshop CS6, AMD provided a special preview build of GIMP 2.8. GIMP is sort of the poor man’s Photoshop, as it’s completely free. At present, there are 19 filters that utilize OpenCL to speed up processing, and over the coming months, as the release version of GIMP takes its new engine live, there will undoubtedly be more additions. For now, probably only five of the filters are things I would use (e.g. noise reduction, maybe a light blur). I tested several of these, and there is sometimes an order of magnitude speedup vs. doing the work on just the CPU. The problem is that it also looks like GIMP isn't incredibly well threaded in many of these tasks, putting multicore CPUs at a disadvantage. My biggest complaint isn’t even about performance, though; sadly, I just find the GIMP UI and general performance to be really bad compared to Photoshop. I've tried several times over the years to use GIMP instead of Photoshop, but I’ve never felt comfortable with the tool. If on the other hand you prefer GIMP, hopefully when the current GEGL menu gets integrated into the main program you’ll realize a healthy performance boost.

Assisted Video Transcoding—ArcSoft MediaConverter 7

ArcSoft MediaConverter 7.5—MediaConverter should be a familiar name by now if you’ve been following our reviews, as it’s one of the showcase titles for Intel’s Quick Sync transcoding. When we reviewed Ivy Bridge last month, we found that, on Llano at least, the version of MediaConverter we had ran slower on the GPU than on the CPU; with Trinity, on the other hand, enabling GPU acceleration results in times that are about 60% faster than the CPU alone. That’s a good performance increase, but we’re looking at 154 seconds on the CPU compared to 98 seconds using the GPU. In contrast, dual-core Sandy Bridge on CPU transcoding took 127 seconds and with Quick Sync it only took 28 seconds—roughly a 4.5X improvement. Quad-core Ivy Bridge was just as impressive, going from 68 seconds on the CPU down to 16 seconds with Quick Sync (4.25X). We’ve been hoping to see something more from AMD’s new Video Codec Engine (VCE), first announced over six months ago with HD 7970, but unless there’s substantial room for improvement it looks like Intel’s Quick Sync will continue to be the fastest transcoding tool for now.

Assisted Video Transcoding—CyberLink MediaEspresso 6.5

CyberLink MediaEspresso 6.5—This tool is very similar to MediaConverter, and the results are also better this time around. We measured the assisted encode time at 74 seconds compared to 135 seconds on the CPU alone. The 74 second transcode time actually makes Trinity potentially faster than CPU-based transcoding on dual-core Sandy Bridge, but again Quick Sync (25 seconds on SNB, 12 seconds on IVB) remains the fastest way to transcode. Considering both of these tools are apparently using VCE, I have to state that I’m disappointed; with VCE I was expecting performance similar to what Intel is getting with Quick Sync—four or five times faster than CPU-based encoding for the same APU. That Trinity isn't quite twice as fast with VCE is unfortunate; even though there's a decent improvement, Intel is in a completely different category of performance. We’ll have to wait and see if anything more develops with VCE.
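To put the transcoding results from both tools side by side, the speedup arithmetic can be sketched in a few lines of Python. The times are the ones quoted in the text; the function and variable names are only illustrative, and Intel's CPU-only MediaEspresso times are left out because they are not given here.

```python
# Transcode times in seconds, as quoted in the text.
# Each entry: (label, CPU-only time, hardware-assisted time)
RESULTS = [
    ("MediaConverter, Trinity (GPU assist)", 154, 98),
    ("MediaConverter, dual-core Sandy Bridge (Quick Sync)", 127, 28),
    ("MediaConverter, quad-core Ivy Bridge (Quick Sync)", 68, 16),
    ("MediaEspresso, Trinity (GPU assist)", 135, 74),
]


def speedup(cpu_seconds, assisted_seconds):
    """How many times faster the assisted run is than the CPU-only run."""
    return cpu_seconds / assisted_seconds


def report(results):
    """Format one 'label: N.NNx' line per result."""
    return [
        f"{label}: {speedup(cpu, assisted):.2f}x"
        for label, cpu, assisted in results
    ]


if __name__ == "__main__":
    print("\n".join(report(RESULTS)))
```

Run as a script, this works out to roughly 1.57x and 1.82x for Trinity in the two tools, against about 4.54x and 4.25x for Quick Sync on Sandy Bridge and Ivy Bridge.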

File Compression—WinZip 16.5 and 7-Zip 9.2

Handbrake—Yep, this popular open source video transcoding app is getting an OpenCL facelift. Check out our separate post on it here.

WinZip 16.5—This final application is one that I can see being very useful, assuming we see similar advancements in other compression utilities. WinZip 16.5 now supports OpenCL to improve compression times. We tested by compressing the entire Cinebench 11.5 directory with and without OpenCL enabled, and we also compared the results with 7-Zip. On Trinity, performance improved by about 20%, which is decent; Llano sees an even larger 28% improvement. Meanwhile, Sandy Bridge using CPU-based compression is about as fast as Trinity with OpenCL, and Ivy Bridge is still faster, but the 20% increase for “free” is nothing to scoff at. Unfortunately for WinZip, 7-Zip compressed the same directory to 95MB vs. 108MB for WinZip, in roughly the same time as the non-OpenCL WinZip, and 7-Zip is completely free and doesn't nag you to buy it. While WinZip 16.5 is a good proof of concept, what will really help AMD is if the other compression utilities (7-Zip, WinRAR, etc.) start using OpenCL or other tools to improve performance.
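For a rough sense of what those compression figures mean, here is a small Python sketch. The archive sizes come from the text; reading "20% faster" as a 20% gain in throughput (rather than 20% less time) is an assumption, and the helper names are purely illustrative.

```python
def size_saving_pct(larger_mb, smaller_mb):
    """Percentage by which the smaller archive undercuts the larger one."""
    return (larger_mb - smaller_mb) / larger_mb * 100


def relative_time(improvement_pct):
    """Fraction of the original time left after a throughput gain.

    Assumption: an 'N% improvement' means N% more work per second,
    so the job takes 1 / (1 + N/100) of the original time.
    """
    return 1 / (1 + improvement_pct / 100)


if __name__ == "__main__":
    # 7-Zip (95MB) vs. WinZip (108MB) on the same directory.
    print(f"7-Zip archive is {size_saving_pct(108, 95):.1f}% smaller")
    # OpenCL gains quoted for Trinity (20%) and Llano (28%).
    for chip, gain in (("Trinity", 20), ("Llano", 28)):
        print(f"{chip}: {relative_time(gain):.1%} of the original time")
```

Under that assumption, the 7-Zip archive comes out about 12% smaller, and the OpenCL path trims WinZip's compression time to roughly 83% (Trinity) and 78% (Llano) of the CPU-only time.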

The majority of the applications continue to focus on video and image manipulation, likely because those are areas where the parallel nature of GPUs can be readily utilized. WinZip on the other hand is an application showing other potential uses for GPGPU and heterogeneous compute. We’d love to see even more adoption of OpenCL and similar tools, but the stark reality is that coming up with new and useful ways of doing this is difficult—if it were easy, everyone would do it! The good news is that giving the creative people of the world more tools with which to work can only help, and we’ll just have to wait and see what else comes out.

There’s another interesting sidebar worth mentioning here. OpenCL is an open standard, and the latest Intel drivers actually install an OpenCL driver on Ivy Bridge and Sandy Bridge. Not surprisingly, not all implementations are created equal, so even with Intel’s drivers we couldn’t enable OpenCL in Photoshop or WinZip; GIMP on the other hand apparently worked okay with OpenCL on Intel—we measured a 5X performance improvement of the Noise Reduction filter with Ivy Bridge. With both platforms using OpenCL, Trinity came in slightly faster; without OpenCL, Intel was nearly twice as fast.
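If you want to see which OpenCL implementations a given machine actually exposes, something like the following Python sketch can help. It assumes the third-party pyopencl package is installed (it is not something used in this review), and it simply returns an empty list when the package or an OpenCL driver is missing. Note that a device showing up here does not guarantee that a particular application, such as Photoshop or WinZip, will accept it.

```python
def list_opencl_devices():
    """Return (platform name, device name) pairs, or [] if none are found."""
    try:
        import pyopencl as cl  # third-party package; may not be installed
    except ImportError:
        return []

    try:
        platforms = cl.get_platforms()
    except cl.Error:
        # Typically raised when no OpenCL driver is registered on the system.
        return []

    pairs = []
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except cl.Error:
            # Skip platforms that cannot enumerate their devices.
            continue
        for device in devices:
            pairs.append((platform.name.strip(), device.name.strip()))
    return pairs


if __name__ == "__main__":
    found = list_opencl_devices()
    if not found:
        print("No OpenCL devices found (or pyopencl is not installed).")
    for platform_name, device_name in found:
        print(f"{platform_name}: {device_name}")
```

On a system with both vendors' drivers installed, each vendor would normally appear as its own platform, which makes it easy to confirm whether the Intel driver has registered an OpenCL device at all.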


271 Comments


  • texasti89 - Tuesday, May 15, 2012 - link


    A10-4600M's TDP = 35W
    I7-3720QM's TDP = 45W

    I'm pretty sure that Intel's 22nm is more power efficient than any 32nm process available in the industry. The efficiency of Intel's GPU architecture is what makes their graphics solution appear to be comparable to AMD Fusion parts.
  • Lolimaster - Tuesday, May 15, 2012 - link

    As obviously with the biased reviewers.

    Yeah GJ. Compare a top of the line UBER-expensive IB quad core with the highest TDP and the highest frequency vs A10 Trinity which costs 3 times less (if not more) than that i7 3720QM.

    HD4000 performance is craptastic. Don't fool people with biased comparisons, at medium detail and low res, cpu take advantage. For mobile each Mhz towards the 3Ghz and above improve performance.

    BUT WE ARE TALKING ABOUT AN i7 IB 3x times MORE EXPENSIVE than Trinity with WAY HIGHER MHZ. It's not the pathetic HD4000 that is shining is just the cpu, you can put an HD6450M and it will appear "faster" than Trinity if you pair with a high end expensive cpu.

    It's like the moronic reviews with a i7 3770K ($300+) vs A8-3870K ($120).

    Everyone knows that the real competition are the dual core i5 and similar price.

    And again, medium details when APU's proved to offer high quality in most games.
  • JarredWalton - Tuesday, May 15, 2012 - link

    http://www.anandtech.com/bench/Product/600?vs=580

    I've got Mainstream and Enthusiast performance results in there for the games, but there's not much point in running games at 1600x900 High settings at <30 FPS is there?

    I have a whole section stating why we're including the systems we're including. Are you seriously delusional enough to suggest that we not show HD 4000 performance? There are no other HD 4000 results available for the time being, so either I use the i7-3720QM or I omit Ivy Bridge entirely. For you to imply its inclusion (with the note--italicized even!--that "these two laptops do not target the same market") is somehow biased is in fact far more bias than anything I've shown. And the pricing is twice as high for the ASUS system, not three times -- in fact I'd guess the Trinity laptop would be closer to $800 as configured, since it has Blu-ray and an SSD.

    What's more, throughout the review, I've included dual-core i5-2410M results and discussed how AMD's Trinity stacks up. Judging by Sandy Bridge, dual-core Ivy Bridge will be within 10% of the quad-core scores for gaming--it's not like many games can use more than two CPUs, and so it's really just a matter of the HD 4000 clocks being slightly lower on i5 models. You fail to grasp this fact with your ranting and biased outlook, unfortunately.

    In other words, I think your "moronic reviews" comment reflects your reading comprehension skills--or lack thereof. Better luck next time. You might want to sign up for the remedial math and basic reading classes at the local community college.
  • kyuu - Tuesday, May 15, 2012 - link

    "I've got Mainstream and Enthusiast performance results in there for the games, but there's not much point in running games at 1600x900 High settings at <30 FPS is there?"

    Is that the FPS you get? Did you actually test this or are you just assuming? Also, you can run 1600x900 without automagically turning up the detail settings to High at the same time. I, for one, am interested to see if the performance advantage increases over Llano/HD4000 when you shift more of the burden to the GPU side. At x768, it seems like the CPU would still be handling enough to make the CPU a substantial bottleneck.
  • JarredWalton - Tuesday, May 15, 2012 - link

    Yes, the scores in Mobile Bench are all actually tested -- including the 5 FPS average score of Trinity at 1920x1080 with 4xAA in Battlefield 3. (Yes, watching that made me feel a bit nauseous....) I could test 1600x900 at medium detail, but I don't expect any major changes from what the existing scores show.
  • Denithor - Tuesday, May 15, 2012 - link

    Actually those facts are very interesting to some of us! It lays out what the system can/cannot handle in practical terms. Now, granted, BF3 @ 1080p/4xAA is kinda an obvious fail scenario, but 1080p medium detail might be good to know.

    One real question that I haven't seen mentioned yet - how come there were no Intel cpu + nVidia gpu systems included in this testing? That seemed like a no-brainer to me...
  • JarredWalton - Wednesday, May 16, 2012 - link

    I thought the Acer TimelineU was a good choice. The only other recently tested laptops with Intel + NVIDIA are the Razer Blade (if people complain that N56VM is too expensive, what would they say about a $3500 laptop!?) and the Alienware M17x R3 (completely different class of hardware and again over $2000). The others like Dell XPS 15z came before we changed our game list, so we don't have some of the results for such laptops.
  • vegemeister - Tuesday, May 15, 2012 - link

    CPU speed doesn't become significant at low resolution because the resolution is low, but because the frame rate is high. The CPU must create the scene to be rendered at much higher temporal resolution.
  • bji - Tuesday, May 15, 2012 - link

    I think this was a well written article and that you laid out the facts about as clearly as could be laid out. I agree that Lolimaster has poor reading comprehension and needs some remedial education.
  • raghu78 - Tuesday, May 15, 2012 - link

    OEM laptop pricing is what changes the discussion. Also the sandybridge stock clearing firesale is a crucial factor. Given that core i7 2630qm with nvidia GT 555M is at USD 800 and entry level core i5 laptops at USD 550

    http://www.newegg.com/Product/Product.aspx?Item=N8...
    http://www.newegg.com/Product/Product.aspx?Item=N8...

    The A10 trinity laptops need to come at USD 600 with a max of 650 for the best designs, with the A8 at 500 - 550 and the A6 / A4 at USD 400 - 450. Then they can clearly avoid competing core i7 with discrete GPU configs and be considered good alternatives for the low end Intel core i5, core i3 and pentium/ celeron dual cores with crappy intel HD 3000 graphics. Not to forget the GPU drivers advantage which AMD has, very good image quality and a rapidly growing GPU accelerated apps ecosystem.
