Compute & Synthetics

One of the major promises of AMD's APUs is the ability to harness the incredible on-die graphics power for general purpose compute. While we're still waiting for the holy grail of heterogeneous computing applications to show up, we can still evaluate just how strong Trinity's GPU is at non-rendering workloads.

Our first compute benchmark comes from Civilization V, which uses DirectCompute 5 to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. And while games that use GPU compute for texture decompression are still rare, the technique is becoming increasingly common, since it lets developers pack textures in whatever format is most suitable for shipping rather than being limited to DX texture compression.
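For context on what "DX texture compression" means, here is a minimal sketch of decoding one block of BC1 (also known as DXT1), the most common of those fixed formats. BC1 stores each 4x4 block of texels in 8 bytes (two RGB565 endpoint colors plus sixteen 2-bit palette indices), or 4 bits per texel. This is not Civilization V's decompression algorithm, which isn't described here, and the integer rounding of the interpolated colors is a simplification that may differ slightly from hardware decoders.

```python
import struct


def rgb565_to_rgb888(c: int) -> tuple[int, int, int]:
    """Expand a 16-bit RGB565 color to 8 bits per channel."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the high bits into the low bits so 0x1F maps to 255, not 248.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_bc1_block(block: bytes) -> list[tuple[int, int, int, int]]:
    """Decode one 8-byte BC1 block into 16 RGBA texels (row-major 4x4)."""
    if len(block) != 8:
        raise ValueError("a BC1 block is exactly 8 bytes")
    c0, c1, indices = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-color mode: two extra colors interpolated at 1/3 and 2/3.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), p3 + (255,)]
    else:
        # Three-color mode: the midpoint plus transparent black.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), (0, 0, 0, 0)]
    # Each texel uses a 2-bit palette index; texel 0 sits in the lowest bits.
    return [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]
```

A fixed format like this is what the GPU's texture units can sample directly; a compute-shader decompressor is free to ship textures in some other packing and convert them at load time.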

Compute: Civilization V

As in our earlier tests, Trinity offers a 15% increase in performance here compared to Llano. Its compute advantage over Intel's HD 4000 is solid as well.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We're now using a development build from the version 2.0 branch, and we've moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

SmallLuxGPU 2.0d4

Intel significantly shrinks the gap between itself and Trinity in this test, and AMD doesn't really move performance forward that much compared to Llano either.

For our next benchmark we're looking at AESEncryptDecrypt, an OpenCL routine that AES encrypts and decrypts an 8K x 8K pixel image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher. Note that this test fails on all Intel processor graphics, so the results below include only AMD APUs and discrete GPUs.
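To give a sense of the workload's size, the sketch below does the arithmetic for one pass and shows how an average-time result can be turned into throughput. It assumes the "8K x 8K" image is 8192 x 8192 pixels at 4 bytes per pixel (RGBA8); the benchmark's actual pixel format and cipher mode aren't stated here, so treat those numbers as illustrative. The helper names are hypothetical.

```python
from statistics import mean

AES_BLOCK_BYTES = 16  # AES always operates on 128-bit (16-byte) blocks


def image_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Raw size of an uncompressed image buffer."""
    return width * height * bytes_per_pixel


def aes_block_count(num_bytes: int) -> int:
    """Number of 16-byte AES blocks needed, rounding a partial block up."""
    return -(-num_bytes // AES_BLOCK_BYTES)


def summarize(pass_times_s: list[float], num_bytes: int) -> tuple[float, float]:
    """Average seconds per encryption pass and the implied throughput in GB/s."""
    avg_s = mean(pass_times_s)
    return avg_s, num_bytes / avg_s / 1e9


if __name__ == "__main__":
    # Assumption: 8192 x 8192 pixels at 4 bytes (RGBA8) per pixel.
    size = image_bytes(8192, 8192, 4)
    print(size, aes_block_count(size))  # 268435456 16777216
```

Under that assumption each pass is 256 MiB, or about 16.8 million independent AES blocks, which is the kind of wide, data-parallel job GPUs handle well.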

AESEncryptDecrypt

We see a pretty hefty increase in performance over Llano in our AES benchmark. The on-die Radeon HD 7660D even manages to outperform NVIDIA's GeForce GT 640, a $100+ discrete GPU.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16K-particle fluid using a compute shader, with a choice of several different algorithms. In this case we're using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
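The loop structure of a shared-memory O(n^2) neighbor search can be sketched on the CPU as follows. This is a simplified model, not the SDK sample's code: it uses 2D positions and simply counts neighbors within a radius, and the list slice only stands in for the tile that each GPU thread group would copy into groupshared memory before every thread tests its own particle against it. If "16K" means 16,384 particles, the brute-force approach performs 16,384^2, or about 268 million, pair tests per step.

```python
Point = tuple[float, float]


def neighbor_counts_tiled(positions: list[Point], radius: float, tile: int = 256) -> list[int]:
    """Brute-force O(n^2) neighbor count, processed one tile at a time.

    On a GPU, a thread group would load one tile of positions into shared
    memory, synchronize, and let each thread compare its particle against
    the cached tile. Here the "cache" is just a list slice, so only the
    tiling pattern is modeled, not the memory hierarchy.
    """
    r2 = radius * radius
    counts = [0] * len(positions)
    for start in range(0, len(positions), tile):
        cached = positions[start:start + tile]  # stand-in for shared memory
        for i, (xi, yi) in enumerate(positions):
            for j, (xj, yj) in enumerate(cached, start):
                if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                    counts[i] += 1
    return counts
```

The tile size changes only how often data is reloaded, not the result; the benefit on real hardware comes from many threads reusing each cached tile instead of each fetching every position from main memory.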

DirectX11 Compute Shader Fluid Simulation - Nearest Neighbor

For our last compute test, Trinity does a reasonable job improving performance over Llano. If you need a lot of GPU compute horsepower you're going to be best served by a discrete GPU, but it's good to see processor-based GPUs inch their way up the charts.

Synthetic Performance

Moving on, we'll take a few moments to look at synthetic performance. Synthetic performance is a poor tool for ranking GPUs (what really matters is the games), but by breaking workloads down into discrete tasks it can sometimes tell us things that we don't see in games.

Our first synthetic test is 3DMark Vantage's pixel fill test. Typically this test is memory bandwidth bound as the nature of the test has the ROPs pushing as many pixels as possible with as little overhead as possible, which in turn shifts the bottleneck to memory bandwidth so long as there's enough ROP throughput in the first place.

3DMark Vantage Pixel Fill

Since our Llano and Trinity numbers were both run at DDR3-1866, there's no real performance improvement here. Ivy Bridge, or at least the HD 4000, actually does quite well in this test.
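The reason identical memory means identical results can be shown with a simple bandwidth model: achievable fill rate is the lower of the ROP limit and memory bandwidth divided by bytes written (or read and written) per pixel. The sketch assumes standard 64-bit (8-byte) DDR3 channels; the bytes-per-pixel figure depends on whether a test blends, and the ROP rate passed in is a placeholder rather than a spec for any chip here. Function names are hypothetical.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, channels: int, bus_bytes: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s: transfers/s x bytes per transfer x channels."""
    return transfer_rate_mts * 1e6 * bus_bytes * channels / 1e9


def fill_rate_gpix(rop_gpix: float, bandwidth_gbs: float, bytes_per_pixel: float) -> float:
    """Achievable pixel fill in Gpixels/s: the lower of the ROP and memory limits."""
    return min(rop_gpix, bandwidth_gbs / bytes_per_pixel)
```

Dual-channel DDR3-1866 peaks at about 29.9 GB/s, so with 4 bytes per pixel memory alone caps fill at roughly 7.5 Gpixels/s no matter how many ROPs sit behind it. Both Llano and Trinity hit that same ceiling.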

Moving on, our second synthetic test is 3DMark Vantage's texture fill test, which provides a simple FP16 texture throughput test. FP16 textures are still fairly rare, but it's a good look at worst case scenario texturing performance.
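Why FP16 is a worst case comes down to size: each channel is a 16-bit half-precision float, so a four-channel texel takes 8 bytes instead of the 4 bytes of ordinary 8-bit RGBA. The sketch below just measures that with Python's struct module (its "e" format code packs IEEE 754 half floats); it says nothing about how 3DMark builds its test, and the helper names are hypothetical.

```python
import struct

RGBA8 = "<4B"    # four 8-bit unsigned channels
RGBA16F = "<4e"  # four IEEE 754 half-precision (FP16) channels


def texel_bytes(fmt: str) -> int:
    """Bytes occupied by one texel in the given struct layout."""
    return struct.calcsize(fmt)


def texture_bytes(width: int, height: int, fmt: str) -> int:
    """Bytes fetched by one full read of a width x height texture (no mipmaps)."""
    return width * height * texel_bytes(fmt)


def pack_rgba16f(r: float, g: float, b: float, a: float) -> bytes:
    """Pack one RGBA texel as four half-precision floats."""
    return struct.pack(RGBA16F, r, g, b, a)
```

Every FP16 texel fetched therefore moves twice the data of an RGBA8 texel, which stresses both the texture units and the memory path feeding them.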

3DMark Vantage Texture Fill

Trinity is able to outperform Llano here by over 30%, although NVIDIA's GeForce GT 640 shows you what a $100+ discrete GPU can offer beyond processor graphics.

Our final synthetic test is the set of settings we use with Microsoft's Detail Tessellation sample program out of the DX11 SDK. Since IVB is the first Intel iGPU with tessellation capabilities, it will be interesting to see how well IVB does here, as IVB is going to be the de facto baseline for DX11+ games in the future. Ideally we want to have enough tessellation performance here so that tessellation can be used on a global level, allowing developers to efficiently simulate their worlds with fewer polygons while still using many polygons on the final render.
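The "fewer polygons in, many polygons out" idea can be illustrated with a simplified model: splitting each edge of a triangle patch into N equal segments yields N^2 smaller triangles. This is a plain uniform subdivision, not the exact pattern D3D11's fixed-function tessellator produces (its partitioning modes generate different layouts and counts), and the function name is hypothetical.

```python
Bary = tuple[float, float, float]


def subdivide_triangle(n: int) -> list[tuple[Bary, Bary, Bary]]:
    """Uniformly split one triangle patch into n * n triangles.

    Vertices are barycentric coordinates (u, v, w) with u + v + w = 1,
    which a domain shader would map to positions on the real surface.
    """
    if n < 1:
        raise ValueError("tessellation factor must be >= 1")

    def p(i: int, j: int) -> Bary:
        # Grid point i steps along one edge and j steps along another.
        return (i / n, j / n, (n - i - j) / n)

    tris = []
    for i in range(n):
        for j in range(n - i):
            # "Upward" triangle in this row of the grid.
            tris.append((p(i, j), p(i + 1, j), p(i, j + 1)))
            if j < n - i - 1:
                # "Downward" triangle filling the gap beside it.
                tris.append((p(i + 1, j), p(i + 1, j + 1), p(i, j + 1)))
    return tris
```

In this model a factor of 64 (D3D11's maximum tessellation factor) turns one input triangle into 4,096, which is why cheap, fast tessellation matters if developers are to rely on it globally.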

DirectX11 Detail Tessellation Sample - Normal

DirectX11 Detail Tessellation Sample - Max

The tessellation results here were a bit surprising given the 8th generation tessellator in Trinity's GPU. AMD tells us it sees much larger gains internally (up to 2x), but with different test parameters. Depending on the workload, Trinity should be significantly faster than Llano when it comes to tessellation performance.

139 Comments

  • Arbie - Thursday, September 27, 2012


    For all the reasons you listed, Crysis Warhead is very much worth keeping in the mix. Personally, it's one of the few games I return to and easily the best of all of them. I'm very interested in how the new chips run it.

    Thanks.
  • SanX - Thursday, September 27, 2012

    Make these processors capable of 2, 4, 6, and 8-chip configurations and make appropriately cheap motherboards to sell the processors by the shovelful.

    They will be happy, we will be happy. Intel will be in trouble.
    Indeed, a 32-core PC for less than $1000!
  • calzahe - Thursday, September 27, 2012

    The main issue with Trinity is that it is basically almost the same as Llano, with just cosmetic improvements in architecture like VLIW5 -> VLIW4 in the GPU and new x86 Piledriver cores... But the number of streaming processors was reduced from 400 to 384, and the memory controller still has only 2 channels.

    The problem for AMD is that they don't understand that people who could buy APU to play games don't want to stick with low graphics settings in games and prefer to add extra $ to buy external graphics card and set everything in High in games. And the people who don't play games buy Intel Ivy Bridge because it consumes less energy and is less noisy.

    To make the next-gen Kaveri APU attractive, AMD should make it with a minimum of 800 streaming processors, and the memory controller should have 4 memory channels with DDR4 support. Otherwise Intel's Haswell will destroy AMD completely next year...

    As for the laptop market, Ivy Bridge has similar performance to Trinity but provides much longer battery life. So the solution for AMD, again, is to make an APU with 800 or more streaming processors and a 4-channel memory controller. It will not give 10 hours of battery life, but combined with effective switching-off of idle cores it will be more efficient in power saving than a CPU + discrete graphics card. So many people will buy these laptops for gaming and HD movies.

    Regarding the tablet/smartphone market, AMD should accept the fact that the GloFo/TSMC 32nm/28nm manufacturing processes are inferior to Intel's 22nm. So unless GloFo is on par with Intel at 14nm in 2014 (which is highly unlikely), AMD has no chance against Intel. That's why, instead of wasting a lot of money and resources on Brazos, they should license the ARM architecture and combine it with Radeon cores, which could be quite competitive with or even better than Tegra or Snapdragon.

    If AMD doesn't make improvements quickly, then in 1-2 years they will be sold off or go bankrupt.
  • silverblue - Thursday, September 27, 2012

    Do you realise that once AMD implements its HSA initiative (along with perhaps on-die memory), it won't actually need a 4-channel memory bus? Faster clocked RAM is a must, though.

    In any case, people who buy APUs aren't in fact after bleeding edge performance but something affordable that doesn't perform like a dog. Add an external GPU if you like but that's really Vishera's area (and the dual module CPUs have no GPUs and as such will overclock better - Trinity's CPU cores could be more of a hindrance here).
  • calzahe - Thursday, September 27, 2012

    HSA will not help much if used with 2-channel memory, and on-die memory or 3D memory stacking will happen at 14nm at best due to transistor budget restrictions. But AMD would be able to use faster clocked DDR4 with a 4-channel memory controller even next year without much effort.

    Those who buy discrete graphics cards usually buy them together with Intel CPUs, and Vishera will not change this; also, lots of people prefer Nvidia cards over AMD. So to provide some competition, AMD should combine mid-range or even high-end GPUs with 4-8 x86 cores in an APU, use faster clocked DDR4 with a 4-channel memory controller, and sell the APUs for $200-400. That would be more energy efficient and cheaper than paying $200-300 for an Intel CPU plus $250-500 for a good graphics card. And most importantly, AMD has all the technologies and resources to make this happen even next year; just a correct management decision is required...
  • wenbo - Thursday, October 4, 2012

    You make it sound so easy. If it were that easy, people would have done it already.
  • wwwcd - Thursday, September 27, 2012

    I agree that all of AMD's desktop platforms need a 4-channel memory controller, but I think this option must be released immediately... The fact is that DDR4 won't arrive on the desktop before 2015. Four channels at high frequency is enough for the present....

    Edit: some errors ;) ...With DDR3
  • silverblue - Friday, September 28, 2012

    It's also not cheap to implement. One of the reasons the top-end Intel boards are so expensive, I expect. I think it'd be better to go for higher speed first and foremost.

    The extra bandwidth could let the CPU breathe a little better as well as open up GPU performance at higher detail levels, however I'm not sure it'll be the massive boost people are hoping for. Keeping a 384-shader GPU means you'll get potentially HD 4830/4770 performance, with the added bonus of more RAM than either of those two cards, however Trinity isn't THAT bandwidth constrained - adding more shaders would certainly alter that picture.
  • kyuu - Friday, September 28, 2012

    "The problem for AMD is that they don't understand that people who could buy APU to play games don't want to stick with low graphics settings in games and prefer to add extra $ to buy external graphics card and set everything in High in games."

    This sentence makes no sense. If someone is looking at buying an APU, then they aren't looking at a discrete GPU setup and obviously aren't looking to run games at max settings. And, contrary to what a lot of people seem to think, a lot of people don't care about running the latest-and-greatest at max settings.

    Obviously, for an enthusiast gamer, Trinity doesn't make a whole lot of sense on the desktop (unless possibly they get asymmetrical crossfire working really well). But in the mobile arena, Trinity makes a lot of sense, giving respectable gaming prowess for significantly cheaper than an Intel CPU and discrete GPU combination as well as superior gaming battery life.

    What I'm most looking forward to is a tablet of Surface quality with a low-voltage Trinity powering it.

    No doubt more memory bandwidth would be greatly beneficial to AMD's APUs, but it's not as simple as just going to 4-channel memory. That increases the cost of the motherboard, plus you're paying for four sticks of memory, and it may not be practical in the mobile arena (which is where Trinity shines most, aside from HTPC duty).
