Compute & Synthetics

One of the major promises of AMD's APUs is the ability to harness the incredible on-die graphics power for general purpose compute. While we're still waiting for the holy grail of heterogeneous computing applications to show up, we can still evaluate just how strong Trinity's GPU is at non-rendering workloads.

Our first compute benchmark comes from Civilization V, which uses DirectCompute 5 to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. And while games that use GPU compute functionality for texture decompression are still rare, the technique is becoming increasingly common, as it's a practical way to pack textures in the most suitable manner for shipping rather than being limited to DX texture compression.
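
To make the idea concrete, below is a minimal sketch of block-parallel texture decompression using pyopencl. This is an assumption-laden illustration, not Firaxis's code: Civ V actually uses DirectCompute, and its real format is more sophisticated. Each work-item independently decodes one 8-byte, BC1-style 4x4 block, and that independence is exactly what makes fixed-rate texture formats such a good fit for GPU decompression.

```python
# Sketch only: a simplified BC1-style decoder (two RGB565 endpoints +
# 2-bit palette indices per 4x4 block), one block per work-item.
# pyopencl is our choice here; the game itself uses DirectCompute.
import numpy as np
import pyopencl as cl

KERNEL = """
__kernel void decode_blocks(__global const uchar *src,
                            __global uchar4 *dst,
                            uint blocks_per_row)
{
    uint block = get_global_id(0);
    __global const uchar *b = src + block * 8;     // 8 bytes per 4x4 block

    // Decode the two RGB565 endpoint colors.
    uint c0 = b[0] | ((uint)b[1] << 8);
    uint c1 = b[2] | ((uint)b[3] << 8);
    uchar4 pal[4];
    pal[0] = (uchar4)((uchar)((c0 >> 11) << 3),
                      (uchar)(((c0 >> 5) & 63) << 2),
                      (uchar)((c0 & 31) << 3), (uchar)255);
    pal[1] = (uchar4)((uchar)((c1 >> 11) << 3),
                      (uchar)(((c1 >> 5) & 63) << 2),
                      (uchar)((c1 & 31) << 3), (uchar)255);
    // Two interpolated palette entries, as in BC1.
    pal[2] = convert_uchar4((convert_uint4(pal[0]) * 2 + convert_uint4(pal[1])) / 3);
    pal[3] = convert_uchar4((convert_uint4(pal[0]) + convert_uint4(pal[1]) * 2) / 3);

    uint idx = b[4] | ((uint)b[5] << 8) | ((uint)b[6] << 16) | ((uint)b[7] << 24);
    uint bx = (block % blocks_per_row) * 4;
    uint by = (block / blocks_per_row) * 4;
    uint stride = blocks_per_row * 4;              // image width in pixels
    for (uint i = 0; i < 16; i++) {
        uint x = bx + (i & 3), y = by + (i >> 2);
        dst[y * stride + x] = pal[(idx >> (2 * i)) & 3];
    }
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL).build()

blocks_per_row = 256                               # a 1024x1024 texture
n_blocks = blocks_per_row * blocks_per_row
src = np.random.randint(0, 256, n_blocks * 8, dtype=np.uint8)  # random stand-in data
dst = np.empty(n_blocks * 16 * 4, dtype=np.uint8)

mf = cl.mem_flags
src_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=src)
dst_buf = cl.Buffer(ctx, mf.WRITE_ONLY, dst.nbytes)
prog.decode_blocks(queue, (n_blocks,), None, src_buf, dst_buf,
                   np.uint32(blocks_per_row))
cl.enqueue_copy(queue, dst, dst_buf)
```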

Compute: Civilization V

Similar to what we've already seen, Trinity offers a 15% increase in performance over Llano here. Its compute advantage over Intel's HD 4000 is solid as well.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We're now using a development build from the version 2.0 branch, and we've moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

SmallLuxGPU 2.0d4

Intel significantly shrinks the gap between itself and Trinity in this test, and AMD doesn't really move performance forward that much compared to Llano either.

For our next benchmark we're looking at AESEncryptDecrypt, an OpenCL AES encryption routine that encrypts/decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher. Note that this test fails on all Intel processor graphics, so the results below only include AMD APUs and discrete GPUs.
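
What makes AES attractive as a GPU benchmark is that, in ECB- and CTR-style modes, every 16-byte block is independent, so an 8K x 8K image decomposes into millions of parallel work-items. The pyopencl sketch below shows only that dispatch structure; the actual AES rounds are replaced with a single XOR placeholder, so treat it as an illustration of the decomposition, not real cryptography (the kernel name and toy key are ours, not the benchmark's).

```python
# Structure-only sketch: one 16-byte block per work-item. The single
# XOR stands in for the real 10-14 AES rounds, so this is NOT secure.
import numpy as np
import pyopencl as cl

KERNEL = """
__kernel void encrypt_blocks(__global uint4 *data,
                             uint k0, uint k1, uint k2, uint k3)
{
    uint i = get_global_id(0);
    uint4 key = (uint4)(k0, k1, k2, k3);   // toy "round key"
    data[i] ^= key;                        // placeholder for the AES rounds
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL).build()

side = 1024                        # the benchmark uses 8192 (a 256 MB image)
n_blocks = side * side * 4 // 16   # 16-byte AES blocks in an RGBA image
data = np.random.randint(0, 2**32, n_blocks * 4, dtype=np.uint32)

mf = cl.mem_flags
buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=data)
prog.encrypt_blocks(queue, (n_blocks,), None, buf,
                    np.uint32(1), np.uint32(2), np.uint32(3), np.uint32(4))
cl.enqueue_copy(queue, data, buf)
```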

AESEncryptDecrypt

We see a pretty hefty increase in performance over Llano in our AES benchmark. The on-die Radeon HD 7660D even manages to outperform NVIDIA's GeForce GT 640, a $100+ discrete GPU.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k-particle fluid using a compute shader, with a choice of several different algorithms. In this case we're using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
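
The shared memory optimization is the classic tiling pattern: stage a chunk of particle positions in fast on-chip local memory, let every thread in the work-group scan that chunk, then move on to the next one, cutting global memory traffic by roughly the group size. Below is a rough pyopencl translation of the idea; the SDK sample itself is an HLSL compute shader, and the simple density weight here is our own stand-in, not the sample's actual math.

```python
# Tiled O(n^2) neighbor pass: each work-group cooperatively caches
# TILE positions in __local memory before every thread scans them.
import numpy as np
import pyopencl as cl

KERNEL = """
#define TILE 256
__kernel void density(__global const float4 *pos,
                      __global float *density,
                      uint n, float h2)
{
    uint gid = get_global_id(0);
    uint lid = get_local_id(0);
    float4 my = pos[gid];
    float rho = 0.0f;
    __local float4 tile[TILE];

    for (uint base = 0; base < n; base += TILE) {
        // Each work-item loads one position; the whole group then
        // scans the tile from fast local memory instead of DRAM.
        tile[lid] = pos[base + lid];
        barrier(CLK_LOCAL_MEM_FENCE);
        for (uint j = 0; j < TILE; j++) {
            float4 d = my - tile[j];
            float r2 = d.x*d.x + d.y*d.y + d.z*d.z;
            if (r2 < h2)                      // within smoothing radius
                rho += (h2 - r2) * (h2 - r2) * (h2 - r2);
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    density[gid] = rho;
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL).build()

n = 16384                          # the SDK sample simulates ~16k particles
pos = np.random.rand(n, 4).astype(np.float32)
mf = cl.mem_flags
pos_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=pos)
rho_buf = cl.Buffer(ctx, mf.WRITE_ONLY, n * 4)
rho = np.empty(n, dtype=np.float32)
prog.density(queue, (n,), (256,), pos_buf, rho_buf, np.uint32(n), np.float32(0.01))
cl.enqueue_copy(queue, rho, rho_buf)
```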

DirectX11 Compute Shader Fluid Simulation - Nearest Neighbor

In our last compute test, Trinity does a reasonable job of improving performance over Llano. If you're in need of a lot of GPU computing horsepower you're going to be best served by a discrete GPU, but it's good to see the processor-based GPUs inch their way up the charts.

Synthetic Performance

Moving on, we'll take a few moments to look at synthetic performance. Synthetic performance is a poor tool to rank GPUs—what really matters is the games—but by breaking down workloads into discrete tasks it can sometimes tell us things that we don't see in games.

Our first synthetic test is 3DMark Vantage's pixel fill test. Typically this test is memory bandwidth bound: it has the ROPs pushing as many pixels as possible with as little overhead as possible, which shifts the bottleneck to memory bandwidth so long as there's enough ROP throughput in the first place.

3DMark Vantage Pixel Fill

Since our Llano and Trinity numbers were both run at DDR3-1866, there's no real performance improvement here. Ivy Bridge actually does quite well in this test, at least in HD 4000 form.
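
As a back-of-the-envelope check on the bandwidth argument (theoretical peaks; real sustained bandwidth is meaningfully lower):

```python
# Peak DRAM bandwidth for the dual-channel DDR3-1866 configuration
# used on our Llano and Trinity testbeds, and the fill-rate ceiling
# it implies for 32-bit color writes.
transfers_per_sec = 1866e6        # DDR3-1866: 1866 MT/s per channel
bytes_per_transfer = 8            # 64-bit channel
channels = 2                      # dual channel
peak_bw = transfers_per_sec * bytes_per_transfer * channels
print(f"Peak bandwidth:    {peak_bw / 1e9:.1f} GB/s")            # ~29.9 GB/s

bytes_per_pixel = 4               # 32-bit color write
print(f"Fill-rate ceiling: {peak_bw / bytes_per_pixel / 1e9:.2f} GPixels/s")  # ~7.46
```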

Moving on, our second synthetic test is 3DMark Vantage's texture fill test, which provides a simple FP16 texture throughput test. FP16 textures are still fairly rare, but this is a good look at worst-case texturing performance.
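
The arithmetic behind that worst case is straightforward: an RGBA16F texel is twice the size of an RGBA8 texel, so every FP16 fetch costs twice the memory bandwidth (and many GPUs of this era also filter FP16 at reduced rate):

```python
# Texel sizes: FP16 doubles the bandwidth cost of every texture fetch.
rgba8_texel = 4 * 1     # 4 channels x 1 byte (8-bit)  = 4 bytes
rgba16f_texel = 4 * 2   # 4 channels x 2 bytes (FP16)  = 8 bytes
print(f"{rgba16f_texel / rgba8_texel:.0f}x bandwidth per texel")  # 2x
```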

3DMark Vantage Texture Fill

Trinity is able to outperform Llano here by over 30%, although NVIDIA's GeForce GT 640 shows you what a $100+ discrete GPU can offer beyond processor graphics.

Our final synthetic test is Microsoft's Detail Tessellation sample program from the DX11 SDK, run at the settings we typically use. Since IVB is the first Intel iGPU with tessellation capabilities, it will be interesting to see how well IVB does here, as IVB is going to be the de facto baseline for DX11+ games in the future. Ideally we want enough tessellation performance that tessellation can be used on a global level, allowing developers to efficiently simulate their worlds with fewer polygons while still using many polygons in the final render.
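
For a rough sense of what "fewer polygons simulated, many polygons rendered" means in numbers: with uniform integer partitioning, a triangle patch at edge tessellation factor t expands into roughly t^2 triangles, and D3D11 allows factors up to 64. A quick sketch of that amplification:

```python
# Polygon amplification from tessellation: a coarse mesh the CPU/game
# simulates expands into a dense mesh only at render time. The t^2
# rule of thumb assumes uniform integer partitioning of triangle patches.
def triangles_after_tessellation(patches: int, t: int) -> int:
    return patches * t * t

for t in (1, 4, 16, 64):
    print(f"factor {t:2d}: {triangles_after_tessellation(1000, t):>9,} triangles")
# 1,000 coarse patches at the D3D11 max factor of 64 -> ~4.1M triangles
```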

DirectX11 Detail Tessellation Sample - Normal

DirectX11 Detail Tessellation Sample - Max

The tessellation results here were a bit surprising given the 8th-gen tessellator in Trinity's GPU. AMD tells us it sees much larger gains internally (up to 2x), but using different test parameters. Trinity should be significantly faster than Llano when it comes to tessellation performance, depending on the workload.

Comments

  • dishayu - Thursday, September 27, 2012

    Hate to be off topic here, but I wanted to ask what happened to this week's podcast? I was really looking forward to a talk about IDF and Haswell.
  • Ryan Smith - Thursday, September 27, 2012

    Busy. Busy busy busy. Perhaps on the next podcast Anand will tell you what he's been up to and how many times he's flown somewhere this month.
  • idealego - Thursday, September 27, 2012

    I don't think the load GPU power consumption comparison is fair, and I'll explain why.

    The AMD processors are achieving higher frame rates than the Intel processors in Metro 2033, the game used for the power consumption chart. If you calculated watts per frame AMD would actually be more efficient than Intel.

    Another way of running this test would be to use game settings that all the processors could handle at 30 fps and then cap all tests at 30 fps. Under these test conditions each processor would be doing the same amount of work. I would be curious to see the results of such a test.

    Good article as always!
  • SleepyFE - Thursday, September 27, 2012

    True.
    But you are asking for consumption/performance charts. You can make those yourself from the data given.
    They test consumption under max load because no one will cap all their games at 30fps to keep consumption down. People use what they get, and that is what you would get if you played Metro 2033.
  • idealego - Thursday, September 27, 2012

    Some people want to know the max power usage of the processor to help them select a power supply or help them predict how much cooling will be needed in their case.

    Other people, like me, are more interested in the efficiency of the processor's architecture in general and as a comparison to the competition. This is why I'm more interested in frames per watt or watts at a set fps; otherwise it's like comparing the "efficiency" of a dump truck to a van by looking only at fuel economy.
  • CeriseCogburn - Thursday, October 11, 2012

    LMAO - faildozer now a dump truck, sounds like amd is a landfill of waste and garbage, does piledriver set the posts for the hazardous waste of PC purchase money signage?

    Since it's great doing 30fps in low low mode so everyone can play and be orange orange instead of amd losing terribly sucking down the power station, just buy the awesome Intel Sandy Bridge with its super-efficient arch and undervolting and OC capabilities and be happy.

    Or is that, like, verboten for amd fanboys?
  • IntelUser2000 - Thursday, September 27, 2012

    We can't even calculate it fairly because they are measuring system power, not CPU power.
  • iwod - Thursday, September 27, 2012

    I think Trinity is a pretty good chip for a low-cost PC, which seems to be the case for the majority of PCs sold today. I wonder why it is not selling well compared to Intel.
  • Hardcore69 - Thursday, September 27, 2012

    I bought a 3870K in February. I've now sold it and replaced it with a G540. APUs are rather pointless unless you are a cheap-ass gamer who can't afford a 7870 or above, or for an HTPC. Even there, I built an HTPC with a G540; you don't really need more anyway. Match it to a decent Nvidia GPU if you want all the fancy rendering. Personally I don't see the point of MadVR, and I can't see the difference between 23.976 @ 23.976Hz and 23.976 @ 50Hz.

    All that being said, I bet that on the CPU side AMD has failed. Again. CPU grunt is more important anyway. A G620 can generally compete with a 3870K on the CPU side. That is just embarrassing. The 5800K isn't much of an improvement.

    Bottom line: a Celeron is better for a basic office/pornbox; skip the Pentium, skip the i3, get an i5 if you do editing or encoding, an i7 if you want to splurge. GPU performance is rather moot for most uses. Intel's HD 1000 does the job. Yes, it can accelerate via Quick Sync or DXVA; yes, it's good enough for YouTube. Again, if you want to game, get a gaming GPU. I've given up on AMD. Its CPU tech is too crap and its GPU side can't compensate.
  • Fox5 - Thursday, September 27, 2012

    A 7870 goes for at least $220 right now; that's a pretty big price jump.

    AMD has a market: it's for people who want the best possible gaming experience at a minimum price. You can't really beat the ~$100 price for decent CPU and graphics performance, when a graphics card of that performance level alone would cost you at least half that much (probably more). Also, in the HTPC crowd, form factor and power usage are critical, so AMD wins there; I don't want a discrete card in my HTPC if I can avoid it.
