Compute & Synthetics

One of the major promises of AMD's APUs is the ability to harness the incredible on-die graphics power for general purpose compute. While we're still waiting for the holy grail of heterogeneous computing applications to show up, we can still evaluate just how strong Trinity's GPU is at non-rendering workloads.

Our first compute benchmark comes from Civilization V, which uses DirectCompute 5 to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. And while games that use GPU compute functionality for texture decompression are still rare, the technique is becoming increasingly common, as it lets developers pack textures in whatever format is most suitable for shipping rather than being limited to DX texture compression.
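As a rough illustration of why developers bother (the numbers below are generic storage math, not Firaxis's actual pipeline): a fixed-rate GPU format like DXT1 already cuts an RGBA8 texture by 8:1, and a custom on-disk codec decompressed on the GPU at load time can shrink the shipping package further still.

```python
# Rough storage math for a single 2048x2048 texture; figures are generic
# illustrations, not taken from Civilization V's actual asset pipeline.

TEXELS = 2048 * 2048

raw_rgba8 = TEXELS * 4   # 4 bytes/texel, uncompressed
dxt1 = TEXELS // 2       # DXT1/BC1: fixed 0.5 byte/texel, an 8:1 ratio

print(f"raw RGBA8: {raw_rgba8 / 2**20:.1f} MiB")  # 16.0 MiB
print(f"DXT1/BC1:  {dxt1 / 2**20:.1f} MiB")       # 2.0 MiB
```

A custom entropy-coded format can beat the fixed 8:1 ratio on disk, at the cost of having to transcode back to a GPU-native format at load time, which is exactly the work Civ V offloads to a compute shader.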

Compute: Civilization V

Similar to what we've already seen, Trinity offers a 15% increase in performance here compared to Llano. The compute advantage here over Intel's HD 4000 is solid as well.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We're now using a development build from the version 2.0 branch, and we've moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

SmallLuxGPU 2.0d4

Intel significantly shrinks the gap between itself and Trinity in this test, and AMD doesn't really move performance forward that much compared to Llano either.

For our next benchmark we're looking at AESEncryptDecrypt, an OpenCL sample that AES encrypts/decrypts an 8K x 8K pixel image file. The result reported is the average time to encrypt the image over a number of iterations of the AES cipher. Note that this test fails on all Intel processor graphics, so the results below only include AMD APUs and discrete GPUs.
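The reported metric is simply a mean over repeated runs. A minimal sketch of that kind of timing harness (using a stdlib hash as a hypothetical stand-in workload, since the real test runs AES on the GPU via OpenCL):

```python
import time
import hashlib

def time_kernel(kernel, data, iterations=10):
    """Return the average wall-clock time in ms per run, mirroring how
    the benchmark reports a mean over several AES iterations."""
    start = time.perf_counter()
    for _ in range(iterations):
        kernel(data)
    return (time.perf_counter() - start) / iterations * 1e3

# Stand-in workload: the stdlib has no AES, so hash a buffer instead.
# Scaled down 1024x from the 8K x 8K RGBA image to keep the run short.
image = bytes(8192 * 8192 * 4 // 1024)
avg_ms = time_kernel(lambda d: hashlib.sha256(d).digest(), image)
print(f"average: {avg_ms:.3f} ms/iteration")
```

Averaging over iterations smooths out one-off stalls (driver compilation, first-touch page faults) that would otherwise skew a single-run measurement.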

AESEncryptDecrypt

We see a pretty hefty increase in performance over Llano in our AES benchmark. The on-die Radeon HD 7660D even manages to outperform NVIDIA's GeForce GT 640, a $100+ discrete GPU.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k-particle fluid using a compute shader, with a choice of several different algorithms. In this case we're using an O(n²) nearest neighbor method that is optimized by using shared memory to cache data.
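For reference, the all-pairs structure behind that method looks like this on the CPU (a scaled-down sketch with made-up parameters; the SDK sample runs the same O(n²) loop in a compute shader, with shared memory caching tiles of particle positions):

```python
import random

def neighbors_within(particles, radius):
    """Naive O(n^2) pass: for each particle, test every other particle
    against the interaction radius and collect the ones inside it."""
    r2 = radius * radius
    result = []
    for i, (xi, yi) in enumerate(particles):
        near = [j for j, (xj, yj) in enumerate(particles)
                if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2]
        result.append(near)
    return result

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(256)]  # 16k in the real test
counts = [len(n) for n in neighbors_within(pts, 0.1)]
print("mean neighbors per particle:", sum(counts) / len(counts))
```

The quadratic cost is what makes this a good GPU stress test: 16k particles means roughly 256 million pair tests per simulation step.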

DirectX11 Compute Shader Fluid Simulation - Nearest Neighbor

For our last compute test, Trinity does a reasonable job of improving performance over Llano. If you need a lot of GPU computing horsepower you're going to be best served by a discrete GPU, but it's good to see the processor-based GPUs inch their way up the charts.

Synthetic Performance

Moving on, we'll take a few moments to look at synthetic performance. Synthetic performance is a poor tool to rank GPUs—what really matters is the games—but by breaking down workloads into discrete tasks it can sometimes tell us things that we don't see in games.

Our first synthetic test is 3DMark Vantage's pixel fill test. Typically this test is memory bandwidth bound as the nature of the test has the ROPs pushing as many pixels as possible with as little overhead as possible, which in turn shifts the bottleneck to memory bandwidth so long as there's enough ROP throughput in the first place.
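For context, the theoretical ceiling the ROPs are pushing against on our APU testbeds works out as follows (standard dual-channel DDR3 arithmetic; note this bandwidth is also shared with the CPU cores):

```python
# Peak theoretical bandwidth for dual-channel DDR3-1866.
transfers_per_sec = 1866e6   # 1866 MT/s
bytes_per_transfer = 8       # 64-bit (8-byte) channel width
channels = 2

peak_gb_s = transfers_per_sec * bytes_per_transfer * channels / 1e9
print(f"theoretical peak: {peak_gb_s:.1f} GB/s")  # ~29.9 GB/s
```

Against ~30 GB/s shared with the CPU, even a modest discrete card with its own GDDR5 has a large raw-bandwidth advantage, which is why this test tends to flatter dGPUs.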

3DMark Vantage Pixel Fill

Since our Llano and Trinity numbers were both run at DDR3-1866, there's no real performance improvement here. Ivy Bridge actually does quite well in this test, at least in its HD 4000 configuration.

Moving on, our second synthetic test is 3DMark Vantage's texture fill test, which provides a simple FP16 texture throughput test. FP16 textures are still fairly rare, but it's a good look at worst case scenario texturing performance.
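The "worst case" framing comes down to texel size: an RGBA16F texel is twice as large as the usual RGBA8, so every fetch costs twice the bandwidth and cache footprint. A quick illustration:

```python
def texture_bytes(width, height, bytes_per_texel):
    """Uncompressed footprint of a single mip level."""
    return width * height * bytes_per_texel

# A 1024x1024 texture in each format:
rgba8 = texture_bytes(1024, 1024, 4)     # 8 bits per channel
rgba16f = texture_bytes(1024, 1024, 8)   # FP16 per channel
print(f"RGBA8:   {rgba8 / 2**20:.0f} MiB")    # 4 MiB
print(f"RGBA16F: {rgba16f / 2**20:.0f} MiB")  # 8 MiB
```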

3DMark Vantage Texture Fill

Trinity is able to outperform Llano here by over 30%, although NVIDIA's GeForce GT 640 shows you what a $100+ discrete GPU can offer beyond processor graphics.

Our final synthetic test is Microsoft's Detail Tessellation sample program from the DX11 SDK, run at our usual settings. Since IVB is the first Intel iGPU with tessellation capabilities, it will be interesting to see how well IVB does here, as IVB is going to be the de facto baseline for DX11+ games in the future. Ideally we want enough tessellation performance here that tessellation can be used on a global level, allowing developers to efficiently simulate their worlds with fewer polygons while still using many polygons in the final render.
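Why the Normal and Max presets stress the hardware so differently: as a simplified model (ignoring fractional factors, domain types, and culling), the number of triangles a patch expands into grows roughly with the square of the tessellation factor.

```python
def approx_triangles(patches, factor):
    """Simplified model: output triangles scale roughly quadratically
    with the tessellation factor. Real D3D11 tessellation also has
    fractional factors, triangle vs. quad domains, etc."""
    return patches * factor * factor

for factor in (1, 7, 15, 31):
    tris = approx_triangles(1000, factor)
    print(f"factor {factor:>2}: ~{tris:,} triangles per 1,000 patches")
```

The quadratic growth is exactly why a weak tessellator can look fine at low factors and collapse at high ones.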

DirectX11 Detail Tessellation Sample - Normal

DirectX11 Detail Tessellation Sample - Max

The tessellation results here were a bit surprising given the 8th generation tessellator in Trinity's GPU. AMD tells us it sees much larger gains internally (up to 2x), but using different test parameters. Trinity should be significantly faster than Llano when it comes to tessellation, then, but how much faster depends on the workload.

Comments

  • DanNeely - Thursday, September 27, 2012 - link

    Does AMD share chipsets between their desktop and mobile platforms? AMD's done this for years (all of their desktop chipsets?); and all the legacy embedded devices you listed are typically connected via the LPC (Low Pin Count) bus, a semi-parallelized implementation of the '80s-era 8-bit ISA bus.
  • jamyryals - Thursday, September 27, 2012 - link

    I liked the sneak peek. I don't care if AMD wants to hold off on the CPU benchmarks; they'll be out shortly anyway. It was already hinted at as what to expect by Anand in the article. The only thing that's troublesome is the people who take this as an opportunity to besmirch someone's credibility. Take a deep breath and in a few days you'll be able to justify your own viewpoint no matter what the numbers say anyway.

    At this point, it's more about the direction AMD is headed that is interesting than this product. What is the target goal for this family of chips, and will that be more successful than competing head to head with Intel?
  • Torrijos - Thursday, September 27, 2012 - link

    In August an interesting article treating the influence of the CPU was posted:
    http://techreport.com/review/23246/inside-the-seco...

    The idea was not to measure average FPS, but instead to measure milliseconds per frame for all the frames in a benchmark, in order to see whether performance was constant or fell harshly for some frames (having a clear impact on playability).

    The thing is, with the current wave of CPUs with iGPUs it might be time to switch benchmarks to a similar methodology, in order to see which architectures handle the memory work better.
  • taltamir - Thursday, September 27, 2012 - link

    I just did a double take, had to look twice, and indeed this is 100% a GPU benchmark with not a single test about the CPU.

    The only test relevant to the CPU might have been the AES acceleration (a fixed function test) and the power test (where Intel still spanks AMD).
  • Jamahl - Thursday, September 27, 2012 - link

    This is what happens when you look at the graphs without actually reading anything.
  • Torrijos - Thursday, September 27, 2012 - link

    I read the fact that they can't talk about CPU now, I was trying to say that FPS is an antiquated metric...

    My point was that APUs tend to share memory bandwidth between the CPU and GPU, resulting in unreliable peak performance (even when coupled with a discrete GPU) while still maintaining a good average FPS.

    In the end the FPS metric isn't the best available number to clearly evaluate the performance of these chips. A full plot of milliseconds per frame for the entire test run offers a clearer picture.

    An alternate measure would be the % of frames that took more than XX milliseconds to generate.
  • James5mith - Thursday, September 27, 2012 - link

    I know that the Desktop CPU has had more and more integration, but when did Anandtech decide to start calling them SoC's, as if they were the all-in-one packages inside a smartphone?

    It's still an APU, or CPU+GPU+IMC, or whatever you want to call it. It is not a complete system. It still needs a southbridge chipset for all the sundry interconnects.
  • SleepyFE - Thursday, September 27, 2012 - link

    Will everyone please give up on the measuring competitions (referring to "mine is bigger"). I'm using a Phenom II X2 555 and it has worked just fine for 3 years running. I'm an average price-conscious gamer. I look for 100€ CPUs and 150€ GPUs (right now I have a Radeon 6870). Everything I do works just fine at very high 2xAA settings. Having an i7 would make no difference in performance because games don't put more cores to good use, and every other program I use can't even put a single core to good use.

    I will say again: "I AM AVERAGE!!" And it all works for me. ALL the CPUs right now are sufficient for the average man (or woman).

    The reason AMD is stressing the GPU side of APUs is because that's what matters. When you can buy an APU for 200€ that has an HD Radeon x870 (x being the generation number) class GPU in it, that saves me money and cancels one very loud fan. It's a win-win.
  • jwcalla - Thursday, September 27, 2012 - link

    "Average" people don't need a GPU any more powerful than what you'd need to drive a simple display. Because "average" people are nowhere near interested in PC gaming.

    And this is why AMD's strategy is a little silly.

    The key to marketshare is making sweet deals with Dell, HP, etc.
  • jaydee - Thursday, September 27, 2012 - link

    I noticed the motherboard has 3 digital video outputs and VGA. Can all three (DVI, HDMI, DP) be used at the same time with the APU?
