Original Link: http://www.anandtech.com/show/6332/amd-trinity-a10-5800k-a8-5600k-review-part-1



After years of waiting, AMD finally unveiled its Llano APU platform fifteen months ago. The APU promise was a new world where CPUs and GPUs would live in harmony on a single, monolithic die. Delivering the best of two very different computing architectures would hopefully pave the way for a completely new class of applications. That future is still distant, but today we're at least at the point where you can pretty much take for granted that if you buy a modern CPU it's going to ship with a GPU attached to it.

Four months ago AMD took the wraps off of its new Trinity APU: a 32nm SoC with up to four Piledriver cores and a Cayman-based GPU. Given AMD's new mobile-first focus, Trinity launched as a notebook platform. The desktop PC market is far from dead, just deprioritized. Today we have the first half of the Trinity desktop launch. Widespread APU availability won't be until next month, but AMD gave us the green light to begin sharing some details, including GPU performance, starting today.


AMD's Trinity APU, 2 Piledriver modules (4 cores)

We've already gone over the Trinity APU architecture in our notebook post earlier this year. As a recap, Piledriver helped get Bulldozer's power consumption under control, while the Cayman GPU's VLIW4 architecture improved efficiency on the graphics side. Compared to Llano this is a big departure, with substantially different CPU and GPU architectures. Given that we're still talking about the same 32nm process node, there's not a huge amount of room for performance improvements without ballooning die area, but through architecture changes and some more transistors AMD was able to deliver something distinctly faster.

Trinity Physical Comparison

                           Manufacturing Process   Die Size   Transistor Count
AMD Llano                  32nm                    228mm²     1.178B
AMD Trinity                32nm                    246mm²     1.303B
Intel Sandy Bridge (4C)    32nm                    216mm²     1.16B
Intel Ivy Bridge (4C)      22nm                    160mm²     1.4B
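
To put the table in perspective, here is a quick back-of-the-envelope sketch in Python that derives transistor density and the Llano-to-Trinity growth from the figures above. The helper names are ours, and vendors don't necessarily count transistors the same way, so treat the cross-vendor density numbers as rough.

```python
# Figures copied from the "Trinity Physical Comparison" table:
# (die area in mm^2, transistor count in billions)
chips = {
    "AMD Llano": (228, 1.178),
    "AMD Trinity": (246, 1.303),
    "Intel Sandy Bridge (4C)": (216, 1.16),
    "Intel Ivy Bridge (4C)": (160, 1.4),
}


def density_mtr_per_mm2(area_mm2, transistors_billion):
    """Millions of transistors per square millimetre."""
    return transistors_billion * 1000 / area_mm2


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


densities = {name: density_mtr_per_mm2(*spec) for name, spec in chips.items()}
for name, density in densities.items():
    print(f"{name}: {density:.2f} MTr/mm^2")

llano_area, llano_transistors = chips["AMD Llano"]
trinity_area, trinity_transistors = chips["AMD Trinity"]
area_growth = percent_change(llano_area, trinity_area)
transistor_growth = percent_change(llano_transistors, trinity_transistors)
print(f"Trinity vs. Llano: {area_growth:.1f}% more area, "
      f"{transistor_growth:.1f}% more transistors")
```

In other words, Trinity spends roughly 8% more die area and about 11% more transistors than Llano on the same 32nm node.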

On the desktop Trinity gets the benefit of much higher TDPs and thus higher clock speeds. The full lineup, sans pricing, is below:

Remember that the CPU cores we're counting here are integer cores; FP resources are shared between every two cores. Clock speeds are obviously higher compared to Llano, but Bulldozer/Piledriver did see some IPC regression compared to the earlier core design. You'll notice a decrease in GPU cores compared to Llano as well (384 vs. 400 for the top end part), but core efficiency should be much higher in Trinity.

Again AMD isn't talking pricing today, other than to say that it expects Trinity APUs to be priced similarly to Intel's Core i3 parts. Looking at Intel's price list that gives AMD a range of up to $134. We'll find out more on October 2nd, but for now the specs will have to be enough.

Socket-FM2 & A85X Chipset

The desktop Trinity APUs plug into a new socket: FM2. To reassure early adopters of Llano's Socket-FM1 that they won't get burned again, AMD is committing to one more generation beyond Trinity for the FM2 platform.

The FM2 socket itself is very similar to FM1, but keyed differently so there's no danger of embarrassingly plugging a Llano into your new FM2 motherboard.


Socket-FM1 (left) vs. Socket-FM2 (right)

AMD both borrows from Llano as well as expands when it comes to FM2 chipset support. The A55 and A75 chipsets make another appearance here on new FM2 motherboards, but they're joined by a new high-end option: the A85X chipset.

The big differentiators are the number of 6Gbps SATA and USB 3.0 ports. On the A85X you also get the ability to support two discrete AMD GPUs in CrossFire although obviously there's a fairly competent GPU on the Trinity APU die itself.

The Terms of Engagement

As I mentioned earlier, AMD is letting us go live with some Trinity data earlier than its official launch. The only stipulation? Today's preview can only focus on GPU performance. We can't talk about pricing or overclocking, and we aren't allowed to show any x86 CPU performance either. x86 CPU performance obviously hasn't been a major focus of AMD's as of late, so it's understandable that AMD would want to put its best foot forward for these early previews. Internally AMD is also concerned that any advantages it may have in the GPU department are overshadowed by its x86 story. AMD's recent re-hire of Jim Keller was designed to help address the company's long-term CPU roadmap; until then, however, AMD is still in the difficult position of trying to sell a great GPU attached to a bunch of CPU cores that don't land at the top of the x86 performance charts.

It's a bold move by AMD to tie a partial NDA to only representing certain results. We've seen embargoes like this in the past, allowing only a subset of tests to be used in a preview. AMD had no influence on which specific benchmarks we chose, just that we limit the first part of our review to looking at the GPU alone. Honestly, with some of the other stuff we're working on I don't mind so much, as I wouldn't be able to have a full review ready for you today anyway. Our hands are tied, so what we've got here is the first part of a two-part look at the desktop Trinity APU. If you want to get some idea of Trinity CPU performance feel free to check out our review of the notebook APU. You won't get a perfect idea of how Piledriver does against Ivy Bridge on the desktop, but you'll have some clue. From my perspective, Piledriver seemed more about getting power under control; Steamroller, on the other hand, appears to address more on the performance side.

We'll get to the rest of the story on October 2nd, but until then we're left with the not insignificant task of analyzing the performance of the graphics side of AMD's Trinity APU on the desktop.

The Motherboard

AMD sent over a Gigabyte GA-F2A85X-UP4 motherboard along with an A10-5800K and A8-5600K. The board worked flawlessly in our testing, and it also gave us access to AMD's new memory profiles. A while ago AMD partnered up with Patriot to bring AMD branded memory to market. AMD's Performance line of memory includes support for AMD's memory profiles, which lets you automatically set frequency, voltage and timings with a single BIOS setting.

We've always done these processor graphics performance comparisons using DDR3-1866, so there's no difference for this review. The only change is we only had to set a single option to configure the platform for stable 1866MHz operation.

Processor graphics performance scales really well with additional memory bandwidth, making this an obvious fit. There's nothing new about memory profiles; this is just something new for AMD's APU platform.
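
For a rough sense of why memory speed matters so much here, the sketch below computes theoretical peak bandwidth. It assumes a dual-channel configuration with standard 64-bit DDR3 channels (the channel count isn't stated in this article), and DDR3-1333 is included purely as a hypothetical comparison point, not a configuration we tested.

```python
def peak_bandwidth_gbs(megatransfers_per_sec, bus_width_bits=64, channels=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes).

    Assumptions: 64-bit DDR3 channels, dual-channel operation by default.
    """
    bytes_per_transfer = bus_width_bits // 8
    return megatransfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9


ddr3_1866 = peak_bandwidth_gbs(1866)  # the speed used in this review
ddr3_1333 = peak_bandwidth_gbs(1333)  # hypothetical slower configuration
extra_bandwidth_pct = (ddr3_1866 / ddr3_1333 - 1) * 100

print(f"DDR3-1866: {ddr3_1866:.1f} GB/s")
print(f"DDR3-1333: {ddr3_1333:.1f} GB/s")
print(f"DDR3-1866 offers {extra_bandwidth_pct:.0f}% more peak bandwidth")
```

These are ceilings rather than measured numbers, and the CPU cores share the same memory bus with the GPU.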



Crysis: Warhead

Our first graphics test is Crysis: Warhead, which in spite of its relatively high system requirements is the oldest game in our test suite. Crysis was the first game to really make use of DX10, and set a very high bar for modern games that still hasn't been completely cleared. And while its age means it's not heavily played these days, it's a great reference for how far GPU performance has come since 2008. For an iGPU to even run Crysis at a playable framerate is a significant accomplishment, and even more so if it can do so at better than performance (low) quality settings.

Crysis: Warhead - Frost Bench

Crysis sets the tone for a lot of what we'll see in this performance review. The Radeon HD 7660D on AMD's A10-5800K boosts performance by around 15-26% over the top end Llano part. The smaller Radeon HD 7560D GPU manages a small increase over the top-end Llano at worst, and at best pulls ahead by 18%.

Compared to Ivy Bridge, well, there's no comparison. Trinity is significantly faster than Intel's HD 4000, and compared to HD 2500 the advantage is tremendous.

 

Metro 2033

Our next graphics test is Metro 2033, another graphically challenging game. Like Crysis this is a game that is traditionally unplayable on many integrated GPUs, even in DX9 mode.

Metro 2033

Metro 2033 shows us a 6 - 13% performance advantage for the top end Trinity part compared to Llano. The advantage over Intel's HD 4000 ranges from 20 - 40% depending on the resolution/quality settings. In general AMD is able to either deliver the same performance at much better quality or better performance at the same quality as Ivy Bridge.

The more important comparison is looking at the A8-5600K vs. Intel's HD 4000 and 2500. AMD is still able to hold onto a significant advantage there, even with its core-reduced GPU.



DiRT 3

DiRT 3 is our next DX11 game. Developer Codemasters Southam added DX11 functionality to their EGO 2.0 engine back in 2009 with DiRT 2, and while it doesn't make extensive use of DX11 it does use it to good effect in order to apply tessellation to certain environmental models along with utilizing a better ambient occlusion lighting model. As a result DX11 functionality is very cheap from a performance standpoint, meaning it doesn't require a GPU that excels at DX11 feature performance.

DiRT 3

DiRT 3 shows a relatively small performance advantage compared to Llano - only about 12 - 15% when comparing the two top end parts. More exciting from AMD's perspective is that it can deliver performance similar to the 3870K's 400-core GPU with the 256-core GPU in the A8-5600K.
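
The core-count comparison above implies a sizable jump in per-core throughput. Here's a small sketch of that arithmetic; it folds clock speed and architectural changes together into a single "per-core" number, and the function name is ours.

```python
def implied_per_core_gain(old_cores, new_cores, relative_performance=1.0):
    """Per-core throughput ratio (new vs. old) implied by overall results.

    relative_performance is new/old overall performance, e.g. 1.12 for
    a 12% gain. Clock speed and architecture are lumped together here.
    """
    return relative_performance * old_cores / new_cores


# A8-5600K (256 cores) delivering performance similar to Llano (400 cores)
a8_vs_llano = implied_per_core_gain(400, 256)

# A10-5800K (384 cores) at the low end of the quoted 12-15% DiRT 3 gain
a10_vs_llano = implied_per_core_gain(400, 384, relative_performance=1.12)

print(f"A8-5600K: {a8_vs_llano:.2f}x per-core throughput vs. Llano")
print(f"A10-5800K: {a10_vs_llano:.2f}x per-core throughput vs. Llano")
```

Matching a 400-core GPU with 256 cores works out to each Trinity core doing roughly 1.56x the work in this test.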

The advantage over Intel's HD 4000/2500 remains significant.

 

Total War: Shogun 2

Total War: Shogun 2 is the latest installment of the long-running Total War series of turn-based strategy games, and alongside Civilization V is notable for just how many units it can put on a screen at once. Adding to the load is the use of DX11 features such as tessellation and high definition ambient occlusion, which means it can give any GPU a run for its money.

Total War: Shogun 2

We see similar scaling to DiRT 3 in Shogun: about a 15% improvement over Llano for the top end part, or flat performance if you compare Llano to the second-fastest Trinity GPU configuration.



Portal 2

Portal 2 continues to be the latest and greatest Source engine game to come out of Valve's offices. While Source continues to be a DX9 engine, and hence is designed to allow games to be playable on a wide range of hardware, Valve has continued to upgrade it over the years to improve its quality, and combined with their choice of style you'd have a hard time telling the engine is over 7 years old at this point. From a rendering standpoint Portal 2 isn't particularly geometry heavy, but it does make plenty of use of shaders.

Portal 2

Portal 2 performance is one of the stronger showings for Trinity. In both of these tests we're seeing around a 28% increase in performance compared to the A8-3870K. Ivy Bridge doesn't stand a chance, as the A10-5800K is more than twice as fast as Intel's HD 4000.

 

Battlefield 3

Its popularity aside, Battlefield 3 may be the most interesting game in our benchmark suite for a single reason: it was the first AAA DX10+ game. Consequently it makes no attempt to shy away from pushing the graphics envelope, and pushing GPUs to their limits at the same time. Even at low settings Battlefield 3 is a handful, and to be able to run it on an iGPU would no doubt make quite a few traveling gamers happy.

Battlefield 3

We're back down to more modest gains in our Battlefield 3 test: Trinity shows a 15% increase in performance compared to Llano at the high end. The advantage compared to Intel remains healthy at over 50%.



Starcraft 2

Our next game is Starcraft II, Blizzard's 2010 RTS megahit. Starcraft II is a DX9 game that is designed to run on a wide range of hardware, and given the growth in GPU performance over the years it's often CPU limited before it's GPU limited on higher-end cards.

Starcraft 2 - GPU Bench

Despite being heavily influenced by CPU performance, Starcraft 2 shows big gains when moving to Trinity. The improvement over Llano ranges from 16 - 27% in our tests. The performance advantage over Ivy Bridge is huge.

 

The Elder Scrolls V: Skyrim

Bethesda's epic sword & magic game The Elder Scrolls V: Skyrim is our RPG of choice for benchmarking. It's altogether a good CPU benchmark thanks to its complex scripting and AI, but it also can end up pushing a large number of fairly complex models and effects at once. This is a DX9 game so it isn't utilizing any new DX11 functionality, but it can still be a demanding game.

The Elder Scrolls V: Skyrim

We see some mild improvements over Llano in our Skyrim tests, and even Intel is able to catch up a bit. Trinity still does quite well; only NVIDIA's GeForce GT 640 can really deliver better performance than the top-end A10-5800K SKU.



Minecraft

Switching gears for the moment we have Minecraft, our OpenGL title. It's no secret that OpenGL usage on the PC has fallen by the wayside in recent years, and as far as major games go Minecraft is one of but a few recently released major titles using OpenGL. Minecraft is incredibly simple (not even utilizing pixel shaders, let alone more advanced hardware), but this doesn't mean it's easy to render. Its use of massive amounts of blocks (and the overdraw that creates) means you need solid hardware and an efficient OpenGL implementation if you want to hit playable framerates with a far render distance. Consequently, as the most successful OpenGL game in quite some number of years (at over 7.5 million copies sold), it's a good reminder for GPU manufacturers that OpenGL is not to be ignored.

Minecraft

Minecraft does incredibly well on Trinity. While the improvement over Llano is only 15%, the advantage over Ivy Bridge is tremendous.

 

Civilization V

Our final game, Civilization V, gives us an interesting look at things that other strategy games cannot match, with a much weaker focus on shading in the game world, and a much greater focus on creating the geometry needed to bring such a world to life. In doing so it uses a slew of DirectX 11 technologies, including tessellation for said geometry, driver command lists for reducing CPU overhead, and compute shaders for on-the-fly texture decompression. There are other games that are more stressful overall, but this is likely the game most stressing of DX11 performance in particular.

Civilization V

Civilization V shows some of the mildest gains in all of our tests vs. Llano. The 5800K/7660D manages to outperform Llano by only 8-11% depending on the test. The advantage over Intel is huge, of course.



Compute & Synthetics

One of the major promises of AMD's APUs is the ability to harness the incredible on-die graphics power for general purpose compute. While we're still waiting for the holy grail of heterogeneous computing applications to show up, we can still evaluate just how strong Trinity's GPU is at non-rendering workloads.

Our first compute benchmark comes from Civilization V, which uses DirectCompute 5 to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. And while games that use GPU compute functionality for texture decompression are still rare, it's becoming increasingly common as it's a practical way to pack textures in the most suitable manner for shipping rather than being limited to DX texture compression.

Compute: Civilization V

Similar to what we've already seen, Trinity offers a 15% increase in performance here compared to Llano. The compute advantage here over Intel's HD 4000 is solid as well.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We're now using a development build from the version 2.0 branch, and we've moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

SmallLuxGPU 2.0d4

Intel significantly shrinks the gap between itself and Trinity in this test, and AMD doesn't really move performance forward that much compared to Llano either.

For our next benchmark we're looking at AESEncryptDecrypt, an OpenCL AES encryption routine that AES encrypts/decrypts an 8K x 8K pixel square image file. The results of this benchmark are the average time to encrypt the image over a number of iterations of the AES cypher. Note that this test fails on all Intel processor graphics, so the results below only include AMD APUs and discrete GPUs.

AESEncryptDecrypt

We see a pretty hefty increase in performance over Llano in our AES benchmark. The on-die Radeon HD 7660D even manages to outperform NVIDIA's GeForce GT 640, a $100+ discrete GPU.

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k particle fluid using a compute shader, with a choice of several different algorithms. In this case we're using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
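
To illustrate what an O(n^2) neighbor method looks like, here is a minimal CPU-side sketch in Python. It is not the SDK sample's code: the function name, the 2D points and the fixed search radius are all our own assumptions, and the tiling loop only mimics the idea of staging a block of particles in shared memory so every thread in a group can reuse it.

```python
def neighbors_brute_force(points, radius, tile_size=4):
    """For each particle, list the indices of all others within `radius`.

    Every particle is compared against every other particle, so the
    amount of work grows as O(n^2) with the particle count.
    """
    n = len(points)
    result = [[] for _ in range(n)]
    radius_sq = radius * radius

    # Walk the particle list one tile at a time. On a GPU, each tile
    # would be loaded into shared memory once and then read by all
    # threads in the group instead of being refetched from DRAM.
    for start in range(0, n, tile_size):
        tile = points[start:start + tile_size]  # stand-in for the cache
        for i, (xi, yi) in enumerate(points):
            for offset, (xj, yj) in enumerate(tile):
                j = start + offset
                if i == j:
                    continue  # a particle is not its own neighbor
                dx, dy = xi - xj, yi - yj
                if dx * dx + dy * dy <= radius_sq:
                    result[i].append(j)
    return result


# Two small clusters of made-up particles
particles = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.4), (3.0, 3.0), (3.2, 3.1)]
neighbors = neighbors_brute_force(particles, radius=1.0, tile_size=2)
print(neighbors)
```

At 16k particles an all-pairs pass like this means on the order of 268 million distance checks per step, which is why the caching matters.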

DirectX11 Compute Shader Fluid Simulation - Nearest Neighbor

For our last compute test, Trinity does a reasonable job improving performance over Llano. If you're in need of a lot of GPU computing horsepower you're going to be best served by a discrete GPU, but it's good to see the processor based GPUs inch their way up the charts.

Synthetic Performance

Moving on, we'll take a few moments to look at synthetic performance. Synthetic performance is a poor tool to rank GPUs—what really matters is the games—but by breaking down workloads into discrete tasks it can sometimes tell us things that we don't see in games.

Our first synthetic test is 3DMark Vantage's pixel fill test. Typically this test is memory bandwidth bound as the nature of the test has the ROPs pushing as many pixels as possible with as little overhead as possible, which in turn shifts the bottleneck to memory bandwidth so long as there's enough ROP throughput in the first place.

3DMark Vantage Pixel Fill

Since our Llano and Trinity numbers were both run at DDR3-1866, there's no real performance improvement here. Ivy Bridge actually does quite well in this test, at least the HD 4000.
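
A simple way to picture this result is a two-way bottleneck model: pixel fill is capped by whichever is lower, ROP throughput or the rate at which memory can absorb the writes. The sketch below uses made-up ROP rates and an assumed 8 bytes of memory traffic per pixel purely for illustration; only the bandwidth figure follows from DDR3-1866, and even that assumes dual-channel operation.

```python
def pixel_fill_ceiling(rop_pixels_per_sec, mem_bytes_per_sec, bytes_per_pixel):
    """Upper bound on pixel fill rate under a simple two-way bottleneck."""
    bandwidth_limit = mem_bytes_per_sec / bytes_per_pixel
    return min(rop_pixels_per_sec, bandwidth_limit)


# Assumed: dual-channel DDR3-1866, 8 bytes per transfer per channel
bandwidth = 1866e6 * 8 * 2

# Hypothetical ROP rates, NOT the real Llano/Trinity figures
older_gpu = pixel_fill_ceiling(5.0e9, bandwidth, bytes_per_pixel=8)
newer_gpu = pixel_fill_ceiling(6.0e9, bandwidth, bytes_per_pixel=8)

print(f"Older GPU ceiling: {older_gpu / 1e9:.3f} Gpixels/s")
print(f"Newer GPU ceiling: {newer_gpu / 1e9:.3f} Gpixels/s")
```

With both parts pinned at the same memory-imposed ceiling, a faster GPU doesn't move the needle in a test like this.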

Moving on, our second synthetic test is 3DMark Vantage's texture fill test, which provides a simple FP16 texture throughput test. FP16 textures are still fairly rare, but it's a good look at worst case scenario texturing performance.

3DMark Vantage Texture Fill

Trinity is able to outperform Llano here by over 30%, although NVIDIA's GeForce GT 640 shows you what a $100+ discrete GPU can offer beyond processor graphics.

Our final synthetic test is the set of settings we use with Microsoft's Detail Tessellation sample program out of the DX11 SDK. Since IVB is the first Intel iGPU with tessellation capabilities, it will be interesting to see how well IVB does here, as IVB is going to be the de facto baseline for DX11+ games in the future. Ideally we want to have enough tessellation performance here so that tessellation can be used on a global level, allowing developers to efficiently simulate their worlds with fewer polygons while still using many polygons on the final render.

DirectX11 Detail Tessellation Sample - Normal

DirectX11 Detail Tessellation Sample - Max

The tessellation results here were a bit surprising given the 8th gen tessellator in Trinity's GPU. AMD tells us it sees much larger gains internally (up to 2x), but using different test parameters. Trinity should be significantly faster than Llano when it comes to tessellation performance, depending on the workload that is.



Power Consumption

Thanks to extensive power/clock gating, idle power consumption of Trinity is easily on par with the best Intel has to offer with Ivy Bridge. It's entirely possible that we'd see lower numbers with more power efficient motherboards/PSUs, but at least here it looks like AMD's idle power concerns are a thing of the past. Load power is another issue altogether. Trinity may be more efficient than Llano at delivering performance, but it's still built on the same 32nm process node. Load power consumption doesn't go up significantly compared to Llano, but it doesn't go down either. It's going to take new process nodes and design techniques to really drive active power down in future APUs.

GPU Power Consumption - Idle

GPU Power Consumption - Load (Metro 2033)

 



Final Words

On average, Trinity's high-end 384-core GPU manages to be around 16% faster than the fastest Llano GPU, while consuming around 7% more power when active. Given that Trinity is built on the same process node as Llano, I'd call that a relatively good step forward for AMD's equivalent of a "tick". From AMD's perspective, the fact that it can continue to deliver a tangible GPU performance advantage over Intel's latest and greatest even with its die-harvested APU (256-core Trinity) is good news. For anyone looking to build a good entry level gaming PC, the Trinity platform easily delivers the best processor graphics performance on the market today. If you're able to spend an extra $100 on a discrete GPU you'll get better performance, but below that Trinity rules. The trick, as always, will be selling the GPU performance advantage alongside the presumably lower x86 CPU performance. We'll have to wait another week to find out the full story on that of course, but if you're mostly concerned about GPU gaming performance, Trinity delivers.
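
As a sanity check on those two numbers, the sketch below averages the midpoints of the Llano-to-Trinity gains quoted in the game sections and works out the implied performance-per-watt change. This is an approximation built from the rounded ranges in the text, not the raw data behind the 16% figure, and Skyrim is left out because no percentage was quoted for it.

```python
from statistics import mean

# Top-end Trinity vs. top-end Llano gains as quoted in the text (percent).
# Ranges are (low, high); single figures are entered as (x, x).
quoted_gains = {
    "Crysis: Warhead": (15, 26),
    "Metro 2033": (6, 13),
    "DiRT 3": (12, 15),
    "Total War: Shogun 2": (15, 15),
    "Portal 2": (28, 28),
    "Battlefield 3": (15, 15),
    "Starcraft 2": (16, 27),
    "Minecraft": (15, 15),
    "Civilization V": (8, 11),
}

midpoints = [(low + high) / 2 for low, high in quoted_gains.values()]
average_gain = mean(midpoints)

# Performance per watt from the headline figures: +16% perf, +7% power
perf_ratio = 1.16
power_ratio = 1.07
perf_per_watt_gain = (perf_ratio / power_ratio - 1) * 100

print(f"Mean of quoted midpoints: {average_gain:.1f}%")
print(f"Implied perf-per-watt gain: {perf_per_watt_gain:.1f}%")
```

The midpoint average lands close to the 16% headline figure, and the efficiency gain under load works out to a bit over 8%.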

Ivy Bridge was a good step forward for Intel; the problem is that only the high-end Ivy Bridge graphics configuration borders on acceptable. The HD 2500's performance is really bad, unfortunately. It's easy to appreciate how far Intel has come when we look at improvements from one generation to the next, but when you start running benchmarks on Trinity it really compresses the progress Intel has made. When Haswell shows up it may be a different game entirely, but until then if you're interested in a platform with processor graphics (with an emphasis on the graphics part), Trinity is as good as it gets.
