Original Link: http://www.anandtech.com/show/5831/amd-trinity-review-a10-4600m-a-new-hope
The AMD Trinity Review (A10-4600M): A New Hope
by Jarred Walton on May 15, 2012 12:00 AM EST
Introduction and Piledriver Overview
Brazos and Llano were both immensely successful parts for AMD. The company sold tons despite not delivering leading x86 performance. The success of these two APUs gave AMD a lot of internal confidence that it was possible to build something that didn't prioritize x86 performance but rather delivered a good balance of CPU and GPU performance.
AMD's commitment to the world was that we'd see annual updates to all of its product lines. Llano debuted last June, and today AMD gives us its successor: Trinity.
At a high level, Trinity combines 2-4 Piledriver x86 cores (1-2 Piledriver modules) with up to 384 VLIW4 Northern Islands generation Radeon cores on a single 32nm SOI die. The result is a 1.303B transistor chip (up from 1.178B in Llano) that measures 246mm^2 (compared to 228mm^2 in Llano).
|Trinity Physical Comparison|
|CPU||Manufacturing Process||Die Size||Transistor Count|
|AMD Llano||32nm||228mm^2||1.178B|
|AMD Trinity||32nm||246mm^2||1.303B|
|Intel Sandy Bridge (4C)||32nm||216mm^2||1.16B|
|Intel Ivy Bridge (4C)||22nm||160mm^2||1.4B|
Without a change in manufacturing process, AMD is faced with the tough job of increasing performance without ballooning die size. Die size has only gone up by around 8% (228mm^2 to 246mm^2), but both CPU and GPU performance see double-digit increases over Llano. Power consumption is also improved over Llano, making Trinity a win across the board for AMD compared to its predecessor. If you liked Llano, you'll love Trinity.
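As a quick sanity check on those figures, the short Python sketch below recomputes the growth in die area and transistor count from the numbers quoted above. The only inputs are the Llano and Trinity values given in the text; the helper names are ours and purely illustrative.

```python
# Figures quoted in the text above (transistors in billions, die area in mm^2).
llano = {"transistors_b": 1.178, "die_mm2": 228}
trinity = {"transistors_b": 1.303, "die_mm2": 246}


def pct_increase(new, old):
    """Percentage growth from old to new."""
    return (new - old) / old * 100


def density(chip):
    """Average density in millions of transistors per mm^2."""
    return chip["transistors_b"] * 1000 / chip["die_mm2"]


die_growth = pct_increase(trinity["die_mm2"], llano["die_mm2"])
transistor_growth = pct_increase(trinity["transistors_b"], llano["transistors_b"])

print(f"Die area growth:   {die_growth:.1f}%")
print(f"Transistor growth: {transistor_growth:.1f}%")
print(f"Density: Llano {density(llano):.2f} vs Trinity {density(trinity):.2f} MTr/mm^2")
```

The transistor count grows a little faster than the die area, so average density on the same 32nm process edges up slightly.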
The problem is what happens when you step outside of AMD's world. Llano had a difficult time competing with Sandy Bridge outside of GPU workloads. AMD's hope with Trinity is that its hardware improvements combined with more available OpenCL accelerated software will improve its standing vs. Ivy Bridge.
Piledriver: Bulldozer Tuned
While Llano featured as many as four 32nm x86 Stars cores, Trinity features up to two Piledriver modules. Given the not-so-great reception of Bulldozer late last year, we were worried about how a Bulldozer derivative would stack up in Trinity. I'm happy to say that Piledriver is a step forward from the CPU cores used in Llano, largely thanks to a bunch of clean up work from the Bulldozer foundation.
Piledriver picks up where Bulldozer left off. Its fundamental architecture remains unchanged, but it has been improved in all areas. Piledriver is very much a second pass on the Bulldozer architecture, tidying everything up, capitalizing on low hanging fruit and significantly improving power efficiency. If you were hoping for an architectural reset with Piledriver, you will be disappointed. AMD is committed to Bulldozer and that's quite obvious if you look at Piledriver's high level block diagram:
Each Piledriver module is the same 2+1 INT/FP combination that we saw in Bulldozer. You get two integer cores, each with their own schedulers, L1 data caches, and execution units. Between the two is a shared floating point core that can handle instructions from one of two threads at a time. The single FP core shares the data caches of the dual integer cores.
Each module appears to the OS as two cores; however, you don't have as many resources as you would from two traditional AMD cores. This table from our Bulldozer review highlights part of the problem when looking at the front end:
|Front End Comparison|
|Metric||AMD Phenom II||AMD FX||Intel Core i7|
|Instruction Decode Width||3-wide||4-wide||4-wide|
|Single Core Peak Decode Rate||3 instructions||4 instructions||4 instructions|
|Dual Core Peak Decode Rate||6 instructions||4 instructions||8 instructions|
|Quad Core Peak Decode Rate||12 instructions||8 instructions||16 instructions|
|Six/Eight Core Peak Decode Rate||18 instructions (6C)||16 instructions||24 instructions (6C)|
It's rare that you get anywhere near peak hardware utilization, so don't be too put off by these deltas, but it is a tradeoff that AMD made throughout Bulldozer. In general, AMD opted for better utilization of fewer resources (partially through increasing some data structures and other elements that feed execution units) vs. simply throwing more transistors at the problem. AMD also opted to reduce the ratio of integer to FP resources within the x86 portion of its architecture, clearly to support a move to the APU world where the GPU can be a provider of a significant amount of FP support. Piledriver doesn't fundamentally change any of these balances. The pipeline depth remains unchanged, as does the focus on pursuing higher frequencies.
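The peak decode figures in the table follow from one rule: decode width multiplied by the number of decoders, where a Bulldozer/Piledriver module shares a single decoder between its two cores. The Python sketch below reproduces the table under that reading; the dictionary layout and function name are ours, not anything from AMD or Intel.

```python
import math

# (decode width per decoder, cores sharing one decoder), as read from the table.
FRONT_ENDS = {
    "AMD Phenom II": (3, 1),   # one 3-wide decoder per core
    "AMD FX": (4, 2),          # one 4-wide decoder per two-core module
    "Intel Core i7": (4, 1),   # one 4-wide decoder per core
}


def peak_decode_rate(arch, cores):
    """Peak instructions decoded per clock across the whole chip."""
    width, cores_per_decoder = FRONT_ENDS[arch]
    decoders = math.ceil(cores / cores_per_decoder)
    return width * decoders


for cores in (1, 2, 4):
    row = {arch: peak_decode_rate(arch, cores) for arch in FRONT_ENDS}
    print(cores, "core(s):", row)
```

Because a second core on the same module adds no decoder, the FX column stays flat going from one core to two, which is exactly the dip visible in the dual-core row.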
Fundamental to Piledriver is a significant switch in the type of flip-flops used throughout the design. Flip-flops, or flops as they are commonly called, are simple pieces of logic that store some form of data or state. In a microprocessor they can be found in many places, including the start and end of a pipeline stage. Work is done prior to a flop and committed at the flop or array of flops. The output of these flops becomes the input to the next array of logic. Normally flops are hard edge elements—data is latched at the rising edge of the clock.
In very high frequency designs however, there can be a considerable amount of variability or jitter in the clock. You either have to spend a lot of time ensuring that your design can account for this jitter, or you can incorporate logic that's more tolerant of jitter. The former requires more effort, while the latter burns more power. Bulldozer opted for the latter.
In order to get Bulldozer to market as quickly as possible, after far too many delays, AMD opted to use soft edge flops quite often in the design. Soft edge flops are the opposite of their harder counterparts; they are designed to allow the clock signal to spill over the clock edge while still functioning. Piledriver on the other hand was the result of a systematic effort to swap in smaller, hard edge flops where there was timing margin in the design. The result is a tangible reduction in power consumption. Across the board there's a 10% reduction in dynamic power consumption compared to Bulldozer, and some workloads are apparently even pushing a 20% reduction in active power. Given Piledriver's role in Trinity, as a mostly mobile-focused product, this power reduction was well worth the effort.
At the front end, AMD put in additional work to improve IPC. The schedulers are now more aggressive about freeing up tokens. Similar to the soft vs. hard flip flop debate, it's always easier to be conservative when you retire an instruction from a queue. It eases verification as you don't have to be as concerned about conditions where you might accidentally overwrite an instruction too early. With the major effort of getting a brand new architecture off of the ground behind them, Piledriver's engineers could focus on greater refinement in the schedulers. The structures didn't get any bigger; AMD just now makes better use of them.
The execution units are also a bit beefier in Piledriver, but not by much. AMD claims significant improvements in floating point and integer divides, calls and returns. For client workloads these gains show minimal (sub 1%) improvements.
Prefetching and branch prediction are both significantly improved with Piledriver. Bulldozer did a simple sequential prefetch, while Piledriver can prefetch variable lengths of data and across page boundaries in the L1 (mainly a server workload benefit). In Bulldozer, if prefetched data wasn't used (incorrectly prefetched) it would clog up the cache as it would come in as the most recently accessed data. However if prefetched data isn't immediately used, it's likely it will never be used. Piledriver now immediately tags unused prefetched data as least-recently-used, allowing the cache controller to quickly evict it if the prefetch was incorrect.
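To see why the insertion position matters, here is a toy model in Python. It assumes a tiny fully associative cache with strict LRU replacement, which is far simpler than the real L1; the class and function names are hypothetical. The only thing that changes between the two runs is whether a prefetched line enters as most-recently-used (the Bulldozer behavior described above) or as least-recently-used (the Piledriver behavior).

```python
from collections import OrderedDict


class TinyCache:
    """Toy fully associative LRU cache; the first key is LRU, the last is MRU."""

    def __init__(self, capacity, prefetch_at_lru):
        self.capacity = capacity
        self.prefetch_at_lru = prefetch_at_lru
        self.lines = OrderedDict()

    def _evict_if_full(self):
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # drop the LRU line

    def access(self, addr):
        """Demand access: returns True on a hit and promotes the line to MRU."""
        hit = addr in self.lines
        if not hit:
            self._evict_if_full()
            self.lines[addr] = True
        self.lines.move_to_end(addr)
        return hit

    def prefetch(self, addr):
        """Bring a line in speculatively without a demand access."""
        if addr in self.lines:
            return
        self._evict_if_full()
        self.lines[addr] = True  # lands at the MRU end by default
        if self.prefetch_at_lru:
            self.lines.move_to_end(addr, last=False)  # demote to LRU


def replay_hits(prefetch_at_lru):
    """Warm three lines, issue two useless prefetches, then re-touch the three."""
    cache = TinyCache(capacity=4, prefetch_at_lru=prefetch_at_lru)
    for addr in ("A", "B", "C"):
        cache.access(addr)
    for addr in ("X", "Y"):
        cache.prefetch(addr)
    return sum(cache.access(addr) for addr in ("A", "B", "C"))


print("MRU insertion hits:", replay_hits(False))
print("LRU insertion hits:", replay_hits(True))
```

In this contrived trace, the useless prefetches push the working set out when they enter as MRU, while with LRU insertion the second bad prefetch simply replaces the first and the working set survives.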
Another change is that Piledriver includes a perceptron branch predictor that supplements the primary branch predictor in Bulldozer. The perceptron algorithm is a history based predictor that's better suited for predicting certain branches. It works in parallel with the old predictor and simply tags branches that it is known to be good at predicting. If the old predictor and the perceptron predictor disagree on a tagged branch, the perceptron's path is taken. Improving branch prediction accuracy is a challenge, but it's necessary in highly pipelined designs. These sorts of secondary predictors are a must as there's no one-size-fits-all when it comes to branch prediction.
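AMD has not published the details of its predictor, so the following Python sketch shows only the general perceptron technique from the research literature, not Piledriver's implementation. The table size, history length, weight range and training threshold (the commonly cited 1.93 x history length + 14) are all assumptions chosen for illustration, and the hybrid arrangement with the older predictor is left out.

```python
class PerceptronPredictor:
    """Textbook-style perceptron branch predictor (illustrative parameters)."""

    def __init__(self, history_len=8, table_size=64):
        self.theta = int(1.93 * history_len + 14)  # training threshold
        # One weight vector per table entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history, most recent first: +1 = taken, -1 = not taken.
        self.history = [-1] * history_len

    def _row(self, pc):
        return self.weights[pc % len(self.weights)]

    def _output(self, pc):
        w = self._row(pc)
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on a mispredict or a low-confidence output, then shift history."""
        y = self._output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self._row(pc)
            w[0] = max(-127, min(127, w[0] + t))
            for i, hi in enumerate(self.history):
                w[i + 1] = max(-127, min(127, w[i + 1] + t * hi))
        self.history = [t] + self.history[:-1]


def accuracy_on_pattern(pattern, repeats=250, tail=200):
    """Fraction of correct predictions over the last `tail` branches."""
    predictor = PerceptronPredictor()
    results = []
    for taken in pattern * repeats:
        results.append(predictor.predict(0x40) == taken)
        predictor.update(0x40, taken)
    return sum(results[-tail:]) / tail


print(accuracy_on_pattern([True, True, False, False]))
```

The repeating taken, taken, not-taken, not-taken pattern is the kind of history-correlated branch this approach learns quickly, because the outcome is a simple linear function of earlier outcomes.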
Finally, Piledriver also adds new instructions to better align its ISA with Haswell: FMA3 and F16C.
Trinity features a much improved version of AMD's Turbo Core technology compared to Llano. First and foremost, both CPU and GPU turbo are now supported. In Llano only the CPU cores could turbo up if there was additional TDP headroom available, while the GPU cores ran no higher than their max specified frequency. In Trinity, if the CPU cores aren't using all of their allocated TDP but the GPU is under heavy load, it can exceed its typical max frequency to capitalize on the available TDP. The same obviously works in reverse.
Under the hood, the microcontroller that monitors all power consumption within the APU is much more capable. In Llano, the Turbo Core microcontroller looked at activity on the CPU/GPU and performed a static allocation of power based on this data. In Trinity, AMD implemented a physics based thermal calculation model using fast transforms. The model takes power and translates it into a dynamic temperature calculation. Power is still estimated based on workload, which AMD claims has less than a 1% error rate, but the new model gets accurate temperatures from those estimations. The thermal model delivers accuracy at or below 2C, in real time. Having more accurate thermal data allows the turbo microcontroller to respond quicker, which should allow for frequencies to scale up and down more effectively.
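AMD has not disclosed the model itself, so the sketch below is not its transform-based method. It is a generic first-order (lumped RC) thermal model in Python, included only to show what turning a power estimate into a dynamic temperature means. The thermal resistance, thermal capacitance, ambient temperature and time step are all made-up illustrative values.

```python
def simulate_temperature(power_w, ambient_c=45.0, r_c_per_w=1.5,
                         c_j_per_c=20.0, dt_s=0.1):
    """Integrate dT/dt = (P - (T - T_ambient) / R) / C with explicit Euler steps.

    All default parameters are hypothetical; none are Trinity figures.
    """
    temp = ambient_c
    trace = []
    for p in power_w:
        heat_out = (temp - ambient_c) / r_c_per_w  # watts leaking to ambient
        temp += dt_s / c_j_per_c * (p - heat_out)
        trace.append(temp)
    return trace


# A constant 20W load for 300 simulated seconds.
trace = simulate_temperature([20.0] * 3000)
print(f"After 1s: {trace[9]:.2f}C, after 300s: {trace[-1]:.2f}C")
```

Even a model this crude shows why temperature is a more useful control input than instantaneous power: the die takes many seconds to approach its steady state of ambient plus power times thermal resistance, and that lag is the headroom a turbo controller can spend.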
At the end of the day this should improve performance, although it's difficult to compare directly to Llano since so much has changed between the two APUs. Just as with Llano, AMD specifies nominal and max turbo frequencies for the Trinity CPU/GPU.
A Beefy Set of Interconnects
The holy grail for AMD (and Intel for that matter) is a single piece of silicon with CPU and GPU style cores that coexist harmoniously, each doing what they do best. We're not quite there yet, but in pursuit of that goal it's important to have tons of bandwidth available on chip.
Trinity still features two 64-bit DDR3 memory controllers with support for up to DDR3-1866 speeds. The controllers add support for 1.25V memory. Notebook bound Trinities (Socket FS1r2 and Socket FP2) support up to 32GB of memory, while the desktop variants (Socket FM2) can handle up to 64GB.
HyperTransport is gone as an external interconnect, leaving only PCIe for off-chip IO. The Fusion Control Link is a 128-bit (each direction) interface giving off-chip IO devices access to system memory. Trinity also features a 256-bit (in each direction, per memory channel) Radeon Memory Bus (RMB) that provides direct access to the DRAM controllers. The excessive width of this bus likely implies that it's also used for CPU/GPU communication.
IOMMU v2 is also supported by Trinity, giving supported discrete GPUs (e.g. Tahiti) access to the CPU's virtual memory. In Llano, you used to take data from disk, copy it to memory, then copy it from the CPU's address space to pinned memory that's accessible by the GPU, then the GPU gets it and brings it into its frame buffer. By having access to the CPU's virtual address space now the data goes from disk, to memory, then directly to the GPU's memory—you skip that intermediate mem to mem copy. Eventually we'll get to the point where there's truly one unified address space, but steps like these are what will get us there.
The Trinity GPU
Trinity's GPU is probably the most well understood part of the chip, seeing as how it's basically a cut down Cayman from AMD's Northern Islands family. The VLIW4 design features 6 SIMD engines, each with 16 VLIW4 arrays, for a total of up to 384 cores. The A10 SKUs get 384 cores while the lower end A8 and A6 parts get 256 and 192, respectively. FP64 is supported but at 1/16 the FP32 rate.
As AMD never released any low-end Northern Islands VLIW4 parts, Trinity's GPU is a bit unique. It technically has fewer cores than Llano's GPU, but as we saw with AMD's transition from VLIW5 to VLIW4, the loss didn't really impact performance but rather drove up efficiency. Remember that most of the time that 5th unit in AMD's VLIW5 architectures went unused.
The design features 24 texture units and 8 ROPs, in line with what you'd expect from what's effectively 1/4 of a Cayman/Radeon HD 6970. Clock speeds are obviously lower than a full blown Cayman, but not by a ton. Trinity's GPU runs at a normal maximum of 497MHz and can turbo up as high as 686MHz.
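The core counts and clocks above translate into theoretical throughput with simple arithmetic, sketched here in Python. Two things are our assumptions rather than statements from AMD: that the 256 and 192 core parts correspond to 4 and 3 active SIMD engines, and that peak FP32 throughput counts two floating point operations (one multiply-add) per core per clock.

```python
LANES_PER_VLIW4 = 4    # each VLIW4 array counts as four "Radeon cores"
VLIW4_PER_SIMD = 16    # arrays per SIMD engine, per the description above


def radeon_cores(simd_engines):
    """Radeon core count for a given number of active SIMD engines."""
    return simd_engines * VLIW4_PER_SIMD * LANES_PER_VLIW4


def peak_gflops(cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical FP32 GFLOPS, assuming one multiply-add per core per clock."""
    return cores * flops_per_core_per_clock * clock_mhz / 1000.0


cores = radeon_cores(6)
fp32_base = peak_gflops(cores, 497)
fp32_turbo = peak_gflops(cores, 686)
fp64_turbo = fp32_turbo / 16  # FP64 runs at 1/16 the FP32 rate

print(f"{cores} cores: {fp32_base:.1f} to {fp32_turbo:.1f} GFLOPS FP32, "
      f"{fp64_turbo:.1f} GFLOPS FP64 at turbo")
```

These are paper numbers only; as the benchmarks later show, delivered performance depends heavily on memory bandwidth and the CPU.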
Trinity includes AMD's HD Media Accelerator, which includes accelerated video decode (UVD3) and encode components (VCE). Trinity borrows Graphics Core Next's Video Codec Engine (VCE), and it is actually functional in the hardware/software we have here today. Don't get too excited though; the VCE enabled software we have today won't take advantage of the identical hardware in discrete GCN GPUs. AMD tells us this is purely a matter of having the resources to prioritize Trinity first, and that discrete GPU VCE support is coming.
Mobile Trinity Lineup
Trinity is of course coming in two flavors, just like Llano before it. On the desktop, we’ll have Virgo chips, but those are coming later this year (around Q3); right now, Trinity is only on laptops. On laptops the codename for Trinity is Comal. AMD has also dropped wattages on their mobile flavors, so where Llano saw 35W and 45W mobile parts, with Comal AMD will have 17W, 25W, and 35W parts. (The desktop Trinity chips will apparently retain their 65W and 100W targets.) There aren’t a ton of mobile Trinity chips launching today; instead, AMD has five different APUs and each one targets a distinct market segment. Here’s the quick rundown:
|AMD Trinity A-Series Fusion APUs for Notebooks|
|APU Model||A10-4600M||A8-4500M||A6-4400M||A10-4655M||A6-4455M|
|“Piledriver” CPU Cores||4||4||2||4||2|
|CPU Clock (Base/Max)||2.3/3.2GHz||1.9/2.8GHz||2.7/3.2GHz||2.0/2.8GHz||2.1/2.6GHz|
|L2 Cache (MB)||4||4||1||4||2|
|Radeon Model||HD 7660G||HD 7640G||HD 7520G||HD 7620G||HD 7500G|
|GPU Clock (Base/Max)||497/686MHz||497/655MHz||497/686MHz||360/497MHz||327/424MHz|
As a Bulldozer-derived architecture, Trinity uses CPU modules that each contain two Piledriver CPU cores with a shared FP/SSE (Floating Point) unit. From one perspective, that makes Trinity a quad-core or dual-core processor; others would argue that it’s not quite the same as a “true” quad-core setup. We’re not going to worry too much about the distinction here, though, as we’ll let the performance results tell that story. Compared to Llano’s K10-derived CPU core, clock speeds in Trinity are substantially higher—both the base and Turbo Core clocks. The top-end A10-4600M has a base clock that’s 53% higher than the 1.5GHz A8-3500M we reviewed when Llano launched, while maximum turbo speeds are up 33%. Unfortunately, while clock speeds might be substantially higher, Trinity’s Piledriver cores have substantially longer pipelines than Llano’s K10+ cores; we’ll see in the benchmarks what that means for typical performance.
The GPU side of the equation is also substantially different from Llano. Llano used a Redwood GPU core (e.g. Radeon 5600 series) with a VLIW5 architecture (e.g. the Evergreen family of GPUs), and the various APUs had either 400, 320, or 240 Radeon cores. Trinity changes out the GPU core for a VLIW4 design (Northern Islands family of GPU cores), and this is the only time we’ve seen AMD use VLIW4 outside of the 6900 series desktop GPUs. The maximum number of Radeon cores is now 384, but we should see better efficiency out of the design, and clock speeds are substantially higher than on Llano; the mobile clocks are typically 55-60% higher. Again, how this plays out in terms of actual performance is something we’ll look at momentarily.
Looking at the complete lineup of Trinity APUs, it’s interesting to see AMD using a new A10 branding for the top models while overlapping the existing A8 and A6 brands on lower spec models. We only have the A10-4600M in for testing right now, but AMD provided some performance estimates for the various performance levels. The A10-4600M delivers 56% better graphics performance and 29% better “productivity” performance than the A8-3500M—note that we put productivity in quotes because it’s not clear if AMD is talking specifically about CPU performance or some other metric. The new A8-4500M delivers 32% faster graphics performance than the A8-3500M and 19% higher productivity, which appears to be why it gets the same “A8” classification. Finally, even the single-module/dual-core A6-4400M delivers 16% better graphics than the A8-3500M and 5% higher productivity. I suspect that the various percentages AMD lists are more of an “up to” statement as opposed to being typical performance improvements, as it seems unlikely that 192 VLIW4 cores at 686MHz could consistently outperform 400 VLIW5 cores at 444MHz.
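The skepticism about the A6-4400M claim can be put into numbers. The Python sketch below compares raw cores times clock for the two GPUs using the figures from the paragraph above. It deliberately ignores architectural efficiency, memory bandwidth and drivers, so treat it as a rough bound rather than a performance prediction.

```python
def raw_throughput(cores, clock_mhz):
    """Crude proxy for shader throughput: cores multiplied by clock."""
    return cores * clock_mhz


a6_4400m = raw_throughput(192, 686)   # VLIW4 cores at max turbo
a8_3500m = raw_throughput(400, 444)   # VLIW5 cores

raw_ratio = a6_4400m / a8_3500m
# Per-core, per-clock gain VLIW4 would need to deliver AMD's "16% better".
required_efficiency = 1.16 / raw_ratio

print(f"Raw ratio (A6-4400M / A8-3500M): {raw_ratio:.2f}")
print(f"Efficiency gain needed for +16%: {required_efficiency:.2f}x")
```

On paper the A6-4400M has roughly three quarters of the A8-3500M's raw shader throughput, so VLIW4 would need to do well over 50% more useful work per core per clock to hit AMD's figure consistently.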
If we consider target markets, the A10-4600M will be the fastest Trinity APU for now, and it should go into mainstream laptops that will provide a well rounded experience with the ability for moderate gaming along with any other tasks you might want to run. The A8-4500M takes a pretty major chunk out of the GPU (one third of the GPU cores are gone along with a slight drop in maximum clock speed) while maintaining roughly 80% of the CPU performance, so it can fit into slightly cheaper laptops but will likely drop gaming performance from “moderate” to “light”. The A6-4400M ends up as the extreme budget offering, with higher clocks on the CPU making up for the removal of two cores; the GPU likewise gets a slight trim relative to the A8-4500M, and we’re now down to half the graphics performance potential of the A10-4600M. All of the standard voltage parts support up to DDR3-1600 memory, with low voltage DDR3-1600 and ultra low voltage DDR3-1333 also supported.
The other two APUs are low voltage and ultra low voltage parts, which should work well in laptops like HP’s “sleekbooks”—basically, they’re for AMD-based alternatives to ultrabooks. The A10-4655M has about 87% of the CPU performance potential of the A10-4600M, with 70% of the GPU performance potential, and it can fit into a 25W TDP. The A6-4455M drops the TDP to 17W, matching Intel’s ULV parts, but again the CPU and GPU cores get cut. This time we get two Piledriver cores, 256 Radeon cores, and lowered base and maximum clock speeds. The low/ultra low voltage parts also drop support for DDR3-1600 memory, moving all RAM options down one step to DDR3-1333, low voltage DDR3-1333 and ultra low voltage DDR3-1066.
The final piece of the puzzle for any platform is the chipset. AMD is using their A70M (Hudson M3) chipset, which is the same chipset used for Llano. That’s not really a problem, though, as the chipset provides everything Trinity needs: it has support for up to six native SATA 6Gbps ports, four USB 3.0 ports (and 10 USB 2.0 ports), RAID 0/1 support, and basically everything else you need for a mainstream laptop. PCI Express support in Trinity remains at PCIe 2.0, but that’s not really a problem considering the target market. PCIe 3.0 has been shown to improve performance in some GPGPU workloads with HD 7970, but that’s a GPU that provides nearly an order of magnitude more compute power (over 7X more based on clock speeds and shader count alone).
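The "over 7X" figure is easy to reproduce in Python. The Trinity numbers come from this article; the HD 7970 figures (2048 stream processors at a 925MHz reference clock) are AMD's published desktop specifications and are not quoted in the text above.

```python
# Shader count and clock in MHz.
hd_7660g = {"shaders": 384, "clock_mhz": 686}    # A10-4600M GPU at max turbo
hd_7970 = {"shaders": 2048, "clock_mhz": 925}    # reference desktop card


def shader_clock_product(gpu):
    """Shader count multiplied by clock, a crude proxy for compute throughput."""
    return gpu["shaders"] * gpu["clock_mhz"]


compute_ratio = shader_clock_product(hd_7970) / shader_clock_product(hd_7660g)
print(f"HD 7970 vs HD 7660G: {compute_ratio:.2f}x")
```

The ratio would be higher still against Trinity's 497MHz base GPU clock, which helps explain why PCIe 3.0 matters far less here.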
That takes care of the overview of AMD’s Mobile Trinity lineup, and Anand has covered the architectural information, so now it’s time to meet our prototype AMD Trinity laptop.
Meet the AMD Trinity/Comal Prototype Laptop
So I have to be honest: I’m a sucker for unique laptops. Not so much from the standpoint of actually using such laptops, but just as something cool to show my fellow computer nerds when they visit. The Trinity prototype is quite clearly a design that isn’t going to market without some changes, but unlike the Llano prototype (or the Intel SNB prototype), at least this one tries to stand out from the crowd a little bit. AMD has gone all-in on branding, with the AMD logo featured prominently on the cover, below the LCD on the bezel, and at the top-left of the keyboard. None of that makes the design any better from a functionality standpoint, but it’s still a cool tchotchke:
The bottom of the laptop is full of the usual warning about how the laptop may not meet regulatory requirements (and if you think that sticker is bad, you should see some of the dire warnings in the documentation for another prototype I’ve got hanging about waiting for the NDA to lift!). There’s also a bold “Prototype System” label, and the Blu-ray drive is clearly of a not-for-resale nature, with a fascia that doesn’t line up with the laptop shell. None of this affects the performance of the laptop, but it’s a nice diversion for what is otherwise an unremarkable system. In terms of specifications, just for completeness’ sake here’s the full rundown of the system components:
|AMD Trinity Prototype Laptop Specifications|
|Processor||AMD A10-4600M (Dual-module/quad-core 2.30-3.20GHz, 4MB L2, 32nm, 35W)|
|Chipset||AMD A70M (Hudson M3)|
|Memory||4GB (2x2GB) DDR3-1600 Samsung; 8GB (2x4GB) DDR3-1600 Hynix|
|Graphics||Radeon HD 7660G (384 Radeon Cores, up to 686MHz)|
|Display||14" WLED Matte 16:9 768p (1366x768) (AU Optronics B140XW02)|
|Storage||128GB Samsung 830 SSD; 240GB Intel 520 SSD|
|Optical Drive||Blu-ray Combo Drive (PLDS DS-6E2SH)|
|Networking||Gigabit Ethernet (Realtek 8168/8111); 802.11n WiFi (Broadcom BCM4313 2x2:2 MIMO, 2.4GHz); Bluetooth 2.1 (Broadcom BCM2070)|
|Audio||Headphone and microphone jacks; capable of 5.1 digital output (HDMI)|
|Battery||6-cell, 11.1V, >4.84Ah, ~56Wh; 90W Max AC Adapter|
|Ports and Switches||WiFi On/Off Switch; 2 x USB 3.0; 1 x USB 2.0/eSATA Combo; AC Power Connection; Memory Card Reader; 1 x USB 2.0|
|Operating System||Windows 7 Ultimate 64-bit|
|Dimensions||13.33" x 9.53" x 1.16-1.34" (WxDxH) (339mm x 242mm x 29.5-34.0mm)|
|Weight||4.7 lbs (2.14kg)|
|Extras||Flash reader (MMC/MS/SD)|
Everything is pretty much standard fare these days, though it’s interesting that AMD chose to ship us a laptop with an SSD instead of a regular HDD. You’ll note that we list two SSDs as well as two sets of memory; the reason is that we performed additional performance testing with hardware that’s slightly different than AMD’s shipping configuration. We wanted to make our comparisons with other laptops more apples-to-apples, so we used the memory from the Ivy Bridge laptop we recently reviewed to see if doubling the RAM made any difference for our benchmarks; it didn’t. We also tested five different laptops with a 240GB Intel 520 SSD, just to level the playing field for tests like PCMark.
The PCMark scores for the Samsung 830 and Intel 520 are within 1% of each other, and for most systems it’s really going to come down to a question of whether you have an SSD or not rather than what specific SSD you’re using. You may (or may not) be surprised to hear that the bigger impact from the SSD came in the area of battery life. The ASUS N56VM battery life remained essentially unchanged with the Intel 520 instead of the original 750GB 7200RPM Seagate HDD, so if you expect any SSD to improve battery life you might be surprised by that result. The other surprise was just how much of a difference there was between the Samsung 830 and Intel 520 SSDs in the Trinity laptop: the Samsung 830 improved battery life by nearly 10% in two out of three tests (and by 3% in the H.264 playback test). A quick look at the idle power consumption results from our SSD Bench provides the answer, of course: the 128GB Samsung 830 uses just 0.38W at idle compared to 0.82W for the 240GB Intel 520. For a desktop, it’s hardly worth mentioning, but for laptops that nearly half a watt definitely shows up.
We could complain about the usual items like build and LCD quality—neither one is particularly impressive for this test laptop—but they really don’t matter since this isn’t a retail sample. For the intended purpose, the laptop works fine—fix the optical drive bezel and I’m sure there would even be some enthusiasts interested in owning a piece of genuine AMD laptop kit. But since that’s not going to happen, let’s move on from the laptop and run some actual performance tests.
Before we get to the charts, let’s quickly discuss the list of laptops we’ve selected for this review. There’s always some debate and outcry over what we include/omit in the charts, which is one of the reasons we have Mobile Bench—you can perform any head-to-head comparison there if you’d like. With well over 100 laptop results in our Mobile Bench database, sifting through the complete charts can be a bit of a nightmare, so for our articles we try to prune things down. I settled on ten laptops for the majority of our charts, with an attempt to represent most of the interesting data points.
Naturally we have AMD’s Trinity prototype (highlighted in red), and to go along with the newest and latest hardware we’ve also included results from Intel’s quad-core Ivy Bridge notebook (in dark green). It’s important to consider that these two laptops do not target the same market: we expect the ASUS N56VM to sell for around $1200 with the tested configuration, whereas AMD’s Trinity laptops will hopefully be closer to half that price—obviously, without shipping hardware we really don’t know what OEMs will end up charging for Trinity. To fill in the rest of the charts, we have two AMD Llano laptops (orange)—one the original AMD prototype, only this time equipped with an SSD, and the second a standard Toshiba Satellite P755D. We’ve also got two primary Sandy Bridge comparisons (light green): one is the prototype quad-core i7-2820QM, and the second is a retail Dell Vostro V131 with i5-2410M; the only catch is that we retested both systems with the Intel 520 SSD.
Rounding out the rest of the selections, we have three ultrabooks: the Acer TimelineU with NVIDIA GT 640M graphics, a Dell XPS 13 with i7-2637M, and a Toshiba Z830 with i3-2367M. All three of these come with SSDs, and we thought it would be interesting to show where Trinity falls relative to the low and high marks set by Sandy Bridge ultrabooks. The last laptop in the list is Sony’s VAIO SE, which has switchable graphics with AMD’s HD 6630M. Given the i7-2640M CPU, the VAIO SE should give a pretty clear look at the maximum performance you can get from the discrete Radeon HD 6630M GPU, so we’ll be able to see if/when Trinity’s HD 7660G comes out ahead of previous generation mobile GPUs. All four of these laptops are in blue—our default “don’t pay too much attention to me” color.
AMD Trinity General Performance
Starting as usual with our general performance assessment, we’ve got several Futuremark benchmarks along with Cinebench and x264 HD encoding. The latter two focus specifically on stressing the CPU while PCMarks will cover most areas of system performance (including a large emphasis on storage) and 3DMarks will give us a hint at graphics performance. First up, PCMark 7 and Vantage:
As noted earlier, we ran several other laptops through PCMark 7 and PCMark Vantage testing using the same Intel 520 240GB SSD, plus all the ultrabooks come with SSDs. That removes the SSD as a factor from most of the PCMark comparisons, leaving the rest of the platform to sink or swim on its own. And just how does AMD Trinity do here? Honestly, it’s not too bad, despite its positioning within the charts.
Obviously, Intel’s quad-core Ivy Bridge is a beast when it comes to performance, but it’s a 45W beast that costs over $300 just for the CPU. We’ll have to wait for dual-core Ivy Bridge to see exactly how Intel’s latest stacks up against AMD, but if you remember the Llano vs. Sandy Bridge comparisons it looks like we’re in for more of the same. Intel continues to offer superior CPU performance, and even their Sandy Bridge ULV processors can often surpass Llano and Trinity. In the overall PCMark 7 metric, Trinity ends up being 20% faster than a Llano A8-3500M laptop, while Intel’s midrange i5-2410M posts a similar 25% lead on Trinity. Outside of the SSD, we’d expect Trinity and the Vostro V131 to both sell for around $600 as equipped.
A 25% lead for Intel is pretty big, but what you don’t necessarily get from the charts is that for many users, it just doesn’t matter. I know plenty of people using older Core 2 Duo (and even a few Core Duo!) laptops, and for general office tasks and Internet surfing they’re fine. Llano was already faster in general use than Core 2 Duo and Athlon X2 class hardware, and it delivered great battery life. Trinity boosts performance and [spoiler alert!] battery life, so it’s a net win. If you’re looking for a mobile workstation or something to do some hardcore gaming, Trinity won’t cut it—you’d want a quad-core Intel CPU for the former, and something with a discrete GPU for the latter—but for everything else, we’re in the very broad category known as “good enough”.
When we start drilling down into other performance metrics, AMD’s CPU performance deficiency becomes pretty obvious. The Cinebench single-threaded score is up 15% from 35W Llano, but in a bit of a surprise the multi-threaded score is basically a wash. Turn to the x264 HD encoding test however and Trinity once again shows a decent 15% improvement over Llano. Against Sandy Bridge and Ivy Bridge, though? AMD’s Trinity doesn’t stand a chance: i5-2410M is 50% faster in single-threaded Cinebench, 27% faster in multi-threaded, and 5-10% faster in x264. It’s a good thing 99.99% of laptop users never actually run applications like Cinebench for “real work”, but if you want to do video encoding a 10% increase can be very noticeable.
Shift over to graphics oriented benchmarks and the tables turn once again...sort of. Sandy Bridge can’t run 3DMark 11, since it only has a DX10 class GPU, but in Vantage Performance and 3DMark06 Trinity is more than twice as fast as HD 3000. Of course, Ivy Bridge’s HD 4000 is the new Intel IGP Sheriff around these parts, and interestingly we see Trinity and i7-3720QM basically tied in these two synthetics. (We’ll just ignore 3DMark Vantage’s Entry benchmark, as it’s so light on graphics quality that we’ve found it doesn’t really stress most GPUs much; even low-end GPUs like HD 3000 score quite well.) We’ll dig into graphics performance more with our gaming benchmarks next.
AMD Trinity Gaming Performance
After the 3DMark results, you might be wondering if Intel has finally caught up to AMD in terms of integrated graphics performance. The answer is…yes and no. Depending on the game, there are times where a fast Ivy Bridge CPU with HD 4000 will actually beat out Trinity; there are also times where Intel’s IGP really struggles to keep pace. The good news is that at least everyone is now onboard the DX11 bandwagon, and compatibility with games has improved yet again for Intel. Here are our “Value” benchmark results for seven recent games; we’ll have more information in a moment.
Out of our seven test titles, AMD’s Trinity leads any other IGP in four titles by a large margin. The other three titles actually have Ivy Bridge slightly ahead of Trinity, but the gaps aren’t nearly as big. Overall, the average performance across the seven games at our Value (medium) settings has AMD’s Trinity A10-4600M leading Intel’s i7-3720QM by 21%, and if we look at quad-core Sandy Bridge with HD 3000 (i7-2820QM) Trinity is 72% faster. Trinity is also around 20% faster than 35W Llano on average.
Let’s expand our gaming suite just a bit to see if things change, though. Just like we did with Ivy Bridge, we ran the eight games in our previous benchmark suite at medium detail settings. We can then compare performance across a wider 15 title selection to see how Trinity matches up against HD 4000, HD 3000, and HD 6620G (Llano). We’ll start with the bottom (HD 3000/Sandy Bridge) and move up.
Llano’s HD 6620G was already faster than HD 3000, and Trinity’s HD 7660G is faster than Llano, so the Sandy Bridge gaming matchup is a landslide victory in AMD’s favor. The closest Intel can get is in the same three titles where Ivy Bridge leads Trinity: Batman: Arkham City, DiRT 3, and Skyrim. Here, however, HD 3000 can’t actually close the gap: HD 7660G is at least 20% faster than HD 3000, with an average performance improvement of nearly 80%.
We found that across the same selection of 15 titles, Ivy Bridge and Llano actually ended up “tied”—Intel led in some games, AMD in others, but on average the two IGPs offered similar performance. This chart and the next chart will thus show a similar average increase in performance for Trinity, but the details in specific games are going to be different. Starting with Ivy Bridge and HD 4000, as with our earlier game charts we see there are some titles where Intel leads (Batman and Skyrim), a couple ties (DiRT 3 and Mass Effect 2), and the remainder of the games are faster on Trinity. Mafia II is close to our <10% “tie” range but comes in just above that mark, as do Left 4 Dead 2 and Metro 2033. The biggest gap is Civilization V, where Intel’s various IGPs have never managed good performance; Trinity is nearly twice as fast as Ivy Bridge in that title. Overall, it's a 20% lead for Trinity vs. quad-core Ivy Bridge.
Against Llano, Trinity is universally faster, but the smallest gap is in Mafia II (3%) while the largest gap is in StarCraft II (30%). On average, looking at these games Trinity is only 18% faster than Llano. What’s not entirely clear from the above chart is whether we’re hitting CPU limitations, memory bandwidth limitations (remember that Llano and Trinity share bandwidth with the rest of the system), or perhaps both. At our chosen settings, what is clear is that Trinity’s “up to 56% faster” graphics never make it that high.
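The "tie" band and the average leads quoted above can be sketched in a few lines of Python. This is only an illustration, not the tooling behind the charts: the frame rates are made-up placeholders, the function names are hypothetical, and it assumes a plain arithmetic mean of per-game ratios, which may not be how the chart averages were produced. Only the 10% tie threshold comes from the text.

```python
# Sketch of the lead/tie/trail bookkeeping described in the text.
# Only the 10% "tie" threshold comes from the article; everything else
# (names, frame rates, the choice of an arithmetic mean) is an assumption.
TIE_THRESHOLD = 0.10


def classify(trinity_fps, other_fps, threshold=TIE_THRESHOLD):
    """Label one game as a Trinity lead, a tie, or a Trinity loss."""
    delta = trinity_fps / other_fps - 1.0
    if abs(delta) < threshold:
        return "tie"
    return "Trinity leads" if delta > 0 else "Trinity trails"


def average_lead(results):
    """Mean of per-game Trinity/other ratios, expressed as a fractional lead."""
    ratios = [trinity / other for trinity, other in results.values()]
    return sum(ratios) / len(ratios) - 1.0


# Hypothetical (Trinity fps, competitor fps) pairs, NOT measured results.
sample = {
    "Game A": (40.0, 38.0),  # about 5% ahead, inside the tie band
    "Game B": (45.0, 30.0),  # 50% ahead
    "Game C": (27.0, 36.0),  # 25% behind
}

if __name__ == "__main__":
    for game, (trinity, other) in sample.items():
        print(f"{game}: {classify(trinity, other)}")
    print(f"Average lead: {average_lead(sample):.1%}")
```

Averaging ratios rather than raw frame rates keeps a single high-fps title from dominating the result, which is one reason a per-game breakdown and an overall average can tell slightly different stories.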
We saw 35-45% higher scores in 3DMark 11 and Vantage, which tend to remove the CPU from the equation more than actual games, so our guess would be that if AMD continues with their APU plan they’re going to need to work more on the CPU side of the equation. We also see the same thing looking at the VAIO SE scores in the earlier gaming charts: the HD 6630M scores are 20% faster on average, but much of that appears to come from the faster CPU rather than the GPU.
AMD’s Heterogeneous Computing with Trinity
It’s not all about just CPU or GPU performance, though—or at least that’s what we’ve been hearing from various parties for a while now. The real question is how a platform performs as a whole. There are some tasks where pure CPU performance is what really matters, and there are other tasks where the parallel nature of GPUs pays serious dividends. AMD (and NVIDIA) has been pushing for more applications to make use of the GPU for tasks where it can provide a lot of number crunching prowess.
With Trinity, AMD provided us with a selection of applications that now leverage—to varying degrees—AMD’s App Acceleration, OpenCL, OpenGL, or other tools. For some of these applications, we don’t have any good way of measuring performance across a wide selection of hardware, and for some of those where benchmarks are possible I’ve run out of time to try to put anything concrete together. I don’t want to skip this section entirely, so what follows is a list of the applications, how they benefit from heterogeneous compute, and some general impressions of the application. We also have graphs for a few of the applications where performance seemed to matter the most.
Adobe Flash 11.2—The latest version of Flash continues to add GPU acceleration features, and now there are 3D hooks in addition to the video offload acceleration we first saw with Flash 10.x. There’s not too much of note here, as NVIDIA and Intel also support the latest features of Flash 11.2. Flash works fine on Trinity, but the same goes for Ivy Bridge and various NVIDIA GPUs. If you never saw the Epic Citadel demo for iOS or Android, there’s now a Flash-based version of the same demo that will run in your browser. (Warning: that link can take 10-15 minutes on a decent connection to download all the textures and other data!) Epic Citadel looks just as nice as it did on iOS, but now we need some actual games to take advantage of the tools. Then perhaps we can start looking into benchmarks of browser games or something….
Adobe Photoshop CS6—Photoshop started to take advantage of GPU acceleration back with the CS4 release, using OpenGL to improve performance on certain filters and features. With CS6, Adobe has begun using OpenCL. Fundamentally, I’m not sure how big of a change this represents, but there are quite a few functions in Photoshop that are now supposed to be faster/better with an OpenCL compatible graphics card. There are also two new features that leverage OpenCL; one is Iris Blur, which allows you to mimic depth of field using Photoshop instead of your camera, and the other is Liquify. Unfortunately, I’m by no means a Photoshop expert, so I’m not sure how much the features really help “power users”. I did try doing a benchmark of general Photoshop CS6 performance using the Photoshop Retouch benchmark with and without GPU acceleration enabled; unfortunately, it looks like most of the filters in that action script don’t benefit from the GPU acceleration, as the scores I got were essentially unchanged with or without GPU/OpenCL enabled. Overall, I’ll take the GPU acceleration, but for most of what I do in Photoshop it doesn’t appear to benefit; if you’re interested, you can read more about AMD’s work with Adobe.
GNU Image Manipulation Program (GIMP)—Going along with Photoshop CS6, AMD provided a special preview build of GIMP 2.8. GIMP is sort of the poor man’s Photoshop, as it’s completely free. At present, there are 19 filters that utilize OpenCL to speed up processing, and over the coming months as the release version of GIMP looks to take their new engine live there will undoubtedly be more additions. For now, probably only five of the filters are things I would use (e.g. noise reduction, maybe a light blur). I tested several of these, and there is sometimes an order of magnitude speedup vs. doing the work on just the CPU. The problem is that it also looks like GIMP isn't incredibly well threaded in many of these tasks, putting multicore CPUs at a disadvantage. My biggest complaint isn’t even about performance, though; sadly, I just find the GIMP UI and general performance to be really bad compared to Photoshop. I've tried several times over the years to use GIMP instead of Photoshop, but I’ve never felt comfortable with the tool. If on the other hand you prefer GIMP, hopefully when the current GEGL menu gets integrated into the main program you’ll realize a healthy performance boost.
ArcSoft MediaConverter 7.5—MediaConverter should be a familiar name by now if you’ve been following our reviews, as it’s one of the showcase titles for Intel’s Quick Sync transcoding. When we reviewed Ivy Bridge last month, we found that on Llano at least the version of MediaConverter we had ran slower on the GPU than on the CPU; with Trinity on the other hand, enabling GPU acceleration results in times that are about 60% faster than the CPU alone. That’s a good performance increase, but we’re looking at 154 seconds on the CPU compared to 98 seconds using the GPU. In contrast, dual-core Sandy Bridge on CPU transcoding took 127 seconds and with Quick Sync it only took 28 seconds, roughly a 4.5X improvement. Quad-core Ivy Bridge was just as impressive, going from 68 seconds on the CPU down to 16 seconds with Quick Sync (4.25X). We’ve been hoping to see something more from AMD’s new Video Codec Engine (VCE), first announced over six months ago with HD 7970, but unless there’s substantial room for improvement it looks like Intel’s Quick Sync will continue to be the fastest transcoding tool for now.
CyberLink MediaEspresso 6.5—This tool is very similar to MediaConverter, and the results are also better this time around. We measured the assisted encode time at 74 seconds compared to 135 seconds on the CPU alone. The 74 second transcode time actually makes Trinity potentially faster than CPU-based transcoding on dual-core Sandy Bridge, but again Quick Sync (25 seconds on SNB, 12 seconds on IVB) remains the fastest way to transcode. Considering both of these tools are apparently using VCE, I have to state that I’m disappointed; with VCE I was expecting performance similar to what Intel is getting with Quick Sync—four or five times faster than CPU-based encoding for the same APU. That Trinity isn't quite twice as fast with VCE is unfortunate; even though there's a decent improvement, Intel is in a completely different category of performance. We’ll have to wait and see if anything more develops with VCE.
Handbrake— Yep, this popular open source video transcoding app is getting an OpenCL facelift. Check out our separate post on it here.
WinZip 16.5—This final application is one that I can see being very useful, assuming we see similar advancements in other compression utilities. WinZip 16.5 now supports OpenCL to improve compression times. We tested by compressing the entire Cinebench 11.5 directory with and without OpenCL enabled, and we also compared the results with 7-Zip. On Trinity, performance improved by about 20%, which is decent; Llano sees an even larger 28% improvement. Meanwhile, Sandy Bridge using CPU-based compression is about as fast as Trinity with OpenCL, and Ivy Bridge is still faster, but the 20% increase for “free” is nothing to scoff at. Unfortunately for WinZip, 7-Zip compressed the same directory to 95MB vs. 108MB in roughly the same time as the non-OpenCL WinZip, and 7-Zip is completely free and doesn't nag you and tell you to buy it. Where WinZip 16.5 is a good proof of concept, what will really help AMD is if all the other compression utilities (7-Zip, WinRAR, etc.) all start using OpenCL or other tools to improve performance.
The majority of the applications continue to focus on video and image manipulation, likely because those are areas where the parallel nature of GPUs can be readily utilized. WinZip on the other hand is an application showing other potential uses for GPGPU and heterogeneous compute. We’d love to see even more adoption of OpenCL and similar tools, but the stark reality is that coming up with new and useful ways of doing this is difficult—if it were easy, everyone would do it! The good news is that giving the creative people of the world more tools with which to work can only help, and we’ll just have to wait and see what else comes out.
There’s another interesting sidebar worth mentioning here. OpenCL is an open standard, and the latest Intel drivers actually install an OpenCL driver on Ivy Bridge and Sandy Bridge. Not surprisingly, not all implementations are created equal, so even with Intel’s drivers we couldn’t enable OpenCL in Photoshop or WinZip; GIMP on the other hand apparently worked okay with OpenCL on Intel—we measured a 5X performance improvement of the Noise Reduction filter with Ivy Bridge. Trinity also came in slightly faster with both leveraging OpenCL, while Intel was nearly twice as fast without.
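For readers who want to check the transcoding and compression figures in this section, the arithmetic is simple enough to sketch in Python. The times and archive sizes are the ones quoted in the text; the dictionary layout and function names are just illustrative choices, and the labels say nothing about which hardware block did the work.

```python
# (CPU-only seconds, GPU or Quick Sync assisted seconds), as quoted in the text.
TRANSCODE_SECONDS = {
    "MediaConverter, Trinity": (154, 98),
    "MediaConverter, dual-core Sandy Bridge": (127, 28),
    "MediaConverter, quad-core Ivy Bridge": (68, 16),
    "MediaEspresso, Trinity": (135, 74),
}


def speedup(cpu_seconds, assisted_seconds):
    """How many times faster the assisted run is than the CPU-only run."""
    return cpu_seconds / assisted_seconds


def size_saving(larger_mb, smaller_mb):
    """Fraction by which the smaller archive undercuts the larger one."""
    return 1.0 - smaller_mb / larger_mb


if __name__ == "__main__":
    for label, (cpu, assisted) in TRANSCODE_SECONDS.items():
        print(f"{label}: {speedup(cpu, assisted):.2f}x")
    # WinZip produced 108MB, 7-Zip produced 95MB from the same directory.
    print(f"7-Zip archive is {size_saving(108, 95):.0%} smaller")
```

Run as a script, this reports roughly 1.57x and 1.82x for Trinity's two assisted encodes against 4.54x and 4.25x for Quick Sync, which is the gap the text describes.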
AMD Trinity: Battery Life Also Improved
With all of the changes going into Trinity, one thing that hasn’t changed since Llano is the process technology. Trinity is once again coming on a 32nm process from GlobalFoundries. If we were talking about Intel, Trinity would represent a “Tock” on the roadmap—a new architecture on an existing process. We’ve looked at CPU and GPU performance, and this is a part that’s pretty much universally faster than its predecessor. Given the lengthier pipeline and Bulldozer-derived architecture, I admit that I was concerned Trinity might actually be a step back for battery life; it appears that my fears were unfounded, largely due to the improvements in Piledriver. As usual, we tested with all laptops set to 100 nits brightness in our idle, Internet, and H.264 playback tests. I also ran some additional tests which we’ll discuss in a moment. First, here are the standard battery life results:
With a similar capacity battery to the original Llano laptop, and the same size 14” panel, Trinity comes out of the gates and posts two clear wins: idle battery life and Internet battery life are both up substantially relative to Llano. In fact, looking at the normalized charts, the only laptops that can consistently beat Trinity are found in Sandy Bridge ultrabooks—we won’t even bother discussing Atom or Brazos netbooks, as they’re competing in a completely different performance bracket. Somewhat surprisingly, H.264 battery life doesn’t see the same benefit, and it’s the one discipline where Llano still holds on to a slight lead over Trinity. Sandy Bridge meanwhile has always done very well in H.264 battery tests, and we see that with the Vostro V131 posting a normalized score that’s 30% better than Trinity and Llano. Of course, on the other end of the spectrum we have Ivy Bridge; we’ve only looked at one Ivy Bridge laptop so far, but if the pattern holds then Ivy Bridge will generally be a moderate step back in battery life relative to Sandy Bridge, giving AMD an even larger lead in this area.
We also performed a few other tests that we won’t present in graph form. One set of tests we alluded to earlier: the charts show Trinity with a Samsung 830 SSD, but we also ran tests with an Intel 520 SSD. Idle battery life dropped to 476 minutes (an 8% decrease), Internet battery life checked in at 371 minutes (down 8% again), and H.264 battery life stayed nearly the same at 217 minutes (down less than 3%). If battery life is one of your primary concerns, remember: all SSDs are not created equal!
Another test that we ran is simulated gaming; we looped the four graphics tests in 3DMark06 at 1366x768 until the battery ran out. We’ve run this same test on quite a few other laptops, and Llano initially looked to be far and away the best solution. Later, we discovered that when we tested Llano we were letting the GPU run in power saving mode—basically half the performance you’d get compared to being plugged in. We retested and measured 98 minutes, so the extra graphics performance comes with a heavy cost. We only tested Trinity (and Ivy Bridge and Sandy Bridge) using higher performance graphics settings, and this is one more area where it scores worse than Llano: Trinity managed just 77 minutes. That’s about the same as Ivy Bridge and Sandy Bridge (79 and 73 minutes, respectively), so if you’re after better gaming performance while running on battery power, you might need to keep looking.
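To put the simulated gaming runtimes side by side, here is a small Python sketch that expresses each result relative to a chosen baseline. The minutes are the figures quoted above (all at the higher performance graphics settings); the function name and structure are purely illustrative.

```python
# Simulated gaming battery life in minutes, as quoted in the text.
GAMING_MINUTES = {
    "Llano": 98,
    "Ivy Bridge": 79,
    "Trinity": 77,
    "Sandy Bridge": 73,
}


def relative_to(baseline, minutes=GAMING_MINUTES):
    """Fractional runtime difference of every platform versus the baseline."""
    base = minutes[baseline]
    return {name: value / base - 1.0 for name, value in minutes.items()}


if __name__ == "__main__":
    for name, delta in relative_to("Trinity").items():
        print(f"{name}: {delta:+.1%} vs. Trinity")
```

Against Trinity, Llano comes out about 27% ahead, while Ivy Bridge and Sandy Bridge land within a few percent on either side.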
Before getting too carried away with the above results, you still need to consider how important battery life is for your usage model. Some people travel a lot and like to go all day without plugging in; others will go from place to place and plug in whenever they’re not on the go. If you fall in the latter category, battery life isn’t usually a problem with any decent laptop, while those looking for all-day computing will definitely want as much mobility as possible. Ultimately, battery life is a function of battery capacity as well as power optimizations done by the OEMs. We’ve seen battery life improve by as much as 50% when comparing two otherwise similar notebooks, but at least AMD’s reference platform for Trinity delivers a great starting point.
Temperatures and Acoustics
One other item we wanted to quickly touch on is system temperatures. We typically use HWMonitor and check temperatures of laptops under idle and load conditions. We did this with Trinity as well, but unfortunately the current version of HWMonitor doesn’t give us a lot of information. The only temperatures it reports are from the SSD and the HD 7660G graphics—there’s nothing about CPU core temperatures. That means we can’t provide much detail, other than to say that load temperature on the GPU topped out at 71C during extended testing, while the idle temperature was 39C. As usual, temperatures and noise levels go hand in hand, and the low 71C maximum GPU temperature matches up nicely with noise levels that never got above 37dB. It’s not the quietest laptop we’ve ever tested, and surface temperatures can get a little warm, but overall Trinity looks to be a good balance of performance and power requirements, which means quiet laptops are definitely possible.
Conclusion: What Makes a Trinity?
I have often wondered about where AMD came up with the codename Trinity (other than the river name, of course). Was it a reference to this being AMD’s third APU? Or maybe AMD was gunning for the Holy Trinity of Performance, Battery Life, and Cost—get wins in all three areas and you’d have a guaranteed best seller! If that’s what AMD was hoping to accomplish, they’ve got a good foundation but we’ll need to see what the laptop OEMs come up with before issuing a final verdict.
To recap, Trinity is AMD’s continued journey down the path they started with Llano. Both CPU and GPU performance have improved over Llano. The general purpose CPU performance gap vs. Intel is somewhere in the 20-25% range, while the GPU advantage continues to be significantly in AMD's favor. It is surprising that Intel's HD 4000 is able to win even in some tests, but overall AMD continues to deliver better GPU performance even compared to Ivy Bridge. It's worth pointing out that the concerns about AMD's battery life from a few years ago are now clearly put to rest. At least at the TDPs we've tested, AMD is easily competitive with Intel on battery life.
AMD's GPU accelerated software lineup this time around is significantly better than it was with Llano, but we're still not quite where we need to be yet. I will hand it to AMD, though: progress is clearly being made. Battery life is generally a step forward vs. Llano, which is more than we've been able to say about Ivy Bridge thus far.
The improvements in Piledriver really appear to have saved Trinity. What was a very difficult to recommend architecture in AMD's FX products has really been improved to the point where it's suitable for mobile work. AMD couldn't push performance as aggressively as it would have liked given that it's still on a 32nm process and the APU needs to make money. A move to 2x-nm could help tremendously. Similarly the move to a more efficient VLIW4 GPU architecture and additional tuning helped give AMD a boost in GPU performance without increasing die size. Overall, Trinity is a very well designed part given the process constraints AMD was faced with.
As a notebook platform, Trinity's CPU performance isn’t going to set any new records but it’s certainly fast enough for most users; battery life isn’t at the head of the class, but it’s better than just about anything that doesn’t qualify as an ultrabook; and finally there’s the question of cost. That last item isn’t really in AMD’s control, as the final cost of a laptop is a product of many design decisions, so let’s do some quick investigation into laptop pricing.
If you figure on memory, motherboard, chassis, LCD, and storage as all being the same, a typical laptop will have a starting price point of around $300 for a cheap, injection molded plastic shell, 4GB RAM, a 5400RPM HDD, a 1366x768 TN panel, and a no-frills feature set. Take that same basic platform and you can make an Intel laptop and have a BoM (Bill of Materials) cost of around $450, or you can make an AMD laptop and your BoM might start at $400. Factor in whatever other upgrades an OEM makes, as well as marketing, R&D, and profit, and we end up at a final price tag that might be $600 for a Trinity laptop compared to $700 for an Ivy Bridge laptop. The problem is that AMD doesn't just compete against vanilla Ivy Bridge; it has to compete against all the existing laptops as well.
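The rough pricing math above can be laid out explicitly. The sketch below uses only the ballpark dollar figures from the text; treating the gap between the $300 shared platform and each BoM as the vendor-specific part of the cost is an interpretation for illustration, not a sourced breakdown, and the names are hypothetical.

```python
# Ballpark figures from the text, in US dollars.
SHARED_PLATFORM = 300  # shell, RAM, HDD, panel and other common parts
BOM = {"Intel": 450, "AMD": 400}
RETAIL = {"Intel": 700, "AMD": 600}  # Ivy Bridge vs. Trinity laptop


def breakdown(vendor):
    """Split an estimated retail price into rough cost buckets.

    'vendor_specific' is BoM minus the shared platform (an assumption about
    where the difference comes from); 'overhead' lumps together upgrades,
    marketing, R&D and profit.
    """
    vendor_specific = BOM[vendor] - SHARED_PLATFORM
    overhead = RETAIL[vendor] - BOM[vendor]
    return {
        "vendor_specific": vendor_specific,
        "overhead": overhead,
        "overhead_share": overhead / RETAIL[vendor],
    }


if __name__ == "__main__":
    for vendor in BOM:
        parts = breakdown(vendor)
        print(
            f"{vendor}: ${parts['vendor_specific']} vendor-specific, "
            f"${parts['overhead']} overhead "
            f"({parts['overhead_share']:.0%} of retail)"
        )
```

On these numbers a $50 difference in BoM turns into a $100 difference at retail, which is why even a modest platform cost advantage matters at the budget end of the market.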
Right now, Llano A8 laptops at Newegg have a starting price of $480 for an A8-3500M Acer Aspire, and they range up to $700 for a 17.3” HP dv7. The highest performance laptop of the bunch is probably Samsung’s Series 3, which uses an A8-3510MX APU and goes for $680. I suspect we’ll see similar pricing for Trinity laptops. On the Sandy Bridge Core i3/i5 side of the fence, Newegg has a much larger selection of laptops, starting at $430 for a Lenovo G570, $550 for the cheapest Core i5 model (again from Acer), and going up to $680 or more for laptops with Core i5 and NVIDIA Optimus graphics. Or if you prefer some place other than Newegg, you can find Core i5-2450M with GT 540M in Acer’s AS4830TG for $600.
That pretty much defines the maximum price we should expect people to pay for Trinity, as Core i5 with Optimus will deliver better CPU and GPU performance based on our test results. Obviously there are other factors to consider, like build quality of the laptop(s), display quality, battery life, and features, but most people shopping for an inexpensive laptop are going to be looking at cost first and features second. On the other hand, if you want style as a consideration, HP’s new sleekbooks will have Trinity versions starting at $600 for 15.6” and $700 for 14”—though it’s not clear which APU you’ll get at those prices. As long as last-generation Sandy Bridge laptops are at clearing house prices, though, AMD’s partners are going to need to be under $600 for something like the A10-4600M laptop we’re reviewing today. Assuming they can manage that, Trinity should see plenty of volume with the back to school season coming over the next few months.
For those who are interested in more than just the bottom line, as usual the best laptop for you may not be the best laptop for everyone. Trinity in a 14” form factor like our prototype would make for a great laptop to lug around campus for a few years. It would be fast enough for most tasks, small enough to not break your back, battery life would be long enough to last through a full day of classes, and the price would be low enough to not break your bank. And if mom and dad are footing the bill, you even get to disguise the fact that it’s a gaming capable laptop by not having a discrete GPU specifically called out on the features list. On the other hand, if you’re after a higher performance laptop or you want a “real” gaming system (something that can handle high detail settings at 1600x900, for instance), your best bet continues to be laptops with an Intel CPU and a discrete GPU from NVIDIA, at least of the GT 640M level. I’d say AMD GPUs as well, but I’m still waiting for a better switchable graphics solution.
At this point, AMD has done everything they can to provide a compelling mobile solution. The difficulty is that there's no longer a single laptop configuration that will be "best" for everyone, and Trinity only serves to further muddy the water. Intel continues to offer better CPU performance, and if you need graphics—which mostly means you want to play games—they have a good partner with NVIDIA. AMD on the other hand is delivering better integrated graphics performance with less CPU power, and depending on what you want to do that might be a more well rounded approach to mobile computing. What we need to see now are actual laptops and their prices. To trot out a tired old saying once more, "There are no bad products; only bad prices." Now it's up to AMD's partners to make sure Trinity laptops are priced appropriately.