You've landed on the AMD Portal on AnandTech. This section is sponsored by AMD. It features a collection of all of our independent AMD content, as well as Tweets & News from AMD directly. AMD will also be running a couple of huge giveaways here so check back for those.

Introduction and Piledriver Overview

Brazos and Llano were both immensely successful parts for AMD. The company sold tons despite not delivering leading x86 performance. The success of these two APUs gave AMD a lot of internal confidence that it was possible to build something that didn't prioritize x86 performance but rather delivered a good balance of CPU and GPU performance.

AMD's commitment to the world was that we'd see annual updates to all of its product lines. Llano debuted last June, and today AMD gives us its successor: Trinity.

At a high level, Trinity combines 2-4 Piledriver x86 cores (1-2 Piledriver modules) with up to 384 VLIW4 Northern Islands generation Radeon cores on a single 32nm SOI die. The result is a 1.303B transistor chip (up from 1.178B in Llano) that measures 246mm^2 (compared to 228mm^2 in Llano).

Trinity Physical Comparison
                        | Manufacturing Process | Die Size | Transistor Count
AMD Llano               | 32nm                  | 228mm^2  | 1.178B
AMD Trinity             | 32nm                  | 246mm^2  | 1.303B
Intel Sandy Bridge (4C) | 32nm                  | 216mm^2  | 1.16B
Intel Ivy Bridge (4C)   | 22nm                  | 160mm^2  | 1.4B

Without a change in manufacturing process, AMD is faced with the tough job of increasing performance without ballooning die size. Die size has only gone up by around 8%, but both CPU and GPU performance see double-digit increases over Llano. Power consumption is also improved over Llano, making Trinity a win across the board for AMD compared to its predecessor. If you liked Llano, you'll love Trinity.
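For readers who want to check the arithmetic, here is a minimal sketch that derives the growth figures from the table above. The die sizes and transistor counts come from the table; the helper name `pct_increase` and the density calculation are our own illustration, not AMD-supplied numbers.

```python
# Figures copied from the "Trinity Physical Comparison" table.
chips = {
    "Llano": {"die_mm2": 228, "transistors_b": 1.178},
    "Trinity": {"die_mm2": 246, "transistors_b": 1.303},
}


def pct_increase(old, new):
    """Percentage increase going from old to new (illustrative helper)."""
    return (new - old) / old * 100


die_growth = pct_increase(chips["Llano"]["die_mm2"], chips["Trinity"]["die_mm2"])
transistor_growth = pct_increase(
    chips["Llano"]["transistors_b"], chips["Trinity"]["transistors_b"]
)

# Millions of transistors per square millimeter, a rough density estimate.
density = {
    name: spec["transistors_b"] * 1000 / spec["die_mm2"]
    for name, spec in chips.items()
}

print(f"Die size growth:         {die_growth:.1f}%")
print(f"Transistor count growth: {transistor_growth:.1f}%")
for name, value in density.items():
    print(f"{name}: {value:.2f}M transistors per mm^2")
```

On these numbers the transistor count grows slightly faster than the die area, so density on the same 32nm process is marginally higher in Trinity.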

The problem is what happens when you step outside of AMD's world. Llano had a difficult time competing with Sandy Bridge outside of GPU workloads. AMD's hope with Trinity is that its hardware improvements combined with more available OpenCL accelerated software will improve its standing vs. Ivy Bridge.

Piledriver: Bulldozer Tuned

While Llano featured as many as four 32nm x86 Stars cores, Trinity features up to two Piledriver modules. Given the not-so-great reception of Bulldozer late last year, we were worried about how a Bulldozer derivative would stack up in Trinity. I'm happy to say that Piledriver is a step forward from the CPU cores used in Llano, largely thanks to a bunch of cleanup work on the Bulldozer foundation.

Piledriver picks up where Bulldozer left off. Its fundamental architecture remains unchanged, but it has been improved in all areas. Piledriver is very much a second pass on the Bulldozer architecture, tidying everything up, capitalizing on low-hanging fruit and significantly improving power efficiency. If you were hoping for an architectural reset with Piledriver, you will be disappointed. AMD is committed to Bulldozer, and that's quite obvious if you look at Piledriver's high-level block diagram.

Each Piledriver module is the same 2+1 INT/FP combination that we saw in Bulldozer. You get two integer cores, each with its own schedulers, L1 data cache, and execution units. Between the two is a shared floating point core that can handle instructions from one of two threads at a time. The single FP core shares the data caches of the dual integer cores.

Each module appears to the OS as two cores; however, you don't have as many resources as you would from two traditional AMD cores. This table from our Bulldozer review highlights part of the problem when looking at the front end:

Front End Comparison
                                | AMD Phenom II        | AMD FX          | Intel Core i7
Instruction Decode Width        | 3-wide               | 4-wide          | 4-wide
Single Core Peak Decode Rate    | 3 instructions       | 4 instructions  | 4 instructions
Dual Core Peak Decode Rate      | 6 instructions       | 4 instructions  | 8 instructions
Quad Core Peak Decode Rate      | 12 instructions      | 8 instructions  | 16 instructions
Six/Eight Core Peak Decode Rate | 18 instructions (6C) | 16 instructions | 24 instructions (6C)

It's rare that you get anywhere near peak hardware utilization, so don't be too put off by these deltas, but it is a tradeoff that AMD made throughout Bulldozer. In general, AMD opted for better utilization of fewer resources (partially through increasing some data structures and other elements that feed execution units) vs. simply throwing more transistors at the problem. AMD also opted to reduce the ratio of FP to integer resources within the x86 portion of its architecture, clearly to support a move to the APU world where the GPU can be a provider of a significant amount of FP support. Piledriver doesn't fundamentally change any of these balances. The pipeline depth remains unchanged, as does the focus on pursuing higher frequencies.
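The peak decode rates in the front end table follow from two numbers per chip: the decode width and how many cores share one decoder. The sketch below reproduces the table under that assumption; the dictionary layout and the `peak_decode_rate` helper are our own shorthand, and the model deliberately ignores everything else about the front end.

```python
import math

# Decode width per decoder, and how many cores share that decoder.
# In AMD FX (Bulldozer), the two integer cores of a module share one front end.
FRONT_ENDS = {
    "AMD Phenom II": {"width": 3, "cores_per_decoder": 1},
    "AMD FX": {"width": 4, "cores_per_decoder": 2},
    "Intel Core i7": {"width": 4, "cores_per_decoder": 1},
}


def peak_decode_rate(chip, cores):
    """Peak instructions decoded per clock across `cores` active cores."""
    front_end = FRONT_ENDS[chip]
    decoders = math.ceil(cores / front_end["cores_per_decoder"])
    return decoders * front_end["width"]


for chip in FRONT_ENDS:
    rates = [peak_decode_rate(chip, n) for n in (1, 2, 4)]
    print(f"{chip}: 1/2/4 cores -> {rates}")
```

The shared decoder is why AMD FX stays at 4 instructions per clock when going from one core to two, while the other two designs double.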

Fundamental to Piledriver is a significant switch in the type of flip-flops used throughout the design. Flip-flops, or flops as they are commonly called, are simple pieces of logic that store some form of data or state. In a microprocessor they can be found in many places, including the start and end of a pipeline stage. Work is done prior to a flop and committed at the flop or array of flops. The output of these flops becomes the input to the next array of logic. Normally flops are hard edge elements—data is latched at the rising edge of the clock.

In very high frequency designs however, there can be a considerable amount of variability or jitter in the clock. You either have to spend a lot of time ensuring that your design can account for this jitter, or you can incorporate logic that's more tolerant of jitter. The former requires more effort, while the latter burns more power. Bulldozer opted for the latter.

In order to get Bulldozer to market as quickly as possible, after far too many delays, AMD opted to use soft edge flops quite often in the design. Soft edge flops are the opposite of their harder counterparts; they are designed to allow the clock signal to spill over the clock edge while still functioning. Piledriver on the other hand was the result of a systematic effort to swap in smaller, hard edge flops where there was timing margin in the design. The result is a tangible reduction in power consumption. Across the board there's a 10% reduction in dynamic power consumption compared to Bulldozer, and some workloads are apparently even pushing a 20% reduction in active power. Given Piledriver's role in Trinity, as a mostly mobile-focused product, this power reduction was well worth the effort.
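To make the hard edge vs. soft edge distinction concrete, here is a deliberately simplified timing model. It assumes a soft edge flop can be treated as a hard edge flop with a short transparency window after the clock edge; the picosecond values, the jitter offsets and the function names are all hypothetical and are not taken from AMD's design.

```python
def latches(data_arrival_ps, clock_edge_ps, transparency_ps=0.0):
    """True if data arriving at data_arrival_ps is captured on this clock edge.

    transparency_ps == 0 models a hard edge flop: data must arrive by the edge.
    transparency_ps > 0 models a soft edge flop that still accepts data for a
    short window after the edge (an assumption made for this sketch).
    """
    return data_arrival_ps <= clock_edge_ps + transparency_ps


def missed_edges(data_arrival_ps, nominal_edge_ps, jitter_offsets_ps,
                 transparency_ps=0.0):
    """Count clock edges (nominal edge plus jitter) that fail to capture data."""
    return sum(
        not latches(data_arrival_ps, nominal_edge_ps + offset, transparency_ps)
        for offset in jitter_offsets_ps
    )


# Hypothetical numbers: data settles 5 ps before a nominal 250 ps edge,
# while jitter moves the edge by up to 10 ps in either direction.
jitter = [-10, -5, 0, 5, 10]
hard_misses = missed_edges(245, 250, jitter, transparency_ps=0)
soft_misses = missed_edges(245, 250, jitter, transparency_ps=10)

print(f"hard edge flop misses: {hard_misses}")
print(f"soft edge flop misses: {soft_misses}")
```

In this toy setup the hard edge flop fails whenever jitter pulls the edge ahead of the data, which is the case a designer otherwise has to close with extra timing work. The model says nothing about power, which is where the soft edge flop pays for its tolerance.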

At the front end, AMD put in additional work to improve IPC. The schedulers are now more aggressive about freeing up tokens. Similar to the soft vs. hard flip-flop debate, it's always easier to be conservative when you retire an instruction from a queue. It eases verification as you don't have to be as concerned about conditions where you might accidentally overwrite an instruction too early. With the major effort of getting a brand new architecture off the ground behind them, Piledriver's engineers could focus on greater refinement in the schedulers. The structures didn't get any bigger; AMD just now makes better use of them.

The execution units are also a bit beefier in Piledriver, but not by much. AMD claims significant improvements in floating point and integer divides, calls and returns. For client workloads, these changes yield minimal (sub-1%) gains.

Prefetching and branch prediction are both significantly improved with Piledriver. Bulldozer did a simple sequential prefetch, while Piledriver can prefetch variable lengths of data and across page boundaries in the L1 (mainly a server workload benefit). In Bulldozer, if prefetched data wasn't used (incorrectly prefetched) it would clog up the cache as it would come in as the most recently accessed data. However, if prefetched data isn't used immediately, it's likely that it will never be used. Piledriver now immediately tags unused prefetched data as least-recently-used, allowing the cache controller to quickly evict it if the prefetch was incorrect.
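The effect of that insertion policy is easy to see in a toy model. The sketch below is a tiny fully associative LRU cache where prefetched lines enter either as most recently used (the Bulldozer behavior described above) or as least recently used (the Piledriver behavior). The `TinyCache` class, its four-line capacity and the access pattern are illustrative assumptions, not a model of the real L1.

```python
from collections import OrderedDict


class TinyCache:
    """Fully associative LRU cache; the front of the dict is evicted first."""

    def __init__(self, capacity, prefetch_as_lru):
        self.capacity = capacity
        self.prefetch_as_lru = prefetch_as_lru
        self.lines = OrderedDict()

    def _insert(self, addr, at_lru):
        if addr in self.lines:
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[addr] = True  # new entries land at the MRU end
        if at_lru:
            self.lines.move_to_end(addr, last=False)  # demote to the LRU end

    def access(self, addr):
        """Demand access: returns True on a hit and promotes the line to MRU."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)
        else:
            self._insert(addr, at_lru=False)
        return hit

    def prefetch(self, addr):
        self._insert(addr, at_lru=self.prefetch_as_lru)


def replay(prefetch_as_lru):
    """Warm a working set, issue two useless prefetches, then re-read the set."""
    cache = TinyCache(capacity=4, prefetch_as_lru=prefetch_as_lru)
    working_set = ["A", "B", "C", "D"]
    for addr in working_set:
        cache.access(addr)
    for addr in ["X", "Y"]:  # prefetched but never used
        cache.prefetch(addr)
    return sum(cache.access(addr) for addr in working_set)


hits_mru_insert = replay(prefetch_as_lru=False)
hits_lru_insert = replay(prefetch_as_lru=True)
print(f"hits with MRU insertion: {hits_mru_insert}")
print(f"hits with LRU insertion: {hits_lru_insert}")
```

With MRU insertion, the two bad prefetches push useful lines out and the re-read of the working set misses every time. With LRU insertion, the second bad prefetch simply replaces the first one, so only a single useful line is lost.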

Another change is that Piledriver includes a perceptron branch predictor that supplements the primary branch predictor in Bulldozer. The perceptron algorithm is a history based predictor that's better suited for predicting certain branches. It works in parallel with the old predictor and simply tags branches that it is known to be good at predicting. If the old predictor and the perceptron predictor disagree on a tagged branch, the perceptron's path is taken. Improving branch prediction accuracy is a challenge, but it's necessary in highly pipelined designs. These sorts of secondary predictors are a must as there's no one-size-fits-all when it comes to branch prediction.
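AMD hasn't published the details of its implementation, so the following is only a textbook-style perceptron predictor to show the idea. The table size, history length, training threshold and the `final_prediction` arbitration helper are all hypothetical choices made for this sketch.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor (hypothetical sizes, not AMD's design)."""

    def __init__(self, history_len=8, table_size=64, threshold=30):
        self.threshold = threshold
        # One weight vector per table entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history of recent outcomes: +1 for taken, -1 for not taken.
        self.history = [1] * history_len

    def _row(self, pc):
        return self.weights[pc % len(self.weights)]

    def _output(self, pc):
        w = self._row(pc)
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum of the history is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on a mispredict or a low-confidence output, then shift history."""
        y = self._output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self._row(pc)
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        self.history = self.history[1:] + [t]


def final_prediction(primary_taken, perceptron_taken, tagged):
    """On tagged branches the perceptron wins; otherwise use the primary predictor."""
    return perceptron_taken if tagged else primary_taken


# A branch that strictly alternates taken / not taken.
predictor = PerceptronPredictor()
pc = 0x40
outcomes = [i % 2 == 0 for i in range(300)]
correct = []
for taken in outcomes:
    correct.append(predictor.predict(pc) == taken)
    predictor.update(pc, taken)

late_accuracy = sum(correct[-100:]) / 100
print(f"accuracy over the last 100 branches: {late_accuracy:.2f}")
```

The alternating branch is a pattern where the outcome correlates strongly with recent history, which is the kind of case a history based predictor learns quickly once its weights settle.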

Finally, Piledriver also adds new instructions to better align its ISA with Haswell: FMA3 and F16C.
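As a rough illustration of what these two extensions are for: FMA3 provides fused multiply-add, which computes a*b + c with a single rounding step, and F16C converts between half-precision and single-precision floating point. The sketch below emulates both effects in software; it does not execute the instructions themselves, and the helper names and input values are our own.

```python
import struct
from fractions import Fraction


def fma_emulated(a, b, c):
    """a*b + c rounded once at the end, emulated with exact rational arithmetic."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def to_half_and_back(x):
    """Round-trip a value through IEEE 754 half precision (16-bit) storage."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Inputs chosen so that the intermediate product cannot be held exactly in a
# double, which makes the extra rounding step of the unfused version visible.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27

unfused = a * b - 1.0             # product is rounded, then the sum is rounded
fused = fma_emulated(a, b, -1.0)  # only the final result is rounded

print(f"unfused: {unfused!r}")
print(f"fused:   {fused!r}")
print(f"0.1 stored as half precision: {to_half_and_back(0.1)!r}")
```

The unfused version loses the small term entirely, while the fused version keeps it. The half-precision round trip shows the precision given up in exchange for halving storage compared with single precision.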

271 Comments

  • zepi - Tuesday, May 15, 2012

    You've got it backwards.

    Stuff is priced according to the value it has for customers, to get as much money from the product as possible, regardless of manufacturing costs. Or that's what everybody is aiming for. Trinity is going to be cheap only because it's not good enough to get sales if priced higher.

    Best possible outcome for everybody would have been that cheapest Trinity-based laptops would cost about $1500, but they'd be about as fast as Ivy Bridge Quadcore-desktops with Geforce GTX680 and still achieve a battery-life of about 8min per Wh. And performance & price would both have only gone upwards from there on.

    That kind of performance dominance would force Intel and Nvidia to drop their prices considerably (getting us the cheap laptops regardless of Trinity being pricey) and we'd still have the option to go for über Trinitys if we had the cash.

    And it would save AMD from bankruptcy, ensuring that we'd have competition in future as well.

    Llano, Brazos and Bulldozer are all horrible products for AMD. A good product is characterized by the fact that it is worth considerably more to the customer than it costs to manufacture. If a product is good, it's easy to price it accordingly, and people will still buy it. AMD's CPUs are apparently very bad products, because AMD is making huge losses at the moment. And I don't think it's the GPU division that's causing those losses.
  • JarredWalton - Tuesday, May 15, 2012

    Products are priced according to where the marketing folks think they'll sell. All you have to do is walk into Best Buy and talk to a sales person to realize that they'll push whatever they can on you, even if it's not faster/better. And I think the bean counters feel they can sell Trinity at $700 or more--and for many people, they're probably right. We'll see $600 and $500 Trinity as well, but that will be the A8 and A6 models, with less RAM and smaller HDDs.

    As far as competition, propping up an inferior product in the hope of having more competition isn't healthy, and if AMD has a superior product they simply charge as much as Intel. NVIDIA is the same. If someone came out with a chip that had the CPU performance of IVB and the GPU performance of a GTX card, all while using the power of Brazos...well, you can bet they'd charge an arm and a leg for it. They wouldn't sell it for $1500, they'd sell it for $2500--and some people would buy it.

    Ultimately, they're all big businesses, and they (try to) do what's best for the business, so I buy whatever product fits my needs best. I wish Trinity were more impressive, particularly on the CPU side of the equation. I think if Trinity's CPU were as fast as Ivy Bridge, the GPU portion would probably end up being 50% faster than HD 4000; unfortunately, there are titles that require more CPU work (Skyrim for instance) and that starts to level the playing field. But wishing for something that isn't here, or playing the "what if" game, just doesn't really accomplish anything.
  • Targon - Wednesday, May 16, 2012

    And you can get a quad-core A6 laptop for under $500 right now. If you pay attention, you generally get what you pay for. For most users, going with an AMD quad-core laptop does provide a decent product for the price. For some, CPU power is more important, and for others, a more well rounded machine is more important. I expect that A10-4600 laptops will start closer to $600 than $700, unless you are looking at machines with a large screen, discrete graphics, or something else that increases the prices.
  • CeriseCogburn - Wednesday, May 23, 2012

    What you're all missing is all the then second tier Optimus laptops that will have much deflated pricing, as well as the load of $599 amd discrete laptops that will sell like wildfire and please those who waited - just like the amd fans are constantly waiting for nVidia to release so they can snag a second tier deflated price amd card.

    Since the "cpu doesn't matter !" as we have been told, there's no excuse to not snag a fine and cheap Optimus that won't have an IB.

    This is the "best time in the world" for all the amd fans to forget all prior generations of laptops and pretend, quite unlike in the video card area, that nothing else exists.

    I love how amd fans do that crap.
  • evolucion8 - Tuesday, May 15, 2012

    Also remember that Penryn was launched in 2007-2008, and several Core 2 Duo laptops were still being released until late 2009. I have a Gateway MD7309u that was launched in October 2009; it still feels very snappy and has good battery life, but I hate its GMA 4500M with my whole heart.....
  • Nfarce - Tuesday, May 15, 2012

    Yeah well I don't understand the point of buying a low-mid range laptop expecting to be enjoying playing games at basic laptop 1366x768 resolutions. What's the point?

    You can spend around $1,200 on a mid-range i5 turbo boost laptop with a discrete GPU and 1600x900 resolution screen that plays games decently without completely shutting down the eye candy sliders. Save up and get a better laptop - an Intel with a dedicated AMD or Nvidia GPU. If you can afford $600 now, you can afford $1,200 down the road and enjoy things much better.
  • CeriseCogburn - Wednesday, May 23, 2012

    I agree but the famdboy loves to torture itself and claim everyone else loves cheap frustrating crap - often characterized as a "mobile employee on the road, in the airplane, or at the hotel spot" needing a "game fix"...(in other words someone flush enough to buy +discrete) as you pointed out.
    The rest of the tremendous and greatly pleased "light gamers" will purportedly be playing at work( no scratch that) or on their couch at home (that sounds like the crew) ... and then one has to ask why aren't they using one of the desktops at home for gaming... a $100 vidcard in that will smoke the crap out of "the light gamer".

    That leaves "enthusiasts" who just want to play with it and see for a few minutes if they can OC it, and "how it does" with games... and after that they will want to throw it at a wall for how badly it sucks - not to mention their online multi-player avatar will get smoked so badly their stats will plummet... so that will last all of two days.

    So we get down to who this thing is really good for - and I suppose that's the young teen to pre teen brat - as a way to get the kid off mommy's or daddy's system so they can have the reins uninterrupted... so the teeny bopper gets the crud low $ cheap walmart lappy system that should also keep them tamed since being too rough with it means the thing snaps in half as the plastic crumbles.

    Yep - there it is - teeny bopper punkster will just have to live with the jaggied pixelized low end no eye candy crawler - and why not they still love it much more than homework and have no problem eyeballing the screen.
  • Latzara - Tuesday, May 15, 2012

    While I agree with the 'nothing earthshattering' part, I have to wonder what kind of average Internet browsing usage you are commenting on when you say 'People want their laptop to be responsive when doing work, watching movies and browsing'. Most of the CPUs on the entire board presented here are enough for work - not graphics modelling, mind you, but Excel, DB, mail, presentations, average calculation load, and even smaller programming projects - which constitutes most of the workload an average worker is gonna get. Movies stopped being an issue way before, and what kind of browsing are we talking about that will make your platform unresponsive (I don't mean frozen)? 25 tabs at once? Cause I've done that with a much weaker platform and had no issues...

    The main problem I see is that the platform hasn't moved as much as ppl hoped, but enough to be a new iteration in terms of progress - and with the right pricing it could be the sweet spot for many of the broader average consumers - not just the "1% of the 1% of people looking for great gaming" ...
  • BSMonitor - Tuesday, May 15, 2012

    Load up a couple Java runtime environments in those browsers. Some flash. I did have an etc in there. I am a multi-tasker, and cannot stand waiting any amount of time. For the majority of real laptop owners, a late Pentium M, Athlon 64/X2, is not enough power for any real work.
  • Spunjji - Wednesday, May 16, 2012

    Please define a "real" laptop owner? I own an Alienware and I don't do any of that sort of crap. Mind you, most users I have met express more patience than you do, too. Regardless, in none of these metrics do you appear to represent the majority, which is the target market for this chip.
