Barts: The Next Evolution of Cypress

At the heart of today’s new cards is Barts, the first member of AMD’s Northern Islands GPUs. As we quickly hinted at earlier, Barts is a very direct descendant of Cypress. This is both a product of design and a product of circumstance.

It should come as no surprise that AMD was originally looking to produce what would be the Northern Islands family on TSMC’s 32nm process; as originally scheduled this would line up with the launch window AMD wanted, and half-node shrinks are easier for them than trying to do a full-node shrink. Unfortunately the 32nm process quickly became doomed for a number of reasons.

Economically, per-transistor it was going to be more expensive than the 40nm process, which is a big problem when you’re trying to make an economical chip like Barts. Technologically, 32nm was following TSMC’s troubled 40nm process; TSMC’s troubles ended up being AMD’s troubles when they launched the 5800 series last year, as yields were low and wafers were few, right at a time when AMD needed every chip they could get to capitalize on their lead over NVIDIA. 32nm never reached completion so we can’t really speak to its yields, but suffice it to say that TSMC had their hands full fixing 40nm and bringing up 28nm without also worrying about 32nm.

Ultimately 32nm was canceled around November of last year. But even before then AMD had made the difficult choice to change course and move what would become Barts to 40nm. As a result AMD had to make some sacrifices and design tradeoffs to make Barts possible on 40nm, and to bring it to market in a short period of time.

For these reasons, architecturally Barts is very much a rebalanced Cypress, and with the exception of a few key changes we could talk about Barts in the same way we talked about Juniper (the 5700 series) last year.



Barts continues AMD’s DirectX 11 legacy, building upon what they’ve already achieved with Cypress. At the SPU level, Barts, like Cypress and every DX10 AMD design before it, continues to use AMD’s VLIW5 design. 5 stream processors – the w, x, y, z, and t units – work together with a branch unit and a set of GPRs to process instructions. The 4 simple SPs can work together to process 4 FP32 MADs per clock, while the t unit can either do FP32 math like the other units or handle special functions such as transcendentals. Here is a breakdown of what a single Barts SPU can do in a single clock cycle:

  • 4 32-bit FP MADs per clock
  • 4 24-bit Int MULs or ADDs per clock
  • SFU: 1 32-bit FP MAD per clock
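As a quick back-of-the-envelope check, the per-clock capabilities above translate directly into peak FP32 throughput. The sketch below assumes the 6870’s launch configuration of 1120 SPs at a 900MHz core clock and the 5870’s 1600 SPs at 850MHz (clockspeeds aren’t quoted in this section), counting each MAD as 2 FLOPs:

```python
# Peak FP32 throughput: SPs x clock x 2 FLOPs (one MAD = multiply + add).
def peak_gflops(stream_processors, clock_mhz, flops_per_sp_clock=2):
    """Theoretical single-precision throughput in GFLOPS."""
    return stream_processors * clock_mhz * flops_per_sp_clock / 1000

print(peak_gflops(1120, 900))  # Barts/6870: ~2016 GFLOPS
print(peak_gflops(1600, 850))  # Cypress/5870: ~2720 GFLOPS
```

These are theoretical peaks; real workloads rarely keep all 5 VLIW slots filled every cycle.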

Compared to Cypress, you’ll note that FP64 performance is not quoted, and this isn’t a mistake. Barts isn’t meant to be a high-end product (that would be the 6900 series) so FP64 has been shown the door in order to bring the size of the GPU down. AMD is still a very gaming-centric company versus NVIDIA’s philosophy of GPU computing everywhere, so this makes sense for AMD’s position, while NVIDIA’s comparable products still offer FP64 if only for development purposes.

Above the SPs and SPUs we have the SIMD, which remains unchanged from Cypress: 80 SPs make up a SIMD, and each SIMD retains 16KB of L1 texture cache, 8KB of L1 compute cache, and 4 texture units.

At the macro level AMD maintains the same 32 ROP design (which, combined with Barts’ higher clocks, actually gives it an advantage over Cypress). Attached to the ROPs are AMD’s L2 cache and memory controllers; there are 4 128KB blocks of L2 cache (for a total of 512KB of L2) and 4 64-bit memory controllers that give Barts a 256-bit memory bus.

Barts is not just a simple Cypress derivative, however. For non-gaming/compute uses, UVD and the display controller have both been overhauled. Meanwhile for gaming Barts did receive one important upgrade: an enhanced tessellation unit. AMD has responded to NVIDIA’s prodding about tessellation at least in part, equipping Barts with a tessellation unit that in the best-case scenario can double its tessellation performance compared to Cypress. AMD has a whole manifesto on tessellation that we’ll get into, but for now we’ll work with the following chart:

AMD has chosen to focus on tessellation performance at lower tessellation factors, as they believe these are the most important factors for gaming purposes. From their own testing the advantage over Cypress approaches 2x between factors 6 and 10, while being closer to a 1.5x increase before that and after that up to factor 13 or so. At the highest tessellation factors Barts’ tessellation unit falls to performance roughly in line with Cypress’, squeezing out a small advantage due to the 6870’s higher clockspeed. Ultimately this means tessellation performance is improved on AMD products at lower tessellation factors, but AMD’s tessellation performance is still going to more-or-less collapse at high factors when they’re doing an extreme amount of triangle subdivision.

So with all of this said, Barts ends up being 25% smaller than Cypress, but in terms of performance we’ve found it to only be 7% slower when comparing the 6870 to the 5870. How AMD accomplished this is the rebalancing we mentioned earlier.
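To put rough numbers on that tradeoff: assuming the commonly cited die sizes of roughly 255mm² for Barts and 334mm² for Cypress (figures not given in this section), the performance-per-area math works out as follows:

```python
# Relative perf/area for Barts vs. Cypress. Die sizes (~255mm^2 and ~334mm^2)
# are commonly cited figures, not numbers from this article.
barts_area, cypress_area = 255.0, 334.0
barts_rel_perf = 0.93  # the 6870 is ~7% slower than the 5870 per the article

area_ratio = barts_area / cypress_area       # ~0.76, i.e. roughly a quarter smaller
perf_per_area = barts_rel_perf / area_ratio  # ~1.22x Cypress per mm^2

print(round(area_ratio, 2), round(perf_per_area, 2))
```

In other words, trading 7% of performance for a quarter of the die nets Barts a sizable perf/mm² advantage.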

Based on AMD’s design decisions and our performance data, it would appear that Cypress has more computing/shading power than it necessarily needs. True, Barts is slower, but it’s a bit slower and a lot smaller. AMD’s various compute ratios, such as compute:geometry and compute:rasterization would appear to be less than ideal on Cypress. So Barts changes the ratios.

Compared to Cypress and factoring in 6870/5870 clockspeeds, Barts has about 75% of the compute/shader/texture power of Cypress. However it has more rasterization, tessellation, and ROP power than Cypress; or in other words Barts is less of a compute/shader GPU and a bit more of a traditional rasterizing GPU with a dash of tessellation thrown in. Even in the worst case scenarios from our testing the drop-off at 1920x1200 is only 13% compared to Cypress/5870, so while Cypress had a great deal of compute capabilities, it’s clearly difficult to make extremely effective use of it even on the most shader-heavy games of today.
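The ~75% figure can be sanity-checked with simple arithmetic, since shader throughput scales with SP count times clockspeed. Again assuming 1600 SPs @ 850MHz for the 5870 and 1120 SPs @ 900MHz for the 6870 (clocks not quoted in this section):

```python
# Shader throughput scales with SP count x clockspeed. Specs assumed:
# 5870 = 1600 SPs @ 850MHz, 6870 = 1120 SPs @ 900MHz (not quoted in the text).
cypress_throughput = 1600 * 850
barts_throughput = 1120 * 900
ratio = barts_throughput / cypress_throughput

print(round(ratio, 3))  # ~0.741, in line with the article's ~75% figure
```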

However it’s worth noting that internally AMD considered 2 designs for Barts: a 16 SIMD (1280 SP) 16 ROP design, and the 14 SIMD (1120 SP) 32 ROP design that they ultimately went with. The 14/32 design was faster, but only by 2%. This, along with the ease of porting the design from Cypress, made it the right choice for AMD, but it also means that Cypress/Barts is not exclusively bound on the shader/texture side or the ROP/raster side.

Along with selectively reducing functional blocks from Cypress and removing FP64 support, AMD made one other major change to improve efficiency for Barts: they’re using Redwood’s memory controller. In the past we’ve talked about the inherent complexities of driving GDDR5 at high speeds, but until now we’ve never known just how complex it is. It turns out that Cypress’s memory controller is nearly twice as big as Redwood’s! By reducing their target memory speed from 4.8GHz to 4.2GHz, AMD was able to cut the size of their memory controller by nearly 50%. Admittedly we don’t know just how much total die space this design choice saved AMD, but from our discussions with them it’s clearly significant. And it also perfectly highlights just how hard it is to drive GDDR5 at 5GHz and beyond, and why both AMD and NVIDIA cited their memory controllers as some of their biggest issues when bringing up Cypress and GF100 respectively.
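For reference, the bandwidth at stake in that decision is straightforward to compute: GDDR5’s effective data rate times the bus width in bytes. A minimal sketch using the 256-bit bus described above:

```python
# GDDR5 bandwidth: effective data rate (GT/s) x bus width in bytes.
def bandwidth_gb_s(data_rate_ghz, bus_width_bits=256):
    """Peak memory bandwidth in GB/s over a 256-bit (4 x 64-bit) bus."""
    return data_rate_ghz * bus_width_bits / 8

print(bandwidth_gb_s(4.8))  # Cypress's 4.8GHz target: ~153.6 GB/s
print(bandwidth_gb_s(4.2))  # Barts at 4.2GHz: ~134.4 GB/s
```

So the smaller memory controller costs Barts roughly 19 GB/s of peak bandwidth versus Cypress’s target.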

Ultimately all of these efficiency changes are necessary for AMD to continue to compete in the GPU market, particularly in the face of NVIDIA and the GF104 GPU powering the GTX 460. Case in point: in the previous quarter AMD’s graphics division only made $1mil in profit. While Barts was designed years before that quarter, the situation still succinctly showcases why it’s important to target each market segment with an appropriate GPU; harvested GPUs are only a stop-gap solution, and in the end purposely crippling good GPUs is a sure way to cripple a company’s gross margin.


197 Comments


  • GeorgeH - Friday, October 22, 2010 - link

    WRT comments complaining about the OC 460 -

    It's been clear from the 460 launch that a fully enabled and/or higher clocked 460 would compete very well with a 470. It would have been stupid for NVIDIA to release such a card, though - it would have made the already expensive GF100 even more so by eliminating a way to get rid of their supply of slightly defective GF100 chips (as with the 465) and there was no competitive reason to release a 460+.

    Now that there is a competitive reason to release one, do you really think Nvidia is going to sit still and take losses (or damn close to it) on the 470 when it has the capability of launching a 460+? Do you really think that Nvidia still can't make fully functional GF104 chips? Including the OC 460 is almost certainly Ryan's way of hinting without hinting (NDAs being what they are) at what Nvidia is prepping for release.

    (And if you really think AT is anyone's shill, you're obviously very new to AT.)
  • AnandThenMan - Friday, October 22, 2010 - link

    "And if you really think AT is anyone's shill, you're obviously very new to AT."

    Going directly against admitted editorial policy doesn't exactly bolster your argument now does it. As for your comment about a 460+ or whatever you were trying to say, who cares? Reviews are supposed to be about hardware that is available to everyone now, not some theoretical card in the future.
  • MGSsancho - Friday, October 22, 2010 - link

    A vendor could just as likely sell an overclocked 470 card as well as a 480. But I think you made the right assumption that team green might be releasing overclocked cards that all have a minimum of 1GB of RAM to make it look like their cards are faster than team red's. Maybe at near equal price points the green cards will all be 20~30% overclocked to make it look like they are 10% faster than the red offerings at similar prices. Red cards could just be sold overclocked as well (we have to wait a bit more to see how well they overclock). All of this does not really matter. At the end of the day, buyers will look at what's the fastest product they can purchase at their price point. Maybe secondly they will notice that hey, this thing gets hot and is very loud, and blindly blame the green/red suits, and thirdly they will look at features. Who really knows.

    Personally I purchase the slightly slower products then overclock them myself if I find a game that needs it. I would rather have the headroom vs. buying a card that is always going to be hot enough to rival volcanoes, even if it is factory warrantied.
  • Golgatha - Friday, October 22, 2010 - link

    The nVidia volcanoes comment is really, really overstated. I have a mid-tower case with a 120mm exhaust and 2x92mm intakes (Antec Solo for reference), and a GTX 480. None of these case fans are high performance fans. Under very stressful gaming conditions, I hit in the 80-85°C range, and Folding@Home's GPU3 client will get it up to 91°C under 100% torturous load.

    Although I don't like the power consumption of the GTX 480 for environmental reasons, it is rock solid stable, has none of the drawbacks of multi-GPU setups (I actually downgraded from a Crossfire 5850 setup due to game crashing and rendering issues), and it seems to be top dog in a lot of cases when it comes to minimum FPS (even when compared to multi-GPU setups).
  • Parhel - Friday, October 22, 2010 - link

    "And if you really think AT is anyone's shill, you're obviously very new to AT"

    I think you're referring to me, since I'm the one who used the word "shill." Let me tell you, I've been reading AT since before Tom's Hardware sucked, and that's a loooong time.

    If I were going to buy a card today, I'd buy the $180 GTX 460 1GB, no question. I'm not an AMD fan, nor am I an NVidia fan. I am, however, an Anandtech fan. And their decision to include the FTW edition card in this review means I can no longer come here and assume I'm reading something unbiased and objective.
  • GeorgeH - Friday, October 22, 2010 - link

    It was actually more of a shotgun blast aimed at the several silly posts implying AT was paid off by EVGA or Nvidia.

    If you've been reading AT for ~10 years, why would you assume that Ryan (or any other longtime contributor) suddenly decided to start bowing to outside pressure? If you stop lighting the torches and sharpening the pitchforks for half a second, you might realize that Ryan probably has a very good reason for including the OC card.

    Even if I'm smoking crack WRT a GTX460+, what's the point of a review? It's not to give AMD and Nvidia a "fair" fight, it's to give us an idea of the best card to spend our money on - and if AMD or Nvidia get screwed in the process, I'm not going to be losing any sleep.

    Typically, OC cards with a significant clock bump are fairly rare "Golden samples" and/or only provide marginal performance benefits without significantly increasing heat, noise, and power consumption. With the 460, Nvidia all but admitted they could've bumped the stock clocks quite significantly, but didn't want to threaten their other cards (*cough* 470 *cough*) if they didn't have to. This is reflected in what you can actually buy at Newegg - of the ~30 1GB 460's, only ~5 are running stock. 850MHz is still high, but is also right in line with the average of what you can expect any 460 to get to, so I don't think it's too far out of place.

    Repeating what I said above, including the OC card was unfair to AMD, but is highly relevant to me and my wallet. I couldn't care less if AMD (or Nvidia) get screwed by an AT review - I just want to know what's best for me, and this article delivers. If the tables were turned, I'm sure that Ryan would have no problem including an OC AMD card in a Nvidia review - because it isn't about being a shill, it's about informing me, the consumer.
  • SandmanWN - Friday, October 22, 2010 - link

    What? Put the crack down... Really, if you are short on time to review a product and you steal time away from that objective just to review a specially delivered, hand-selected opponent's card instead of completing your assignment, then you've not exactly been genuine to your readers, or in this case to AMD.

    If you have time to add in an overclocked card then you need to do the same with the review card, otherwise the OC'd cards need to wait another day.

    I have no idea how you can claim some great influence on your wallet when you have no idea of the OC capabilities of the 6000 series. If you actually bought the 460 off this review then you are banking that the overclock will hold up against an unknown variable. That's not exactly relevant to anyone's wallet.
  • GeorgeH - Friday, October 22, 2010 - link

    An OC'd 460 competes with the 6870, and the 6870 doesn't really overclock at all.

    Even overclocked, a 6850 isn't going to touch a 6870, unless you're going to well over 1GHz (which, short of a miracle, isn't going to happen).

    It was disappointing that the review wasn't fleshed out more, but I'd say what's missing isn't as relevant to my buying decisions as how well the plethora of OC'd 460s compare to the 6870.
  • Parhel - Saturday, October 23, 2010 - link

    "the 6870 doesn't really overclock at all"

    What? You're talking out of your ass. No review site has even attempted a serious overclock yet. It's not even possible, as far as I know, to modify the voltage yet! We have no way to gauge how these cards overclock, and won't for several weeks.

    "850MHz is still high, but is also right in line with the average of what you can expect any 460 to get to"

    Now you're sounding like the shill. 850MHz is not a realistic number if we're talking about 24/7 stability with stock cooling. No way.
  • GeorgeH - Saturday, October 23, 2010 - link

    850MHz unrealistic? Nvidia flat out admitted that most cards are capable of at least ~800MHz (no volt mods, no nothing) and reviews around the web have backed this up, showing low to mid 800's on most stock cards, at stock voltages, running stock cooling. If you're worried about reliability, grab one of the many cards that come factory OC'd with a warranty.

    The 6870 doesn't now and never will overclock much at all, at least not in the way the 460 does. As with any chip, there will be golden sample cards that will go higher with voltage tweaks and extra cooling, but AMD absolutely did not leave ~20-25% of the 6870's average clockspeed potential on the table. The early OC reviews back this up as well, showing the 6870 as having minimal OC'ing headroom at stock voltages.

    If you're waiting to compare the maximum performance that you can stretch out of a cherry-picked 6870 with careful volt mods and aftermarket cooling, you're going to be comparing it with a 460 @ ~950MHz, not ~850MHz.

    As a guess, I'd say that your ignorance of these items is what led you to be so outraged at the inclusion of the OC 460 in the review. The magnitude of the OC potential of the 460 is highly atypical (at least in mid-range to high end cards), which is why I and many other posters have no issue with its similarly atypical inclusion in the review.
