Barts: The Next Evolution of Cypress

At the heart of today’s new cards is Barts, the first member of AMD’s Northern Islands family of GPUs. As we hinted at earlier, Barts is a very direct descendant of Cypress. This is both a product of deliberate design and a product of circumstance.

It should come as no surprise that AMD originally intended to produce what would be the Northern Islands family on TSMC’s 32nm process; as originally scheduled this would have lined up with the launch window AMD wanted, and a half-node shrink is easier for them than attempting a full-node shrink. Unfortunately the 32nm process was quickly doomed for a number of reasons.

Economically, per-transistor it was going to be more expensive than the 40nm process, which is a big problem when you’re trying to make an economical chip like Barts. Technologically, 32nm was to follow TSMC’s troubled 40nm process; TSMC’s troubles became AMD’s troubles when they launched the 5800 series last year, as yields were low and wafers were few, right at a time when AMD needed every chip they could get to capitalize on their lead over NVIDIA. 32nm never reached completion so we can’t speak to its yields, but suffice it to say that TSMC had their hands full fixing 40nm and bringing up 28nm without also worrying about 32nm.

Ultimately 32nm was canceled around November of last year. But even before that, AMD had made the hard choice to change course and move what would become Barts to 40nm. As a result AMD had to make some sacrifices and design compromises to make Barts possible on 40nm, and to bring it to market in a short period of time.

For these reasons, architecturally Barts is very much a rebalanced Cypress, and with the exception of a few key changes we could talk about Barts in the same way we talked about Juniper (the 5700 series) last year.



Barts continues AMD’s DirectX 11 legacy, building upon what they’ve already achieved with Cypress. At the SPU level Barts, like Cypress and every AMD DX10 design before it, continues to use AMD’s VLIW5 design: 5 stream processors – the w, x, y, z, and t units – work together with a branch unit and a set of GPRs to process instructions. The 4 simple SPs can work together to process 4 FP32 MADs per clock, while the t unit can either do FP32 math like the other units or handle special functions such as transcendentals. Here is a breakdown of what a single Barts SPU can do in a single clock cycle:

  • 4 32-bit FP MAD per clock
  • 4 24-bit Int MUL or ADD per clock
  • SFU : 1 32-bit FP MAD per clock
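
To put those per-SPU rates in perspective, here’s a back-of-the-envelope sketch of Barts’ peak FP32 throughput. The 1120 stream processors and 900MHz core clock are the 6870’s reference specifications; a MAD counts as 2 floating-point operations:

```python
# Back-of-the-envelope peak FP32 throughput for Barts (Radeon HD 6870).
STREAM_PROCESSORS = 1120      # 14 SIMDs x 16 SPUs x 5 SPs
CORE_CLOCK_HZ = 900e6         # 6870 reference core clock: 900MHz
FLOPS_PER_SP_PER_CLOCK = 2    # 1 MAD = 1 MUL + 1 ADD

peak_gflops = STREAM_PROCESSORS * CORE_CLOCK_HZ * FLOPS_PER_SP_PER_CLOCK / 1e9
print(f"Peak FP32 throughput: {peak_gflops:.0f} GFLOPS")  # 2016 GFLOPS
```

This matches AMD’s quoted figure of roughly 2 TFLOPS for the 6870.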

Compared to Cypress, you’ll note that FP64 performance is not quoted, and this isn’t a mistake. Barts isn’t meant to be a high-end product (that would be the 6900 series) so FP64 has been shown the door in order to bring the size of the GPU down. AMD is still a very gaming-centric company versus NVIDIA’s philosophy of GPU computing everywhere, so this makes sense for AMD’s position, while NVIDIA’s comparable products still offer FP64 if only for development purposes.

Above the SPs and SPUs, we have the SIMD. This remains unchanged from Cypress, with 80 SPs making up a SIMD. The caches and texture units are also unchanged: each SIMD has 16KB of L1 texture cache, 8KB of L1 compute cache, and 4 texture units.
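
Scaling those per-SIMD figures up to the full chip (assuming the 6870’s fully-enabled 14-SIMD configuration) gives Barts’ totals:

```python
# Barts totals, assuming the 6870's fully-enabled 14-SIMD configuration.
SIMDS = 14
SPS_PER_SIMD = 80
TEX_UNITS_PER_SIMD = 4
L1_TEX_KB_PER_SIMD = 16

print(f"Stream processors: {SIMDS * SPS_PER_SIMD}")        # 1120
print(f"Texture units:     {SIMDS * TEX_UNITS_PER_SIMD}")  # 56
print(f"L1 texture cache:  {SIMDS * L1_TEX_KB_PER_SIMD}KB total")  # 224KB
```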

At the macro level AMD maintains the same 32 ROP design (which, combined with Barts’ higher clocks, actually gives it an advantage over Cypress). Attached to the ROPs are AMD’s L2 cache and memory controllers; there are 4 128KB blocks of L2 cache (for a total of 512KB of L2) and 4 64-bit memory controllers that give Barts a 256-bit memory bus.

Barts is not just a simple Cypress derivative, however. On the non-gaming side, UVD and the display controller have both been overhauled. Meanwhile for gaming Barts received one important upgrade: an enhanced tessellation unit. AMD has responded to NVIDIA’s prodding about tessellation at least in part, equipping Barts with a tessellation unit that in the best-case scenario can double its tessellation performance compared to Cypress. AMD has a whole manifesto on tessellation that we’ll get into, but for now we’ll work with the following chart:

AMD has chosen to focus on tessellation performance at lower tessellation factors, as they believe these are the most important factors for gaming purposes. From their own testing the advantage over Cypress approaches 2x between factors 6 and 10, while the increase is closer to 1.5x below factor 6, and again from factor 10 up to roughly factor 13. At the highest tessellation factors Barts’ tessellation unit falls to performance roughly in line with Cypress’, squeezing out a small advantage due to the 6870’s higher clockspeed. Ultimately this means tessellation performance is improved on AMD products at lower tessellation factors, but AMD’s tessellation performance is still going to more-or-less collapse at high factors when they’re doing an extreme amount of triangle subdivision.

So with all of this said, Barts ends up being 25% smaller than Cypress, but in terms of performance we’ve found it to only be 7% slower when comparing the 6870 to the 5870. How AMD accomplished this is the rebalancing we mentioned earlier.

Based on AMD’s design decisions and our performance data, it would appear that Cypress has more computing/shading power than it necessarily needs. True, Barts is slower, but it’s a bit slower and a lot smaller. AMD’s various compute ratios, such as compute:geometry and compute:rasterization would appear to be less than ideal on Cypress. So Barts changes the ratios.

Compared to Cypress and factoring in 6870/5870 clockspeeds, Barts has about 75% of the compute/shader/texture power of Cypress. However it has more rasterization, tessellation, and ROP power than Cypress; or in other words Barts is less of a compute/shader GPU and a bit more of a traditional rasterizing GPU with a dash of tessellation thrown in. Even in the worst case scenarios from our testing the drop-off at 1920x1200 is only 13% compared to Cypress/5870, so while Cypress had a great deal of compute capabilities, it’s clearly difficult to make extremely effective use of it even on the most shader-heavy games of today.
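
As a quick sanity check on that rebalancing argument, here’s the arithmetic behind the compute ratio and the performance-per-area tradeoff. The shader throughput figure assumes the cards’ reference clocks (900MHz for the 6870, 850MHz for the 5870); the 25%-smaller and 7%-slower figures are from our discussion above:

```python
# Relative shader throughput: 6870 (Barts) vs. 5870 (Cypress),
# assuming reference clocks of 900MHz and 850MHz respectively.
barts_sp, barts_clock = 1120, 900e6
cypress_sp, cypress_clock = 1600, 850e6
compute_ratio = (barts_sp * barts_clock) / (cypress_sp * cypress_clock)
print(f"Relative shader throughput: {compute_ratio:.0%}")  # 74%

# Performance per unit area, using the figures from our testing:
# Barts is ~25% smaller than Cypress and only ~7% slower overall.
relative_perf = 0.93
relative_area = 0.75
print(f"Relative perf per area: {relative_perf / relative_area:.2f}x")  # 1.24x
```

In other words, trading away roughly a quarter of Cypress’ shader power bought AMD about a 24% improvement in performance per unit of die area.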

However it’s worth noting that internally AMD was throwing around 2 designs for Barts: a 16 SIMD (1280 SP) 16 ROP design, and a 14 SIMD (1120 SP) 32 ROP design that they ultimately went with. The 14/32 design was faster, but only by 2%. This along with the ease of porting the design from Cypress made it the right choice for AMD, but it also means that Cypress/Barts is not exclusively bound on the shader/texture side or the ROP/raster side.

Along with selectively cutting back functional blocks from Cypress and removing FP64 support, AMD made one other major change to improve efficiency for Barts: they’re using Redwood’s memory controller. In the past we’ve talked about the inherent complexities of driving GDDR5 at high speeds, but until now we’ve never known just how complex it is. It turns out that Cypress’s memory controller is nearly twice as big as Redwood’s! By reducing their target memory speed from 4.8GHz to 4.2GHz, AMD was able to cut the size of the memory controllers by nearly 50%. Admittedly we don’t know just how much total die space this design choice saved AMD, but from our discussions with them it’s clearly significant. It also perfectly highlights just how hard it is to drive GDDR5 at 5GHz and beyond, and why both AMD and NVIDIA cited their memory controllers as some of their biggest issues when bringing up Cypress and GF100 respectively.
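
The bandwidth cost of that tradeoff is straightforward to work out; GDDR5’s quoted speed is its effective data rate, so peak bandwidth is simply data rate times bus width:

```python
# Memory bandwidth cost of dropping GDDR5 from 4.8GHz to 4.2GHz (effective
# data rate) on Barts' 256-bit memory bus.
BUS_WIDTH_BYTES = 256 // 8    # 256-bit bus = 32 bytes per transfer

for data_rate_ghz in (4.8, 4.2):
    bandwidth_gbs = data_rate_ghz * BUS_WIDTH_BYTES
    print(f"{data_rate_ghz}GHz effective: {bandwidth_gbs:.1f} GB/s")
# 4.8GHz -> 153.6 GB/s; 4.2GHz -> 134.4 GB/s, a 12.5% bandwidth reduction
```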

Ultimately all of these efficiency changes are necessary for AMD to continue to compete in the GPU market, particularly in the face of NVIDIA and the GF104 GPU powering the GTX 460. Case in point: in the previous quarter AMD’s graphics division made only $1mil in profit. While Barts was designed years before that quarter, the situation still succinctly showcases why it’s important to target each market segment with an appropriate GPU; harvested GPUs are only a stop-gap solution, and in the end purposely crippling good GPUs is a good way to cripple a company’s gross margin.


  • Quidam67 - Friday, October 29, 2010 - link

    Well that's odd.

    After reading about the EVGA FTW, and its mind-boggling factory overclock, I went looking to see if I could pick one of these up in New Zealand.

    Seems you can, or maybe not. As per this example http://www.trademe.co.nz/Browse/Listing.aspx?id=32... the clocks are 763MHz and 3.8 on the memory?!?

    What gives, how can EVGA give the same name to a card and then have different specifications on it? Good thing I checked the fine-print, or else I would have been bummed out if I'd bought it and then realised it wasn't clocked like I thought it would be..
  • Murolith - Friday, October 29, 2010 - link

    So..how about that update in the review checking out the quality/speed of MLAA?
  • CptChris - Sunday, October 31, 2010 - link

    As the cards were compared to the OC nVidia card I would be interested in seeing how the 6800 series also compares to a card like the Sapphire HD5850 2GB Toxic Edition. I know it is literally twice the price of the HD6850, but would it be enough of a performance margin to be worth the price difference?
  • gochichi - Thursday, November 4, 2010 - link

    You know, maybe I hang in the wrong circles but I by far keep up to date on GPUs more than anyone I know. Not only that, but I am eager to update my stuff if it's reasonable. I want it to be reasonable so badly because I simply love computer hardware (more than games per se, or as much as the games... it's about hardware for me in and of itself).

    Not getting to my point fast enough. I purchased a Radeon 3870 at Best Buy (Best Buy had an oddly good deal on these at the time, Best Buy doesn't tend to keep competitive prices on video cards at all for some reason). 10 days later (so I returned my 3870 at the store) I purchased a 4850, and wow, what a difference it made. The thing of it is, the 3870 played COD 4 like a champ, the 4850 was ridiculously better but I was already satisfied.

    In any case, the naming... the 3870 was no more than $200.00 I think it was $150.00. And it played COD4 on 24" 1900x1200 monitor with a few settings not maxed out, and played it so well. The 4850 allowed me to max out my settings. Crysis sucked, crysis still sucks and crysis is still a playable benchmark. Not to say I don't look at it as a benchmark. The 4850 on the week of its release was $199.99 at Best Buy.

    Then gosh oh golly there was the 4870 and the 4890, which simply took up too much power... I am simply unwilling to buy a card that uses more than one extra 6-pin connector just so I can go out of my way to find something that runs better. So far, my 4850 has left me wanting more in GTA IV, (notice again how it comes down to hardware having to overcome bad programming, the 4850 is fast enough for 1080p but it's not a very well ported game so I have to defer to better hardware). You can stop counting the ways my 4850 has left me wanting more at 1900 x 1200. I suppose maxing out Starcraft II would be nice also.

    Well, then came out the 5850, finally a card that would eclipse my 4850... but oh wait, though the moniker was the same (3850 = so awesome, so affordable, the 4850 = so awesome, so affordable, the 5850 = two 6-pin connectors, so expensive, so high end) it was completely out of line with what I had come to expect. The 4850 stood without a successor. Remember here that I was going from 3870 to 4850, same price range, way better performance. Then came the 5770, and it was marginally faster but just not enough change to merit a frivolous upgrade.

    Now, my "need" to upgrade is as frivolous as ever, but finally, a return to sanity with the *850 moniker standing for fast, and midrange. I am a *850 kind of guy through and through, I don't want crazy power consumption, I don't want to be able to buy a whole, really good computer for the price of just a video card.

    So, anyhow, that's my long story basically... that the strange and utterly upsetting name was the 5850; the 6850 is actually right in line with what the naming should have always stayed as. I wouldn't know why the heck AMD tossed a curve ball at me via the 5850, but I will tell you that it's been a really long time coming to get a true successor in the $200 and under range.

    You know, around the time of the 9800GT and the 4850, you actually heard people talk about buying video cards while out with friends. The games don't demand much more than that... so $500 cards that double their performance is just silly silly stuff and people would rather buy an awesome phone, an iPad, etc. etc. etc.

    So anyhow, enough of my rambling, I reckon I'll be silly and get the true successor to my 4850... though I am assured that my Q6600 isn't up to par for Starcraft II... oh well.
  • rag2214 - Sunday, November 7, 2010 - link

    The 6800 series may not beat the 5870 yet, but it is the start of HDMI 1.4 for 3dHD, not available in any other ATI graphics cards.
  • Philip46 - Monday, November 15, 2010 - link

    The review stated why there was a reason to buy a 460 (not OC'ed).

    How about benchmarks of games using Physx?

    For instance Mafia 2 hits 32fps @ 1080p (i7-930 cpu) when using Physx on high, while the 5870 manages only 16.5fps; i tested both cards.

    How about a GTA:IV benchmark? The Zotac 2GB GTX 460 runs the game more smoothly (the same avg fps, except the min fps on the 5850 are lower in the daytime) than the 5850 (2GB).

    How about even a Far Cry 2 benchmark?

    Come on anandtech! Let's get some real benchmarks that cover all aspects of gaming features.

    How about adding in driver stability? Etc..

    And before anyone calls me biased, i had both the Zotac GTX 460 and Sapphire 5850 2GB a couple weeks back, and overall i went with the Zotac 460, and i play Crysis/Stalker/GTA IV/Mafia 2/Far Cry 2..etc @ 1080p, and the 460 just played them all more stable..even if Crysis/Stalker were some 10% faster on the 5850.

    BTW: Bad move by anandtech to include the 460 FTW !
  • animekenji - Saturday, December 25, 2010 - link

    Barts is the replacement for Juniper, NOT Cypress. Cayman is the replacement for Cypress. If you're going to do a comparison to the previous generation, then at least compare it to the right card. HD6850 replaces HD5750. HD6870 replaces HD5770. HD6970 replaces HD5870. You're giving people the false impression that AMD knocked performance down with the new cards instead of up when HD6800 vastly outperforms HD5700 and HD6900 vastly outperforms HD5800. Stop drinking the green kool-aid, Anandtech.
