Barts: The Next Evolution of Cypress

At the heart of today’s new cards is Barts, the first member of AMD’s Northern Islands GPUs. As we quickly hinted at earlier, Barts is a very direct descendant of Cypress. This is a product of both design and consequence.

It should come as no surprise that AMD was originally looking to produce what would become the Northern Islands family on TSMC’s 32nm process; as originally scheduled, this would have lined up with the launch window AMD wanted, and a half-node shrink is easier for them than a full-node shrink. Unfortunately, the 32nm process was quickly doomed for a number of reasons.

Economically, per-transistor it was going to be more expensive than the 40nm process, which is a big problem when you’re trying to make an economical chip like Barts. Technologically, 32nm was following TSMC’s troubled 40nm process; TSMC’s troubles became AMD’s troubles when they launched the 5800 series last year, as yields were low and wafers were few, right at a time when AMD needed every chip they could get to capitalize on their lead over NVIDIA. 32nm never reached completion, so we can’t really speak to its yields, but suffice it to say that TSMC had their hands full fixing 40nm and bringing up 28nm without also worrying about 32nm.

Ultimately 32nm was canceled around November of last year. But even before that, AMD had made the hard choice to change course and move what would become Barts to 40nm. As a result, AMD had to make some sacrifices and design choices to make Barts possible on 40nm and to bring it to market in a short period of time.

For these reasons, architecturally Barts is very much a rebalanced Cypress, and with the exception of a few key changes we could talk about Barts in the same way we talked about Juniper (the 5700 series) last year.



Barts continues AMD’s DirectX 11 legacy, building upon what they’ve already achieved with Cypress. At the SPU level, Barts, like Cypress and every AMD DX10 design before it, continues to use AMD’s VLIW5 design: 5 stream processors – the w, x, y, z, and t units – work together with a branch unit and a set of GPRs to process instructions. The 4 simple SPs can work together to process 4 FP32 MADs per clock, while the t unit can either do FP32 math like the other units or handle special functions such as transcendentals. Here is a breakdown of what a single Barts SPU can do in a single clock cycle (a back-of-the-envelope throughput sketch follows the list):

  • 4 32-bit FP MAD per clock
  • 4 24-bit Int MUL or ADD per clock
  • SFU : 1 32-bit FP MAD per clock
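
For context, here is a quick sketch of the peak FP32 throughput those per-SPU figures add up to for a full Barts chip. Note our assumptions: the 14-SIMD configuration is covered further down, and the 900MHz core clock is the 6870’s reference clock rather than a figure quoted above.

```python
# Peak FP32 throughput sketch for a full Barts (Radeon HD 6870).
# Assumes the 6870's 900MHz reference core clock; a MAD counts as 2 FLOPs.
SIMDS = 14             # full Barts configuration (see below)
SPS_PER_SIMD = 80      # 16 VLIW5 SPUs x 5 SPs each
CORE_CLOCK_HZ = 900e6

total_sps = SIMDS * SPS_PER_SIMD                       # 1120 stream processors
peak_tflops = total_sps * 2 * CORE_CLOCK_HZ / 1e12
print(f"{total_sps} SPs -> {peak_tflops:.2f} TFLOPS")  # ~2.02 TFLOPS
```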

Compared to Cypress, you’ll note that FP64 performance is not quoted, and this isn’t a mistake. Barts isn’t meant to be a high-end product (that would be the 6900 series), so FP64 support has been shown the door in order to bring the size of the GPU down. AMD is still a very gaming-centric company, versus NVIDIA’s philosophy of GPU computing everywhere, so this makes sense for AMD’s position; meanwhile NVIDIA’s comparable products still offer FP64, if only for development purposes.

Above the SPs and SPUs, we have the SIMD. This remains unchanged from Cypress, with 80 SPs making up a SIMD. The caches and texture units are also unchanged: each SIMD retains 16KB of L1 texture cache, 8KB of L1 compute cache, and 4 texture units.

At the macro level AMD maintains the same 32 ROP design (which, combined with Barts’ higher clocks, actually gives it an advantage over Cypress). Attached to the ROPs are AMD’s L2 cache and memory controllers; there are 4 128KB blocks of L2 cache (for a total of 512KB of L2) and 4 64-bit memory controllers that give Barts a 256-bit memory bus.

Barts is not just a simple Cypress derivative, however. For non-gaming/compute uses, UVD and the display controller have both been overhauled. Meanwhile for gaming, Barts received one important upgrade: an enhanced tessellation unit. AMD has responded to NVIDIA’s prodding about tessellation at least in part, equipping Barts with a tessellation unit that in the best-case scenario can double its tessellation performance compared to Cypress. AMD has a whole manifesto on tessellation that we’ll get into, but for now we’ll work with the data from AMD’s own performance chart:

AMD has chosen to focus on tessellation performance at lower tessellation factors, as they believe these are the most important factors for gaming purposes. From their own testing the advantage over Cypress approaches 2x between factors 6 and 10, while being closer to a 1.5x increase before that and after that up to factor 13 or so. At the highest tessellation factors Barts’ tessellation unit falls to performance roughly in line with Cypress’, squeezing out a small advantage due to the 6870’s higher clockspeed. Ultimately this means tessellation performance is improved on AMD products at lower tessellation factors, but AMD’s tessellation performance is still going to more-or-less collapse at high factors when they’re doing an extreme amount of triangle subdivision.
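
To make the shape of that curve concrete, here is a rough piecewise encoding of AMD’s chart as described above; the breakpoints and multipliers are approximations of the chart, not measured data.

```python
def barts_tess_speedup_vs_cypress(factor: float) -> float:
    """Approximate Barts-over-Cypress tessellation speedup by tessellation
    factor, loosely encoding AMD's chart as described in the text."""
    if factor < 6:
        return 1.5    # closer to a 1.5x increase at low factors
    if factor <= 10:
        return 2.0    # the advantage approaches 2x between factors 6 and 10
    if factor <= 13:
        return 1.5    # tapers back toward 1.5x up to factor ~13
    return 1.0        # roughly in line with Cypress at high factors
```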

So with all of this said, Barts ends up being 25% smaller than Cypress, but in terms of performance we’ve found it to be only 7% slower when comparing the 6870 to the 5870. How AMD accomplished this is the rebalancing we mentioned earlier.

Based on AMD’s design decisions and our performance data, it would appear that Cypress has more computing/shading power than it necessarily needs. True, Barts is slower, but it’s a bit slower and a lot smaller. AMD’s various compute ratios, such as compute:geometry and compute:rasterization, would appear to be less than ideal on Cypress. So Barts changes the ratios.

Compared to Cypress, and factoring in 6870/5870 clockspeeds, Barts has about 75% of the compute/shader/texture power of Cypress. However it has more rasterization, tessellation, and ROP power than Cypress; in other words, Barts is less of a compute/shader GPU and a bit more of a traditional rasterizing GPU, with a dash of tessellation thrown in. Even in the worst-case scenarios from our testing, the drop-off at 1920x1200 is only 13% compared to Cypress/5870; so while Cypress had a great deal of compute capability, it’s clearly difficult to make extremely effective use of it even in the most shader-heavy games of today.
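
That ~75% figure falls straight out of the unit counts and clocks. A quick sanity check, assuming the cards’ reference clocks (900MHz for the 6870, 850MHz for the 5870) and Cypress’ 1600 SPs, none of which are quoted above:

```python
# Shader throughput ratio, Barts (6870) vs Cypress (5870), from the
# cards' published SP counts and reference clocks.
barts   = 1120 * 900e6    # 14 SIMDs x 80 SPs at 900MHz
cypress = 1600 * 850e6    # 20 SIMDs x 80 SPs at 850MHz
print(f"Barts has {barts / cypress:.0%} of Cypress' shader throughput")  # ~74%
```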

However it’s worth noting that internally AMD was weighing 2 designs for Barts: a 16 SIMD (1280 SP) 16 ROP design, and the 14 SIMD (1120 SP) 32 ROP design that they ultimately went with. The 14/32 design was faster, but only by 2%. This, along with the ease of porting the design from Cypress, made it the right choice for AMD, but it also means that Cypress/Barts is not exclusively bound on either the shader/texture side or the ROP/raster side.
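
On paper, the tradeoff between those two candidate configurations looks like this at an assumed common clock; this sketch only shows the raw throughput each design gives up, since the 2% delta above came from AMD’s internal testing:

```python
# The two internal Barts candidates, compared on paper at an assumed
# common 900MHz clock (illustrative only; AMD's 2% figure was measured).
CLOCK_HZ = 900e6
for name, sps, rops in [("16 SIMD / 16 ROP", 16 * 80, 16),
                        ("14 SIMD / 32 ROP", 14 * 80, 32)]:
    tflops = sps * 2 * CLOCK_HZ / 1e12    # FP32 MADs, 2 FLOPs each
    gpix   = rops * CLOCK_HZ / 1e9        # pixel fill rate: 1 pixel/ROP/clock
    print(f"{name}: {tflops:.2f} TFLOPS, {gpix:.1f} Gpix/s")
# The shipping 14/32 design gives up ~12% of the shader throughput in
# exchange for double the pixel fill rate.
```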

Along with selectively reducing functional blocks from Cypress and removing FP64 support, AMD made one other major change to improve efficiency for Barts: they’re using Redwood’s memory controller. In the past we’ve talked about the inherent complexities of driving GDDR5 at high speeds, but until now we’ve never known just how complex it is. It turns out that Cypress’s memory controller is nearly twice as big as Redwood’s! By reducing their target memory speed from 4.8GHz to 4.2GHz, AMD was able to reduce the size of their memory controller by nearly 50%. Admittedly we don’t know just how much total die space this design choice saved AMD, but from our discussions with them it’s clearly significant. It also perfectly highlights just how hard it is to drive GDDR5 at 5GHz and beyond, and why both AMD and NVIDIA cited their memory controllers as some of their biggest issues when bringing up Cypress and GF100 respectively.
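
The bandwidth cost of that choice is easy to quantify. A quick sketch, assuming the quoted speeds are GDDR5 effective data rates on Barts’ 256-bit bus:

```python
# Bandwidth cost of targeting 4.2GHz instead of 4.8GHz effective GDDR5
# on Barts' 256-bit bus (4 x 64-bit memory controllers).
BUS_WIDTH_BITS = 256
for data_rate_hz in (4.8e9, 4.2e9):
    gb_per_s = (BUS_WIDTH_BITS / 8) * data_rate_hz / 1e9
    print(f"{data_rate_hz / 1e9:.1f}GHz -> {gb_per_s:.1f} GB/s")
# 153.6 GB/s vs 134.4 GB/s: ~12.5% less bandwidth in exchange for a
# memory controller nearly half the size.
```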

Ultimately all of these efficiency changes are necessary for AMD to continue to compete in the GPU market, particularly in the face of NVIDIA and the GF104 GPU powering the GTX 460. Case in point: in the previous quarter, AMD’s graphics division made only $1 million in profit. While Barts was in design years before that quarter, the situation still succinctly showcases why it’s important to target each market segment with an appropriate GPU; harvested GPUs are only a stop-gap solution, and in the end purposely crippling good GPUs is a good way to cripple a company’s gross margins.


197 Comments


  • StriderGT - Friday, October 22, 2010 - link

    I agree with you that the inclusion of the FTW card was a complete caving, and it casts a shadow on the so-far excellent reputation of Anandtech. I believe the whole motivation was PR related, retaining a workable relationship with nvidia, but was it worth it?!

    Look how ugly this sort of thing can get; they do not even include the test setup... Quote from techradar.com:

    We expected the 6870 to perform better than it did – especially as this is essentially being pitched as a GTX 460 killer.
    The problem is, Nvidia's price cuts have made this an impossible task, with the FTW edition of the GTX 460 rolling in at just over £170, yet competently outperforming the 6870 in every benchmark we threw at it.
    In essence, therefore, all the 6870 manages is to unseat the 5850 which given its end of life status isn't too difficult a feat. We'd still recommend buying a GTX 460 for this sort of cash. All tests ran at 1,920 x 1,080 at the highest settings, apart from AvP, which was ran at 1,680 x 1,050.

    http://www.techradar.com/reviews/pc-mac/pc-compone...
  • oldscotch - Friday, October 22, 2010 - link

    ...where a Civilization game would be used for a GPU benchmark.
  • AnnihilatorX - Friday, October 22, 2010 - link

    It's actually quite taxing on the maps. It lags on my HD4850.

    The reason is, it uses DX11 DirectCompute features for texture decompression. The performance is noticeably better on DX11 cards.
  • JonnyDough - Friday, October 22, 2010 - link

    "Ultimately this means we’re looking at staggered pricing. NVIDIA and AMD do not have any products that are directly competing at the same price points: at every $20 you’re looking at switching between AMD and NVIDIA."

    Not when you figure in NVidia's superior drivers, or power consumption...depending on which one matters most to you.
  • Fleeb - Friday, October 22, 2010 - link

    I looked at the load power consumption charts and saw the Radeon cards are better in this department, so I don't clearly understand your statement. Did you mean that the nVidia cards in these tests should be better because of superior power consumption, or that their power consumption is "superior" in the sense that nVidia cards consume more power?
  • jonup - Friday, October 22, 2010 - link

    I think he meant the nVidia has better drivers but worse power consumption. So it all depends on what you value most. At least that's how I took it.
  • zubzer0 - Friday, October 22, 2010 - link

    Great review!

    If you have the time, I would be very happy if you could test how well these boards do in Age of Conan DX10.

    Some time ago (Feb. 2009) you included Age of Conan in your reviews, but since then DX10 support was added to the game. I have yet to see an official review of current graphics cards' performance in AoC DX10.

    Btw, with the addon "Rise of the Godslayer" the graphics in the new Khitai zone are gorgeous!
  • konpyuuta_san - Friday, October 22, 2010 - link

    In my case (pun intended), the limiting factor is the physical size of the card. I've abandoned the ATX formats completely, going all out for mini-ITX (this one is Silverstone's Sugo SG06). The king of ITX cases might still be the 460, but this is making me feel a bit sore about the 460 I'm just about to buy. Especially since the 6870 is actually only $20 more than the 6850 where I live, and the 6850 is identically priced to the 460. There's just no way I can fit a 10.5 inch card into a 9 inch space. The 9 inch 6850 would fit, but there's a large radiator mounted on the front of the case, connected to a CPU water cooling block, that will interfere with the card. I've considered some crazy mods to the case, but those options just don't feel all that attractive. The GTX 460 is a good quarter inch shorter, and I'm getting a model with top-mounted power connectors, so there's ample room for everything in this extremely packed little gaming box. I'm still kind of trying to find a way to put a 6850 in there (bangs and bucks and all that), which leads to my actual question, namely:

    The issue of rated power consumption; recommended minimum for the 460 is 450W (which I can support), but for the 6850 it's 500W (too much). How critical are those requirements? Does the 6850 really require a 500W supply? Despite having lower power consumption than the 460?! Or is that just to ensure the PSU can supply enough amps on whatever rail the card runs off? If my 450W SFF PSU can't supply the 6850, it really doesn't matter how much better or cheaper it is ....
  • joshua4000 - Friday, October 22, 2010 - link

    let me get this straight: Fermi was once too expensive to manufacture due to its huge die and such, but its stripped-down versions sell for less and outpace newly released AMD cards (by a wide margin when looking at the 470).

    AMD's cheaper-to-manufacture cards (5xxx), on the other hand, came in overpriced once the 460 had been released (if they haven't been overpriced all along...); still, the price did not drop to levels where NVIDIA could not sell products without making a loss.

    AMD has optimized an already cheap product price-wise that does not outperform the 470 or an OC'd 460, while at the same time selling for the same amount of $.

    Considering the manufacturing and pricing of the 4870 in its last days, I guess AMD will still be making money on those 6xxx cards even when dropping the price by 75% of MSRP.
  • NA1NSXR - Friday, October 22, 2010 - link

    Granted there have been a lot of advancements in the common feature set of today's cards, and improvements in power/heat/noise, but absolute 3D performance has been stagnant. I am surprised the competition was called alive and well in the final words section. I built my PC back in 7/2009 using a 4890, which cost $180 then. Priced according to the cards in question today, it would slot in at roughly the same spot, meaning pretty much no performance improvement at all since then. Yes, I will repeat myself to ward off what is certainly coming: I know the 4890 is a pig (loud, noisy, power hungry) compared to the cards here. However, ignoring those factors, 3D performance has barely budged in more than a year. The price drops on 5xxx were a massive disappointment for me; they never came the way I thought was reasonable to expect after 4xxx. I am somewhat indifferent because in my own PC cycle I haven't been in the market for a card, but like I said before, I'm disappointed in the general market, and I wouldn't really agree with the statement that competition is alive and well, at least in any sense that benefits people who weight performance more heavily in their criteria.
