Barts: The Next Evolution of Cypress

At the heart of today’s new cards is Barts, the first member of AMD’s Northern Islands family of GPUs. As we briefly hinted at earlier, Barts is a very direct descendant of Cypress. This is both a product of design and a product of circumstance.

It should come as no surprise that AMD was originally looking to produce what would be the Northern Islands family on TSMC’s 32nm process; as originally scheduled this would line up with the launch window AMD wanted, and half-node shrinks are easier for them than trying to do a full-node shrink. Unfortunately the 32nm process quickly became doomed for a number of reasons.

Economically, per-transistor it was going to be more expensive than the 40nm process, which is a big problem when you’re trying to make an economical chip like Barts. Technologically, 32nm was following TSMC’s troubled 40nm process; TSMC’s troubles ended up being AMD’s troubles when they launched the 5800 series last year, as yields were low and wafers were few, right at a time when AMD needed every chip they could get to capitalize on their lead over NVIDIA. 32nm never reached completion so we can’t really talk about yields, but suffice it to say that TSMC had their hands full fixing 40nm and bringing up 28nm without also worrying about 32nm.

Ultimately 32nm was canceled around November of last year. But even before that, AMD made the difficult decision to change course and move what would become Barts to 40nm. As a result AMD had to make some sacrifices and design choices to make Barts possible on 40nm, and to get it to market in a short period of time.

For these reasons, architecturally Barts is very much a rebalanced Cypress, and with the exception of a few key changes we could talk about Barts in the same way we talked about Juniper (the 5700 series) last year.



Barts continues AMD’s DirectX 11 legacy, building upon what they’ve already achieved with Cypress. At the SPU level, Barts, like Cypress and every DX10 AMD design before it, continues to use AMD’s VLIW5 design. 5 stream processors (the w, x, y, z, and t units) work together with a branch unit and a set of GPRs to process instructions. The 4 simple SPs can work together to process 4 FP32 MADs per clock, while the t unit can either do FP32 math like the other units or handle special functions such as transcendentals. Here is a breakdown of what a single Barts SPU can do in a single clock cycle:

  • 4 32-bit FP MAD per clock
  • 4 24-bit Int MUL or ADD per clock
  • SFU: 1 32-bit FP MAD per clock

Compared to Cypress, you’ll note that FP64 performance is not quoted, and this isn’t a mistake. Barts isn’t meant to be a high-end product (that would be the 6900 series) so FP64 has been shown the door in order to bring the size of the GPU down. AMD is still a very gaming-centric company versus NVIDIA’s philosophy of GPU computing everywhere, so this makes sense for AMD’s position, while NVIDIA’s comparable products still offer FP64 if only for development purposes.

Above the SPs and SPUs, we have the SIMD. This remains unchanged from Cypress, with 80 SPs making up a SIMD. The L1 caches and the number of texture units per SIMD also remain unchanged: 16KB of L1 texture cache, 8KB of L1 compute cache, and 4 texture units per SIMD.
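As a rough tally of the hierarchy described so far, the per-SIMD figures can be multiplied out to chip level. The sketch below is illustrative only: the 14-SIMD count is the configuration AMD ultimately shipped (discussed later in this article), the 900MHz core clock is the 6870’s reference clock and is brought in as an assumption, and names such as `shader_totals` are made up for this example.

```python
# Rough tally of Barts' shader hierarchy (illustrative, not an AMD tool).
SPS_PER_SPU = 5           # the w, x, y, z, and t units of one VLIW5 SPU
SPS_PER_SIMD = 80         # unchanged from Cypress
TEX_UNITS_PER_SIMD = 4    # unchanged from Cypress
FLOPS_PER_MAD = 2         # one multiply-add counts as two floating-point ops


def shader_totals(simds, core_clock_mhz):
    """Multiply per-SIMD resources out to chip level."""
    total_sps = simds * SPS_PER_SIMD
    return {
        "spus_per_simd": SPS_PER_SIMD // SPS_PER_SPU,
        "total_sps": total_sps,
        "texture_units": simds * TEX_UNITS_PER_SIMD,
        # Peak rate assumes every SP issues one FP32 MAD every clock.
        "peak_fp32_gflops": total_sps * FLOPS_PER_MAD * core_clock_mhz / 1000,
    }


# Assumed inputs: 14 SIMDs and a 900MHz reference core clock for the 6870.
barts = shader_totals(simds=14, core_clock_mhz=900)
print(barts)
```

With those inputs the tally comes to 16 SPUs per SIMD, 1120 SPs, 56 texture units, and a theoretical peak of roughly 2 TFLOPs of FP32. Real shaders rarely keep all five VLIW slots busy, so this should be read as an upper bound.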

At the macro level AMD maintains the same 32 ROP design (which, combined with Barts’ higher clocks, actually gives it an advantage over Cypress). Attached to the ROPs are AMD’s L2 cache and memory controllers; there are 4 128KB blocks of L2 cache (for a total of 512KB L2) and 4 64-bit memory controllers that give Barts a 256-bit memory bus.
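The totals here follow directly from the block counts, and the ROP advantage follows from clockspeed alone. A minimal sketch, assuming reference core clocks of 900MHz for the 6870 and 850MHz for the 5870 (neither is stated on this page) and one pixel per ROP per clock; the helper name is hypothetical.

```python
L2_BLOCKS, L2_BLOCK_KB = 4, 128
MEM_CONTROLLERS, CONTROLLER_BITS = 4, 64
ROPS = 32  # same count on Barts and Cypress

l2_total_kb = L2_BLOCKS * L2_BLOCK_KB
bus_width_bits = MEM_CONTROLLERS * CONTROLLER_BITS


def pixel_fill_gpix(rops, core_clock_mhz):
    """Peak fill rate in Gpixels/s, assuming one pixel per ROP per clock."""
    return rops * core_clock_mhz / 1000


# Assumed reference clocks: 900MHz (6870) and 850MHz (5870).
barts_fill = pixel_fill_gpix(ROPS, 900)
cypress_fill = pixel_fill_gpix(ROPS, 850)
rop_advantage = barts_fill / cypress_fill

print(f"L2: {l2_total_kb}KB, bus: {bus_width_bits}-bit")
print(f"Fill rate: {barts_fill:.1f} vs {cypress_fill:.1f} Gpixels/s "
      f"({rop_advantage - 1:.1%} in Barts' favor)")
```

Under these assumptions the ROP advantage is purely a clockspeed effect of about 6%, since the ROP count itself is unchanged.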

Barts is not just a simple Cypress derivative however. For non-gaming/compute uses, UVD and the display controller have both been overhauled. Meanwhile for gaming Barts did receive one important upgrade: an enhanced tessellation unit. AMD has responded to NVIDIA’s prodding about tessellation at least in part, equipping Barts with a tessellation unit that in the best-case scenario can double its tessellation performance compared to Cypress. AMD has a whole manifesto on tessellation that we’ll get into, but for now we’ll work with the following chart:

AMD has chosen to focus on tessellation performance at lower tessellation factors, as they believe these are the most important factors for gaming purposes. From their own testing the advantage over Cypress approaches 2x between factors 6 and 10, while being closer to a 1.5x increase before that and after that up to factor 13 or so. At the highest tessellation factors Barts’ tessellation unit falls to performance roughly in line with Cypress’, squeezing out a small advantage due to the 6870’s higher clockspeed. Ultimately this means tessellation performance is improved on AMD products at lower tessellation factors, but AMD’s tessellation performance is still going to more-or-less collapse at high factors when they’re doing an extreme amount of triangle subdivision.

So with all of this said, Barts ends up being 25% smaller than Cypress, but in terms of performance we’ve found it to only be 7% slower when comparing the 6870 to the 5870. How AMD accomplished this is the rebalancing we mentioned earlier.
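Those two numbers can be folded into a single efficiency figure. This is a simplified sketch that treats "performance" as one scalar taken from the 6870 vs. 5870 comparison above; the function and variable names are made up for illustration.

```python
def perf_per_area(relative_perf, relative_area):
    """Performance per unit of die area, relative to a baseline of 1.0."""
    return relative_perf / relative_area


barts_relative_area = 1 - 0.25   # 25% smaller than Cypress
barts_relative_perf = 1 - 0.07   # 7% slower (6870 vs. 5870)

gain = perf_per_area(barts_relative_perf, barts_relative_area)
print(f"Barts delivers about {gain:.2f}x Cypress' performance per unit area")
```

In other words, by this crude measure Barts gets roughly a quarter more performance out of each square millimeter than Cypress does.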

Based on AMD’s design decisions and our performance data, it would appear that Cypress has more computing/shading power than it necessarily needs. True, Barts is slower, but it’s a bit slower and a lot smaller. AMD’s various compute ratios, such as compute:geometry and compute:rasterization, would appear to be less than ideal on Cypress. So Barts changes the ratios.

Compared to Cypress and factoring in 6870/5870 clockspeeds, Barts has about 75% of the compute/shader/texture power of Cypress. However it has more rasterization, tessellation, and ROP power than Cypress; in other words, Barts is less of a compute/shader GPU and a bit more of a traditional rasterizing GPU with a dash of tessellation thrown in. Even in the worst-case scenarios from our testing the drop-off at 1920x1200 is only 13% compared to Cypress/5870, so while Cypress had a great deal of compute capability, it’s clearly difficult to make extremely effective use of it even in the most shader-heavy games of today.
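The 75% figure can be reproduced with simple unit-count-times-clock arithmetic. This is a sketch under assumptions not stated on this page: Cypress is taken as 1600 SPs and 80 texture units at 850MHz (5870), and Barts as 1120 SPs and 56 texture units at 900MHz (6870). The function name is hypothetical.

```python
def relative_throughput(units_a, clock_a_mhz, units_b, clock_b_mhz):
    """Throughput of chip A relative to chip B, assuming it scales with units x clock."""
    return (units_a * clock_a_mhz) / (units_b * clock_b_mhz)


# Assumed configurations: 6870 (Barts) vs. 5870 (Cypress) at reference clocks.
shader_ratio = relative_throughput(1120, 900, 1600, 850)
texture_ratio = relative_throughput(56, 900, 80, 850)

print(f"shader: {shader_ratio:.1%}, texture: {texture_ratio:.1%}")
```

Both ratios land at about 74%, because the shader and texture hardware scale together with the SIMD count.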

However it’s worth noting that internally AMD was throwing around 2 designs for Barts: a 16 SIMD (1280 SP) 16 ROP design, and a 14 SIMD (1120 SP) 32 ROP design that they ultimately went with. The 14/32 design was faster, but only by 2%. This along with the ease of porting the design from Cypress made it the right choice for AMD, but it also means that Cypress/Barts is not exclusively bound on the shader/texture side or the ROP/raster side.
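To put the two candidates side by side, the sketch below compares their raw resources at equal clocks. Only the SIMD, SP, and ROP counts come from the text; the dictionary layout and variable names are just for illustration.

```python
SPS_PER_SIMD = 80

candidates = {
    "16 SIMD / 16 ROP": {"simds": 16, "rops": 16},
    "14 SIMD / 32 ROP": {"simds": 14, "rops": 32},  # the design AMD shipped
}

chosen = candidates["14 SIMD / 32 ROP"]
rejected = candidates["16 SIMD / 16 ROP"]

for name, cfg in candidates.items():
    sps = cfg["simds"] * SPS_PER_SIMD
    print(f"{name}: {sps} SPs, {cfg['rops']} ROPs")

# Resources of the rejected design relative to the shipping one.
shader_ratio = rejected["simds"] / chosen["simds"]
rop_ratio = rejected["rops"] / chosen["rops"]
print(f"Rejected design: {shader_ratio:.1%} of the shaders, {rop_ratio:.0%} of the ROPs")
```

The rejected design trades half the ROPs for roughly 14% more shader hardware, yet AMD saw only a 2% difference between the two, which fits the point that neither side is the sole bottleneck.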

Along with selectively reducing functional blocks from Cypress and removing FP64 support, AMD made one other major change to improve efficiency for Barts: they’re using Redwood’s memory controller. In the past we’ve talked about the inherent complexities of driving GDDR5 at high speeds, but until now we’ve never known just how complex it is. It turns out that Cypress’s memory controller is nearly twice as big as Redwood’s! By reducing their desired memory speeds from 4.8GHz to 4.2GHz, AMD was able to reduce the size of their memory controller by nearly 50%. Admittedly we don’t know just how much space this design choice saved AMD, but from our discussions with them it’s clearly significant. And it also perfectly highlights just how hard it is to drive GDDR5 at 5GHz and beyond, and why both AMD and NVIDIA cited their memory controllers as some of their biggest issues when bringing up Cypress and GF100 respectively.
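For a sense of scale, peak memory bandwidth is simply bus width times effective data rate. A quick sketch, assuming the 256-bit bus described earlier and treating the quoted 4.8GHz and 4.2GHz figures as effective per-pin data rates in Gbps; the function name is hypothetical.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


cypress_bw = peak_bandwidth_gbs(256, 4.8)  # Cypress' target memory speed
barts_bw = peak_bandwidth_gbs(256, 4.2)    # Barts' reduced target
bandwidth_given_up = 1 - barts_bw / cypress_bw

print(f"{cypress_bw:.1f} GB/s -> {barts_bw:.1f} GB/s "
      f"({bandwidth_given_up:.1%} less peak bandwidth)")
```

Seen this way, giving up about an eighth of the peak bandwidth bought AMD a memory controller nearly half the size.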

Ultimately all of these efficiency changes are necessary for AMD to continue to compete in the GPU market, particularly in the face of NVIDIA and the GF104 GPU powering the GTX 460. Case in point, in the previous quarter AMD’s graphics division only made $1mil in profit. While Barts was in design years before that quarter, the situation still succinctly showcases why it’s important to target each market segment with an appropriate GPU; harvested GPUs are only a stop-gap solution, and in the end purposely crippling good GPUs is a good way to cripple a company’s gross margin.


197 Comments


  • 529th - Saturday, October 23, 2010 - link

    the marketers wanted to differentiate themselves from Nvidia, that's why they are using their second place cards to be in the same category as nvidias second place cards

    If you are shopping for a top of the line card you should know atleast a little bit about them although the un-educated video-card shopper would think that a 470 and 5870 or 6870 is on the SAME performance level, WHICH ISN'T TOO FAR FROM THE TRUTH, but I think it's here where AMD marketers are trying to make a statement

    i could be wrong, i have had very little sleep last night, cedar point was a blast!
  • SininStyle - Saturday, October 23, 2010 - link

    Can I just say THANK YOU for adding a OC edition of the 460. Don't know why everyone is whining. If you don't want to know how an OC edition compares then ignore the stupid bench for it. Why is such a huge deal?
    I personally am glad they included it and this is why. The 460 1gb stock is 675mhz and can OC "reliably" to 850mhz.. That's 175mhz gain and its noticeable. Stock volt stock fan. And for those that wanna claim heat, mine shows 64c at 75% fan on OCCT. The 6870 get 50hz OC at stock volt/fan. SEE why this is important people? $180 vs $240 with same results.

    Now with volt changes I'm sure they both have room to go I'm not sure how much. I tend to shy away from higher voltages at least for now.

    The 6850 is the better buy between the 2 68xx cards. That has allot of headroom to OC. That would even be a better comparison to the 460 due to the price. And owning the 460 doesn't make me a fanboy and I will say you can flip a coin for value on these 2.

    So again thanks for the added information. Cant see why anyone would complain about more info. If you don't like the info ignore it if it makes you feel better. Feel free to add OCed 6850s and 6870s I look forward to the comparison.
  • Parhel - Saturday, October 23, 2010 - link

    "The 460 1gb stock is 675mhz and can OC "reliably" to 850mhz"

    No, it absolutely cannot. the FTW card is a "golden sample" which is why there are so few available. Stock cooling on a stock card will not get you to 850Mhz with 24/7 reliability. You *might* get to 800Mhz, probably a bit less. That's a great value, IMO. If I were in the market at the moment, I'd pick a base model GTX 460 and OC it. Not arguing that point at all. But presenting this card in the 6870 launch article is a sham and major black eye to Anandtech's credibility.
  • rom0n - Saturday, October 23, 2010 - link

    Is it possible to post the GPUZ of the HD6850. It seems there are numerous cases where HD6850 has 1120 sent out to reviewers. See
    http://benchmarkreviews.com/index.php?option=com_c... If this happens to be one of them the results may be a little misleading. If not then it'll reaffirm the results.
  • GullLars - Saturday, October 23, 2010 - link

    This means a 6870 with open-air fan optimized for noise will be my early winter solstice present for myself, togheter with the 4x C300 64GB i just got :D
    I went for a value-upgrade of my old rigg with P2x6 1090T, 8GB kingston value DDR3, and AM3 mobo with SB850, so once i get both the SSD in RAID-0 and the GPU, I'll be a happy camper (or rusher) <3
    It'll tide me over untill i can get Bulldozer or a next gen Intel (high end/workstation) around winter 2011/2012.
  • poohbear - Saturday, October 23, 2010 - link

    "Apparently a small number of the AMD Radeon HD 6850 press samples shipped from AIB partners have a higher-than-expected number of stream processors enabled.

    This is because some AIBs used early engineering ASICs intended for board validation on their press samples. The use of these ASICs results in the incorrect number of stream processors. If you have an HD 6850 board sample from an AIB, please test using a utility such as GPU-z to determine the number of active stream processors. If that number is greater than 960, please contact us and we will work to have your board replaced with a production-level sample.

    All boards available in the market, as well as AMD-supplied media samples, have production-level GPUs with the correct 960 stream processors."

    so which one did Anandtech get? false marketing is such BS, just wanna be sure your benchmarks for the 6850 are reliable and we're not getting overrated benchmarks due to a cherry picked review sample.
  • lakrids - Saturday, October 23, 2010 - link

    The review ended up looking like an advertisement for EVGA at page 7 and beyond. Why EVGA? Why not some other brand?
    Why include that brand at all? Just mark the card "GTX 460 OC'd 850MHz".

    At the very first benchmark: Crysis 2560x1600, you didn't include the reference GTX 460, you pitched the HD6870 against the EVGA overclocked version. EVGA here, EVGA there, EVGA everywhere.

    Would you blame me if I suspect you of being on EVGA's paycheck?
  • Lolimaster - Sunday, October 24, 2010 - link

    When I call you a Intel/Nvidia biased site I'm saying the truth. Are you reviewing the HD6000 or doins an EVGA product reviews.

    This is an insult.

    Message:
    Nvidia will disappear like the dodo, just a bit more time and at that time all this sh1t will end.
  • SininStyle - Sunday, October 24, 2010 - link

    You do understand if Nvidia vanishes the price of GPUs goes through the roof right? Nvidia isnt going to vanish any earlier then Radeon. Saying either just translates into "Im a fanboy"

    Stop defending a sticker and start shopping price performance. Neither company would hesitate to rape your wallet if the other would allow it. Case in point look at the price of the 57xx and 58xx 2 months ago. Then look at the price of the same cards including the 68xx cards now. Any of these cards perform less then they did 2 months ago? But the price is a whole lot cheaper isnt it? Well you can thank the 460 for that. Competition results in better pricing for the same performance. You should be thanking Nvidia not hating them.
  • Super_Herb - Sunday, October 24, 2010 - link

    I love it - "as a matter of policy we do not include overclocked cards on general reviews"..........but this time nVidia said pretty please so we did. But because our strict ethical policy doesn't allow us to include them we'll just tell you we did it this one special time because a manufacturer specifically sent us a special card and then our integrity is still 100% intact......right? Besides, the "special" card nVidia sent us was so shiny and pretty!

    Back to [H]ard to get the real story.
