Barts: The Next Evolution of Cypress

At the heart of today’s new cards is Barts, the first member of AMD’s Northern Islands GPU family. As we hinted at earlier, Barts is a very direct descendant of Cypress. This is both a product of design and a product of consequences.

It should come as no surprise that AMD was originally looking to produce what would become the Northern Islands family on TSMC’s 32nm process; as originally scheduled, this would have lined up with the launch window AMD wanted, and half-node shrinks are easier for them than full-node shrinks. Unfortunately, the 32nm process was soon doomed for a number of reasons.

Economically, 32nm was going to be more expensive per transistor than the 40nm process, which is a big problem when you’re trying to make an economical chip like Barts. Technologically, 32nm was following TSMC’s troubled 40nm process; TSMC’s troubles ended up being AMD’s troubles when they launched the 5800 series last year, as yields were low and wafers were few, right at a time when AMD needed every chip they could get to capitalize on their lead over NVIDIA. 32nm never reached completion, so we can’t really talk about its yields, but suffice it to say that TSMC had their hands full fixing 40nm and bringing up 28nm without also worrying about 32nm.

Ultimately 32nm was canceled around November of last year. But even before that, AMD made the difficult decision to take a hard turn to the left and move what would become Barts to 40nm. As a result, AMD had to make some sacrifices and design choices to make Barts possible on 40nm, and to get it to market in a short period of time.

For these reasons, architecturally Barts is very much a rebalanced Cypress, and with the exception of a few key changes we could talk about Barts in the same way we talked about Juniper (the 5700 series) last year.


Barts continues AMD’s DirectX 11 legacy, building upon what they’ve already achieved with Cypress. At the SPU level, like Cypress and every DX10 AMD design before it, Barts continues to use AMD’s VLIW5 design. Five stream processors (the w, x, y, z, and t units) work together with a branch unit and a set of GPRs to process instructions. The 4 simple SPs can work together to process 4 FP32 MADs per clock, while the t unit can either do FP32 math like the other units or handle special functions such as transcendentals. Here is a breakdown of what a single Barts SPU can do in a single clock cycle:

  • 4 32-bit FP MAD per clock
  • 4 24-bit Int MUL or ADD per clock
  • SFU: 1 32-bit FP MAD per clock
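As a rough sketch, the per-clock figures above can be turned into a peak FP32 number for the whole chip. This assumes the HD 6870’s 900MHz reference core clock and Barts’ 14 SIMDs of 80 SPs, counts a MAD as two floating-point operations, and assumes all five VLIW slots issue a MAD every clock, which real shader code rarely achieves. The function names are illustrative only.

```python
# Rough peak FP32 throughput for a VLIW5 GPU.
# Assumptions: a MAD counts as 2 FLOPs and every one of the 5 slots
# (x, y, z, w and the t unit) issues an FP32 MAD each clock.

SLOTS_PER_SPU = 5   # x, y, z, w and t units
FLOPS_PER_MAD = 2   # one multiply plus one add


def peak_gflops(simds: int, sps_per_simd: int, clock_mhz: float) -> float:
    """Peak single-precision GFLOPS if every SP retires one MAD per clock."""
    total_sps = simds * sps_per_simd
    return total_sps * FLOPS_PER_MAD * clock_mhz / 1000.0


def spus(simds: int, sps_per_simd: int) -> int:
    """Number of VLIW5 SPUs (each SPU bundles 5 SPs)."""
    return simds * sps_per_simd // SLOTS_PER_SPU


if __name__ == "__main__":
    # Barts as configured on the HD 6870 (900MHz reference clock assumed)
    print(spus(14, 80), peak_gflops(14, 80, 900))  # 224 2016.0
```

This is a theoretical ceiling; how close a game gets depends on how well the shader compiler packs independent operations into the five slots.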

Compared to Cypress, you’ll note that FP64 performance is not quoted, and this isn’t a mistake. Barts isn’t meant to be a high-end product (that would be the 6900 series), so FP64 has been shown the door in order to bring down the size of the GPU. AMD is still a very gaming-centric company compared to NVIDIA’s philosophy of GPU computing everywhere, so this makes sense for AMD’s position, while NVIDIA’s comparable products still offer FP64, if only for development purposes.

Above the SPs and SPUs, we have the SIMD. This remains unchanged from Cypress, with 80 SPs (16 SPUs) making up a SIMD. The L1 caches and texture units per SIMD also remain the same: 16KB of L1 texture cache, 8KB of L1 compute cache, and 4 texture units per SIMD.
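Because the SIMD is the block AMD adds or removes, chip-wide totals follow directly from the per-SIMD figures above. The helper below is a hypothetical illustration that simply multiplies them out, using the 14-SIMD Barts configuration discussed later in this section and Cypress’ 20 SIMDs.

```python
from dataclasses import dataclass

# Per-SIMD resources quoted for both Cypress and Barts.
SPS_PER_SIMD = 80
TEX_UNITS_PER_SIMD = 4
L1_TEX_KB_PER_SIMD = 16
L1_COMPUTE_KB_PER_SIMD = 8


@dataclass(frozen=True)
class ChipTotals:
    sps: int
    tex_units: int
    l1_tex_kb: int
    l1_compute_kb: int


def totals(simds: int) -> ChipTotals:
    """Multiply the per-SIMD resources out to the whole chip."""
    return ChipTotals(
        sps=simds * SPS_PER_SIMD,
        tex_units=simds * TEX_UNITS_PER_SIMD,
        l1_tex_kb=simds * L1_TEX_KB_PER_SIMD,
        l1_compute_kb=simds * L1_COMPUTE_KB_PER_SIMD,
    )


if __name__ == "__main__":
    print("Barts:  ", totals(14))  # 1120 SPs, 56 texture units
    print("Cypress:", totals(20))  # 1600 SPs, 80 texture units
```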

At the macro level, AMD maintains the same 32 ROP design (which, combined with Barts’ higher clocks, actually gives it an advantage over Cypress). Attached to the ROPs are AMD’s L2 cache and memory controllers; there are four 128KB blocks of L2 cache (for a total of 512KB of L2) and four 64-bit memory controllers that give Barts a 256-bit memory bus.
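The ROP advantage is simple arithmetic: with the same 32 ROPs on both chips, peak pixel throughput scales with core clock. The sketch below assumes reference clocks of 900MHz for the HD 6870 and 850MHz for the HD 5870 and one pixel per ROP per clock; it also totals the L2 and bus width from the figures above.

```python
def fill_rate_gpix(rops: int, clock_mhz: float) -> float:
    """Peak pixel fill rate in Gpixels/s, assuming 1 pixel per ROP per clock."""
    return rops * clock_mhz / 1000.0


# Memory-side figures quoted for Barts: 4 L2 blocks and 4 memory controllers.
L2_BLOCKS, L2_BLOCK_KB = 4, 128
MEM_CONTROLLERS, CONTROLLER_BITS = 4, 64

if __name__ == "__main__":
    barts = fill_rate_gpix(32, 900)    # HD 6870 (assumed 900MHz)
    cypress = fill_rate_gpix(32, 850)  # HD 5870 (assumed 850MHz)
    print(f"Barts {barts:.1f} vs Cypress {cypress:.1f} Gpix/s "
          f"({barts / cypress:.3f}x)")
    print("L2:", L2_BLOCKS * L2_BLOCK_KB, "KB;",
          "bus:", MEM_CONTROLLERS * CONTROLLER_BITS, "bits")
```

Under these assumptions the 6870 ends up with roughly a 6% edge in peak fill rate, purely from its clock.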

Barts is not just a simple Cypress derivative, however. For non-gaming/compute uses, UVD and the display controller have both been overhauled. Meanwhile, for gaming, Barts did receive one important upgrade: an enhanced tessellation unit. AMD has responded, at least in part, to NVIDIA’s prodding about tessellation, equipping Barts with a tessellation unit that in the best-case scenario can double their tessellation performance compared to Cypress. AMD has a whole manifesto on tessellation that we’ll get into, but for now we’ll work with the following chart:

AMD has chosen to focus on tessellation performance at lower tessellation factors, as they believe these are the most important factors for gaming purposes. In AMD’s own testing, the advantage over Cypress approaches 2x between factors 6 and 10, while it is closer to 1.5x below factor 6, and again above factor 10 up to a factor of 13 or so. At the highest tessellation factors, Barts’ tessellation performance falls roughly in line with Cypress’, squeezing out a small advantage due to the 6870’s higher clockspeed. Ultimately this means tessellation performance is improved on AMD products at lower tessellation factors, but AMD’s tessellation performance is still going to more or less collapse at high factors, when they’re doing an extreme amount of triangle subdivision.
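To see why very high factors are so punishing, note that the number of triangles generated grows with the square of the tessellation factor. The sketch below assumes a quad-domain patch with every edge and inside factor set to the same integer N, which yields a regular N×N grid of quads (two triangles each); it ignores triangle domains and fractional partitioning, and the function name is purely illustrative.

```python
def quad_patch_triangles(factor: int) -> int:
    """Triangles from one quad patch with a uniform integer tessellation factor."""
    if factor < 1:
        raise ValueError("tessellation factor must be >= 1")
    return 2 * factor * factor  # N x N grid of quads, 2 triangles per quad


if __name__ == "__main__":
    # Factors from the range AMD focused on, up to Direct3D 11's maximum of 64
    for n in (1, 6, 10, 13, 64):
        print(n, quad_patch_triangles(n))
```

Going from factor 10 to factor 64 multiplies the triangle count per patch by roughly 41x, which is the kind of load where Barts falls back to Cypress-like performance.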

So with all of this said, Barts ends up being 25% smaller than Cypress, yet in terms of performance we’ve found it to be only 7% slower when comparing the 6870 to the 5870. AMD accomplished this through the rebalancing we mentioned earlier.
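Taking those two percentages at face value, the efficiency gain is easy to quantify. This is a back-of-the-envelope sketch only: “7% slower” is an average from our benchmarks, not a fixed constant, and the function name is illustrative.

```python
def perf_per_area(rel_perf: float, rel_area: float) -> float:
    """Performance per unit die area, relative to the reference chip."""
    return rel_perf / rel_area


if __name__ == "__main__":
    # Barts vs Cypress: 25% smaller die, about 7% slower (6870 vs 5870)
    ratio = perf_per_area(rel_perf=0.93, rel_area=0.75)
    print(f"{ratio:.2f}x performance per mm^2")  # 1.24x performance per mm^2
```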

Based on AMD’s design decisions and our performance data, it would appear that Cypress has more computing/shading power than it really needs. True, Barts is slower, but it’s a bit slower and a lot smaller. AMD’s various compute ratios, such as compute:geometry and compute:rasterization, would appear to be less than ideal on Cypress, so Barts changes the ratios.

Compared to Cypress, and factoring in 6870/5870 clockspeeds, Barts has about 75% of the compute/shader/texture power of Cypress. However, it has more rasterization, tessellation, and ROP power than Cypress; in other words, Barts is less of a compute/shader GPU and a bit more of a traditional rasterizing GPU with a dash of tessellation thrown in. Even in the worst-case scenarios from our testing, the drop-off at 1920x1200 is only 13% compared to Cypress/5870, so while Cypress had a great deal of compute capability, it’s clearly difficult to make effective use of all of it even in the most shader-heavy games of today.
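The “about 75%” figure can be reproduced from unit counts and clocks. The sketch assumes 900MHz and 850MHz reference core clocks for the 6870 and 5870, and Cypress’ 20 SIMDs (1600 SPs, 80 texture units) against Barts’ 14 SIMDs (1120 SPs, 56 texture units). Rasterizer and tessellator improvements don’t reduce to a simple unit count, so they are left out.

```python
def relative_throughput(units_a: int, clock_a_mhz: float,
                        units_b: int, clock_b_mhz: float) -> float:
    """Throughput of chip A relative to chip B for units that scale with clock."""
    return (units_a * clock_a_mhz) / (units_b * clock_b_mhz)


if __name__ == "__main__":
    # Assumed reference clocks: HD 6870 at 900MHz, HD 5870 at 850MHz.
    shader = relative_throughput(1120, 900, 1600, 850)  # stream processors
    texture = relative_throughput(56, 900, 80, 850)     # texture units
    print(f"shader {shader:.2f}, texture {texture:.2f}")  # shader 0.74, texture 0.74
```

Both ratios come out at about 0.74 because Barts keeps Cypress’ 4 texture units per SIMD, so shader and texture resources shrink together.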

However, it’s worth noting that internally AMD was weighing two designs for Barts: a 16 SIMD (1280 SP) 16 ROP design, and the 14 SIMD (1120 SP) 32 ROP design that they ultimately went with. The 14/32 design was faster, but only by 2%. This, along with the ease of porting the design from Cypress, made it the right choice for AMD, but it also means that Cypress/Barts is not exclusively bound on either the shader/texture side or the ROP/raster side.
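A quick comparison of the two candidates shows how lopsided the trade was: the shipped layout gave up 12.5% of its shader resources in exchange for twice the ROPs, yet ended up only 2% faster. The helper names here are hypothetical and only multiply out the figures quoted above.

```python
SPS_PER_SIMD = 80


def config(simds: int, rops: int) -> dict:
    """Shader and ROP resources for a candidate Barts layout."""
    return {"sps": simds * SPS_PER_SIMD, "rops": rops}


def compare(a: dict, b: dict) -> dict:
    """Resources of layout a relative to layout b, key by key."""
    return {k: a[k] / b[k] for k in a}


if __name__ == "__main__":
    wide_shader = config(simds=16, rops=16)  # 1280 SPs, 16 ROPs
    shipped = config(simds=14, rops=32)      # 1120 SPs, 32 ROPs
    print(compare(shipped, wide_shader))     # {'sps': 0.875, 'rops': 2.0}
```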

Along with selectively reducing functional blocks from Cypress and removing FP64 support, AMD made one other major change to improve efficiency for Barts: they’re using Redwood’s memory controller. In the past we’ve talked about the inherent complexities of driving GDDR5 at high speeds, but until now we never knew just how complex it is. It turns out that Cypress’s memory controller is nearly twice as big as Redwood’s! By reducing their target memory speed from 4.8GHz to 4.2GHz, AMD was able to shrink their memory controller by nearly 50%. Admittedly we don’t know just how much die space this design choice saved AMD, but from our discussions with them it’s clearly significant. It also highlights just how hard it is to drive GDDR5 at 5GHz and beyond, and why both AMD and NVIDIA cited their memory controllers as some of their biggest issues when bringing up Cypress and GF100, respectively.
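The bandwidth cost of the lower memory target is easy to estimate. Treating 4.2GHz and 4.8GHz as effective GDDR5 data rates per pin, a 256-bit bus moves 32 bytes per transfer, so peak bandwidth drops by 12.5%. This sketch covers peak figures only and ignores controller efficiency; the function name is illustrative.

```python
def gddr5_bandwidth_gbs(bus_bits: int, effective_gbps: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times effective data rate."""
    return bus_bits / 8 * effective_gbps


if __name__ == "__main__":
    barts = gddr5_bandwidth_gbs(256, 4.2)    # Redwood-derived controller target
    cypress = gddr5_bandwidth_gbs(256, 4.8)  # Cypress controller target
    print(f"{barts:.1f} GB/s vs {cypress:.1f} GB/s ({barts / cypress:.3f}x)")
```

Giving up 12.5% of peak bandwidth in exchange for a memory controller roughly half the size is the kind of trade the rest of Barts’ design follows.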

Ultimately, all of these efficiency changes are necessary for AMD to continue to compete in the GPU market, particularly in the face of NVIDIA and the GF104 GPU powering the GTX 460. Case in point: in the previous quarter AMD’s graphics division made only $1mil in profit. While Barts was in design years before that quarter, the situation still succinctly shows why it’s important to target each market segment with an appropriate GPU. Harvested GPUs are only a stop-gap solution; in the end, purposely crippling good GPUs is a good way to cripple a company’s gross margin.




  • campbbri - Friday, October 22, 2010

    Thanks for the great review. I don't know why everyone is complaining about mixing OC and non-OC cards when you were extremely explicit in pointing it out.
  • krumme - Friday, October 22, 2010

    I don't think you don't know why everyone is complaining.

    First, to be fair, it's far from everyone :), unfortunately, because Anand is surrounded by far too many yes-sayers. All positive. Great in many ways, but it does not develop the site as it could. There is a great, huge community, and there are plenty of resources to get ideas for new methodology.

    It's good, if not vital, that Kyle is explicit about it. Otherwise it wouldn't be worth criticizing; it would just look like a paid job, and nobody would care. It's not. But being explicit is not enough, even if it's most important and a huge quality. You need to have a good case. And Anand does have a very bad case.

    Read what Kyle wrote again. Do you think this is his best and most sound decision in his life? Does he feel comfortable about it?

    He did betray himself a little bit. And he shouldn't have done it. He should listen to his own doubt.
  • snarfbot - Friday, October 22, 2010

    yes i understand that, but i can't see how you can call a direct replacement that fails to outperform its predecessor a success.

    especially when you consider that the prices have increased after launch, as opposed to decreasing as is normal, and have remained artificially high since, due to limitations at tsmc, which renders the cost argument pretty much moot.

    how about an analogy.

    6870 is to 5870 as 4770 is to 4870.

    and it's on the same process, which makes it even worse, although you can't really blame amd for that.

    you can very much blame their marketing department for making such a terrible decision though.

    it's a terrible name, that's the whole point: at whatever price, you can't call it a 6870 if it can't beat a 5870.
  • Trefugl - Friday, October 22, 2010

    yes i understand that, but i can't see how you can call a direct replacement that fails to outperform its predecessor a success.

    But the issue is that the 68xx series alone isn't really replacing the 58xx series. I think they are really splitting what would have been the direct replacement for that market into two: the 69xx (high-end enthusiast) and the 68xx (high-end mid-range).

    I agree that the naming scheme isn't the best, but I think a lot of that could have been mitigated (and maybe even made a non-issue) if the 68xx cards weren't the first to launch. If the 69xx had come out first, people would have accepted them and been happy, but instead we have b*tching because of naming confusion...
  • Targon - Sunday, October 24, 2010

    I missed this too until someone pointed out what I missed. The Radeon 6900 series will replace the 5800 series at the high end, and IS the proper high end part you are looking for.

    Back when DirectX 9 first came out, ATI only had DirectX 9 support in the old Radeon 9500 and 9700. When the X300, X600, and X800 came out, notice that AMD took the cards and started at 600 and 800, rather than 500 and 700 for the mid ranged and high end cards. This has continued a bit. In the HD 2000 series, you even had the HD 2900XT on the high end of the series, but then they went to the 3800, 4800, and 5800 series to mark the high end cards.

    So, AMD/ATI has been tweaking the names a fair bit. What initially threw me off is that the next generation high end cards are not the first cards to show up, and we have the mid-ranged cards showing up first.

    If the article had said clearly, "We are reviewing the next generation mid-range cards, with the high-end 6900 series due out next month," right up front instead of burying it in the text somewhere on page 2 (or was it 3), there would have been less confusion.

    I don't mind the change in numbers if all parts come out at the same time, but for now, there is ONLY confusion because we have yet to see the 6970.
  • GaMEChld - Friday, October 14, 2011

    I love how people are arguing over this naming change. As if people who buy discrete cards or look at video card specs don't know what they're doing. If you don't know what you're buying, it serves you right.

    I don't know why this was so hard for people to understand. The 5700 was incredibly successful. AMD wanted to preserve that card for its performance and value. Thus, the 6700 name was taken. The 6800 model is a new model that sits BETWEEN where the 5700 and 5800 lines had been. If you recall, there was a MASSIVE performance gap between those lines, and AMD felt they should have something to bridge that gap.

    The new 6800 line bridges that gap. It offers NEAR 5800 power at a significant price reduction.

    And now ALL of the top-tier cards are housed under the 6900 bracket, with the 6990 taking the dual GPU slot. If I had anything to complain about, it's the abandonment of the X2 designation on dual GPU cards.

    In fact, the only thing people should be angry about is the fact that the 6700 is virtually identical to the 5700 and offers little performance advantage. THAT is what is reminiscent of the 8800GT -> 9800GT transition. However, since the 5700 was a midrange product, maybe it received less attention than it should have.
  • DanaG - Friday, October 22, 2010

    Now, if the 6870 is what should've been a 6770, and a 6970 is what should've been a 6870... then what'll they call what should've been a 6970? 6-10-70 / 6ten70? 6X70? 6999? Or will they go to 6970 X2?
  • spigzone - Saturday, October 23, 2010

    6990... that wasn't so hard now, was it?
  • AMD_Pitbull - Saturday, October 23, 2010

    Gotta say, I agree 100%. I really don't understand why everyone is getting so bloody upset with this. New product, new line. You couldn't predict what was going to happen? Sorry. Companies like to keep people guessing.

    Also, if you really want to get technical, this 6870 DOES beat the 5870 in a few things as well. Overall greater effective product AND cheaper? Win in my books. Sorry QQ'ers.
  • dvijaydev46 - Saturday, October 23, 2010

    I tried converting a video file using my 5770 Hawk with MediaEspresso 6 (with hardware acceleration enabled, of course) and wasn't impressed, but Mediashow 5 properly utilized the GPU power and the speed difference in converting was clear. I'm not sure if there was a problem in the installation of my copy of MediaEspresso 6, but I think you guys can use Mediashow 5 to see if there is any difference in video conversion time with an AMD GPU, as I don't have any other card.
