Re-evaluating Strategy, Creating the RV770 in 2005

Remembering what lead to success rather than the success itself is generally more important, and around 2005 ATI had another R300 moment. What made R300 successful was ATI re-evaluating the way it made GPUs and deciding on something that made sense. At the time, it made sense for ATI to work towards building the biggest chip possible, win at the high end and translate that into better, more competitive products at all other price points. It is worth mentioning that part of R300’s continued success was due to the fact that NVIDIA slipped up royally with the GeForce FX, it was the perfect storm and ATI capitalized. ATI built the biggest chip and NVIDIA had no real answer for some time to come.

Just as Intel bumped into the power wall when it tried to ramp Pentium 4 clock speeds to ridiculous levels, ATI was noticing that GPUs were getting a little too big for comfort. Design cycles were getting longer, die size was growing and financially there were some things that didn’t quite make sense when it came to making these large GPUs.

ATI asserted that a very large GPU, without the use of repair structures or harvesting (two techniques I’ll describe in a bit) may only have a 30% yield. That means for every 100 GPUs produced on a single wafer, only 30% of them would be fully functional and could be sold as advertised. Manufacturing these complex circuits is particularly expensive, those fab plants easily cost a couple of billion dollars to build and thus having most of your wafer go to waste isn’t really a good way of running a business.

Thankfully there are ways to improve that crappy 30% yield to a value north of 90%. The first is a technique generally known as repairability. The idea behind repairability is simple: build redundancy into your design. If you have a unit with 10 shader processors, actually build 11 but use one as a spare should there be a defect in any of the remaining processors. The same technique is often used for on-die memory, include additional cells so that a defect in one part of a cache won’t render the entire chip unusable. There are tradeoffs here however, if you build in too much repairability then you run the risk of having a die that’s large but offers no additional performance benefit. If you don’t build in enough repairability then your yield suffers.

Harvesting is another technique that all of you are quite familiar with, although its street name is feature binning. The idea here is that you assume that only a small percentage of the die on a wafer will be completely functional, but instead of throwing away the ones that aren’t fully functional, you disable features and sell them as a lower end part. For example, if your target architecture has 10 shader processors but only 30% of your production will have 10 working shader processors but 50% of the production will have 8 working units you’d sell a model with 10 SPs and a model with 8 SPs, the latter being at a reduced cost. This way instead of throwing away 70% of the wafer, you only throw away 20% - and if possible you could make a third SKU with fewer number of functional SPs and use close to 100% of the wafer.

Yields do improve over time, and what eventually happens if you build a product relying on harvesting is you hurt yourself financially. In the example above where you have two configurations of the same die, yields will eventually improve to the point where your 8 SP die could just as easily have 10 functional SPs but you have to sell them as reduced functionality parts because you’ve built up market demand for them. In a sense, you’re selling something worth $1 for $0.80. You could always create a new version of the chip with a smaller die, but that takes additional engineering resources, time and money.

The most recent public example of die harvesting was actually with NVIDIA’s GT200. The GeForce GTX 260 was simply a harvested version of the GTX 280 with 192 SPs vs. 240. As yields improved, NVIDIA introduced the GeForce GTX 260 Core 216 with 216 SPs. Note that NVIDIA didn’t use any repairability for the shader processors on GT200 there are no more than 240 functional SPs on a GT200 die, but that’s because NVIDIA would rely on harvesting anyway - if any of the 240 SPs didn’t work the chip would be branded as a GTX 260 or Core 216.

This is in stark contrast to RV770 where both versions of the GPU were functionally identical, the only differences were clock speeds and power requirements. But from a die standpoint, ATI didn’t have to sell any reduced versions of the chip to keep yields high.

Note that whether or not a company has to rely on harvesting doesn’t really impact the end user. In this case, the GeForce GTX 260 Core 216 actually benefitted the end user as it manages to deliver the same sort of value that AMD does with the Radeon HD 4870. It isn’t ideal for NVIDIA financially and if a company has to rely on harvesting for too long it will eventually take its toll, but I suspect that in the GT200 case NVIDIA will bail itself out with a 55nm revision of GT200 before things get too bad.

ATI didn’t want to build a GPU that would rely on excessive repair and harvesting to keep yields high. And then came the killer argument: building such a GPU was no longer in the best interests of its customers.

Sidenote: Building a GPU

Before we get too far down the rabbit hole it’s worth adding some perspective. ATI’s RV770 design took three years of work, that means ATI started in 2005. It takes about a year for manufacturability from tapeout to final product ship, another 1 - 1.5 years for design and another year for architecture.

The team that I met with in Santa Clara was talking about designs that would be out in 2012, we’re talking 4 TFLOPS of performance here, 4x the speed of RV770.

Decisions being made today are going to impact those products and in a sense it’s a big guessing game, ATI has no idea what NVIDIA will have in 2012 and NVIDIA has no idea what ATI will have. Each company simply tries to do the best it can for the customers and hopes they’ve got the right mix of performance, price and power.

To give you an idea of the intensity of some of these debates, Carrell debated, every evening, for three hours, for three weeks, the amount of repairability that would go into the RV770. Building redundant structures in the RV770 was an even bigger deal than before because ATI was going to be working within serious die constraints, simply making the chip bigger wasn’t an option. Repair structures in the RV770 wouldn’t add any performance but would increase the die size, on an already large chip that’s fine, but on a chip that’s supposed to target the sweet spot and can’t grow without limit it’s a big deal.

The Beginning: The Shot Heard Around the World Building a GPU for the Masses
POST A COMMENT

115 Comments

View All Comments

  • MrSpadge - Saturday, December 06, 2008 - link

    Exactly what I was thinking! That's why I got a 8500LE back then, when Geforce 4 was not in (public) sight yet. Reply
  • FireSnake - Wednesday, December 03, 2008 - link

    ... which one is Anand (on the picture at the beginning of the article)?

    I always wondered how he looks like ... I guess the one on the right.
    Reply
  • 3DoubleD - Wednesday, December 03, 2008 - link

    I've had Anandtech as my home page for 5 years and I've read almost every article since (and even some of the older ones). This is by far one of your greatest works!

    Thanks
    Reply
  • hellstrider - Wednesday, December 03, 2008 - link

    Kudos to Anand for such a great article, extremely insightful. I may even go out and purchase AMD stock now :)

    I love AMD even when it’s on the bottom, I own 780G + X2 + hd4850, in hopes that Deneb (or AM3 processors for that matter) will come in time to repeat the success of rv770 launch, at which point I will upgrade my obsolete X2 and have a sweet midrange machine.

    My only concern is that Nvidia is looking at all this smirking and planning an onslaught with the 55nm refresh. There is a very “disturbing” article at Xbitlabs that Nvidia is stock-piling the 55nm GT200 parts; seems like that’s something they would do – start selling those soon and undercut 4800 series badly.
    I’m just a concerned hd4850 owner and I don’t want to see my card obsolete within couple of months. I don’t really see AMD’s answer to 55nm GT200 in such short period of time?!?!

    Any thoughts?
    Reply
  • Goty - Wednesday, December 03, 2008 - link

    I don't think you'll have to worry too badly about the 55nm G200s. NVIDIA won't drop prices much, if at all; they're already smarting from the price drops enacted after the RV770 launch. There's also the fact that the 4850 isn't in the same market space as any of the G200 cards, so they're not really competitive anyhow. Reply
  • ltcommanderdata - Wednesday, December 03, 2008 - link

    I always imagined designing GPUs would be very stressful given you're trying to guess things years in advance, but this inside look at how things are done was very informative.

    On GDDR5, it's interesting to read that ATI was pushing so hard for this technology and they felt it was their only hope for the RV770. What about GDDR4? I thought ATI was a big supporter of it too and was the first to implement it. I'm pretty sure Samsung announced GDDR4 that could run at 3.2GBit/s in 2006 which isn't far from the 3.6GBit/s GDDR5 used in the 4870, and 4GBit/s GDDR4 was available in 2007. I guess there are still power savings to be had from GDDR5, but performance-wise I don't think it would have been a huge loss if GDDR5 had been delayed and ATI had to stick with GDDR4.

    And another interesting point in your article was definitely about the fate of the 4850. You report that ATI felt that the 4870 was perfectly specced and wasn't changed. I guess that meant they were always targeting the 750MHz core frequency that it launched with. Yet ATI was originally targeting the 4850 at 500MHz clock. With the 4870 being clocked 50% faster, I think it should be obvious to anyone just looking at the clock speed that there would be a huge performance gap between the 4850 and 4870. I believe the X1800XL and X1800XT had a similarly large performance gap. Thankfully Dave Baumann convinced them to clock the 4850 up to a more reasonable 625MHz core.

    One thing that I feel was missing from the article was how the AMD acquisition effected the design of the RV770. Perhaps there wasn't much change or the design was already set so AMD couldn't have changed things even if they wanted to, but they must have had an opinion. AMD was probably nervous that they bought ATI at it's height when the R580 was out and top, but once acquired, the R600 came out and underperformed. Would be interesting to know what AMD's initial opinion of ATI's small die, non-top tier targetted strategy was although it now seems to be more consistent with AMD's CPU strategy since they aren't targeting the high-end there anymore either.
    Reply
  • hooflung - Wednesday, December 03, 2008 - link

    The final frontier market share wise is to steal a major vendor like eVGA. If they can get an eVGA, BFG or XFX to just sell boards with their warranties AMD would be really dominant. Reply
  • JonnyDough - Wednesday, December 03, 2008 - link

    The best thing I've ever read on a tech site. This is why you're better than THG.

    Only one typo! It was a "to" when it should have been a "too."

    Chalk one up for the red team. This makes my appreciation for AMD rise even more. Anyone willing to disclose internal perspectives about the market like this is a team with less secrecy that I will support with my hard earned cash. So many companies could stand up and take a lesson here from this (i.e. Apple, MS).

    Keep articles like this coming, and I'll keep coming back for more.

    Sincerely,

    ~Ryan
    Reply
  • epyon96 - Wednesday, December 03, 2008 - link

    I have been an avid reader of this site for close to 8 years. I used to read almost every CPU, GPU and novelty gadget articles page to page. But over the years, my patience is much lower and I realize I get just as much enjoyment and information from just reading the first page and last page and skimming a few benchmarks.

    However, this is the first article in a while that I spent reading all of it and I thoroughly enjoyed it. These little back stories with a human element in one of the most interesting recent launches provides a refreshing change from boring benchmark-oriented articles.

    I hope to find an article based on Nehpalem of a similar nature and other Intel launches.

    Reply
  • GFC - Wednesday, December 03, 2008 - link

    Wow, all i can say is that i loved this review. It was realy enjoyable to read, and i must give my thanks to Anandtech and Carrell! Reply

Log in

Don't have an account? Sign up now