The Payoff: How RV740 Saved Cypress

For its first 40nm GPU, ATI chose the biggest die that made sense in its roadmap. That was the RV740 (Radeon HD 4770):


The first to 40nm - The ATI Radeon HD 4770, April 2009

NVIDIA, however, picked a smaller die. While the RV740 was a 137mm² GPU, NVIDIA’s first 40nm parts were the G210 and GT220, which measured 57mm² and 100mm² respectively. The G210 and GT220 were OEM-only for the first months of their life, and I’m guessing the G210 made up a good percentage of those orders. Note that it wasn’t until the release of the GeForce GT 240 that NVIDIA made a 40nm die equal in size to the RV740. The GT 240 came out in November 2009, while the Radeon HD 4770 (RV740) debuted in April 2009 - 7 months earlier.


NVIDIA's first 40nm GPUs shipped in July 2009

When it came time for both ATI and NVIDIA to move their high performance GPUs to 40nm, ATI had more experience and exposure to the big die problems with TSMC’s process.

David Wang, ATI’s VP of Graphics Engineering at the time, had concerns about TSMC’s 40nm process that he voiced to Carrell early on in the RV740 design process. David was worried that the metal handling in the fabrication process might lead to via quality issues. Vias are tiny connections between the different metal layers on a chip, and the thinking was that the via failure rate at 40nm was high enough to impact the yield of the process. Even if a via didn’t fail completely, a poor-quality via would degrade the signal passing through it.

The second cause for concern with TSMC’s 40nm process was variation in transistor dimensions. There are thousands of dimensions in semiconductor design that you have to worry about. And as with any sort of manufacturing, there’s variance in many if not all of those dimensions from chip to chip. David was particularly worried about manufacturing variation in transistor channel length. He was worried that the tolerances ATI was given might not be met.


A standard CMOS transistor. Its dimensions are usually known to fairly tight tolerances.

TSMC led ATI to believe that the variation in channel length was going to be relatively small. Carrell and crew were nervous, but there was nothing that could be done.

The problem with vias was easy (but costly) to get around. David Wang decided to double up on vias with the RV740. At any point in the design where there was a via that connected two metal layers, the RV740 called for two. It made the chip bigger, but that was better than having chips that wouldn’t work. The issue of channel length variation, however, had no immediate solution - it was a worry of theirs, but perhaps an irrational fear.
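To see why doubling vias is worth the extra area, here is a rough yield sketch. It assumes via failures are independent of one another and uses made-up values for the number of connections and the per-via failure rate (neither figure comes from ATI or TSMC). `chip_via_yield` is a hypothetical helper, and the model only covers outright failures, not vias that merely degrade the signal.

```python
import math


def chip_via_yield(num_connections: int, via_fail_prob: float, redundancy: int = 1) -> float:
    """Probability that every inter-layer connection on a chip works.

    A connection fails only if all `redundancy` parallel vias fail,
    assuming independent failures (an idealization).
    """
    connection_fail_prob = via_fail_prob ** redundancy
    if connection_fail_prob >= 1.0:
        return 0.0
    # (1 - p) ** n, computed via log1p so very small p is not rounded away.
    return math.exp(num_connections * math.log1p(-connection_fail_prob))


# Hypothetical numbers, chosen only for illustration.
NUM_CONNECTIONS = 100_000_000
VIA_FAIL_PROB = 1e-9

single = chip_via_yield(NUM_CONNECTIONS, VIA_FAIL_PROB, redundancy=1)
double = chip_via_yield(NUM_CONNECTIONS, VIA_FAIL_PROB, redundancy=2)
print(f"single vias: {single:.4f}")
print(f"double vias: {double:.4f}")
```

Under these assumptions a doubled connection only fails when both of its vias fail, so the per-connection failure rate drops from p to p squared and the chip-level yield loss from vias all but disappears.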

TSMC went off to fab the initial RV740s. When the chips came back, they were running hotter than ATI expected them to run. They were also leaking more current than ATI expected.

Engineering went to work, tearing the chips apart, looking at them one by one. It didn’t take long to figure out that transistor channel length varied much more than the initial tolerance specs. If channel length varies enough, some parts will run slower than expected, while others will leak tons of current.
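A toy Monte Carlo run shows how a wider spread in channel length pushes more parts into both failure bins. This is only a sketch: it assigns a single channel length per part, draws it from a normal distribution, and uses the usual first-order rule that shorter channels leak more while longer channels switch slower. The nominal length, tolerance, and sigma values are invented for illustration and are not TSMC's or ATI's numbers; `classify_parts` is a hypothetical helper.

```python
import random


def classify_parts(nominal_nm, sigma_nm, tolerance_nm, samples=100_000, seed=0):
    """Return (leaky_fraction, slow_fraction) for simulated parts.

    Parts whose channel is shorter than nominal - tolerance are counted as
    leaky; parts whose channel is longer than nominal + tolerance as slow.
    """
    rng = random.Random(seed)
    leaky = slow = 0
    for _ in range(samples):
        length = rng.gauss(nominal_nm, sigma_nm)
        if length < nominal_nm - tolerance_nm:
            leaky += 1  # short channel: more leakage current
        elif length > nominal_nm + tolerance_nm:
            slow += 1  # long channel: slower switching
    return leaky / samples, slow / samples


# Hypothetical values: same tolerance window, two different process spreads.
tight = classify_parts(nominal_nm=40.0, sigma_nm=1.5, tolerance_nm=4.0)
wide = classify_parts(nominal_nm=40.0, sigma_nm=3.0, tolerance_nm=4.0)
print(f"promised spread: leaky {tight[0]:.2%}, slow {tight[1]:.2%}")
print(f"actual spread:   leaky {wide[0]:.2%}, slow {wide[1]:.2%}")
```

Doubling the spread while keeping the same tolerance window moves a far larger share of parts out of spec on both sides, which matches the mix of slow and leaky chips described above.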

Engineering eventually figured out a way to fix most of the leakage problem through some changes to the RV740 design. The performance was still a problem and the RV740 was mostly lost as a product because of the length of time it took to fix all of this stuff. But it served a much larger role within ATI. It was the pipe cleaner product that paved the way for Cypress and the rest of the Evergreen line.

As for how all of this applies to NVIDIA, it’s impossible to say for sure. But the rumors all seem to support the idea that NVIDIA simply didn’t have the 40nm experience that ATI did. Last December NVIDIA spoke out against TSMC and called for nearly zero via defects.

The rumors surrounding Fermi also point at the same problems ATI encountered with the RV740: yields are low, the chips run hotter than expected, and the clock speeds are lower than their original targets. Granted we haven’t seen any GF100s ship yet, so we don’t know any of it for sure.

When I asked NVIDIA why it was so late with Fermi/GF100, the company pointed to parts of the architecture - not manufacturing. Of course, I was talking to an architect at the time. If Fermi/GF100 was indeed NVIDIA’s learning experience for TSMC’s 40nm, I’d expect that its successor would go much more smoothly.

It’s not that TSMC doesn’t know how to run a foundry, but perhaps the company made a bigger jump than it should have with the move to 40nm:

Process          150nm  130nm  110nm  90nm   80nm   65nm   55nm   40nm
Linear Scaling   -      0.866  0.846  0.818  0.888  0.812  0.846  0.727
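The linear scaling row is simply each node's feature size divided by that of the node before it. A short sketch reproduces it; the area column (linear factor squared) is an idealized extra added here for illustration and is not a figure from TSMC. `scaling_factors` is a hypothetical helper name.

```python
# Process nodes from the table above, in nanometers.
NODES_NM = [150, 130, 110, 90, 80, 65, 55, 40]


def scaling_factors(nodes):
    """Linear shrink of each node relative to its predecessor."""
    return [new / old for old, new in zip(nodes, nodes[1:])]


factors = scaling_factors(NODES_NM)
for old, new, linear in zip(NODES_NM, NODES_NM[1:], factors):
    # Ideal area scaling is the square of the linear factor.
    print(f"{old}nm -> {new}nm: linear {linear:.3f}, ideal area {linear ** 2:.3f}")
```

The printed values are rounded, so the last digit can differ slightly from the table. Either way, the 55nm to 40nm step is the smallest factor in the list, meaning it is the most aggressive shrink of the seven transitions.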

 

You’ll remember that during the Cypress discussion, Carrell was convinced that TSMC’s 40nm process wouldn’t be as cheap as it was being positioned to be. Yet very few others, whether at ATI or NVIDIA, seemed to believe the same. I asked Carrell why that was, why he was able to know what many others didn’t.

Carrell chalked it up to experience and recounted a bunch of stuff that I can’t publish here. Needless to say, he was more skeptical of TSMC’s ability to deliver what it was promising at 40nm. And it never hurts to have a pragmatic skeptic on board.


132 Comments


  • Stas - Sunday, February 14, 2010

    Awesome. Thanks!
  • Adul - Sunday, February 14, 2010

    Really helps pass the time at work today. :) Keep it up.

    Btw when can we expect to see the new site launch?
  • aapocketz - Sunday, February 14, 2010

    [quote]I was convinced that ATI had embraced a new, refocused approach to GPU design, only to learn that they nearly threw out all of the learnings with the RV870. [/quote]

    It sounds like they have had some successes trying different techniques, but without stability in their production process it is hard to repeat success. I understand that they require constant innovation to stay competitive, but throwing out whole processes seems chaotic to me. I would like them to refine and improve successful processes rather than toss everything every time a new business guru is in charge.

    Also half of their effort was all about openness and collaboration. The opening up of the PRS document so that "everyone was involved in the process" seems to clash with the hyper-secret groups where "AMD has since incorporated much of Carrell’s brand of information compartmentalization into how it handled other upcoming features." This seems like a recipe for disaster to me. Which is it: broad openness, collaboration and consensus, or secret teams that have no idea what the other teams are doing?
  • SuperGee - Sunday, February 14, 2010

    The story tells us they didn't know it would be a success, because these decisions were made before the RV770's release. So making the high-risk choice again wasn't a no-brainer; it was a risky choice. We know now that it turned out well.
  • mckirkus - Sunday, February 14, 2010

    It's kind of funny that you're not running one of these companies yet, Anand.

    One of the reasons I check this site on a daily basis is because you also seem to get the business side of the equation. It's downright refreshing to see someone bridging that gap. You pretty much saved the Vertex from self destruction. I'd like to see what interesting things you could build us if you put your mind to it.
  • deputc26 - Sunday, February 14, 2010

    Articles like these are what differentiate AnandTech from all the other sites out there. AnandTech goes from being one of the best review sites out there to something special.

    Beyond excellent, thanks Anand.
  • rickyv - Sunday, February 14, 2010

    As a loyal follower of your website for the past 15 years, I also felt that I just had to register and compliment you on an excellent article.

    With the rapid advancement of technology, it is very easy just to get caught up in the PR and marketing hype or focus only on the numbers game. We often lose sight of the fact that it is teams of dedicated people who make this possible. You have always had the ability to bring out the "human" side to this. I have not seen this on any other site nor in printed form (that is not an unashamedly PR marketing exercise).

    Thanks for staying true to your roots by giving honest opinions of the technology that you review. The latest releases are not necessarily always the greatest (as much as the marketing departments would like us to believe :-) )
  • krish123 - Sunday, February 14, 2010

    After I read the article, I found that the "engine is running well and firing on all cylinders". ATI can create better products in the future. I can trust and buy ATI products, and I hope they deliver better products for my next upgrade.

    Kudos to Anand for the excellent article.

    By the way, a graphics card is a product, not just hardware; it has to work in tandem with the software. It would be better if ATI put some more effort into the driver/software side and fixed all the issues.

    Krish
  • smartalec - Sunday, February 14, 2010

    "When companies like AMD and NVIDIA do a product the engineers don't know all of the answers, and the knowledge they do have isn't binary - it's probability, it's weight, it's guesses. Sometimes they guess right, and sometimes they guess very wrong. The best they can do is to all weigh in with their individual experiences and together come up with the best group of guesses to implement."

    I'm afraid this is how all engineering works. Project managers think that engineers can predict the future. That we know exactly how much time it'll take, and how much risk a given feature will bring.

    We don't. There's a lot of educated guesses. Sometimes we're pleasantly surprised that what we thought was a tough problem wasn't. Sometimes we're the bearer of bad news-- something we assumed would be trivial wasn't.

    My most frustrating issues aren't technical at all. I ask for 2000 hours, and am given 1000. Or I'm given 2 engineers instead of 3, and told to figure out a way to get it done anyway, without impacting the schedule.

    Great article Anand.
  • mckirkus - Sunday, February 14, 2010

    Any decent project manager (I do software) reviews the risks up front with the engineers before building the project plan / timeline.

    Project managers who don't really understand the products they manage are the rule, not the exception. The problem is that great engineers don't always make great PMs. Having a good tech lead helps.
