And I thought the reason people bought consoles was to avoid dealing with the hardware nitty-gritty.

Xbox 360 Revision CPU GPU eDRAM
Xenon/Zephyr 90nm 90nm 90nm
Falcon/Opus 65nm 80nm 80nm
Jasper 65nm 65nm 80nm

First let's get the codenames right. The first Xbox 360 was released in 2005 and used a motherboard codenamed Xenon. The Xenon platform featured a 90nm Xenon CPU (clever naming there), a 90nm Xenos GPU and a 90nm eDRAM. Microsoft added HDMI support to Xenon and called it Zephyr, the big three chips were still all 90nm designs.


From top to bottom: Jasper (Arcade so no chrome on the DVD drive), Xenon, Falcon, Xenon. Can you tell them apart? I'll show you how.

In 2007 the 2nd generation Xbox 360 came out, codenamed Falcon. Falcon featured a 65nm CPU, 80nm GPU and 80nm eDRAM. Falcon came with HDMI by default, but Microsoft eventually made a revision without HDMI called Opus (opus we built a console that fails a lot! sorry, couldn't resist).

Finally, after much speculation, the 3rd generation Xbox 360 started popping up in stores right around the holiday buying season and it's called Jasper. Jasper keeps the same 65nm CPU from Falcon/Opus, but shrinks the GPU down to 65nm as well. The eDRAM remains at 80nm.

When Xenon came out, we bought one and took it apart. The same for Falcon, and naturally, the same for Jasper. The stakes are a lot higher with Jasper however; this may very well be the Xbox 360 to get, not only is it a lot cooler and cheaper for Microsoft to manufacture, but it may finally solve the 360's biggest issue to date.

A Cure for the Red Ring of Death?

The infamous Red Ring of Death (RRoD) has plagued Microsoft since the launch of the Xbox 360. The symptoms are pretty simple: you go to turn on your console and three of the four lights in a circle on your Xbox 360 turn red. I've personally had it happen to two consoles and every single one of my friends who has owned a 360 for longer than a year has had to send it in at least once. By no means is this the largest sample size, but it's a problem that impacts enough Xbox 360 owners for it to be a real issue.

While Microsoft has yet to publicly state the root cause of the problem, we finally have a Microsoft that's willing to admit that its consoles had an unacceptably high rate of failure in the field. The Microsoft solution was to extend all Xbox 360 warranties for the RRoD to 3 years, a solution that managed to help most users but not all.

No one ever got to the bottom of what caused the RRoD. Many suspected that it was the lead-free solder balls between the CPU and/or GPU and the motherboard losing contact. The clamps that Microsoft used to attach the heatsinks to the CPU and GPU put a lot of pressure on the chips; it's possible that the combination of the lead-free solder, a lot of heat from the GPU, inadequate cooling and the heatsink clamps resulted in the RRoD. The CPU and/or GPU would get very hot, the solder would either begin to melt or otherwise dislodge, resulting in a bad connection and an irrecoverable failure. That's where the infamous "towel trick" came into play, wrap your console in a towel so its internals heat up a lot and potentially reseat the misbehaving solder balls.


The glue between the GPU and the motherboard started appearing with the Falcon revision

With the Falcon revision Microsoft seemed to admit to this as being a problem by putting glue between the CPU/GPU and the motherboard itself, presumably to keep the chips in place should the solder weaken. We all suspected that Falcon might reduce the likelihood of the RRoD because shrinking the CPU down to 65nm and the GPU down to 80nm would reduce power consumption, thermal output and hopefully put less stress on the solder balls - if that was indeed the problem. Unfortunately with Falcon Microsoft didn't appear to eliminate RRoD, although anecdotally it seemed to have gotten better.

How About Some Wild Speculation?

This year NVIDIA fell victim to its own set of GPU failures resulting in a smaller-scale replacement strategy than what Microsoft had to implement with the Xbox 360. The NVIDIA GPU problem was well documented by Charlie over at The Inquirer, but in short the issue here was the solder bumps between the GPU die and the GPU package substrate (whereas the problem I just finished explaining is between the GPU package substrate and the motherboard itself).


The anatomy of a GPU, the Falcon Xbox 360 addressed a failure in the solder balls, but perhaps the problem resides in the bumps between the die and substrate?

Traditionally GPUs had used high-lead bumps between the GPU die and the chip package, these bumps can carry a lot of current but are quite rigid, and rigid materials tend to break in a high stress environment. Unlike the bumps between the GPU package and a motherboard (or video card PCB), the solder bumps between a GPU die and the GPU package are connecting two different materials, each with its own rate of thermal expansion. The GPU die itself gets hotter much quicker than the GPU package, which puts additional stress on the bumps themselves. The type of stress also mattered, while simply maintaining high temperatures for a period of time provided one sort of stress, power cycling the GPUs provided a different one entirely - one that eventually resulted in these bumps, and the GPU as a whole, failing.

The GPU failures ended up being most pronounced in notebooks because of the usage model. With notebooks the number of times you turn them on and off in a day is much greater than a desktop, which puts a unique type of thermal stress on the aforementioned solder bumps, causing the sorts of failures that plagued NVIDIA GPUs.

In 2005, ATI switched from high-lead bumps (90% lead, 10% tin) to eutectic bumps (37% lead, 63% tin). These eutectic bumps can't carry as much current as high-lead bumps, they have a lower melting point but most importantly, they are not as rigid as high-lead bumps. So in those high stress situations caused by many power cycles, they don't crack, and thus you don't get the same GPU failure rates in notebooks as you do with NVIDIA hardware.

What does all of this have to do with the Xbox 360 and its RRoD problems? Although ATI made the switch to eutectic bumps with its GPUs in 2005, Microsoft was in charge of manufacturing the Xenos GPU and it was still built with high-lead bumps, just like the failed NVIDIA GPUs. Granted NVIDIA's GPUs back in 2005 and 2006 didn't have these problems, but the Microsoft Xenos design was a bit ahead of its time. It is possible, although difficult to prove given the lack of publicly available documentation, that a similar problem to what plagued NVIDIA's GPUs also plagued the Xbox 360's GPU.

If this is indeed true, then it would mean that the RRoD failures would be caused by the number of power cycles (number of times you turn the box on and off) and not just heat alone. It's a temperature and materials problem, one that (if true) would eventually affect all consoles. It would also mean that in order to solve the problem Microsoft would have to switch to eutectic bumps, similar to what ATI did back in 2005, which would require fairly major changes to the GPU in order to fix. ATI's eutectic designs actually required an additional metal layer, meaning a new spin of the silicon, something that would have to be reserved for a fairly major GPU change.

With Falcon, the GPU definitely got smaller - the new die was around 85% the size of the old die. I surmised that the slight reduction in die size corresponded to either a further optimized GPU design (it's possible to get more area-efficient at the same process node) or a half-node shrink to 80nm; the latter seemed most likely. If Falcon truly only brought a move to 80nm, chances are that Microsoft didn't have enough time to truly re-work the design to include a move to eutectic bumps, they would most likely save that for the transition to 65nm.

Which brings us to Jasper today, a noticeably smaller GPU die thanks to the move to 65nm and a potentially complete fix to the dreaded RRoD. There are a lot of assumptions being made here and it's just as likely that none of this is correct, but given that Falcon and its glue-supported substrates didn't solve RRoD I'm wondering if part of the problem was actually not correctable without a significant redesign of the GPU, something I'm guessing had to happen with the move to 65nm anyways.

It took about a year for RRoD to really hit a critical mass with Xenon and it's only now been about a year for Falcon, so only time will tell if Jasper owners suffer the same fate. One thing is for sure, if it's a GPU design flaw, then Jasper was Microsoft's chance to correct it. And if it's a heat issue, Jasper should reduce the likelihood as well.

Who knows, after three years of production you may finally be able to buy an Xbox 360 that won't die on you.

Identifying a Jasper
POST A COMMENT

82 Comments

View All Comments

  • adhoc - Wednesday, December 10, 2008 - link

    I think the glue is probably there to minimize stress fractures between the PCB and BGA chips which bend at different rates from the high-force heatsink. When large (both in area and weight) heatsinks are tightly fastened to PCBs (like many PC motherboards and I think the Xbox360), it causes the PCB to bend. I would suspect the glue is used as a propping mechanism so that when the PCB flexes, the BGA chip does as well starting at the glued edges rather than the solder points of the BGAs. It also might have an effect of "broadening" the flex of the PCB under the BGA so that the surface of the PCB won't be so concave between pins. Reply
  • Staples - Wednesday, December 10, 2008 - link

    I really think the whole solder compound rumor is a myth. If it were that simple, then I am sure MS could have saved themselves a ton of money by using a different compound. It may cost a dollar more per system but it would save them 50$ a console since they would not have to replace 50% of them. Sure the solder joints may break but I doubt that it is because of it being made of the wrong material.

    And about the lead free solder, if that is what they are really using (which they should), it is possibly more brittle and could break under stress.
    Reply
  • The0ne - Wednesday, December 10, 2008 - link

    Typically it's not the compound but more so due to bad soldering/contact. We've recently had similar issues with a few of our PCBs and it had me chasing down the cause down to the compound as well. As it turned out their SMT was fully capable one way or another where we had lifted pins, cold joints, reverse components, etc. These data didn't really showed until we started gathering more and more data to prove it. Reply
  • sprockkets - Wednesday, December 10, 2008 - link

    Isn't it true now that all solder has to be lead free? Reply
  • UltraWide - Wednesday, December 10, 2008 - link

    Only if you want to be RoHS compliant. Reply
  • Zorlac - Wednesday, December 10, 2008 - link

    I think you meant "RRoD Compliant" ;) Reply
  • sprockkets - Wednesday, December 10, 2008 - link

    But now you cannot sell anything that is not RoHS compliant in the EU. Reply
  • sprockkets - Wednesday, December 10, 2008 - link

    OK, AMD's presentation explained it. Reply
  • strikeback03 - Wednesday, December 10, 2008 - link

    In the US we would say the number 12.1 as "Twelve point one". In places where comma and period usage are switched, how would 12,1 be spoken? Reply
  • Spoelie - Thursday, December 11, 2008 - link

    12,1
    "twaalf komma een"
    just as you say "point", we say "comma"
    Reply

Log in

Don't have an account? Sign up now