Hardware Behind the Consoles - Part II: Nintendo's GameCube

Author: Anand Lal Shimpi

by Anand Lal Shimpi on December 7, 2001 3:44 AM EST

Posted in Systems

"If cache is so fast, then why isn't everything made out of it?"

One of the most interesting things about the GameCube design is its focus on memory bandwidth efficiency. It achieves this efficiency through a special type of memory known as 1T-SRAM, which offers lower latency and higher overall bus utilization than conventional DRAM. But before you can understand exactly what that is, you have to look at the differences between conventional DRAM and SRAM.

The cache on the die of the Gekko CPU, or of any other CPU for that matter, is a type of RAM known as Static RAM or SRAM. The "static" comes from the fact that, unlike DRAM (Dynamic Random Access Memory), SRAM cells do not have to be constantly "refreshed" in order to retain their data. DRAM stores each bit as charge on a capacitor, and that charge leaks away over time, so it must be periodically refreshed for the cell to keep its data. One of the reasons DRAM is so much slower than SRAM is this constant refreshing process. It turns out that reading a DRAM cell also refreshes it, so the most common way of refreshing DRAM cells is simply to read their contents.
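To make the refresh-on-read idea concrete, here is a toy Python model of a single DRAM cell. It is only a sketch under stated assumptions: the leak rate and sense threshold are made-up illustrative numbers, and `DramCell` and `run` are hypothetical names, not part of any real simulator. A cell that is never read loses its stored 1, while a cell that is read periodically keeps it, because each read writes the full charge back.

```python
from typing import Optional

# Toy model of one DRAM cell: a leaky capacitor whose charge is restored
# whenever the cell is read. ASSUMED illustrative values, not device data.
LEAK_PER_TICK = 0.05   # fraction of full charge lost per time step
THRESHOLD = 0.5        # sense amplifier reads >= THRESHOLD as a 1


class DramCell:
    def __init__(self, bit: int) -> None:
        self.charge = 1.0 if bit else 0.0

    def tick(self) -> None:
        # Charge leaks away over time, so a stored 1 drifts toward 0.
        self.charge = max(0.0, self.charge - LEAK_PER_TICK)

    def read(self) -> int:
        # Sensing yields a bit, and the full-strength value is written back,
        # which is exactly what a refresh does.
        bit = 1 if self.charge >= THRESHOLD else 0
        self.charge = 1.0 if bit else 0.0
        return bit


def run(ticks: int, read_every: Optional[int]) -> int:
    """Store a 1, let `ticks` steps pass, optionally reading every
    `read_every` steps, then return the final read."""
    cell = DramCell(1)
    for t in range(1, ticks + 1):
        cell.tick()
        if read_every and t % read_every == 0:
            cell.read()
    return cell.read()


print(run(40, None))  # 0: never refreshed, the 1 has leaked away
print(run(40, 5))     # 1: periodic reads kept the charge topped up
```

Real DRAM controllers issue dedicated refresh commands on a fixed schedule rather than relying on incidental reads, but the underlying mechanism (sense, then restore) is the same one this toy models.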

This is perfectly fine except when the cells being refreshed need to be read from or written to at the same time. SRAM avoids this problem by using a combination of usually 4 to 6 transistors to statically hold each bit. DRAM, on the other hand, uses only a single transistor in combination with a capacitor to hold each bit. This single-transistor cell greatly reduces the die size of DRAM, making it cheaper to manufacture, but the capacitor also introduces the refresh problem mentioned above.

Here you can see the problem with using conventional SRAM in large quantities: for the same cost, you can get several times as much DRAM as SRAM. The cost of SRAM prohibits it from being used as main memory, but it makes perfect sense in small amounts, such as in a cache.
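A quick back-of-the-envelope count shows where the cost gap comes from. This sketch counts only the storage-cell transistors in a 64 Mbit array (taking 64 Mbit as 2^26 bits), using the 4-to-6-transistor SRAM cell and 1-transistor DRAM cell described above. It ignores capacitor area and peripheral circuitry such as decoders and sense amplifiers, so treat it as a rough proxy for relative die area, not a cost model.

```python
# Storage-cell transistor count for a 64 Mbit array.
# ASSUMPTION: cell transistors only; capacitors and peripheral logic ignored.
BITS_64MBIT = 64 * 1024 * 1024  # 2**26 = 67,108,864 bits


def cell_transistors(bits: int, transistors_per_bit: int) -> int:
    return bits * transistors_per_bit


sram_6t = cell_transistors(BITS_64MBIT, 6)
sram_4t = cell_transistors(BITS_64MBIT, 4)
dram_1t = cell_transistors(BITS_64MBIT, 1)

print(f"6T SRAM: {sram_6t:,} transistors")  # 402,653,184
print(f"4T SRAM: {sram_4t:,} transistors")  # 268,435,456
print(f"1T DRAM: {dram_1t:,} transistors")  # 67,108,864
print(f"SRAM/DRAM ratio: {sram_4t // dram_1t}x to {sram_6t // dram_1t}x")  # 4x to 6x
```

Even before accounting for the DRAM capacitor, the SRAM array needs four to six times as many cell transistors for the same capacity, which is why SRAM is reserved for small caches.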

A company called Monolithic System Technology, Inc. (MoSys) came up with a clever DRAM design that gives it many of the performance benefits of SRAM without incurring a huge cost penalty.

The technology that has earned MoSys all of its attention is what the company calls 1T-SRAM. The name implies that they have been able to produce SRAM using only a single transistor (1T) per bit instead of the much more common 6. In reality, 1T-SRAM is much more like a special form of DRAM than it is like SRAM: it still requires its memory cells to be refreshed in order to retain their data, and the only difference is its very efficient method of refreshing those cells. According to MoSys, their 1T-SRAM design hides the refresh process so effectively that they can claim latency and bandwidth figures that rival, although do not surpass, those of conventional SRAM. This is very difficult to verify, since 1T-SRAM has appeared in very few testable platforms, but it's clear that the technology does allow for lower latency accesses and higher memory bandwidth utilization. But at what cost?
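One commonly described way to hide refresh is to split the memory into many small banks and steer each refresh to a bank that is not being accessed. The toy simulation below compares that idea against a naive policy in which a refresh blocks the whole array for a cycle. It is a sketch under loud assumptions: the bank count, refresh interval, round-robin refresh order and the `cycles_needed` function are all hypothetical, and this is not a description of MoSys' actual controller.

```python
# Toy comparison of two refresh policies for a banked DRAM array.
# ASSUMPTION: the "hidden" policy only illustrates the general idea of steering
# refreshes to idle banks; it is not MoSys' actual 1T-SRAM controller design.


def cycles_needed(accesses: list[int], num_banks: int,
                  refresh_every: int, hidden: bool) -> tuple[int, int]:
    """Serve `accesses` (each entry is a bank index), at most one per cycle.

    Every `refresh_every` cycles a refresh of the next bank (round-robin)
    becomes due. Returns (total_cycles, refreshes_still_pending).
      hidden=False: a due refresh occupies the array for a cycle and the
                    pending access stalls.
      hidden=True:  a due refresh runs in parallel with the access when it
                    targets a different bank; otherwise it waits a cycle.
    """
    cycles = 0
    pending = 0       # refreshes that are due but not yet performed
    next_bank = 0     # round-robin refresh pointer
    i = 0             # index of the next access to serve
    while i < len(accesses):
        cycles += 1
        if cycles % refresh_every == 0:
            pending += 1
        if pending and not hidden:
            pending -= 1                              # cycle spent refreshing
            next_bank = (next_bank + 1) % num_banks
            continue                                  # access stalls
        if pending and hidden and next_bank != accesses[i]:
            pending -= 1                              # refresh an idle bank
            next_bank = (next_bank + 1) % num_banks
        i += 1                                        # access completes
    return cycles, pending


pattern = [i % 8 for i in range(1000)]   # accesses sweep across 8 banks
print(cycles_needed(pattern, num_banks=8, refresh_every=10, hidden=False))  # (1111, 0)
print(cycles_needed(pattern, num_banks=8, refresh_every=10, hidden=True))   # (1000, 0)
```

With this access pattern the blocking policy loses one cycle in every ten to refresh, while the banked policy serves one access per cycle and still completes every refresh. The toy also exposes the weak spot: if every access hit the bank due for refresh, the hidden policy would defer that refresh indefinitely, so a real design needs some additional mechanism for that case.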

MoSys claims that a 64Mbit 1T-SRAM die is 10 - 15% larger than a 64Mbit SDRAM die. While that may not seem like much, keep in mind that a 64Mbit RDRAM device is 15 - 30% larger than the same 64Mbit SDRAM. In terms of die size, that puts 1T-SRAM's added cost over SDRAM anywhere from one third of to the same as RDRAM's added cost over SDRAM (production cost only, excluding license royalties). 1T-SRAM is nevertheless much cheaper than regular SRAM, again because each bit uses a single transistor rather than the 6 found in most SRAM designs.
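The one-third-to-one range comes from pairing the extremes of the two quoted overheads: the smallest 1T-SRAM overhead against the largest RDRAM overhead (10/30), and the largest against the smallest (15/15). A minimal sketch of that arithmetic, assuming, as the text does, that die-area overhead is a fair stand-in for production cost:

```python
from fractions import Fraction

# Die-area overheads relative to a 64Mbit SDRAM, in percent, as quoted above.
ONE_T_SRAM = (10, 15)   # 1T-SRAM: 10 - 15% larger
RDRAM = (15, 30)        # RDRAM: 15 - 30% larger


def overhead_ratio_range(a: tuple[int, int], b: tuple[int, int]) -> tuple[Fraction, Fraction]:
    """Smallest and largest possible value of (overhead a) / (overhead b)."""
    lo = Fraction(min(a), max(b))
    hi = Fraction(max(a), min(b))
    return lo, hi


lo, hi = overhead_ratio_range(ONE_T_SRAM, RDRAM)
print(lo, hi)  # 1/3 1
```

Using `Fraction` keeps the result exact, so the bounds print as 1/3 and 1 rather than as rounded decimals.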

The performance benefits of 1T-SRAM are very difficult to quantify because we've never seen it on a benchmarkable platform, which makes assessing its value equally difficult. Needless to say, we didn't present this explanation for no reason: Nintendo saw fit to make heavy use of MoSys' 1T-SRAM in its GameCube design.

6 Comments

cubeguy2k5 - Monday, December 20, 2004 - link
feel that anandtechs article on xbox vs ps2 vs gamecube didnt go in depth enough, guessed at too many things, and intentionally got others wrong, not sure where to discuss this at, would like to get a thread going.....

"However details on this processor are sketchy at best but the information we've been able to gather points at a relatively unmodified PowerPC 750CXe microprocessor " - where did they gather this from? gekko isnt a PPC 750CXE or it would be marked as such.

"The Flipper graphics core is a fairly simple fixed function GPU aided by some very powerful amounts of memory bandwidth, but first onto the architecture of the graphics core. Flipper always operates on 4 pixels at a time using its 4 pixel pipelines; each of those pipelines is capable of applying one texture per pipeline which immediately tips you off that the ArtX design wasn't influenced by ATI at all. Since the Radeon and GeForce2, both ATI and NVIDIA's cores have been able to process a minimum of two textures per pixel in each of their pipelines which came quite in handy since none of today's games are single textured anymore." - who told them that gamecube only has one texture unit per pipeline? it wasnt nintendo, i could just as easily say it has 2, doubling texel bandwidth....... who said it was fixed function?

"Planet GameCube: In a recent IGNinsider article, Greg Buchner revealed that Flipper can do some unique things because of the ways that the different texture layers can interact. Can you elaborate on this feature? Have you used it? Do you know if the effects it allows are reproducible on other architectures (at decent framerates)?

Julian Eggebrecht: He was probably referring to the TEV pipeline. Imagine it like an elaborate switchboard that makes the wildest combinations of textures and materials possible. The TEV pipeline combines up to 8 textures in up to 16 stages in one go. Each stage can apply a multitude of functions to the texture - obvious examples of what you do with the TEV stages would be bump-mapping or cel-shading. The TEV pipeline is completely under programmer control, so the more time you spend on writing elaborate shaders for it, the more effects you can achieve. We just used the obvious effects in Rogue Leader with the targeting computer and the volumetric fog variations being the most unusual usage of TEV. In a second generation game we’ll obviously focus on more complicated applications."

The TEV pipeline is completely under programmer control, so the more time you spend on writing elaborate shaders for it, the more effects you can achieve. COMPLETELY UNDER PROGRAMMER CONTROL MEANS NOT FIXED FUNCTION, and on fixed function GPUs you cannot do advanced shader effects in realtime can you? rogue leader and rebel strike use them EXTENSIVELY.... anandtech.... wheres your explanation?

ill provide more examples later....

"Julian Eggebrecht: Maybe without going into too much detail, we don’t think there is anything visually you could do on X-Box (or PS2) which can’t be done on GameCube. I have read theories on the net about Flipper not being able to do cube-mapped environment maps, fur shading, self-shadowing etc... That’s all plain wrong. Rogue does extensive self-shadowing and both cube-maps and fur shading are not anymore complicated to implement on GameCube than on X-Box. You might be doing it differently, but the results are the same. When I said that X-Box and GameCube are on par power-wise I really meant it. " looks like a PROVEN DEVELOPER just proved anandtech is WRONG... nice..... factor5 was involved in the creation of cube, they know it better than ANYONE else, including anandtech....

come on anandtech, i know you see this article... what about this?

you clearly state that you believe xbox is ageneration ahead of gamecube technically, when you COULD NOT do any of the shader effects nor the amount of bumpmapping thats in rogue leader even, on a pre GF3 GPU, let alone rebel strike..... what about the water effects in rebel strike, mario sunshine, waverace, i do believe that in 2001, not one game had water even on pc, even CLOSE to waverace in terms of how it looked, and the physics behind it, and in 2002 there wasnt one game close to mario sunshine as far as water goes, wow!..... what about all the nice fully dynamic lighting in RE4, and rebel strike? you couldnt pull that off on a fixed function gpu could you? apparently they cant even pull it off on xbox, when halo2 has massive slowdown, mostly static lighting, an abysmal polygon count, coupled with lod pop in, and various other problems/faked effects.... nice, what about ninja gaiden ? same story, good character models, very bad textures, non existant lighting, shadows that seem to react to non existant lightsources that exist inside of walls..... cute.....

http://www.geocities.com/cube_guy_2k5/ng3.jpg

nice textures and lack of lighting... low polycount and invisible lightsources that seem to only allow ryu to cast shadows, not the environment, wow.... what bout the faked reflections used in the game?... neat
Cooe - Tuesday, August 18, 2020 - link
The fanboy delusions are strong with this one...
Arkz - Saturday, September 17, 2011 - link
"the other incorrectly labeled digital AV (it's still an analog signal) for component connections."

wrong, its purely digital. the component cable has a DAC chip in the connector block. technically they could make a DVI cable for it.
Arkz - Saturday, September 17, 2011 - link
and gc cpu is 485 not 500
ogamespec - Thursday, August 8, 2013 - link
Actually Gekko speed is 486 ( 162 x 3) MHz.

And Gamecube GPU (Flipper) TEV is fixed stage. No custom shaders.
techFan1988 - Wednesday, May 4, 2022 - link
Mmmm I understand that now we have much better information than back then, but I find this piece of the article a bit skewed towards the Xbox (or against the GC).
There are a couple of aspects that are factually wrong, for example:
"However from all of that data that we have seen comparing the PowerPC 750 to even the desktop Intel Celeron processor, it does not seem that the Gekko can compete, performance-wise."

The original PowerPC 750 didn't even have on-die L2 cache, so saying "it doesn't compete with a Celeron coppermine processor" is absolutely unfair (it would be like comparing the first versions of the P3 -the ones running at 500Mhz- with the Coppermine ones).

To grab the original PPC 750 and compare it to a coppermine celeron 128 (the ones based on the P3 architecture and the one feeding the Xbox -although with a faster bus which was comparable to that of a regular P3) is not a fair comparison.

At least, since this was a modification of the PPC750 CXe (and not the original PPC750) the author of the article should have compared that CPU to the Celeron and not the original PPC 750.

I mean, the difference between P3 first gen and P3 coppermine was even bigger than the difference between P2 and P3 just because of the integrated L2 caché!
How could this factor be ignored when comparing GC's and Xbox's CPUs?
