Embedded DRAM in Flipper

On the Flipper side of things, MoSys' 1T-SRAM is used on Flipper's die, built on NEC's 0.18-micron embedded DRAM manufacturing process, to provide two very large caches: a 2MB Z-buffer and a 1MB texture cache. This is much cheaper than outfitting the chip with 3MB of conventional SRAM, which would rival most server CPUs in terms of die space and cost (GameCube would not be as successful if Nintendo lost $500 per console, nor would it be if they charged $700 per console). At the same time, it is theoretically faster than conventional embedded DRAM, thanks to the aforementioned benefits of 1T-SRAM.

The Flipper GPU is composed of 51 million transistors, approximately half of which are dedicated to this on-die 1T-SRAM. Because a conventional SRAM cell stores each bit in six transistors while a 1T-SRAM cell uses just one, building the same 3MB out of conventional SRAM would push Flipper to over 170 million transistors and a die much larger than both of the Xbox chips put together. The decision to use 1T-SRAM instead of conventional SRAM was necessary in order to outfit Flipper with this much memory.
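
A quick sanity check shows where these figures come from. This is our own back-of-envelope sketch, assuming the textbook cell sizes (one transistor per 1T-SRAM bit, six per conventional SRAM bit) and ignoring array overhead:

```python
# Back-of-envelope check of Flipper's transistor budget.

MB = 1024 * 1024                  # bytes per megabyte
bits = 3 * MB * 8                 # 3MB of on-die memory, in bits

t_1t = bits * 1                   # 1T-SRAM: one transistor per bit
t_6t = bits * 6                   # conventional SRAM: six transistors per bit
logic = 51_000_000 - t_1t         # whatever remains of the 51M budget

print(f"1T-SRAM array: {t_1t / 1e6:.1f}M transistors")   # ~25.2M, about half of 51M
print(f"6T SRAM array: {t_6t / 1e6:.1f}M transistors")   # ~151.0M
print(f"logic + 6T:    {(logic + t_6t) / 1e6:.1f}M")     # ~176.8M, i.e. "over 170 million"
```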

The 2MB Z-buffer/frame buffer is extremely helpful since we already know, from our experimentation with HyperZ and with deferred rendering architectures, that Z-buffer accesses are very memory bandwidth intensive. This on-die Z-buffer keeps all of those accesses from hogging the limited main memory bandwidth the Flipper GPU is granted. In terms of specifics, the 2MB is made up of 4 1T-SRAM devices, each with its own 96-bit wide interface, for a total of 7.8GB/s of bandwidth; that rivals the highest end Radeon 8500 and GeForce3 Ti 500 in terms of how much bandwidth is available to the Z-buffer. Z-buffer checks should occur very quickly on the Flipper GPU as a result of this very fast 1T-SRAM. The surface currently being drawn is also stored in this 2MB buffer and only later sent off to external memory for display, further reducing Flipper's dependence on main memory bandwidth.

The 1MB texture cache helps texture load performance as well, although the impact isn't nearly as big as that of the 2MB Z-buffer. This cache is made up of 32 1T-SRAM devices (256Kbit each), each with its own 16-bit bus, together offering 10.4GB/s of bandwidth.
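
Both bandwidth figures fall out of the same simple math. The sketch below assumes the 1T-SRAM arrays run at Flipper's 162MHz core clock and transfer once per clock; that clocking is our assumption, not a published spec:

```python
# Reproducing the 7.8GB/s and 10.4GB/s figures quoted above.

CLOCK_HZ = 162_000_000  # Flipper's core clock; one transfer per clock assumed

def edram_bandwidth_gbps(devices: int, bus_bits: int) -> float:
    """Aggregate bandwidth (GB/s) of a bank of eDRAM devices on parallel buses."""
    bytes_per_clock = devices * bus_bits / 8
    return bytes_per_clock * CLOCK_HZ / 1e9

print(f"Z-buffer:      {edram_bandwidth_gbps(4, 96):.1f} GB/s")   # 4 x 96-bit  -> ~7.8
print(f"texture cache: {edram_bandwidth_gbps(32, 16):.1f} GB/s")  # 32 x 16-bit -> ~10.4
```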

The first thing that should tip you off about these 1T-SRAM devices on the Flipper die is how handy they would be on a PC platform. Although the Flipper GPU will never be asked to render at greater than 640 x 480 (not a very memory bandwidth intensive resolution), very few gamers will settle for anything less than 1024 x 768 with today's graphics cards. A similar on-die Z-buffer would improve performance tremendously, especially considering how much more memory bandwidth is consumed in most PC games. But while it would be nice for ATI to consider this style of technology in their future PC products, the cost would be highly prohibitive.
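
A rough footprint sketch makes the cost problem concrete. We assume 24-bit color plus 24-bit Z for the GameCube case and 32-bit color plus 32-bit Z/stencil for the PC case; actual formats vary, but the gap is what matters:

```python
# Why 2MB of on-die memory is enough at 640x480 but not at PC resolutions.

def framebuffer_mb(width: int, height: int, color_bytes: int, z_bytes: int) -> float:
    """Size of a color + Z framebuffer in megabytes."""
    return width * height * (color_bytes + z_bytes) / (1024 * 1024)

# GameCube case: 24-bit color + 24-bit Z
print(f"640x480:  {framebuffer_mb(640, 480, 3, 3):.2f} MB")   # ~1.76MB -> fits in 2MB

# PC case: 32-bit color + 32-bit Z/stencil
print(f"1024x768: {framebuffer_mb(1024, 768, 4, 4):.2f} MB")  # 6.00MB -> far beyond 2MB
```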

There is also potential for Flipper to become cheaper to produce as time goes on. NEC is currently in production with their 0.15-micron embedded DRAM process, but it is not as mature as their 0.18-micron eDRAM production, which is why Flipper is still built on the older process. By the second half of next year the 0.13-micron eDRAM process should be ready for production, meaning we should see 0.13-micron Flipper GPUs in 2003. The move to a 0.13-micron process could cut the 106 mm^2 Flipper die roughly in half, making it much cheaper to produce, but that is all dependent on NEC.
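
That "cut in half" figure is simply ideal shrink scaling: die area goes with the square of the feature-size ratio. A quick sketch (ideal scaling only; real shrinks fall short of this, since I/O pads and analog blocks don't scale linearly):

```python
# Ideal die-area scaling for a 0.18-micron -> 0.13-micron shrink.

old_area_mm2 = 106.0
scale = (0.13 / 0.18) ** 2  # area scales with the square of the linear shrink

print(f"scale factor: {scale:.2f}")                      # ~0.52, roughly half
print(f"shrunk die:   {old_area_mm2 * scale:.0f} mm^2")  # ~55 mm^2
```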

Comments

  • cubeguy2k5 - Monday, December 20, 2004

    I feel that AnandTech's article on Xbox vs. PS2 vs. GameCube didn't go in depth enough, guessed at too many things, and intentionally got others wrong. Not sure where to discuss this, but I'd like to get a thread going...

    "However details on this processor are sketchy at best but the information we've been able to gather points at a relatively unmodified PowerPC 750CXe microprocessor " - where did they gather this from? gekko isnt a PPC 750CXE or it would be marked as such.

    "The Flipper graphics core is a fairly simple fixed function GPU aided by some very powerful amounts of memory bandwidth, but first onto the architecture of the graphics core. Flipper always operates on 4 pixels at a time using its 4 pixel pipelines; each of those pipelines is capable of applying one texture per pipeline which immediately tips you off that the ArtX design wasn't influenced by ATI at all. Since the Radeon and GeForce2, both ATI and NVIDIA's cores have been able to process a minimum of two textures per pixel in each of their pipelines which came quite in handy since none of today's games are single textured anymore." - who told them that gamecube only has one texture unit per pipeline? it wasnt nintendo, i could just as easily say it has 2, doubling texel bandwidth....... who said it was fixed function?

    "Planet GameCube: In a recent IGNinsider article, Greg Buchner revealed that Flipper can do some unique things because of the ways that the different texture layers can interact. Can you elaborate on this feature? Have you used it? Do you know if the effects it allows are reproducible on other architectures (at decent framerates)?

    Julian Eggebrecht: He was probably referring to the TEV pipeline. Imagine it like an elaborate switchboard that makes the wildest combinations of textures and materials possible. The TEV pipeline combines up to 8 textures in up to 16 stages in one go. Each stage can apply a multitude of functions to the texture - obvious examples of what you do with the TEV stages would be bump-mapping or cel-shading. The TEV pipeline is completely under programmer control, so the more time you spend on writing elaborate shaders for it, the more effects you can achieve. We just used the obvious effects in Rogue Leader with the targeting computer and the volumetric fog variations being the most unusual usage of TEV. In a second generation game we’ll obviously focus on more complicated applications."

    "The TEV pipeline is completely under programmer control, so the more time you spend on writing elaborate shaders for it, the more effects you can achieve." COMPLETELY UNDER PROGRAMMER CONTROL MEANS NOT FIXED FUNCTION, and on fixed function GPUs you cannot do advanced shader effects in real time, can you? Rogue Leader and Rebel Strike use them EXTENSIVELY... AnandTech... where's your explanation?

    I'll provide more examples later...

    "Julian Eggebrecht: Maybe without going into too much detail, we don’t think there is anything visually you could do on X-Box (or PS2) which can’t be done on GameCube. I have read theories on the net about Flipper not being able to do cube-mapped environment maps, fur shading, self-shadowing etc... That’s all plain wrong. Rogue does extensive self-shadowing and both cube-maps and fur shading are not anymore complicated to implement on GameCube than on X-Box. You might be doing it differently, but the results are the same. When I said that X-Box and GameCube are on par power-wise I really meant it. " looks like a PROVEN DEVELOPER just proved anandtech is WRONG... nice..... factor5 was involved in the creation of cube, they know it better than ANYONE else, including anandtech....


    come on anandtech, i know you see this article... what about this?

    You clearly state that you believe Xbox is a generation ahead of GameCube technically, when you COULD NOT do any of the shader effects, nor the amount of bump mapping that's in even Rogue Leader, on a pre-GF3 GPU, let alone Rebel Strike... What about the water effects in Rebel Strike, Mario Sunshine, and Wave Race? I do believe that in 2001 not one game, even on PC, came even CLOSE to Wave Race in terms of how its water looked and the physics behind it, and in 2002 there wasn't one game close to Mario Sunshine as far as water goes, wow!... What about all the nice fully dynamic lighting in RE4 and Rebel Strike? You couldn't pull that off on a fixed function GPU, could you? Apparently they can't even pull it off on Xbox, when Halo 2 has massive slowdown, mostly static lighting, an abysmal polygon count, coupled with LOD pop-in and various other problems/faked effects... nice. What about Ninja Gaiden? Same story: good character models, very bad textures, nonexistent lighting, shadows that seem to react to nonexistent light sources that exist inside of walls... cute.

    http://www.geocities.com/cube_guy_2k5/ng3.jpg

    Nice textures and lack of lighting... low polycount and invisible light sources that seem to only allow Ryu to cast shadows, not the environment, wow... What about the faked reflections used in the game? Neat.
  • Cooe - Tuesday, August 18, 2020

    The fanboy delusions are strong with this one...
  • Arkz - Saturday, September 17, 2011

    "the other incorrectly labeled digital AV (it's still an analog signal) for component connections."

    Wrong, it's purely digital. The component cable has a DAC chip in the connector block; technically they could make a DVI cable for it.
  • Arkz - Saturday, September 17, 2011

    And the GC CPU is 485, not 500.
  • ogamespec - Thursday, August 8, 2013

    Actually, Gekko's speed is 486 (162 x 3) MHz.

    And the GameCube's GPU (Flipper) TEV is fixed-stage; no custom shaders.
  • techFan1988 - Wednesday, May 4, 2022

    Mmmm, I understand that we now have much better information than we did back then, but I find this piece of the article a bit skewed towards the Xbox (or against the GC).
    There are a couple of aspects that are factually wrong, for example:
    "However from all of that data that we have seen comparing the PowerPC 750 to even the desktop Intel Celeron processor, it does not seem that the Gekko can compete, performance-wise."

    The original PowerPC 750 didn't even have on-die L2 cache, so saying "it doesn't compete with a Celeron Coppermine processor" is absolutely unfair (it would be like comparing the first versions of the P3, the ones running at 500MHz, with the Coppermine ones).

    To grab the original PPC 750 and compare it to a Coppermine Celeron 128 (the one based on the P3 architecture and the one feeding the Xbox, although with a faster bus comparable to that of a regular P3) is not a fair comparison.

    At the very least, since this was a modification of the PPC 750CXe (and not the original PPC 750), the author of the article should have compared that CPU to the Celeron instead of the original PPC 750.

    I mean, the difference between the first-gen P3 and the Coppermine P3 was even bigger than the difference between the P2 and the P3, just because of the integrated L2 cache!
    How could this factor be ignored when comparing GC's and Xbox's CPUs?
