Hardware Behind the Consoles - Part II: Nintendo's GameCube
by Anand Lal Shimpi on December 7, 2001 3:44 AM EST
Embedded DRAM in Flipper
On the Flipper side of things, MoSys' 1T-SRAM is built onto Flipper's die using NEC's 0.18-micron embedded DRAM manufacturing process, providing two very large caches: a 2MB Z-buffer and a 1MB texture cache. This is cheaper than outfitting the chip with 3MB of conventional SRAM, which would rival most server CPUs in terms of die space and cost (GameCube would not be as successful if Nintendo lost $500 per console, nor would it be if they charged $700 per console). It's also theoretically faster than conventional embedded DRAM, thanks to the aforementioned benefits of 1T-SRAM.
The Flipper GPU is composed of 51 million transistors, approximately half of which are dedicated to this on-die 1T-SRAM. If Flipper were to use conventional SRAM it would feature over 170 million transistors and have a die much larger than both of the Xbox chips put together. Using 1T-SRAM instead of conventional SRAM was the only practical way to outfit Flipper with this much memory.
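The transistor figures above can be roughly sanity-checked with some simple arithmetic. The sketch below is only an estimate: it assumes 6 transistors per bit for conventional SRAM and 1 transistor per bit for 1T-SRAM, and it ignores supporting circuitry such as decoders and sense amplifiers. Those per-bit counts and all of the names in the code are our assumptions, not figures from Nintendo, ATI or MoSys.

```python
# Back-of-the-envelope transistor estimate for Flipper's 3MB of on-die memory.
# Assumptions (not from the article): 6 transistors per bit for conventional
# SRAM, 1 transistor per bit for 1T-SRAM, no overhead for peripheral circuitry.

BITS_PER_MB = 8 * 1024 * 1024

def memory_transistors(megabytes, transistors_per_bit):
    """Transistors needed for the storage cells alone."""
    return megabytes * BITS_PER_MB * transistors_per_bit

EMBEDDED_MB = 3              # 2MB Z-buffer + 1MB texture cache
FLIPPER_TOTAL = 51_000_000   # total transistor count quoted for Flipper

one_t = memory_transistors(EMBEDDED_MB, 1)   # 1T-SRAM cells
six_t = memory_transistors(EMBEDDED_MB, 6)   # hypothetical 6T SRAM cells
logic = FLIPPER_TOTAL - one_t                # everything that is not memory
sram_total = logic + six_t                   # Flipper rebuilt with 6T SRAM

print(f"1T-SRAM cells:          {one_t:,}")
print(f"Share of Flipper:       {one_t / FLIPPER_TOTAL:.0%}")
print(f"Flipper with 6T SRAM:   {sram_total:,}")
```

Under these assumptions the 1T-SRAM cells account for about half of the 51 million transistors, and a conventional SRAM version lands a little above 170 million, in line with the figures quoted above.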
The 2MB Z-buffer/frame buffer is extremely helpful since we already know from our experimentation with HyperZ and deferred rendering architectures that Z-buffer accesses are very memory bandwidth intensive. This on-die Z-buffer keeps all of those accesses from hogging the limited amount of main memory bandwidth the Flipper GPU is granted. In terms of specifics, four 1T-SRAM devices make up this 2MB. Each device has its own 96-bit wide interface, for a total of 7.8GB/s of bandwidth, which rivals the highest end Radeon 8500 and GeForce3 Ti 500 in terms of how much bandwidth is available to the Z-buffer. Z-buffer checks should occur very quickly on the Flipper GPU as a result of this very fast 1T-SRAM. The surface currently being drawn is also stored in this 2MB buffer and only later sent off to external memory for display, which further reduces Flipper's dependence on main memory bandwidth.
The 1MB texture cache helps texture load performance, but its impact isn't nearly as big as that of the 2MB Z-buffer. The cache is made up of 32 1T-SRAM devices (256Kbit each), each with its own 16-bit bus, offering a total of 10.4GB/s of bandwidth to this cache.
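Both bandwidth figures follow from the bus widths quoted above once a clock speed is plugged in. The sketch below assumes the embedded memory runs at Flipper's 162MHz core clock and transfers data once per clock; the clock is not stated in this section, so treat it (and the function name) as our assumption.

```python
# Peak bandwidth of Flipper's embedded 1T-SRAM arrays.
# Assumption (not stated in this section): each device transfers once per
# clock at a 162MHz core clock.

CLOCK_HZ = 162_000_000

def peak_bandwidth_gb_s(devices, bus_bits, clock_hz=CLOCK_HZ):
    """Aggregate peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_clock = devices * bus_bits / 8
    return bytes_per_clock * clock_hz / 1e9

z_buffer = peak_bandwidth_gb_s(devices=4, bus_bits=96)    # 384 bits wide
texture = peak_bandwidth_gb_s(devices=32, bus_bits=16)    # 512 bits wide

# Capacity check: 32 devices of 256Kbit each, taking 1Kbit = 1024 bits.
texture_cache_mb = 32 * 256 * 1024 / 8 / (1024 * 1024)

print(f"Z-buffer bandwidth:      {z_buffer:.1f} GB/s")
print(f"Texture cache bandwidth: {texture:.1f} GB/s")
print(f"Texture cache capacity:  {texture_cache_mb:.0f} MB")
```

With that clock assumed, the four 96-bit interfaces work out to roughly 7.8GB/s and the 32 16-bit buses to roughly 10.4GB/s, matching the numbers above.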
The first thing that should strike you about these 1T-SRAM devices on the Flipper die is that they would come in quite handy on a PC platform. The Flipper GPU will never be asked to render at greater than 640 x 480 (not a very memory bandwidth intensive resolution), whereas very few PC gamers will settle for anything less than 1024 x 768 with today's graphics cards. A similar 2MB on-die Z-buffer would improve performance tremendously, especially considering how much more memory bandwidth is consumed in most PC games. While it would be nice for ATI to consider using this style of technology in their future PC products, the cost would be highly prohibitive.
The potential for Flipper to become cheaper to produce as time goes on is also there. NEC's 0.15-micron embedded DRAM process is currently in production, but it is not as mature as their 0.18-micron eDRAM process, which is why Flipper is currently produced on the latter. By the second half of next year the 0.13-micron eDRAM process should be ready for production, which means that we should be able to see 0.13-micron Flipper GPUs produced in 2003. The move to a 0.13-micron process could cut the 106 mm^2 Flipper die roughly in half, making it much cheaper to produce, but that is all dependent on NEC.
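The "cut in half" estimate comes from idealized scaling, where die area shrinks with the square of the feature size. The sketch below applies that rule to the 106 mm^2 figure; real shrinks rarely scale perfectly (I/O pads and analog blocks in particular do not), so these are best-case numbers, and the function name is ours.

```python
# Idealized die-shrink estimate: area scales with the square of feature size.
# Assumption: a perfect optical shrink with no design changes.

def scaled_area(area_mm2, old_nm, new_nm):
    """Best-case die area after moving from old_nm to new_nm."""
    return area_mm2 * (new_nm / old_nm) ** 2

FLIPPER_AREA_MM2 = 106   # current die size on the 0.18-micron process

at_150 = scaled_area(FLIPPER_AREA_MM2, 180, 150)
at_130 = scaled_area(FLIPPER_AREA_MM2, 180, 130)

print(f"0.15-micron: {at_150:.1f} mm^2 ({at_150 / FLIPPER_AREA_MM2:.0%} of original)")
print(f"0.13-micron: {at_130:.1f} mm^2 ({at_130 / FLIPPER_AREA_MM2:.0%} of original)")
```

In the ideal case a 0.13-micron Flipper would measure roughly 55 mm^2, a bit over half of today's die, while a 0.15-micron version would only get down to around 74 mm^2.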