PlayStation 3’s GPU: The NVIDIA RSX

We’ve mentioned countless times that the PlayStation 3 has the more PC-like GPU out of the two consoles we’re talking about here today, and after this week’s announcement, you now understand why.

The PlayStation 3’s RSX GPU shares the same “parent architecture” as the G70 (GeForce 7800 GTX), much in the same way that the GeForce 6600GT shares the same parent architecture as the GeForce 6800 Ultra.  Sony isn’t ready to unveil exactly what is different between the RSX and the G70, but based on what’s been introduced already, as well as our conversations with NVIDIA, we can gather a few items.

Despite the fact that the RSX comes from the same lineage as the G70, there are a number of changes to the core.  The biggest change is that RSX supports rendering to both local and system memory, similar to NVIDIA’s Turbo Cache enabled GPUs.  Obviously rendering to/from local memory has much lower latency than sending a request to the Cell’s memory controller, so much of the GPU’s architecture has to change in order to accommodate this higher latency access to memory.  Buffers and caches have to be made larger to keep the rendering pipelines full despite the higher latency.  If the chip is properly designed to hide this latency, then there is generally no performance sacrifice, only an increase in chip size thanks to the larger buffers and caches.

The RSX has a little under 60% of the local memory bandwidth of the G70, so in many cases it will have to share bandwidth with the CPU’s memory bus in order to achieve performance targets.

There is one peculiarity that hasn’t been resolved, and that is the transistor count.  Both the G70 and the RSX share the same estimated transistor count of approximately 300.4 million transistors.  The RSX is built on a 90nm process, so in theory NVIDIA would be able to pack more onto the die without increasing chip size at all - but if the transistor counts are identical, that points to more similarity between the two cores than NVIDIA has led us to believe.  So is the RSX nothing more than the G70?  It’s highly unlikely that the GPUs are identical, especially considering that the mere addition of Turbo Cache to the part would drive up transistor counts quite a bit.  So how do we explain that the two GPUs are different, yet have the same transistor count, and that one is supposed to be more powerful than the other?  There are a few possible options.

First and foremost, you have to keep in mind that these are not exact transistor counts - they are estimates.  Transistor count is determined by looking at the number of gates in the design, and multiplying that number by the average number of transistors used per gate.  So the final transistor count won’t be exact, but it will be close enough to reality.  Remember that these chips are computer designed and produced, so it’s not like someone is counting each and every transistor by hand as they go into the chip. 
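
The estimate described above is a single multiplication, which can be sketched as follows.  The gate count and the average transistors per gate used here are hypothetical placeholders chosen only so the result lands on the 300.4 million figure; neither NVIDIA nor Sony has published these inputs.

```python
def estimate_transistors(gate_count: int, avg_transistors_per_gate: float) -> float:
    """Estimate transistor count as (number of gates) x (average transistors per gate)."""
    return gate_count * avg_transistors_per_gate


# Hypothetical inputs, NOT published figures: picked only to reproduce ~300.4 million.
HYPOTHETICAL_GATE_COUNT = 75_100_000
HYPOTHETICAL_AVG_TRANSISTORS_PER_GATE = 4.0

estimate = estimate_transistors(HYPOTHETICAL_GATE_COUNT,
                                HYPOTHETICAL_AVG_TRANSISTORS_PER_GATE)
print(f"Estimated transistor count: {estimate / 1e6:.1f} million")
```

Because the per-gate average is itself an approximation, a small error in it shifts the total by millions of transistors, which is why two different chips can plausibly report the same rounded figure.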

So it is possible that NVIDIA’s estimates are slightly off for the two GPUs, but at approximately 10 million transistors per pixel pipe, it doesn’t seem very likely that the RSX will feature more than the 24 pixel rendering pipelines of the GeForce 7800 GTX, yet NVIDIA claims it is more powerful than the GeForce 7800 GTX.  But how can that be?  There are a couple of options:

The most likely explanation is nothing more than clock speed.  Remember that the RSX, being built on a 90nm process, is supposed to be running at 550MHz - a 28% increase in core clock speed from the 110nm GeForce 7800 GTX.  The clock speed increase alone will account for a good boost in GPU performance, which would make the RSX “more powerful” than the G70.
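
As a quick check of the 28% figure, here is a minimal sketch.  It assumes the GeForce 7800 GTX reference core clock of 430MHz, which the text above implies but does not state, and it treats peak pixel throughput as pipelines times clock, a deliberate simplification that ignores memory bandwidth and everything else.

```python
RSX_CORE_CLOCK_MHZ = 550   # stated above
G70_CORE_CLOCK_MHZ = 430   # assumption: GeForce 7800 GTX reference core clock
PIXEL_PIPELINES = 24       # same pipeline count assumed for both chips


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def peak_pixel_rate(pipelines: int, clock_mhz: float) -> float:
    """Simplified peak rate in megapixels per second: one pixel per pipe per clock."""
    return pipelines * clock_mhz


clock_gain = percent_increase(RSX_CORE_CLOCK_MHZ, G70_CORE_CLOCK_MHZ)
print(f"Core clock increase: {clock_gain:.1f}%")
print(f"G70 peak: {peak_pixel_rate(PIXEL_PIPELINES, G70_CORE_CLOCK_MHZ):.0f} Mpixels/s")
print(f"RSX peak: {peak_pixel_rate(PIXEL_PIPELINES, RSX_CORE_CLOCK_MHZ):.0f} Mpixels/s")
```

With identical pipeline counts, the simplified peak rate scales by exactly the same ratio as the clock, so the clock bump alone is enough to justify a "more powerful" claim on paper.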

There is one other possibility, one that is more far fetched but worth discussing nonetheless.  NVIDIA could offer a chip with the same transistor count as the desktop G70, but with significantly more power, if the RSX features no vertex shader pipes and instead uses that die space for additional pixel shading hardware.

Remember that the Cell host processor has an array of 7 SPEs that are very well suited for a number of non-branching tasks, including geometry processing.  Also keep in mind that current games favor creating realism through more pixel operations rather than creating more geometry, so GPUs aren’t very vertex shader bound these days.  Then, note that the RSX has a high bandwidth 35GB/s interface between the Cell processor and the GPU itself - definitely enough to place all vertex processing on the Cell processor itself, freeing up the RSX to exclusively handle pixel shader and ROP tasks.  If this is indeed the case, then the RSX could very well have more than 24 pipelines and still have a similar transistor count to the G70, but if it isn’t, then it is highly unlikely that we’d see a GPU that looked much different than the G70. 

The downside to the RSX using the Cell for all vertex processing is pretty significant.  Remember that the RSX only has a 22.4GB/s link to its local memory, which is less than 60% of the memory bandwidth of the GeForce 7800 GTX.  In other words, it needs that additional memory bandwidth from the Cell’s memory controller to be able to handle more texture-bound games.  If a good portion of the 15GB/s downstream link from the Cell processor is used for traffic between the Cell’s SPEs and the RSX, the GPU will be texture bandwidth limited in some situations, especially at resolutions as high as 1080p.
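
The "less than 60%" figure can be reproduced with a short sketch.  The bus widths and effective data rates below are assumptions taken from the reader comments further down (128-bit at 1400MHz effective for the RSX, 256-bit at 1200MHz effective for the 7800 GTX), not from anything Sony or NVIDIA has confirmed here.

```python
def memory_bandwidth_gbs(bus_width_bits: int, effective_rate_mhz: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times millions of transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mhz / 1000.0


# Assumed configurations, as quoted in the reader comments (double data rate already applied).
rsx_bandwidth = memory_bandwidth_gbs(bus_width_bits=128, effective_rate_mhz=1400)
g70_bandwidth = memory_bandwidth_gbs(bus_width_bits=256, effective_rate_mhz=1200)

ratio = rsx_bandwidth / g70_bandwidth
print(f"RSX local memory: {rsx_bandwidth:.1f} GB/s")
print(f"7800 GTX memory:  {g70_bandwidth:.1f} GB/s")
print(f"RSX has {ratio:.0%} of the 7800 GTX's local memory bandwidth")
```

Under these assumptions the gap comes from both the narrower bus and the bus clock: halving the width from 256 to 128 bits would leave 50%, and the higher effective rate only recovers part of the difference.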

This explanation is much more far fetched, but it is possible.  Only time will tell what the shipping configuration of the RSX will be.

93 Comments

  • BenSkywalker - Sunday, June 26, 2005 - link

    ""One thing is for sure, support for two 1080p outputs in spanning mode (3840 x 1080) on the PS3 is highly unrealistic. At that resolution, the RSX would be required to render over 4 megapixels per frame, without a seriously computation bound game it’s just not going to happen at 60 fps." -- Quote from page 10"

    First off 1080p doesn't support 60FPS as of this moment anyway, and there are an awful lot of games on consoles that aren't remotely close to being GPU bound anyway. Remember that the XBox has titles now that are pushing out 1080i and the RSX is easily far more then four times the speed of the GPU in the XBox.
  • tipoo - Wednesday, August 6, 2014 - link

    "RSX is easily far more then four times the speed of the GPU in the XBox."

    It's funny reading these comments years later, and seeing how crazy the PS3 hype machine was. I assume this insane comment reffered to the 1 terraflop RSX thing, which was a massive joke. RSX was worse than Xenon not only in raw gflops (180 vs over 200 I think), but since it didn't have unified shaders it could be bottlenecked by a scene having too much vertex or pixel effects and leaving shaders underused.
  • calimero - Sunday, June 26, 2005 - link

    Here is one tip about Cell:
    to play MP3 files (stereo) on PC you need 100MHz 486 CPU. Atari Falcon030 with MC68030 (16MHz) and DSP (32MHz) can do same thing!
    Everyone who know to program will find Cell outstanding and thrilling everyone else who pretend to be a programer please continue to waste CPU cycles with your shity code!
  • coolme - Sunday, June 26, 2005 - link

    "Supporting 1080p x2 may seem like overkill,"

    It's not gonna support 1080p x2

    "One thing is for sure, support for two 1080p outputs in spanning mode (3840 x 1080) on the PS3 is highly unrealistic. At that resolution, the RSX would be required to render over 4 megapixels per frame, without a seriously computation bound game it’s just not going to happen at 60 fps." -- Quote from page 10
  • nevermind4711 - Sunday, June 26, 2005 - link

    People have different ways of expressing the frequency of DDRAM. The correct memory frequency of 7800GTX is 256MB/256-bit GDDR3 at 600MHz, but as it is double rate some people say 1200 MHz.

    In the same way you can say the RSX memory is operating at 1400 MHz. How else could 128 bit result in a memory bandwidth of 22 GB/s for the RSX?

    #64 knitecrow, who is your source that the RSX does not contain e-dram, or is it just speculation?

    Besides, your conclusion from extrapolating the transistor count may be correct, but assuming the transistor count is proportional to the number of pixel pipelines is a rather big simplification, there is quite a lot of other stuff inside a GPU as well, stuff that does not scale proportionally to the pixel pipelines.
  • Furen - Sunday, June 26, 2005 - link

    The RSX is supposed to be clocked higher but will only have a 700MHz, 128bit memory bus (as opposed to the 1200MHz, 256bit memory bus on the 7800gtx).
  • knitecrow - Saturday, June 25, 2005 - link

    #61
    too bad you don't speak marketing.
    When they say near.. it means very close. Could be slightly under or over. If it was something like 320M... they will be hyp3ing 320M.


    #62 too bad you are wrong

    with 300M transistors, the RSX is a native 24 pixel pipeline card

    You can extrapolate the number by looking at:
    6800ultra - 16 - 222M
    6600GT - 8 - 144M

    it has no eDRAM.

    The features remain to be seen, but its going to be a G70 derivate -- just like XGPU for the xbox was a geforce3 derivative.

    There is absolutely no evidence to suggest that the RSX is going to be more powerful than 7800GTX.

    Just because a product comes out later doesn't make it better

    Exhibit A:
    Radeon 9700pro vs. 5800ultra

  • Darkon - Saturday, June 25, 2005 - link

    http://www.psinext.com/index.php?categoryid=3&...
  • Dukemaster - Saturday, June 25, 2005 - link

    I think it is very clear why the RSX gpu has the same number of transistors but still is more powerfull then the 7800GTX: the 7800GTX is a chip with 32 pipelines with 8 of them turned off.
  • nevermind4711 - Saturday, June 25, 2005 - link

    Interesting article. However, I find it strange that Anand and Derek do not comment on the difference in floating point capacity between the combatants. 1 TFlops for X360 vs. 2 TFlops for PS3. For X360 we know that the majority of flops come from the GPU, where probably the big part consists of massively paralell compare ops and such coming from the AA- and filtering circuitry integrated with the e-DRAM.
    It would be very interesting to know how the RSX provides 1.8 TFlops. I do not think the G70 has a capacity anything near that. Could it be possible that Sony will bring some e-DRAM to the party together with AA and filtering circuitry similar to X360. After all Sony has quite some experience of e-DRAM from PS2 and PSP.
    Anand and Derek wrote "Both the G70 and the RSX share the same estimated transistor count, of approximately 300.4 million transistors." Where do this information come from? Sony only said in its presentation the RSX will have 300+ mil t:s. G70 we now know contains 302 mil t:s.
    #48: Sony may very well have replaced some video en/de-coding circuitry of the G70 with some e-dram circuitry.
