PlayStation 3’s GPU: The NVIDIA RSX

We’ve mentioned countless times that the PlayStation 3 has the more PC-like GPU of the two consoles we’re talking about here today, and after this week’s announcement, you now understand why.

The PlayStation 3’s RSX GPU shares the same “parent architecture” as the G70 (GeForce 7800 GTX), much as the GeForce 6600GT shares a parent architecture with the GeForce 6800 Ultra.  Sony isn’t ready to unveil exactly what is different between the RSX and the G70, but based on what’s been introduced already, as well as our conversations with NVIDIA, we can piece together a few things.

Despite the fact that the RSX comes from the same lineage as the G70, there are a number of changes to the core.  The biggest change is that the RSX supports rendering to both local and system memory, similar to NVIDIA’s TurboCache-enabled GPUs.  Obviously, rendering to/from local memory will have much lower latency than sending a request through the Cell’s memory controller, so much of the GPU’s architecture has to change in order to accommodate this higher-latency path to memory.  Buffers and caches have to be made larger to keep the rendering pipelines full despite the slower memory access.  If the chip is properly designed to hide this latency, then there is generally no performance sacrifice, only an increase in chip size thanks to the larger buffers and caches.
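To get a feel for why a higher-latency memory path forces larger buffers, here is a back-of-the-envelope sketch based on Little’s law (data in flight = bandwidth × latency).  Every number in it is an illustrative assumption, not a published RSX spec:

```cpp
#include <cstdio>

// Little's law: to keep a memory link busy, the amount of data in flight
// (i.e. buffered) must equal bandwidth x latency. All figures below are
// assumptions chosen for illustration, not NVIDIA specifications.
int main() {
    const double bandwidth      = 15e9;   // assumed request stream, bytes/s
    const double local_latency  = 100e-9; // assumed local GDDR3 latency, s
    const double system_latency = 400e-9; // assumed latency via the Cell's controller, s

    // Buffering needed to hide each latency and keep the pipelines full.
    printf("Local path:  ~%.1f KB in flight\n", bandwidth * local_latency / 1024);
    printf("System path: ~%.1f KB in flight\n", bandwidth * system_latency / 1024);
    return 0;
}
```

The absolute numbers are toys, but the ratio is the point: a path with 4x the latency needs 4x the buffering to sustain the same bandwidth, and that extra on-chip storage is pure die area.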

The RSX has only about 60% of the G70’s local memory bandwidth, so in many cases, it will have to share the Cell’s memory bus as well in order to achieve its performance targets.
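That 60% figure falls out of simple bus math: width times effective data rate.  The memory clocks below are the figures reported around the announcement (an assumption worth flagging, since final silicon could differ):

```cpp
#include <cstdio>

// Peak bandwidth = (bus width in bytes) x (effective transfer rate).
// Clocks are the figures reported at announcement time, not final specs.
int main() {
    double rsx = (128.0 / 8.0) * 1.4; // 128-bit GDDR3 at 700MHz (1.4GT/s effective)
    double g70 = (256.0 / 8.0) * 1.2; // 256-bit GDDR3 at 600MHz (1.2GT/s effective)
    printf("RSX local memory: %.1f GB/s\n", rsx);            // 22.4 GB/s
    printf("G70 local memory: %.1f GB/s\n", g70);            // 38.4 GB/s
    printf("RSX/G70 ratio:    %.0f%%\n", 100.0 * rsx / g70); // ~58%
    return 0;
}
```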

There is one peculiarity that hasn’t exactly been resolved, and that is transistor counts.  Both the G70 and the RSX share the same estimated transistor count of approximately 300.4 million transistors.  The RSX is built on a 90nm process, so in theory, NVIDIA would be able to pack more onto the die without increasing chip size at all - but if the transistor counts are identical, that points to more similarity between the two cores than NVIDIA has led us to believe.  So is the RSX nothing more than the G70?  It’s highly unlikely that the GPUs are identical, especially considering that the addition of TurboCache support alone would drive up the transistor count quite a bit.  So how can the two GPUs be different, with one supposedly more powerful than the other, yet share the same transistor count?  There are a few possible options.

First and foremost, you have to keep in mind that these are not exact transistor counts - they are estimates.  Transistor count is determined by looking at the number of gates in the design, and multiplying that number by the average number of transistors used per gate.  So the final transistor count won’t be exact, but it will be close enough to reality.  Remember that these chips are computer designed and produced, so it’s not like someone is counting each and every transistor by hand as they go into the chip. 
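As a sketch of that estimation method (the gate count and transistors-per-gate factor below are invented round numbers, picked only so that the output lands near the quoted figure):

```cpp
#include <cstdio>

// Transistor counts are estimated, not counted: take the number of gates
// in the synthesized design and multiply by an average transistors-per-gate
// factor. Both inputs here are hypothetical round numbers for illustration.
int main() {
    const double gates = 75e6;               // hypothetical gate count
    const double transistors_per_gate = 4.0; // hypothetical average (a 2-input NAND is 4)
    printf("Estimated transistor count: ~%.0f million\n",
           gates * transistors_per_gate / 1e6);
    return 0;
}
```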

So it is possible that NVIDIA’s estimates for the two GPUs are slightly off, but at approximately 10 million transistors per pixel pipeline, it doesn’t seem very likely that the RSX features more than the GeForce 7800 GTX’s 24 pixel rendering pipelines.  Yet NVIDIA claims the RSX is more powerful than the GeForce 7800 GTX.  How can that be?  There are a couple of options:
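To put that figure in perspective: at roughly 10 million transistors per pipeline, even eight extra pixel pipes would add on the order of 8 × 10M = 80 million transistors - far too large a difference to vanish into the rounding error of a ~300 million transistor estimate.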

The most likely explanation comes down to nothing more than clock speed.  Remember that the RSX, being built on a 90nm process, is supposed to run at 550MHz - a 28% increase in core clock speed over the 110nm GeForce 7800 GTX.  The clock speed increase alone will account for a good boost in GPU performance, which would make the RSX “more powerful” than the G70.
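The arithmetic bears that out: 550MHz against the 7800 GTX’s 430MHz core clock works out to 550 / 430 ≈ 1.28, and with an identical 24-pipeline configuration, peak per-clock throughput scales by that same ~28% - assuming, of course, that memory bandwidth doesn’t become the bottleneck first.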

There is one other possibility, one that is more far-fetched but worth discussing nonetheless.  NVIDIA could offer a chip with the same transistor count as the desktop G70, but significantly more power, if the RSX featured no vertex shader pipes and instead used that die space for additional pixel shading hardware.

Remember that the Cell host processor has an array of 7 SPEs that are very well suited to a number of non-branching tasks, including geometry processing.  Also keep in mind that current games favor creating realism through more pixel operations rather than more geometry, so GPUs aren’t very vertex shader bound these days.  Then, note that the RSX has a high-bandwidth 35GB/s interface between the Cell processor and the GPU itself - definitely enough to move all vertex processing onto the Cell processor, freeing the RSX to handle pixel shader and ROP tasks exclusively.  If this is indeed the case, then the RSX could very well have more than 24 pipelines and still have a similar transistor count to the G70; if it isn’t, then it is highly unlikely that we’d see a GPU that looks much different from the G70.
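Here is a rough sketch of what that vertex traffic might cost.  The vertex size and target vertex rate are illustrative assumptions; only the link bandwidth comes from the announced specs:

```cpp
#include <cstdio>

// Bandwidth cost of streaming Cell-transformed vertices to the RSX.
// Vertex layout and target rate are assumptions for illustration only.
int main() {
    const double vertex_bytes   = 32.0;  // assumed: position, normal, UVs, color
    const double vertices_per_s = 300e6; // assumed target: 300M vertices/s
    double traffic = vertex_bytes * vertices_per_s / 1e9;
    printf("Vertex stream: %.1f GB/s of the 15 GB/s downstream link\n", traffic);
    return 0;
}
```

Even under these modest assumptions, geometry alone could eat roughly two-thirds of the downstream link - which leads directly to the bandwidth problem below.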

The downside to the RSX using the Cell for all vertex processing is pretty significant.  Remember that the RSX has only a 22.4GB/s link to its local memory, which is less than 60% of the memory bandwidth of the GeForce 7800 GTX.  In other words, it needs the additional memory bandwidth from the Cell’s memory controller to be able to handle more texture-bound games.  If a good portion of the 15GB/s downstream link from the Cell processor is used for traffic between the Cell’s SPEs and the RSX, the GPU will be texture bandwidth limited in some situations, especially at resolutions as high as 1080p.
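A very rough estimate shows how quickly that budget disappears at 1080p.  Overdraw, per-pixel framebuffer traffic, texture bytes per shaded pixel, and frame rate are all assumptions for illustration:

```cpp
#include <cstdio>

// Rough 1080p bandwidth estimate. Every tunable here is an assumption;
// antialiasing and alpha blending would push the totals higher still.
int main() {
    const double pixels    = 1920.0 * 1080.0; // ~2.07M pixels per frame
    const double fps       = 60.0;
    const double overdraw  = 3.0;  // assumed average overdraw
    const double fb_bytes  = 12.0; // assumed color write + Z read/write per pass
    const double tex_bytes = 16.0; // assumed texture fetch traffic per shaded pixel

    double total = pixels * fps * overdraw * (fb_bytes + tex_bytes) / 1e9;
    printf("Estimated traffic: ~%.1f GB/s against a 22.4 GB/s local budget\n", total);
    return 0;
}
```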

This is much the more far-fetched of the two explanations, but it is possible.  Only time will tell what the shipping configuration of the RSX will be.

Comments

  • jotch - Friday, June 24, 2005 - link

    #20 Well, that can't be right for the whole consumer base, as I'm 24 and only know other adults that have consoles, and a lot of them have flashy TVs for them as well - I do. I think if you look at the market for consoles, it is mainly teens and adults that have consoles - not kids. A lot of people I know started with a NES or an Atari 2500, etc., and have continued to like games as they have grown up. Why is it that the best-selling game has an 18 rating? (GTA: San Andreas)

    Screen burn would be minimal unless you leave a game paused for hours with the TV on - TV technology is moving on, and sets often turn themselves off if a static image is displayed for too long. So burn-in shouldn't occur.
  • nserra - Friday, June 24, 2005 - link

    All the people I know who have consoles are kids (80%), and their parents have bought a TV just for the console - a €70 TV.....

    What parent will let kids play games on a €3000 LCD or plasma (and burn it in)?

    Either games will have good 480i "compatibility", or forget it....

    #17 I agree.
  • fitten - Friday, June 24, 2005 - link

    #14 There are a number of issues being discussed.

    For example, given the nature of current AI code, making that code parallel (as in more than one thread executing AI code working together) seems non-trivial. Data dependencies, and the very branch-heavy code that makes those dependencies less predictable, probably cause headaches here. Sure, one could take the simple approach and say one thread for AI, one for physics, one for whatever, but that has already been discussed by numerous people as a possibility.

    Parallel code comes in many flavors. The parallelism in the graphics card, for instance, is sometimes classified as "embarrassingly parallel", which means it's trivial to do. Then there are pipelines (dataflow), which CPUs and GPUs also use. These are usually fairly easy too, because the data partitioning is easy. You break out a thread for each overall task that you want to do: you want to do OpA on the data, then OpB, then OpC. All OpB depends on is the output of OpA, and OpC just depends on OpB's output. Three threads, each one doing its Op on the output of the previous - see the sketch below.
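    A minimal sketch of that three-stage pipeline: three threads connected by small thread-safe queues, with placeholder arithmetic standing in for the real OpA/OpB/OpC (written in modern C++ for brevity, not 2005-era style):

    ```cpp
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <optional>
    #include <queue>
    #include <thread>

    // Three-stage pipeline: OpA -> OpB -> OpC, one thread per stage,
    // linked by a tiny thread-safe queue. The "ops" are placeholders.
    template <typename T>
    class Channel {
        std::queue<T> q;
        std::mutex m;
        std::condition_variable cv;
        bool closed = false;
    public:
        void push(T v) {
            { std::lock_guard<std::mutex> lk(m); q.push(std::move(v)); }
            cv.notify_one();
        }
        void close() {
            { std::lock_guard<std::mutex> lk(m); closed = true; }
            cv.notify_all();
        }
        std::optional<T> pop() { // blocks; empty result means "stream ended"
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !q.empty() || closed; });
            if (q.empty()) return std::nullopt;
            T v = std::move(q.front());
            q.pop();
            return v;
        }
    };

    int main() {
        Channel<int> ab, bc;
        std::thread opA([&] {                 // stage A: produce raw data
            for (int i = 1; i <= 5; ++i) ab.push(i);
            ab.close();
        });
        std::thread opB([&] {                 // stage B: transform A's output
            while (auto v = ab.pop()) bc.push(*v * 10);
            bc.close();
        });
        std::thread opC([&] {                 // stage C: consume B's output
            while (auto v = bc.pop()) printf("result: %d\n", *v);
        });
        opA.join(); opB.join(); opC.join();
        return 0;
    }
    ```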

    Then there are codes that are quite a bit more complex. For example, numerous threads each execute on a part of the whole data instead of all of it at once, but the solution requires many iterations over the data, and at the end of each iteration, all the threads exchange data with each other (or just with their 'neighbors') so that the next iteration can be performed. These are a bit more work to develop - a sketch of the pattern follows.
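    Here's what that iterate-then-exchange shape can look like: each thread owns a slice of the data, and a barrier holds everyone at the end of each iteration until all slices are published. The toy 1D smoothing pass is purely illustrative, and std::barrier is C++20 (anachronistic here, but it shows the pattern compactly):

    ```cpp
    #include <barrier>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Iterate-then-exchange: each thread smooths its slice of the array,
    // then waits at a barrier so every slice is published before the next
    // iteration reads neighboring values. The smoothing itself is a toy.
    int main() {
        const int N = 8, THREADS = 2, ITERS = 3;
        std::vector<double> cur(N, 0.0), next(N, 0.0);
        cur[0] = 100.0; // a boundary value to diffuse across the array
        std::barrier sync(THREADS);

        auto worker = [&](int t) {
            const int lo = t * (N / THREADS), hi = lo + N / THREADS;
            for (int it = 0; it < ITERS; ++it) {
                for (int i = lo; i < hi; ++i) {
                    double l = (i > 0)     ? cur[i - 1] : cur[i];
                    double r = (i < N - 1) ? cur[i + 1] : cur[i];
                    next[i] = (l + cur[i] + r) / 3.0;
                }
                sync.arrive_and_wait(); // all slices of 'next' are written
                if (t == 0) std::swap(cur, next);
                sync.arrive_and_wait(); // all threads see the swapped buffers
            }
        };

        std::vector<std::thread> pool;
        for (int t = 0; t < THREADS; ++t) pool.emplace_back(worker, t);
        for (auto& th : pool) th.join();
        for (int i = 0; i < N; ++i) printf("%.2f ", cur[i]);
        printf("\n");
        return 0;
    }
    ```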

    Anyway, I got long-winded. Basically... there are *many* kinds of parallelism and many kinds of algorithms and implementations of parallelism. Some are low-hanging fruit and some are non-trivial. Since I've already read that numerous developers for each platform see the low-hanging fruit (run one thread for AI, another for physics, etc.), I can only believe they are talking about things that are non-trivial, such as a multithreaded AI engine, for example (again, as opposed to just breaking the AI engine out into one thread separate from the rest of game play).
  • probedb - Friday, June 24, 2005 - link

    Nice article! I'll wait till they're both out and have a play before I buy either. Last console I bought was an original PlayStation :) But gotta love that hi-def loveliness at last!

    #3 Yeah, 1080i is interlaced, and at such a high res and low refresh, the text is really difficult to read. It'd be far better at 1080p, I think, since that would effectively be the same as 1920x1080 on a normal monitor. 1080i is flickery as hell for me for desktop use, but fine for video and media-centre-type interfaces on the PC.
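    (The flicker has a simple cause: 1080i draws its 1,080 lines as two alternating 540-line fields, so any individual line is refreshed only about 30 times per second - brutal for static text - while 1080p redraws every line each frame.)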
  • A5 - Friday, June 24, 2005 - link

    You know, the vast majority of the TVs these systems will be hooked up to will only do 480i (standard TV)...
  • jotch - Friday, June 24, 2005 - link

    #14 - hear, hear!
  • jotch - Friday, June 24, 2005 - link

    #10 - sounds to me like they're way ahead of their time. Future-proofing is good, as they'll need another 6 years to develop the PS4 - but the Cell and Xenon will force developers to change their ways and will prepare them for developing on PCs that eventually have this kind of CPU design (ref Intel's chip design future pic on the first page of the article). Like the article says, the initial round of games will be single-threaded, etc., etc...

    You might get a lot of mediocre games, but then you should get ones that really shine on the PS3 - notably Unreal 3, and I bet the Gran Turismo (Polyphony) guys will put in the effort.
  • Pannenkoek - Friday, June 24, 2005 - link

    I'm quite tired of hearing how difficult it is to develop a multithreaded game. Only pathetic programmers cannot grasp the concept of parallel code execution; it's not as if the current CPU/GPU duality doesn't already qualify.
  • knitecrow - Friday, June 24, 2005 - link

    You'll need an HDD for online services and MMOs.

    How many people are going to buy a $100 HDD if they don't have to?
  • LanceVance - Friday, June 24, 2005 - link

    "the PS3 won’t ship with a hard drive"

    If that's true, then will it be like:

    - the PS2 Memory Card: not included, but standard equipment required by all games; or
    - the PS2 Hard Drive: not included, considered exotic equipment, and used by very few games?
