Inside the Xenos GPU

As previously mentioned, the 48 shaders will be able to run either vertex or pixel shader programs in any given clock cycle. To clarify, each block of 16 shader units runs a single shader program thread at a time. These shader units function at a feature level slightly beyond DX9.0c, but in order to take advantage of the technology, ATI and Microsoft will have to customize the API.
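As an illustration of what per-cycle arbitration between the two thread types might look like, here is a minimal sketch in C. The queue depths and the "service the deeper queue" policy are our own assumptions for illustration; ATI has not published the real arbiter's logic.

```c
#include <stdio.h>

/* Hypothetical sketch: three arrays of 16 shader units each; every
   cycle, each array is handed whichever thread type is queued.  The
   real Xenos arbiter is undocumented - the queue depths and the
   "service the deeper queue" policy here are illustrative only. */

enum { ARRAYS = 3, ARRAY_WIDTH = 16 };

int main(void) {
    int vertex_queue = 10;  /* pending vertex threads (assumed) */
    int pixel_queue  = 40;  /* pending pixel threads (assumed)  */

    for (int cycle = 0; vertex_queue + pixel_queue > 0; cycle++) {
        for (int a = 0; a < ARRAYS && vertex_queue + pixel_queue > 0; a++) {
            if (pixel_queue >= vertex_queue) {
                pixel_queue--;
                printf("cycle %d: array %d runs a pixel thread on %d units\n",
                       cycle, a, ARRAY_WIDTH);
            } else {
                vertex_queue--;
                printf("cycle %d: array %d runs a vertex thread on %d units\n",
                       cycle, a, ARRAY_WIDTH);
            }
        }
    }
    return 0;
}
```

The point of the sketch: vertex and pixel work draw from the same pool of 48 units, so a frame heavy in one type of work simply shifts the allocation rather than idling dedicated hardware.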

In order to get data into the shader units, textures are read from main memory. The eDRAM of the system is unable to assist with texturing. There are 16 bilinear filtered texture samplers, which together can read up to 16 textures per clock cycle. The scheduler will need to take great care to organize threads so that optimal use of the texture units is made. Another consideration is anisotropic filtering. In order to filter beyond the bilinear level, the texture will need to be run through the texture unit more than once (until the filtering is finished). If no filtering is required (i.e. if a shader program is simply reading stored data), the vertex fetch units can be used instead (with either a vertex or a pixel shader program).
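To make the cost of looping a texture back through the sampler concrete, here is a small C sketch of a pass-count model. The specific figures (two bilinear taps for trilinear, one probe per degree of anisotropy) are textbook approximations that we are assuming here, not published Xenos numbers.

```c
#include <stdio.h>

/* Illustrative cost model only: a bilinear fetch is one pass through
   a texture unit, and filtering beyond bilinear loops the texture
   through the unit again until done.  Pass counts are assumptions. */

static int passes_per_fetch(int trilinear, int aniso_degree) {
    int taps   = trilinear ? 2 : 1;                   /* bilinear taps per probe */
    int probes = aniso_degree > 1 ? aniso_degree : 1; /* probes along the axis   */
    return taps * probes;
}

int main(void) {
    printf("bilinear:            %d pass(es)\n", passes_per_fetch(0, 1));
    printf("trilinear:           %d pass(es)\n", passes_per_fetch(1, 1));
    printf("4x aniso, trilinear: %d pass(es)\n", passes_per_fetch(1, 4));
    return 0;
}
```

Under this model, a 4x anisotropic trilinear fetch ties up a sampler for eight times as long as a plain bilinear one, which is why the scheduler has to keep all 16 samplers busy with other threads in the meantime.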

In the PC space, we are seeing shifts to more and more complex pixel shaders. Larger and larger textures are being used to supply data, and some predict that texture bandwidth will eclipse color and z bandwidth in the not so distant future. We will have to see if the console and desktop spaces continue to diverge in this area.

One of the key aspects of Xbox 360 performance will be how well ATI manages threads on its GPU. With the shift to a unified shader architecture, it is even more imperative to keep everything running at maximum efficiency. We don't have many details on ATI's ability to context switch between vertex and pixel shader programs in hardware, but suffice it to say that ATI cannot afford any difficulties in managing threads at any level. Since making good use of current pixel shader technology already requires swapping threads in and out of the shaders, we expect ATI to do fairly well in this department. Thread management is likely one of the most difficult problems ATI had to solve to make this hardware feasible.

Those who paid close attention to the amount of eDRAM (10MB) will note that this is not enough memory to store the entire framebuffer for displays larger than standard television with 4xAA enabled. Apparently, ATI will store the front buffer in the UMA area, while the back buffer resides in the eDRAM. In order to manage large displays, the hardware will need to render the back buffer in parts. This indicates that ATI has implemented some sort of very coarse grained tiling system (with 2 to 4 tiles). Tile-based renderers usually have many more tiles than this, but this is a special case.
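The arithmetic behind that claim is easy to check. The C sketch below assumes 4 bytes of color and 4 bytes of Z/stencil per sample, a common layout but our assumption rather than a published spec.

```c
#include <stdio.h>

/* Back-of-the-envelope check of the tiling claim.  The 4+4 bytes per
   sample (color + Z/stencil) is a common layout, assumed here. */

static long fb_bytes(long w, long h, long samples) {
    return w * h * samples * (4 + 4);
}

int main(void) {
    const long edram = 10L * 1024 * 1024;  /* 10MB of eDRAM */
    long sd = fb_bytes(640, 480, 4);       /* standard TV with 4xAA */
    long hd = fb_bytes(1280, 720, 4);      /* 720p with 4xAA        */

    printf("480 with 4xAA: %ld bytes -> %s\n", sd,
           sd <= edram ? "fits in eDRAM" : "must be tiled");
    printf("720p with 4xAA: %ld bytes -> %ld tile(s)\n", hd,
           (hd + edram - 1) / edram);      /* round up */
    return 0;
}
```

Under those assumptions, a standard-definition frame with 4xAA just squeezes into the 10MB, while a 720p frame comes to roughly 28MB and needs three tiles - consistent with the 2 to 4 tile figure.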

Performance of this hardware is very difficult to assess without testing the system. The potential is there for some nice gains over the current high end desktop parts, but it is very difficult to know how effectively software engineers will be able to use the hardware before they fully understand it and have programmed for it for a while. Certainly, the learning curve won't be as steep as it was for something like the PlayStation 2 (DirectX is still the API), but knowing what works and what doesn't will take some time.

ATI's Modeling Engine

ATI is also touting the adaptability of its hardware. The Modeling Engine is really a name for a usage model that ATI provides on top of its unified shaders. As each shader unit is more general purpose than current vertex and pixel shaders, ATI has built the hardware to easily allow the execution of general floating point math.

ATI's Modeling Engine concept is made practical through its vertex cache implementation. Data for general purpose floating point computations moves into the vertex cache in high volumes for processing. The implication here is that the vertex cache has enough storage space and bandwidth to feed all 48 shader units without starvation for extended periods of use. If the vertex cache were used solely for vertex data, it could be much less forgiving and still offer the same performance (considering common vertex processing loads in current and near term games). As we stated previously, pixel processing (for now) is going to be more resource intensive than vertex processing. The ability to fill the shader units with data from the vertex cache (as opposed to the output of vertex shaders), combined with the hardware's capability to dump shader output to main memory, is what makes ATI's Modeling Engine possible.
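In software terms, the pattern this enables looks like ordinary stream processing: fetch a block of data, run arbitrary floating point math over it, and write the results back out to memory. Here is a minimal C sketch of that flow; the block size and the kernel are arbitrary illustrations, not anything ATI has specified.

```c
#include <stdio.h>

/* Minimal sketch of the Modeling Engine's usage pattern: stream data
   in (standing in for the vertex cache fetch), run a float kernel
   over it, and dump results to memory.  Kernel and sizes are made up. */

enum { BLOCK = 16, N = 64 };  /* one hypothetical 16-wide shader array */

static void kernel(const float *in, float *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = in[i] * in[i] + 1.0f;  /* any general float math */
}

int main(void) {
    float input[N], output[N];
    for (int i = 0; i < N; i++)
        input[i] = (float)i;

    /* Stream the data through in BLOCK-sized chunks. */
    for (int base = 0; base < N; base += BLOCK)
        kernel(input + base, output + base, BLOCK);

    printf("output[5] = %.1f\n", output[5]);  /* 5*5 + 1 = 26.0 */
    return 0;
}
```

The interesting part is not the math itself but the plumbing: on Xenos, the vertex cache plays the role of the input array and main memory the role of the output, with no rasterizer in the loop.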

But just pasting a name on general purpose floating point math execution doesn't make it useful. Programmers will have to take advantage of it, and ATI has offered a few ideas on different applications for which the Modeling Engine is suited. Global illumination is an intriguing suggestion, as is tone mapping. ATI also indicates that higher order surfaces could be operated on before tessellation, giving programmers the ability to more fluidly manipulate complex objects. It has even been suggested that physics processing could be done on this part. Of course, we can expect that Xbox 360 programmers will not implement physics engines on the Modeling Engine, but it could be interesting in future parts from ATI.

Comments

  • jotch - Friday, June 24, 2005 - link

    #20 well that can't be right for the whole consumer base, as I'm 24 and only know other adults that have consoles, and a lot of them have flashy TVs for them as well; I do. I think if you look at the market for consoles it is mainly teens and adults that have consoles - not kids. A lot of people I know started with a NES or an Atari 2600, etc. and have continued to like games as they have grown up. Why is it that the best selling game has an 18 rating?? (GTA: San Andreas)

    The burning of the screen would be minimal unless you have a game paused for hours with the TV left on - TV technology is moving on, and sets often turn themselves off if a static image is displayed for a long time. So burn-in shouldn't occur.
  • nserra - Friday, June 24, 2005 - link

    All the people that I know who have consoles are kids (80%), and their parents have bought a TV just for the console - a 70€ TV...

    What parent will let kids play games on a 3000€ LCD or plasma (and risk burn-in)?

    Either there will be good 480i "compatibility" in games, or forget it...

    #17 I agree.
  • fitten - Friday, June 24, 2005 - link

    #14 There are a number of issues being discussed.

    For example, given the nature of current AI code, making that code parallel (as in more than one thread executing AI code working together) seems non-trivial. Data dependencies and the very branch heavy code making data dependencies less predictable probably cause headaches here. Sure, one could probably take the simple approach and say one thread for AI, one for physics, one for blah but that has already been discussed by numerous people as a possibility.

    Parallel code comes in many flavors. The parallelism in the graphics card, for instance, is sometimes classified as "embarrassingly parallel", which means it's trivial to do. Then there are pipelines (dataflow), which CPUs and GPUs also use. These are usually fairly easy too because the data partitioning is pretty easy. You break out a thread for each overall task that you want to do. You want to do OpA on the data, then OpB, then OpC. All OpB depends on is the output data of OpA, and OpC just depends on OpB's final product. Three threads, each one doing an Op on the output of the previous.

    Then there are codes that are quite a bit more complex where, for example, there are numerous threads that all execute on parts of the whole data instead of all of it at once but the solution they are solving for requires many iterations on the data and at the end of each iteration, all the threads exchange data with each other (or just their 'neighbors') so that the next iteration can be performed. These are a bit more work to develop.

    Anyway, I got long-winded. Basically... there are *many* kinds of parallelism and many kinds of algorithms and implementations of parallelism. Some are low hanging fruit and some are non-trivial. Since I've already read that numerous developers for each platform see the low hanging fruit (run one thread for AI, another for physics, etc.), I can only believe they are talking about things that are non-trivial, such as a multithreaded AI engine, for example (again, as opposed to just breaking the AI engine out into one thread separate from the rest of game play).
  • probedb - Friday, June 24, 2005 - link

    Nice article! I'll wait till they're both out and have a play before I buy either. Last console I bought was an original PlayStation :) But gotta love that hi-def loveliness at last!

    #3 yeah 1080i is interlaced and at such a high res and low refresh the text is really difficult to read, it'd be far better at 1080p I think since that would effectively be the same as 1920x1080 on a normal monitor. 1080i is flickery as hell for me for desktop use but fine for any video and media centre type interfaces on the PC.
  • A5 - Friday, June 24, 2005 - link

    You know, the vast majority of the TVs these systems will be hooked up to will only do 480i (standard TV)...
  • jotch - Friday, June 24, 2005 - link

    #14 - hear, hear!
  • jotch - Friday, June 24, 2005 - link

    #10 - sounds to me like they're way ahead of their time; future-proofing is good, as they'll need another 6 years to develop the PS4 - but the Cell and Xenon will force developers to change their ways and will prepare them for the future of developing on PCs that eventually have this kind of CPU design (ref Intel's chip design future pic on the first page of the article). Like the article says, the initial round of games will be single threaded etc etc...

    You might get a lot of mediocre games, but then you should get ones that really shine bright on the PS3, notably Unreal 3, and I bet the Gran Turismo (Polyphony) guys will put in the effort.
  • Pannenkoek - Friday, June 24, 2005 - link

    I'm quite tired of hearing how difficult it is to develop a multithreaded game. Only pathetic programmers cannot grasp the concept of parallel code execution; it's not as if the current CPU/GPU duality doesn't qualify as one.
  • knitecrow - Friday, June 24, 2005 - link

    you'll need an HDD for online services and MMORPGs

    how many people are going to buy a $100 HDD if they don't have to?
  • LanceVance - Friday, June 24, 2005 - link

    "the PS3 won’t ship with a hard drive"

    If that's true, then will it be like:

    - PS2 Memory Card: not included, but standard equipment required by all games.
    - PS2 Hard Drive: not included, and considered exotic equipment used by very few games.
