Inside the Xenos GPU

As previously mentioned, the 48 shaders will be able to run either vertex or pixel shader programs in any given clock cycle. To clarify, each block of 16 shader units is able to run a shader program thread. These shader units offer functionality slightly beyond DX9.0c, but in order to take advantage of the technology, ATI and Microsoft will have to customize the API.
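
To make that issue model concrete, here is a minimal sketch in C. This is our own illustration, not ATI's scheduler: the backlog counters and the selection rule are made-up assumptions. It simply shows 48 units behaving as three arrays of 16, with each array free to pick up vertex or pixel work on any given cycle.

```c
/*
 * Minimal sketch of the issue model described above: three SIMD arrays of
 * 16 ALUs each, where on every clock each array independently runs either
 * a vertex thread or a pixel thread.  Illustration only -- the backlog
 * counters and selection rule are assumptions, not ATI's design.
 */
#include <stdio.h>

#define NUM_ARRAYS      3    /* 48 shader units = 3 arrays x 16 units */
#define UNITS_PER_ARRAY 16

enum thread_type { VERTEX_THREAD, PIXEL_THREAD };

/* Hypothetical per-cycle decision: run whichever work type is most
 * backed up so that no array sits idle. */
static enum thread_type pick_thread(int vertex_backlog, int pixel_backlog)
{
    return (vertex_backlog > pixel_backlog) ? VERTEX_THREAD : PIXEL_THREAD;
}

int main(void)
{
    int vertex_backlog = 10, pixel_backlog = 40;

    for (int cycle = 0; cycle < 4; cycle++) {
        for (int array = 0; array < NUM_ARRAYS; array++) {
            enum thread_type t = pick_thread(vertex_backlog, pixel_backlog);
            if (t == VERTEX_THREAD)
                vertex_backlog--;
            else
                pixel_backlog--;
            printf("cycle %d, array %d: %s thread on %d units\n",
                   cycle, array,
                   t == VERTEX_THREAD ? "vertex" : "pixel", UNITS_PER_ARRAY);
        }
    }
    return 0;
}
```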

In order to get data into the shader units, textures are read from main memory. The eDRAM of the system is unable to assist with texturing. There are 16 bilinear filtered texture samplers, which are able to read up to 16 textures per clock cycle. The scheduler will need to take great care to organize threads so that optimal use of the texture units is made. Another consideration to take into account is anisotropic filtering. In order to perform filtering beyond bilinear levels, the texture will need to be run through the texture unit more than once (until the filtering is finished). If no filtering is required (i.e. if a shader program is simply reading stored data), the vertex fetch units can be used (by either a vertex or a pixel shader program).
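
The anisotropic case is easier to picture with a small sketch. The fragment below only illustrates the multi-pass idea, it is not Xenos' actual filter kernel; the function names and the stub sampler are assumptions of ours.

```c
/* Rough illustration (not Xenos' actual kernel) of why filtering beyond
 * bilinear costs multiple trips through the texture unit: an anisotropic
 * sample is built from several bilinear taps taken along the axis of
 * anisotropy, one trip through the sampler per tap. */
typedef struct { float r, g, b, a; } color_t;

/* Stand-in for one pass through a bilinear texture sampler. */
static color_t bilinear_sample(const float *texture, float u, float v)
{
    (void)texture; (void)u; (void)v;
    color_t c = { 0.5f, 0.5f, 0.5f, 1.0f };   /* dummy result */
    return c;
}

static color_t anisotropic_sample(const float *texture,
                                  float u, float v,    /* sample center */
                                  float du, float dv,  /* step along aniso axis */
                                  int taps)            /* 2, 4, 8 or 16 taps */
{
    color_t sum = { 0, 0, 0, 0 };
    for (int i = 0; i < taps; i++) {
        /* Each tap is another cycle through the same bilinear unit. */
        float t = (i + 0.5f) / taps - 0.5f;
        color_t c = bilinear_sample(texture, u + t * du, v + t * dv);
        sum.r += c.r; sum.g += c.g; sum.b += c.b; sum.a += c.a;
    }
    sum.r /= taps; sum.g /= taps; sum.b /= taps; sum.a /= taps;
    return sum;
}
```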

In the PC space, we are seeing shifts to more and more complex pixel shaders. Larger and larger textures are being used in order to supply data, and some predict that texture bandwidth will eclipse color and z bandwidth in the not so distant future. We will have to see if the console and desktop space continue to diverge in this area.

One of the key aspects of performance for the Xbox 360 will be how well ATI manages threads on their GPU. With the shift to the unified shader architecture, it is even more imperative to make sure that everything is running at maximum efficiency. We don't have many details on ATI's ability to context switch between vertex and pixel shader programs in hardware, but suffice it to say that ATI cannot afford any difficulties in managing threads at any level. As making good use of current pixel shader technology already requires swapping out threads on shaders, we expect ATI to do fairly well in this department. Thread management is likely one of the most difficult things ATI had to work out to make this hardware feasible.
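
For a sense of why this matters, here is a toy model, entirely our own construction with made-up latencies: whenever the running thread stalls on a texture fetch, the scheduler parks it and issues another ready thread so the ALUs stay busy instead of idling for the duration of the fetch.

```c
/* Toy model of latency hiding through thread swapping.  The thread count,
 * fetch latency and instruction mix are invented for illustration. */
#include <stdio.h>

#define NUM_THREADS   4
#define FETCH_LATENCY 8   /* made-up texture fetch latency, in cycles */

struct shader_thread {
    int stalled_until;    /* cycle at which its texture data arrives */
    int work_left;        /* ALU instructions remaining */
};

int main(void)
{
    struct shader_thread t[NUM_THREADS] = {
        { 0, 6 }, { 0, 6 }, { 0, 6 }, { 0, 6 }
    };

    for (int cycle = 0; cycle < 40; cycle++) {
        int ran = -1;
        for (int i = 0; i < NUM_THREADS; i++) {
            if (t[i].work_left > 0 && t[i].stalled_until <= cycle) {
                t[i].work_left--;
                /* Pretend every third instruction is a texture fetch. */
                if (t[i].work_left % 3 == 0)
                    t[i].stalled_until = cycle + FETCH_LATENCY;
                ran = i;
                break;
            }
        }
        if (ran >= 0)
            printf("cycle %2d: running thread %d\n", cycle, ran);
        else
            printf("cycle %2d: ALUs idle (every thread waiting on a fetch)\n",
                   cycle);
    }
    return 0;
}
```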

Those who paid close attention to the amount of eDRAM (10MB) will note that this is not enough memory to store the entire framebuffer for displays larger than standard television with 4xAA enabled. Apparently, ATI will store the front buffer in the UMA area, while the back buffer resides in the eDRAM. In order to manage large displays, the hardware will need to render the back buffer in parts. This indicates that they have implemented some sort of very coarse-grained tiling system (with 2 to 4 tiles). Usually tile based renderers have many more tiles than this, but this is a special case.
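
The arithmetic behind that claim is easy to check. The snippet below assumes 32-bit color plus 32-bit Z/stencil per sample (the actual formats may differ, so treat the numbers as illustrative): a 640x480 back buffer with 4xAA just about fits in 10MB, while a 1280x720 back buffer needs roughly three tiles.

```c
/* Back-of-the-envelope check of the tiling claim above.  Assumes 32-bit
 * color plus 32-bit Z/stencil per multisample. */
#include <stdio.h>

#define EDRAM_BYTES (10u * 1024u * 1024u)   /* ~10MB of eDRAM */

static unsigned long long backbuffer_bytes(unsigned w, unsigned h,
                                           unsigned aa_samples)
{
    const unsigned bytes_per_sample = 4 /* color */ + 4 /* Z/stencil */;
    return (unsigned long long)w * h * aa_samples * bytes_per_sample;
}

int main(void)
{
    struct { const char *name; unsigned w, h; } modes[] = {
        { "480i (standard TV)", 640, 480 },
        { "720p",              1280, 720 },
    };

    for (int i = 0; i < 2; i++) {
        unsigned long long bytes = backbuffer_bytes(modes[i].w, modes[i].h, 4);
        unsigned tiles = (unsigned)((bytes + EDRAM_BYTES - 1) / EDRAM_BYTES);
        printf("%-20s 4xAA back buffer: %.1f MB -> %u tile(s)\n",
               modes[i].name, bytes / (1024.0 * 1024.0), tiles);
    }
    return 0;
}
```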

Performance of this hardware is a very difficult aspect to assess without testing the system. The potential is there for some nice gains over the current high end desktop part, but it is very difficult to know how effectively software engineers will be able to use the hardware before they fully understand it and have programmed for it for a while. Certainly, the learning curve won't be as steep as something like the PlayStation 2 was (DirectX is still the API), but knowing what works and what doesn't will take some time.

ATI's Modeling Engine

The adaptability of their hardware is something ATI is touting as well. Their Modeling Engine is really a name for a usage model ATI provides using their unified shaders. As each shader unit is more general purpose than current vertex and pixel shaders, ATI has built the hardware to easily allow the execution of general floating point math.

ATI's Modeling Engine concept is made practical through their vertex cache implementation. Data for general purpose floating point computations moves into the vertex cache in high volumes for processing. The implication here is that the vertex cache has enough storage space and bandwidth to accommodate all 48 shader units without starvation for an extended period of use. If the vertex cache were to be used solely for vertex data, it could be much less forgiving and still offer the same performance (considering common vertex processing loads in current and near term games). As we stated previously, pixel processing (for now) is going to be more resource intensive than vertex processing. Making it possible to fill up the shader units with data from the vertex cache (as opposed to the output of vertex shaders), and the capability of the hardware to dump shader output to main memory is what makes ATI's Modeling Engine possible.
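
In practice, the usage model looks a lot like stream processing. The sketch below is purely illustrative, with made-up names and sizes: large batches are pulled from main memory into a vertex-cache-sized staging area, an arbitrary floating point kernel is run over them on the shader units, and the results are written straight back to main memory rather than being handed to the rasterizer.

```c
/* Sketch of the Modeling Engine usage model: stream data from main memory
 * through a "vertex cache" staging buffer, run general purpose float math
 * on it, and dump the results back to main memory.  Names and sizes here
 * are assumptions for illustration. */
#include <stddef.h>

#define STAGING_FLOATS 1024   /* stand-in for the vertex cache capacity */

typedef float (*kernel_fn)(float);

static void modeling_engine_pass(const float *in, float *out, size_t count,
                                 kernel_fn kernel)
{
    float staging[STAGING_FLOATS];

    for (size_t base = 0; base < count; base += STAGING_FLOATS) {
        size_t n = count - base < STAGING_FLOATS ? count - base
                                                 : STAGING_FLOATS;

        /* Fetch a block from main memory into the staging buffer ... */
        for (size_t i = 0; i < n; i++)
            staging[i] = in[base + i];

        /* ... run the general purpose math on the shader units ... */
        for (size_t i = 0; i < n; i++)
            staging[i] = kernel(staging[i]);

        /* ... and dump the results back to main memory. */
        for (size_t i = 0; i < n; i++)
            out[base + i] = staging[i];
    }
}

/* Example kernel: any per-element float operation will do. */
static float scale_kernel(float x) { return 2.0f * x; }

int main(void)
{
    float in[4096], out[4096];
    for (size_t i = 0; i < 4096; i++)
        in[i] = (float)i * 0.001f;
    modeling_engine_pass(in, out, 4096, scale_kernel);
    return 0;
}
```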

But just pasting a name on general purpose floating point math execution doesn't make it useful. Programmers will have to take advantage of it, and ATI has offered a few ideas on different applications for which the Modeling Engine is suited. Global illumination is an intriguing suggestion, as is tone mapping. ATI also indicates that higher order surfaces could be operated on before tessellation, giving programmers the ability to more fluidly manipulate complex objects. It has even been suggested that physics processing could be done on this part. Of course, we can expect that Xbox 360 programmers will not implement physics engines on the Modeling Engine, but it could be interesting in future parts from ATI.
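
Tone mapping is a good example of why these workloads map so naturally onto the shader arrays: it is nothing but independent per-pixel floating point math. The fragment below uses the simple Reinhard operator purely as an illustration (it is not a claim about how any Xbox 360 title would do it), and it is exactly the kind of kernel that could be streamed through the Modeling Engine path sketched above.

```c
/* The Reinhard operator hdr/(1 + hdr): pure per-element floating point
 * math of the kind suited to the Modeling Engine usage model.
 * Illustration only. */
typedef struct { float r, g, b; } rgb_t;

/* Map a high dynamic range color into the displayable 0..1 range. */
static rgb_t reinhard_tonemap(rgb_t hdr, float exposure)
{
    rgb_t out;
    out.r = (hdr.r * exposure) / (1.0f + hdr.r * exposure);
    out.g = (hdr.g * exposure) / (1.0f + hdr.g * exposure);
    out.b = (hdr.b * exposure) / (1.0f + hdr.b * exposure);
    return out;
}
```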

Comments

  • MDme - Friday, June 24, 2005 - link

    now i know what to buy :)
  • SuperStrokey - Friday, June 24, 2005 - link

    lol, that's funny
  • bldckstark - Friday, June 24, 2005 - link

    Having a PS2 and an Xbox, I was not even thinking about buying a PS3, since the Xbox kicks the PS2's ace (IMHO). After reading this article I have much more respect for the PS3, and now I don't have any idea which one I will buy. My wife may force me to buy the PS3 if the 360 isn't as backward compatible as most want it to be.

    Maybe I will just use my unusually large brain to create a PS360 that will play everything. Oooh, wait, I gotta get a big brain first. Then a big p3nis. Or maybe just a normal one.
  • Furen - Friday, June 24, 2005 - link

    #37: supposedly yes. Since it will have to be done through emulation there will be issues (of course). It won't be fully transparent like the PS2; rather, you'll have profiles saved on your hard drive which will tell the system how to run the games.
  • SuperStrokey - Friday, June 24, 2005 - link

    I haven't been following the 360 too much (I'm a self-admitted Nintendo fanboy), but will it be backward compatible too? I heard it was still up in the air, but as the PS3 is going to be and the Revolution is going to be (bigtime), I would assume that the 360 will be too, right?
  • ZobarStyl - Friday, June 24, 2005 - link

    #32 is right: how many games get released for all 3 consoles with only minor, subtle differences between them? Most of the time, first-party stuff is the only major difference between consoles. Very few 3rd-party games are held back from the 'slower' consoles; most are just licensing deals (GTA:SA on PS2, for example). And if you look back at the first-party game lineups, the Xbox didn't have the most compelling of libraries, in my opinion.
  • yacoub - Friday, June 24, 2005 - link

    IMO, the Revolution will be a loser in more than just hardware. I can't remember the last time I actually wanted to play any of the exclusive Nintendo games. Actually, I think for about one day I considered a GameCube for Metroid, but then I saw it in action at a friend's place and was underwhelmed by the gameplay. Forget Mario and Link, give me Splinter Cell or Gran Turismo or Forza or... yeah, you get the idea.
  • nserra - Friday, June 24, 2005 - link

    #27

    If you read the article carefully, you will see that since they are "weaker" pipelines, the 48 will perform like 24 "complete" ones.

    I think with this new ATI design, there will be games where the performance will be much better, equal or worse.
    But that's the price to pay for completely new designs.

    On paper ATI's design is much more advanced; in fact it reminds me of the Voodoo2 design, where more than one chip does the work. I think I prefer a very fancy graphics design over a "double everything" easy solution.
  • Taracta - Friday, June 24, 2005 - link

    With 25.5 GB/s of bandwidth to memory, is OoO (out-of-order processing) necessary? Aren't OoO and its ilk bandwidth-hiding solutions? I have an issue with regard to Anandtech's outlook on the SPEs of the Cell processor (I could be wrong). I consider the SPEs to be full-fledged vector processors and not just a fancy implementation of MMX, SSE, Altivec, etc., which seems to be Anandtech's outlook. As full-fledged vector processors they are orders of magnitude more flexible than that, and as vector processors, comparing them to scalar processors is erroneous.

    Another thing: RISC won the war! Don't believe it? What do you call a processor with a RISC core and a CISC hardware translator around it? CISC? I think not; it's a RISC processor. x86 did win the processor war, but not by beating them, by joining them, and by extension CISC lost. Just needed to clear that up. The x86 instruction set won, but the old x86 CISC architecture lost. The x86 instruction set will always win, fortunately for AMD, because the Itanium was to have been their death. No way could they have copied the Itanium in this day and age, which, come to think of it, is very unfortunate.

    As long as you have the processor that runs x86 the best, you will always win. Unless you can get a toehold in the market with something else, such as Linux and Cell!
  • CuriousMike - Friday, June 24, 2005 - link

    If it's a 3rd party game, it won't matter (greatly) which platform you pick, because developers will develop to the least-common-denominator.

    In the current generation, about the best one could hope for is slightly higher-res textures and better framerates on Xbox over PS2/GC.

    IMO, pick your platform based on first-party games/series you're looking forward to. Simple as that.
