Inside the Xenos GPU

As previously mentioned, the 48 shader units will be able to run either vertex or pixel shader programs in any given clock cycle. To clarify, each block of 16 shader units is able to run a shader program thread. These shader units will offer a feature set slightly beyond DX9.0c, but in order to take advantage of the technology, ATI and Microsoft will have to customize the API.

In order to get data into the shader units, textures are read from main memory. The eDRAM of the system is unable to assist with texturing. There are 16 bilinear filtered texture samplers. These units are able to read up to 16 textures per clock cycle. The scheduler will need to take great care to organize threads so that optimal use is made of the texture units. Another consideration to take into account is anisotropic filtering. In order to perform filtering beyond bilinear levels, the texture will need to be run through the texture unit more than once (until the filtering is finished). If no filtering is required (i.e. if a shader program is simply reading stored data), the vertex fetch units can be used (with either a vertex or a pixel shader program).
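To make the cost of multi-pass filtering concrete, here is a minimal sketch of the arithmetic, not a model of the real hardware. It assumes that every fetch occupies one of the 16 samplers for one clock per pass; the function name and the pass counts passed in are hypothetical, since the article only says that filtering beyond bilinear takes more than one pass.

```python
import math

TEXTURE_SAMPLERS = 16  # bilinear samplers, per the article


def sampler_clocks(fetches, passes_per_fetch=1, samplers=TEXTURE_SAMPLERS):
    """Estimate the clocks needed to service a batch of texture fetches.

    Assumption (not from the article): each pass of each fetch ties up
    one sampler for exactly one clock, and fetches pack perfectly.
    """
    return math.ceil(fetches * passes_per_fetch / samplers)


if __name__ == "__main__":
    # Bilinear (single pass) versus a hypothetical four-pass filter mode.
    print(sampler_clocks(48, passes_per_fetch=1))
    print(sampler_clocks(48, passes_per_fetch=4))
```

Under these assumptions the texture-unit time scales linearly with the number of passes, which is why the scheduler's ordering of threads matters more once heavier filtering is enabled.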

In the PC space, we are seeing shifts to more and more complex pixel shaders. Larger and larger textures are being used in order to supply data, and some predict that texture processing will eclipse color and z bandwidth in the not so distant future. We will have to see if the console and desktop space continue to diverge in this area.

One of the key aspects of performance for the Xbox 360 will be how well ATI manages threads on their GPU. With the shift to the unified shader architecture, it is even more imperative to make sure that everything is running at maximum efficiency. We don't have many details on ATI's ability to context switch between vertex and pixel shader programs in hardware, but suffice it to say that ATI cannot afford to have any difficulties in managing threads on any level. Since making good use of current pixel shader technology already requires swapping threads in and out of shaders, we expect ATI to do fairly well in this department. Thread management is likely one of the most difficult things ATI had to work out to make this hardware feasible.
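As a rough illustration of what "unified" means for scheduling, the toy model below hands pending vertex and pixel threads to three blocks of shader units, one thread per block per clock. Everything about the policy is an assumption made for illustration (one-clock threads, favoring the longer queue, the function name `schedule`); ATI has not disclosed how the real arbiter works.

```python
from collections import deque

BLOCKS = 3  # 48 shader units grouped as 3 blocks of 16, per the article


def schedule(vertex_threads, pixel_threads, blocks=BLOCKS):
    """Toy scheduler: each clock, every block takes one pending thread.

    Hypothetical policy: a block pulls from whichever queue is longer
    (pixel wins ties) and idles only when both queues are empty.
    Each thread is assumed to finish in a single clock.
    """
    vq, pq = deque(vertex_threads), deque(pixel_threads)
    timeline = []
    while vq or pq:
        slot = []
        for _ in range(blocks):
            if not vq and not pq:
                slot.append("idle")
            elif len(pq) >= len(vq):
                slot.append(pq.popleft())
            else:
                slot.append(vq.popleft())
        timeline.append(slot)
    return timeline


if __name__ == "__main__":
    for clock, slot in enumerate(schedule(["v0", "v1"], ["p0", "p1", "p2", "p3"])):
        print(clock, slot)
```

The point of the sketch is only that no block sits idle while either kind of work is pending, which a fixed split of vertex and pixel hardware cannot guarantee.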

Those who paid close attention to the amount of eDRAM (10MB) will note that this is not enough memory to store the entire framebuffer for displays larger than standard television with 4xAA enabled. Apparently, ATI will store the front buffer in the UMA area, while the back buffer resides in the eDRAM. In order to manage large displays, the hardware will need to render the back buffer in parts. This indicates that they have implemented some sort of very coarse-grained tiling system (with 2 to 4 tiles). Usually tile based renderers have many more tiles than this, but this is a special case.
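The tile count follows from simple arithmetic, sketched below. The sketch assumes 4 bytes of color and 4 bytes of depth/stencil per sample and treats 10MB as 10 x 1024 x 1024 bytes; those per-sample sizes and the function names are assumptions, not figures confirmed by ATI.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # 10MB of eDRAM, per the article


def back_buffer_bytes(width, height, aa_samples, bytes_color=4, bytes_depth=4):
    """Back buffer size, assuming color and depth are stored per AA sample."""
    return width * height * aa_samples * (bytes_color + bytes_depth)


def tiles_needed(width, height, aa_samples):
    """Number of eDRAM-sized passes needed to cover the whole back buffer."""
    return math.ceil(back_buffer_bytes(width, height, aa_samples) / EDRAM_BYTES)


if __name__ == "__main__":
    for width, height in [(640, 480), (1280, 720)]:
        print(width, height, tiles_needed(width, height, aa_samples=4))
```

Under these assumptions a 640x480 back buffer with 4xAA just fits in a single pass, while 1280x720 with 4xAA needs three tiles, which is consistent with the 2 to 4 tile range above.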

Performance of this hardware is very difficult to assess without testing the system. The potential is there for some nice gains over the current high end desktop part, but it is hard to know how effectively software engineers will be able to use the hardware before they fully understand it and have programmed for it for a while. Certainly, the learning curve won't be as steep as it was for something like the PlayStation 2 (DirectX is still the API), but knowing what works and what doesn't will take some time.

ATI's Modeling Engine

The adaptability of their hardware is something ATI is touting as well. Their Modeling Engine is really a name for a usage model ATI provides using their unified shaders. As each shader unit is more general purpose than current vertex and pixel shaders, ATI has built the hardware to easily allow the execution of general floating point math.

ATI's Modeling Engine concept is made practical through their vertex cache implementation. Data for general purpose floating point computations moves into the vertex cache in high volumes for processing. The implication here is that the vertex cache has enough storage space and bandwidth to accommodate all 48 shader units without starvation for an extended period of use. If the vertex cache were to be used solely for vertex data, it could be much less forgiving and still offer the same performance (considering common vertex processing loads in current and near term games). As we stated previously, pixel processing (for now) is going to be more resource intensive than vertex processing. The ability to fill up the shader units with data from the vertex cache (as opposed to the output of vertex shaders), together with the capability of the hardware to dump shader output to main memory, is what makes ATI's Modeling Engine possible.

But just pasting a name on general purpose floating point math execution doesn't make it useful. Programmers will have to take advantage of it, and ATI has offered a few ideas on different applications for which the Modeling Engine is suited. Global illumination is an intriguing suggestion, as is tone mapping. ATI also indicates that higher order surfaces could be operated on before tessellation, giving programmers the ability to more fluidly manipulate complex objects. It has even been suggested that physics processing could be done on this part. Of course, we can expect that Xbox 360 programmers will not implement physics engines on the Modeling Engine, but it could be interesting in future parts from ATI.


93 Comments


  • calimero - Wednesday, July 6, 2005 - link

    http://arstechnica.com/news.ars/post/20050629-5054...

    btw Anand article was "full of shit" (sorry but that is the right phrase) and it's not odd that Anand pull it. It's quite embarassing for Anand; someone already told: one thing is to write test of CPU speed and speed of graphics card in games... and another to analyse CPU architecture.
  • jwix - Tuesday, July 5, 2005 - link

    Creathir - the article was reposted on other forums around the net. Here is the story in summary - Sony & Microsoft have both overhyped the processing power of their cpu's by using clever marketing speak. It turns out the processor designs are uneccessarily complicated, inefficient at crunching today's game code, and unlikely to be useful when game code finally becomes fully multi-threaded in the coming years. Why microsoft and sony didn't go with an Intel or AMD design, I don't know. The article speculates that both companies wanted IP rights to the cpu, maybe that's the reason.
    The GPU's on the other hand look plenty powerful. They should both be relatively equivalent in performance to the R520 and the current 7800 GTX.
    Bottom line - the new consoles will be quite powerful compared to the previous generation. However, PC's will still be more powerful, and wil remain the platform of choice for high end gaming. Something I was glad to read as I just built a new pc.

  • steveyoung123456789 - Friday, December 9, 2011 - link

    wow your so smart! faggit
  • creathir - Saturday, July 2, 2005 - link

    jwix:
    I had read a good portion of the article, but had been pulled away (thought to myself I'll just reread it later) and was upset to find it was gone. I have never seen this here at Anandtech, and Anand has not made a single comment on his blog about it. I suppose some fact was incorrect? Maybe Sony/Microsoft decided they would SUE him over the article? I bet the most logical answer is this, Tim Sweeney saw the article, and even though Anand referenced the "anonymous developer", he had earlier mentioned in his blog he had been waiting for some answers from Tim. I would bet this "outed" his source, much like the LA Times outed their source recently for a Grand Jury. This outing probably was followed by a request by Tim to pull the article. I would have to bet we will see it soon enough, reworked, reworded. Whatever the case, Anand, it was a good article, you should be sure to repost it.
    - Creathir
  • steveyoung123456789 - Friday, December 9, 2011 - link

    o someone can read!! yay!
  • linkgoron - Thursday, June 30, 2005 - link

    blckgrffn, THIS IS NOT i repeat NOT the article you think it is.
  • blckgrffn - Thursday, June 30, 2005 - link

    Yes it is back up! :D

    Nat
  • jwix - Thursday, June 30, 2005 - link

    Last night, around 10:00pm EST, I surfed over to the Anandtech home page to see what was happening. I was greeted by Part II of the article (Xbox 360, Sony PS3 - a hardware discussion). Did anyone else read this article last night. I was only able to read the first 2 pages before the article was pulled off the website. Why would they post it and then pull it so quickly? And why has not been reposted since?
    The story it told was unbelievable - basically, the floating point processing power of both the Sony and Xbox processor was less than half of your average Pentium 4. Anand went into detail on how and why this was the case. His sources apparently were confidential, but definitely industry insiders (ie...game developers). I wish I could have finished reading the article before it was pulled. Did anyone read the whole article?
  • ecoumans - Thursday, June 30, 2005 - link

    Physics Middleware will be Multithreaded and heavily optimized for Cell's 7 SPE's. This makes life easier for gamedevelopers, and it changes the story about CPU usage... Same story for sound etc.
  • Houdani - Tuesday, June 28, 2005 - link

    29: In order to turn off the "sponsored links" go to ABOUT in the top left menu and turn off INTELITEXT.

    I think this setting is stored in a cookie, so you will need to do this everytime you clear your cookies.
