The Xbox 360 GPU: ATI's Xenos

On a purely hardware level, ATI's Xbox 360 GPU (codenamed Xenos) is quite interesting. The part itself is made up of two physically distinct silicon ICs. One IC is the GPU itself, which houses all the shader hardware and most of the processing power. The second IC (which ATI refers to as the "daughter die") is a 10MB block of embedded DRAM (eDRAM) combined with the hardware necessary for z and stencil operations, color and alpha processing, and antialiasing. This daughter die is connected to the GPU proper via a 32GB/sec interconnect. Data sent over this bus will be compressed, so effective bandwidth will be higher than 32GB/sec. Inside the daughter die, between the processing hardware and the eDRAM itself, bandwidth is 256GB/sec.

At this point in time, much of the memory bandwidth consumed by graphics hardware goes to moving color and z data to and from the framebuffer. ATI hopes to eliminate this as a bottleneck by moving this processing and the back framebuffer off the main memory bus. Main memory is 512MB of 700MHz GDDR3 on a 128-bit bus (which results in just over 22GB/sec of bandwidth). This is less bandwidth than current desktop graphics cards have available, but by offloading color and z work, and the bandwidth it requires, to the daughter die, ATI saves itself a good deal of main memory bandwidth. The 22GB/sec is left for textures and the rest of the system (the Xbox 360 implements a single pool of unified memory).
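The "just over 22GB/sec" figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming GDDR3 transfers data twice per clock (double data rate) and that 1GB means 10^9 bytes; the function name is made up for illustration.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, clock_mhz: float,
                            transfers_per_clock: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock=2 is the double-data-rate assumption for GDDR3.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# 128-bit bus at 700MHz, as quoted for the Xbox 360's main memory.
main_memory = peak_bandwidth_gb_per_s(128, 700)
print(f"{main_memory:.1f} GB/s")  # 22.4 GB/s
```

This is a theoretical peak; sustained bandwidth in a real system will be lower.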

The GPU essentially acts as the Northbridge for the system, and sits in the middle of everything. From the graphics hardware, there is 10.8GB/sec of bandwidth up and down to the CPU itself. The rest of the system is hooked in with 500MB/sec of bandwidth up and down. The high bandwidth to the CPU is quite useful as the GPU is able to directly read from the L2 cache. In the console world, the CPU and GPU are quite tightly linked and the Xbox 360 stands to continue that tradition.

Weighing in at 332M transistors, the Xbox 360 GPU is quite a powerful part, but its architecture differs from that of current desktop graphics hardware. For years, vertex and pixel shader hardware have been implemented separately, but ATI has sought to combine their functionality in a unified shader architecture.

What's A Unified Shader Architecture?

The GPU in the Xbox 360 uses a different architecture than we are used to seeing. To be sure, vertex and pixel shader programs will run on the part, but not on separate segments of the hardware. Vertex and pixel processing differ in purpose, but there is quite a bit of overlap in the type of hardware needed to do both. The unified shader architecture that ATI chose for its Xbox 360 GPU runs both vertex and pixel shader programs on the same hardware. This allows ATI to pack more functionality into fewer transistors, as less hardware needs to be duplicated for use in different parts of the chip.

There are 3 parallel groups of 16 shader units each. Each of the three groups can operate on either vertex or pixel data. Each shader unit is able to perform one 4-wide vector operation and one scalar operation per clock cycle. Current ATI hardware is able to perform two 3-wide vector and two scalar operations per cycle in the pixel pipe alone. The vertex pipeline of R420 is 6 wide and can do one 4-wide vector op and one scalar op per cycle. If we look at straight up processing power, this gives R420 the ability to crunch 158 components per clock (30 of which are 32-bit, while the other 128 are limited to 24-bit precision). The Xbox GPU is able to crunch 240 32-bit components in its shader units per clock cycle. While this is a roughly 52% increase in the number of ops that can be done per cycle (as well as a general increase in precision), we can't expect these 48 pipelines to act like 3 sets of R420 pipelines. All things being equal, this increase (when only looking at ops/cycle) would make the part only about as powerful as a 24-pipe R420.
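The component counts above can be reproduced with a short tally. This is only a sketch of the arithmetic: the `ShaderBlock` class is a made-up helper, and the figure of 16 pixel pipes for R420 is an assumption implied by the 128-component number in the text rather than something stated there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderBlock:
    name: str
    units: int          # number of pipes or shader units
    vector_ops: int     # vector operations per unit per clock
    vector_width: int   # components per vector operation
    scalar_ops: int     # scalar operations per unit per clock

    def components_per_clock(self) -> int:
        per_unit = self.vector_ops * self.vector_width + self.scalar_ops
        return self.units * per_unit


# Assumption: 16 pixel pipes, implied by 128 components / 8 per pipe.
r420_pixel = ShaderBlock("R420 pixel", units=16, vector_ops=2, vector_width=3, scalar_ops=2)
r420_vertex = ShaderBlock("R420 vertex", units=6, vector_ops=1, vector_width=4, scalar_ops=1)
xenos = ShaderBlock("Xenos unified", units=48, vector_ops=1, vector_width=4, scalar_ops=1)

r420_total = r420_pixel.components_per_clock() + r420_vertex.components_per_clock()
xenos_total = xenos.components_per_clock()
increase_percent = (xenos_total / r420_total - 1) * 100

print(r420_total, xenos_total, f"{increase_percent:.1f}%")  # 158 240 51.9%
```

The tally counts raw components per clock only. It says nothing about precision, scheduling, or how well real shaders keep the units busy.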

What will make or break the difference between something like a 24-pipe R420 and the unified shaders of the Xbox GPU is how well applications will lend themselves to the adaptive nature of the hardware. Current configurations don't have nearly the same vertex processing power as they do pixel processing power. This is quite logical when we consider the fact that games have many more pixels displayed than vertices. For each geometry primitive, there are likely a good number of pixels involved. Of course, not all titles will need the same ratio of geometry to pixel power. This means that all the ops per clock could be dedicated to geometry processing in truly polygon-intensive scenes. On the flip side (and more likely), any given clock cycle could see all 240 ops being used for pixel processing. If game designers realize this and code their shaders accordingly, we could see much more focused processing power dedicated to a single type of problem than on current hardware.
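To make the trade-off concrete, here is a toy model comparing a unified pool against a hypothetical fixed vertex/pixel split of the same three groups. Everything about the scheduling is an assumption for illustration: the greedy "serve the longer queue" policy, the function names, and the workloads are invented, and none of it describes how ATI's hardware actually arbitrates work.

```python
import math

GROUPS = 3
OPS_PER_GROUP = 16 * 5  # 16 units x (one 4-wide vector + one scalar), from the text


def unified_cycles(vertex_ops: int, pixel_ops: int) -> int:
    """Clocks needed when every group may pick vertex or pixel work each clock.

    Hypothetical greedy policy: each group serves whichever queue is longer.
    """
    cycles = 0
    while vertex_ops > 0 or pixel_ops > 0:
        for _ in range(GROUPS):
            if vertex_ops >= pixel_ops:
                vertex_ops = max(0, vertex_ops - OPS_PER_GROUP)
            else:
                pixel_ops = max(0, pixel_ops - OPS_PER_GROUP)
        cycles += 1
    return cycles


def fixed_cycles(vertex_ops: int, pixel_ops: int, vertex_groups: int = 1) -> int:
    """Clocks needed when groups are permanently split between the two jobs."""
    if not 0 < vertex_groups < GROUPS:
        raise ValueError("vertex_groups must leave at least one group per job")
    pixel_groups = GROUPS - vertex_groups
    vertex_time = math.ceil(vertex_ops / (vertex_groups * OPS_PER_GROUP))
    pixel_time = math.ceil(pixel_ops / (pixel_groups * OPS_PER_GROUP))
    return max(vertex_time, pixel_time)


# Made-up workloads, in component ops: (vertex, pixel)
for vertex, pixel in [(2400, 0), (0, 2400), (800, 1600)]:
    print(vertex, pixel, unified_cycles(vertex, pixel), fixed_cycles(vertex, pixel))
```

In this model the two designs tie only when the workload happens to match the fixed 1:2 split. As the workload skews toward all geometry or all pixels, the fixed split leaves hardware idle while the unified pool keeps all 240 ops per clock busy.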

ATI is predicting that developers will use lots of very small triangles in Xbox 360 games. As engines like Epic's Unreal Engine 3 have shown incredible results using pixel shaders and normal maps to augment low geometric detail, we can't tell if ATI is trying to provide the chicken or the egg. In other words, will we see many small triangles on Xbox 360 because console developers are moving in that direction or because that is what will run well on ATI's hardware?

Regardless of the path that led us here, it is obvious that the Xbox 360 will be a geometry powerhouse. Not only are all 3 blocks of 16 shaders able to become vertex shaders, but ATI's GPU will be able to handle twice as many z operations if a z-only pass is performed. The same is true of current ATI and NVIDIA hardware, but the fact that a geometry-only pass can now make use of shader hardware to perform 48 vector and 48 scalar operations in any given clock cycle while doing twice the z operations is quite intriguing. This could allow some very geometrically complicated scenes.




  • LanceVance - Friday, June 24, 2005 - link

    Excellent article. Definitely the most thorough, informative, well researched article on the PS3/Xbox360.

    And most importantly, unlike every other article on the subject, it's not strongly biased toward one camp while making comments of substance.
  • yacoub - Friday, June 24, 2005 - link

    I bet the PS3 debuts at a higher price.

    Also regarding statements made on the Conclusionary page:

    --"That being said, it won’t be impossible to get the same level of performance out of the PS3, it will just take more work. In fact, specialized hardware can be significantly faster than general purpose hardware at certain tasks, giving the PS3 the potential to outperform the Xbox 360 in CPU tasks. It has yet to be seen how much work is required to truly exploit that potential however, and it will definitely be a while before we can truly answer that question."--

    I find it funny that once again the PlayStation will be the harder system to code games for that take full advantage of its abilities. If trends mimic the past (as they often do) this will lead to a large amount of mediocre games by companies too small to afford the dev time necessary to take real advantage of the PS3's advantages or on deadlines too tight to spend the time doing more.
  • Furen - Friday, June 24, 2005 - link

    It does sound pretty low but (I'm guessing) it's more than enough, I dont think they would have separated the dies unless it didnt lead to a big performance penalty. also, I'm guessing that the 256MB/sec bandwidth between the eDRAM and its processing hardware is 256GB/sec? Microsoft was using that number to inflate their "system bandwidth" total.
  • Woodchuck2000 - Friday, June 24, 2005 - link

    And for that matter, 32Mb/s inter-die communications in the Xenos GPU seems low to me
    Good article though guys!
  • Furen - Friday, June 24, 2005 - link

    Is there any word on the media center extender capabilities on the xbox 360? I think Microsoft mentioned something about that but I'm not sure if that was oficial or not. Just hope they allow us to plug in some video capture device and use it as a dvr eventually.

    As much as I like sony's playstation, I find it quite boring on the technical side. It seems like they're just throwing everything they can into it but nothing is really that exciting, or useful. Come on, dual-HDMI. I dont see myself having two HDTVs in such close proximity to each other. Gigabit router? Seems like they're desperate to use the extra cpu muscle. I wonder how heavy ethernet traffic will affect cpu usage.
  • Woodchuck2000 - Friday, June 24, 2005 - link

    Surely porting between multi-core PC software and Xenon should be fairly trivial, not fairly Non-trivial as stated in the article...?
  • jotch - Friday, June 24, 2005 - link

    I stands for interlaced whilst the P stands for progressive scan. Check out the difference at


    This should resolve this issue.
  • AnnihilatorX - Friday, June 24, 2005 - link

    1080i = 720p doesn't it? 1080p is the one Xbox 360 doesn't support.

    These "i"s and "p"s are confusing me
  • sprockkets - Friday, June 24, 2005 - link

    How is 1080i on your tv's? On my 1 year old Mitsubishi native 1080i tv using dvi from the computer at 1080i is basically useless since the text is too small and the image looks like the refresh rate is below 60hz, whereas HDTV broadcasts look fine. Using the other mode of 720x480 looked great.

    Will HD output from a console be any better than a video card in a computer? Is it just my tv?

    Cmon, did you really think nVidia would release something far more advanced for a console than for a video card, or perhaps, more specifically, having it way outperform 6800 ultras in sli?

    If you need around a 400w power supply for even non sli setup, what kind of heat and power will these new consoles need anyhow???

    Of course I am more interested in how the PS3 will work with Linux more than games hahahahaha, since Sony officially mentioned it.
  • emmap - Sunday, December 4, 2005 - link

    And that's this article, Sony and M$ have missed:

    it's not the number of megapixels, shader pipelines, CPU / GPU bandwidth, multithreaded or single threaded code which do a great game. It's imagination put in the game, gameplay, artistic art quality, human feeling we get looking at the characters, fun and so on. It's not only mathematics and physics: we don't love a game because it has X millions polygons or run at Y fps, no it's totally different. Just see all the mame fans out there, you'll see that they don't care about the obsolete hardware the game they are playing on, they care about the most important thing about game: ENTERTAINMENT!
