Xenon vs. Cell

The first public game demo on the PlayStation 3 was Epic Games’ Unreal Engine 3 at Sony’s PS3 press conference.  Tim Sweeney, the founder of Epic and father of UE3, performed the demo and helped shed some light on how multi-threading can work on the PlayStation 3.

According to Tim, a lot of things aren’t appropriate for SPE acceleration in UE3, mainly high-level game logic, artificial intelligence and scripting.  But he adds that “Fortunately these comprise a small percentage of total CPU time on a traditional single-threaded architecture, so dedicating the CPU to those tasks is appropriate, while the SPE's and GPU do their thing.”

So what does Tim Sweeney see the SPEs being used for in UE3?  "With UE3, our focus on SPE acceleration is on physics, animation updates, particle systems, sound; a few other areas are possible but require more experimentation."

Tim’s view on the PPE/SPE split in Cell is far more balanced than most we’ve encountered.  There are many who see the SPEs as utterly useless for executing anything (we’ll get to why in a moment), while there are others who have been talking about doing far too much on SPEs where the general purpose PPE would do much better. 

For the most part, the areas that UE3 uses the Cell’s SPEs for are fairly believable.  For example, sound processing makes a lot of sense for the SPEs given their rather specialized architecture aimed at streaming tasks.  But the one curious item is the focus on using SPEs to accelerate physics calculations, especially given how branch-heavy physics calculations generally are.

Collision detection is a big part of what is commonly referred to as “game physics.”  As the name implies, collision detection simply refers to the game engine determining when two objects collide.  Without collision detection, bullets would never hit your opponents and your character would be able to walk through walls, cars and other objects.

One method of implementing collision detection in a game is through the use of a Binary Space Partitioning (BSP) tree.  BSP trees are created by organizing lists of polygons into a binary tree.  The structure of the tree itself doesn’t matter to this discussion, but the important thing to keep in mind is that traversing a BSP tree to test for a collision between some object and a polygon in the tree requires a lot of comparisons.  You first traverse the tree to find the polygon you want to test for a collision against.  Then you have to perform a number of checks to see whether a collision has occurred between the object and the polygon itself.  This process involves a lot of conditional branching, the kind of code that runs best on a high performance out-of-order (OoO) core with a very good branch predictor.
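To make the branching cost concrete, here is a minimal sketch of a BSP point query in Python.  It is an illustration only, not UE3’s code: the `BSPNode` layout, the `convex_region` helper and the 2D unit-square scene are all assumptions made for the example.  The point to notice is that every step down the tree is a data-dependent decision, which is exactly what a branch predictor is there to help with.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BSPNode:
    # Hypothetical node layout: interior nodes hold a splitting line
    # normal . p = d; leaves have normal=None and carry a `solid` flag.
    normal: Optional[Tuple[float, float]] = None
    d: float = 0.0
    front: Optional["BSPNode"] = None
    back: Optional["BSPNode"] = None
    solid: bool = False


EMPTY = BSPNode(solid=False)
SOLID = BSPNode(solid=True)


def convex_region(planes):
    """Chain half-planes so that only points in front of all of them are solid."""
    node = SOLID
    for normal, d in reversed(planes):
        node = BSPNode(normal=normal, d=d, front=node, back=EMPTY)
    return node


def point_in_solid(root, p):
    """Walk the tree; return (is_solid, number_of_branch_decisions)."""
    node, branches = root, 0
    while node.normal is not None:
        side = node.normal[0] * p[0] + node.normal[1] * p[1] - node.d
        branches += 1  # one data-dependent branch per level of the tree
        node = node.front if side >= 0 else node.back
    return node.solid, branches


# A unit square [0, 1] x [0, 1] described by four half-planes.
unit_square = convex_region([
    ((1.0, 0.0), 0.0),    # x >= 0
    ((-1.0, 0.0), -1.0),  # x <= 1
    ((0.0, 1.0), 0.0),    # y >= 0
    ((0.0, -1.0), -1.0),  # y <= 1
])

print(point_in_solid(unit_square, (0.5, 0.5)))   # (True, 4)
print(point_in_solid(unit_square, (-2.0, 0.5)))  # (False, 1)
```

Even in this tiny scene, which node gets visited next depends on the result of the previous comparison, so the work cannot be laid out ahead of time as a straight stream of arithmetic.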

Unfortunately, the SPEs have no branch prediction, so BSP tree traversal will tie up an SPE for quite a bit of time while not performing very well, as each branch condition has to be evaluated before execution can continue.  It is still possible to structure collision detection for execution on the SPEs, but doing so requires a different approach to the collision detection algorithms than what would normally be implemented on a PC or Xbox 360.
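One plausible shape for such a restructured approach, and this is our guess rather than anything Epic has described, is to replace tree walking with fixed batches of identical arithmetic.  The sketch below tests one sphere against a batch of spheres stored as parallel arrays and produces a 0/1 flag per entry instead of taking a different code path per object.  The function name and the sphere-versus-sphere test are assumptions for illustration, and Python itself is of course not branch-free; the example only shows the data layout and the uniform per-lane work that suits streaming hardware.

```python
def sphere_hits_batch(px, py, pz, radius, cx, cy, cz, cr):
    """Test one sphere against a batch stored as parallel arrays.

    Returns a list of 0/1 flags.  Every entry performs the same
    subtract / multiply / add / compare sequence, so there is no
    data-dependent control flow between entries.
    """
    hits = []
    for x, y, z, r in zip(cx, cy, cz, cr):
        dx, dy, dz = x - px, y - py, z - pz
        dist_sq = dx * dx + dy * dy + dz * dz
        reach = r + radius
        # The comparison result is stored as data (a mask), not acted on.
        hits.append(int(dist_sq <= reach * reach))
    return hits


# Four candidate spheres laid out as a structure of arrays.
centers_x = [1.5, 5.0, 0.0, 0.0]
centers_y = [0.0, 0.0, 2.0, 0.0]
centers_z = [0.0, 0.0, 0.0, -3.0]
radii = [1.0, 1.0, 1.0, 0.5]

# Unit sphere at the origin against the batch.
print(sphere_hits_batch(0.0, 0.0, 0.0, 1.0,
                        centers_x, centers_y, centers_z, radii))  # [1, 0, 1, 0]
```

The trade-off is that a brute-force batch does more total arithmetic than a tree query, in exchange for work that never stalls waiting on a branch.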

We’re still working on providing examples of how it is actually done, but it’s tough getting access to detailed information at this stage given that a number of NDAs are still in place involving Cell development for the PS3.  Regardless of how it is done, obviously the Epic team found the SPEs to be a good match for their physics code, if structured properly, meaning that the Cell processor isn’t just one general purpose core with 7 others that go unused. 

In fact, if properly structured and coded for SPE acceleration, physics code could very well run faster on the PlayStation 3 than on the Xbox 360 thanks to the more specialized nature of the SPE hardware.  Not to mention that physics acceleration is particularly parallelizable, making it a perfect match for an array of 7 SPEs. 
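As a rough sketch of what “parallelizable” means here, the example below splits a list of bodies into seven independent chunks and integrates each chunk on its own worker.  Everything in it is an assumption for illustration: the one-dimensional `(position, velocity)` bodies, the `integrate` and `split` helpers, and the use of a Python thread pool as a stand-in for seven SPEs.  Python threads will not actually speed up this arithmetic; the example only shows that the per-body work has no dependencies between chunks.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 7  # stand-in for the seven SPEs mentioned in the article


def integrate(chunk, dt):
    """Advance each (position, velocity) pair by one Euler step."""
    return [(pos + vel * dt, vel) for pos, vel in chunk]


def split(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def step_parallel(bodies, dt):
    """Integrate all bodies, one chunk per worker, preserving order."""
    chunks = split(bodies, NUM_WORKERS)
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        results = list(pool.map(integrate, chunks, [dt] * NUM_WORKERS))
    return [body for chunk in results for body in chunk]


bodies = [(float(i), 1.0) for i in range(20)]
# The chunked result matches a plain serial pass over the same bodies.
print(step_parallel(bodies, 0.5) == integrate(bodies, 0.5))  # True
```

Interactions between bodies, such as the collision tests discussed earlier, are where the chunks stop being independent and the real engineering effort goes.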

Microsoft has referred to the Cell’s array of SPEs as a bunch of DSPs useless to game developers.  The fact that the next installment of the Unreal engine will be using the Cell’s SPEs for physics, animation updates, particle systems as well as audio processing means that Microsoft’s definition is a bit off.  While not all developers will follow in Epic’s footsteps, those that wish to remain competitive and get good performance out of the PS3 will have to.

The bottom line is that Sony would not foolishly spend over 75% of their CPU die budget on SPEs to use them for nothing more than fancy DSPs.  Architecting a game engine around Cell and optimizing for SPE acceleration will take more effort than developing for the Xbox 360 or PC, but it can be done.  The question then becomes, will developers do it? 

In Johan’s Quest for More Processing Power series he looked at the developmental limitations of multi-threading, especially as they applied to games.  The end result is that multi-threaded game development takes between 2 and 3 times longer than conventional single-threaded game development.  Adding still more time to restructure elements of your engine for better performance on the PS3 isn’t going to make the transition any easier on developers.


93 Comments

  • BenSkywalker - Sunday, June 26, 2005 - link

    ""One thing is for sure, support for two 1080p outputs in spanning mode (3840 x 1080) on the PS3 is highly unrealistic. At that resolution, the RSX would be required to render over 4 megapixels per frame, without a seriously computation bound game it’s just not going to happen at 60 fps." -- Quote from page 10"

    First off 1080p doesn't support 60FPS as of this moment anyway, and there are an awful lot of games on consoles that aren't remotely close to being GPU bound anyway. Remember that the XBox has titles now that are pushing out 1080i and the RSX is easily far more then four times the speed of the GPU in the XBox.
  • tipoo - Wednesday, August 6, 2014 - link

    "RSX is easily far more then four times the speed of the GPU in the XBox."

    It's funny reading these comments years later, and seeing how crazy the PS3 hype machine was. I assume this insane comment referred to the 1 teraflop RSX thing, which was a massive joke. RSX was worse than Xenon not only in raw gflops (180 vs over 200 I think), but since it didn't have unified shaders it could be bottlenecked by a scene having too much vertex or pixel effects and leaving shaders underused.
  • calimero - Sunday, June 26, 2005 - link

    Here is one tip about Cell:
    to play MP3 files (stereo) on PC you need 100MHz 486 CPU. Atari Falcon030 with MC68030 (16MHz) and DSP (32MHz) can do same thing!
    Everyone who know to program will find Cell outstanding and thrilling everyone else who pretend to be a programer please continue to waste CPU cycles with your shity code!
  • coolme - Sunday, June 26, 2005 - link

    "Supporting 1080p x2 may seem like overkill,"

    It's not gonna support 1080p x2

    "One thing is for sure, support for two 1080p outputs in spanning mode (3840 x 1080) on the PS3 is highly unrealistic. At that resolution, the RSX would be required to render over 4 megapixels per frame, without a seriously computation bound game it’s just not going to happen at 60 fps." -- Quote from page 10
  • nevermind4711 - Sunday, June 26, 2005 - link

    People have different ways of expressing the frequency of DDRAM. The correct memory frequency of 7800GTX is 256MB/256-bit GDDR3 at 600MHz, but as it is double rate some people say 1200 MHz.

    In the same way you can say the RSX memory is operating at 1400 MHz. How else could 128 bit result in a memory bandwidth of 22 GB/s for the RSX?

    #64 knitecrow, who is your source that the RSX does not contain e-dram, or is it just speculation?

    Besides, your conclusion from extrapolating the transistor count may be correct, but assuming the transistor count is proportional to the number of pixel pipelines is a rather big simplification, there is quite a lot of other stuff inside a GPU as well, stuff that does not scale proportionally to the pixel pipelines.
  • Furen - Sunday, June 26, 2005 - link

    The RSX is supposed to be clocked higher but will only have a 700MHz, 128bit memory bus (as opposed to the 1200MHz, 256bit memory bus on the 7800gtx).
  • knitecrow - Saturday, June 25, 2005 - link

    #61
    too bad you don't speak marketing.
    When they say near.. it means very close. Could be slightly under or over. If it was something like 320M... they will be hyp3ing 320M.


    #62 too bad you are wrong

    with 300M transistors, the RSX is a native 24 pixel pipeline card

    You can extrapolate the number by looking at:
    6800ultra - 16 - 222M
    6600GT - 8 - 144M

    it has no eDRAM.

    The features remain to be seen, but its going to be a G70 derivate -- just like XGPU for the xbox was a geforce3 derivative.

    There is absolutely no evidence to suggest that the RSX is going to be more powerful than 7800GTX.

    Just because a product comes out later doesn't make it better

    Exhibit A:
    Radeon 9700pro vs. 5800ultra
  • Darkon - Saturday, June 25, 2005 - link

    http://www.psinext.com/index.php?categoryid=3&...
  • Dukemaster - Saturday, June 25, 2005 - link

    I think it is very clear why the RSX gpu has the same number of transistors but still is more powerfull then the 7800GTX: the 7800GTX is a chip with 32 pipelines with 8 of them turned off.
  • nevermind4711 - Saturday, June 25, 2005 - link

    Interesting article. However, I find it strange that Anand and Derek do not comment on the difference in floating point capacity between the combatants. 1 TFlops for X360 vs. 2 TFlops for PS3. For X360 we know that the majority of flops come from the GPU, where probably the big part consists of massively paralell compare ops and such coming from the AA- and filtering circuitry integrated with the e-DRAM.
    It would be very interesting to know how the RSX provides 1.8 TFlops. I do not think the G70 has a capacity anything near that. Could it be possible that Sony will bring some e-DRAM to the party together with AA and filtering circuitry similar to X360. After all Sony has quite some experience of e-DRAM from PS2 and PSP.
    Anand and Derek wrote "Both the G70 and the RSX share the same estimated transistor count, of approximately 300.4 million transistors." Where do this information come from? Sony only said in its presentation the RSX will have 300+ mil t:s. G70 we now know contains 302 mil t:s.
    #48: Sony may very well have replaced some video en/de-coding circuitry of the G70 with some e-dram circuitry.