Why In-Order?

Ever since the Pentium Pro, desktop PC microprocessors have implemented Out of Order (OoO) execution architectures in order to improve performance.  We’ve explained the concept in great detail before, but the basic idea is that an Out-of-Order microprocessor can reorder its instruction stream on the fly to make the best use of its execution resources.  As simple as that is to explain, implementing OoO dramatically increases the complexity of a microprocessor and drives up power consumption.
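To make the reordering concrete, here is a minimal sketch of a toy issue model in C++. It is an illustration under stated assumptions, not a description of any real pipeline: the `Instr` struct, the `run` function, the two sample programs, the one-instruction-per-cycle issue limit and the 4-cycle load latency are all hypothetical choices made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction: a latency plus the indices of the
// earlier instructions whose results it needs.
struct Instr {
    int latency;                    // cycles until the result is available (>= 1)
    std::vector<std::size_t> deps;  // must refer to earlier instructions only
};

// Returns the cycle at which the last result becomes available.
// At most one instruction issues per cycle (an assumption of this sketch).
// in_order == true : only the oldest unissued instruction may issue.
// in_order == false: the oldest unissued instruction whose inputs are
//                    ready may issue, even if older ones are stalled.
int run(const std::vector<Instr>& prog, bool in_order) {
    const std::size_t n = prog.size();
    std::vector<int> done(n, -1);  // result-ready cycle; -1 means not issued yet
    std::size_t issued = 0;
    int cycle = 0;
    int finish = 0;
    while (issued < n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (done[i] >= 0) continue;  // already issued
            bool ready = true;
            for (std::size_t d : prog[i].deps) {
                if (done[d] < 0 || done[d] > cycle) { ready = false; break; }
            }
            if (ready) {
                done[i] = cycle + prog[i].latency;
                finish = std::max(finish, done[i]);
                ++issued;
                break;  // one issue per cycle
            }
            if (in_order) break;  // oldest instruction stalls everything behind it
        }
        ++cycle;
    }
    return finish;
}

// load A, add (uses A), load B, add (uses B): each add sits right behind its load.
std::vector<Instr> naive_program() {
    return { {4, {}}, {1, {0}}, {4, {}}, {1, {2}} };
}

// Same work with both loads hoisted to the front, as a scheduling compiler might emit it.
std::vector<Instr> scheduled_program() {
    return { {4, {}}, {4, {}}, {1, {0}}, {1, {1}} };
}
```

In this toy model the load/add/load/add sequence finishes at cycle 10 when issued in order and at cycle 6 when issued out of order, because the second load can start while the first add is still waiting. Hand-reordering the same four instructions to load, load, add, add brings the in-order model down to 6 cycles as well, which is the kind of static scheduling an in-order core leaves to the compiler.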

In a perfect world, you could group a bunch of OoO cores on a single die and offer both excellent single threaded performance and great multi-threaded performance.  However, the world isn’t so perfect, and there are limitations to how big a processor’s die can be.  Intel and AMD can only fit two of their OoO cores on a 90nm die, yet the Xbox 360 and PlayStation 3 targeted 3 and 9 cores, respectively, on a 90nm die; clearly something has to give, and that something happened to be the complexity of each individual core.

Given a game console’s 5 year expected lifespan, the decision was made (by both MS and Sony) to favor a multi-core platform over a faster single-core CPU in order to remain competitive towards the latter half of the consoles’ lifetime. 

So with the Xbox 360 Microsoft used three fairly simple IBM PowerPC cores, while Sony has the much publicized Cell processor in the PlayStation 3.  In absolute terms, both will be much slower than even mainstream desktop processors in single threaded game code, but the majority of games these days are far more GPU bound than CPU bound, so the performance decrease isn’t a huge deal.  In the long run, with a bit of optimization and multi-threaded game engines, these collections of simple in-order cores should be able to put out some fairly good performance.

Does In-Order Matter?

As we discussed in our Cell article, in-order execution makes a lot of sense for the SPEs.  With in-order execution as well as a small amount of high speed local memory, memory access becomes quite predictable and code is very easily scheduled by the compiler for the SPEs.  However, for the PPE in Cell, and the PowerPC cores in Xenon, the in-order approach doesn’t necessarily make a whole lot of sense.  You don’t have the advantage of a cacheless architecture, even though you do have the ability to force certain items to remain untouched by the cache.  More than anything, having an in-order general purpose core just works to simplify the core, at the expense of depending quite a bit on the compiler, and the programmer, to optimize performance.

Very little of a modern day game is written in assembly; most of it is written in a high level language like C or C++, and the compiler does the dirty work of optimizing the code and translating it into low level assembly.  Compilers are horrendously difficult to write; getting a compiler to work is a pretty difficult job in itself, but getting one to work well, regardless of what the input code is, is nearly impossible.

However, with a properly designed ISA and a good compiler, having an in-order core to work on is not the end of the world.  The performance you lose by not being able to extract the last bit of instruction level parallelism is made up by the fact that you can execute far more threads per clock thanks to the simplicity of the in-order cores allowing more to be packed on a die.  Unfortunately, as we’ve already discussed, on day one that’s not going to be much of an advantage. 

The Cell processor’s SPEs are even more of a challenge, as they are more specialized hardware suited only to executing certain types of code.  Keeping in mind that the SPEs are not well suited to running branch heavy code, loop unrolling will do a lot to improve performance as it can significantly reduce the number of branches that must be executed.  In order to squeeze the absolute maximum amount of performance out of the SPEs, developers may be forced to hand code some routines, as initial performance numbers for optimized, compiled SPE code appear to be far below their peak throughput.
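As a rough illustration of the loop unrolling mentioned above, the sketch below sums an array two ways in plain C++. It is not SPE code and uses no Cell intrinsics; the function names `sum_rolled` and `sum_unrolled4` and the unroll factor of four are assumptions chosen for the example.

```cpp
#include <cstddef>

// Straightforward loop: one loop-closing branch per element.
float sum_rolled(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Unrolled by four: one loop-closing branch per four elements.
// The four separate accumulators also keep the additions independent
// of each other, so they do not form one long dependency chain.
float sum_unrolled4(const float* data, std::size_t n) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    float total = (a0 + a1) + (a2 + a3);
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Both functions return the same result for inputs where every addition is exact. For general floating-point data the unrolled version adds in a different order, so the result can differ in the last bits.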

While the move to in-order architectures won’t cause game developers too much pain with good compilers at their disposal, the move to multi-threaded game development and optimizing for the Cell in general will be much more challenging. 




  • Darkon - Friday, June 24, 2005


    WTF are you talking ?

    The Cell does general-purpose processing although not as good as 360 cpu.

    And Anand I suggest you do some more research on cell
  • Alx - Friday, June 24, 2005

    Someone explain to me how Sony will support 1080p please. If developers make the games run at acceptable framerate at that resolution, most people running them at 720p and 480i will be wasting at least half of PS3's rendering power.

    On the other hand if XBOX360 game devs make their games run just fast enough at 720p, that'll give them far more resources to work with than those poor Sony game devs.
  • Shinei - Friday, June 24, 2005

    That's not necessarily true, #48. The Cell processor doesn't do general-purpose processing, so it can't do decoding on its own--and as far as I know, even pressed DVDs have to be decoded by some kind of processor. (Of course, I know next to nothing about video equipment, so I could be wrong...)
  • arturnow - Friday, June 24, 2005

    Another difference between RSX and G70 is the hardware video decoder - PureVideo. I'm sure RSX doesn't need that, which saves transistor count.
  • freebst - Friday, June 24, 2005

    Actually, in response to 31 there is no 1080p 60 frame/sec signal. The only HD signals are 1080 30p, 24p, 60i, 720 60p, 30p, 24p.
  • BenSkywalker - Friday, June 24, 2005

    Why the support for lower resolutions? I'm a bit confused by this- I can't see why anyone who isn't a fanatic loyalist wouldn't want to see the highest resolution possible supported by the consoles. The XBox(current) supports 1080i and despite the extreme rarity in which it is used- it IS used. Supporting 1080p x2 may seem like overkill, but think of the possibilities in terms of turn based RPGs or strategy games(particularly turn based) where 60FPS is very far removed from required.

    The most disappointing thing about the new generation of consoles is MS flipping its customers off in terms of backwards compatibility. Even Nintendo came around this gen and MS comes up with some half done emulation that works on some of 'the best selling' games. Also, with their dropping production of the original XB already it appears they still have an enormous amount to learn about the console market (check out sales of the original PS after the launch of the PS2 for an example).
  • Warder45 - Friday, June 24, 2005

    errr #31 not 37
  • Warder45 - Friday, June 24, 2005

    #37 is right on the money. There is a good chance that there will be no HDTV that can accept a 1080p signal by the time the PS3 comes out.

    It seems less like Sony future proofing the PS3 and more like Sony saying we have bigger balls than MS. Not to say MS is exempt from doing the same.
  • IamTHEsnake - Friday, June 24, 2005

    Excellent article Anand and crew.

    Thank you for the very informative read.
  • masher - Friday, June 24, 2005

    > "Collision detection is a big part of what is commonly
    > referred to as “game physics.” ..."

    Sorry, collision detection is computational geometry, not physics.

    > "However it is possible to structure collision detection for
    > execution on the SPEs, but it would require a different
    > approach to the collision detection algorithms... "

    Again, untrue. You walk the tree on the PPE, whereas you do the actual intersection tests on the SPs. The SPs are also ideally suited to calculating the positions of each object (read: real physics) and updating the tree accordingly.
