Why In-Order?

Ever since the Pentium Pro, desktop PC microprocessors have implemented out-of-order (OoO) execution architectures in order to improve performance. We've explained the concept in great detail before; in short, an out-of-order microprocessor can reorder its instruction stream to make the best use of its execution resources. Despite the simplicity of that explanation, implementing support for OoO dramatically increases the complexity of a microprocessor, and drives up power consumption as well.

In a perfect world, you could group a bunch of OoO cores on a single die and offer both excellent single-threaded and great multi-threaded performance. However, the world isn't so perfect, and there are limits to how big a processor's die can be. Intel and AMD can only fit two of their OoO cores on a 90nm die, yet the Xbox 360 and PlayStation 3 targeted 3 and 9 cores, respectively, on the same process; clearly something had to give, and that something was the complexity of each individual core.

Given a game console's expected five-year lifespan, both Microsoft and Sony decided to favor a multi-core platform over a faster single-core CPU, in order to remain competitive through the latter half of the consoles' lifetimes.

So, in the Xbox 360, Microsoft used three fairly simple IBM PowerPC cores, while Sony has the much-publicized Cell processor in its PlayStation 3. Both will be significantly slower than even mainstream desktop processors on single-threaded game code, but the majority of games these days are far more GPU-bound than CPU-bound, so the performance deficit isn't a huge deal. In the long run, with a bit of optimization and multi-threaded game engines, these collections of simple in-order cores should be able to put out fairly good performance.

Does In-Order Matter?

As we discussed in our Cell article, in-order execution makes a lot of sense for the SPEs. With in-order execution and a small amount of high-speed local memory, memory access becomes quite predictable, and SPE code is easily scheduled by the compiler. However, for the PPE in Cell and the PowerPC cores in Xenon, the in-order approach doesn't make nearly as much sense. Those cores don't have the advantage of a cacheless architecture, even though they do let you force certain items to remain untouched by the cache. More than anything, going in-order simply reduces the complexity of a general-purpose core, at the expense of depending quite a bit on the compiler, and the programmer, to optimize performance.
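To make that burden concrete, here is a minimal, hypothetical C++ sketch (not actual Xenon or PPE code) of the kind of scheduling an in-order core depends on: because the hardware won't reorder around a stalled load, independent work has to be interleaved up front, by the compiler or by hand.

    // Naive version: each iteration's multiply-add waits on its load,
    // so an in-order pipeline stalls on every cache miss.
    float dot_naive(const float* a, const float* b, int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i)
            sum += a[i] * b[i];   // load -> multiply -> add, one serial chain
        return sum;
    }

    // Scheduled version: two independent accumulators give the core a
    // second dependency chain to issue while the first is waiting.
    float dot_scheduled(const float* a, const float* b, int n) {
        float sum0 = 0.0f, sum1 = 0.0f;
        int i = 0;
        for (; i + 1 < n; i += 2) {
            sum0 += a[i]     * b[i];       // chain 0
            sum1 += a[i + 1] * b[i + 1];   // chain 1, independent of chain 0
        }
        if (i < n)
            sum0 += a[i] * b[i];           // odd element, if any
        return sum0 + sum1;
    }

An out-of-order core would find this overlap on its own at run time; on an in-order core, it has to exist in the instruction stream already.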

Very little of a modern game is written in assembly; most of it is written in a high-level language like C or C++, and the compiler does the dirty work of optimizing the code and translating it into low-level assembly. Compilers are horrendously difficult to write; getting a compiler to work at all is a difficult job in itself, but getting one to work well, regardless of what the input code looks like, is nearly impossible.

However, with a properly designed ISA and a good compiler, having in-order cores to work with is not the end of the world. The performance you lose by not extracting that last bit of instruction-level parallelism is made up for by the fact that the simplicity of in-order cores allows more of them to be packed on a die, letting you execute far more threads per clock. Unfortunately, as we've already discussed, on day one that's not going to be much of an advantage.

The Cell processor's SPEs are even more of a challenge, as they are more specialized hardware, suited only to executing certain types of code. Keeping in mind that the SPEs are not well suited to running branch-heavy code, loop unrolling will do a lot to improve performance, as it can significantly reduce the number of branches that must be executed. To squeeze the absolute maximum amount of performance out of the SPEs, developers may be forced to hand-code some routines, as initial performance numbers for optimized, compiled SPE code appear to be far below the SPEs' peak throughput.
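As a rough illustration of the technique (a hypothetical C++ sketch, not actual SPE code), unrolling a loop by four replaces roughly three out of every four loop branches with straight-line code:

    // Rolled loop: one conditional branch per element processed.
    void scale_rolled(float* v, float s, int n) {
        for (int i = 0; i < n; ++i)
            v[i] *= s;
    }

    // Unrolled by four: one conditional branch per four elements, plus a
    // short remainder loop - a much friendlier shape for the SPEs.
    void scale_unrolled(float* v, float s, int n) {
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            v[i]     *= s;
            v[i + 1] *= s;
            v[i + 2] *= s;
            v[i + 3] *= s;
        }
        for (; i < n; ++i)   // handle the last n % 4 elements
            v[i] *= s;
    }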

While the move to in-order architectures won't cause game developers too much pain given good compilers, the move to multi-threaded game development, and to optimizing for Cell in general, will be much more challenging.
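As a sketch of what the coarsest multi-threaded decomposition looks like, here is a hypothetical "one subsystem per thread" frame loop in modern C++ (std::thread postdates these consoles' toolchains, and all the names are invented); a real engine would also have to synchronize the subsystems' access to shared state:

    #include <functional>
    #include <thread>

    struct World { /* game state shared by the subsystems */ };

    void update_ai(World& w)      { /* run one frame of AI */ }
    void update_physics(World& w) { /* integrate one frame of physics */ }

    void run_frame(World& w) {
        // AI and physics run concurrently, ideally on separate cores.
        // A real engine would partition or double-buffer the state so
        // the two subsystems don't race on the same data.
        std::thread ai(update_ai, std::ref(w));
        std::thread physics(update_physics, std::ref(w));

        // The frame doesn't advance until both subsystems finish.
        ai.join();
        physics.join();
    }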

Comments

  • jotch - Friday, June 24, 2005

    #20 Well, that can't be right for the whole consumer base; I'm 24 and only know other adults that have consoles, and a lot of them have flashy TVs for them as well - I do. If you look at the market for consoles, it is mainly teens and adults that have them, not kids. A lot of people I know started with a NES or an Atari 2600, etc., and have continued to like games as they have grown up. Why is it that the best-selling game has an 18 rating? (GTA: San Andreas)

    Screen burn-in would be minimal unless you have a game paused for hours with the TV left on - TV technology is moving on, and sets often turn themselves off if a static image is displayed for a while. So burn-in shouldn't occur.
  • nserra - Friday, June 24, 2005

    Most of the people I know who have consoles are kids (80%), and their parents have bought a TV just for the console - a 70€ TV...

    What parent will let kids play games on a 3000€ LCD or plasma (and burn it in)?

    Either games will have good 480i "compatibility", or forget it...

    #17 I agree.
  • fitten - Friday, June 24, 2005

    #14 There are a number of issues being discussed.

    For example, given the nature of current AI code, making that code parallel (as in more than one thread executing AI code, working together) seems non-trivial. Data dependencies, and the very branch-heavy code that makes those dependencies less predictable, probably cause headaches here. Sure, one could take the simple approach and say one thread for AI, one for physics, one for whatever, but that has already been discussed by numerous people as a possibility.

    Parallel code comes in many flavors. The parallelism in the graphics card, for instance, is sometimes classified as "embarrassingly parallel", which means it's trivial to do. Then there are pipelines (dataflow), which CPUs and GPUs also use. These are usually fairly easy too, because the data partitioning is pretty easy: you break out a thread for each overall task that you want to do. Say you want to do OpA on the data, then OpB, then OpC. All OpB depends on is the output of OpA, and OpC depends only on OpB's final product. Three threads, each one performing an Op on the output of the previous (see the first sketch after this comment).

    Then there are codes that are quite a bit more complex, where, for example, numerous threads each execute on a part of the whole data instead of all of it at once, but the solution requires many iterations over the data, and at the end of each iteration all the threads exchange data with each other (or just with their 'neighbors') so that the next iteration can be performed (see the second sketch after this comment). These take a bit more work to develop.

    Anyway, I got long-winded. Basically, there are *many* kinds of parallelism, and many kinds of algorithms and implementations of parallelism. Some are low-hanging fruit, and some are non-trivial. Since I've already read that numerous developers for each platform see the low-hanging fruit (run one thread for AI, another for physics, etc.), I can only believe they are talking about things that are non-trivial, such as a multithreaded AI engine (again, as opposed to just breaking the AI engine out into its own thread, separate from the rest of the game play).
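A minimal C++ sketch of the three-stage OpA/OpB/OpC pipeline described in the comment above; the queue, the Ops, and all the names here are hypothetical stand-ins:

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>

    // A tiny blocking queue used to pass each stage's output downstream.
    template <typename T>
    class BlockingQueue {
        std::queue<T> q;
        std::mutex m;
        std::condition_variable cv;
    public:
        void push(T v) {
            { std::lock_guard<std::mutex> lk(m); q.push(std::move(v)); }
            cv.notify_one();
        }
        T pop() {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [this] { return !q.empty(); });
            T v = std::move(q.front());
            q.pop();
            return v;
        }
    };

    int opA(int x) { return x + 1; }  // stand-in transformations
    int opB(int x) { return x * 2; }
    int opC(int x) { return x - 3; }

    int main() {
        const int kItems = 4;
        BlockingQueue<int> aToB, bToC, results;

        // Stage B consumes A's output; stage C consumes B's output.
        std::thread stageB([&] { for (int i = 0; i < kItems; ++i) bToC.push(opB(aToB.pop())); });
        std::thread stageC([&] { for (int i = 0; i < kItems; ++i) results.push(opC(bToC.pop())); });

        for (int i = 0; i < kItems; ++i)
            aToB.push(opA(i));        // stage A feeds the pipeline

        stageB.join();
        stageC.join();
        return 0;
    }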
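And a sketch of the more complex iterate-and-exchange pattern, again hypothetical C++ (std::barrier is C++20): each thread updates its own slice of the data from the previous iteration's values, and a barrier holds everyone back until the exchange is complete:

    #include <barrier>
    #include <thread>
    #include <vector>

    int main() {
        const int kThreads = 4, kIters = 100, kN = 1024;
        std::vector<double> cur(kN, 0.0), next(kN, 0.0);
        cur[0] = next[0] = cur[kN - 1] = next[kN - 1] = 100.0;  // fixed boundaries

        // When all threads arrive, swap the buffers so the next iteration
        // reads the values everyone just produced.
        std::barrier sync(kThreads, [&]() noexcept { cur.swap(next); });

        auto worker = [&](int id) {
            const int chunk = (kN - 2) / kThreads;
            const int lo = 1 + id * chunk;
            const int hi = (id == kThreads - 1) ? kN - 1 : lo + chunk;
            for (int it = 0; it < kIters; ++it) {
                for (int i = lo; i < hi; ++i)
                    next[i] = 0.5 * (cur[i - 1] + cur[i + 1]);  // read neighbors
                sync.arrive_and_wait();  // wait for the exchange, then iterate
            }
        };

        std::vector<std::thread> pool;
        for (int t = 0; t < kThreads; ++t)
            pool.emplace_back(worker, t);
        for (auto& th : pool)
            th.join();
        return 0;
    }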
  • probedb - Friday, June 24, 2005

    Nice article! I'll wait till they're both out and have a play before I buy either. The last console I bought was an original PlayStation :) But gotta love that hi-def loveliness at last!

    #3 Yeah, 1080i is interlaced, and at such a high resolution and low refresh rate, the text is really difficult to read. It'd be far better at 1080p, I think, since that would effectively be the same as 1920x1080 on a normal monitor. 1080i is flickery as hell for me for desktop use, but fine for video and media-centre-type interfaces on the PC.
  • A5 - Friday, June 24, 2005

    You know, the vast majority of the TVs these systems will be hooked up to will only do 480i (standard TV)...
  • jotch - Friday, June 24, 2005

    #14 - hear, hear!
  • jotch - Friday, June 24, 2005

    #10 - Sounds to me like they're way ahead of their time; future-proofing is good, as they'll need another six years to develop the PS4. But Cell and Xenon will force developers to change their ways and will prepare them for the future of developing on PCs that eventually have this kind of CPU design (ref. Intel's chip-design future pic on the first page of the article). Like the article says, the initial round of games will be single-threaded, etc.

    You might get a lot of mediocre games, but then you should get some that really shine on the PS3 - notably Unreal 3, and I bet the Gran Turismo (Polyphony) guys will put in the effort.
  • Pannenkoek - Friday, June 24, 2005

    I'm quite tired of hearing how difficult it is to develop a multithreaded game. Only pathetic programmers cannot grasp the concept of parallel code execution; it's not as if the current CPU/GPU duality doesn't already qualify as such.
  • knitecrow - Friday, June 24, 2005

    You'll need an HDD for online services and MMORPGs.

    How many people are going to buy a $100 HDD if they don't have to?
  • LanceVance - Friday, June 24, 2005

    "the PS3 won't ship with a hard drive"

    If that's true, then will it be like:

    - the PS2 memory card: not included, but standard equipment required by all games; or
    - the PS2 hard drive: not included, considered exotic equipment, and used by very few games?
