Why In-Order?

Ever since the Pentium Pro, desktop PC microprocessors have implemented Out of Order (OoO) execution architectures in order to improve performance.  We’ve explained the idea in great detail before, but the idea is that an Out-of-Order microprocessor can reorganize its instruction stream in order to best utilize its execution resources.  Despite the simplicity of its explanation, implementing support for OoO dramatically increases the complexity of a microprocessor, as well as drives up power consumption. 

In a perfect world, you could group a bunch of OoO cores on a single die and offer both excellent single threaded performance, as well as great multi-threaded performance.  However, the world isn’t so perfect, and there are limitations to how big a processor’s die can be.  Intel and AMD can only fit two of their OoO cores on a 90nm die, yet the Xbox 360 and PlayStation 3 targeted 3 and 9 cores, respectively, on a 90nm die; clearly something has to give, and that something happened to be the complexity of each individual core. 

Given a game console’s 5 year expected lifespan, the decision was made (by both MS and Sony) to favor a multi-core platform over a faster single-core CPU in order to remain competitive towards the latter half of the consoles’ lifetime. 

So with the Xbox 360 Microsoft used three fairly simple IBM PowerPC cores, while Sony has the much publicized Cell processor in their PlayStation 3.  Both will perform absolutely much slower than even mainstream desktop processors in single threaded game code, but the majority of games these days are far more GPU bound than CPU bound, so the performance decrease isn’t a huge deal.  In the long run, with a bit of optimization and running multi-threaded game engines, these collections of simple in-order cores should be able to put out some fairly good performance. 

Does In-Order Matter?

As we discussed in our Cell article, in-order execution makes a lot of sense for the SPEs.  With in-order execution as well as a small amount of high speed local memory, memory access becomes quite predictable and code is very easily scheduled by the compiler for the SPEs.  However, for the PPE in Cell, and the PowerPC cores in Xenon, the in-order approach doesn’t necessarily make a whole lot of sense.  You don’t have the advantage of a cacheless architecture, even though you do have the ability to force certain items to remain untouched by the cache.  More than anything having an in-order general purpose core just works to simplify the core, at the expense of depending quite a bit on the compiler, and the programmer, to optimize performance. 

Very little of modern day games is written in assembly, most of it is written in a high level language like C or C++ and the compiler does the dirty work of optimizing the code and translating it into low level assembly.  Compilers are horrendously difficult to write; getting a compiler to work is a pretty difficult job in itself, but getting one to work well, regardless of what the input code is, is nearly impossible. 

However, with a properly designed ISA and a good compiler, having an in-order core to work on is not the end of the world.  The performance you lose by not being able to extract the last bit of instruction level parallelism is made up by the fact that you can execute far more threads per clock thanks to the simplicity of the in-order cores allowing more to be packed on a die.  Unfortunately, as we’ve already discussed, on day one that’s not going to be much of an advantage. 

The Cell processor’s SPEs are even more of a challenge, as they are more specialized hardware only suitable to executing certain types of code.  Keeping in mind that the SPEs are not well suited to running branch heavy code, loop unrolling will do a lot to improve performance as it can significantly reduce the number of branches that must be executed.  In order to squeeze the absolute maximum amount of performance out of the SPEs, developers may be forced to hand code some routines as initial performance numbers for optimized, compiled SPE code appear to be far less than their peak throughput. 

While the move to in-order architectures won’t cause game developers too much pain with good compilers at their disposal, the move to multi-threaded game development and optimizing for the Cell in general will be much more challenging. 

Xenon vs. Cell How Many Threads?
POST A COMMENT

93 Comments

View All Comments

  • MDme - Friday, June 24, 2005 - link

    now i know what to buy :) Reply
  • SuperStrokey - Friday, June 24, 2005 - link

    lol, thats funny Reply
  • bldckstark - Friday, June 24, 2005 - link

    Having a PS2 and an XBOX I was not even thinking about buying a PS3 since the XBOX kicks the PS2's ace. (IMHO). After reading this article I have much more respect for the PS3 and now I don't have any idea which onw I will buy. My wife may force me to buy the PS3 if the 360 isn't as backward compatible as most want it to be.

    Maybe I will just use my unusually large brain to create a PS360 that will play everything. Oooh, wait, I gotta get a big brain first. Then a big p3nis. Or maybe just a normal one.
    Reply
  • Furen - Friday, June 24, 2005 - link

    #37: supposedly yes. Since it will have to be through hardcore emulation there will be issues (but of course). It wont be fully transparent like the ps2 but rather you'll have profiles saved on your harddrive which will tell the system how to run the games. Reply
  • SuperStrokey - Friday, June 24, 2005 - link

    I havnt been following the 360 too much (im a self admitted nintendo fanboy), but will it be backward compatible too? I heard it was still up in the air but as PS3 is going to be and revolution is going to be (bigtime) i would assume that 360 will be too right? Reply
  • ZobarStyl - Friday, June 24, 2005 - link

    #32 is right: how many games get released for all 3 console with only minor, subtle differences between them? Most of the time, first party stuff is the only major difference between consoles. Very few 3rd party games are held back from the 'slower' consoles; most are just licensing deals (GTA:SA on PS2, for example). And if you look back, of the first party games lineup, XBox didn't have the most compelling of libraries, in my opinion. Reply
  • yacoub - Friday, June 24, 2005 - link

    imo, the revolution will be a loser in more than just hardware. i can't remember the last time i actually wanted to play any of the exclusive nintendo games. actually, i think for about one day i considered a gamecube for metroid but then i saw it in action at a friend's place and was underwhelmed by the gameplay. forget mario and link, give me splinter cell or gran tourismo or forza or... yeah you get the idea. Reply
  • nserra - Friday, June 24, 2005 - link

    #27

    If you read the article carefully, you will see that since they are "weaker" pipelines, the 48 will perform like 24 "complete" ones.

    I think with this Ati new design, there will be games where the performance will be much better, equal or worst.
    But that’s the price to pay for complete new designs.

    On paper Ati design is much more advance, in fact reminds the VOODOO2 design where there are more than one chip doing things. I think I prefer some very fancy graphics design over a double all easy solution.
    Reply
  • Taracta - Friday, June 24, 2005 - link

    With 25.5 Gbs of bandwith to memory, is OoO (Out Of Order processing) necessary? Isn't OoO and its ilk bandwith hiding solutions? I have an issue with regards to Anandtech outlook on the SPPs of the CELL processor (I could be wrong). I consider the SPPs to be full fledge Vector Processors and not just fancy implementation of MMX, SSE, Altivec etc, which seems to be Anandtech's outlook. As full fledge Vector Processors they are orders of magnitude more flexible than that and as Vector Processors comparing them to Scalar Processors is erroneous.

    Another thing, RISC won the war! Don't believe, what do you call a processor with a RISC core with a CISC hardware translator around it? CISC? I think not, it's a RISC processor. x86 did win the procesor war but not by beating them but joining them and by extension CISC loss. Just needed to clear that up. The x86 instruction set won but the old x86 CISC architecture loss. The x86 insrtuction set will always win, fortunately for AMD because the Itanium was to have been their death. No way could they have copied the Itanium in this day and age which come to think of it is very unfortunate.

    From you have the processor the runs x86 the best you will always win. Unless you can get a toehold in the market with something else such as LINUX and CELL!
    Reply
  • CuriousMike - Friday, June 24, 2005 - link

    If it's a 3rd party game, it won't matter (greatly) which platform you pick, because developers will develop to the least-common-denominator.

    In the current generation, about the best one could hope for is slightly higher-res textures and better framerate on XBOX over ps2/gc.

    IMO, pick your platform based on first-party games/series you're looking forward to. Simple as that.

    Reply

Log in

Don't have an account? Sign up now