Why In-Order?

Ever since the Pentium Pro, desktop PC microprocessors have implemented out-of-order (OoO) execution to improve performance.  We've explained the concept in great detail before, but the basic idea is that an out-of-order microprocessor can reorder its instruction stream to make the best use of its execution resources.  Simple as that sounds, supporting OoO execution dramatically increases the complexity of a microprocessor and drives up its power consumption. 
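
To make the difference concrete, here is a toy model. It is a simplification assumed purely for illustration and does not describe any real pipeline. The model uses a single-issue core, made-up latencies of 4 cycles for a load and 1 cycle for an add, and unlimited register renaming in the out-of-order case. The in-order core must issue instructions in program order and stalls whenever an operand isn't ready. The out-of-order core issues whichever waiting instruction is ready first. The `Instr` type and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str      # human-readable label, e.g. "load r1"
    dest: str      # register written
    srcs: tuple    # registers read
    latency: int   # cycles until the result can be used (assumed values)


def in_order_cycles(program):
    """Toy single-issue, in-order core: issue strictly in program order,
    stalling until every source register is ready."""
    ready = {}      # register -> cycle its newest value becomes available
    next_issue = 0  # earliest cycle the next instruction may issue
    finish = 0
    for ins in program:
        start = max([next_issue] + [ready.get(r, 0) for r in ins.srcs])
        ready[ins.dest] = start + ins.latency
        finish = max(finish, ready[ins.dest])
        next_issue = start + 1
    return finish


def out_of_order_cycles(program):
    """Idealized single-issue OoO core: each cycle, issue the oldest waiting
    instruction whose true (read-after-write) dependencies are satisfied.
    Assumes unlimited renaming, so only true dependencies matter."""
    producers, last_writer = [], {}
    for i, ins in enumerate(program):
        producers.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i
    done_at = [None] * len(program)  # cycle each result becomes available
    cycle = 0
    while None in done_at:
        for i, ins in enumerate(program):
            if done_at[i] is None and all(
                done_at[p] is not None and done_at[p] <= cycle for p in producers[i]
            ):
                done_at[i] = cycle + ins.latency
                break  # at most one instruction issues per cycle
        cycle += 1
    return max(done_at)


# Two independent load -> add chains, written the "natural" way.
program = [
    Instr("load r1", "r1", ("a",), 4),
    Instr("add r2", "r2", ("r1",), 1),
    Instr("load r3", "r3", ("b",), 4),
    Instr("add r4", "r4", ("r3",), 1),
]

print(in_order_cycles(program))      # 10
print(out_of_order_cycles(program))  # 6
```

The out-of-order core issues the second load while the first add is still waiting, so the two load latencies overlap. The in-order core simply stalls. The hardware that tracks dependencies and picks ready instructions is where the extra complexity and power described above come from.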

In a perfect world, you could put a bunch of OoO cores on a single die and offer both excellent single-threaded performance and great multi-threaded performance.  However, the world isn't so perfect, and there are limits to how big a processor's die can be.  Intel and AMD can only fit two of their OoO cores on a 90nm die, yet the Xbox 360 and PlayStation 3 targeted 3 and 9 cores, respectively, on a 90nm die.  Clearly something had to give, and that something was the complexity of each individual core. 

Given a game console's expected five-year lifespan, both Microsoft and Sony decided to favor a multi-core platform over a faster single-core CPU in order to remain competitive in the latter half of the consoles' lifetimes. 

So for the Xbox 360, Microsoft used three fairly simple IBM PowerPC cores, while Sony has the much-publicized Cell processor in the PlayStation 3.  Both will be much slower than even mainstream desktop processors in single-threaded game code.  However, most games these days are far more GPU-bound than CPU-bound, so the performance decrease isn't a huge deal.  In the long run, with a bit of optimization and multi-threaded game engines, these collections of simple in-order cores should be able to deliver fairly good performance. 

Does In-Order Matter?

As we discussed in our Cell article, in-order execution makes a lot of sense for the SPEs.  With in-order execution and a small amount of high-speed local memory, memory access becomes quite predictable, and the compiler can schedule code for the SPEs very easily.  However, for the PPE in Cell and the PowerPC cores in Xenon, the in-order approach doesn't necessarily make a whole lot of sense.  You don't have the predictable memory access of a cacheless architecture, even though you can force certain data to stay untouched by the cache.  More than anything, making a general-purpose core in-order is a way to simplify the core.  The cost is a heavy dependence on the compiler, and the programmer, to optimize performance. 
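
On an in-order core, that scheduling work moves into software.  The toy model below is hypothetical.  It assumes a single-issue core with a 4-cycle load and a 1-cycle add, and it times two orderings of the same four instructions.  The compiler, or the programmer, can hoist the independent second load above the first add so that the two load latencies overlap instead of stalling back to back.

```python
def in_order_cycles(program):
    """Toy single-issue, in-order core: instructions issue strictly in
    program order and stall until their source registers are ready.
    Each instruction is a (dest, srcs, latency) tuple."""
    ready = {}      # register -> cycle its value becomes available
    next_issue = 0  # earliest cycle the next instruction may issue
    finish = 0
    for dest, srcs, latency in program:
        start = max([next_issue] + [ready.get(r, 0) for r in srcs])
        ready[dest] = start + latency
        finish = max(finish, ready[dest])
        next_issue = start + 1
    return finish


LOAD, ADD = 4, 1  # assumed latencies in cycles

# As written: each add immediately follows the load it depends on.
naive = [
    ("r1", ("a",), LOAD),
    ("r2", ("r1",), ADD),
    ("r3", ("b",), LOAD),
    ("r4", ("r3",), ADD),
]

# Statically scheduled: same instructions, independent load hoisted.
scheduled = [naive[0], naive[2], naive[1], naive[3]]

print(in_order_cycles(naive))      # 10
print(in_order_cycles(scheduled))  # 6
```

This reordering is legal only because the second load does not depend on the first add.  The compiler has to prove that kind of independence before it can move instructions, which is part of why getting a compiler to work well is so hard.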

Very little of a modern game is written in assembly.  Most of it is written in a high-level language like C or C++, and the compiler does the dirty work of optimizing the code and translating it into assembly.  Compilers are horrendously difficult to write.  Getting one to work at all is hard enough, but getting one to work well regardless of the input code is nearly impossible. 

However, with a properly designed ISA and a good compiler, having an in-order core to work with is not the end of the world.  Not being able to extract the last bit of instruction-level parallelism costs some performance.  That loss is made up for by executing far more threads per clock, since the simplicity of in-order cores allows more of them to be packed onto a die.  Unfortunately, as we've already discussed, on day one that's not going to be much of an advantage. 

The Cell processor's SPEs are even more of a challenge, as they are specialized hardware suited only to certain types of code.  Since the SPEs are not well suited to running branch-heavy code, loop unrolling will do a lot to improve performance, because it can significantly reduce the number of branches that must be executed.  To squeeze the absolute maximum performance out of the SPEs, developers may be forced to hand-code some routines.  Initial performance numbers for optimized, compiled SPE code appear to be far below the SPEs' peak throughput. 
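
As a rough illustration of why unrolling helps, the sketch below counts the loop branches executed when scaling an array.  It runs once with a plain loop and once with the loop unrolled by four, plus a cleanup loop for leftover elements.  Python is only standing in for SPE code here.  The function names and the unroll factor of 4 are hypothetical choices, and the counter models one conditional loop branch per iteration while ignoring the final exit test.

```python
def scale_rolled(data, k):
    """Plain loop: one loop branch per element."""
    out = [0] * len(data)
    branches = 0
    i = 0
    while i < len(data):
        branches += 1  # one conditional loop branch per iteration
        out[i] = data[i] * k
        i += 1
    return out, branches


def scale_unrolled4(data, k):
    """Unrolled by 4: one loop branch per four elements, plus a short
    cleanup loop when len(data) is not a multiple of 4."""
    n = len(data)
    out = [0] * n
    branches = 0
    i = 0
    while i + 4 <= n:
        branches += 1  # one branch now covers four elements
        out[i] = data[i] * k
        out[i + 1] = data[i + 1] * k
        out[i + 2] = data[i + 2] * k
        out[i + 3] = data[i + 3] * k
        i += 4
    while i < n:  # remainder: at most 3 iterations
        branches += 1
        out[i] = data[i] * k
        i += 1
    return out, branches


data = list(range(1003))
rolled, b_rolled = scale_rolled(data, 3)
unrolled, b_unrolled = scale_unrolled4(data, 3)
assert rolled == unrolled  # same result, far fewer branches
print(b_rolled, b_unrolled)  # 1003 253
```

Beyond cutting branches, the unrolled body hands the compiler four independent multiplies to schedule together.  An in-order core cannot discover that parallelism on its own.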

With good compilers at their disposal, game developers won't suffer too much from the move to in-order architectures.  The move to multi-threaded game development, and optimizing for Cell in general, will be much more challenging. 


