Why In-Order?

Ever since the Pentium Pro, desktop PC microprocessors have implemented Out of Order (OoO) execution architectures in order to improve performance.  We’ve explained the concept in great detail before, but the gist is that an Out-of-Order microprocessor can reorder its instruction stream to make the best use of its execution resources.  Simple as that explanation is, implementing OoO dramatically increases the complexity of a microprocessor and drives up its power consumption.

In a perfect world, you could group a bunch of OoO cores on a single die and offer both excellent single-threaded performance and great multi-threaded performance.  However, the world isn’t so perfect, and there are limits to how big a processor’s die can be.  Intel and AMD can only fit two of their OoO cores on a 90nm die, yet the Xbox 360 and PlayStation 3 targeted 3 and 9 cores, respectively, on a 90nm die; clearly something has to give, and that something happened to be the complexity of each individual core.

Given a game console’s five-year expected lifespan, both Microsoft and Sony decided to favor a multi-core platform over a faster single-core CPU in order to remain competitive in the latter half of the consoles’ lifetime.

So for the Xbox 360 Microsoft used three fairly simple IBM PowerPC cores, while Sony has the much publicized Cell processor in its PlayStation 3.  In single-threaded game code, both will be much slower than even mainstream desktop processors, but the majority of games these days are far more GPU bound than CPU bound, so the performance decrease isn’t a huge deal.  In the long run, with a bit of optimization and multi-threaded game engines, these collections of simple in-order cores should be able to put out some fairly good performance.

Does In-Order Matter?

As we discussed in our Cell article, in-order execution makes a lot of sense for the SPEs.  With in-order execution as well as a small amount of high speed local memory, memory access becomes quite predictable and code is very easily scheduled by the compiler for the SPEs.  However, for the PPE in Cell, and the PowerPC cores in Xenon, the in-order approach doesn’t necessarily make a whole lot of sense.  You don’t have the advantage of a cacheless architecture, even though you do have the ability to force certain items to remain untouched by the cache.  More than anything, having an in-order general purpose core just works to simplify the core, at the expense of depending quite a bit on the compiler, and the programmer, to optimize performance.

Very little of a modern day game is written in assembly; most of it is written in a high level language like C or C++, and the compiler does the dirty work of optimizing the code and translating it into low level assembly.  Compilers are horrendously difficult to write; getting a compiler to work is a pretty difficult job in itself, but getting one to work well, regardless of what the input code is, is nearly impossible.

However, with a properly designed ISA and a good compiler, having an in-order core to work on is not the end of the world.  The performance you lose by not being able to extract the last bit of instruction level parallelism is made up for by the fact that you can execute far more threads per clock, thanks to the simplicity of the in-order cores, which allows more of them to be packed on a die.  Unfortunately, as we’ve already discussed, on day one that’s not going to be much of an advantage.

The Cell processor’s SPEs are even more of a challenge, as they are more specialized hardware suitable only for executing certain types of code.  Keeping in mind that the SPEs are not well suited to running branch heavy code, loop unrolling will do a lot to improve performance, as it can significantly reduce the number of branches that must be executed.  In order to squeeze the absolute maximum amount of performance out of the SPEs, developers may be forced to hand code some routines, as initial performance numbers for optimized, compiled SPE code appear to be far below their peak throughput.

While the move to in-order architectures won’t cause game developers too much pain with good compilers at their disposal, the move to multi-threaded game development and optimizing for the Cell in general will be much more challenging. 


93 Comments


  • calimero - Wednesday, July 6, 2005 - link

    http://arstechnica.com/news.ars/post/20050629-5054...

    btw Anand's article was "full of shit" (sorry but that is the right phrase) and it's not odd that Anand pulled it. It's quite embarrassing for Anand; as someone already said: it's one thing to write tests of CPU speed and graphics card speed in games... and another to analyse CPU architecture.
  • jwix - Tuesday, July 5, 2005 - link

    Creathir - the article was reposted on other forums around the net. Here is the story in summary - Sony & Microsoft have both overhyped the processing power of their CPUs by using clever marketing speak. It turns out the processor designs are unnecessarily complicated, inefficient at crunching today's game code, and unlikely to be useful when game code finally becomes fully multi-threaded in the coming years. Why Microsoft and Sony didn't go with an Intel or AMD design, I don't know. The article speculates that both companies wanted IP rights to the CPU, maybe that's the reason.
    The GPUs on the other hand look plenty powerful. They should both be relatively equivalent in performance to the R520 and the current 7800 GTX.
    Bottom line - the new consoles will be quite powerful compared to the previous generation. However, PCs will still be more powerful, and will remain the platform of choice for high end gaming. Something I was glad to read as I just built a new PC.

  • creathir - Saturday, July 2, 2005 - link

    jwix:
    I had read a good portion of the article, but had been pulled away (thought to myself I'll just reread it later) and was upset to find it was gone. I have never seen this here at Anandtech, and Anand has not made a single comment on his blog about it. I suppose some fact was incorrect? Maybe Sony/Microsoft decided they would SUE him over the article? I bet the most logical answer is this, Tim Sweeney saw the article, and even though Anand referenced the "anonymous developer", he had earlier mentioned in his blog he had been waiting for some answers from Tim. I would bet this "outed" his source, much like the LA Times outed their source recently for a Grand Jury. This outing probably was followed by a request by Tim to pull the article. I would have to bet we will see it soon enough, reworked, reworded. Whatever the case, Anand, it was a good article, you should be sure to repost it.
    - Creathir
  • linkgoron - Thursday, June 30, 2005 - link

    blckgrffn, THIS IS NOT, I repeat NOT, the article you think it is.
  • blckgrffn - Thursday, June 30, 2005 - link

    Yes it is back up! :D

    Nat
  • jwix - Thursday, June 30, 2005 - link

    Last night, around 10:00pm EST, I surfed over to the Anandtech home page to see what was happening. I was greeted by Part II of the article (Xbox 360, Sony PS3 - a hardware discussion). Did anyone else read this article last night? I was only able to read the first 2 pages before the article was pulled off the website. Why would they post it and then pull it so quickly? And why has it not been reposted since?
    The story it told was unbelievable - basically, the floating point processing power of both the Sony and Xbox processors was less than half that of your average Pentium 4. Anand went into detail on how and why this was the case. His sources apparently were confidential, but definitely industry insiders (i.e. game developers). I wish I could have finished reading the article before it was pulled. Did anyone read the whole article?
  • ecoumans - Thursday, June 30, 2005 - link

    Physics middleware will be multithreaded and heavily optimized for Cell's 7 SPEs. This makes life easier for game developers, and it changes the story about CPU usage... Same story for sound etc.
  • Houdani - Tuesday, June 28, 2005 - link

    29: In order to turn off the "sponsored links" go to ABOUT in the top left menu and turn off INTELITEXT.

    I think this setting is stored in a cookie, so you will need to do this every time you clear your cookies.
