Why In-Order?

Ever since the Pentium Pro, desktop PC microprocessors have implemented out-of-order (OoO) execution architectures to improve performance.  We’ve explained the concept in great detail before, but in short, an out-of-order microprocessor can reorder its instruction stream to make the best use of its execution resources.  As simple as that is to explain, implementing OoO dramatically increases the complexity of a microprocessor and drives up power consumption.
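To make the reordering idea concrete, here is a minimal sketch in C, with each statement standing in for one machine instruction. The function names and the arithmetic are our own illustration and are not taken from any real game code. In the first function, each add sits directly behind the multiply it depends on, so a core that issues strictly in program order has nothing else to do while the multiply completes. The second function computes the same result with the two independent multiplies placed first, which is the ordering an OoO core would find on its own and the ordering a compiler has to produce ahead of time for an in-order core.

```c
/* Program order as written: each add directly follows the multiply
 * it depends on, so an in-order core waits on every multiply. */
double in_program_order(double x, double y, double z,
                        double p, double q, double r)
{
    double a = x * y;   /* long-latency multiply                  */
    double b = a + z;   /* depends on a, must wait for it         */
    double c = p * q;   /* independent of a and b, but queued     */
    double d = c + r;   /* depends on c                           */
    return b + d;
}

/* Same work, reordered: the two independent multiplies come first,
 * so the second can start while the first is still in flight. */
double reordered(double x, double y, double z,
                 double p, double q, double r)
{
    double a = x * y;   /* first multiply                         */
    double c = p * q;   /* second multiply, no dependency on a    */
    double b = a + z;   /* a has had time to complete             */
    double d = c + r;   /* c has had time to complete             */
    return b + d;
}
```

Both functions return the same value for the same inputs; only the order of the independent operations changes. An optimizing compiler is free to perform this kind of reordering on its own, so the source-level ordering here is purely illustrative.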

In a perfect world, you could group a bunch of OoO cores on a single die and get both excellent single-threaded performance and great multi-threaded performance.  However, the world isn’t so perfect, and there are limits to how big a processor’s die can be.  Intel and AMD can only fit two of their OoO cores on a 90nm die, yet the Xbox 360 and PlayStation 3 targeted 3 and 9 cores, respectively, on a 90nm die; clearly something had to give, and that something was the complexity of each individual core.

Given a game console’s five-year expected lifespan, both Microsoft and Sony decided to favor a multi-core platform over a faster single-core CPU in order to remain competitive in the latter half of the consoles’ lifetime.

So for the Xbox 360, Microsoft used three fairly simple IBM PowerPC cores, while Sony has the much publicized Cell processor in its PlayStation 3.  Both will be much slower in absolute terms than even mainstream desktop processors in single-threaded game code, but the majority of games these days are far more GPU bound than CPU bound, so the performance decrease isn’t a huge deal.  In the long run, with a bit of optimization and multi-threaded game engines, these collections of simple in-order cores should be able to deliver fairly good performance.

Does In-Order Matter?

As we discussed in our Cell article, in-order execution makes a lot of sense for the SPEs.  With in-order execution and a small amount of high speed local memory, memory access becomes quite predictable, and the compiler can schedule code for the SPEs very easily.  However, for the PPE in Cell and the PowerPC cores in Xenon, the in-order approach doesn’t necessarily make a whole lot of sense.  You don’t have the advantage of a cacheless architecture, even though you do have the ability to force certain items to remain untouched by the cache.  More than anything, making a general purpose core in-order simplifies the core, at the expense of depending quite a bit on the compiler, and the programmer, to optimize performance.

Very little of a modern game is written in assembly; most of it is written in a high level language like C or C++, and the compiler does the dirty work of optimizing the code and translating it into low level assembly.  Compilers are horrendously difficult to write; getting a compiler to work at all is a difficult job in itself, but getting one to work well, regardless of the input code, is nearly impossible.

However, with a properly designed ISA and a good compiler, having an in-order core to work on is not the end of the world.  The performance you lose by not being able to extract the last bit of instruction level parallelism is made up by the fact that you can run far more threads at once, since the simplicity of the in-order cores allows more of them to be packed on a die.  Unfortunately, as we’ve already discussed, on day one that’s not going to be much of an advantage.

The Cell processor’s SPEs are even more of a challenge, as they are more specialized hardware suited only to certain types of code.  Keeping in mind that the SPEs are not well suited to running branch heavy code, loop unrolling will do a lot to improve performance, as it can significantly reduce the number of branches that must be executed.  In order to squeeze the absolute maximum amount of performance out of the SPEs, developers may be forced to hand code some routines, as initial performance numbers for optimized, compiled SPE code appear to be far below the SPEs’ peak throughput.
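As a rough illustration of loop unrolling, the sketch below sums an array of floats two ways in plain C. The function names are hypothetical and this is generic C, not SPE-specific code; a real SPE routine would typically also use the SIMD units. The simple version executes one compare-and-branch per element, while the unrolled version executes one per four elements and keeps four independent running sums.

```c
#include <stddef.h>

/* Straightforward loop: one compare-and-branch per element. */
float sum_simple(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by four: one compare-and-branch per four elements.
 * The four accumulators are independent of each other, so the
 * adds within one iteration do not have to wait on one another. */
float sum_unrolled4(const float *v, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }

    float s = (s0 + s1) + (s2 + s3);

    /* Remainder loop handles the last n % 4 elements. */
    for (; i < n; i++)
        s += v[i];
    return s;
}
```

Because the unrolled version adds the elements in a different order, its result can differ from the simple loop in the last bits of floating point rounding for general inputs. Many compilers can also perform this transformation automatically, which is part of why compiler quality matters so much on these cores.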

While the move to in-order architectures won’t cause game developers too much pain with good compilers at their disposal, the move to multi-threaded game development and optimizing for the Cell in general will be much more challenging. 

93 Comments

  • LanceVance - Friday, June 24, 2005

    Excellent article. Definitely the most thorough, informative, well researched article on the PS3/Xbox360.

    And most importantly, unlike every other article on the subject, it's not strongly biased toward one camp while making comments of substance.
  • yacoub - Friday, June 24, 2005

    I bet the PS3 debuts at a higher price.

    Also regarding statements made on the Conclusionary page:

    --"That being said, it won’t be impossible to get the same level of performance out of the PS3, it will just take more work. In fact, specialized hardware can be significantly faster than general purpose hardware at certain tasks, giving the PS3 the potential to outperform the Xbox 360 in CPU tasks. It has yet to be seen how much work is required to truly exploit that potential however, and it will definitely be a while before we can truly answer that question."--

    I find it funny that once again the PlayStation will be the harder system to code games for that take full advantage of its abilities. If trends mimic the past (as they often do) this will lead to a large amount of mediocre games by companies too small to afford the dev time necessary to take real advantage of the PS3's advantages or on deadlines too tight to spend the time doing more.
  • Furen - Friday, June 24, 2005

    It does sound pretty low but (I'm guessing) it's more than enough; I don't think they would have separated the dies if it led to a big performance penalty. Also, I'm guessing that the 256MB/sec bandwidth between the eDRAM and its processing hardware is 256GB/sec? Microsoft was using that number to inflate their "system bandwidth" total.
  • Woodchuck2000 - Friday, June 24, 2005

    And for that matter, 32Mb/s inter-die communications in the Xenos GPU seems low to me
    :p
    Good article though guys!
  • Furen - Friday, June 24, 2005

    Is there any word on the media center extender capabilities on the Xbox 360? I think Microsoft mentioned something about that but I'm not sure if that was official or not. Just hope they allow us to plug in some video capture device and use it as a DVR eventually.

    As much as I like Sony's PlayStation, I find it quite boring on the technical side. It seems like they're just throwing everything they can into it but nothing is really that exciting, or useful. Come on, dual-HDMI? I don't see myself having two HDTVs in such close proximity to each other. Gigabit router? Seems like they're desperate to use the extra CPU muscle. I wonder how heavy Ethernet traffic will affect CPU usage.
  • Woodchuck2000 - Friday, June 24, 2005

    Surely porting between multi-core PC software and Xenon should be fairly trivial, not fairly Non-trivial as stated in the article...?
  • jotch - Friday, June 24, 2005

    I stands for interlaced whilst the P stands for progressive scan. Check out the difference at http://en.wikipedia.org/wiki/720p

    or

    http://en.wikipedia.org/wiki/1080i

    This should resolve this issue.
  • AnnihilatorX - Friday, June 24, 2005

    1080i = 720p doesn't it? 1080p is the one Xbox 360 doesn't support.

    These "i"s and "p"s are confusing me
  • sprockkets - Friday, June 24, 2005

    How is 1080i on your TVs? On my 1 year old Mitsubishi native 1080i TV, using DVI from the computer at 1080i is basically useless since the text is too small and the image looks like the refresh rate is below 60Hz, whereas HDTV broadcasts look fine. Using the other mode of 720x480 looked great.

    Will HD output from a console be any better than a video card in a computer? Is it just my tv?

    Cmon, did you really think nVidia would release something far more advanced for a console than for a video card, or perhaps, more specifically, having it way outperform 6800 ultras in sli?

    If you need around a 400w power supply for even non sli setup, what kind of heat and power will these new consoles need anyhow???

    Of course I am more interested in how the PS3 will work with Linux more than games hahahahaha, since Sony officially mentioned it.
  • emmap - Sunday, December 4, 2005

    And that's what this article, Sony and M$ have missed:

    it's not the number of megapixels, shader pipelines, CPU / GPU bandwidth, or multithreaded or single threaded code that makes a great game. It's the imagination put into the game, the gameplay, the artistic quality, the human feeling we get looking at the characters, fun and so on. It's not only mathematics and physics: we don't love a game because it has X million polygons or runs at Y fps, no, it's totally different. Just look at all the MAME fans out there; you'll see that they don't care about the obsolete hardware the games they are playing ran on, they care about the most important thing about a game: ENTERTAINMENT!
