Ordering Instructions around Dependencies

Luckily, there are solutions to the problem of dependencies in code; one tackles the problem in hardware, the other tackles the problem in software.

The compiler is the software solution: it is responsible for producing the assembly code that is sent to the CPU for execution. Thus, with an intimate knowledge of the inner workings of the CPU, the compiler can, generally speaking, produce code that minimizes the impact of data dependencies.
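To make this concrete, here is a minimal C sketch (the function and variable names are ours, purely for illustration, not from the article) of the kind of instruction-level parallelism a scheduling compiler looks for. The two accumulations below form independent dependency chains, so their operations can be interleaved and neither ever has to wait on the other's results.

    /* Illustrative only: two independent dependency chains in one loop.
     * A scheduling compiler can interleave the multiply/add of the dot
     * product with the add of the plain sum, because neither chain
     * consumes the other's results. */
    void dot_and_sum(const float *a, const float *b, int n,
                     float *dot, float *sum)
    {
        float d = 0.0f;   /* chain 1: dot product  */
        float s = 0.0f;   /* chain 2: running sum  */

        for (int i = 0; i < n; i++) {
            d += a[i] * b[i];   /* depends only on the previous d */
            s += a[i];          /* depends only on the previous s */
        }

        *dot = d;
        *sum = s;
    }

While one chain's add is still completing, the CPU can already be executing work from the other chain, which is exactly the sort of schedule a dependency-aware compiler tries to emit.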

Some microprocessor architectures depend entirely on the compiler to extract parallelism at the instruction level while avoiding dependencies as much as possible. These architectures are known as in-order microprocessors.

In-Order Architectures

As the name implies, an in-order microprocessor can only execute instructions in the order in which they are sent to the CPU. At best, the CPU can execute multiple instructions in parallel, but it has no ability to reorder them to better suit its needs.
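One rough way to picture that restriction (this is our own simplification, not a model of any particular CPU) is as an issue rule: on every cycle, a two-wide in-order core may start up to two instructions, but only the oldest ones in program order, and only if their operands are ready. The moment the oldest waiting instruction stalls, everything younger than it stalls too.

    /* Simplified sketch of a 2-wide in-order issue rule (hypothetical, not a
     * real CPU model). Instructions are considered strictly in program order;
     * a stalled oldest instruction blocks every younger one, ready or not. */
    typedef struct {
        int src1, src2;                 /* source register numbers */
    } Insn;

    /* reg_ready_cycle[r] = cycle at which register r's value becomes available */
    static int operands_ready(const Insn *in, const int reg_ready_cycle[], int cycle)
    {
        return reg_ready_cycle[in->src1] <= cycle &&
               reg_ready_cycle[in->src2] <= cycle;
    }

    /* Returns how many instructions (0, 1, or 2) issue this cycle. */
    int issue_in_order(const Insn queue[], int count,
                       const int reg_ready_cycle[], int cycle)
    {
        int issued = 0;
        while (issued < 2 && issued < count) {
            if (!operands_ready(&queue[issued], reg_ready_cycle, cycle))
                break;                  /* oldest not ready: everything behind it waits */
            issued++;
        }
        return issued;
    }

An out-of-order core, by contrast, would be free to scan past the stalled instruction and issue any younger instruction whose operands happen to be ready.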

If you have a good enough compiler, then an in-order microprocessor should be just fine.   There are a couple of key limitations, however:

1.      Binaries compiled for in-order architectures are very architecture-specific

Although both the Athlon 64 and the Pentium 4 are fully able to run x86 code, they are built on vastly different microarchitectures, with different execution units and very different things that they are “good” at. If both of these chips depended entirely on the compiler to extract parallelism and maximize performance, one of them would most definitely suffer. You could always ship two versions of every program, but that tends to get large and messy, especially from an update/patch standpoint. The compiler has to be intimately aware of the architecture that it's compiling for, which works in a case like a game console, where you don't have multiple vendors providing differently architected CPUs with a common ISA, but not so well in something like the desktop x86 market.

2.      Unpredictable memory latencies

Cache is a good thing, most of the time. Cache on a microprocessor does its best to keep frequently used data at hand, so that it can be supplied to the CPU at very low latencies. The problem is that cache adds a level of unpredictability to how long it will take to get data from memory: a cache hit could mean your data is ready in 10-20 cycles, while a cache miss could mean hundreds of cycles. With an in-order microprocessor, you can't reorder instructions based on data availability, so if the data isn't in cache, the entire CPU has to sit and wait until it is brought in from main memory. Even if other instructions could be executed in the meantime, an in-order microprocessor has no logic to handle the on-the-fly reordering of instructions that would be needed to work around unpredictable memory latencies.
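One way to soften that blow, sketched below in C with names of our own choosing, is for the compiler (or programmer) to schedule the potentially slow load as early as possible and place independent work between the load and its first use. Assuming the core can keep running until the loaded value is actually consumed, that independent work gives the memory system time to respond.

    /* Illustrative sketch: issue the possibly-missing load early and fill the
     * gap with work that does not depend on it, so an in-order core has
     * something useful to do while the load is in flight. */
    float scale_by_lookup(const float *table, int idx, const float *coeff, int n)
    {
        float lookup = table[idx];      /* load issued early; may miss in cache */

        /* Independent work: nothing here touches 'lookup'. */
        float acc = 0.0f;
        for (int i = 0; i < n; i++)
            acc += coeff[i] * coeff[i];

        return acc * lookup;            /* first real use of the loaded value */
    }

If the load misses and there isn't enough independent work to cover the miss, the in-order core still stalls, and that is exactly the unpredictability described above: the compiler has to guess, at compile time, how long the load will take.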

If you can find a way around the limitations of an in-order architecture, there are some very tangible benefits:

1.      A much simplified microprocessor

Out-of-Order microprocessors have a significant amount of complexity added to them in order to deal with on-the-fly reordering of instructions.  We will talk about them in greater detail in the next section.   By moving this complexity to the software/compiler side, you greatly reduce the complexity of your microprocessor and save your transistor budget for other things that can yield better performance benefits.   Less complexity also means less power consumed and heat dissipated.

2.      Shorter pipeline

To deal with the reordering of instructions, generally speaking, a number of pipeline stages have to be added to the architecture, which raises power consumption and demands a more accurate branch predictor (since a deeper pipeline carries an even higher branch misprediction penalty). While the added stages aren't as big of a deal for designs that are already deeply pipelined, for shorter designs the increase in depth can be 40% or more.
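As a rough, back-of-the-envelope illustration (our numbers, not the article's): the cost of a mispredicted branch is roughly proportional to the number of pipeline stages between fetch and the point where the branch is resolved, since all of the work fetched after the branch has to be thrown away. Stretch that span from, say, 10 stages to 14 (a 40% increase) and a misprediction that used to waste about 10 cycles now wastes about 14, which is why deeper pipelines demand better branch predictors.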

Historically, the idea of a simple in-order core has been abandoned in favor of the obvious alternative: an out-of-order architecture.

Comments

  • ceefka - Thursday, March 17, 2005

    Rambus' Revenge
  • Locut0s - Thursday, March 17, 2005

    Great article Anand!! Yeah, I actually get to bring my Comp150 knowledge to bear in reading this article! If this had come out 6 months ago I would have been totally lost. It will indeed be interesting to see what headway Cell can make; unfortunately, as Anand alludes to, the x86 architecture is just too heavily entrenched for anything to budge it except the Big 2 (AMD and Intel). I can't wait to see what type of power the PlayStation 3 will have though, and especially how that power will be utilized in games. I bet there will be some jaw-dropping graphics awaiting us there. That is, if Cell's limitations don't hold back lazy game developers and lead to a string of mediocre games punctuated by a few amazing titles made by independent developers who really care to utilize the architecture. Didn't the PlayStation 1 suffer something similar?
  • knitecrow - Thursday, March 17, 2005

    The Real World Technologies article on the Cell states that it gives up single-thread performance in favour of running many parallel threads. That sounds like a terribly difficult processor to develop games for.

    I for one think it will be easier to put the burden on the hardware rather than on the software side.

    Can we see another repeat of PS2? Technically impressive, but hard to code for.
  • JarredWalton - Thursday, March 17, 2005

    11 - I think the point is that games tend to use certain functions of a CPU much more frequently, while general business/office applications make use of a wider range of generic operations. I understand your complaint, as office applications generally don't need a lot more power than about 1.5 GHz at most. However, the key of the statement was the "general purpose microprocessor" and not the "very powerful" part.
  • AnandThenMan - Thursday, March 17, 2005

    WAIT. What the flock does this mean?

    "Performance in business/office applications requires a very powerful, very fast general purpose microprocessor, but performance in a game console, for example, does not."

    WHAT??????? Hello?? So an office app like Word needs a very powerful processor, but a game console does not? I beg to differ. I suppose it depends on how you define "business/office application", but I think that statement is WAY off. I know several current office applications that will limp along on a Pentium 133, but no current game has any hope on the same CPU.
  • tipoo - Wednesday, July 30, 2014

    It was clear to me that it meant console CPUs didn't have to be as general-purpose and brute-force powerful in every regard; they can get away with being more specialized, and suck at general work, but still be fast for game-specific code.
  • Googer - Thursday, March 17, 2005

    When are they coming out? Anyone know of a release date?
  • jeffbui - Thursday, March 17, 2005

    #4, I do. Heh.

    I've been waiting for this article forever.. thanks!
  • JarredWalton - Thursday, March 17, 2005

    Interesting stuff. The PlayStation has always been something of a pain in the rear to program. PS1 went its own way, and PS2 did the same. PS3 and Cell seem ready to pave new roads into the "OMG this is really complex" land of programming. I'm glad I've given up serious programming.... :)
  • Googer - Thursday, March 17, 2005

    In Soviet Russia, Cell processor controls your mind.
