Pipelining: 101

It seems like every time Intel releases a new processor we have to revisit the topic of pipelining to help explain why a 3GHz P4 performs like a 2GHz Athlon 64. With a 55% longer pipeline than Northwood, Prescott forces us to revisit this age-old topic once again.

You've heard it countless times before: pipelining is to a CPU as the assembly line is to a car plant. A CPU's pipeline is not a physical pipe that data goes into and appears at the end of; instead, it is a collection of "things to do" in order to execute instructions. Every instruction must go through the same steps, and we call these steps stages.

The stages of a pipeline do things like find out what instruction to execute next, find out what two numbers are going to be added together, find out where to store the result, perform the add, and so on.

The most basic CPU pipeline can be divided into 5 stages:

1. Instruction Fetch
2. Decode Instructions
3. Fetch Operands
4. Execute
5. Store to Cache

You'll notice that those five stages are very general in their description. You could just as easily make a longer pipeline with more specific stages:

1. Instruction Fetch 1
2. Instruction Fetch 2
3. Decode 1
4. Decode 2
5. Fetch Operands
6. Dispatch
7. Schedule
8. Execute
9. Store to Cache 1
10. Store to Cache 2

Both pipelines have to accomplish the same task: instructions come in, results go out. The difference is that each of the five stages of the first pipeline must do more work than each of the ten stages of the second pipeline.

If all else were the same, you'd want a 5-stage pipeline like the first case, simply because it's easier to fill 5 stages with data than it is to fill 10. And if your pipeline is not constantly full of data, you're losing precious execution power - meaning your CPU isn't running as efficiently as it could.

The only reason you would want the second pipeline is if, by making each stage simpler, you can get the time it takes to complete each stage to be significantly quicker than in the previous design. Your slowest (most complicated) stage determines how quickly you can get data through each stage - keep that in mind.
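The point about the slowest stage can be sketched in a few lines of Python. This is only an illustration: the function name `max_clock_ghz` and the per-stage latencies are made-up values, not measurements of any real CPU.

```python
def max_clock_ghz(stage_latencies_ns):
    """Return the highest clock (in GHz) a pipeline can run at.

    Every stage gets one clock cycle, so the cycle must be long
    enough for the slowest stage to finish.
    """
    period_ns = max(stage_latencies_ns)  # slowest stage sets the clock period
    return 1.0 / period_ns               # 1 / ns = GHz


# Hypothetical stage latencies in nanoseconds (assumed for illustration).
balanced = [1.0, 1.0, 1.0, 1.0, 1.0]
one_slow_stage = [0.5, 0.5, 0.5, 0.5, 1.0]

print(max_clock_ghz(balanced))        # 1.0
print(max_clock_ghz(one_slow_stage))  # 1.0, the single 1ns stage holds everything back
```

Speeding up four of the five stages buys nothing here, which is why a deeper pipeline only pays off if every stage gets simpler.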

Let's say that each stage of the first pipeline takes 1ns to complete. If each stage takes 1 clock cycle to execute, we can build a 1GHz processor (1/1ns = 1GHz) using this pipeline. Now, to make up for the fact that we have more stages (and thus have a more difficult time keeping the pipeline full), the second design must have a significantly shorter clock period (the amount of time each stage takes to complete) in order to offer performance equal to or greater than that of the first design. Thankfully, since we're doing less work per clock, we can reduce the clock period significantly. Assuming that we've done our design homework well, let's say we get the clock period down to 0.5ns for the second design.
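To make the arithmetic concrete, here is a small Python sketch of the two example designs. The `Pipeline` class and its property names are invented for this illustration; the only inputs are the stage counts and clock periods used in the text.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    stages: int
    period_ns: float  # time per stage, i.e. one clock cycle

    @property
    def clock_ghz(self) -> float:
        # Clock frequency is the reciprocal of the clock period.
        return 1.0 / self.period_ns

    @property
    def latency_ns(self) -> float:
        # Time for a single instruction to pass through every stage.
        return self.stages * self.period_ns


design1 = Pipeline(stages=5, period_ns=1.0)
design2 = Pipeline(stages=10, period_ns=0.5)

for name, p in (("Design 1", design1), ("Design 2", design2)):
    print(f"{name}: {p.clock_ghz} GHz, {p.latency_ns} ns per instruction")
# Design 1: 1.0 GHz, 5.0 ns per instruction
# Design 2: 2.0 GHz, 5.0 ns per instruction
```

A single instruction takes the same 5ns to get through either pipeline. Any benefit from the shorter clock period comes from how many instructions finish per second once the pipeline is full, not from finishing one instruction sooner.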

Design 2 can now scale to 2GHz, twice the clock speed of the original CPU, and we will get twice the performance - assuming we can keep the pipeline filled at all times. Reality sets in and it becomes clear that without some fancy footwork, we can't keep that pipeline full all the time - and all of a sudden our 2GHz CPU isn't performing twice as fast as our 1GHz part.
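A rough Python model shows how much an emptied pipeline can cost. Everything here is a simplifying assumption made for illustration: the function `run_time_ns` is hypothetical, the 5% flush rate is an arbitrary figure, and each flush is assumed to waste `stages - 1` cycles while the pipeline refills. Real processors are far more complicated.

```python
def run_time_ns(stages, period_ns, instructions, flush_rate=0.0):
    """Estimate the time to run `instructions` through a simple pipeline.

    Assumptions (illustrative only):
      - one instruction completes per cycle while the pipeline is full
      - the first instruction needs `stages - 1` extra cycles to fill the pipe
      - a fraction `flush_rate` of instructions empties the pipeline,
        costing `stages - 1` cycles each to refill it
    """
    fill_cycles = stages - 1
    flush_cycles = flush_rate * instructions * (stages - 1)
    return (fill_cycles + instructions + flush_cycles) * period_ns


n = 1_000_000  # arbitrary instruction count

# Pipeline always full: the 2GHz design is about twice as fast.
ideal = run_time_ns(5, 1.0, n) / run_time_ns(10, 0.5, n)
print(f"ideal speedup: {ideal:.2f}")        # ideal speedup: 2.00

# Assumed 5% of instructions flush the pipeline in both designs.
realistic = run_time_ns(5, 1.0, n, 0.05) / run_time_ns(10, 0.5, n, 0.05)
print(f"with flushes:  {realistic:.2f}")    # with flushes:  1.66
```

Under these assumptions the deeper pipeline pays a larger penalty every time it empties, so doubling the clock speed yields well under double the performance.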

Make sense? Now let's relate this to the topic at hand.


104 Comments


  • INTC - Monday, February 2, 2004 - link

    Ummmm yea, kinda reminds me of cooking an egg on an Athlon XP http://www.biggaybear.co.uk/Menu/Aegg/Aeggs.html
  • cliffa3 - Monday, February 2, 2004 - link

    something good to include on the mb compatibility article would be what boards would house the 2.8/533...i'm wondering myself if the E7205 chipset would...i have a p4g8x, and it would be a welcome upgrade with HT and all the other goodies if it oc's well.
  • Stlr22 - Monday, February 2, 2004 - link

    They didn't burn down, but the proc were running hot. Not to mention, these are the FIRST releases in the Prescott line. What's it gonna be like later on?....

    Just think, a P4 based computer that turns your living room into your very own Sauna!!....WHOOO-HOOO!!.....now that's what I call a bargain!


  • INTC - Monday, February 2, 2004 - link

    The message is clear: Anandtech and all of the other review sites didn't burn down so I guess it's not a flame thrower.

    Prescott is not as fast as I had hoped but is definitely not the step backwards as some were rumoring it to be. I think a Prescott 2.8 @ 250 MHz FSB will be really nice to play with until I see what Intel announces at IDF in a few weeks.
  • Icewind - Monday, February 2, 2004 - link

    The message is clear: I'm buying an Athlon 64.
  • Vanners - Sunday, February 1, 2004 - link

    Did anyone catch the error in Pipelining: 101?

    If you halve the time for a stage in the pipeline and double the number of stages, yes, this means you can run at 2GHz instead of 1GHz, but the reality is you're still taking 5ns to complete the pipe.

    Look at it like a motorbike: You drop down a gear and rev harder; you make more noise but you are still doing the same speed.
    The only reasons to drop down a gear are to brake through your gears (i.e. slow down) or to rev significantly higher than the change in gear ratio in order to move faster (with more torque).

    The trouble Intel has is that they drop down a gear then rev 6 months to a year later.
  • kamper - Sunday, February 1, 2004 - link

    Just curious, Anand or Derek: what board did you use to get the 3.72 GHz oc? Obviously it wasn't the intel board used in the benches. I guess we'll hear all about this in the compatibility review though :)

    keep up the good work, that last point about smaller margins at higher clockspeeds (vs. Northwood) was cool. Let's just hope the pattern continues.
  • Stlr22 - Sunday, February 1, 2004 - link

    Seems to me like people either got caught up in some of the hype and expected too much, or some people expected too little and that history would repeat itself (Willamette vs Palomino).

    The fact that the Prescott fared much better in its launch compared to the Willamette might be a hint to not underestimate it. Prescott isn't really looking bad now, and I think it will hit stride faster than the Willamette core did.

    The next couple of years are gonna be really interesting.

    Damn, ya just gotta love it!
  • ntrights - Sunday, February 1, 2004 - link

    Great review!
  • KF - Sunday, February 1, 2004 - link

    I've grown to appreciate CRAMITPAL. If you read around the opinionated diatribes, he has some good stuff that people avoid saying for fear of retaliation. I suppose if I were in love with Intel, he would tick me off.

    But, it does look like Intel has created a CPU that should ramp up to speeds high enough to beat the A64 in 32bit mode, and that is all they needed to do.

    Regardless of how much heat that is going to take, Intel must have some way in the works to handle it.

    Looks like they might not charge an arm and leg for it, which is the biggest shock.
