An Impatient Prescott: Scheduler Improvements

Prescott can’t keep any more operations in flight (active in the pipeline) than Northwood, but because of the longer pipeline Prescott must work even harder to make sure that it is filled.

We just finished discussing branch predictors and their importance in determining how deep of a pipeline you can have, but another contributor to the equation is a CPU’s scheduling windows.

Let’s say you’ve got a program that is 3 operations long:

1. D = B + 1
2. A = 3 + D
3. C = A + B

You’ll notice that the 2nd operation can’t execute until the first one is complete, as it depends on the outcome (D) of the first operation. The same is true for the 3rd operation, it can’t execute until it knows what the value of A is. Now let’s say our CPU has 3 ALUs, and in theory could execute three adds simultaneously, if we just had this stream of operations going through the pipeline, we would only be using 1/3 of our total execution power - not the best situation. If we just upgraded from a CPU with 1 ALU, we would be getting the same throughput as our older CPU – and no one wants to hear that.

Luckily, no program is 3 operations long (even print “Hello World” is on the order of 100 operations) so our 3 ALUs should be able to stay busy, right? There is a unit in all modern day CPUs whose job it is to keep execution units, like ALUs, as busy as possible – as much of the time as possible. This is the job of the scheduler.

The scheduler looks at a number of operations being sent to the CPU’s execution cores and attempts to extract the maximum amount of parallelism possible from the operations. It does so by placing pending operations as soon as they make it to the scheduling stage(s) of the pipeline into a buffer or scheduling window. The size of the window determines the amount of parallelism that can be extracted, for example if our CPU’s scheduling window were only 3 operations large then using the above code example we would still only use 1/3 of our ALUs. If we could look at more operations, we could potentially find code that didn’t depend on the values of A, B or D and execute that in parallel while we’re waiting for other operations to complete. Make sense?

Because Intel increased the size of the pipeline on Prescott by such a large amount, the scheduling windows had to be increased a bit. Unfortunately, present microarchitecture design techniques do not allow for very large scheduling windows to be used on high clock speed CPUs – so the improvements here were minimal.

Intel increased the size of the scheduling windows used to buffer operations going to the FP units to coincide with the increase in pipeline.

There is also parallelism that can be extracted out of load and store operations (getting data out of and into memory). Let’s say that you have the following:

A = 1 + 3
Store A at memory location X


Load A from memory location X

The store operation actually happens as two operations (further pipelining by splitting up a store into two operations): a store address operation (where the data is going) and a store data operation (what the data actually is). The problem here is that the scheduler may try to parallelize the store operations and the load operation without realizing that the two are dependent on one another. Once this is discovered, the load will not execute and a performance penalty will be paid because the CPU’s scheduler just wasted time getting a load ready to execute and then having to get rid of it. The load will eventually execute after the store operations have completed, but at a significant performance penalty.

If a situation like the one mentioned above does crop up, long pipeline designs will suffer greatly – meaning that Prescott wants this to happen even less than Northwood. In Prescott, Intel included a small, very accurate, predictor to predict whether a load operation is likely to require data from a soon-to-be-executed store and hold that load until the store has executed. Although the predictor isn’t perfect, it will reduce bubbles of no-execution in the pipeline – a killer to Prescott and all long pipelined architectures.

Don’t look at these enhancements to improve performance, but to help balance the lengthened pipeline. A lot of the improvements we’ll talk about may sound wonderful but you must keep in mind that at this point, Prescott needs these technologies in order to equal the performance of Northwood so don’t get too excited. It’s an uphill battle that must be fought.

Prescott's Crystal Ball (continued) Execution Core Improvements
Comments Locked

104 Comments

View All Comments

  • Stlr22 - Sunday, February 1, 2004 - link

    post*
  • Stlr22 - Sunday, February 1, 2004 - link

    KristopherKubicki

    Earlier you said that I should read the article.
    What was your point? What was it about my first pot that you disagreed with?
  • KristopherKubicki - Sunday, February 1, 2004 - link

    #7:

    I agree 100% with Anand and Derek. This processor will be a non-event until we get in the 3.6GHz range. Similar to Northwood's launch.

    #10:

    Check out our price engine. We have already been listing the processor a week!

    http://www.anandtech.com/guides/priceguide.htm

    http://www.monarchcomputer.com/Merchant2/merchant....

  • cliffa3 - Sunday, February 1, 2004 - link

    In the table on page 14 it shows that the 90nm P4@2.8 will have a 533 MHz FSB, but is that the case? I did some quick google research and can't find anything to support that...please confirm or correct, thanks.
  • NFactor - Sunday, February 1, 2004 - link

    Yes, I must agree this is an amazing article, one of the best i have ever read. Thanks.
  • Xentropy - Sunday, February 1, 2004 - link

    VERY interesting article. Thank you Anand and Derek! One of the best I've read on Anandtech, and I consider yours the best hardware site on the net!

    One correction, on page 7, you say, "if you want to multiply a number in binary by 2 you can simply shift the bits of the number to the right by 1 bit," but don't you mean shift to the left one bit (and place a zero at the end)? It's much like multiplying a decimal number by ten for obvious reasons.

    Anyway, it looks like the Prescott is somewhat of a non-event at this time. Just new cores that perform fundamentally the same as the current ones at current speeds. The real news will come later; Intel has just positioned itself for one hell of a speed ramp to come. Northwood was clearly at the end of the line. One analogy, I suppose, would be that Intel didn't fire any shots in the CPU war today, but they loaded their guns in preparation to fire.

    The coming year will be an exciting one for us hardware geeks. I'm interested in seeing how higher clocked Prescotts play out as well as whether anything 64-bit shows up before 2005 to support AMD's stance that we need it NOW.

    Again, thanks for a very thorough article!
  • Stlr22 - Sunday, February 1, 2004 - link

    KristopherKubicki

    So what's your take on these new Prescotts?
  • KristopherKubicki - Sunday, February 1, 2004 - link

    Anand scolded me for not reading the article :( I only read the conclusion and the graphs. Turns out the decision making isnt as clearcut as it sounds.

    As for the thing with the inquirer. Well, lots of people had prescotts. We had one back in August I believe. The thing is they were horribly slow - 533FSB 2.8GHz. Everyone drew the conclusion that these were purposely slowed processors that were jsut for engineering purposes. While the inq benched this processor, most people didnt just becuase they were under the impression this was not to be the final production model. Hope that clears up some discrepancy about the validity.

    Cheers,

    Kristopher
  • wicktron - Sunday, February 1, 2004 - link

    Hehe, I guess the Inq was right about this one. Where are all the Inq bashers and their claim of "fake" benchies? Haha, I laugh.
  • Stlr22 - Sunday, February 1, 2004 - link

    KristopherKubicki - "read the article..."


    lol that might be a good idea, as I only broswed it and read the conclusion. :D

Log in

Don't have an account? Sign up now