Prescott's New Crystal Ball: Branch Predictor Improvements

We’ve said it before: before you can build a longer pipeline or add more execution units, you need a powerful branch predictor. The branch predictor (more specifically, its accuracy) will determine how many operations you can have working their way through the CPU before you hit a stall. Intel extended the basic integer pipeline by 11 stages, so they need to make corresponding increases in the accuracy of Prescott’s branch predictor; otherwise, performance will inevitably tank.

Intel admits that the majority of the branch predictor unit remains unchanged in Prescott, but there have been some key modifications to help balance performance.

For those of you who aren’t familiar with the term, the role of a branch predictor in a processor is to predict the path code will take. If you’ve ever written code before, it boils down to being able to predict which part of a conditional statement (if-then, loops, etc.) will be taken. Present-day branch predictors work on a simple principle: if a branch was taken in the past, it is likely that it will be taken in the future. So the purpose of a branch predictor is to keep track of the code being executed on the CPU and update counters that record how often branches at particular addresses were taken. Once enough data has accumulated in these counters, the branch predictor is able to predict branches as taken or not taken with relatively high accuracy, assuming it is given enough room to store all of this data.
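To make the counter idea concrete, here is a minimal sketch of one textbook scheme, a table of 2-bit saturating counters indexed by branch address. It is not a description of Prescott's actual predictor; the class name, table size, starting counter value and the example address are all assumptions chosen for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, entries=1024):
        # Hypothetical table size; real predictors differ.
        self.entries = entries
        # Counter values 0-1 predict not taken, 2-3 predict taken.
        # Every counter starts at 1 ("weakly not taken").
        self.counters = [1] * entries

    def _index(self, address):
        # With a limited table, different branches can share a counter.
        return address % self.entries

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, address, outcomes):
    """Fraction of outcomes predicted correctly, training as we go."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct / len(outcomes)


# A loop branch that is taken 9 times, then falls through once,
# with the whole loop run 100 times.
history = ([True] * 9 + [False]) * 100
print(accuracy(TwoBitPredictor(), 0x4010, history))
```

In this sketch the only steady-state miss is the final, not-taken iteration of each loop run. The modulo index also shows why storage matters: with too few entries, unrelated branches land on the same counter and disturb each other's history.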

One way of improving the accuracy of a branch predictor, as you may guess, is to give the unit more space to keep track of previously taken (or not taken) branches. AMD improved the accuracy of their branch predictor in the Opteron by increasing the amount of space available to store branch data; Intel has not chosen to do so with Prescott. Prescott’s Branch Target Buffer remains unchanged at 4K entries, and it doesn’t look like Intel has increased the size of the Global History Counter either. Instead, Intel focused on tuning the efficiency of their branch predictor using methods that consume less die space.

Loops are very common in code; they are useful for zeroing data structures, printing characters, or simply forming part of a larger algorithm. Although you may not think of them as branches, loops are inherently filled with branches: before you start a loop and on every iteration of the loop, you must find out whether you should continue executing it. Luckily, these types of branches are relatively easy to predict; you can generally assume that if a branch takes you to an earlier point in the code (called a backwards branch), you are dealing with a loop and the branch predictor should predict taken.
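The rule of thumb described above is often summarized as "backward taken, forward not taken." A minimal sketch, assuming the predictor only compares the branch's own address with its target; the function name and the addresses are made up for illustration:

```python
def predict_static(branch_address, target_address):
    """Backward taken, forward not taken.

    A target at a lower address than the branch itself is assumed
    to be the top of a loop, so the branch is predicted taken.
    """
    return target_address < branch_address


# Hypothetical addresses: a loop-ending branch jumping back to the loop top.
print(predict_static(0x1040, 0x1020))  # backward branch
# A forward branch that skips over a block of code.
print(predict_static(0x1040, 0x1080))  # forward branch
```

No history is needed for this kind of prediction, which is why it is useful the first time a branch is seen, before any counters have been trained.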

As you would expect, not all backwards branches should be taken, because not all of them are at the end of a loop. Backwards branches that aren’t loop-ending branches are sometimes the result of error handling in code: if an error is generated, you back up and start over again. If no error is generated in the application, the prediction should be not-taken. But how do you specify this while keeping the hardware simple?

Code Fragment A

Line 10: while (i < 10) do
Line 11: A;
Line 12: B;
Line 13: increment i;
Line 14: if i is still < 10, then go back to Line 11

Code Fragment B

Line 10: A;
Line 11: B;
Line 12: C;
...
Line 80: if (error) then go back to Line 11

Line 14 is a backwards branch at the end of a loop - should be taken!
Line 80 is a backwards branch not at the end of a loop - should not be taken!
Example of the two types of backwards branching

It turns out that loop-ending branches and these error branches, both backwards branches, differentiate themselves from one another by the amount of code that separates the branch from its target. Loops are generally small, and thus only a handful of instructions will separate the branch from its target; error handling branches generally instruct the CPU to go back many more lines of code. The two code fragments above illustrate this: the branch on Line 14 of Fragment A jumps back only three lines, while the branch on Line 80 of Fragment B jumps back 69.

Prescott includes a new algorithm that looks at how far the branch target is from the actual branch instruction and uses that distance to better decide whether to predict the branch as taken. These enhancements are for static branch prediction, which looks at certain scenarios and always makes the same prediction when those scenarios occur. Prescott also includes improvements to its dynamic branch prediction.
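As a rough sketch of the distance idea, the static rule can be extended so that a backwards branch is only predicted taken when its target is close by. The threshold below is purely hypothetical (the actual cutoff Prescott uses is not given here), and the line numbers from the two code fragments stand in for real instruction addresses.

```python
# Hypothetical cutoff, in the same units as the addresses passed in.
LOOP_DISTANCE_THRESHOLD = 16


def predict_with_distance(branch_address, target_address,
                          threshold=LOOP_DISTANCE_THRESHOLD):
    """Static prediction that also considers how far back a branch jumps."""
    if target_address >= branch_address:
        # Forward branch: predict not taken.
        return False
    distance = branch_address - target_address
    # A short backward jump looks like a loop; a long one does not.
    return distance <= threshold


# Code Fragment A: the branch on Line 14 goes back to Line 11.
print(predict_with_distance(14, 11))
# Code Fragment B: the branch on Line 80 goes back to Line 11.
print(predict_with_distance(80, 11))
```

With these assumed numbers, the short jump in Fragment A is predicted taken and the long jump in Fragment B is predicted not taken, which matches the behavior the two fragments call for.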
