Prescott's New Crystal Ball: Branch Predictor Improvements

We’ve said it before: before you can build a longer pipeline or add more execution units, you need a powerful branch predictor. The branch predictor, or more specifically its accuracy, determines how many operations you can have working their way through the CPU before you hit a stall. Intel extended the basic integer pipeline by 11 stages, so they need to make corresponding increases in the accuracy of Prescott’s branch predictor; otherwise, performance will inevitably tank.
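
A rough, first-order way to see why a deeper pipeline demands a more accurate predictor: if every misprediction costs roughly one pipeline refill, the average cycles lost per branch scale with (1 - accuracy) × penalty. The sketch below assumes, purely for illustration, that the penalty equals the pipeline depth (20 stages for Northwood, 31 for Prescott after the 11 extra stages) and a hypothetical 95% baseline accuracy. Real misprediction penalties are not exactly the stage count, and the function names are made up for this example.

```python
# First-order model. Assumption: misprediction penalty ~= pipeline depth in cycles.
# Real penalties differ from raw stage counts; this only shows the trend.

def stall_cycles_per_branch(accuracy: float, depth: int) -> float:
    """Average cycles lost per branch = miss rate x penalty (penalty ~ depth)."""
    return (1.0 - accuracy) * depth


def accuracy_to_break_even(old_accuracy: float, old_depth: int, new_depth: int) -> float:
    """Accuracy a deeper pipeline needs so its stall cost per branch matches the old one."""
    old_cost = stall_cycles_per_branch(old_accuracy, old_depth)
    return 1.0 - old_cost / new_depth


NORTHWOOD_DEPTH = 20      # stages
PRESCOTT_DEPTH = 31       # 20 + the 11 extra stages
BASELINE_ACCURACY = 0.95  # hypothetical, for illustration only

needed = accuracy_to_break_even(BASELINE_ACCURACY, NORTHWOOD_DEPTH, PRESCOTT_DEPTH)
print(f"Northwood cost/branch: {stall_cycles_per_branch(BASELINE_ACCURACY, NORTHWOOD_DEPTH):.2f} cycles")
print(f"Prescott at same accuracy: {stall_cycles_per_branch(BASELINE_ACCURACY, PRESCOTT_DEPTH):.2f} cycles")
print(f"Prescott break-even accuracy: {needed:.2%}")
```

Under these assumptions, Prescott would need about 96.8% accuracy just to match a 95%-accurate Northwood, which means cutting mispredictions by roughly a third.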

Intel admits that the majority of the branch predictor unit remains unchanged in Prescott, but there have been some key modifications to help balance performance.

For those of you who aren’t familiar with the term, the role of a branch predictor in a processor is to predict the path code will take. If you’ve ever written code, it boils down to predicting which part of a conditional statement (if-then, loops, etc.) will be executed. Present-day branch predictors work on a simple principle: if branches were taken in the past, they are likely to be taken in the future. So the branch predictor keeps track of the code being executed on the CPU and increments counters that record how often branches at particular addresses were taken. Once enough data has accumulated in these counters, the branch predictor can predict branches as taken or not taken with relatively high accuracy, assuming it has enough room to store all of this data.
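
The counter idea can be sketched with the classic 2-bit saturating counter. It counts up on a taken branch and down on a not-taken one, so a single surprise doesn't immediately flip the prediction. This is a generic textbook scheme written for illustration, not a description of Prescott's hardware; the class name, the 1024-entry table size and the branch address are all assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    Generic textbook scheme; the table size is a hypothetical choice.
    """

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.counters = [1] * entries  # every entry starts "weakly not taken"

    def _index(self, address: int) -> int:
        return address % self.entries

    def predict(self, address: int) -> bool:
        return self.counters[self._index(address)] >= 2

    def update(self, address: int, taken: bool) -> None:
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)  # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)  # saturate at 0


# A loop-closing branch at a hypothetical address: taken 9 times, then falls through.
predictor = TwoBitPredictor()
branch_address = 0x4010
outcomes = [True] * 9 + [False]
correct = 0
for taken in outcomes:
    correct += int(predictor.predict(branch_address) == taken)
    predictor.update(branch_address, taken)
print(f"{correct}/{len(outcomes)} predicted correctly")  # 8/10 on the first pass
```

The first pass misses the initial iteration (while the counter warms up) and the loop exit. Running the same loop again gets 9 of 10 right, because the single not-taken exit only nudges the counter from 3 to 2 instead of flipping the prediction.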

One way of improving the accuracy of a branch predictor, as you may guess, is to give the unit more space to keep track of previously taken (or not taken) branches. AMD improved the accuracy of the Opteron’s branch predictor by increasing the amount of space available to store branch data, but Intel has not chosen to do so with Prescott. Prescott’s Branch Target Buffer remains unchanged at 4K entries, and it doesn’t look like Intel has increased the size of the Global History Counter either. Instead, Intel focused on tuning the efficiency of its branch predictor using methods that consume less die space.
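
To see why more table space can help, here is a sketch in which two hypothetical branches collide in a small, direct-mapped table of 2-bit counters and keep overwriting each other's history; in a larger table they land in separate entries. The table sizes, addresses and the `simulate` helper are illustrative assumptions, and real predictors index (and often tag) their tables in more elaborate ways.

```python
def simulate(table_size: int, trace: list[tuple[int, bool]]) -> int:
    """Run a trace of (address, taken) pairs through a direct-mapped table of
    2-bit saturating counters and return the number of correct predictions."""
    counters = [1] * table_size  # start weakly not-taken
    correct = 0
    for address, taken in trace:
        i = address % table_size
        prediction = counters[i] >= 2
        correct += int(prediction == taken)
        counters[i] = min(3, counters[i] + 1) if taken else max(0, counters[i] - 1)
    return correct


# Two hypothetical branches that alias in a 16-entry table (0x100 % 16 == 0x110 % 16).
ALWAYS_TAKEN, NEVER_TAKEN = 0x100, 0x110
trace = [(ALWAYS_TAKEN, True), (NEVER_TAKEN, False)] * 50

for size in (16, 1024):
    print(f"{size:>5} entries: {simulate(size, trace)}/{len(trace)} correct")
```

In the 16-entry table the two branches share one counter that bounces between 1 and 2, so every prediction is wrong (0/100); with 1024 entries each branch gets its own counter and only the very first prediction misses (99/100).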

Loops are very common in code: they are useful for zeroing data structures or printing characters, or are simply part of a larger algorithm. Although you may not think of them as branches, loops are inherently filled with branches; before you start a loop, and on every iteration of it, you must determine whether to continue executing the loop. Luckily, these types of branches are relatively easy to predict: if the outcome of a branch takes you to an earlier point in the code (called a backwards branch), you can generally assume that you are dealing with a loop and that the branch predictor should predict taken.
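
The rule just described is usually paired with a companion rule that the text doesn't discuss, predicting forward branches not taken; together they are commonly called backward-taken/forward-not-taken (BTFN) static prediction. A minimal sketch, with a hypothetical function name and made-up instruction addresses:

```python
def predict_btfn(branch_address: int, target_address: int) -> bool:
    """Static backward-taken / forward-not-taken rule.

    A branch whose target lies at a lower address (a backwards branch) is
    assumed to close a loop and is predicted taken; a forward branch is
    predicted not taken. No history is kept, so the hardware stays simple.
    """
    return target_address < branch_address


# Hypothetical addresses: a loop-closing jump back to 0x1000, and a forward skip.
print(predict_btfn(branch_address=0x1040, target_address=0x1000))  # backwards: True
print(predict_btfn(branch_address=0x1040, target_address=0x1080))  # forwards: False
```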

As you would expect, not all backwards branches should be taken, because not all of them are at the end of a loop. Backwards branches that aren’t loop-ending branches are sometimes the result of error handling in code: if an error is generated, you back up and start over again. But if the application generates no error, the branch should be predicted not taken. How do you specify this while keeping the hardware simple?

Code Fragment A

Line 10: while (i < 10) do
Line 11: A;
Line 12: B;
Line 13: increment i;
Line 14: if i is still < 10, then go back to Line 11

Code Fragment B

Line 10: A;
Line 11: B;
Line 12: C;
...
Line 80: if (error) then go back to Line 11

Line 14 is a backwards branch at the end of a loop - should be taken!
Line 80 is a backwards branch not at the end of a loop - should not be taken!
Example of the two types of backwards branching

It turns out that loop-ending branches and these error branches, although both are backwards branches, can be told apart by the amount of code that separates the branch from its target. Loops are generally small, so only a handful of instructions separate the branch from its target; error-handling branches generally send the CPU back many more lines of code. The two code fragments above illustrate this: the loop-ending branch on Line 14 jumps back only three lines, while the error branch on Line 80 jumps back 69.

Prescott includes a new algorithm that looks at how far the branch target is from the branch instruction itself and uses that distance to better determine whether or not the branch will be taken. These enhancements apply to static branch prediction, which recognizes certain scenarios and always makes the same prediction when those scenarios occur. Prescott also includes improvements to its dynamic branch prediction.
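
Intel hasn't published the exact rule, so the following is only a sketch of the idea: a backwards branch is predicted taken only when its target lies within some maximum distance. The cutoff `MAX_LOOP_DISTANCE`, the function names, and the use of the pseudocode's line numbers as stand-in addresses are all assumptions for illustration. Applied to Code Fragments A and B, plain backwards-taken prediction gets the error branch wrong, while the distance check gets both right.

```python
# Hypothetical cutoff: backwards branches that jump back further than this are
# treated as non-loop (e.g. error-handling) branches. Not Intel's actual value.
MAX_LOOP_DISTANCE = 16


def predict_backwards_taken(branch: int, target: int) -> bool:
    """Plain rule: every backwards branch is predicted taken."""
    return target < branch


def predict_distance_aware(branch: int, target: int) -> bool:
    """Backwards branches are predicted taken only if the target is close by."""
    return target < branch and branch - target <= MAX_LOOP_DISTANCE


# Line numbers from Code Fragments A and B used as stand-in addresses.
fragments = {
    "A: loop end (line 14 -> 11)": (14, 11, True),      # should be taken
    "B: error check (line 80 -> 11)": (80, 11, False),  # should not be taken (no error)
}

for name, (branch, target, should_take) in fragments.items():
    plain = predict_backwards_taken(branch, target)
    aware = predict_distance_aware(branch, target)
    print(f"{name}: plain={plain} ({'ok' if plain == should_take else 'miss'}), "
          f"distance-aware={aware} ({'ok' if aware == should_take else 'miss'})")
```

The cutoff is a trade-off: set it too small and loops with long bodies get predicted not taken; set it too large and error branches are predicted taken again.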


104 Comments


  • INTC - Monday, February 2, 2004

    Ummmm yea, kinda reminds me of cooking an egg on an Athlon XP http://www.biggaybear.co.uk/Menu/Aegg/Aeggs.html
  • cliffa3 - Monday, February 2, 2004

    Something good to include in the mb compatibility article would be which boards would house the 2.8/533... I'm wondering myself if the E7205 chipset would. I have a P4G8X, and it would be a welcome upgrade with HT and all the other goodies if it OCs well.
  • Stlr22 - Monday, February 2, 2004

    They didn't burn down, but the procs were running hot. Not to mention, these are the FIRST releases in the Prescott line. What's it gonna be like later on?....

    Just think, a P4 based computer that turns your living room into your very own Sauna!!....WHOOO-HOOO!!.....now that's what I call a bargain!


  • INTC - Monday, February 2, 2004

    The message is clear: Anandtech and all of the other review sites didn't burn down so I guess it's not a flame thrower.

    Prescott is not as fast as I had hoped, but it is definitely not the step backwards some were rumoring it to be. I think a Prescott 2.8 @ 250 MHz FSB will be really nice to play with until I see what Intel announces at IDF in a few weeks.
  • Icewind - Monday, February 2, 2004

    The message is clear: Im buying an Athlon 64.
  • Vanners - Sunday, February 1, 2004

    Did anyone catch the error in Pipelining: 101?

    If you halve the time for a stage in the pipeline and double the number of stages, yes, you can run at 2GHz instead of 1GHz, but the reality is you're still taking 5ns to complete the pipe.

    Look at it like a motorbike: you drop down a gear and rev harder; you make more noise, but you are still doing the same speed.
    The only reasons to drop down a gear are to brake through your gears (i.e. slow down) or to rev significantly higher than the change in gear ratio in order to move faster (with more torque).

    The trouble Intel has is that they drop down a gear then rev 6 months to a year later.
  • kamper - Sunday, February 1, 2004

    Just curious, Anand or Derek: what board did you use to get the 3.72 GHz oc? Obviously it wasn't the intel board used in the benches. I guess we'll hear all about this in the compatibility review though :)

    Keep up the good work; that last point about smaller margins at higher clockspeeds (vs. Northwood) was cool. Let's just hope the pattern continues.
  • Stlr22 - Sunday, February 1, 2004

    Seems to me like people either got caught up in some of the hype and expected too much, or some people expected too little and that history would repeat itself (Willamette vs Palomino).

    The fact that the Prescott fared much better in its launch compared to the Willamette might be a hint not to underestimate it. Prescott isn't really looking bad now, and I think it will hit its stride faster than the Willamette core did.

    The next couple of years are gonna be really interesting.

    Damn, ya just gotta love it!
  • ntrights - Sunday, February 1, 2004

    Great review!
  • KF - Sunday, February 1, 2004

    I've grown to appreciate CRAMITPAL. If you read around the opinionated diatribes, he has some good stuff that people avoid saying for fear of retaliation. I suppose if I were in love with Intel, he would tick me off.

    But, it does look like Intel has created a CPU that should ramp up to speeds high enough to beat the A64 in 32bit mode, and that is all they needed to do.

    Regardless of how much heat that is going to take, Intel must have some way in the works to handle it.

    Looks like they might not charge an arm and leg for it, which is the biggest shock.
