Prescott's New Crystal Ball: Branch Predictor Improvements

We’ve said it before: before you can build a longer pipeline or add more execution units, you need a powerful branch predictor. The branch predictor (more specifically, its accuracy) determines how many operations can be working their way through the CPU before you hit a stall. Intel extended the basic integer pipeline by 11 stages, so a corresponding increase in the accuracy of Prescott’s branch predictor was needed; otherwise, performance would inevitably tank.

Intel admits that the majority of the branch predictor unit remains unchanged in Prescott, but there have been some key modifications to help balance performance.

For those of you who aren’t familiar with the term, the role of a branch predictor in a processor is to predict the path code will take. If you’ve ever written code before, it boils down to predicting which path of a conditional statement (if-then, loops, etc…) will be taken. Present-day branch predictors work on a simple principle: if a branch was taken in the past, it is likely to be taken in the future. So a branch predictor keeps track of the code being executed on the CPU and increments counters that record how often branches at particular addresses were taken. Once enough data has accumulated in these counters, the branch predictor can predict branches as taken or not taken with relatively high accuracy, assuming it has enough room to store all of this data.
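The counter scheme described above can be sketched in a few lines. This is a minimal illustration of a classic 2-bit saturating counter predictor, not Prescott's actual design; the class name and the weakly-not-taken starting state are arbitrary choices for the example:

```python
class TwoBitPredictor:
    """Predict a branch taken if its counter is in the upper half (2 or 3)."""

    def __init__(self):
        self.counters = {}  # branch address -> counter value in [0, 3]

    def predict(self, addr):
        # Unseen branches start at 1 (weakly not-taken).
        return self.counters.get(addr, 1) >= 2

    def update(self, addr, taken):
        c = self.counters.get(addr, 1)
        # Saturate at 0 and 3, so a single anomalous outcome cannot
        # flip the prediction for a strongly biased branch.
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)
```

After a branch at some address has been taken a few times, the counter saturates and the predictor keeps predicting taken even across the occasional not-taken outcome, which is exactly the behavior you want for loop branches.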

One way of improving the accuracy of a branch predictor, as you may guess, is to give the unit more space to keep track of previously taken (or not taken) branches. AMD improved the accuracy of the Opteron’s branch predictor by increasing the amount of space available to store branch data; Intel has not chosen to do so with Prescott. Prescott’s Branch Target Buffer remains unchanged at 4K entries, and it doesn’t look like Intel has increased the size of the Global History Counter either. Instead, Intel focused on tuning the efficiency of the branch predictor using less die-space-consuming methods.
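To make the idea of a Branch Target Buffer concrete, here is a toy direct-mapped BTB: it caches the target of recently seen branches, indexed by the low bits of the branch address. This organization (direct-mapped, one tag and target per entry) is a textbook simplification for illustration, not Intel's documented design:

```python
BTB_ENTRIES = 4096  # Prescott's BTB holds 4K entries

class BranchTargetBuffer:
    def __init__(self):
        # Each entry remembers which branch it belongs to (the tag)
        # and where that branch last jumped to (the target).
        self.entries = [None] * BTB_ENTRIES

    def lookup(self, branch_addr):
        entry = self.entries[branch_addr % BTB_ENTRIES]
        if entry is not None and entry[0] == branch_addr:
            return entry[1]  # predicted target address
        return None          # miss: no prediction available

    def record(self, branch_addr, target):
        # A new branch mapping to the same slot evicts the old one,
        # which is why a larger BTB can cover more branches at once.
        self.entries[branch_addr % BTB_ENTRIES] = (branch_addr, target)
```

The eviction behavior in `record` shows why growing the BTB is the brute-force route to accuracy, and why Intel's decision to keep it at 4K entries forced them to find gains elsewhere.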

Loops are very common in code; they are useful for zeroing data structures, printing characters, or simply as part of a larger algorithm. Although you may not think of them as branches, loops are inherently filled with branches – before you start a loop and on every iteration of the loop, the CPU must find out whether it should continue executing the loop. Luckily, these types of branches are relatively easy to predict: you could generally assume that if the outcome of a branch took you to an earlier point in the code (called a backwards branch), you were dealing with a loop, and the branch predictor should predict taken.

As you would expect, not all backwards branches should be taken – not all of them sit at the end of a loop. Backwards branches that aren’t loop-ending branches are sometimes the result of error handling in code: if an error is generated, the code should back up and start over again. But if the application generates no error, the prediction should be not-taken. How do you distinguish the two cases while keeping the hardware simple?

Code Fragment A

Line 10: while (i < 10) do
Line 11: A;
Line 12: B;
Line 13: increment i;
Line 14: if i is still < 10, then go back to Line 11

Code Fragment B

Line 10: A;
Line 11: B;
Line 12: C;
...
Line 80: if (error) then go back to Line 11

Line 14 is a backwards branch at the end of a loop - should be taken!
Line 80 is a backwards branch not at the end of a loop - should not be taken!
Example of the two types of backwards branching

It turns out that loop-ending branches and these error branches, both backwards branches, can be distinguished by the amount of code that separates the branch from its target. Loops are generally small, so only a handful of instructions will separate the branch from its target; error-handling branches generally instruct the CPU to go back many more lines of code. The code fragments above illustrate this.

Prescott includes a new algorithm that looks at how far the branch target is from the branch instruction itself and uses that distance to better determine whether or not to take the branch. These enhancements apply to static branch prediction, which looks at certain scenarios and always makes the same prediction when those scenarios occur. Prescott also includes improvements to its dynamic branch prediction.
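The distance heuristic can be sketched as follows. The 64-byte threshold here is purely hypothetical, chosen to make the example concrete; Intel has not published Prescott's actual cutoff:

```python
LOOP_DISTANCE_THRESHOLD = 64  # bytes; hypothetical value for illustration

def static_predict(branch_addr, target_addr):
    """Statically predict whether a conditional branch is taken,
    using only the branch and target addresses."""
    if target_addr >= branch_addr:
        # Forward branches are statically predicted not-taken.
        return False
    distance = branch_addr - target_addr
    # A short backwards branch almost certainly closes a loop
    # (predict taken); a long one looks like an error-handling
    # path (predict not-taken).
    return distance <= LOOP_DISTANCE_THRESHOLD
```

Applied to the fragments above, the loop-ending branch in Fragment A jumps back only a few instructions and would be predicted taken, while the error branch in Fragment B jumps back roughly 70 lines and would be predicted not-taken.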

104 Comments

  • terrywongintra - Monday, February 2, 2004 - link

    anybody benchmark prescott over northwood in entry-server environment? i'm installing 3 servers later by using intel 875p (s875wp1-e) entry server board n p4 2.8, need to decide prescott or northwood to use.
  • sipc660 - Monday, February 2, 2004 - link

    i don't understand why some people are bashing such a good innovation that was long overdue from intel.

    a pc that doubles as a heater and at only 100-200W power consumption.

    Let me remind you that a conventional fan heater eats up a kilowatt of power.

    Think positive

    * space reduction
    * enormous power savings (pc + fan heater)
    * extremely sophisticated looking fan heater
    * extremely safe casing. reduces burn injuries
    to pets and children.
    * finely tunable temperature settings (only need
    to overclock by small increments)
    * coupled with an lcd it features the best
    looking temperature adjustment one has ever
    witnessed on a heater
    * child proof as it features thermal shutdown
    * anyone having a laugh thus far
    * will soon feature on american idol
    the worst singers will receive one p4 E based
    unit each. That should make people
    think twice about auditioning thus making
    sure only true talent shows up.
    * gives dell new marketing potential and a crack
    at a long desired consumer heating electronic
    * amd is nowhere near this advancement in thermal
    technology leaving intel way ahead


    hope you enjoyed some of my thoughts

    Other than that good article and some good comments.

    on another note i don't understand why people run and fill intels pockets so intel can hide their engineering mistakes with unseen propaganda, while there is an obvious choice.

    choice is Advanced Micro Devices all until intel gets their act together.

    go amd...
  • Stlr22 - Monday, February 2, 2004 - link

    INTC - "Intel roadmap says Prescott will hit 4.2 GHz by Q1 '05. My guess is that it is already running at 4 GHz but just needs to be fine tuned to reduce the heat."


    Maybe they are trying to keep it under the 200watt mark? ;-)
  • INTC - Monday, February 2, 2004 - link

    I think CRAMITPAL must have sat on a hot Prescott and got it stuck where the sun doesn't shine - that would explain all of the yelling and screaming and friggin this and friggin that going on. "Approved mobo, approved PC case cooling system, approved heatsink & fan - and you better not use Arctic Silver or else it will void your warranty..." gee - didn't we just hear that when Athlon XPs came out? It brings to mind when TechTV put their dual Athlon MP rig together and it started smoking and catching on fire when they fired it up the first time on live television during their show.

    Intel roadmap says Prescott will hit 4.2 GHz by Q1 '05. My guess is that it is already running at 4 GHz but just needs to be fine tuned to reduce the heat. I bet the experts (or self proclaimed experts such as CRAM) were betting that Northwood could not hit 3 GHz and look where it is at today. Video card GPUs today are hitting 70 degrees C plus at full load but they do fine with cooling in the same PC cases.
  • CRAMITPAL - Monday, February 2, 2004 - link

    Dealing with the FLAME THROWER's heat issues is only one aspect of Prescott's problems. The chip is a DOG and it requires an "approved Mobo" and an "approved PC case cooling system", a premo PSU cause the friggin thing draws 100+ Watts and this crap all costs money you don't need to spend on an A64 system that is faster, runs cooler, and does both 32/64 bit processing faster. How difficult is THIS to comprehend???

    Ain't no way Intel is gonna be able to Spin this one despite the obvious "press material" they supplied to all the reviewers to PIMP that Prescott was designed to reach 5 Gigs. Pigs will fly lightyears before Prescott runs at 5 Gigs.

    Time to GET REAL folks. Prescott sucks and every hardware review site politely stated so in "political speak".
  • Stlr22 - Monday, February 2, 2004 - link

    ((((((((((((CRAMITPAL)))))))))))))))




    It's ok man. It's ok. Everything will be alright.


    ;-)
  • scosta - Monday, February 2, 2004 - link

    #38 - About your "Did anyone catch the error in Pipelining: 101?".

    There is no error. The time it takes to travel the pipeline is just a kind of process delay. What matters is the rate at which finished/processed results come out of the pipeline. In the case of the 0.5ns/10-stage pipeline you will get one finished result every 0.5ns, twice as many as in the case of the 1ns/5-stage pipeline.

    If the pipelines were building motorcycles, you would get, respectively, 2 and 1 motorcycles every ns. And that is the point.
  • LordSnailz - Monday, February 2, 2004 - link

    I'm sure the prescotts will get hotter as the speed increases but you can't forget there are companies out there that specialize in this area. There are 3 companies that I know of that are doing research on ways to reduce the heat, for instance, they're planning on placing a piece of silicon with etched lines on top of the CPU and running some type of coolant through it. Much like the radiator concept.

    My point is, Intel doesn't have to worry about the heat too much since there are companies out there fighting that battle. Intel will just concentrate on achieving those higher speeds and the temp control solution will come.
  • scosta - Monday, February 2, 2004 - link

    You can find thermal power information in the also excellent Aces Hardware Prescott review here:
    http://www.aceshardware.com/read.jsp?id=60000317

    In summary, we have the following Typical Thermal Power:
    P4 3.2 GHz (Northwood) - 82W
    P4E 3.2 GHz (Prescott) - 103W

    Note that, at the same clock speed and with the same or lesser performance, the Prescott dissipates 25% more power than the Northwood. This means that with a similar cooling system, the Prescott has to run substantially hotter.

    As AcesHardware says:
    "After running a 3DSMax rendering and restarting the PC, the BIOS reported that the 3.2 GHz Northwood was at about 45-47°C, while Prescott was flirting with 64-66°C. Mind you, this is measured on a motherboard completely exposed to the cool air (18°C) of our lab."

    So, what will the ~5GHz Prescott dissipate? 200W?
    Will we all be forced to run PCs with bulky, expensive, etc, cryogenic cooling systems? I for one won't. This power consumption escalation has to stop. Intel and AMD have to improve the performance of their CPUs by improving the CPU architecture and manufacturing processes, not by throwing more and more electrical power at the problem.

    And those are my 2 cents.
  • CRAMITPAL - Monday, February 2, 2004 - link

    Prescott will never go above 3.8 Gig. even with the 3rd revision of the 90 nano process. Tejas will make it to just over 4.0 Gig. with a little luck but it won't be anything to write home about either based on current knowledge.

    Intel has fallen and can't get it up!
