Typical branches have one of two options: either don’t take the branch, or go to the target instruction and begin executing code there:

A Typical Branch

...
Line 24: if (a == b)
Line 25: execute this code;
Line 26: otherwise
Line 27: go to line 406;
...

There is a third type of branch, called an indirect branch, that complicates prediction a bit more. Instead of telling the CPU where to go if the branch is taken, an indirect branch tells the CPU to look up an address in a register or in main memory; that address holds the location of the instruction the CPU should branch to. Prescott includes an indirect branch predictor, originally introduced in the Pentium M (Banias), to predict these branches.
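To make the idea concrete, here is a minimal Python sketch of the kind of code that produces indirect branches: an interpreter-style loop that fetches the next handler from a table at run time. The tiny stack machine, its opcodes and the `run` function are hypothetical illustrations, not taken from any real program. In a compiled language, a call through a table of function pointers like this typically becomes an indirect branch whose target is loaded from memory, and interpreters (Perl's among them) typically dispatch each operation this way; Python merely models the pattern here.

```python
from typing import Callable, Dict, List

# Hypothetical handlers for a tiny stack machine; each one mutates the stack.
def push_one(stack: List[int]) -> None:
    stack.append(1)

def add(stack: List[int]) -> None:
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def dup(stack: List[int]) -> None:
    stack.append(stack[-1])

# Dispatch table: the handler for each opcode is stored in memory (a dict),
# so the destination of each call is only known after the opcode is read.
HANDLERS: Dict[str, Callable[[List[int]], None]] = {
    "PUSH1": push_one,
    "ADD": add,
    "DUP": dup,
}

def run(program: List[str]) -> List[int]:
    stack: List[int] = []
    for opcode in program:
        # Look up the target, then jump to it: the software analogue of an indirect branch.
        HANDLERS[opcode](stack)
    return stack

print(run(["PUSH1", "DUP", "ADD", "DUP", "ADD"]))  # → [4]
```

Because the single `HANDLERS[opcode](stack)` call site jumps to a different handler depending on the data, a predictor that remembers only one target for it will often guess wrong.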

An Indirect Branch

...
Line 113: if (z < 2)
Line 114: execute this code;
Line 115: otherwise
Line 116: go to memory location F and retrieve the address of where to start executing;
...

Conventionally, an indirect branch is predicted somewhat haphazardly: the CPU is simply sent to the target the branch most often ends up at. It’s sort of like needing to ask your boss what he wants you to do, but instead of asking, just walking into the computer lab because that’s where most of your work ends up being anyway. This method of indirect branch prediction works well in a lot of cases, but not all. Prescott’s indirect branch predictor features algorithms to handle the remaining cases, although the exact details of those algorithms are not publicly available. The fact that the Prescott team borrowed this idea from the Pentium M team is a further testament to the impressive amount of work that went into the Pentium M, and to what continues to make it one of Intel’s best-designed chips of all time.
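The gap between the conventional approach and a smarter one can be illustrated with a toy simulation. This is a minimal Python model under stated assumptions, not Intel's algorithm (whose details are not public): the "conventional" predictor always guesses the target this branch has jumped to most often so far, while the hypothetical history-based predictor indexes a table by the target the branch took last time, a much-simplified stand-in for history-based indirect predictors. The function names and the alternating target pattern are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

def most_frequent_accuracy(targets: List[str]) -> float:
    """Toy 'conventional' predictor: guess the most frequently seen target so far."""
    seen: Counter = Counter()
    hits = 0
    for actual in targets:
        guess: Optional[str] = seen.most_common(1)[0][0] if seen else None
        if guess == actual:
            hits += 1
        seen[actual] += 1
    return hits / len(targets)

def history_accuracy(targets: List[str]) -> float:
    """Hypothetical history-based predictor: table indexed by the previous target."""
    table: Dict[Optional[str], str] = {}
    previous: Optional[str] = None
    hits = 0
    for actual in targets:
        guess = table.get(previous)
        if guess == actual:
            hits += 1
        table[previous] = actual  # learn: after `previous`, the branch went to `actual`
        previous = actual
    return hits / len(targets)

pattern = ["A", "B"] * 50  # hypothetical branch alternating between two targets
print(most_frequent_accuracy(pattern))  # → 0.49
print(history_accuracy(pattern))        # → 0.97
```

With a branch that alternates between two targets, the most-frequent-target guess is wrong about half the time, while one step of history predicts nearly every jump once the table has warmed up.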

Prescott’s indirect branch predictor is largely responsible for the 55% decrease in mispredicted branches in the 253.perlbmk SPEC CPU2000 test. Here’s what the test does:

253.perlbmk is a cut-down version of Perl v5.005_03, the popular scripting language. SPEC's version of Perl has had most of its OS-specific features removed. In addition to the core Perl interpreter, several third-party modules are used: MD5 v1.7, MHonArc v2.3.3, IO-stringy v1.205, MailTools v1.11 and TimeDate v1.08.

The reference workload for 253.perlbmk consists of four scripts:

The primary component of the workload is the freeware email-to-HTML converter MHonArc. Email messages are generated from a set of random components and converted to HTML. In addition to MHonArc, which was lightly patched to avoid file I/O, this component also uses several standard modules from the CPAN (Comprehensive Perl Archive Network).

Another script (which also uses the mail generator for convenience) exercises a slightly modified version of the 'specdiff' script, which is part of the CPU2000 tool suite.

The third script finds perfect numbers using the standard iterative algorithm. Both native integers and the Math::BigInt module are used.

Finally, the fourth script tests only that the pseudo-random numbers are coming out in the expected order, and does not contribute much to the overall runtime.
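For readers unfamiliar with it, one common form of the "standard iterative algorithm" for perfect numbers tests each candidate n by summing its proper divisors and checking whether the sum equals n. SPEC's actual script is Perl (using native integers and Math::BigInt) and is not reproduced here; the following is a minimal Python sketch of the same idea, and `perfect_numbers_up_to` is a hypothetical name. Its data-dependent `if` tests are the kind of branchy integer code a branch predictor has to cope with.

```python
from typing import List

def perfect_numbers_up_to(limit: int) -> List[int]:
    """Return all perfect numbers <= limit by brute-force divisor summation."""
    found: List[int] = []
    for n in range(2, limit + 1):
        divisor_sum = 1  # 1 is a proper divisor of every n > 1
        d = 2
        while d * d <= n:
            if n % d == 0:
                divisor_sum += d
                other = n // d
                if other != d:  # avoid counting a square root twice
                    divisor_sum += other
            d += 1
        if divisor_sum == n:
            found.append(n)
    return found

print(perfect_numbers_up_to(10_000))  # → [6, 28, 496, 8128]
```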

The training workload is similar, but not identical, to the reference workload. The test workload consists of the non-system-specific parts of the actual Perl 5.005_03 test harness.

In the case of the mail-based benchmarks, a line with salient characteristics (number of header lines, number of body lines, etc.) is output for each message generated.

During processing, MD5 hashes of the contents of output "files" (in memory) are computed and output.

For the perfect number finder, the operating mode (BigInt or native) is output, along with intermediate progress and, of course, the perfect numbers.

Output for the random number check is simply every 1000th random number generated.
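Computing MD5 hashes of output "files" held in memory, rather than writing them to disk, can be sketched with Python's standard `hashlib` and `io` modules. This is an assumption about how such a check might look, not SPEC's Perl code (which relies on the MD5 v1.7 module listed above); the function name is hypothetical, and the input is the string "abc", the standard MD5 test vector from RFC 1321, so the digest can be checked against the published value.

```python
import hashlib
import io

def md5_of_in_memory_file(buffer: io.StringIO) -> str:
    """Hash the full contents of an in-memory text 'file' as UTF-8 bytes."""
    return hashlib.md5(buffer.getvalue().encode("utf-8")).hexdigest()

# Write output to an in-memory file instead of the disk.
out = io.StringIO()
out.write("abc")
print(md5_of_in_memory_file(out))  # → 900150983cd24fb0d6963f7d28e17f72
```

Printing a short digest instead of the whole output keeps the benchmark's output compact while still exposing any difference in what was generated.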

As you can see, the improvement shows up in a real-world workload. As microprocessor designers commonly do, Intel measured the effectiveness of Prescott’s branch prediction enhancements with SPEC (here, the twelve SPEC CPU2000 integer benchmarks) and found an overall reduction in mispredicted branches of about 13%:

Percentage Reduction in Mispredicted Branches for Prescott over Northwood (higher is better)

Benchmark       Reduction
164.gzip            1.94%
175.vpr             8.33%
176.gcc            17.65%
181.mcf             9.63%
186.crafty          4.17%
197.parser         17.92%
252.eon            11.36%
253.perlbmk        54.84%
254.gap            27.27%
255.vortex        -12.50%
256.bzip2           5.88%
300.twolf           6.82%
Overall            12.78%
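The "Overall" row appears to be the plain arithmetic mean of the twelve per-benchmark figures. That is our inference rather than something stated with the data, but the arithmetic reproduces it to two decimal places:

```python
# Reduction in mispredicted branches, Prescott vs. Northwood (%), from the table above.
reductions = {
    "164.gzip": 1.94, "175.vpr": 8.33, "176.gcc": 17.65, "181.mcf": 9.63,
    "186.crafty": 4.17, "197.parser": 17.92, "252.eon": 11.36, "253.perlbmk": 54.84,
    "254.gap": 27.27, "255.vortex": -12.50, "256.bzip2": 5.88, "300.twolf": 6.82,
}

# Unweighted arithmetic mean over all twelve benchmarks.
overall = sum(reductions.values()) / len(reductions)
print(f"{overall:.2f}%")  # → 12.78%

# The same mean with the outlier 253.perlbmk left out.
without_perl = [v for name, v in reductions.items() if name != "253.perlbmk"]
print(f"{sum(without_perl) / len(without_perl):.2f}%")  # → 8.95%
```

Because the mean is unweighted, the single large 253.perlbmk gain pulls the overall figure up noticeably; the remaining eleven benchmarks average about 8.95%.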

The improvements above aren’t bad at all, but remember that a reduction of this sort is necessary to make up for the fact that Prescott’s pipeline is 55% longer.
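A back-of-the-envelope model shows why. Assume, as a simplification, that the cycles lost per mispredicted branch grow in proportion to pipeline length (the commonly cited depths are 20 stages for Northwood and 31 for Prescott, and 31/20 = 1.55 matches the 55% figure). The cycles lost to mispredictions then scale with mispredictions × penalty per misprediction:

```python
# Simplified model (assumption): misprediction penalty scales linearly with pipeline depth.
northwood_stages = 20
prescott_stages = 31
pipeline_growth = prescott_stages / northwood_stages  # 1.55, the "55% longer" pipeline

overall_mispredict_ratio = 1 - 0.1278  # 12.78% fewer mispredictions overall
perlbmk_mispredict_ratio = 1 - 0.5484  # 54.84% fewer in 253.perlbmk

relative_lost_cycles = overall_mispredict_ratio * pipeline_growth
perlbmk_lost_cycles = perlbmk_mispredict_ratio * pipeline_growth
print(f"{relative_lost_cycles:.3f}")  # → 1.352
print(f"{perlbmk_lost_cycles:.2f}")   # → 0.70
```

Under this model Prescott still loses roughly 35% more cycles to mispredictions than Northwood on average, so the better predictor narrows the gap rather than closing it by itself; in 253.perlbmk, the same model has Prescott losing about 30% fewer cycles than Northwood.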

The largest improvements (more than 10% fewer mispredicted branches) came in 176.gcc, 197.parser, 252.eon, 253.perlbmk and 254.gap. 176.gcc is a compiler test, an area where the Pentium 4 has clearly lagged behind the Athlon 64. 197.parser is a word processing test, also an area where the Pentium 4 has done poorly in the past thanks to branch-happy integer code. 252.eon is a ray tracer, and we already know about 253.perlbmk. Improvements in 254.gap, a math-intensive benchmark built on the GAP computational group theory system, could have positive ramifications for Prescott’s performance in HPC applications.

The benefit of under-the-hood improvements like the branch prediction algorithms we’ve discussed here is that present-day software takes advantage of them with no recompiling and no patches. Keep this in mind when we investigate performance later on.

We’ll close this section with another interesting fact: although Prescott brings a lot of new improvements, some of the other improvements it includes were only introduced in later revisions of the Northwood core. Not all Northwood cores are created equal, but all of the enhancements present in the first Hyper-Threading-enabled Northwoods are also featured in Prescott.

104 Comments

  • terrywongintra - Monday, February 2, 2004

    Has anybody benchmarked Prescott against Northwood in an entry-server environment? I'm installing 3 servers later using Intel 875P (S875WP1-E) entry server boards and P4 2.8s, and need to decide whether to use Prescott or Northwood.
  • sipc660 - Monday, February 2, 2004

    I don't understand why some people are bashing such a good innovation that was long overdue from Intel.

    A PC that doubles as a heater, at only 100-200W of power consumption.

    Let me remind you that a conventional fan heater eats up a kilowatt of power.

    Think positive

    * space reduction
    * enormous power savings (PC + fan heater)
    * extremely sophisticated-looking fan heater
    * extremely safe casing; reduces burn injuries to pets and children
    * finely tunable temperature settings (only need to overclock by small increments)
    * coupled with an LCD, it features the best-looking temperature adjustment one has ever witnessed on a heater
    * childproof, as it features thermal shutdown
    * anyone having a laugh thus far?
    * will soon feature on American Idol: the worst singers will receive one P4E-based unit each. That should make people think twice about auditioning, thus making sure only true talent shows up.
    * gives Dell new marketing potential and a crack at a long-desired consumer heating electronic
    * AMD is nowhere near this advancement in thermal technology, leaving Intel way ahead


    Hope you enjoyed some of my thoughts.

    Other than that, good article and some good comments.

    On another note, I don't understand why people run and fill Intel's pockets so Intel can hide their engineering mistakes with unseen propaganda, while there is an obvious choice.

    The choice is Advanced Micro Devices, until Intel gets their act together.

    Go AMD...
  • Stlr22 - Monday, February 2, 2004

    INTC - "Intel roadmap says Prescott will hit 4.2 GHz by Q1 '05. My guess is that it is already running at 4 GHz but just needs to be fine tuned to reduce the heat."

    Maybe they are trying to keep it under the 200-watt mark? ;-)
  • INTC - Monday, February 2, 2004

    I think CRAMITPAL must have sat on a hot Prescott and got it stuck where the sun doesn't shine - that would explain all of the yelling and screaming and friggin this and friggin that going on. "Approved mobo, approved PC case cooling system, approved heatsink & fan - and you better not use Arctic Silver or else it will void your warranty..." gee - didn't we just hear that when Athlon XPs came out? It brings to mind when TechTV put their dual Athlon MP rig together and it started smoking and catching on fire when they fired it up the first time on live television during their show.

    Intel roadmap says Prescott will hit 4.2 GHz by Q1 '05. My guess is that it is already running at 4 GHz but just needs to be fine tuned to reduce the heat. I bet the experts (or self-proclaimed experts such as CRAM) were betting that Northwood could not hit 3 GHz, and look where it is today. Video card GPUs today are hitting 70 degrees C plus at full load, but they do fine with cooling in the same PC cases.
  • CRAMITPAL - Monday, February 2, 2004

    Dealing with the FLAME THROWER's heat issues is only one aspect of Prescott's problems. The chip is a DOG and it requires an "approved Mobo" and an "approved PC case cooling system", a premo PSU cause the friggin thing draws 100+ Watts and this crap all costs money you don't need to spend on an A64 system that is faster, runs cooler, and does both 32/64 bit processing faster. How difficult is THIS to comprehend???

    Ain't no way Intel is gonna be able to Spin this one despite the obvious "press material" they supplied to all the reviewers to PIMP that Prescott was designed to reach 5 Gigs. Pigs will fly lightyears before Prescott runs at 5 Gigs.

    Time to GET REAL folks. Prescott sucks and every hardware review site politely stated so in "political speak".
  • Stlr22 - Monday, February 2, 2004

    ((((((((((((CRAMITPAL)))))))))))))))

    It's ok man. It's ok. Everything will be alright.

    ;-)
  • scosta - Monday, February 2, 2004

    #38 - About your "Did anyone catch the error in Pipelining: 101?".

    There is no error. The time it takes to travel the pipeline is just a kind of processing delay. What matters is the rate at which finished results come out of the pipeline. In the case of the 0.5ns/10-stage pipeline you will get one finished result every 0.5ns, twice as many as in the case of the 1ns/5-stage pipeline.

    If the pipelines were building motorcycles, you would get 1 motorcycle every ns from the 5-stage pipeline and 2 every ns from the 10-stage one. And that is the point.
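scosta's point is easy to check with a few lines of arithmetic. A minimal Python sketch, assuming an ideal pipeline with no stalls or mispredicted branches; the stage counts and stage times come from the comment, and `pipeline_stats` is a hypothetical helper:

```python
from typing import Tuple

def pipeline_stats(stages: int, stage_time_ns: float) -> Tuple[float, float]:
    """Return (latency in ns, steady-state results per ns) for an ideal pipeline."""
    latency_ns = stages * stage_time_ns    # time for one result to travel the whole pipe
    throughput_per_ns = 1 / stage_time_ns  # once full, one result exits per stage time
    return latency_ns, throughput_per_ns

print(pipeline_stats(5, 1.0))   # → (5.0, 1.0)
print(pipeline_stats(10, 0.5))  # → (5.0, 2.0)
```

Both pipelines need 5 ns to finish any single result, but the 10-stage pipeline delivers twice as many results per nanosecond once it is full. The flip side, and the reason the branch prediction work above matters, is that a mispredicted branch forces a deeper pipeline to refill more stages before results flow again.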
  • LordSnailz - Monday, February 2, 2004

    I'm sure the Prescotts will get hotter as the speed increases, but you can't forget there are companies out there that specialize in this area. There are 3 companies that I know of that are doing research on ways to reduce the heat; for instance, they're planning on placing a piece of silicon with etched lines on top of the CPU and running some type of coolant through it, much like the radiator concept.

    My point is, Intel doesn't have to worry about the heat too much since there are companies out there fighting that battle. Intel will just concentrate on achieving those higher speeds and the temp control solution will come.
  • scosta - Monday, February 2, 2004

    You can find thermal power information in the also excellent "Aces Hardware" Prescott review here:
    http://www.aceshardware.com/read.jsp?id=60000317

    In summary, these are the typical thermal power figures:
    P4 3.2 GHz (Northwood) - 82W
    P4E 3.2 GHz (Prescott) - 103W

    Note that, at the same clock speed and with the same or lower performance, Prescott dissipates 25% more power than Northwood. This means that with a similar cooling system, Prescott has to run substantially hotter.

    As AcesHardware says,
    "After running a 3DSMax rendering and restarting the PC, the BIOS reported that the 3.2 GHz Northwood was at about 45-47°C, while Prescott was flirting with 64-66°C. Mind you, this is measured on a motherboard completely exposed to the cool air (18°C) of our lab."

    So, what will the ~5 GHz Prescott dissipate? 200W?
    Will we all be forced to run PCs with bulky, expensive cryogenic cooling systems? I for one won't. This power consumption escalation has to stop. Intel and AMD have to improve the performance of their CPUs by improving CPU architecture and manufacturing processes, not by throwing more and more electrical power at the problem.

    And those are my 2 cents.
  • CRAMITPAL - Monday, February 2, 2004

    Prescott will never go above 3.8 Gig, even with the 3rd revision of the 90nm process. Tejas will make it to just over 4.0 Gig with a little luck, but it won't be anything to write home about either, based on current knowledge.

    Intel has fallen and can't get it up!
