Execution Core Improvements

Intel lengthened Prescott's pipeline but did not give the CPU any new execution units. The chip can therefore reach higher clock speeds and crunch more data that way, but at the same clock speed there are no additional units to make it work any faster.

Despite the lack of new execution units (which is nothing to complain about; remember that the Athlon 64 has the same number of execution units as the Athlon XP), Intel did make two very important changes to the Prescott core, both made possible by the move to 90nm.

Both of these changes can improve integer multiply performance, one more than the other. Let us explain:

The Pentium 4 has three Arithmetic and Logic Units (ALUs) that handle integer code (code that operates on integer values, which makes up the vast majority of the code you run on your PC). Two of these ALUs can complete two operations every clock cycle, so Intel marketing calls them "double pumped" and says that they operate at twice the CPU's clock speed. These ALUs are used for simple instructions that can easily be executed within half a clock cycle; this helps the Pentium 4 reach very high clock speeds (the doing-less-work-per-cycle principle).
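To put the "double pumped" figure in numbers, here is a small sketch of the arithmetic, assuming a hypothetical 3.2 GHz part (the clock speed and the function names are illustrative, not figures from this review). It converts a core clock into a cycle time, shows the half-cycle budget a fast-ALU operation has to fit into, and computes the "effective" ALU clock that the marketing description implies.

```python
def cycle_time_ns(core_ghz: float) -> float:
    """Length of one core clock cycle in nanoseconds."""
    return 1.0 / core_ghz


def fast_alu_budget_ns(core_ghz: float) -> float:
    """Time available to one operation on a double-pumped ALU: half a core cycle."""
    return cycle_time_ns(core_ghz) / 2


def marketing_alu_clock_ghz(core_ghz: float, ops_per_cycle: int = 2) -> float:
    """'Effective' ALU clock when an ALU completes ops_per_cycle operations per core cycle."""
    return core_ghz * ops_per_cycle


core = 3.2  # hypothetical example clock, in GHz
print(f"core cycle:            {cycle_time_ns(core):.4f} ns")
print(f"fast-ALU op budget:    {fast_alu_budget_ns(core):.4f} ns")
print(f"'effective' ALU clock: {marketing_alu_clock_ghz(core):.1f} GHz")
```

This is a per-ALU ceiling only; it deliberately ignores the scheduler, retirement width and dependencies between instructions, all of which limit what real code achieves.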

More complicated instructions are sent to a separate ALU that runs at the core frequency. This way complex instructions do not slow down the entire CPU, and the Pentium 4 can run at its high clock speeds without being bogged down by them.

Before Prescott, one type of operation that ran on the slow ALU was the shift/rotate. Shifts are used, for example, when multiplying by 2: to multiply a binary number by 2 you can simply shift its bits to the left by one position, and the resulting value is the original number multiplied by 2.
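The shift-to-multiply trick is easy to check. The sketch below models a 32-bit register in Python; the 32-bit width and the names `shl32` and `rol32` are illustrative assumptions, not Intel terminology. Shifting left by n bits multiplies by 2^n as long as no set bits fall off the top, while a rotate wraps those bits around to the bottom instead of discarding them.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit register holds values 0 .. 2**32 - 1


def shl32(value: int, count: int) -> int:
    """Shift left inside a 32-bit register; bits pushed past bit 31 are lost."""
    return (value << count) & MASK32


def rol32(value: int, count: int) -> int:
    """Rotate left inside a 32-bit register; bits leaving the top re-enter at the bottom."""
    value &= MASK32
    count &= 31  # rotating by 32 bits returns the original value
    return ((value << count) | (value >> (32 - count))) & MASK32


# Shifting left by one bit is the same as multiplying by 2.
for n in (1, 21, 1000):
    assert shl32(n, 1) == n * 2

print(shl32(21, 1))               # 42
print(hex(rol32(0x80000000, 1)))  # 0x1: the top bit wrapped around to bit 0
```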

In Prescott, a shift/rotate block has been added to one of the fast ALUs so that simple shifts/rotates may execute quickly.

The next improvement concerns actual integer multiplies. Before Prescott, all integer multiplies were done on the floating point multiply unit, with the results then sent back to the ALUs. Intel finally included a dedicated integer multiplier in Prescott, made possible because the 90nm process lets more transistors fit into a smaller die than before. This dedicated integer multiplier is the reason behind Intel's claim of reduced integer multiply latency for Prescott.
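Taken together, the two Prescott changes alter where certain integer operations are executed. The toy model below is a hypothetical sketch, not Intel's actual scheduling logic: the operation names and the grouping into classes are assumptions chosen for illustration. It simply encodes the routing described above, before and after Prescott.

```python
# Hypothetical operation classes, for illustration only.
SIMPLE_OPS = {"add", "sub", "and", "or", "xor"}
SHIFT_ROTATE_OPS = {"shl", "shr", "rol", "ror"}
INT_MULTIPLY_OPS = {"imul"}


def route(op: str, prescott: bool) -> str:
    """Return the (simplified) unit that would execute `op` in this toy model."""
    if op in SIMPLE_OPS:
        return "fast ALU"
    if op in SHIFT_ROTATE_OPS:
        # Prescott adds a shift/rotate block to one of the fast ALUs.
        return "fast ALU (shift/rotate block)" if prescott else "slow ALU"
    if op in INT_MULTIPLY_OPS:
        # Earlier Pentium 4s used the FP multiply unit and sent the result back
        # to the ALUs; Prescott has a dedicated integer multiplier.
        return "integer multiplier" if prescott else "FP multiply unit"
    # Everything else is outside the scope of this toy model.
    return "other unit (not modeled)"


for op in ("add", "shl", "imul"):
    print(f"{op:5} pre-Prescott: {route(op, False):30} Prescott: {route(op, True)}")
```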

Integer multiplies are quite common in all types of code, especially where array traversal is involved.
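To see why array traversal leans on integer multiplies, consider how an element's position in a two-dimensional array stored row by row is computed. In the sketch below (the function names and the 4-byte element size are illustrative assumptions), one multiply skips whole rows and another scales the element index to a byte offset.

```python
def element_offset(row: int, col: int, row_length: int, element_size: int) -> int:
    """Byte offset of array[row][col] in a row-major array."""
    index = row * row_length + col  # multiply: skip `row` full rows
    return index * element_size     # multiply: scale the index to bytes


def sum_row_major(flat: list, rows: int, cols: int) -> int:
    """Sum a 2-D array stored as one flat list, computing each index explicitly."""
    total = 0
    for r in range(rows):
        for c in range(cols):
            total += flat[r * cols + c]  # one integer multiply per element access
    return total


print(element_offset(2, 3, row_length=10, element_size=4))  # 92
print(sum_row_major([1, 2, 3, 4, 5, 6], rows=2, cols=3))    # 21
```

When the element size is a power of two, compilers commonly turn the scaling step into a shift, and an optimizing compiler may also replace the row multiply inside a loop with a running addition; the multiplies shown here are the straightforward form.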

104 Comments

  • Jeff7181 - Sunday, February 1, 2004 - link

    I'm going to go out on a limb here and say 2004 is the year of the Athlon-64, and Intel will take a back seat this year unless its new socket helps increase clock speeds. When AMD makes the transition to 90nm I think you'll see a jump in clock speed from them too... and I'm willing to bet their current 130nm processors will scale to 2.6 or 2.8 GHz if they want to put the effort into it before switching to 90nm.

    Intel better hope people adopt SSE3 in favor of AMD-64 otherwise they're going to lose the majority of the benchmark tests.

    On second thought... the real question is how high will Prescott scale... will we really see 4.0 GHz by the end of the year? Will performance scale as well as it does with the Athlon-64?

    Right now, looking at the Prescott, the best I can say for it is "huh, 31 stages in the pipeline and they didn't lose too much performance, neat."
  • Barkuti - Sunday, February 1, 2004 - link

    Check out the article at xbitlabs:

    http://www.xbitlabs.com/articles/cpu/display/presc...

    Less technical but with a wider set of tests.
  • Stlr22 - Sunday, February 1, 2004 - link

    ;-)
  • Stlr22 - Sunday, February 1, 2004 - link

    ((((((((((((((CRAMITPAL))))))))))))))))

    Listen, I just want you to know that everything will be alright. Really, life isn't all that bad, buddy. It's not good to keep so much hate inside. It's very unhealthy. We are all family here at the Anandtech forums and we care about you. If you ever need to sit down and talk, I'm all ears, pal. So that your brother doesn't feel left out, here's a hug for him as well.......


    (((((((((((((AMDjihad)))))))))))))
  • KF - Sunday, February 1, 2004 - link

    Yeah, the Inquirer was right about 30 stages. Maybe I should start reading it! However I did read the one where the news linked to an article purporting that an Inquirer reporter had bumped into a person who had overheard an Intel executive say Prescott was 64 bit. Maybe Derek and Anand didn't have the space to squeeze that tiny detail into the review.

    I saw a paper on the Intel site a while ago, seemingly intended for some professional journal, the premise of which was that it is ALWAYS preferable to make the pipeline longer, no matter how long, while using techniques to reduce the penalties. Like, 100 stages would be a good thing. Right then I knew what one team at Intel was up to. The fact that they didn't explain any new penalty reduction techniques only made it all the more sure what Intel had in the works (otherwise why write the paper?), and that they had the techniques worked out, but still under wraps.
  • ianwhthse - Sunday, February 1, 2004 - link

    Err.. *Cramitpal

    Sorry about that. My mind is wandering.
  • ianwhthse - Sunday, February 1, 2004 - link

    Did we actually just get 26 good posts in before crumpet showed up?
  • FiberOptik - Sunday, February 1, 2004 - link

    I like the part about the new shift/rotate unit on the CPU. Does this mean that Prescott will be noticeably faster for the RC5 project? Athlons usually mop the floor with whatever the Northwood can pump out.
  • eBauer - Sunday, February 1, 2004 - link

    "Botmatch has bots (AI) playing, shooting, running, etc. (deathmatch) while Flyby does not. The number that you should be most interested in is the Botmatch scores."

    No, I am talking about the botmatch scores from previous articles. I'm well aware of the difference between flyby and botmatch. http://www.anandtech.com/cpu/showdoc.html?i=1946&a... In that article, all CPUs had about 10 more fps than the CPUs in the Prescott article.
  • AnonymouseUser - Sunday, February 1, 2004 - link

    "I am curious as to why the UT2k3 botmatch scores dropped on all CPU's... Different map?"

    Botmatch has bots (AI) playing, shooting, running, etc. (deathmatch) while Flyby does not. The number that you should be most interested in is the Botmatch scores.
