Execution Core Improvements

Intel lengthened the pipeline on Prescott but did not give the CPU any new execution units. The chip can therefore clock higher to crunch more data, but at the same clock speed no added execution units help it work any faster.

Despite the lack of any new execution units (this is nothing to complain about; remember, the Athlon 64 has the same number of execution units as the Athlon XP), Intel did make two very important changes to the Prescott core that were made possible by the move to 90nm.

Both of these changes can speed up integer multiply operations, one more so than the other. Let us explain:

The Pentium 4 has three Arithmetic and Logic Units (ALUs) that handle integer code (code that operates on integer values - the vast majority of code you run on your PC). Two of these ALUs can crank out operations twice every clock cycle, and thus Intel marketing calls them "double pumped" and says that they operate at twice the CPU's clock speed. These ALUs are used for simple instructions that are easily executed within half of a clock cycle, which helps the Pentium 4 reach very high clock speeds (the principle of doing less work per cycle).

More complicated instructions are sent to a separate ALU that runs at the core frequency, so that instead of complex instructions slowing down the entire CPU, the Pentium 4 can run at its high clock speeds without being bogged down by these complex instructions.

Before Prescott, one type of operation that would run on the slow ALU was a shift/rotate. One place where shifts are used is when multiplying by 2; if you want to multiply a number in binary by 2 you can simply shift the bits of the number to the left by 1 bit - the resulting value is the original number multiplied by 2.

In Prescott, a shift/rotate block has been added to one of the fast ALUs so that simple shifts/rotates may execute quickly.
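To make the shift-as-multiply idea above concrete, here is a minimal Python sketch. The function names and the 32-bit register width are assumptions chosen for illustration; nothing here models Prescott's actual hardware.

```python
MASK32 = 0xFFFFFFFF  # assumption: model a 32-bit integer register


def shift_left(value: int, bits: int = 1) -> int:
    """Shift left by `bits`, dropping anything that overflows 32 bits."""
    return (value << bits) & MASK32


def rotate_left(value: int, bits: int = 1) -> int:
    """Rotate left by `bits`: bits pushed off the top re-enter at the bottom."""
    value &= MASK32
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & MASK32


# Shifting left by 1 doubles the value; shifting by n multiplies by 2**n.
print(shift_left(5))      # 5 * 2
print(shift_left(3, 4))   # 3 * 16
```

As long as no set bit is pushed past the top of the register, a left shift by n gives the same result as multiplying by 2 to the power n, which is why compilers often emit shifts for such multiplies.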

The next improvement comes with actual integer multiplies. Before Prescott, all integer multiplies were actually done on the floating point multiply unit and then sent back to the ALUs. Intel finally included a dedicated integer multiplier in Prescott, thanks to the ability to cram more 90nm transistors into a die size smaller than before. The inclusion of a dedicated integer multiplier is the basis of Prescott's "reduced integer multiply" latency claim.

Integer multiplies are quite common in all types of code, especially where array traversal is involved.
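As a rough illustration of why array traversal leans on integer multiplies, the Python sketch below computes the address of an array element the way compiled code typically does. The function names, base address and element sizes are hypothetical values picked for the example.

```python
def element_address(base: int, index: int, element_size: int) -> int:
    """Address of array[index]: one integer multiply plus one add."""
    return base + index * element_size


def element_address_2d(base: int, row: int, col: int,
                       row_width: int, element_size: int) -> int:
    """Address of grid[row][col] in row-major order: two multiplies."""
    return base + (row * row_width + col) * element_size


# Hypothetical example: 12-byte records starting at address 0x1000.
for i in range(4):
    print(hex(element_address(0x1000, i, 12)))
```

When the element size is a power of two, the multiply can be replaced by a shift; for sizes such as 12 bytes, or for row widths that are not powers of two, a real integer multiply (or an equivalent sequence of shifts and adds) is needed on every access.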


104 Comments


  • sprockkets - Monday, February 2, 2004 - link

    Hmmm... on Intel's website on the new processor news: "Thermal Monitoring: Allows motherboards to be cost-effectively designed to expected application power usages rather than theoretical maximums."

    Not sure what it means. I'm thinking clock throttling so that if your particular chip is hotter than it should be it will run on under engineered motherboards/coolers.

    This chip dissipates around the same heat as Northwoods clock for clock! And of course, Intel style is wait 6-12, then the new stuff will actually be good. Still, is it really that important to increase performance so much that heat becomes an issue? I.E., will Dell be able to make the cooling whisper quiet? They can with the processor sitting at 80-90c, but now that with normal cooling it's almost there, now what will they do? Why can't we just have new processors that run so cool that we can just use heatsinks without fans? Oh well.
  • Novaoblivion - Monday, February 2, 2004 - link

    Great article :) I found it very interesting. I don't think I'll be buying a Prescott till they hit about 4GHz. My 2.4C is nice and fast for now.
  • CRAMITPAL - Monday, February 2, 2004 - link


    http://www.theinquirer.net/?article=13927


    http://www.theinquirer.net/?article=13947
  • johnsonx - Monday, February 2, 2004 - link

    To Vanners, #38:

    "if you halve the time for a stage in the pipeline and double the number of stages. Yes this means you can run at 2GHz instead of 1GHz but the reality is you're still taking 5ns to complete the pipe."

    Yes and no... In the example, you're right that a single instruction takes the same 5ns to complete. But you're not just executing a single instruction... rather, thousands to millions! The 10 stage pipe has twice as many instructions in flight as the 5 stage pipe. Therefore in the example, you get one result out of the 5-stage/1Ghz cpu every 1ns, but TWO results out of the 10-stage/2Ghz cpu in the same 1ns... twice as many.

    What I find interesting is that as pipelines get longer and longer, we might have to start talking about Instruction Latency: the number of clocks and ns between the time an instruction goes in and when the result comes out. It'll never be anything a human could notice directly, but it might come into play in high-performance realtime apps that deal with input from the outside world, and have to produce synchronized output. Any CPU calculates somewhat "back-in-time" as instructions fly down the pipe... right now, a Prescott calculates about twice as far behind 'reality' as an A64 does. I don't know if there is any realworld application where this really could make a difference, or if there ever will be, but it's interesting to ponder, particularly if the pipeline lengths of Intel vs. AMD continue to diverge.
  • cliffa3 - Monday, February 2, 2004 - link

    i don't see how a 4+GHz prescott will match up with intel's new pico BTX form factor...with that much heat (using air cooling), you need to keep a safe zone around the proc unless you like your RAM DDR+BBQ.
    I'd have to say that a lot of enthusiasts are younger and live in limited space conditions...might work well for people up north who don't want to run the heater, but as for me in texas, i have all the cool air pumping in to my bedroom and it still takes a lot to keep it cool. Can you imagine a university or corporation having a room full of those?..if they think about that, then it's no bueno for DELL and others as well.
    I'd also have to agree with the others about the heat/power being a major part of the article that was left out...otherwise a tremendous read, thanks for all the effort that goes into these.
  • tfranzese - Monday, February 2, 2004 - link

    But - I need to add - the correction was needed and is welcome. Not trying to pick a bone with the editors.
  • tfranzese - Monday, February 2, 2004 - link

    #55, you read what I read. I'll vouch for you.
  • Icewind - Monday, February 2, 2004 - link

    #55
    Better go back to sleep me thinks :)
  • Spearhawk - Monday, February 2, 2004 - link

    Is it just me (who was extremely tired yesterday) or has the 101 on pipeline part changed since the article was put up?
    I seem to remember reading something about how a 5 staged CPU at 1 GHz should be exactly as fast as a 2 GHz CPU with 10 stages (all else being equal of course) and that the secret of getting any profit out of going to more stages was to make sure that it could scale not only to 2 GHz but to 3 GHz or more.
  • Icewind - Monday, February 2, 2004 - link

    I think shuttle owners are SOL with prescott.
