CHAPTER 4: The Pentium 4 crash landing

The Prescott failure

The Pentium 4 "Prescott" is, despite its innovative architecture, a failure. Intel expected to scale this Pentium 4 architecture to 5 GHz, and derivatives of this architecture were supposed to come close to 10 GHz. Instead, the Prescott was only able to reach 3.8 GHz after numerous revisions. And even then, the 3.8 GHz model dissipates up to 115 Watt, of which about 35-50% (depending on the source) is lost to leakage power.
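To put those leakage percentages into Watts, here is a small back-of-the-envelope sketch. It only multiplies the figures quoted above; the function name is ours and purely illustrative.

```python
def leakage_watts(total_watts, leakage_fraction):
    """Power lost to leakage, given total dissipation and the leaked share."""
    return total_watts * leakage_fraction

TOTAL_WATTS = 115  # peak dissipation of the 3.8 GHz Prescott, from the text

low_estimate = leakage_watts(TOTAL_WATTS, 0.35)   # roughly 40 W
high_estimate = leakage_watts(TOTAL_WATTS, 0.50)  # 57.5 W

print(low_estimate, high_estimate)
```

In other words, somewhere between 40 and almost 58 Watt would be burned without doing any useful work.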

The Prescott project failed, but that doesn't mean that the architecture itself was not any good. In fact, the philosophy behind the enhanced Netburst architecture is very innovative and even brilliant. To understand why we state this, let us quickly refresh your memory on the software side of things.

IPC unfriendly software

First, consider that average code does not allow the CPU to process a lot of instructions in parallel. To give you an idea, we found out that video encoding achieves about 0.6-0.8 instructions per clock cycle (IPC) on modern CPUs. Secondly, note that almost 20% of all instructions are branches, and about 50% are memory operations. In the case of video encoding, you may have less than 10% branches, and about 60% memory operations. Most of the instructions that are not branches or memory operations are additions, or "ADD"s. Some of the memory operations need to make use of the same units that perform the ADD instructions.

You should also know that many algorithms contain calculations that need the result of a previous one: a dependency. So, you cannot issue the second calculation until the first is done.
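To make the effect of dependencies on IPC concrete, here is a minimal sketch. It is a deliberately idealized model, not a description of any real CPU: it assumes every instruction takes one cycle and that there is no limit on execution units. The function name and the two example instruction lists are hypothetical.

```python
def min_cycles(deps):
    """Fewest cycles needed to run a list of instructions.

    deps[i] lists the indices of the earlier instructions whose results
    instruction i needs. Assumes one cycle per instruction and unlimited
    execution units, so only dependencies hold instructions back.
    """
    finish = []  # finish[i] = cycle in which instruction i completes
    for needs in deps:
        start = max((finish[j] for j in needs), default=0)
        finish.append(start + 1)
    return max(finish, default=0)

# Four calculations that each need the previous result.
chain = [[], [0], [1], [2]]
# Four calculations that do not depend on each other.
independent = [[], [], [], []]

chain_ipc = len(chain) / min_cycles(chain)
independent_ipc = len(independent) / min_cycles(independent)
print(chain_ipc, independent_ipc)
```

Even with unlimited hardware, the dependent chain cannot do better than one instruction per cycle, while the independent instructions could all execute in the same cycle.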

Most studies show that realistically, a sophisticated CPU would be able to reach an IPC of a little more than 2, about twice as much as CPUs today.

Up close and personal

Now, take a look at the scheme of the Prescott architecture below. Let us see how Prescott solves all the problems mentioned above.


Fig 7. Prescott's architecture.


First of all, you want to make sure that memory operations happen quickly. Therefore, Prescott doubled both the L1 data cache and the L2 cache. It also has two dedicated Address Generation Units, one for stores and one for loads.

Built for 4 GHz and more, Prescott pays dearly in clock cycles (latency) for accesses to main RAM, considering that DDR-II 533 runs at a 266 MHz clock. So, Prescott tries to minimize the damage of waiting for cache misses by increasing Northwood's store buffers from 24 to 32, and doubling the load request buffers. As a result, Prescott can have a lot of cache misses outstanding simultaneously. An intelligent hardware prefetcher is another way to avoid slowdowns due to high memory latency.
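The gap between CPU clock and memory clock is easy to quantify. The sketch below uses the 4 GHz and 266 MHz figures from the text; the 100 ns round-trip latency to main memory is purely an assumption for illustration, not a measured Prescott value, and the function names are ours.

```python
def cpu_cycles_per_memory_clock(cpu_hz, memory_clock_hz):
    """How many CPU cycles pass during a single memory clock cycle."""
    return cpu_hz / memory_clock_hz

def latency_in_cpu_cycles(latency_ns, cpu_hz):
    """Convert a latency in nanoseconds into CPU clock cycles."""
    return latency_ns * 1e-9 * cpu_hz

CPU_HZ = 4e9             # the 4 GHz design target from the text
MEMORY_CLOCK_HZ = 266e6  # DDR-II 533 clock from the text
ASSUMED_LATENCY_NS = 100 # hypothetical round trip to main memory

ratio = cpu_cycles_per_memory_clock(CPU_HZ, MEMORY_CLOCK_HZ)
stall = latency_in_cpu_cycles(ASSUMED_LATENCY_NS, CPU_HZ)
print(ratio, stall)
```

Under these assumptions, the CPU ticks about 15 times for every memory clock, and a single cache miss that goes all the way to RAM costs around 400 CPU cycles. That is why keeping many misses in flight at once matters so much.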

To battle branch misprediction, the Prescott branch predictor has been tuned and correctly predicts 10% of the branches that Northwood mispredicted. That results in up to 20% better performance! And of course, the trace cache makes sure that a mispredicted branch does not need to restart the decoding stages. As a result, the misprediction penalty is not 39 stages, but 31 stages. The 8 stages of decoding do not need to happen again because in most cases, the trace cache holds the decoded instruction.
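A rough way to see what these numbers mean is the classic "cycles per instruction" estimate, sketched below. The 20% branch share and the 39 and 31 stage penalties come from the text; the base CPI of 1.0 and the 5% misprediction rate are assumptions chosen only for illustration, and we simplify by counting one pipeline stage as one lost cycle.

```python
def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average CPI once branch misprediction stalls are added in."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

BASE_CPI = 1.0          # assumed CPI with perfect branch prediction
BRANCH_FRACTION = 0.20  # share of branches in average code, from the text
MISPREDICT_RATE = 0.05  # hypothetical share of branches that are mispredicted

# Full 39-stage penalty: decoding has to be redone after a misprediction.
full_penalty = cycles_per_instruction(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 39)
# 31-stage penalty: the trace cache already holds the decoded instructions.
trace_cache = cycles_per_instruction(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 31)
# Same, with a predictor that removes 10% of the mispredictions.
tuned = cycles_per_instruction(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE * 0.9, 31)

print(full_penalty, trace_cache, tuned)
```

With these made-up rates, the CPI drops from about 1.39 to about 1.31 thanks to the trace cache, and to about 1.28 with the tuned predictor. The real gain depends heavily on the workload and on how much the rest of the pipeline is stalled anyway.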



65 Comments


  • stephenbrooks - Wednesday, February 9, 2005 - link

    #28 - that's interesting. I was thinking myself just a few days ago "I wonder if those wires go the long way on a rectangular grid or do they go diagonally?" Looks like there's still room for improvement.
  • Chuckles - Wednesday, February 9, 2005 - link

    The word comes from Latin. "mono" meaning one, "lithic" meaning stone. So monolithic refers to the fact that it is a single cohesive unit.
    The reason you associate "lithic" with old is only due to the fact that anthropologists use Paleolithic and Neolithic to describe time periods in human history in the Stone Age. The words translate as "old stone" and "new stone" respectively.
    I have seen plenty of monolithic benches around here. Heck, a slab granite countertop qualifies as a monolith.
  • theOracle - Wednesday, February 9, 2005 - link

    Very good article - looks like a university paper with all the references etc! Looking forward to part two.

    Re "monolithic", granted the word doesn't mean old but anything '-lithic' instantly makes me think ancient (think neolithic etc). -lithic means a period in stone use by humans, and a monolith is a (usually ancient) stone monument; I think its fair to say Intel were trying to make the audience think 'old technology'.
  • DavidMcCraw - Wednesday, February 9, 2005 - link

    Great article, but this isn't accurate:

    "Note the word "monolithic", a word with a rather pejorative meaning, which insinuates that the current single core CPUs are based on old technology."

    Neither the dictionary nor technical meanings of monolithic imply 'old technology'. Rather, it simply refers to the fact that the single-core CPU being referred to is as large as the two smaller chips, but is in one part.

    In the context of OS kernel architectures, the Linux kernel is a good example of monolithic technology... but I doubt many people consider it old tech!
  • IceWindius - Wednesday, February 9, 2005 - link

    Even this article makes my head hurt, so much about CPUs is hard to understand and grasp. I wish I knew how those CPU engineers do this for a living.

    I wish someone like Ars Technica would make something really built from the ground up, like "CPUs for morons", so I could start understanding this stuff better.
  • JohanAnandtech - Wednesday, February 9, 2005 - link

    Jason and Anand have promised me (building some pressure ;-) a threaded comment system so I can answer more personally. Until then:

    1. Thanks for all the encouraging comments. It really gives a warm feeling to read them, and it is basically the most important motivation for writing more

    2. Slashbin (27): Typo. just typed with a small period of insanity. Voltage of course, fixed

    3. CSMR: the SPEC numbers of intel are artificially high, as they have been spending more and more time on aggressive compiler optimisations. All other benchmarks clearly show the slowdown.
  • CSMR - Tuesday, February 8, 2005 - link

    Excellent article. Couple of odd things you might want to amend in chapter one: "CPUs run 40 to 60% faster each year" contradicts the previous discussion about slowed CPU speed increases. Also power formula explanation on the same page doesn't really make sense as pointed out by #27.
  • Doormat - Tuesday, February 8, 2005 - link

    Good article. The only real thing I wanted to bring up was something called the "X Consortium". I wrote a paper in my solid state circuit design class a few years ago. Basically instead of having all the interconnects within a chip laid out in a grid-like fashion, it allows them to be diagonal (and thus a savings of at most 29%: for the math impaired, the wire length shrinks to at best 1/sqrt(2) of the original). Perhaps the tools aren't there or it's too patent encumbered. If interconnects are really an issue then they should move to this diagonal interconnect technology. I actually don't think they are a very pressing need right now - leakage current is the most pressing issue. The move to copper interconnects a while ago helped (increased conductivity over aluminum, smaller die sizes mean shorter distances to traverse, typically).

    It will be very interesting to see what IBM does with their Cell chips and SOI (and what clock speed AMD releases their next A64/Opteron chips at since they've teamed with IBM). If indeed these cell chips run at 4GHz and dont have leakage current issues then there is a good chance that issue is mostly remedied (for now at least).
  • slashbinslashbash - Tuesday, February 8, 2005 - link

    "In other words, dissipated power is linear with the effective capacitance, activity and frequency. Power increases quadratically with frequency or clock speed." (Page 2)

    Typo there? Frequency can't be both linear and quadratic..... from the equation itself, it looks like voltage is quadratic. (assuming the V is voltage)
  • AnnoyedGrunt - Tuesday, February 8, 2005 - link

    And of course I meant to refer to post 23 above.
    -D!
