Intel's NetBurst Architecture - The Pentium 4's innards get a name
by Anand Lal Shimpi on August 20, 2000 9:39 PM EST
Hyper Pipelined Technology
The NetBurst architecture's first feature is what Intel is calling its Hyper Pipelined Technology, which is a fancy term for the 20-stage pipeline that the Pentium 4 has. This 20-stage pipeline is twice as long as the 10-stage P6 pipeline that the Pentium III featured and four times as long as the P5's five-stage pipeline. A longer pipeline, as we've explained before, has its pros and cons.
The 20-stage pipeline on the Pentium 4 is what allows it to hit higher clock speeds right off the bat without requiring a die shrink. It is for this reason that the Pentium 4 will debut at speeds of 1.4GHz and higher (we will talk more about clock speed in a bit). Before you let that number impress you too much, you have to realize that the 20-stage pipeline of the Pentium 4 also yields a lower number of Instructions Per Clock (IPC). A lower IPC basically means that you get less accomplished in each clock cycle when compared to a processor that has a higher IPC - pretty simple, right?
Well, there are a number of ways to make up for a lower IPC; one of the most obvious is to simply increase the clock speed, which Intel is definitely doing in this case. There isn't a doubt that on any of the current benchmarks, if a 1GHz Pentium III were put up against a hypothetical 1GHz Pentium 4, the Pentium III would win because it can do more per clock than the Pentium 4.
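To put rough numbers on the IPC versus clock speed trade-off described above, here is a minimal Python sketch. It assumes throughput is simply IPC multiplied by clock speed, and the IPC figures in it are invented purely for illustration; they are not measurements of either processor. The function names are hypothetical as well.

```python
# Toy model: throughput = IPC x clock speed.
# The IPC values below are made up for illustration only; they are NOT
# measured figures for the Pentium III or the Pentium 4.

GHZ = 1e9

def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Instructions retired per second under the simple IPC x clock model."""
    return ipc * clock_hz

def break_even_clock(ipc_ref: float, clock_ref_hz: float, ipc_other: float) -> float:
    """Clock speed the lower-IPC chip needs to match the reference chip."""
    return ipc_ref * clock_ref_hz / ipc_other

P3_IPC = 1.0  # hypothetical
P4_IPC = 0.8  # hypothetical, lower because of the longer pipeline

p3_at_1ghz = instructions_per_second(P3_IPC, 1.0 * GHZ)
p4_at_1ghz = instructions_per_second(P4_IPC, 1.0 * GHZ)
p4_at_1_4ghz = instructions_per_second(P4_IPC, 1.4 * GHZ)

print("P3 @ 1.0GHz:", p3_at_1ghz)
print("P4 @ 1.0GHz:", p4_at_1ghz)
print("P4 @ 1.4GHz:", p4_at_1_4ghz)
print("P4 clock needed to match a 1GHz P3:",
      break_even_clock(P3_IPC, 1.0 * GHZ, P4_IPC) / GHZ, "GHz")
```

With these assumed values, the lower-IPC chip loses at equal clock speed and pulls ahead only once its clock rises past the break-even point, which is the trade Intel is making.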
By the time the Pentium 4 hits the streets, the fastest Pentium III will most likely still be the 1.13GHz part we reviewed not too long ago. With the P4 debuting in at least two speed grades (1.4GHz and above is Intel's official statement, but remember that we also saw a 1.5GHz Pentium 4 in February), there should be a performance delta between the two at launch.
Modern day CPUs attempt to increase the efficiency of their pipelines by predicting what they will be asked to do next. This is a simplified explanation of the term Branch Prediction. When a processor predicts correctly, everything goes according to plan, but when an incorrect prediction is made, the processing cycle must start all over at the beginning of the pipeline. This is the second problem presented by a longer pipeline: the longer the pipeline, the further back in the process you have to start over in order to make up for a mis-predicted branch. Because of this, a processor with a 10-stage pipeline pays a lower penalty for a mis-predicted branch than a processor with a 20-stage pipeline.
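The cost of a mis-predicted branch can be sketched with a simple cycles-per-instruction model. The Python below assumes that a mis-prediction costs one full trip through the pipeline (a simplification of "start all over at the beginning"), and the branch frequency and mis-prediction rate are hypothetical values chosen for illustration, not figures from Intel.

```python
# Toy model of the branch mis-prediction penalty.
# Assumption: a mis-predicted branch costs as many cycles as the pipeline
# has stages. Real processors differ; this is only a rough illustration.

def effective_cpi(base_cpi: float,
                  branch_fraction: float,
                  mispredict_rate: float,
                  pipeline_stages: int) -> float:
    """Average cycles per instruction once mis-prediction stalls are included."""
    penalty_cycles = pipeline_stages  # assumed full pipeline refill
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

BASE_CPI = 1.0         # hypothetical: one instruction per clock when all goes well
BRANCH_FRACTION = 0.2  # hypothetical: 1 in 5 instructions is a branch
MISPREDICT_RATE = 0.1  # hypothetical: 1 in 10 branches is predicted wrong

cpi_10_stage = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 10)
cpi_20_stage = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)

print("10-stage pipeline, cycles per instruction:", cpi_10_stage)
print("20-stage pipeline, cycles per instruction:", cpi_20_stage)
```

Under these assumptions, doubling the pipeline depth doubles the cycles lost to each mis-prediction, so the deeper pipeline ends up spending more cycles per instruction even though both chips use the same predictor.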
In order to navigate around these problems, Intel's NetBurst architecture has a few features that help to lessen the burden of having a longer pipeline.