Intel Pentium 4 1.4GHz & 1.5GHz
by Anand Lal Shimpi on November 20, 2000 12:54 AM EST
Hyper Pipelined Technology
The problem Intel faced with the P6 micro-architecture was that they needed to increase the clock speed of their P6 based processors, but without performing another die shrink they were already hitting the limits of the Pentium III's core. This was evident in the problems Intel encountered when attempting to produce a 1.13GHz Pentium III. Unfortunately, with their 0.13-micron fabrication process still months away from being ready for mass produced processors, Intel needed something else to keep clock speeds climbing.
We have seen this trend throughout history. The Pentium Classic and the Pentium MMX, both based on the P5 micro-architecture, maxed out at 233MHz in desktop configurations and 266MHz in mobile setups. The Pentium Pro, a P6 processor, hit a 200MHz ceiling; moving the L2 cache off-die, followed later by a die shrink, allowed its successor, the Pentium II (also a P6 processor), to reach clock speeds as high as 450MHz.
It’s easy to recommend that a CPU manufacturer simply employ a die shrink in order to increase clock speed, but that is easier said than done. Introducing a new fabrication process is quite expensive (the manufacturing plants alone cost billions to construct, so you can imagine the cost of introducing a new process), and it’s often not a viable solution to the clock speed issue since it takes quite a bit of time to bring a new process up to speed and get yields high enough to make it a profitable move.
This brings up that other method of increasing clock speed that we alluded to before. Instead of shrinking the die, why not make the CPU do less? If you make a CPU do less per clock, it’s able to ramp up to higher overall clock speeds. And theoretically, if you can get the numbers to work out, the sacrifice you make in terms of the amount the CPU can do per clock is more than made up for by the fact that this CPU of yours can now run at much higher clock speeds.
Doing “less” per clock is an oversimplified way of saying that you increase the number of stages in the processor’s pipeline. The deeper the pipeline, the more stages an instruction must pass through before reaching the end of the pipeline, and thus the less work each stage accomplishes per clock. The original Pentium featured a fairly short pipeline by today’s standards, composed of “only” five stages. The Pentium Pro and later the Pentium II/III made use of the P6 micro-architecture, whose pipeline featured twice as many stages, for a total of 10. The NetBurst micro-architecture that the Pentium 4 is based around doubles the length of the pipeline yet again, making it 20 stages deep.
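The tradeoff described above can be sketched with a bit of arithmetic: what matters is instructions per clock (IPC) multiplied by clock speed. The figures below are made-up illustrative numbers, not measured IPC or clock values for any of these processors.

```python
def effective_perf(ipc, clock_ghz):
    """Rough throughput in billions of instructions per second."""
    return ipc * clock_ghz

# A shorter-pipeline CPU doing more work per clock at a lower frequency...
short_pipe = effective_perf(ipc=0.9, clock_ghz=1.0)   # 0.90

# ...versus a deeper-pipeline CPU doing less per clock at a higher frequency.
deep_pipe = effective_perf(ipc=0.6, clock_ghz=1.6)    # 0.96

# The deeper pipeline only pays off if the clock speed gain
# outweighs the loss in per-clock work.
print(deep_pipe > short_pipe)  # True with these particular numbers
```

With these hypothetical numbers the deeper pipeline comes out ahead, but only barely; shift the IPC or clock figures slightly and the shorter pipeline wins, which is exactly the balancing act Intel is gambling on.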
This 20-stage pipeline is what Intel is calling their Hyper Pipelined Technology.
Remember how we mentioned that a redeeming quality of the Pentium 4 may be its ability to ramp up to higher clock speeds? The Hyper Pipelined Technology is how Intel is planning on doing just that.
So if increasing the depth of a processor’s pipeline is an easy way of paving the way for higher clock speeds, why not jump to a 100-stage pipeline right away? Just like most everything in life, there is a pretty big downside to this otherwise beautifully painted picture.
Modern day CPUs attempt to increase the efficiency of their pipelines by predicting which way upcoming branches will go; this is a simplified explanation of the term branch prediction. When a processor predicts correctly, everything goes according to plan, but when an incorrect prediction is made, the work fetched along the wrong path must be thrown away and execution must start over at the beginning of the pipeline. Because of this, a processor with a 10-stage pipeline pays a lower penalty for a mis-predicted branch than a processor with a 20-stage pipeline: the longer the pipeline, the more stages of in-flight work are flushed and must be refilled every time a branch is mis-predicted.
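The cost of those flushes can be sketched with a standard average-CPI (cycles per instruction) calculation. The branch frequency and misprediction rate below are illustrative assumptions, not figures from Intel; the flush penalty is taken to scale roughly with pipeline depth.

```python
def avg_cpi(base_cpi, branch_freq, mispredict_rate, flush_penalty):
    """Average cycles per instruction once misprediction stalls are included."""
    return base_cpi + branch_freq * mispredict_rate * flush_penalty

# Assumed workload: 20% of instructions are branches, 5% of those mispredicted.
ten_stage    = avg_cpi(1.0, 0.20, 0.05, flush_penalty=10)
twenty_stage = avg_cpi(1.0, 0.20, 0.05, flush_penalty=20)

print(ten_stage)     # 1.1 cycles per instruction
print(twenty_stage)  # 1.2 cycles per instruction
```

Under these assumptions, doubling the pipeline depth doubles the cycles lost to mispredictions, which is why the accuracy of the branch predictor matters so much more on a 20-stage design.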
This puts Intel in an interesting situation: they have to accept a bigger penalty for each mis-predicted branch in order to reach higher clock speeds. To lessen that penalty, the next two features of Intel’s NetBurst micro-architecture come into play.