Rapid Execution Engine

Just two hours into his presentation at the Spring 2000 IDF, we were floored when Albert Yu mentioned that the Pentium 4’s two Arithmetic Logic Units (ALUs) would run at twice the core clock frequency of the CPU. This meant that on the 1.5GHz Pentium 4 Mr. Yu was demonstrating, the integer units would be running at 3GHz. At first glance, this would seem to indicate that the Pentium 4, and every other NetBurst-based processor, would offer the absolute highest integer performance available.

With the Pentium 4’s integer units running at 3GHz, how could AMD even begin to compete? 

Fortunately for AMD, Intel had to double pump the ALUs (Intel’s term for clocking the units at 2x the core frequency) just to deliver integer performance at least equal to that of a lower-clocked Pentium III. Confused? Let’s take another look at the Hyper Pipelined Technology behind the Pentium 4, this time focusing on how it affects integer performance.

The Pentium 4 has a very advanced branch predictor, which helps it avoid the costly mis-predicted branches that would otherwise only be discovered in the later stages of its long pipeline. The Pentium 4’s branch predictor is actually much more advanced than the Athlon’s; unfortunately, no matter how advanced it is, you can’t predict something that is inherently unpredictable. That is often the case with integer code.
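To illustrate why even a good predictor cannot help with unpredictable branches, here is a minimal sketch using a textbook 2-bit saturating counter. This is a stand-in technique chosen for illustration, not the Pentium 4’s actual (far more sophisticated) predictor; the function name and branch patterns are hypothetical.

```python
import random


def simulate_two_bit_predictor(outcomes):
    """Return the fraction of branches a 2-bit saturating counter predicts correctly.

    Counter states 0-1 predict "not taken", states 2-3 predict "taken".
    (Illustrative textbook predictor, not Intel's design.)
    """
    counter = 2  # start in the "weakly taken" state
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Saturating update: step toward the actual outcome, clamped to [0, 3]
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return correct / len(outcomes)


# A regular loop branch: taken 9 times, then falls through once, repeated.
loop_branch = ([True] * 9 + [False]) * 1000

# A data-dependent branch whose outcome is effectively a coin flip.
rng = random.Random(42)
random_branch = [rng.random() < 0.5 for _ in range(10_000)]

print(f"loop branch accuracy:   {simulate_two_bit_predictor(loop_branch):.2f}")
print(f"random branch accuracy: {simulate_two_bit_predictor(random_branch):.2f}")
```

The regular loop branch is predicted correctly 90% of the time, while the coin-flip branch hovers around 50%: no amount of history helps when the outcome does not follow a pattern.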

Integer code tends to be full of branches whose outcomes depend on the data being processed, which makes them quite difficult to predict. In many cases, such as when running typical business/office applications, the Pentium 4’s branch predictor will mis-predict a branch, forcing the instructions to start over from the beginning of its 20-stage pipeline. This penalty is huge compared to the Pentium III’s, since that processor has only a 10-stage pipeline.
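A rough way to see why the deeper pipeline hurts even at a higher clock speed is to estimate the time lost per mis-predict. The sketch below makes a loud simplifying assumption: a mis-predict costs about one cycle per pipeline stage that must be refilled. The 1.0GHz Pentium III clock is an assumed example value, and the function name is hypothetical.

```python
def flush_time_ns(pipeline_stages, clock_ghz):
    """Approximate time lost to one branch mis-predict, in nanoseconds.

    Simplifying assumption: the penalty is roughly one cycle per pipeline
    stage, so cycles lost ~= pipeline depth. At clock_ghz GHz the core
    completes clock_ghz cycles per nanosecond.
    """
    cycles_lost = pipeline_stages
    return cycles_lost / clock_ghz


# Example values chosen for illustration only.
p4_ns = flush_time_ns(pipeline_stages=20, clock_ghz=1.5)
p3_ns = flush_time_ns(pipeline_stages=10, clock_ghz=1.0)
print(f"Pentium 4 @ 1.5GHz:   {p4_ns:.1f} ns per mis-predict")
print(f"Pentium III @ 1.0GHz: {p3_ns:.1f} ns per mis-predict")
```

Under these assumptions each mis-predict costs the 1.5GHz Pentium 4 about 13.3ns versus 10ns for the 1.0GHz Pentium III, so the higher clock does not fully offset the doubled pipeline depth.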

Because of this, Intel has been playing down the importance of high integer performance. If you recall, this is actually the second time they have done so; the first was with the original Celeron, which had no L2 cache and thus performed quite poorly in most integer applications.

Intel is correct that performance in Microsoft Word is far less critical than performance in, for example, 3D Studio MAX or Quake III Arena, since in Word the limiting factor is how quickly the user can input data. However, as the days of the original Celeron showed, the business/office user community was quite disappointed in that processor because the benchmarks revealed its lackluster integer performance.

Because the ALUs are double pumped, the Pentium 4’s integer performance should begin to pull away from the Pentium III’s as its clock speed increases: for every 100MHz increase in core clock, the ALUs’ effective operating frequency increases by 200MHz.
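The double-pumping arithmetic can be written out directly. This minimal sketch assumes a fixed 2x multiplier between the core clock and the effective ALU clock; the function name and the sample clock speeds are hypothetical illustrations.

```python
def effective_alu_mhz(core_mhz, pump_factor=2):
    """Effective ALU clock (MHz) when the integer units run at pump_factor x the core clock."""
    return core_mhz * pump_factor


for core in (1400, 1500, 1600):
    print(f"core {core} MHz -> ALUs {effective_alu_mhz(core)} MHz")
```

Each 100MHz step in core clock moves the effective ALU clock by 200MHz, and the 1.5GHz part lands at the 3GHz figure quoted at IDF.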

Other portions of the Pentium 4 are apparently double pumped as well; combined with the double-pumped ALUs, this points to a clear trend toward achieving lower latencies in certain parts of the CPU.
