Intel Pentium 4 1.7GHz: Does the prophecy hold true?
by Anand Lal Shimpi on April 23, 2001 2:42 AM EST
Now Showing at 1.7GHz
Apart from its clock speed, the Pentium 4 1.7GHz is unchanged from the 1.3, 1.4 and 1.5GHz parts that are currently available. Since we have already covered the architecture in great depth, we will only provide a quick overview of the feature set below. For more information, consult our original review of the Pentium 4.
Hyper Pipelined Technology – The Pentium 4 features a much longer pipeline than either the Pentium III or the Athlon. Unfortunately, this means that the Pentium 4 accomplishes less per clock, but it also paves the way for much higher clock speeds. The theory is that these higher clock speeds will give the Pentium 4 a performance advantage over its predecessors, because doing less per clock matters little if the clock speed is high enough to compensate. Case in point: the Pentium III was only able to reach 1GHz on its 0.18-micron process, while the Pentium 4 is currently at 1.7GHz on the same 0.18-micron process. And as you’re about to see, there is a clear performance difference between the two.
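The trade-off above is often summarized as performance ≈ instructions per clock (IPC) × clock speed. The sketch below is illustrative only: the IPC values are hypothetical placeholders normalized to the 1GHz Pentium III, not measurements of either processor.

```python
def relative_performance(ipc: float, clock_ghz: float) -> float:
    """Throughput in billions of instructions per second: IPC x clock (GHz)."""
    return ipc * clock_ghz


def break_even_ipc(base_ipc: float, base_clock_ghz: float, new_clock_ghz: float) -> float:
    """Lowest IPC at which a higher-clocked design merely matches the baseline."""
    return base_ipc * base_clock_ghz / new_clock_ghz


# Hypothetical IPC values, normalized so the 1GHz Pentium III scores 1.0.
p3 = relative_performance(ipc=1.0, clock_ghz=1.0)
p4 = relative_performance(ipc=0.8, clock_ghz=1.7)  # assume 20% less work per clock

print(f"Pentium III 1.0GHz: {p3:.2f}")
print(f"Pentium 4 1.7GHz:   {p4:.2f}")
print(f"P4 break-even IPC:  {break_even_ipc(1.0, 1.0, 1.7):.3f}")
```

Under these assumptions the 1.7GHz part could give up roughly 41% of its per-clock efficiency (break-even IPC of about 0.588) and still match the 1GHz part.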
Improved Branch Prediction – With such a long pipeline, an improved Branch Prediction Unit (BPU) is a necessity, and the Pentium 4 has one. Its BPU is arguably the most advanced in this sector, an area where a weaker BPU has held back the Athlon’s performance somewhat. Luckily, it seems AMD will also be giving the Athlon an improved BPU in the upcoming Palomino core. In any case, the Pentium 4’s BPU must be solid; otherwise, the misprediction penalties associated with its long pipeline would cripple the P4 beyond repair.
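Why a longer pipeline demands a better BPU can be shown with the standard cycles-per-instruction (CPI) accounting, where every mispredicted branch costs a pipeline refill. All numbers below (branch frequency, misprediction rates, 10 vs. 20 cycle penalties) are hypothetical, chosen only to show the trend.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once branch mispredictions are charged.

    Each mispredicted branch flushes the pipeline and costs roughly
    `penalty_cycles`, a figure that grows with pipeline depth.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: one instruction in five is a branch.
BASE_CPI = 1.0
BRANCHES = 0.20

short_pipe = effective_cpi(BASE_CPI, BRANCHES, mispredict_rate=0.10, penalty_cycles=10)
long_pipe_same_bpu = effective_cpi(BASE_CPI, BRANCHES, mispredict_rate=0.10, penalty_cycles=20)
long_pipe_better_bpu = effective_cpi(BASE_CPI, BRANCHES, mispredict_rate=0.05, penalty_cycles=20)

print(f"Short pipeline:               {short_pipe:.2f} CPI")
print(f"Long pipeline, same BPU:      {long_pipe_same_bpu:.2f} CPI")
print(f"Long pipeline, better BPU:    {long_pipe_better_bpu:.2f} CPI")
```

In this model, doubling the penalty doubles the misprediction overhead, and halving the misprediction rate is exactly what it takes to win that overhead back.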
Rapid Execution Engine – Two of the Pentium 4’s ALUs (Arithmetic Logic Units, which handle integer operations) are double pumped, meaning each can complete two simple integer operations per core clock. This effectively gives them the same throughput as ALUs running at twice the core frequency. In the case of the 1.7GHz Pentium 4, the ALUs perform as if they were normal (not double pumped) ALUs clocked at 3.4GHz. As we have discovered in the past, this is necessary to give the Pentium 4 respectable performance when running integer code. Because integer code is generally much more susceptible to mispredicted branches, the lower-latency, higher effective clock ALUs help minimize the branch mispredict penalties associated with the Pentium 4’s extremely long pipeline when dealing with integer operations.
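The 3.4GHz figure is simply the core clock times the pump factor of 2. The sketch below also derives the per-operation latency, assuming (as double pumping implies) that a simple integer operation completes in half a core clock; it ignores every other pipeline stage, so it describes the ALUs in isolation, not whole-processor throughput.

```python
def equivalent_alu_clock_ghz(core_clock_ghz: float, pump_factor: int = 2) -> float:
    """Clock a conventional ALU would need to match a pumped ALU's throughput."""
    return core_clock_ghz * pump_factor


def simple_op_latency_ns(core_clock_ghz: float, pump_factor: int = 2) -> float:
    """Latency (ns) of a simple integer op that completes in 1/pump_factor of a clock."""
    return 1.0 / (core_clock_ghz * pump_factor)


core = 1.7
print(f"Equivalent ALU clock: {equivalent_alu_clock_ghz(core):.1f}GHz")  # 3.4GHz
print(f"Double-pumped latency: {simple_op_latency_ns(core):.3f}ns")
print(f"Single-pumped latency: {simple_op_latency_ns(core, pump_factor=1):.3f}ns")
```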
12K micro-op trace cache – This special cache replaces and improves upon the traditional L1 instruction cache. The 8-way set associative Execution Trace Cache stores micro-ops after they have been decoded, arranged in the order of the predicted path of execution. Because instructions that hit in the trace cache don’t have to be decoded again, this helps to hide some of the performance penalties caused by such a long pipeline.
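The "decode once, reuse many times" idea can be sketched with a toy model. Everything here is hypothetical: the decoder, the two-micro-ops-per-instruction rule, and the loop addresses are placeholders, and unlike the real 8-way set associative, 12K micro-op structure, this toy has unlimited capacity and no eviction.

```python
class ToyTraceCache:
    """Caches already-decoded micro-ops, keyed by a trace's starting address."""

    def __init__(self, decode):
        self.decode = decode      # hypothetical decoder: address -> list of micro-ops
        self.traces = {}          # start address -> (predicted path, micro-ops)
        self.instructions_decoded = 0

    def fetch(self, path):
        """Return the micro-ops for a predicted path (a tuple of addresses)."""
        entry = self.traces.get(path[0])
        if entry is not None and entry[0] == path:
            return entry[1]       # hit: the decoder is bypassed entirely
        uops = []
        for addr in path:         # miss: decode every instruction on the path
            uops.extend(self.decode(addr))
            self.instructions_decoded += 1
        self.traces[path[0]] = (path, uops)
        return uops


def toy_decode(addr):
    # Pretend every x86 instruction decodes into two micro-ops.
    return [f"uop{addr:#x}.0", f"uop{addr:#x}.1"]


loop_body = (0x00, 0x03, 0x05, 0x08, 0x09)  # hypothetical addresses of one loop iteration
tc = ToyTraceCache(toy_decode)
for _ in range(100):
    uops = tc.fetch(loop_body)

print(tc.instructions_decoded)  # 5
print(len(uops))                # 10
```

Over 100 loop iterations the decoder runs only on the first pass; the remaining 99 fetches are served from already-decoded micro-ops.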
256KB Advanced Transfer Cache – The Pentium 4’s L2 cache subsystem is quite incredible, to say the least. Not only does the processor have a 256-bit internal pathway to its L2 cache, it can also transfer data from the cache on every clock, giving it the highest peak cache bandwidth of any processor in its class. At 1.7GHz, that works out to a maximum of 54.4GB/s (32 bytes × 1.7GHz) to/from its L2 cache. In comparison, a Pentium III at 1.0GHz can only offer 16GB/s of bandwidth for L2 data transfers, and an Athlon at 1.33GHz can only offer just over 10GB/s of peak bandwidth (the Athlon only has a 64-bit datapath to its L2).
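Peak cache bandwidth is just bytes per transfer × clock × transfers per clock. The text does not say why the Pentium III figure is lower; the sketch assumes its 256-bit path delivers data only every other clock, which is one way to reproduce the 16GB/s number. That transfer rate is an assumption, not a figure from the review.

```python
def peak_bandwidth_gbs(width_bits: int, clock_ghz: float, transfers_per_clock: float = 1.0) -> float:
    """Peak bandwidth in GB/s (1GB = 10**9 bytes): bytes per transfer x transfers per second."""
    return (width_bits / 8) * clock_ghz * transfers_per_clock


p4 = peak_bandwidth_gbs(256, 1.7)         # 32 bytes on every clock
p3 = peak_bandwidth_gbs(256, 1.0, 0.5)    # assumption: 32 bytes every other clock
athlon = peak_bandwidth_gbs(64, 1.333)    # 8 bytes on every clock

for name, bw in (("Pentium 4 1.7GHz", p4), ("Pentium III 1.0GHz", p3), ("Athlon 1.33GHz", athlon)):
    print(f"{name}: {bw:.1f}GB/s")
```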
Hardware Prefetch – The Pentium 4 is able to predict which data it will need before that data is actually requested, and fetch it from main memory directly into its cache, so that when the request does arrive the data is already in cache. If the data turns out not to be needed, the prefetch wastes cache space as well as FSB/memory bandwidth. Either way, Hardware Prefetch is an FSB/memory bandwidth hog; luckily, the next feature of the Pentium 4 architecture helps keep that from becoming a problem.
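To make the hit-or-waste trade-off concrete, here is a generic stride prefetcher: once it sees the same distance between two consecutive cache-line accesses, it fetches the next line in that pattern. This is a textbook technique, not Intel’s actual prefetch algorithm; the 64-byte line size and the unbounded, never-evicting cache are simplifying assumptions.

```python
LINE_SIZE = 64  # bytes per cache line; assumed for illustration


class ToyStridePrefetcher:
    """Detects a constant stride between cache lines and prefetches the next one."""

    def __init__(self):
        self.cache = set()        # line numbers currently cached (no eviction)
        self.prefetched = set()   # prefetched lines not yet used by a demand access
        self.last_line = None
        self.last_stride = None
        self.demand_misses = 0
        self.prefetches = 0
        self.useful_prefetches = 0

    def access(self, addr):
        line = addr // LINE_SIZE
        if line in self.cache:
            if line in self.prefetched:   # data was already waiting in cache
                self.useful_prefetches += 1
                self.prefetched.discard(line)
        else:
            self.demand_misses += 1       # must wait on main memory
            self.cache.add(line)
        if self.last_line is not None and line != self.last_line:
            stride = line - self.last_line
            if stride == self.last_stride:    # same stride twice in a row: predict the next line
                target = line + stride
                if target not in self.cache:
                    self.cache.add(target)    # costs FSB/memory bandwidth even if never used
                    self.prefetched.add(target)
                    self.prefetches += 1
            self.last_stride = stride
        self.last_line = line


pf = ToyStridePrefetcher()
for i in range(10):  # walk ten consecutive cache lines
    pf.access(i * LINE_SIZE)

print(pf.demand_misses, pf.prefetches, pf.useful_prefetches, len(pf.prefetched))  # 3 8 7 1
```

Seven of the eight prefetches arrive before they are needed, while the last one (the line just past the end of the walk) is pure overhead: exactly the wasted cache space and bandwidth described above.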
Quad Pumped 100MHz FSB + Dual Channel RDRAM – The Pentium 4 has a 100MHz FSB that is quad pumped to offer data bandwidth equivalent to that of a 400MHz FSB; since the bus is 64 bits wide, it can transfer at most 3.2GB/s of data to and from the Pentium 4. This bus runs synchronously with the i850’s (the P4 chipset) dual channel RDRAM, which runs at 400MHz and transfers data on both clock edges over two 16-bit wide channels (1.6GB/s per channel), for a total of 3.2GB/s of peak memory bandwidth. While RDRAM was not necessary on the Pentium III platform, the bandwidth it offers is very much appreciated when coupled with the Pentium 4.
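The two 3.2GB/s figures come from the same formula applied to different bus shapes. The sketch uses the 64-bit FSB width and the two transfers per RDRAM clock stated above; the helper name is ours, not from any Intel documentation.

```python
def bus_bandwidth_gbs(width_bits, clock_mhz, transfers_per_clock, channels=1):
    """Peak bandwidth in GB/s (1GB = 10**9 bytes)."""
    bytes_per_transfer = width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return channels * bytes_per_transfer * transfers_per_second / 1e9


fsb = bus_bandwidth_gbs(width_bits=64, clock_mhz=100, transfers_per_clock=4)                # quad pumped
rdram = bus_bandwidth_gbs(width_bits=16, clock_mhz=400, transfers_per_clock=2, channels=2)  # dual channel

print(f"FSB:   {fsb:.1f}GB/s")
print(f"RDRAM: {rdram:.1f}GB/s")
```

A wide, slow bus (8 bytes at 400 million transfers per second) and two narrow, fast channels (2 bytes at 800 million transfers per second each) land on the same peak, which is why the two can run in lockstep.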
SSE2 – The Pentium 4 improves upon the original 70 SSE instructions with 144 new SSE2 instructions; however, even under SPEC CPU2000, the performance improvement offered by SSE2 optimizations alone is supposedly only around 5%. With SPEC CPU2000 being a highly synthetic benchmark, it is unlikely that SSE2 will translate into any real-world performance gains in today’s applications. One thing that isn’t being taken into account here is SSE2’s ability to operate on two 64-bit values at once in both SIMD-Int and SIMD-FP (Single Instruction Multiple Data) operations. This ability isn’t being taken advantage of in SPEC CPU2000 and could prove to be one of SSE2’s greatest assets.
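Python has no SIMD, so the following only models the lane semantics: a 128-bit register is treated as 16 bytes holding two 64-bit lanes, and one "instruction" updates both lanes at once. The function names borrow the x86 mnemonics ADDPD (packed double add) and PADDQ (packed 64-bit integer add) for readability; they are plain Python functions, not the real instructions or their intrinsics.

```python
import struct

MASK64 = (1 << 64) - 1


def pack_pd(lo: float, hi: float) -> bytes:
    """Pack two doubles into 16 bytes, low lane first (little-endian, as on x86)."""
    return struct.pack("<2d", lo, hi)


def addpd(a: bytes, b: bytes) -> bytes:
    """Model of a packed-double add: both 64-bit FP lanes added by one operation."""
    a_lo, a_hi = struct.unpack("<2d", a)
    b_lo, b_hi = struct.unpack("<2d", b)
    return struct.pack("<2d", a_lo + b_lo, a_hi + b_hi)


def paddq(a: bytes, b: bytes) -> bytes:
    """Model of a packed 64-bit integer add: each lane wraps modulo 2**64."""
    a_lo, a_hi = struct.unpack("<2Q", a)
    b_lo, b_hi = struct.unpack("<2Q", b)
    return struct.pack("<2Q", (a_lo + b_lo) & MASK64, (a_hi + b_hi) & MASK64)


print(struct.unpack("<2d", addpd(pack_pd(1.5, 2.0), pack_pd(0.5, 3.0))))  # (2.0, 5.0)
wrap = paddq(struct.pack("<2Q", MASK64, 1), struct.pack("<2Q", 1, 1))
print(struct.unpack("<2Q", wrap))  # (0, 2)
```

The point of the model is the shape of the work: code that would otherwise issue two scalar 64-bit operations can, with SSE2, issue one.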