Updated CPU Cheatsheet - Seven Years of Covert CPU Operations
by Jarred Walton on August 28, 2004 9:00 AM EST - Posted in CPUs
Celeron, Pentium II and III Processors
I'm going to forgo listing the various models of these processors for the time being. If anyone has a real desire to see them listed, feel free to let me know. If you're still running one of these processors, I can feel for you. I still use one at work, and I only upgraded from my P3 back in March. Given the price of upgrading, though - $225 will get you a decent motherboard, 512 MB RAM, and an Athlon XP 2500+ - you really should upgrade if at all possible.
The old Pentium Pro P6 architecture was a 12-stage pipeline, more or less concluding with the Pentium III (more on that later). It had three specialized AGUs, two ALUs - one that handled simple instructions and a second one for the more complex instructions - and one FPU. The FPU also added support for SSE (which AMD lacked until the Athlon XP, but by then Intel was pushing the P4) and MMX - and they were generally faster on these instructions than AMD. That's not too surprising, considering that they created the technologies and AMD had to license them from Intel.
Intel could have certainly stuck with the design for a lot longer, as the last gasp Tualatin core offered pretty competitive performance clock for clock with the Athlon up to 1.4 GHz (the last Pentium III-S). In fact, the later 1.0A to 1.4A Celeron processors were very good overclocking chips, and a 1.1A running on a 133 MHz bus gave pretty decent performance. (I have just such a system powering my Home Theater PC.) Newer and better chipsets could have improved speed further, but Intel cut off the line and focused on pushing the Pentium 4 and NetBurst. This appears now to have been more of a marketing driven decision, although for the most part it can't be said that it was the worst idea ever.
Celeron 2 and Pentium 4 Processors
Pentium 4 and Celeron (Desktop)
Name | Clock (MHz) | Core | L2 Cache (KB) | Bus Speed (MHz)*** | Multiplier | Socket | L3 Cache (KB) |
P4 1.3 | 1300 | Willamette | 256 | 100 | 13.0X | 423 | |
C 1.7 | 1700 | Willamette | 128 | 100 | 17.0X | 478 | |
P4 1.4 | 1400 | Willamette | 256 | 100 | 14.0X | 423 | |
P4 1.4 | 1400 | Willamette | 256 | 100 | 14.0X | 478 | |
C 1.8 | 1800 | Willamette | 128 | 100 | 18.0X | 478 | |
P4 1.5 | 1500 | Willamette | 256 | 100 | 15.0X | 423 | |
P4 1.5 | 1500 | Willamette | 256 | 100 | 15.0X | 478 | |
C 2.0 | 2000 | Northwood | 128 | 100 | 20.0X | 478 | |
P4 1.6 | 1600 | Willamette | 256 | 100 | 16.0X | 423 | |
P4 1.6 | 1600 | Willamette | 256 | 100 | 16.0X | 478 | |
C 2.1 | 2100 | Northwood | 128 | 100 | 21.0X | 478 | |
P4 1.7 | 1700 | Willamette | 256 | 100 | 17.0X | 423 | |
P4 1.7 | 1700 | Willamette | 256 | 100 | 17.0X | 478 | |
C 2.2 | 2200 | Northwood | 128 | 100 | 22.0X | 478 | |
P4 1.6 | 1600 | Northwood | 512 | 100 | 16.0X | 478 | |
C 2.3 | 2300 | Northwood | 128 | 100 | 23.0X | 478 | |
C 2.4 | 2400 | Northwood | 128 | 100 | 24.0X | 478 | |
C 2.5 | 2500 | Northwood | 128 | 100 | 25.0X | 478 | |
P4 1.8 | 1800 | Northwood | 512 | 100 | 18.0X | 478 | |
C 2.6 | 2600 | Northwood | 128 | 100 | 26.0X | 478 | |
C 2.7 | 2700 | Northwood | 128 | 100 | 27.0X | 478 | |
C 2.8 | 2800 | Northwood | 128 | 100 | 28.0X | 478 | |
P4 2.0 | 2000 | Northwood | 512 | 100 | 20.0X | 478 | |
P4 2.2 | 2200 | Northwood | 512 | 100 | 22.0X | 478 | |
C D 320 | 2400 | Prescott | 256 | 133.3 | 18.0X | 478 | |
P4 2.4 | 2400 | Northwood | 512 | 100 | 24.0X | 478 | |
C D 325 | 2533 | Prescott | 256 | 133.3 | 19.0X | 478 | |
C D 325/J | 2533 | Prescott | 256 | 133.3 | 19.0X | T/775 | |
P4 2.26B | 2267 | Northwood | 512 | 133.3 | 17.0X | 478 | |
C D 330 | 2667 | Prescott | 256 | 133.3 | 20.0X | 478 | |
C D 330/J | 2667 | Prescott | 256 | 133.3 | 20.0X | T/775 | |
P4 2.4B | 2400 | Northwood | 512 | 133.3 | 18.0X | 478 | |
P4 2.6 | 2600 | Northwood | 512 | 100 | 26.0X | 478 | |
P4 2.4A* | 2400 | Prescott | 1024 | 133.3 | 18.0X | 478 | |
C D 335 | 2800 | Prescott | 256 | 133.3 | 21.0X | 478 | |
C D 335/J | 2800 | Prescott | 256 | 133.3 | 21.0X | T/775 | |
P4 2.53B | 2533 | Northwood | 512 | 133.3 | 19.0X | 478 | |
C D 340 | 2933 | Prescott | 256 | 133.3 | 22.0X | 478 | |
C D 340/J | 2933 | Prescott | 256 | 133.3 | 22.0X | T/775 | |
P4 2.4C | 2400 | Northwood | 512 | 200 | 12.0X | 478 | |
P4 2.66B | 2667 | Northwood | 512 | 133.3 | 20.0X | 478 | |
P4 2.8B | 2800 | Northwood | 512 | 133.3 | 21.0X | 478 | |
P4 2.6C | 2600 | Northwood | 512 | 200 | 13.0X | 478 | |
P4 2.8A* | 2800 | Prescott | 1024 | 133.3 | 21.0X | 478 | |
P4 2.8E | 2800 | Prescott | 1024 | 200 | 14.0X | 478 | |
P4 520/J | 2800 | Prescott | 1024 | 200 | 14.0X | T/775 | |
P4 3.06B HTT | 3067 | Northwood | 512 | 133.3 | 23.0X | 478 | |
P4 2.8C | 2800 | Northwood | 512 | 200 | 14.0X | 478 | |
P4 3.0E | 3000 | Prescott | 1024 | 200 | 15.0X | 478 | |
P4 530/J | 3000 | Prescott | 1024 | 200 | 15.0X | T/775 | |
P4 3.0C | 3000 | Northwood | 512 | 200 | 15.0X | 478 | |
P4 3.2E | 3200 | Prescott | 1024 | 200 | 16.0X | 478 | |
P4 540/J | 3200 | Prescott | 1024 | 200 | 16.0X | T/775 | |
P4 3.2C | 3200 | Northwood | 512 | 200 | 16.0X | 478 | |
P4 3.4E | 3400 | Prescott | 1024 | 200 | 17.0X | 478 | |
P4 550/J | 3400 | Prescott | 1024 | 200 | 17.0X | T/775 | |
P4 3.4C | 3400 | Northwood | 512 | 200 | 17.0X | 478 | |
P4 560/J | 3600 | Prescott | 1024 | 200 | 18.0X | T/775 | |
P4EE 3.2 | 3200 | Gallatin | 512 | 200 | 16.0X | 478 | 2048 |
P4 570J | 3800 | Prescott | 1024 | 200 | 19.0X | T/775 | |
P4EE 3.4 | 3400 | Gallatin | 512 | 200 | 17.0X | 478 | 2048 |
P4EE 3.4 | 3400 | Gallatin | 512 | 200 | 17.0X | T/775 | 2048 |
P4 580J | 4000 | Prescott | 1024 | 200 | 20.0X | T/775 | |
P4EE 3.46 | 3467 | Gallatin | 512 | 266 | 13.0X | T/775 | 2048 |
P4EE 3.73 | 3733 | Prescott | 2048 | 266 | 14.0X | T/775 | |
* Prescott 2.4A and 2.8A processors have HyperThreading Technology (HTT) disabled.
*** Front Side Bus (FSB) Speeds are "quad pumped", so Intel's FSB numbers are four times the actual bus speed on Pentium 4/Celeron, Pentium M/Celeron M, and Itanium processors. Multipliers are based off the base bus speed, not the FSB value.
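The relationship described in the footnote can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the 5 MHz rounding tolerance are my own assumptions, and the sample rows are copied from the desktop table above.

```python
# Illustrative sketch: how the table's base bus speed, multiplier, core clock,
# and marketed FSB rating relate. All names here are hypothetical.

def core_clock_mhz(base_bus_mhz: float, multiplier: float) -> float:
    """Core clock is the base bus speed times the multiplier."""
    return base_bus_mhz * multiplier


def marketed_fsb_mhz(base_bus_mhz: float, pump_factor: int = 4) -> float:
    """A quad-pumped bus is marketed at four times its base speed."""
    return base_bus_mhz * pump_factor


def row_is_consistent(listed_mhz: float, base_bus_mhz: float,
                      multiplier: float, tolerance_mhz: float = 5.0) -> bool:
    """Check that a table row's listed clock matches bus x multiplier.

    The tolerance (an assumption) absorbs rounding of 133.3 MHz buses.
    """
    return abs(core_clock_mhz(base_bus_mhz, multiplier) - listed_mhz) <= tolerance_mhz


# (name, listed clock, base bus, multiplier) taken from the table above
rows = [
    ("P4 2.4C", 2400, 200.0, 12.0),
    ("P4 3.06B HTT", 3067, 133.3, 23.0),
    ("C D 330", 2667, 133.3, 20.0),
]

for name, listed, bus, mult in rows:
    print(name, "FSB rating:", marketed_fsb_mhz(bus),
          "consistent:", row_is_consistent(listed, bus, mult))
```

A check like this is also a quick way to catch cut-and-paste mistakes in a chart: a row listing 3800 MHz next to a 200 MHz bus and a 16.0X multiplier would fail it.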
NetBurst consists of a deep 20-stage pipeline coupled to an 8-stage fetch/decode unit. Due to the time spent fetching and decoding instructions, Intel created a new type of cache called a trace cache. This contained pre-decoded micro-ops, so for a large percentage of instructions, NetBurst runs as a 20-stage pipeline. Certain types of code run very well on NetBurst, while others - specifically branch-heavy code, like that seen in compilers and some games - do not. An incorrect branch prediction on P4 costs about twice as many lost cycles as an incorrect branch prediction on P3 or Athlon, which is why Intel added a more robust branch prediction unit.
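To see why the doubled misprediction penalty matters, here is a rough back-of-the-envelope model in Python. Everything numeric in it is an assumption chosen for illustration (a base of 1.0 cycles per instruction, 20 percent branches, a 5 percent misprediction rate, and penalties of 10 versus 20 cycles); the only thing taken from the text above is that the P4 penalty is about twice that of the P3 or Athlon.

```python
# Rough sketch of how branch misprediction penalties inflate cycles per
# instruction (CPI). All workload numbers are illustrative assumptions.

def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Base CPI plus the average stall cycles added per instruction."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


BASE_CPI = 1.0          # assumed
BRANCH_FRACTION = 0.20  # assumed share of instructions that are branches
MISPREDICT_RATE = 0.05  # assumed share of branches predicted incorrectly

short_pipe = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 10)
long_pipe = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)

# Halving the misprediction rate cancels out the doubled penalty.
long_pipe_better_predictor = effective_cpi(
    BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE / 2, 20)

print(round(short_pipe, 3), round(long_pipe, 3),
      round(long_pipe_better_predictor, 3))
```

Under these assumptions, the longer pipeline needs a predictor that is wrong half as often just to break even per clock, which is the motivation for the more robust branch prediction unit.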
The long pipeline allowed clockspeeds to scale very quickly with NetBurst. It was also a bandwidth hungry design, so increasing bus speeds combined with dual-channel memory eventually pushed the P4 beyond the reach of the Athlon XP. On the server front with the Xeon processors, the bandwidth was provided by adding L3 cache.
The Prescott further extended the NetBurst pipeline to 23 stages in addition to the 8 fetch/decode stages. For whatever reason, Intel generally describes the pipeline of the Prescott as 31 stages while only calling the earlier design a 20 stage pipeline. Besides the additional stages, Prescott doubled the L2 cache of the Northwood, added SSE3 support, and to the best of my knowledge contains deactivated x86-64 support - called EM64T by Intel and AMD64 by its creator AMD. Xeon versions of Prescott with the 64-bit support enabled are now shipping, and likely by the time XP-64 is released we will see 64-bit enabled desktop processors.
The Pentium 4 architecture also saw the introduction of Simultaneous Multi-Threading (SMT) for Intel processors - they chose to call it Hyper-Threading Technology (HTT). It appears to have been a part of the core from the very beginning, but Intel didn't enable the functionality until the P4 3.06 was launched, at which time it became available in the Xeon platforms as well. Later, it was enabled in all the 800 FSB "C" processors. Due to the length of the P4 pipeline, HTT allows the execution units to stay busy in the event of an incorrect branch prediction. The second thread can continue to run while the other thread recovers. In an ideal scenario, HTT could potentially increase performance by 20 or even 50 percent. In real world tests, however, rarely does it improve performance by more than 5 to 10 percent, and there are even times when it hurts performance.
With the switch to socket 775 LGA, Intel has also adopted model names. This likely has something to do with the recent difficulties Intel has encountered in scaling the NetBurst architecture to higher speeds. However, an even bigger problem is Intel's own Pentium M architecture (which is the next section). Anyway, we now have new model numbers which are supposed to reflect the overall capabilities of the chip, with higher numbers indicating more desirable chips. Comparing between families of chips should not be done based solely off the model number, however - there will certainly be instances where a 5xx chip offers better performance than a 7xx chip, and perhaps we'll also see some 3xx chips outperform their "superiors". For the time being, all of the 5xx chips are Prescott cores with 1 MB of L2 cache and an 800 MHz FSB. Future processors are also listed, and you can see where they will likely fall in the performance spectrum.
Mobile Celeron, Mobile P4, Celeron M and Pentium M Processors
Mobile Pentium/Celeron Chips**
Name | Clock (MHz) | Core | L2 Cache (KB) | Bus Speed (MHz)*** | Multiplier | Socket | L3 Cache (KB) |
MC 1.4 | 1400 | Willamette | 128 | 100 | 14.0X | 478M | |
MC 1.5 | 1500 | Willamette | 128 | 100 | 15.0X | 478M | |
MC 1.6 | 1600 | Willamette | 128 | 100 | 16.0X | 478M | |
MC 1.7 | 1700 | Willamette | 128 | 100 | 17.0X | 478M | |
MC 1.4 | 1400 | Northwood | 256 | 100 | 14.0X | 478M | |
MC 1.8 | 1800 | Willamette | 128 | 100 | 18.0X | 478M | |
MC 1.5 | 1500 | Northwood | 256 | 100 | 15.0X | 478M | |
MC 2.0 | 2000 | Willamette | 128 | 100 | 20.0X | 478M | |
MC 1.6 | 1600 | Northwood | 256 | 100 | 16.0X | 478M | |
CM 353/J | 900 | Dothan | 1024 | 100 | 9.0X | 478M | |
MC 2.1 | 2100 | Willamette | 128 | 100 | 21.0X | 478M | |
CM 333 | 900 | Banias | 1024 | 100 | 9.0X | 478M | |
PM 900 (ULV) | 900 | Banias | 1024 | 100 | 9.0X | 478M | |
MC 1.7 | 1700 | Northwood | 256 | 100 | 17.0X | 478M | |
MC 2.2 | 2200 | Willamette | 128 | 100 | 22.0X | 478M | |
MC 1.8 | 1800 | Northwood | 256 | 100 | 18.0X | 478M | |
CM 373J | 1000 | Dothan | 1024 | 100 | 10.0X | 478M | |
MC 2.3 | 2300 | Willamette | 128 | 100 | 23.0X | 478M | |
PM 1.0 (ULV) | 1000 | Banias | 1024 | 100 | 10.0X | 478M | |
MC 2.4 | 2400 | Willamette | 128 | 100 | 24.0X | 478M | |
MC 2.0 | 2000 | Northwood | 256 | 100 | 20.0X | 478M | |
PM 723/J (ULV) | 1000 | Dothan | 2048 | 100 | 10.0X | 478M | |
PM 1.1 (LV) | 1100 | Banias | 1024 | 100 | 11.0X | 478M | |
CM 350/J | 1300 | Dothan | 512 | 100 | 13.0X | 478M | |
MC 2.2 | 2200 | Northwood | 256 | 100 | 22.0X | 478M | |
PM 1.2 (LV) | 1200 | Banias | 1024 | 100 | 12.0X | 478M | |
CM 320 | 1300 | Banias | 512 | 100 | 13.0X | 478M | |
MC 2.4 | 2400 | Northwood | 256 | 100 | 24.0X | 478M | |
PM 1.3 | 1300 | Banias | 1024 | 100 | 13.0X | 478M | |
PM 718 (LV) | 1300 | Banias | 1024 | 100 | 13.0X | 478M | |
CM 330 | 1400 | Banias | 512 | 100 | 14.0X | 478M | |
MC 2.5 | 2500 | Northwood | 256 | 100 | 25.0X | 478M | |
CM 360/J | 1400 | Dothan | 1024 | 100 | 14.0X | 478M | |
MC 2.6 | 2600 | Northwood | 256 | 100 | 26.0X | 478M | |
CM 340 | 1500 | Banias | 512 | 100 | 15.0X | 478M | |
PM 1.4 | 1400 | Banias | 1024 | 100 | 14.0X | 478M | |
PM 713 (ULV) | 1400 | Banias | 1024 | 100 | 14.0X | 478M | |
MC 2.7 | 2700 | Northwood | 256 | 100 | 27.0X | 478M | |
CM 370J | 1500 | Dothan | 1024 | 100 | 15.0X | 478M | |
MC D 325 | 2533 | Prescott | 256 | 133.3 | 19.0X | T/775 | |
MC 2.8 | 2800 | Northwood | 256 | 100 | 28.0X | 478M | |
PM 1.5 | 1500 | Banias | 1024 | 100 | 15.0X | 478M | |
PM 705 | 1500 | Banias | 1024 | 100 | 15.0X | 478M | |
PM 733/J (ULV) | 1400 | Dothan | 2048 | 100 | 14.0X | 478M | |
PM 738/J (LV) | 1400 | Dothan | 2048 | 100 | 14.0X | 478M | |
MC D 330 | 2667 | Prescott | 256 | 133.3 | 20.0X | T/775 | |
MC D 335 | 2800 | Prescott | 256 | 133.3 | 21.0X | T/775 | |
PM 1.6 | 1600 | Banias | 1024 | 100 | 16.0X | 478M | |
PM 715 | 1500 | Dothan | 2048 | 100 | 15.0X | 478M | |
PM 758J (LV) | 1500 | Dothan | 2048 | 100 | 15.0X | 478M | |
MC D 340 | 2933 | Prescott | 256 | 133.3 | 22.0X | T/775 | |
PM 1.7 | 1700 | Banias | 1024 | 100 | 17.0X | 478M | |
MC D 345 | 3066 | Prescott | 256 | 133.3 | 23.0X | T/775 | |
MP4 2.8 | 2800 | Northwood | 512 | 133.3 | 21.0X | 478M | |
MP4 2.8 HT | 2800 | Northwood | 512 | 133.3 | 21.0X | 478M | |
PM 735 | 1700 | Dothan | 2048 | 100 | 17.0X | 478M | |
MC D 350 | 3200 | Prescott | 256 | 133.3 | 24.0X | T/775 | |
PM 730/J | 1600 | Dothan | 2048 | 133.3 | 12.0X | 478M | |
MP4 518 | 2800 | Prescott | 1024 | 133.3 | 21.0X | ?478M | |
PM 745 | 1800 | Dothan | 2048 | 100 | 18.0X | 478M | |
PM 753J (ULV) | 1800 | Dothan | 2048 | 100 | 18.0X | 478M | |
MP4 3.0 | 3000 | Northwood | 512 | 133.3 | 22.5X | 478M | |
MP4 3.0 HT | 3000 | Northwood | 512 | 133.3 | 22.5X | 478M | |
PM 740/J | 1733 | Dothan | 2048 | 133.3 | 13.0X | 478M | |
MP4 532 | 3067 | Prescott | 1024 | 133.3 | 23.0X | ?478M | |
MP4 3.2 HT | 3200 | Northwood | 512 | 133.3 | 24.0X | 478M | |
MP4 538 | 3200 | Prescott | 1024 | 133.3 | 24.0X | ?478M | |
PM 750/J | 1867 | Dothan | 2048 | 133.3 | 14.0X | 478M | |
PM 755 | 2000 | Dothan | 2048 | 100 | 20.0X | 478M | |
PM 760/J | 2000 | Dothan | 2048 | 133.3 | 15.0X | 478M | |
MP4 552 | 3467 | Prescott | 1024 | 133.3 | 26.0X | ?478M | |
MP4 558 | 3600 | Prescott | 1024 | 133.3 | 27.0X | ?478M | |
PM 770/J | 2133 | Dothan | 2048 | 133.3 | 16.0X | 478M | |
PM 765 | 2400 | Dothan | 2048 | 100 | 24.0X | 478M | |
** There are several chips in the mobile sector. PM is for Pentium M, MP4 is the Mobile Pentium 4, CM is the Celeron M, and MC is the Mobile Celeron (P4 core).
*** Front Side Bus (FSB) Speeds are "quad pumped", so Intel's FSB numbers are four times the actual bus speed on Pentium 4/Celeron, Pentium M/Celeron M, and Itanium processors. Multipliers are based off the base bus speed, not the FSB value.
With the scaling clock speeds of the Pentium 4, not even the specially designed Mobile versions were really suited for use in laptops. (Of course, they were still used, but Intel had other plans.) Higher clockspeeds mean higher power requirements as well as increased heat output, which makes it very difficult to get increased battery life. In response to pressure from companies such as Transmeta, Intel commissioned a design team in Israel to put together a high-performance, low-power processor. The end result was the Pentium M. Where the push for high clockspeeds was the driving force behind the NetBurst design, Pentium M was targeted at reaching specific thermal requirements. While specific details are rather hard to come by, since Intel is trying to protect its lead in the Mobile space, the Pentium M appears to be a modified version of the venerable P6 architecture.
One of the improvements made to the P6 architecture was a large L2 cache, which could be powered and accessed in 32K sections. This allows large portions of the cache to be in a low-power "sleep" mode at any given time, so they get the performance benefit of a large cache without incurring as much of the usual power increase. The L1 cache was also doubled from the PIII to 32K+32K data and instruction. Floating point performance was increased with the doubling of MMX/SSE units - although this really only helped with SSE optimized code - and there were a few other architectural changes. Overall, the Pentium M is able to provide performance that's roughly the equivalent of an Athlon processor of the same clock speed, while requiring much less power. Battery life in laptops that use the Pentium M can often be 25 to 50 percent longer than equivalent laptops that use the Mobile Pentium 4, Mobile Celeron or Mobile Athlon XP chips.
The length of the above chart should be an indication of how big the mobile market has become. One of the reasons for this increase in size is likely the cut-throat conditions that exist in the desktop CPU market. Intel charges a hefty premium for most of their mobile processors since, generally speaking, anyone looking for a high-performance laptop has more money to burn. This is what I call the "mobility tax": you should only buy a laptop if portability is a primary concern; otherwise, your money will go a lot further with a desktop system. Certainly, business types that use computers for presentations and work on the road will be willing to pay this so-called tax.
With the release of the Dothan core Pentium M chips, Intel has also switched to model numbers. Here, however, there are many factors that influence the overall number. Ultra-Low Voltage processors running at lower clock speeds can end up rated higher than faster processors that require more power. This is supposed to reflect the relative desirability of certain features, as an increased battery life could be more important to some people than raw performance. Of course, Intel specifically states that the model numbers are not measures of performance, but only the technically literate are likely to know this. In their own words: "Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details."
Itanium and Itanium 2 Processors
Itanium (Server)
Name | Clock (MHz) | Core | L2 Cache (KB) | Bus Speed (MHz)*** | Multiplier | Socket | L3 Cache (KB) |
Itanium | 733 | Merced | 96 | 66 | 11.0X | PAC-418 | 2048 |
Itanium | 733 | Merced | 96 | 66 | 11.0X | PAC-418 | 4096 |
Itanium | 800 | Merced | 96 | 66 | 12.0X | PAC-418 | 2048 |
Itanium | 800 | Merced | 96 | 66 | 12.0X | PAC-418 | 4096 |
Itanium 2 | 900 | McKinley | 256 | 100 | 9.0X | PAC-611 | 1536 |
Itanium 2 | 900 | McKinley | 256 | 100 | 9.0X | PAC-611 | 3072 |
Itanium 2 | 1000 | McKinley | 256 | 100 | 10.0X | PAC-611 | 1536 |
Itanium 2 | 1000 | McKinley | 256 | 100 | 10.0X | PAC-611 | 3072 |
Itanium 2 LV | 1000 | Deerfield | 256 | 100 | 10.0X | PAC-611 | 1536 |
Itanium 2 LV | 1500 | Deerfield | 256 | 100 | 15.0X | PAC-611 | 1536 |
Itanium 2 | 1300 | Madison | 256 | 100 | 13.0X | PAC-611 | 3072 |
Itanium 2 | 1400 | Madison | 256 | 100 | 14.0X | PAC-611 | 4096 |
Itanium 2 | 1500 | Madison | 256 | 100 | 15.0X | PAC-611 | 6144 |
*** Front Side Bus (FSB) Speeds are "quad pumped", so Intel's FSB numbers are four times the actual bus speed on Pentium 4/Celeron, Pentium M/Celeron M, and Itanium processors. Multipliers are based off the base bus speed, not the FSB value.
Itanium processors are probably among the CPUs least understood by computer enthusiasts. Given that the cheapest models still cost over $1000, that's not really surprising. These processors are meant to target the high-end corporate world. They are often used in massively parallel processing situations, and Itaniums are capable of working in up to 512-way SMP systems. Of course, that doesn't really explain what the Itanium is.
For starters, Itanium is the way that Intel envisioned 64-bit computing, and it is built on a new instruction set dubbed IA-64 (Intel Architecture 64). IA-64 was a clean break from x86 legacy code, and it was designed for the future. Really, its competition isn't the Xeon or Opteron CPUs, although some mistakenly compare it with these processors. Itanium is meant to compete in the high-end corporate 64-bit computing world, going up against servers based on the IBM Power4/5, HP PA-RISC, Sun UltraSparc-III, and DEC Alpha. If none of those names ring a bell, that's not very surprising. The quad-processor IBM Power4 system that was used as the main server at a company I worked for (and they had two units for redundancy) cost somewhere in the neighborhood of $500,000, and the RAID-5 array that provided data storage was another $500,000. Perhaps more important than the hardware was the service contract with IBM that helped guarantee everything stayed running. The cost of the support contract with IBM (for dozens of such setups) was supposedly around $300 million a year!
The Alpha technology, interestingly enough, was purchased by Compaq, who merged with HP, and HP worked with Intel on the design of the Itanium, with the intention of using it in place of PA-RISC once it was complete. I believe that some (all?) of the Alpha technology was later transferred to Intel, most likely for use in furthering the design of the Itanium processors. Compaq/HP has continued to support this chip for the past several years, but they haven't invested a lot of money into researching new iterations of the design. This makes sense, since HP is encouraging its enterprise customers to switch to their Itanium platforms. Recently, HP announced that the 1.3 GHz (I think that was the speed) EV7 version of the Alpha chip will be the last.
These systems are often referred to as "Big Tin" systems, and they're in a league of their own. They are frequently used in systems that process huge amounts of data - their 64-bit addressing allows the use of many gigabytes of physical RAM - and they are usually optimized for input/output functions. Of course, reliability and up-time are far more important than actual performance numbers, and often once a system has been built around a specific architecture, large corporations will stick with that hardware unless there is tremendous incentive to switch to something else. Switching usually consists of several years of coding, testing, debugging, and validation - a task not to be undertaken lightly, to be sure.
For the processor design, Intel continued with their radical departure from accepted norms. Instead of a RISC or CISC approach, Intel went back to a technology that had been used in old mainframes and other computers of yore, VLIW (Very Long Instruction Word). Itanium is not a strict VLIW machine, though, as VLIW has some well known drawbacks that Intel worked to overcome, and Intel chose to call their new approach EPIC, "Explicitly Parallel Instruction Computing". In contrast to designs such as the Xeon and Opteron, which can issue up to three instructions per cycle, the Itanium 2 (forget Itanium 1 for a minute) can issue eight instructions per clock, and unlike VLIW designs, future Itanium chips could further increase the issue width without needing to recompile the code. In theory, then, a 1 GHz Itanium chip could perform roughly as fast as a 2.66 GHz Xeon/Opteron, or the 1.5 GHz Itanium 2 would be roughly as fast as a 4 GHz Xeon/Opteron. That's just theoretical performance, of course, and the overall system design will play a large role in determining how much of the potential of any system is actually realized.
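The theoretical comparison above is just a ratio of peak issue rates, which a short Python sketch makes explicit. The function name is my own, and the model deliberately assumes every issue slot is filled on every clock, which no real workload achieves.

```python
# Sketch of the "theoretical equivalence" arithmetic: scale the clock by the
# ratio of issue widths. Assumes perfect utilization of every issue slot.

def equivalent_clock_ghz(clock_ghz: float, issue_width: int,
                         reference_issue_width: int) -> float:
    """Clock a narrower design would need to match peak instructions/second."""
    return clock_ghz * issue_width / reference_issue_width


ITANIUM2_ISSUE_WIDTH = 8   # per the text above
XEON_OPTERON_ISSUE_WIDTH = 3  # per the text above

for clock in (1.0, 1.5):
    equivalent = equivalent_clock_ghz(
        clock, ITANIUM2_ISSUE_WIDTH, XEON_OPTERON_ISSUE_WIDTH)
    print(f"{clock} GHz Itanium 2 ~ {equivalent:.2f} GHz Xeon/Opteron (peak)")
```

The 8/3 ratio is where the 2.66 GHz and 4 GHz figures come from; anything that keeps issue slots empty, such as a compiler that cannot find enough parallelism, shrinks the real-world ratio.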
To help reach that potential, Itanium chips run off a 128-bit quad-pumped system bus, using standard SDRAM (for the time being). The lower clock speeds combined with the wider bus make the SDRAM less of an issue than with high-speed desktop systems. The initial Itanium design, Merced, had four integer units (ALUs), two floating point units (FPUs), three branch units (BRUs), two SIMD (i.e. MMX/SSE) units, and two load/store units - also called address generation units (AGUs) in other CPUs. The modified McKinley (and later) designs have six ALUs, three BRUs, two FPUs, one SIMD, two load units, and two store units - sort of like having 4 AGUs, except that they're more specialized. In addition, the McKinley has roughly three times the cache bandwidth of Merced. Merced was also a six issue design with a deeper pipeline (10 stages) and less memory bandwidth - a rather problematic design. McKinley and later designs are eight issue designs with shorter pipelines (8 stages) and more memory bandwidth. While Merced rarely made full use of its six issue design, McKinley's enhancements help it come a lot closer to issuing the maximum eight instructions per clock.
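For a sense of scale, the peak bandwidth of the system bus can be estimated from the figures given here. The sketch below takes the 128-bit width and quad-pumped description from the text and the 100 MHz base speed from the Itanium 2 rows of the table at face value; the function name is hypothetical, and the result is a theoretical ceiling rather than a measured number.

```python
# Sketch: theoretical peak bandwidth of a pumped system bus.
# Inputs are taken at face value from the text and table above.

def peak_bus_bandwidth_gbs(bus_width_bits: int, base_bus_mhz: float,
                           transfers_per_clock: int) -> float:
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = base_bus_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Itanium 2: 128-bit bus, 100 MHz base speed, quad pumped
itanium2_peak = peak_bus_bandwidth_gbs(128, 100.0, 4)
print(f"Itanium 2 peak bus bandwidth: {itanium2_peak:.1f} GB/s")
```

Widening the bus is what lets a relatively slow base clock still move a lot of data, which is why the SDRAM is less of a bottleneck here than on a desktop board.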
That doesn't really tell a whole lot about the architecture, and I don't really want to go much deeper than that right now. Suffice it to say that Itanium depends in large part on compiler technology in order to reach its potential, and Intel has apparently had more difficulties in that area than they initially anticipated, but lately this seems to be less of a problem. The initial Merced design was also flawed, if you couldn't tell from my above description of the architecture, but Itanium 2 goes a long way toward rectifying the problems.
Many have called the Itanium a failure - coming up with such names as Itanic to describe the processor - especially now that AMD has launched Opteron and Intel is following suit with x86-64 support. However, the two really have very different goals, and in the target market segment, Itanium is still managing to compete. Needless to say, it helps that Intel has very deep pockets thanks to the income generated from their desktop and mobile processor divisions. Itanium may or may not live in the long term, but short term Intel has plans to keep it around at least another three or four years, and they will likely keep it around longer to support existing clients. Honestly, though, I doubt any of us will ever be running an IA64 processor on our desktop systems.
74 Comments
JarredWalton - Monday, August 30, 2004 - link
#50 - Good catch. Obviously, there was some cutting and pasting involved. At some point, I corrected all of the names, but missed some of the clock speeds (at least on the Intel charts).
#53 - Yes, you are correct. Someone corrected me before, but I didn't change both AMD charts. The Clawhammer supposedly does not have all three HyperTransport paths, so the FX would have to use the Sledgehammer core. It's just a little odd trying to figure out what AMD is doing on those cores. If it were Intel, every core version (i.e. different cache size, different memory controller, different socket) would probably get its own name. :)
OC DETECTIVE - Monday, August 30, 2004 - link
Actually #25's assertion that the FX 939 is a Clawhammer is incorrect. See details of correspondence with AMD's technical dept. over here - it is a Sledgehammer!
http://www.xtremesystems.org/forums/showthread.php...
Pumpkinierre - Sunday, August 29, 2004 - link
#49 There was a post not so long back that had the Prescott pipeline at 22 stages. But your information is right at launch. I just wonder how valid all this pipeline model is or whether the processor takes what it needs for the task required.
karlreading - Sunday, August 29, 2004 - link
very informative article, very handy when talking hardware!!!
heintjeput2 - Sunday, August 29, 2004 - link
I found a few things that are probably wrong:
P4 2.2 2800 Northwood 512 100 28.0X 478
should be:
P4 2.2 2200 Northwood 512 100 22.0X 478
and:
P4 3.2E 3800 Prescott 1024 200 19.0X 478
should be:
P4 3.2E 3200 Prescott 1024 200 16.0X 478
P4 540/J 3800 Prescott 1024 200 19.0X T/775
should be:
P4 540/J 3200 Prescott 1024 200 16.0X T/775
P4 3.2C 3800 Northwood 512 200 19.0X 478
>>
P4 3.2C 3200 Northwood 512 200 16.0X 478
P4EE 3.2 3800 Gallatin 512 200 19.0X 478 2048
>>
P4EE 3.2 3200 Gallatin 512 200 16.0X 478 2048
PM 1.2 (LV) 1800 Banias 1024 100 18.0X 478M
>>
PM 1.2 (LV) 1200 Banias 1024 100 12.0X 478M ??
MP4 3.2 HT 3800 Northwood 512 133 28.5X 478M
>>
MP4 3.2 HT 3200 Northwood 512 133 25.5X 478M
Athlon XP-M 2600+ 1933 Barton 512 133.3 14.5X
>>
Athlon XP-M 2600+ 2000 Barton 512 133.3 15.0X
Sempron 3100+ 1800 Paris** 256 200 9.0X 754
>>
Sempron 3100+ 1800 Paris* 256 200 9.0X 754
add:
Athlon XP-M 2400+ (ULV) 1800 Barton 512 133.3 13.5X
Athlon XP-M 2400+ (LV) 1800 Barton 512 133.3 13.5X
Athlon XP-M 2500+ (LV) 1867 Barton 512 133.3 14.0X
Athlon XP-M 2600+ (LV) 2000 Barton 512 133.3 15.0X
IntelUser2000 - Sunday, August 29, 2004 - link
I don't understand why people don't look up at Anandtech's old articles for information (or at least don't seem to).
Take a look at the Pentium 4 Willamette article that states 10-stage pipeline for Pentium III and 20-stage pipeline for Pentium 4. I believe the most common figures are the Integer pipelines not including fetch/decode stages (according to your article anyway).
Link to article: http://www.anandtech.com/cpuchipsets/showdoc.aspx?...
Also why does it say Prescott have 23 stage pipelines?
"The Prescott further extended the NetBurst pipeline to 23 stages in addition to the 8 fetch/decode stages. For whatever reason, Intel generally describes the pipeline of the Prescott as 31 stages while only calling the earlier design a 20 stage pipeline."
JarredWalton - Sunday, August 29, 2004 - link
47 - Somehow I screwed that up in the update. Sorry. The 133 MHz bus (533 FSB) Xeon chips run in socket 604, so the two later Prestonia core Xeons are socket 604 parts. As far as I know, all the Gallatin Xeon cores are still socket 603.
Marlin1975 - Saturday, August 28, 2004 - link
ALL the P4 Xeons are listed at socket 603. I know the later and even current ones are now 604.
Zebo - Saturday, August 28, 2004 - link
One of the best guides I even read thanks I learned a lot.:)
JarredWalton - Saturday, August 28, 2004 - link
Not like anyone is going to notice anymore (*wink*), but the article has now been updated with all of the corrections as well as additional commentary. I hope this clarifies a few things. If there are still errors, send them my way!