CHAPTER 1: The brakes on CPU power

CPU Performance increase hits the brakes.

The growth rate of CPU performance has been spectacular in the past decades. Two legends of computing history, John.L Hennessy and David A. Patterson, have quantified this performance growth to be about 58% per year.

A recent study by the University of göteborg [1] confirmed that the 58% number was true between 1985 and 1996. During the last 7.5 years (1996-2004), the Swedish professors proved that the performance growth has slowed down to an average of 41% per year. Even worse is the conclusion that "there are signs of a continuing decline".

When we focus on Intel's CPUs, the deterioration of CPU performance growth is almost spelling doom. In November 2002, Intel was well ahead of the competition with the introduction of a 3.06 GHz Pentium 4. Intel had doubled the clock speed of its latest x86 architecture within two years, which was quite an accomplishment.

Two and half years later, Intel's Pentium 4 is running at 3.8 GHz, which means that clock speed has increased by only 25%. Of course, we all know that performance does not scale linearly with clock speed. So, let us talk performance.

 CPU  SpecInt2000  SpecFp2000
Pentium 4 3800E 1666 1839
Pentium 4 3060 1167 1096
Pentium 4 1500 560 634

From 2000 to 2002, performance increased by 108%. In the following 3 years, Intel's latest CPU only increased integer performance by 43%. The same does not hold true for SpecFP2000, as the 3.8 GHz Prescott CPU had improved performance by 68%, while the 3.06 GHz was about 73% faster than the first incarnation of the Netburst architecture.

However, SpecFP2000 remains a "special" benchmark, which exaggerates greatly the importance of memory bandwidth as very few other FPU applications behave the same way. The 800 MHz FSB of the 3.8 GHz is 50% faster than the bus to Intel's first Hyperthreaded CPU (3.06 GHz), while the FSB of the latter has only a 33% advantage over the older 1.5 GHz Pentium 4.

Intel's compilers have also improved vastly over the past years, which is positive. However, they have also become better in using special tricks (strip-mining optimizations, for example) to artificially improve the Spec score; tricks that are not usable by developers who need to get real applications to the market. Don't take my word for it, but make sure to read Tim Sweeney's comments in the next article.

These advantages are the main reasons why SpecFP doesn't tell us what most applications do: the pace of CPU performance growth has slowed down significantly, even in FP intensive workloads. Applications such as 3DSMax, Lightwave, Adobe Premiere, video encoding and others show, on average, that the Pentium 4 3.8 GHz is about 20-45% faster than the Pentium 4 3.06 GHz, while the latter is easily between 60% and 90% faster than our 1.5 GHz reference point.

Demystifying the slowdown

It is no mystery that the three main reasons why CPU progress is slowing down are:

  • Total dissipated power
  • Wire Delay
  • "The memory wall"

However, simply stating that these three problems are the reason why it is getting very hard to design CPUs that perform better is an oversimplification. There are decent solutions for each of these problems, and the real reason why they have slowed down CPU progress is more subtle.

We are going to cover the memory wall in more detail later. Suffice it to say, it is well known that DRAM speeds up by about 10% per year, while CPUs run 40% to 60% faster each year.

Power problems

In order to understand power problems, you have to understand the following formula, which describes switching power:

Power ~ ½ CV ² Af

In other words, dissipated power is linear with the effective capacitance, activity and frequency. Power increases quadratically with the CPU's core voltage. Activity is the factor that is influenced by the software you run; the more intensive the software, the higher the amount of the time that the transistors are active.

With each major transition to a new process technology that has a reduction in transistor feature size of 2, the same die area becomes 4 times smaller. For example, Willamette (introduced with 180 nm technology) would have been more or less 4 times smaller using the 90 nm technology. That is simplified of course, but it shows that the die gets smaller and smaller. Now that should not be such a problem as Vdd (Vcore) can also be reduced, and as a result, you can reduce power by a factor of two or even more. Of course, as CPUs extract more ILP and have deeper pipelines, they become more complex and use more transistors. The result is that the power reductions of decreasing Vdd are negated by the increasing amount of transistors.

And there are limitations of the amount of power that you can dissipate through a shrinking die area. But switching power is not the worst problem, as it can be reduced by applying a few clever techniques.

One of them is clock gating, a power-saving technique implemented extensively in the Pentium 4. Clock gating logic will only activate the clocks in a Functional Unit Block (FUB) when it needs to work. Together with other power-saving techniques, switching or dynamic power is more or less under control; over time, it increases linearly, while the amount of transistors used is increasing exponentially.


Index CHAPTER 1 (con't)
Comments Locked

65 Comments

View All Comments

  • Momental - Wednesday, February 9, 2005 - link

    #41, I understood what he meant when he stated that AMD could only be so lucky to have something which was a technological failure, ie: Prescott, sell as well as it has. Even the article clearly summarizes that Prescott in and of itself isn't a piece of junk per se, only that is has no more room for evolution as Intel originally had hoped.

    #36 wasn't saying that it was a flop sales-wise, quite the contrary. The thing has sold like hotcakes!

    I, like many others here, literally got dizzy as I struggled to keep up with all of the technical terminology and mathmetical formulas. My brain is, as of this moment, threatening to strike if I don't get it a better health and retirement plan along with a shorter work week. ;)
  • Ivo - Wednesday, February 9, 2005 - link

    1. About the multiprocessing: Of coarse, there are many (important!) applications, which are more than satisfied with the existing mono-CPU performance. Some other will benefit from dual CPUs. Matrix 2CPU+2GPU combinations could be essential e.g. for stereo-visualization. Probably, desktop machines with enhanced voice/image analytical capabilities could require even more sophisticated CPU Matrices. I suppose, the mono- and multi-CPU solutions will coexist in the near future.

    2. About the leakage problem: New materials like SOI are part of the solution. Another part are the new techniques. Let us take a lesson from the nature: our blood-transportation system consists of tiny capillaries and much thicker arteries. Maybe it could make sense to combine 65 nm transistors e.g. in the cash memory and 90 nm transistors in the ALU?
  • Noli - Wednesday, February 9, 2005 - link

    "Netburst architecture is very innovative and even genial"

    genius-like?
    If by genial you mean 'having a pleasant or friendly disposition', it sounds weird. It can mean 'conducive to growth' in this context but that's not so intuitive because a) it wasn't and b) at best it was only theoretically genial.

    Presumably it's not genial as in 'of or relating to the chin' :)

    Agree monolithic was confusing but it was the intel dude who said it - I thought it meant 'large single unit' rather than 'old (as in technology)' as in: increasing processing power by increasing the size and complexity of a single core is now not as efficient as strapping two cores together - a duallithic unit :)

    Sorry to be a pedantic twat.
  • Xentropy - Wednesday, February 9, 2005 - link

    Some of the verbage in that final chapter makes me wonder how much better Prescott might have done if Intel had just left out everything 64-bit and developed an entirely different processor for 64-bit. Especially since we won't have a mainstream OS that'll even utilize those instructions for another few months, and it's already been about a year since release, they could have easily gotten away with putting 64-bit off for the next project. It's pretty obvious by now even the 32-bit Prescotts have those 64-bit transistors sitting around. Even if not active, they aren't exactly contributing to the power efficiency of the processor.

    I think one big reason Intel thinks dual core will be the savior of even the Prescott line is supposedly dual cores running at 3Ghz only require equivalent power draw to a single core at 3.6Ghz and should be just as fast in some situations (multitasking, at least). Dual core at 85% clockspeed will be slower for gaming, though, so dual core Prescott still won't close the gap with AMD for gaming enthusiasts (98% of this site's readership), and may even represent an even further drop in performance per watt. Here's hoping for Pentium-M on the desktop. :>
  • piroroadkill - Wednesday, February 9, 2005 - link

    #36 -- You really didn't read the article and get the point of it. It wasn't a failure from a sales point of view, and this article was not written from a sales point of view, but a technical point of view, and how the Prescott helped in furthering CPU technology.

    Thus, a failure.
  • ViRGE - Wednesday, February 9, 2005 - link

    Although I think I sank more than I swam, that was a very good and informative article Johan. I just have one request for a future article since I'm guessing the next one is on multi-core tech: will someone at AT run the full AT benchmark suite against a SMP Xeon machine so that we can get a good idea ahead of time what dual-core performance will be like against single core? My understanding is that the Smithfields aren't going to be doing much else new besides putting 2 cores on one die(i.e. no cache sharing or other new tech), so SMP benchmarks should be fairly close to dual-core benchmarks.
  • Griswold - Wednesday, February 9, 2005 - link

    Point and case as to why the marketing department is the most important (and powerful) part of any highly successful company. It's not the R&D labs who tell you what works and what comes next, it's the PR team.
  • quidpro - Wednesday, February 9, 2005 - link

    Someone needs to make a new Tron movie so I can understand this better.
  • tore - Wednesday, February 9, 2005 - link

    Great article, on page 3 you talk about BJT transistor with a base, collector and emitter, since all modern cpu's use mosfets should you talk about a mosfet with a gate, source and drain?
  • Questar - Wednesday, February 9, 2005 - link

    "The Pentium 4 "Prescott" is, despite its innovative architecture, a failure."


    AMD wishes they had a "failure" that sold like Prescott.


Log in

Don't have an account? Sign up now