CHAPTER 1: The brakes on CPU power

CPU performance growth hits the brakes

The growth rate of CPU performance has been spectacular over the past decades. Two legends of computing history, John L. Hennessy and David A. Patterson, have quantified this performance growth at about 58% per year.

A recent study by the University of Göteborg [1] confirmed that the 58% figure held between 1985 and 1996. Over the last 7.5 years (1996-2004), however, the Swedish professors found that performance growth has slowed to an average of 41% per year. Even worse is the conclusion that "there are signs of a continuing decline".

When we focus on Intel's CPUs, the deterioration of CPU performance growth almost spells doom. In November 2002, Intel was well ahead of the competition with the introduction of a 3.06 GHz Pentium 4. Intel had doubled the clock speed of its latest x86 architecture within two years, which was quite an accomplishment.

Two and a half years later, Intel's Pentium 4 is running at 3.8 GHz, which means that clock speed has increased by only about 24%. Of course, we all know that performance does not scale linearly with clock speed. So, let us talk performance.

CPU               SpecInt2000   SpecFP2000
Pentium 4 3800E   1666          1839
Pentium 4 3060    1167          1096
Pentium 4 1500    560           634

From 2000 to 2002, integer performance (SpecInt2000) increased by 108%. In the following three years, Intel's latest CPU increased integer performance by only 43%. The floating-point picture is different: in SpecFP2000, the 3.8 GHz Prescott CPU is 68% faster than the 3.06 GHz Pentium 4, while the 3.06 GHz model was about 73% faster than the first incarnation of the Netburst architecture.
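The percentages quoted above can be reproduced directly from the table. The following is a minimal sketch; the scores are copied from the table, while the function and variable names are only illustrative.

```python
# SPEC CPU2000 scores copied from the table above: (SpecInt2000, SpecFP2000).
scores = {
    "Pentium 4 1500": (560, 634),
    "Pentium 4 3060": (1167, 1096),
    "Pentium 4 3800E": (1666, 1839),
}


def gain_percent(old, new):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def generation_gains(older, newer):
    """Return (integer gain, floating-point gain) in percent between two CPUs."""
    old_int, old_fp = scores[older]
    new_int, new_fp = scores[newer]
    return gain_percent(old_int, new_int), gain_percent(old_fp, new_fp)


if __name__ == "__main__":
    steps = [
        ("Pentium 4 1500", "Pentium 4 3060"),
        ("Pentium 4 3060", "Pentium 4 3800E"),
    ]
    for older, newer in steps:
        int_gain, fp_gain = generation_gains(older, newer)
        print(f"{older} -> {newer}: int +{int_gain:.0f}%, fp +{fp_gain:.0f}%")
```

Rounded to whole percents, this gives the 108% and 73% gains for the first step and the 43% and 68% gains for the second.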

However, SpecFP2000 remains a "special" benchmark, which greatly exaggerates the importance of memory bandwidth, as very few other FPU applications behave the same way. The 800 MHz FSB of the 3.8 GHz model is 50% faster than the bus of Intel's first Hyper-Threaded CPU (3.06 GHz), while the FSB of the latter has only a 33% advantage over that of the older 1.5 GHz Pentium 4.

Intel's compilers have also improved vastly over the past years, which is positive. However, they have also become better at using special tricks (strip-mining optimizations, for example) to artificially improve the Spec score; tricks that are not usable by developers who need to get real applications to the market. Don't take my word for it, but make sure to read Tim Sweeney's comments in the next article.

The faster bus and the compiler tricks are the main reasons why SpecFP hides what most applications show: the pace of CPU performance growth has slowed down significantly, even in FP-intensive workloads. Applications such as 3DSMax, Lightwave, Adobe Premiere, video encoding and others show, on average, that the Pentium 4 3.8 GHz is about 20-45% faster than the Pentium 4 3.06 GHz, while the latter is easily between 60% and 90% faster than our 1.5 GHz reference point.

Demystifying the slowdown

It is no mystery that the three main reasons why CPU progress is slowing down are:

  • Total dissipated power
  • Wire Delay
  • "The memory wall"

However, simply stating that these three problems are the reason why it is getting very hard to design CPUs that perform better is an oversimplification. There are decent solutions for each of these problems, and the real reason why they have slowed down CPU progress is more subtle.

We are going to cover the memory wall in more detail later. Suffice it to say that DRAM speeds up by only about 10% per year, while CPUs have historically run 40% to 60% faster each year.
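To get a feel for how quickly such a gap compounds, here is a small sketch. It assumes constant yearly growth rates, which is a simplification, and the function name is only illustrative.

```python
def relative_gap(cpu_growth, dram_growth, years):
    """Factor by which CPU speed has pulled ahead of DRAM speed after `years`.

    Growth rates are fractions per year (0.10 means 10% per year) and are
    assumed to stay constant, which real hardware does not guarantee.
    """
    return ((1.0 + cpu_growth) / (1.0 + dram_growth)) ** years


if __name__ == "__main__":
    for cpu_growth in (0.40, 0.60):
        gap = relative_gap(cpu_growth, dram_growth=0.10, years=10)
        print(f"CPU +{cpu_growth:.0%}/year vs DRAM +10%/year: "
              f"gap grows {gap:.1f}x in 10 years")
```

Under these assumptions, a decade is enough for the CPU to pull ahead of DRAM by a factor somewhere between roughly 10 and 40.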

Power problems

In order to understand power problems, you have to understand the following formula, which describes switching power:

Power ~ ½ × C × V² × A × f

In other words, dissipated power is linear in the effective capacitance (C), the activity factor (A) and the frequency (f), and it increases quadratically with the CPU's core voltage (V). Activity is the factor that is influenced by the software you run: the more intensive the software, the larger the fraction of time that the transistors are switching.
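The scaling behaviour of the formula is easy to check numerically. In the sketch below, the capacitance, voltage and activity values are made-up placeholders, not measurements of any real CPU; only the ratios matter.

```python
def switching_power(capacitance, voltage, activity, frequency):
    """Switching power estimate: 1/2 * C * V^2 * A * f (watts for SI inputs)."""
    return 0.5 * capacitance * voltage ** 2 * activity * frequency


if __name__ == "__main__":
    # Hypothetical operating point, chosen only for illustration.
    base = dict(capacitance=50e-9, voltage=1.4, activity=0.3, frequency=3.0e9)
    p_base = switching_power(**base)

    # Doubling the frequency doubles the power (linear term).
    p_fast = switching_power(**{**base, "frequency": 2 * base["frequency"]})

    # Raising the voltage by 10% raises the power by 21% (quadratic term).
    p_hot = switching_power(**{**base, "voltage": 1.1 * base["voltage"]})

    print(f"2x frequency -> {p_fast / p_base:.2f}x power")
    print(f"+10% voltage -> {p_hot / p_base:.2f}x power")
```

This is why lowering the core voltage is such an attractive lever: a modest voltage reduction buys a disproportionately large power reduction.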

With each major transition to a new process technology that halves the transistor feature size, the same design occupies roughly a quarter of the die area. For example, Willamette (introduced on 180 nm technology) would have been more or less 4 times smaller on 90 nm technology. That is simplified of course, but it shows that the die gets smaller and smaller. That in itself need not be a problem, as Vdd (Vcore) can also be reduced, and as a result, you can reduce power by a factor of two or even more. Of course, as CPUs extract more ILP and have deeper pipelines, they become more complex and use more transistors. The result is that the power savings from lowering Vdd are negated by the increasing number of transistors.

There are also limits to the amount of power that you can dissipate through a shrinking die area. But switching power is not the worst problem, as it can be reduced by applying a few clever techniques.

One of them is clock gating, a power-saving technique implemented extensively in the Pentium 4. Clock gating logic activates the clocks in a Functional Unit Block (FUB) only when it has work to do. Together with other power-saving techniques, this keeps switching (dynamic) power more or less under control: over time, it increases linearly, while the number of transistors used increases exponentially.
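The effect of clock gating can be illustrated with a toy model. This is only a sketch under strong assumptions: it reuses the switching power formula from above, treats an ungated block as having an activity of 1.0 and a gated block as having an activity equal to its utilization, and uses invented block names and capacitances. It does not describe how the Pentium 4 implements gating.

```python
from dataclasses import dataclass


@dataclass
class FunctionalUnitBlock:
    name: str
    clock_capacitance: float  # farads, hypothetical value
    utilization: float        # fraction of cycles in which the block has work


def clock_power(blocks, voltage, frequency, gated):
    """Sum 1/2 * C * V^2 * A * f over all blocks.

    Without gating, every block is clocked on every cycle (A = 1.0).
    With gating, a block is clocked only while it has work (A = utilization).
    """
    total = 0.0
    for block in blocks:
        activity = block.utilization if gated else 1.0
        total += (0.5 * block.clock_capacitance * voltage ** 2
                  * activity * frequency)
    return total


if __name__ == "__main__":
    # Invented blocks with equal capacitance and different utilization.
    blocks = [
        FunctionalUnitBlock("integer ALU", 1.0e-10, 0.6),
        FunctionalUnitBlock("FPU", 1.0e-10, 0.1),
    ]
    ungated = clock_power(blocks, voltage=1.4, frequency=3.0e9, gated=False)
    gated = clock_power(blocks, voltage=1.4, frequency=3.0e9, gated=True)
    print(f"gated / ungated clock power: {gated / ungated:.2f}")
```

In this model the saving depends entirely on how often each block sits idle, which matches the intuition that rarely used units benefit the most from gating.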



  • stephenbrooks - Wednesday, February 09, 2005

    #28 - that's interesting. I was thinking myself just a few days ago "I wonder if those wires go the long way on a rectangular grid or do they go diagonally?" Looks like there's still room for improvement.
  • Chuckles - Wednesday, February 09, 2005

    The word comes from Latin. "mono" meaning one, "lithic" meaning stone. So monolithic refers to the fact that it is a single cohesive unit.
    The reason you associate "lithic" with old is only due to the fact that anthropologists use Paleolithic and Neolithic to describe time periods in human history in the Stone Age. The words translate as "old stone" and "new stone" respectively.
    I have seen plenty of monolithic benches around here. Heck, a slab granite countertop qualifies as a monolith.
  • theOracle - Wednesday, February 09, 2005

    Very good article - looks like a university paper with all the references etc! Looking forward to part two.

    Re "monolithic", granted the word doesn't mean old but anything '-lithic' instantly makes me think ancient (think neolithic etc). -lithic means a period in stone use by humans, and a monolith is a (usually ancient) stone monument; I think it's fair to say Intel were trying to make the audience think 'old technology'.
  • DavidMcCraw - Wednesday, February 09, 2005

    Great article, but this isn't accurate:

    "Note the word "monolithic", a word with a rather pejorative meaning, which insinuates that the current single core CPUs are based on old technology."

    Neither the dictionary nor technical meanings of monolithic imply 'old technology'. Rather, it simply refers to the fact that the single-core CPU being referred to is as large as the two smaller chips, but is in one part.

    In the context of OS kernel architectures, the Linux kernel is a good example of monolithic technology... but I doubt many people consider it old tech!
  • IceWindius - Wednesday, February 09, 2005

    Even this article makes my head hurt, so much about CPUs is hard to understand and grasp. I wish I knew how those CPU engineers do this for a living.

    I wish someone like Ars Technica would make something really built from the ground up, like "CPUs for morons", so I could start understanding this stuff better.
  • JohanAnandtech - Wednesday, February 09, 2005

    Jason and Anand have promised me (building some pressure ;-) a threaded comment system so I can answer more personally. Until then:

    1. Thanks for all the encouraging comments. It really gives a warm feeling to read them, and it is basically the most important motivation for writing more.

    2. Slashbin (27): Typo, just typed in a small period of insanity. Voltage of course, fixed.

    3. CSMR: the SPEC numbers of Intel are artificially high, as they have been spending more and more time on aggressive compiler optimisations. All other benchmarks clearly show the slowdown.
  • CSMR - Tuesday, February 08, 2005

    Excellent article. Couple of odd things you might want to amend in chapter one: "CPUs run 40 to 60% faster each year" contradicts the previous discussion about slowed CPU speed increases. Also power formula explanation on the same page doesn't really make sense as pointed out by #27.
  • Doormat - Tuesday, February 08, 2005

    Good article. The only real thing I wanted to bring up was something called the "X Consortium". I wrote a paper in my solid state circuit design class a few years ago. Basically, instead of having all the interconnects within a chip laid out in a grid-like fashion, it allows them to be diagonal (and thus a savings of, at most, 29% - for the math impaired, that is 1 - 1/sqrt(2)). Perhaps the tools aren't there or it's too patent encumbered. If interconnects are really an issue then they should move to this diagonal interconnect technology. I actually don't think they are a very pressing need right now - leakage current is the most pressing issue. The move to copper interconnects a while ago helped (increased conductivity over aluminum, smaller die sizes mean shorter distances to traverse, typically).

    It will be very interesting to see what IBM does with their Cell chips and SOI (and what clock speed AMD releases their next A64/Opteron chips at since they've teamed with IBM). If indeed these Cell chips run at 4GHz and don't have leakage current issues then there is a good chance that issue is mostly remedied (for now at least).
  • slashbinslashbash - Tuesday, February 08, 2005

    "In other words, dissipated power is linear with the effective capacitance, activity and frequency. Power increases quadratically with frequency or clock speed." (Page 2)

    Typo there? Frequency can't be both linear and quadratic..... from the equation itself, it looks like voltage is quadratic. (assuming the V is voltage)
  • AnnoyedGrunt - Tuesday, February 08, 2005

    And of course I meant to refer to post 23 above.
    -D!