When a CPU becomes a sieve

The real problem is leakage power, and the Intel power graph below illustrates this perfectly.


Fig 2. "Leakage power grows exponentially".

As you can see, dynamic power - which does useful work - has increased relatively slowly despite the increase in CPU complexity. Leakage power, however, increases exponentially rather than linearly. It has grown quickly from a "minor nuisance" into a "circuit-killing monster".

Leakage is comparable to a small hole in a firefighter's water hose. The more pressure (i.e. the higher the core voltage), the bigger the hole gets, and thus the more water leaks to the ground. The thinner the walls of the hose (i.e. the smaller the process technology), the quicker the holes grow. And the more water you lose, the harder the pumps must work to deliver the same amount of water to the fire. If the pumps overheat, you had better throttle them down, or they will cease to work after a while.

Power leakage happens when part of the current that is supposed to make our transistors switch leaks away into the substrate and finally into the ground. There are several leakage currents, but the two most important ones are the gate oxide tunnelling current and sub-threshold leakage.[3]


Fig 3. I3 is the gate oxide tunnelling current, I2 is the sub-threshold leakage current.

Gate oxide tunnelling currents (I3) become more important with smaller process technology, as the gate oxide that is supposed to insulate the transistor becomes thinner and thinner. As a result, current that should be going through the transistor leaks away - the gate oxide becomes a sieve instead of being the "wall of a tube".

Sub-threshold leakage (I2) is the leakage current flowing through the transistor when it is supposed to be turned off. To understand this, we have to go back to basic transistor technology.

Normally, a threshold voltage of x volts is needed to get current across the transistor. This way, the transistor is used as a switch with a binary function: greater than or equal to the threshold voltage = ON = 1, less than the threshold voltage = OFF = 0.

The point that you have to remember is this: ideally, as long as the threshold voltage is not reached, no current should run through the transistor. However, as transistors and interconnects get smaller and smaller (smaller process technology), the insulation between drain and source gets worse and worse. As a result, a small leakage current (I2) gets through the transistor even though the threshold voltage is not reached (the transistor is off).
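To make the "leaky switch" idea concrete, here is a minimal sketch in Python. It does not come from the article or from Intel's data: it uses the common textbook approximation in which the off-state current falls off exponentially below the threshold, I = I0 * exp((Vgs - Vth) / (n * kT/q)). The current at threshold (`i_at_threshold`), the slope factor `n` and the threshold voltages in the loop are hypothetical values chosen only for illustration.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin):
    """Return kT/q in volts (about 26 mV at room temperature)."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_current(v_gs, v_th, i_at_threshold=1e-7,
                         slope_factor=1.5, temp_kelvin=300.0):
    """Textbook approximation of the drain current below threshold.

    i_at_threshold and slope_factor are assumed, illustrative values,
    not measurements of any real process.
    """
    exponent = (v_gs - v_th) / (slope_factor * thermal_voltage(temp_kelvin))
    return i_at_threshold * math.exp(exponent)


if __name__ == "__main__":
    # The transistor is "off" (gate at 0 V), yet some current still flows.
    for v_th in (0.5, 0.4, 0.3):  # hypothetical threshold voltages
        i_off = subthreshold_current(0.0, v_th)
        print(f"Vth = {v_th:.1f} V -> off-state current = {i_off:.3e} A")
```

Under these assumptions the switch is never perfectly off: the off-state current is tiny for a single transistor, but it rises exponentially as the threshold drops, and it has to be multiplied by the number of transistors on the chip.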

That subthreshold leakage has become a major problem was made clear by Shekhar Borkar [5] (Intel Fellow, Director of Circuit Research), who illustrated it with the logarithmic graph below.


Fig 4. Subthreshold leakage - notice the logarithmic scale!

Subthreshold leakage was only a small problem at the time of Willamette - it wasted a few watts at 180 nm. The graph is based on Moore's law: every two years, the number of transistors doubles. As you can see, without countermeasures, it wouldn't be worthwhile to build devices on 45 nm technology. They would simply leak too much power, up to 100 Watts!

And subthreshold leakage is only part of the leakage problem. Together with gate oxide tunnelling, CPUs made with 65 nm technology would leak more power than they need to make the transistors switch. It is comparable to a fuel tank with so many holes that it leaks more gasoline to the ground than the fuel pump can deliver to the engine.

Let us check the third and last problem for high performance CPUs.

Wire delay

It is hard to imagine that the little wires - the metal interconnects - between transistors can be a limiting factor. About twenty years ago, transistor switching speeds were pretty low, and wire delays were completely ignored. However, as process technology became better, transistors were capable of switching much faster. Right now, fast transistors in the labs can attain 100 GHz and more (the record being around 300-500 GHz). So, transistor switching speed still has a lot of headroom.

The tiny wires between the different transistors are still not the problem. However, functional blocks are also wired to the TLBs (Translation Lookaside Buffers) and caches. The real problem is these global wires - they are a lot longer. If their RC delay is too high, the clock speed will have to be reduced to get a working CPU.

The speeds at which signals travel through the global wires (from logic blocks to the caches, for example) are quite a bit slower than what the maximum speed (speed of light) allows. The reason is the resistance (R, in ohms) and the capacitance (C) of the wire. As the whole CPU was made with smaller process technology, the wires also shrank. You probably know from your physics lessons that resistance increases as the cross section of the wire gets smaller and the length of the wire gets longer. So, if you shrink a wire, the effect of the shorter length is completely negated by the smaller cross section of the wire. You could make the wires thicker, but it wouldn't be easy and that would increase the capacitance of the wire. The result is that wire delay remains, more or less, the same (in nanoseconds).
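The scaling argument above can be checked with a few lines of Python. This is only a sketch under simplifying assumptions that are not in the article: the wire is a rectangular copper bar (R = resistivity * length / cross section), its capacitance is approximated as a parallel plate towards the layer below (C = permittivity * length * width / spacing), and the dimensions and the 0.7x shrink factor are hypothetical round numbers.

```python
from dataclasses import dataclass

COPPER_RESISTIVITY = 1.68e-8           # ohm * m, bulk copper
OXIDE_PERMITTIVITY = 3.9 * 8.854e-12   # F/m, silicon dioxide (assumed dielectric)


@dataclass
class Wire:
    length: float    # m
    width: float     # m
    height: float    # m
    spacing: float   # m, dielectric thickness to the neighbouring layer

    def resistance(self):
        return COPPER_RESISTIVITY * self.length / (self.width * self.height)

    def capacitance(self):
        # Parallel-plate approximation; real wires also have fringe and
        # sidewall capacitance, which this sketch ignores.
        return OXIDE_PERMITTIVITY * self.length * self.width / self.spacing

    def rc_delay(self):
        return self.resistance() * self.capacitance()


def shrink(wire, factor, shrink_length=True):
    """Scale the wire geometry by `factor`, optionally keeping its length."""
    new_length = wire.length * factor if shrink_length else wire.length
    return Wire(new_length, wire.width * factor,
                wire.height * factor, wire.spacing * factor)


if __name__ == "__main__":
    base = Wire(length=1e-3, width=0.5e-6, height=0.5e-6, spacing=0.5e-6)
    local = shrink(base, 0.7)                         # everything shrinks
    global_ = shrink(base, 0.7, shrink_length=False)  # length stays the same
    for name, wire in (("base", base), ("shrunk", local),
                       ("same length", global_)):
        print(f"{name:12s} RC = {wire.rc_delay():.3e} s")
```

In this simplified model, a wire that shrinks in every dimension keeps the same RC product: resistance goes up by exactly as much as capacitance goes down. A wire that gets thinner but has to span the same distance, as a global wire might, ends up with a larger RC delay than before.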

However, gate switching speed improves a lot with smaller transistors (for example, by 100%). So, while RC delay improves by a very small percentage (or not at all), gates might switch up to 100% faster (a simplified example) as process technology improves. The RC delay of the global wires thus becomes more and more of a bottleneck, which makes bumping up the clock speed hard. Modern Integrated Circuits (ICs), such as CPUs, must be partitioned, as a signal can only travel for slightly less than the length of one clock pulse.
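A small back-of-the-envelope sketch in Python shows why a constant wire delay turns into a bottleneck. All numbers are hypothetical: we assume a clock cycle consists of a fixed number of gate delays plus one global wire delay, that gates get twice as fast with each process generation (the simplified 100% example above), and that the wire delay does not change at all.

```python
def cycle_time_ps(gate_delay_ps, gates_per_cycle, wire_delay_ps):
    """Clock period if one cycle must cover the logic plus one global wire."""
    return gate_delay_ps * gates_per_cycle + wire_delay_ps


def max_clock_ghz(gate_delay_ps, gates_per_cycle, wire_delay_ps):
    """Highest clock speed at which the signal still arrives within a cycle."""
    return 1000.0 / cycle_time_ps(gate_delay_ps, gates_per_cycle, wire_delay_ps)


def wire_share(gate_delay_ps, gates_per_cycle, wire_delay_ps):
    """Fraction of the clock period spent waiting on the wire."""
    return wire_delay_ps / cycle_time_ps(gate_delay_ps, gates_per_cycle,
                                         wire_delay_ps)


if __name__ == "__main__":
    gate_delay = 20.0    # ps, assumed starting point
    gates = 20           # assumed logic depth per cycle
    wire_delay = 100.0   # ps, assumed constant across generations
    for generation in range(4):
        clock = max_clock_ghz(gate_delay, gates, wire_delay)
        share = wire_share(gate_delay, gates, wire_delay)
        print(f"generation {generation}: {clock:.2f} GHz, "
              f"wire = {share:.0%} of the cycle")
        gate_delay /= 2.0  # gates switch twice as fast next generation
```

With these made-up numbers, the gates double in speed every generation but the clock does not: the wire eats an ever larger share of the cycle, and the achievable clock speed can never exceed what the wire delay alone allows. Partitioning the chip so that signals need not cross it within a single cycle is one way around that limit.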



65 Comments


  • stephenbrooks - Wednesday, February 09, 2005

    #28 - that's interesting. I was thinking myself just a few days ago "I wonder if those wires go the long way on a rectangular grid or do they go diagonally?" Looks like there's still room for improvement.
  • Chuckles - Wednesday, February 09, 2005

    The word comes from Latin. "mono" meaning one, "lithic" meaning stone. So monolithic refers to the fact that it is a single cohesive unit.
    The reason you associate "lithic" with old is only due to the fact that anthropologists use Paleolithic and Neolithic to describe time periods in human history in the Stone Age. The words translate as "old stone" and "new stone" respectively.
    I have seen plenty of monolithic benches around here. Heck, a slab granite countertop qualifies as a monolith.
  • theOracle - Wednesday, February 09, 2005

    Very good article - looks like a university paper with all the references etc! Looking forward to part two.

    Re "monolithic", granted the word doesn't mean old but anything '-lithic' instantly makes me think ancient (think neolithic etc). -lithic means a period in stone use by humans, and a monolith is a (usually ancient) stone monument; I think it's fair to say Intel were trying to make the audience think 'old technology'.
  • DavidMcCraw - Wednesday, February 09, 2005

    Great article, but this isn't accurate:

    "Note the word "monolithic", a word with a rather pejorative meaning, which insinuates that the current single core CPUs are based on old technology."

    Neither the dictionary nor technical meanings of monolithic imply 'old technology'. Rather, it simply refers to the fact that the single-core CPU being referred to is as large as the two smaller chips, but is in one part.

    In the context of OS kernel architectures, the Linux kernel is a good example of monolithic technology... but I doubt many people consider it old tech!
  • IceWindius - Wednesday, February 09, 2005

    Even this article makes my head hurt, so much about CPUs is hard to understand and grasp. I wish I knew how those CPU engineers do this for a living.

    I wish someone like Ars Technica would make something really built from the ground up, like "CPUs for morons", so I could start understanding this stuff better.
  • JohanAnandtech - Wednesday, February 09, 2005

    Jason and Anand have promised me (building some pressure ;-) a threaded comment system so I can answer more personally. Until then:

    1. Thanks for all the encouraging comments. It really gives a warm feeling to read them, and it is basically the most important motivation for writing more.

    2. Slashbin (27): Typo, typed in a brief moment of insanity. Voltage of course; fixed.

    3. CSMR: the SPEC numbers of Intel are artificially high, as they have been spending more and more time on aggressive compiler optimisations. All other benchmarks clearly show the slowdown.
  • CSMR - Tuesday, February 08, 2005

    Excellent article. Couple of odd things you might want to amend in chapter one: "CPUs run 40 to 60% faster each year" contradicts the previous discussion about slowed CPU speed increases. Also power formula explanation on the same page doesn't really make sense as pointed out by #27.
  • Doormat - Tuesday, February 08, 2005

    Good article. The only real thing I wanted to bring up was something called the "X Consortium". I wrote a paper in my solid state circuit design class a few years ago. Basically instead of having all the interconnects within a chip laid out in a grid-like fashion, it allows them to be diagonal (and thus a savings of at most 29% - for the math impaired, the diagonal route is at best 1/sqrt(2) of the grid route). Perhaps the tools aren't there or it's too patent encumbered. If interconnects are really an issue then they should move to this diagonal interconnect technology. I actually don't think they are a very pressing need right now - leakage current is the most pressing issue. The move to copper interconnects a while ago helped (increased conductivity over aluminum, smaller die sizes mean shorter distances to traverse, typically).

    It will be very interesting to see what IBM does with their Cell chips and SOI (and what clock speed AMD releases their next A64/Opteron chips at since they've teamed with IBM). If indeed these Cell chips run at 4GHz and don't have leakage current issues then there is a good chance that issue is mostly remedied (for now at least).
  • slashbinslashbash - Tuesday, February 08, 2005

    "In other words, dissipated power is linear with the effective capacitance, activity and frequency. Power increases quadratically with frequency or clock speed." (Page 2)

    Typo there? Frequency can't be both linear and quadratic..... from the equation itself, it looks like voltage is quadratic. (assuming the V is voltage)
  • AnnoyedGrunt - Tuesday, February 08, 2005

    And of course I meant to refer to post 23 above.
    -D!
