CHAPTER 3: Containing the epidemic problems

Reducing leakage

Leakage is such a huge problem that it could, in theory, make any advance in process technology useless. Without countermeasures, a 45 nm Pentium 4 would consume 100 to 150 Watts on leakage alone, and up to burn 250 Watts in total. The small die would go up in smoke before the ROM program would have finished the POST sequence.

However, smart researchers have found ways to reduce leakage significantly. SOI – Silicon on Insulator - improves the insulation of the gate and thus reduces leakage currents. SOI has made process technology even more complex, making it harder for AMD to get high binsplits on the Opteron and Athlon 64. However, it is clear that the Athlon 64 has a lot less trouble with leakage power than the Prescott, despite the fact that the Athlon 64 has only 20% less transistors than the Intel Prescott (106 versus 125 million).

The most spectacular reduction of leakage will probably come from Intel's "high-k" materials, which will replace the current silicon dioxide gate dielectric. Thanks to this advancement and other small improvements, Intel expects to reduce gate leakage by over one hundredfold! This new technology will be used when Intel moves to 45nm technology.

Another promising technique is Gate Bias technology. By using special sleep transistors, leakage can be reduced by up to 90% while the dynamic power is also reduced with 50% and more.

Body Bias techniques make it possible to control the voltage of a transistor. The objective is to make transistors slow (low leakage) when they are not used, and fast when they are. Stacked transistors and many other technologies also allow for reduction in leakage.[4]

One could probably write a book on this, but the message should be clear: the leakage problem is not going to stop progress. SOI already reduces the problem significantly and high K materials will make sure that the whole leakage problem will remain to be a nuisance, but not a major concern until the industry moves to even smaller structures than 45 nm.

At the same time, strained silicon will reduce the amount of dynamic power needed. With strained silicon, electrons experience less resistance. As a result, CPUs can get up to 35 percent faster without consuming more. This is what should allow the Athlon 64 stepping "E0" to reach higher clock speeds without consuming more.

Reducing Wire Delay

Although wire delay has not been so much in spotlight as leakage power, it is an important hurdle that designers have to take when they target high clock speeds. The resistance of wires has been reduced by both AMD and Intel using copper instead of aluminium. Capacitance has been lowered by using lower-K materials separating wires.


Fig 5. 8 Metal layers to reduce wire delay in Intel's 65 nm CPUs

Adding more metal layers is another strategy. More metal layers enable the wires connecting different parts of the CPU to be packed more densely. More densely means shorter wires. And shorter wires result in lower resistance, which, in turn, reduce the total RC Delay.


Fig 6. Repeaters on the Itanium Die

Of course, there are limits on what adding more metal layers, using SOI and lower-K materials can do to reduce RC delay. If some of the global wires are still too long, they are broken up into smaller parts, which are connected by repeaters. Repeaters can be used as much as you like, but they consume power of course.

Now that we have wire delay and leakage more or less out of control, let us try to find out what went exactly wrong with the Pentium 4 "Prescott". The answer is not as obvious as it seems.


CHAPTER 2: Why single core CPUs are no longer "cool" CHAPTER 4: The Pentium 4 crash landing
Comments Locked

65 Comments

View All Comments

  • Zak - Wednesday, August 22, 2007 - link

    I seem to remember reading somewhere, probably couple of years ago, about research being done on hyperconductivity in "normal" temperatures. Right now hyperconductivity occurs only in extremely low temperatures, right? If materials were developed that achieve the same in normal temperatures it'd solve lots of these issues, like wire delay and power loss, wouldn't it?

    Z.
  • Tellme - Monday, February 21, 2005 - link

    Carl what i meant was that soon we might not see much improved performance with multicores as well because the data comes too late to the processor for quick execution. (That is true for single cores as well).

    Did you checked the link?
    Their idea is simple.
    "If you can't bring the memory bandwidth to the processor, then bring the processors to the memory."
    Intresting no?
    Currently processor waits most of its time for data to be processed.

  • carl0ski - Saturday, February 19, 2005 - link

    #61 i thought p4 already had memory bandwidth problems,
    AMD has a temporary work around (on die memory controller) which aids in multiple CPU's/Dies using the same fsb to access the Ram.

    Intel has proposed multiple fsb's , one each CPU/die.

    Does anyone know if that means they will need sperate RAM dimms for each FSB? because that would prove an expensive system.
  • carl0ski - Saturday, February 19, 2005 - link

    [quote]59 - Posted on Feb 12, 2005 at 11:28 AM by fitten Reply
    #57 What was the performance comparison of the 1GHz Athlon vs. the 1GHz P3? IIRC, the Athlon was faster by some margin. If this was the case, then there was a little more than tweaking that went on in the Pentium-M line. Because they started out looking at the P3 doesn't mean that what they ended up with was the P3 with a tweak here or there. :)[/quote]

    #59 didnt P3 1ghz run 133mhz sdram? on a 133fsb?
    Athlon 1ghz had a nice DDR 266 fsb to support it.

  • Tellme - Monday, February 14, 2005 - link

    Nice article.

    I think dual cores will soon reach hit the wall ie Memory Bandwidth.

    Hopefully memory and processors are integrates in near future.

    See
    http://www.ee.ualberta.ca/~elliott/cram/

  • ceefka - Monday, February 14, 2005 - link

    Though still a little too technical for me, it makes a good read.

    It's good to know that Intel has eaten their words and realized they had to go back to the drawing board.

    I believe rather sooner than later multicore will mean 4 - 8 cores providing the power to emulate everything that is not necessarily native, like running MAC OSX on an AMD or Intel box. Iow the CELL will meet its match.
  • fitten - Saturday, February 12, 2005 - link

    #57 What was the performance comparison of the 1GHz Athlon vs. the 1GHz P3? IIRC, the Athlon was faster by some margin. If this was the case, then there was a little more than tweaking that went on in the Pentium-M line. Because they started out looking at the P3 doesn't mean that what they ended up with was the P3 with a tweak here or there. :)
  • avijay - Friday, February 11, 2005 - link

    EXCELLENT Article! One of the very best I've ever read. Nice to see all the references at the end as well. Could someone please point me to Johan's first article at AT please. Thanks.
    Great Work!
  • fishbreath - Friday, February 11, 2005 - link

    For those of you who don't actually know this:

    1) The Dotham IS a Pentium 3. It was tweaked by Intel in Israel, but it's heart and soul is just a PIII.

    1b) All P4's have hyperthreading in them, and always have had. It was a fuse feature that was not announced until there were applications to support them. But anyone who has HT and Windows XP knows that Windows simply has a smoother 'feel' when running on an HT processor!

    2) Complex array processors are already in the pipeline (no pun intended). However the lack of an operating system or language to support them demands they make their first appearance in dedicated applications such as h264 encoders.
  • blckgrffn - Friday, February 11, 2005 - link

    Yay for Very Large Scale Integration (more than 10,000 transistors per chip)! :) I wonder when the historians will put down in the history books that we have hit the fifth generation of computing org....

Log in

Don't have an account? Sign up now