Intel's 45nm Dual-Core E8500: The Best Just Got Better
by Kris Boughton on March 5, 2008 3:00 AM EST
The Truth About Processor "Degradation"
Degradation - the process by which a CPU loses the ability to maintain an equivalent overclock, often sustainable through the use of increased core voltage levels - is usually regarded as a form of ongoing failure. This is much like saying your life is nothing more than your continual march towards death. While some might find this analogy rather poignant philosophically speaking, technically speaking it's a horrible way of modeling the life-cycle of a CPU. Consider this: silicon quality is often measured as a CPU's ability to reach and maintain a desired stable switching frequency all while requiring no more than the maximum specified process voltage (plus margin). If the voltage required to reach those speeds is a function of the CPU's remaining useful life, then why would each processor come with the same three-year warranty?
The answer is quite simple really. Each processor, regardless of silicon quality, is capable of sustained error-free operation while functioning within the bounds of the specified environmental tolerances (temperature, voltage, etc.), for a period of no less than the warranted lifetime when no more performance is demanded of it than its rated frequency will allow. In other words, rather than limit the useful lifetime of each processor, and to allow for a consistent warranty policy, processors are binned based on the highest achievable speed while applying no more than the process's maximum allowable voltage. When we get right down to it, this is the key to overclocking - running CPUs in excess of their rated specifications regardless of reliability guidelines.
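To make the binning idea concrete, here is a minimal sketch in Python. The function name, the speed grades, and every voltage in the sample table are made up for illustration; this is not Intel's actual test flow, only one way to express "the highest speed that passes at or below the maximum allowed voltage."

```python
from typing import Dict, Optional


def bin_speed(required_v_by_freq: Dict[float, float], v_max: float) -> Optional[float]:
    """Return the highest frequency (GHz) whose minimum stable voltage
    does not exceed v_max, or None if no tested frequency qualifies.

    required_v_by_freq maps a tested frequency to the lowest voltage at
    which this particular die ran stably (hypothetical measurements).
    """
    passing = [freq for freq, volts in required_v_by_freq.items() if volts <= v_max]
    return max(passing) if passing else None


# Hypothetical die: all numbers below are illustrative, not measured data.
sample_die = {2.66: 1.05, 3.00: 1.10, 3.16: 1.20, 3.33: 1.30}
rated_speed = bin_speed(sample_die, v_max=1.25)
print(rated_speed)
```

With these invented numbers the die would be sold at the 3.16 grade: it could run faster, but only by exceeding the voltage ceiling, which is exactly the territory overclockers step into.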
As soon as you concede that overclocking by definition reduces the useful lifetime of any CPU, it becomes easier to justify its more extreme application. It also goes a long way toward explaining why Intel has a strict "no overclocking" policy when it comes to retaining the product warranty. Too many people believe overclocking is "safe" as long as they don't increase their processor core voltage - not true. Frequency increases drive higher load temperatures, and higher temperatures reduce useful life. Conversely, better cooling may be a sound investment for those who are looking for longer, unfailing operation, as it should provide more positive margin for an extended period of time.
The graph above shows three curves. The middle line models the minimum voltage required for a processor to run continuously at 100% load for the period shown along the x-axis. During this time, the processor is subjected to its specified maximum core voltage and is never overclocked. Additionally, all of the worst-case considerations come together and our E8500 operates at its absolute maximum sustained Tcase temperature of 72.4°C. Three years later, we would expect the CPU to have "degraded" to the point where slightly more core voltage is needed for stable operation - as shown above, a little less than 1.15V, up from 1.125V.
Including Vdroop and Voffset, an average 45nm dual-core processor with a VID of 1.25000V should see a final load voltage of about 1.21V. Shown as the dashed green line near the middle of the graph, this represents the actual CPU supply voltage (Vcore). Keep in mind that the trend line represents the minimum voltage required for continued stable operation, so as long as it stays below the actual supply voltage line (middle green line) the CPU will function properly. The lower green line is approximately 5% below the actual supply voltage, and represents an example of an offset that might be used to ensure a positive voltage margin is maintained.
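The arithmetic behind those two green lines is easy to check. The short Python sketch below uses the figures quoted above (about 1.21V under load, a 5% offset, 1.125V initially required); the function names are our own and purely illustrative.

```python
def margin_line(supply_v: float, margin_fraction: float = 0.05) -> float:
    """Voltage level sitting margin_fraction below the actual supply voltage."""
    return supply_v * (1.0 - margin_fraction)


def has_positive_margin(required_v: float, supply_v: float) -> bool:
    """True while the minimum required voltage is still below the supply voltage."""
    return required_v < supply_v


supply_v = 1.21                  # approximate load Vcore quoted in the article
guard_v = margin_line(supply_v)  # about 5% below the supply voltage
print(f"margin line: {guard_v:.4f} V")
print("new CPU stable:", has_positive_margin(1.125, supply_v))
```

Five percent below 1.21V works out to roughly 1.15V, which lines up with the "a little less than 1.15V" required at the three-year mark.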
The intersection point of the middle line (minimum required voltage) and the middle green line (actual supply voltage) predicts the point in time when the CPU should "fail," although an increase in supply voltage should allow for longer operation. Also, note how the middle line passes through the lower green line, representing the desired margin to stability at the three-year point, marking the end of warranty. The red line demonstrates the effect running the processor above the maximum thermal specification has on rated product lifetime - we can see the accelerated degradation caused by the higher operating temperatures. The blue line is an example of how lowering the average CPU temperature can lead to increased product longevity.
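For a rough feel of where that intersection might land, the Python sketch below extrapolates the required-voltage trend as a straight line through the two points given in the text (1.125V at the start, roughly 1.15V after three years). The straight-line model is our simplifying assumption, not something the graph guarantees; real degradation curves are generally not linear, so treat the result as an illustration of the method rather than a lifetime prediction.

```python
def years_until_crossing(v_start: float, v_at_ref: float,
                         ref_years: float, v_limit: float) -> float:
    """Years until a linearly rising required voltage reaches v_limit.

    Assumes (hypothetically) that the required voltage climbs in a straight
    line from v_start at year 0 to v_at_ref at ref_years.
    """
    slope = (v_at_ref - v_start) / ref_years  # volts per year
    if slope <= 0:
        raise ValueError("required voltage must rise over time for a crossing")
    return (v_limit - v_start) / slope


# 1.125 V initially, about 1.15 V after three years, 1.21 V actual supply.
years_to_failure = years_until_crossing(1.125, 1.15, 3.0, 1.21)
print(f"linear estimate of crossing: {years_to_failure:.1f} years")
```

A steeper slope, as with the red line, pulls the crossing earlier; a shallower one, as with the blue line, pushes it out.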
Because end-of-life failures are usually caused by a loss of positive voltage margin (excessive wear/degradation), we can establish a very real correlation between the increased/decreased probability of these types of failures and the operating environment experienced by the processor(s) in question. Here we see the effect a harsher operating environment has on observed failure rate due to the new end-of-life failure rate curve. By running the CPU outside of prescribed operating limits, we are no longer able to positively attribute any failure near the end of warranty to any known cause. Furthermore, because Intel is unable to make a distinction in failure type for each individual case of warranty failure when overclocking or improper use is suspected, its policy prohibits overclocking of any kind if warranty coverage is desired.
So what does all of this mean? So far we have learned that of the three basic failure types, failures due to degradation (i.e. wearing out) are in most cases directly influenced by the means and manner in which the processor is operated. Clearly, the user plays a considerable role in the creation and maintenance of a suitable operating environment. This includes the use of high-quality cooling solutions and pastes, the liberal use of fans to provide adequate case ventilation, and finally proper climate control of the surrounding areas. We have also learned that Intel has established easy-to-follow guidelines when it comes to ensuring the longevity of your investment.
Those who choose to ignore these recommendations and/or exceed any specification do so at their own peril. This is not meant to insinuate that doing so will necessarily cause immediate, irreparable damage or product failure. Rather, every decision made during the course of overclocking has a real and measurable "consequence." For some, there may be little reason to worry as concern for product life may not be a priority. On the other hand, perhaps precautions such as water-cooling or phase-change cooling will be taken in order to accommodate the higher voltages. In any case, the underlying principles are the same - overclocking is never without risk. And just like life, taking calculated risks can sometimes be the right choice.