Determining a Processor Warranty Period

Like most electrical parts, a CPU's design lifetime is often measured in hours, or more specifically by the product's mean time to failure (MTTF), which is simply the reciprocal of the failure rate. Failure rate is the frequency at which a system or component fails over time. A lower failure rate, or conversely a higher MTTF, suggests the product will on average continue to function longer before experiencing a problem that either limits its useful application or prevents further use altogether. In the semiconductor industry, MTTF is often used in place of mean time between failures (MTBF), a common means of conveying product reliability for hard drives. MTBF implies the item can be repaired after a failure, which is often not the case for discrete electrical components such as CPUs.
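
To make the numbers concrete, here is a minimal sketch of the MTTF arithmetic, assuming a constant (purely random) failure rate expressed in FIT (failures per 10^9 device-hours), a common industry unit; the 100 FIT figure is an illustrative assumption, not an Intel specification.

```python
# Minimal sketch: MTTF as the reciprocal of a constant failure rate.
# The failure rate used here (100 FIT) is an assumed, illustrative value.

FIT_PER_HOUR = 1e-9          # 1 FIT = 1 failure per 10^9 device-hours

def mttf_hours(failure_rate_fit: float) -> float:
    """MTTF is the reciprocal of the failure rate (constant-rate assumption)."""
    return 1.0 / (failure_rate_fit * FIT_PER_HOUR)

rate_fit = 100.0                     # hypothetical part rated at 100 FIT
hours = mttf_hours(rate_fit)
print(f"{rate_fit:.0f} FIT -> MTTF of {hours:,.0f} hours (~{hours / 8760:,.0f} years)")
```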

A particular processor product line's continuous failure rate, as modeled over time, is largely a function of operating temperature: elevated operating temperatures lead to decreased product lifetimes. This means that, for a given target product lifetime, it is possible to derive with a certain degree of confidence, after accounting for all the worst-case end-of-life minimum reliability margins, a maximum rated operating temperature that produces no more than the acceptable number of product failures over a period of time. Although Intel does not expressly publish its processor lifetime targets, we can only assume the goal is somewhere near the three-year mark, which just so happens to correspond well with the three-year limited warranty provided to the original purchaser of any Intel boxed processor. (That last part was a joke; this is no accident.)
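
The article stops short of the underlying math, but this temperature dependence is commonly modeled with an Arrhenius acceleration factor; the sketch below uses an assumed activation energy and assumed case temperatures purely for illustration, not published Intel parameters.

```python
# Hedged sketch: Arrhenius acceleration factor relating failure rates at two
# operating temperatures. Activation energy and temperatures are assumptions.
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(t_rated_c: float, t_actual_c: float, ea_ev: float = 0.7) -> float:
    """How much faster failures accrue at t_actual_c than at the rated t_rated_c."""
    t_rated_k = t_rated_c + 273.15
    t_actual_k = t_actual_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_rated_k - 1.0 / t_actual_k))

# Hypothetical numbers: lifetime modeled at a 60C rated temperature vs. running at 75C
af = acceleration_factor(60.0, 75.0)
print(f"Failure acceleration at 75C vs. 60C: {af:.1f}x")
print(f"A 3-year modeled lifetime shrinks to roughly {3.0 / af:.1f} years")
```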

When it comes to semiconductors, there are three primary types of failures that can put an end to your CPU's life. The first, and probably the best known of the three, is the hard failure, which occurs whenever a single overstress incident can be identified as the primary root cause of the product's end of life. Examples of this type of failure would be the processor's exposure to an especially high core voltage, a period of operation at exceedingly elevated temperatures, or perhaps a tragic ESD event. In any case, blame for the failure can be, and most often should be, traced back and attributed to a known cause.



This is different from the second failure mode, known as a latent failure, in which a hidden defect or previously unnoticed spot of damage from an earlier overstress event eventually brings about the component's untimely demise. These failures can lie dormant for many years, sometimes even for the life of the product, unless they are "coaxed" into existence. Several severity factors determine whether a latent failure will ultimately result in a complete system failure, one of them being the product's operating environment following the "injury." Components subjected to a harsher operating environment will, on average, reach their end of life sooner than those not stressed nearly as hard. This particular failure mode can be difficult to identify, as without proper post-mortem analysis it is next to impossible to determine whether the product reached end of life due to a random failure or something more.

The third and final failure type, the early failure, is commonly referred to as "infant mortality." These failures occur soon after initial use, usually without any warning. What can be done about these seemingly unavoidable early failures? They are certainly not attributable to random failures, so by definition it should be possible to identify and remove them via screening. One way to detect and remove these failures from the unit population is a period of product testing known as "burn-in." Burn-in is the process by which the product is subjected to a battery of tests and periods of operation that sufficiently exercise it to the point where these early failures can be caught prior to packaging for sale.

In the case of Intel CPUs, this process may even be conducted at elevated temperatures and voltages, a practice known as "heat soaking." Once the product passes these initial inspections, it is considered trustworthy enough to enter the market for continuous duty within rated specifications. Although this process removes some of the weaker, more failure-prone products from the retail pool, some of which might very well have gone on to lead a normal existence, the idea is that by identifying them earlier, fewer will come back as costly RMA requests. Large numbers of in-use failures can also have a significant negative impact on the public perception of a company's ability to supply reliable products. Case in point: the mere mention of errata usually has consumers up in arms before they even know whether it applies to them.
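
As a rough illustration of why a short burn-in period catches most infant-mortality parts, the sketch below simulates a population containing a small, assumed fraction of defect-prone units with much shorter expected lifetimes; every number in it is invented for illustration.

```python
# Illustrative burn-in simulation: weak units fail very early, so a short
# stress period removes most of them before the parts ever ship.
import random

random.seed(0)
POPULATION = 100_000
WEAK_FRACTION = 0.02      # assumed share of units with latent manufacturing defects
BURN_IN_HOURS = 48.0      # assumed length of the burn-in test

def lifetime_hours(is_weak: bool) -> float:
    # Exponential lifetimes: weak units average 24 hours, healthy units ~114 years.
    mean = 24.0 if is_weak else 1_000_000.0
    return random.expovariate(1.0 / mean)

units = [lifetime_hours(random.random() < WEAK_FRACTION) for _ in range(POPULATION)]
caught_in_burn_in = sum(1 for life in units if life <= BURN_IN_HOURS)
first_year_field_failures = sum(1 for life in units if BURN_IN_HOURS < life <= 8760.0)

print(f"Units failed during {BURN_IN_HOURS:.0f}h burn-in: {caught_in_burn_in}")
print(f"First-year failures among shipped units: {first_year_field_failures}")
```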



The graphic above illustrates how the observed failure rate is influenced by the removal of early failures. With these screened out, nearly every in-use failure can be credited as unavoidable and random in nature. By establishing environmental specifications and usage requirements that ensure near worry-free operation for the product's first three years of use, a "warranty grace period" can be offered. This removes all doubt as to whether a failure occurred because of a random event or the onset of eventual wear-out, when degradation starts to play a role in the observed failures.
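
Assuming the surviving failures really are random and constant in rate (the flat middle of the curve), the chance of making it through the three-year warranty window follows from the exponential survival function; the FIT value below is an assumption for illustration, not a published figure.

```python
# Sketch: probability a part survives the warranty period under a constant
# random failure rate. The 100 FIT rate is an assumed, illustrative value.
import math

def survival_probability(failure_rate_fit: float, years: float) -> float:
    """Exponential survival: exp(-lambda * t), with lambda in failures per hour."""
    hours = years * 8760.0
    return math.exp(-failure_rate_fit * 1e-9 * hours)

p = survival_probability(100.0, 3.0)
print(f"Three-year survival at 100 FIT: {p:.2%}")   # about 99.7%
```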

Implementing process manufacturing, assembly, and testing advancements that lower the probability of random failures is a big part of improving any product's ultimate reliability. By carefully analyzing the operating characteristics of each new batch of processors, Intel is able to determine what works in achieving this goal and what doesn't. Changes that bring about significant improvements in product reliability, enough to offset the cost of a change to the manufacturing process, are sometimes implemented as a stepping change. The additional margin created by such a change can often be exploited for greater overclocking potential. Let's discuss this further and see exactly what it means.

"Accurate" Temperature Monitoring? The Truth About Processor "Degradation"
Comments Locked

45 Comments


  • TheJian - Thursday, March 6, 2008 - link

    http://www.newegg.com/Product/Product.aspx?Item=N8...

    You can buy a Radeon 3850 and triple your 6800 performance (assuming it's a GT; with an Ultra it would be just under triple). Check tomshardware.com and compare cards. You'd probably end up closer to double performance because of a weaker CPU, but still, once you saw your fps limit due to the CPU you can crank the hell out of the card for better looks in the game. $225 vs probably $650-700 for a new board+cpu+memory+vidcard+probably PSU to handle it all. If you have socket 939 you can still get a dual core Opty144 for $97 on pricewatch :) Overclock the crap out of it and you might hit 2.6-2.8, and it's a dual core. So around $325 for a lot longer life and easy changes. It will continue to get better as dual core games become dominant. While I would always tell someone to spend the extra money on the Intel currently (jeez, the OC'ing is amazing..run at default until slow then bump it up a GHz/core, that's awesome), if you're on a budget a dual core Opty and a 3850 looks pretty good at less than half the cost and both are easy to change out. Just a chip and a card. That's like a 15 minute upgrade. Just a thought, in case you didn't know they had an excellent AGP card out there for you. :)
  • mmntech - Wednesday, March 5, 2008 - link

    I'm in the same boat with the X2 3800+. Anyway, when it comes to dual vs quad, the same rules apply as back when the debate was single versus dual. Very few games support quad core, but a quad will be more future proof and give better multitasking. The ultimate question is how much you want to spend, how long you intend to keep the processor, and what the future road maps for games and CPU tech are within that period.

    I'm a long time AMD/nVidia man but I'm liking what Intel and ATI are putting out. I'm definitely considering these Wolfdales, especially that sub $200 one. I'm going to wait for the prices and benchmarks for the triple core Phenoms though before I begin planning an upgrade.
  • Margalus - Wednesday, March 5, 2008 - link

    the current state of affairs generally points to the higher-clocked dual core. Very few games can take advantage of 4 cores, so the more speed you get the better.
  • Spacecomber - Wednesday, March 5, 2008 - link

    It has been mentioned in a couple of articles now that what these processors will run at with no more than 1.45V core voltage applied is what really matters for most people buying one of these 45nm chips. So, it begs the question: what are the results at this voltage?

    While the section on processor failure was somewhat interesting, I think that it should have been a separate article.
  • retrospooty - Wednesday, March 5, 2008 - link

    "these processors will run (safely) at with no more than 1.45v core voltage applied is what really matters for most people buying one of these 45nm chips. So, it begs the question, what are the results at this voltage"

    Very good point. Since these CPUs are deemed safe up to 1.45V, let's see how far they clock at 1.45V. 4.5GHz at 1.6V is nice for a suicide run, but let's see it at 1.45.
  • Spoelie - Wednesday, March 5, 2008 - link

    This reads like an excerpt of a press release:

    "We could argue that when it came to winning the admiration and approval of overclockers, enthusiasts, and power users alike, no other single common product change could have garnered the same overwhelming success."

    Except that it was not. It was a knee-jerk reaction to the K8 release way back in 2003. It was too expensive to matter to anyone except for the filthy rich. The FX around that time was more successful. In recent years they just polished the concept a bit, but gaining admiration and overwhelming success because of it?? I think not. The Conroe architecture was the catalyst, not some expensive niche product.

    "Our love affair with the quad-core began not too long ago, starting with the release of Intel's QX6700 Extreme Edition processor. Ever since then Intel has been aggressive in their campaign to promote these processors to users that demand unrivaled performance and the absolute maximum amount of jaw-dropping, raw processing power possible from a single-socket desktop solution. Quickly following their 2.66GHz quad-core offering was the QX6800 processor, a revolutionary release in its own right in that it marked the first time users could purchase a processor with four cores that operated at the same frequency as the current top dual-core bin - at the time the 2.93GHz X6800."

    Speed bump revolutionary? Oh well ;)

    No beef with the rest of the article, those two paragraphs just stand out as being overly enthusiastic, more so than informative.
  • MaulSidious - Wednesday, March 5, 2008 - link

    this article's a bit late, isn't it? Seeing as they've been out for quite a while now.
  • MrModulator - Wednesday, March 5, 2008 - link

    Well, it's being updated from time to time. I think it is relevant since Cubase 4 is still the latest version of Cubase and the performance is the same today. What is important here is that they measure two equally clocked processors where the difference is in the number of cores. Yes, the quad is better at higher latencies, but it loses the advantage at lower latencies and even gets beaten by the dual-core.
    More of a reminder of the limitations of current-day quad-cores in some situations. This will probably change when Nehalem is introduced with its on-die memory controller, a higher FSB, and faster DDR3 memory.
  • adiposity - Wednesday, March 5, 2008 - link

    Uh, what? I think he's saying these processors were on the shelves over a month ago. This article is acting like they are just about to come out!

    -Dan
  • MrModulator - Wednesday, March 5, 2008 - link

    Yeah, you talk about games, where maximum CPU frequency on a dual core is important, but there are other areas that are much more interesting. Performance for sequencers where you make music (on DAW-based computers) is seldom mentioned. It is very important to be able to squeeze out every ounce of performance in real time with a lot of software synthesizers and effects using a low latency setting (not memory latency, but the delay from when you press a key on the synth until it is processed in the computer and put out through the soundcard, for example).
    Here's an interesting benchmark:

    http://www.adkproaudio.com/benchmarks.cfm
    (Sorry, using the linking button didn't work, you have to copy the link manually)
    If you scroll down to the Cubase 4/Nuendo 3 test you can compare the QX6850 (4 cores) with the E6850 (2 cores). They both run at 3GHz. Look at what happens when the latency is lowered: yes, the dual-core actually beats the quad-core, even though these applications use all available cores. The reason could be that all 4 cores compete for the FSB and memory access when the latency is really low. Very interesting indeed, as DAW work demands much more from the CPU than gaming...
