"Accurate" Temperature Monitoring?

In the past, internal CPU temperatures were sensed using a single on-die diode connected to an external measurement circuit, which provided an easy means of monitoring and reporting "actual" processor temperatures in near real-time. Many motherboard manufacturers took advantage of this capability by interfacing the appropriate processor pins/pads to an onboard controller, such as one of the popular Super I/O chips available from Winbond. Super I/O chips typically control most, if not all, of the standard motherboard input/output traffic associated with common interfaces, including floppy drives, PS/2 mice and keyboards, high-speed programmable serial communications ports (UARTs), and SPP/EPP/ECP-enabled parallel ports. Using either a legacy ISA bus interface or a newer LPC (low pin-count) interface, the Super I/O also monitors several critical PC hardware parameters such as power supply voltages, temperatures, and fan speeds.

This method of monitoring CPU temperature functioned satisfactorily until Intel conducted its first process shrink to 65nm. The reduction in circuit size altered some of the temperature-sensing diode's operating characteristics enough that no amount of corrective calibration could ensure sufficient accuracy over the entire reporting range. Intel engineers knew they would need something better, and the result was the design we now see in effectively every CPU produced by Intel, starting with Yonah - one of the first 65nm processors and a precursor to the now wildly-successful Core 2 architecture.

The new design, called a Digital Thermal Sensor (DTS), no longer relied on an external biasing circuit, where power conditioning tolerances and slight variances in sense line impedances could introduce rather large signaling errors. Because of this, many of the reporting discrepancies noted with the older monitoring methods were all but eliminated. Instead of relying on each motherboard manufacturer to design and implement this external interface, Intel made it possible for core temperatures to be retrieved easily, without the need for any specialized hardware. This was accomplished through the development and documentation of a standard method for reading these values directly from a single model-specific register (MSR) and then computing actual temperatures by applying a simple transformation formula. This way the complicated process of measuring these values would be well hidden from the vendor.



A few quick lines of code (excluding the custom device driver required) are all that is needed to retrieve and report the values encoded in an MSR for each core.
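As a rough illustration of how short that code can be, here is a minimal Python sketch of a raw MSR read. It assumes a Linux system with the `msr` kernel module loaded, which exposes each logical CPU's registers as `/dev/cpu/N/msr` and plays the role of the device driver mentioned above. The register address `0x19C` (named `IA32_THERM_STATUS` in Intel's software developer documentation) is our assumption about which MSR is meant, and the `read_msr` helper is our own name, not part of any published API.

```python
import os
import struct

# Assumption: per-core thermal status register address from Intel's
# software developer documentation; the article does not name the register.
IA32_THERM_STATUS = 0x19C


def read_msr(cpu: int, register: int,
             device_template: str = "/dev/cpu/{cpu}/msr") -> int:
    """Read one 64-bit MSR for a logical CPU through a Linux-style msr node.

    The device path is an assumption about the platform; reading it
    normally requires root privileges.
    """
    path = device_template.format(cpu=cpu)
    fd = os.open(path, os.O_RDONLY)
    try:
        # The msr driver interprets the file offset as the register address.
        raw = os.pread(fd, 8, register)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from {path}")
    (value,) = struct.unpack("<Q", raw)  # MSRs are 64-bit, little-endian
    return value
```

On such a system, `read_msr(0, IA32_THERM_STATUS)` would return the raw register contents for the first logical CPU; decoding them is a separate step.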

The transformation formula we spoke of is actually exceedingly simple to implement. Instead of storing the absolute temperature for each core, the MSR is designed to essentially count down the margin to the core's maximum thermal limit, often incorrectly referred to as "Tjunction" (or junction temperature). When this value reaches zero, the core temperature has reached its Tjunction set point. Therefore, calculating the actual temperature should be as easy as subtracting the remaining margin (stored in the MSR) from the processor's known Tjunction value. There is a problem, however: Intel has never published Tjunction values for any CPU other than the mobile models. The reason for this is simple. Since mobile processors lack an integrated heat spreader (IHS), it is not possible to establish a thermal specification with respect to their maximum case temperature ("Tcase"), normally measured from an embedded probe located top, dead-center in the IHS.
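The subtraction described above can be sketched in a few lines of Python. The bit layout used here (margin in bits 22:16, a "reading valid" flag in bit 31) is our understanding of the `IA32_THERM_STATUS` register from Intel's developer documentation rather than something stated in this article, and the function names are hypothetical. The Tjunction value is passed in explicitly because, as noted, it is an assumption for desktop parts.

```python
def decode_dts_margin(therm_status: int) -> int:
    """Return the remaining thermal margin in degrees C from a raw MSR value.

    Assumed layout: bit 31 = reading valid, bits 22:16 = digital readout.
    """
    reading_valid = (therm_status >> 31) & 1
    if not reading_valid:
        raise ValueError("DTS reading is not flagged as valid")
    return (therm_status >> 16) & 0x7F


def core_temperature(therm_status: int, assumed_tjunction_c: int) -> int:
    """Estimate core temperature as Tjunction minus the remaining margin.

    The result is only as trustworthy as the assumed Tjunction value.
    """
    return assumed_tjunction_c - decode_dts_margin(therm_status)
```

For a raw value encoding a margin of 40, `core_temperature` gives 60 with an assumed Tjunction of 100 and 45 with an assumed Tjunction of 85; the same reading yields two different "temperatures" depending on the assumption.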

Thus, calculating actual core temperatures requires two separate data points, only one of which is readily available from the MSR. The other, Tjunction, must be known ahead of time. Early implementations determined a processor's Tjunction value by isolating a single status bit from a different MSR, which was used to flag whether the part in question was engineered to a maximum Tjunction set point of either 85ºC or 100ºC. Because Merom - the mobile Core 2 version of Conroe - used one of these two values, it was somehow decided that the desktop products, built on the same process, must also share these set points. Unfortunately, it turns out this is not the case.
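The status-bit approach amounts to a one-bit lookup, sketched below in Python. This article does not identify the register, the bit position, or which state maps to which set point, so all three are caller-supplied placeholders here; the function name and the default polarity (bit set means 85ºC) are assumptions for illustration only.

```python
def tjunction_from_flag(msr_value: int, flag_bit: int,
                        tj_if_set: int = 85, tj_if_clear: int = 100) -> int:
    """Pick an assumed Tjunction set point from a single status bit.

    flag_bit and the set/clear polarity are hypothetical; the source
    only says that one bit distinguishes 85 C parts from 100 C parts.
    """
    if (msr_value >> flag_bit) & 1:
        return tj_if_set
    return tj_if_clear
```

With a placeholder bit position of 30, `tjunction_from_flag(1 << 30, 30)` selects 85 and `tjunction_from_flag(0, 30)` selects 100. The weakness is visible in the signature: only two outcomes are possible, whatever the part's real set point is.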



All of these prominent means for monitoring Intel CPU core temperatures are based on assumed maximum Tjunction set points which cannot be verified.

More than a few programs have been released over the last few years, each claiming to accurately report these DTS values in real-time. The truth is that none can be fully trusted, as the Tjunction values used in these transformations may not always be correct. Moreover, Intel representatives have informed us that these as-yet unpublished Tjunction values may actually vary from model to model - sometimes even between different steppings - and that the temperature response curves may not be entirely accurate across the whole reporting range. Since all of today's monitoring programs incorrectly assume that Tjunction values are a function of the processor family/stepping only, we have no choice but to call everything we thought we knew into question. Until Intel decides to publish these values on a per-model basis, the best these DTS readings can do for us is give a relative indication of each core's remaining thermal margin, whatever that may be.
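To make the "relative indication" point concrete, the short Python sketch below compares what can and cannot be derived from margin readings alone. The function names and the sample margins are our own illustrative choices, and the two candidate Tjunction values are simply the 85ºC and 100ºC set points discussed earlier.

```python
def temperature_rise(idle_margin_c: int, load_margin_c: int) -> int:
    """Temperature increase from idle to load, in degrees C.

    Tjunction cancels out: (Tj - load) - (Tj - idle) = idle - load.
    """
    return idle_margin_c - load_margin_c


def absolute_estimates(margin_c: int, candidate_tjunctions=(85, 100)) -> dict:
    """Absolute temperature implied by each candidate Tjunction value."""
    return {tj: tj - margin_c for tj in candidate_tjunctions}
```

With an idle margin of 55 and a load margin of 20, `temperature_rise(55, 20)` is 35 regardless of the true Tjunction, while `absolute_estimates(20)` returns `{85: 65, 100: 80}`, two load temperatures that disagree by exactly the difference between the assumed set points.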


45 Comments


  • mdma35 - Friday, October 09, 2009

    Epic Article was pleasure to read thnx for sucj informative stuff
  • jamstan - Sunday, July 13, 2008

    I just did a build with an E8500. The temp always shows 30 degrees no matter how high I overclock it or what speed I have my Vantec Tornado at. Being an overclocker it stinks that I bought a cpu with a temp sensor that doesn't work. I guess its a common problem with this cpu and I hear Intel won't RMA a cpu with a bad sensor. I'm gonna be giving them a call.
  • Johnbear007 - Saturday, March 08, 2008

    I'd still like to know (other than microcenter) what retailer(S) are carrying the q6600 for "under 200$". I would much rather have a sub 200$ q6600 than a 260$ e8400 from mwave
  • MrSpadge - Thursday, March 06, 2008

    I do not agree with much of mindless1's critique on page 3, but we arrive at a somewhat similar conclusion: the section " The Truth About Processor "Degradation" " is lacking. Rather than adressing my issues with mindless1's post I'll just explain my point.

    Showing the influence of temperature on reliability is nice and well, but you neglect the factor which is by far the most important: voltage. It's effect on reliability / expected lifetime / MTTF is much higher than temperature (within sane limits).

    How did you generate the curves in the first plot on that page? Is it just a guess or do you have exact data? Since you mention the 8500 specifically I can imagine that you got the data (or formula) from some insider. If so I'd be curious about how these curves look like if you apply e.g. 1.45 V. There should be a drastic reduction in lifetime.

    If you don't think voltage is that important and you have no ways to adjust the calculations, you could pm dmens here at AT. I'd say he's expert enough in this field.

    MrS
  • Toferman - Thursday, March 06, 2008

    Another great article, thanks for your work on this Kris. :)
  • xkon - Thursday, March 06, 2008

    where are the sub $200 q6600's? i know microcenter had some for $200, but they are no where near me. any other ones? stating it in the article like that makes me think they are available at almost any retailer for that price. maybe if it was rephrased to something like they have been known to be priced as low as $200 or something like that. then again. maybe i'm not in the know, and am just not looking hard enough.
  • TheJian - Thursday, March 06, 2008

    Yet another example of lies. The cheapest Q6600 on pricewatch is $243. And that doesn't come with a 3yr warranty OR a heatsink. So really the cheapest is $253 for retail box with heatsink/fan and 3yr. That's a FAR cry from $200. Cheapest on Cnet.com is $255. Where did they search to find these magical $200 Q6600 chips? I want one. I suspect pricegrabber etc would show the same. I'm too lazy to check now...LOL
  • MaulSidious - Thursday, March 06, 2008

    dunno about america but in britain you can get a q6600 anywhere for 130-150 pounds
  • Johnbear007 - Thursday, March 06, 2008

    150 pounds is about 250-300$ american which is nowhere near what the articles author is claiming. One microcenter deal doesnt really constitute claiming you can bag one from retailer(S) for under 200$. Also, another poster pointed to what he called a q6700 for 80$. That is not true, it was an e6700 which is dual core not quad.
  • Karaktu - Wednesday, March 05, 2008

    I would just like to point out that it has been possible to run a sub-90-watt maximum HTPC for nearly two years. In fact, I've been doing it.

    It DOES require a Core Duo or Core 2 Duo mobile chip, but MoD isn't a new concept.

    ASUS N4L-VM DH
    - Using onboard Intel graphics, Realtek SPDIF and Gigabit network
    Core Duo T2500 (2.0GHz)
    - Cooled by a Nactua NC-U6 northbridge cooler and 60mm fan set to low
    2 x 1GB DDR2 667
    Vista View D1N1-E NTSC/ATSC PCI-E tuner
    Vista View D1N1-I NTSC/ATSC PCI tuner
    - (That's two analog and two HDTV tuners)
    1TB WDC GP 5400rpm hard drive
    750GB Samsung Spinpoint F1 7200rpm hard drive
    Antec Fusion case (rev 1)
    - VFD
    - 430-watt 80 Plus power supply
    - 2 x 120mm TriCool fans set to low
    - External IR for remote and keyboard
    Running MCE 2005

    Idles at 68 watts AT THE WALL and draws a maximum of 90 watts at full load (recording 4 shows and watching a fifth show/movie).

    If I ever get around to dropping the PSU to an EA-380, I'm sure the efficiency would go up a little since I would be closer to that magic 20 - 80% range on the power supply.

    Joe
