Intel's 45nm Dual-Core E8500: The Best Just Got Better
by Kris Boughton on March 5, 2008 3:00 AM EST
"Accurate" Temperature Monitoring?
In the past, internal CPU temperatures were sensed using a single on-die diode connected to an external measurement circuit, which provided an easy means of monitoring and reporting "actual" processor temperatures in near real-time. Many motherboard manufacturers took advantage of this capability by connecting the appropriate processor pins/pads to an onboard controller, such as one of the popular Super I/O chips from Winbond. Super I/O chips typically handle most, if not all, of the standard motherboard input/output traffic associated with common interfaces, including floppy drives, PS/2 mice and keyboards, high-speed programmable serial communications ports (UARTs), and SPP/EPP/ECP-enabled parallel ports. Connected through either a legacy ISA bus interface or a newer LPC (Low Pin Count) interface, the Super I/O also monitors several critical PC hardware parameters such as power supply voltages, temperatures, and fan speeds.
This method of monitoring CPU temperature worked satisfactorily until Intel moved to its 65nm process. The reduction in feature size altered the temperature-sensing diode's operating characteristics enough that no amount of corrective calibration could ensure sufficient accuracy over the entire reporting range. From that point on, Intel engineers knew they needed something better. The result is the design used in every CPU Intel produces today, starting with Yonah - one of the first 65nm processors and a precursor to the now wildly successful Core 2 architecture.
The new design, called a Digital Thermal Sensor (DTS), no longer relied on an external biasing circuit, where power conditioning tolerances and slight variances in sense line impedance can introduce rather large signaling errors. As a result, many of the reporting discrepancies seen with the older monitoring methods were all but eliminated. Instead of relying on each motherboard manufacturer to design and implement this external interface, Intel made it possible to retrieve core temperatures easily, without any specialized hardware. This was accomplished by developing and documenting a standard method for reading these values directly from a single model-specific register (MSR) and then computing actual temperatures with a simple transformation formula. This way, the complicated process of measuring these values is kept hidden from the vendor.
A few quick lines of code (excluding the required custom device driver) are all that is needed to retrieve and report the values encoded in an MSR for each core.
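For illustration, here is a minimal Python sketch of such a read on Linux, which stands in for the custom device driver by assuming the kernel's msr driver is loaded (modprobe msr) and the script runs as root. It reads IA32_THERM_STATUS (MSR 0x19C), where Intel's documentation places the DTS readout in bits 22:16 and a "reading valid" flag in bit 31. The helper names read_msr and decode_therm_status are hypothetical; this is not the code used for this article.

```python
import os
import struct

# Address of IA32_THERM_STATUS, the per-core thermal status MSR.
IA32_THERM_STATUS = 0x19C


def read_msr(cpu: int, address: int) -> int:
    """Read one 64-bit MSR on a logical CPU through Linux's /dev/cpu/N/msr.

    Requires root and the msr kernel module; the file offset selects the MSR.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]  # 8 bytes, little-endian unsigned


def decode_therm_status(raw: int) -> tuple[bool, int]:
    """Split a raw IA32_THERM_STATUS value into (reading_valid, margin_c).

    margin_c is the DTS readout: degrees C remaining below the Tjunction set point.
    """
    reading_valid = bool((raw >> 31) & 0x1)  # bit 31: reading valid
    margin_c = (raw >> 16) & 0x7F             # bits 22:16: digital readout (7 bits)
    return reading_valid, margin_c
```

On a real system, decode_therm_status(read_msr(0, IA32_THERM_STATUS)) returns the flag and readout for CPU 0. Note that the result is a margin, not a temperature.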
The transformation formula we mentioned is actually exceedingly simple to implement. Instead of storing the absolute temperature for each core, the MSR essentially counts down the margin to the core's maximum thermal limit, often incorrectly referred to as "Tjunction" (junction temperature). When this value reaches zero, the core has reached its Tjunction set point. Calculating the actual temperature should therefore be as easy as subtracting the remaining margin (stored in the MSR) from the processor's known Tjunction value. There is a problem, however: Intel has never published Tjunction values for any CPU other than its mobile models. The reason is simple. Desktop processors are specified against a maximum case temperature ("Tcase"), normally measured by a probe embedded at the top dead center of the integrated heat spreader (IHS). Mobile processors lack an IHS, so no Tcase specification can be established for them, and a Tjunction value is given instead.
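In code, the transformation reduces to a single subtraction. The sketch below assumes the Tjunction value is already known, which, as explained next, is exactly the hard part; the function name core_temperature is hypothetical.

```python
def core_temperature(tjunction_c: int, margin_c: int) -> int:
    """Convert a DTS margin into an absolute core temperature.

    tjunction_c: the processor's maximum thermal set point. The DTS does not
                 report this; the caller must supply it.
    margin_c:    degrees C remaining below that set point, as read from the MSR.
    """
    if margin_c < 0:
        raise ValueError("the DTS margin is never negative")
    return tjunction_c - margin_c
```

For example, a margin of 55 on a part with a 100°C set point gives core_temperature(100, 55) == 45, and a margin of 0 means the core is sitting at its set point.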
Thus, calculating actual core temperatures requires two separate data points, only one of which is readily available from the MSR. The other, Tjunction, must be known ahead of time. Early implementations determined a processor's Tjunction value by isolating a single status bit from a different MSR, which was used to flag whether the part in question was engineered to a maximum Tjunction set point of 85°C or 100°C. Because Merom - the mobile Core 2 version of Conroe - used one of these two values, it was somehow decided that the desktop products, built on the same process, must share these set points as well. Unfortunately, this turns out not to be the case.
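The early heuristic can be sketched as follows. Which MSR and bit carried the flag is not stated here, so both are left as parameters, and the mapping (a set bit selects 85°C, a clear bit 100°C) is an assumption for illustration only. The helper names are hypothetical. The sketch also makes the flaw visible: nothing in it distinguishes a mobile Merom from a desktop part.

```python
def bit_is_set(raw: int, bit: int) -> bool:
    """Return True if the given bit of a raw register value is 1."""
    return bool((raw >> bit) & 1)


def legacy_tjunction_guess(flag_register: int, flag_bit: int) -> int:
    """Hypothetical early heuristic: choose a Tjunction of 85 or 100 C from one bit.

    ASSUMPTION: a set bit means 85 C and a clear bit means 100 C. The register
    and bit holding the flag are caller-supplied placeholders. The result is a
    guess that was never validated for desktop processors.
    """
    return 85 if bit_is_set(flag_register, flag_bit) else 100
```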
All of these prominent tools for monitoring Intel CPU core temperatures rely on assumed maximum Tjunction set points that cannot be verified.
More than a few programs have been released over the last few years, each claiming to report accurate core temperatures from these DTS values in real-time. The truth is that none can be fully trusted, as the Tjunction values used in these transformations may not always be correct. Moreover, Intel representatives have informed us that these as-yet-unpublished Tjunction values may vary from model to model - sometimes even between steppings - and that the temperature response curves may not be entirely accurate across the whole reporting range. Since today's monitoring programs all incorrectly assume that Tjunction depends only on processor family and stepping, we have no choice but to call into question everything we thought we knew. Until Intel publishes these values on a per-model basis, the best these DTS readings can do is give a relative indication of each core's remaining thermal margin, whatever that may be.
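To see why only the margin can be trusted, the hypothetical helper below turns one DTS reading into the absolute temperature each candidate Tjunction would imply. The 85°C and 100°C candidates come from the heuristic described above; any other candidate would be an assumption supplied by the caller.

```python
def implied_temperatures(margin_c: int, assumed_tjunctions: list[int]) -> dict[int, int]:
    """Map each assumed Tjunction set point to the core temperature it implies.

    The margin is what the DTS actually reports; every absolute temperature
    returned here depends entirely on which Tjunction is assumed.
    """
    return {tj: tj - margin_c for tj in assumed_tjunctions}
```

For a margin of 40, implied_temperatures(40, [85, 100]) returns {85: 45, 100: 60}: a 15°C disagreement from the same reading, while the 40°C of remaining headroom is identical under either assumption.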