Thread It Like It's Hot

Hyper Threading was a great technology; it was simply introduced first on the wrong processor. The execution units of any modern microprocessor are power hungry and consume a lot of die space, so the last thing you want is for them to sit idle with nothing to do. To keep them fed and working as often as possible, you implement various tricks: you increase cache sizes so they rarely have to wait on main memory, you integrate a memory controller so that trips to main memory are as speedy as possible, you prefetch data that you think you'll need in the future, you predict branches, and so on.

Enabling simultaneous multi-threaded (SMT) execution is one of the most power-efficient uses of a microprocessor's transistor budget, as it requires only a minimal increase in die size but can substantially raise the utilization of a CPU's execution units. SMT, or Hyper Threading as Intel calls it, does this by simply dispatching two threads of instructions to an individual processor core at the same time without increasing the available execution resources. Parallelism is paramount to extracting peak performance out of any out-of-order core: double the number of instructions being examined for parallelism and you increase your likelihood of getting work done without waiting on other instructions to retire or on data to come back from memory.
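As a rough illustration of why a second thread helps, here is a toy Python model of our own; it is not how Intel's hardware actually schedules work. It treats a core as a fixed number of issue slots per cycle and describes each thread by a hypothetical count of instructions it has ready in each cycle, with zeros standing in for stalls such as cache misses. The `ISSUE_WIDTH` value and both thread traces are made-up assumptions.

```python
"""Toy model of how SMT fills otherwise idle execution slots.

Assumptions (hypothetical, not from the article): a core with a fixed
number of issue slots per cycle, and each thread described only by how
many instructions it has ready in each cycle (0 = stalled, e.g. waiting
on memory). Ready instructions that don't fit in a cycle are simply
dropped from the model; real out-of-order cores are far more complex.
"""

ISSUE_WIDTH = 4  # hypothetical number of execution slots per cycle


def utilization(threads: list[list[int]], width: int = ISSUE_WIDTH) -> float:
    """Fraction of issue slots filled when the given threads share one core."""
    cycles = len(threads[0])
    used = 0
    for c in range(cycles):
        ready = sum(t[c] for t in threads)  # ready instructions from all threads
        used += min(ready, width)           # SMT adds no slots: cap at `width`
    return used / (width * cycles)


# Hypothetical per-cycle "ready instruction" counts; zeros are stalls.
thread_a = [3, 0, 0, 2, 4, 0, 1, 0]
thread_b = [0, 2, 3, 0, 1, 3, 0, 2]

print(f"thread A alone: {utilization([thread_a]):.1%}")
print(f"thread B alone: {utilization([thread_b]):.1%}")
print(f"A + B with SMT: {utilization([thread_a, thread_b]):.1%}")
```

With these made-up traces each thread alone fills roughly a third of the slots (10 and 11 of 32), while running both together fills 20 of 32 (62.5%), because most of thread B's work lands in cycles where thread A is stalled. The fifth cycle shows the limit: the two threads want five slots but only four exist, so the extra demand is lost.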

In the Pentium 4 days, enabling Hyper Threading required less than a 5% increase in die size but resulted in anywhere from a 0% to 35% increase in performance. On the desktop we rarely saw a big boost in performance except in multitasking scenarios, but multithreaded software is far more common today than it was six years ago when Hyper Threading first made its debut.


This table shows what needed to be added, partitioned, shared, or left unchanged to enable Hyper Threading on Intel's Core microarchitecture.

When the Pentium 4 made its debut, however, all we really had to worry about was die size; power consumption had yet to become a big issue (which the P4 promptly changed). These days power efficiency, die size and performance all go hand in hand, and thus the benefits of Hyper Threading must also be examined from a power perspective.

I took a small sampling of benchmarks, ranging from applications like POV-Ray, which scales very well with more threads, to iTunes, an application that couldn't care less if you had more than two cores. What we're looking at here is the performance and power impact of Hyper Threading:

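Intel Core i7-965 (Nehalem, 3.2GHz)

| Hyper Threading | POV-Ray 3.7 Beta 29 | System Power | Cinebench R10 1CPU | System Power | Race Driver GRID | System Power |
|---|---|---|---|---|---|---|
| Disabled | 3239 PPS | 207W | 4671 CBMarks | 161.8W | 103 fps | 300.7W |
| Enabled | 4202 PPS | 233.7W | 4452 CBMarks | 159.5W | 102.9 fps | 302W |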

Looking at POV-Ray, we see a roughly 30% increase in performance for a roughly 13% increase in total system power consumption, a ratio of about 2.3:1 that exceeds Intel's 2:1 rule for performance improvement vs. increase in power consumption. The single-threaded Cinebench test shows a slight (roughly 5%) decrease in performance along with a negligible drop in power consumption, while Race Driver GRID is essentially unchanged on both counts.
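The percentages can be rechecked directly from the benchmark table. This short Python sketch is our own arithmetic on the published numbers, not part of the original testing; it computes each performance and power change, the performance-to-power ratio where both rise, and the change in performance per watt.

```python
# Figures copied from the Core i7-965 benchmark table: HT disabled vs. enabled.
# Tuple layout: (score_off, watts_off, score_on, watts_on); watts are total system power.
RESULTS = {
    "POV-Ray 3.7 Beta 29 (PPS)": (3239, 207.0, 4202, 233.7),
    "Cinebench R10 1CPU (CBMarks)": (4671, 161.8, 4452, 159.5),
    "Race Driver GRID (fps)": (103.0, 300.7, 102.9, 302.0),
}


def pct_change(before: float, after: float) -> float:
    """Percentage change going from `before` to `after`."""
    return (after / before - 1.0) * 100.0


def summarize(row: tuple[float, float, float, float]) -> tuple[float, float, float]:
    """Return (% performance change, % power change, % perf-per-watt change)."""
    score_off, watts_off, score_on, watts_on = row
    perf = pct_change(score_off, score_on)
    power = pct_change(watts_off, watts_on)
    efficiency = pct_change(score_off / watts_off, score_on / watts_on)
    return perf, power, efficiency


for name, row in RESULTS.items():
    perf, power, eff = summarize(row)
    line = f"{name}: perf {perf:+.1f}%, power {power:+.1f}%, perf/W {eff:+.1f}%"
    if perf > 0 and power > 0:
        # Intel's rule of thumb cited in the text: at least 2% performance per 1% power.
        line += f", ratio {perf / power:.1f}:1"
    print(line)
```

For POV-Ray this works out to about +29.7% performance for +12.9% power, roughly 2.3:1, or about 15% more work per watt. GRID is the only case where power rises (by about 0.4%) while performance stays flat.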

When Hyper Threading improves performance, it does so at a reasonable increase in power consumption. When performance isn't impacted, neither is power consumption. This time around Hyper Threading has no real drawbacks: before, the only way to get it was with a processor that was too hot and barely competitive, but today Intel offers it on an architecture that we actually like. Hyper Threading is actually the first indication of Nehalem's true strength, which is not performance but rather power efficiency...


73 Comments


  • Clauzii - Thursday, November 6, 2008

    I still use PS/2. None of the USB keyboards I've borrowed or tried would work during boot. I also think a PS/2 keyboard and mouse don't lag as much, maybe because PS/2 has its own non-shared interrupt line.

    But I can see a problem with PS/2 in the future, with keyboards like the Art Lebedev ones. When that technology becomes more affordable, I'd gladly see upgraded, but still dedicated, keyboard/mouse connectors.
  • The0ne - Monday, November 3, 2008

    Yes. I keep a PS/2 keyboard on hand in case my USB keyboard can't get in :)
  • Strid - Monday, November 3, 2008

    Ahh, makes sense. Thanks for clarifying!
  • Genx87 - Monday, November 3, 2008

    After living through the hell that was ATI's drivers back in 2003-2004 on a 9600 Pro AIW, I didn't learn and plopped money down on a 4850, and I have had terrible driver quality since. I've had more BSODs from the ATI driver than I've had in Windows in the past five years combined, from anything. Back to NVIDIA for me when I get a chance.

    That said, this review is pretty much what I expected after reading the preview article in August. They are really trying to recapture the 4-socket market, a place where AMD has been able to do well. This chip is designed for server work. I'll pick one up after my E8400 runs out of steam.
  • Griswold - Tuesday, November 4, 2008

    You're just not clever enough to set up your system properly. I have two identical systems sitting here side by side, the only difference being the video card (an HD3870 in one and an 8800GT in the other), and the box with the NVIDIA card gives me an order of magnitude more headaches due to driver crashes. While that also happens on the 3870 machine now and then, it's nowhere near as often. But the best part: neither produces a BSOD. That is why I know you're most likely the culprit (the alternative is faulty hardware or a pathetic overclock).
  • Lord 666 - Monday, November 3, 2008

    The stock speed of a Q9550 is 2.83GHz, not 2.66GHz.

    Why the handicap?
  • Anand Lal Shimpi - Monday, November 3, 2008

    My mistake; it was a Q9450 that was used. The Q9550 label was from an earlier version of the spreadsheet that got canned due to time constraints. I wanted a clock-for-clock comparison with the i7-920, which runs at 2.66GHz.

    Take care,
    Anand
  • faxon - Monday, November 3, 2008

    Tom's Hardware published an article saying there would be a cap on how high you are allowed to clock your part before it downclocks back to stock. Since this is an integrated part of the core, you can only turn it off, up, or down if they unlock it. The limit was supposedly a 130W thermal dissipation mark. What effect did this have on overclocking the 920 in your tests?
  • Gary Key - Monday, November 3, 2008

    We have not had any problems clocking our 920 to the 3.6GHz~3.8GHz level with proper cooling. The 920, 940, and 965 will all clock down as core temps rise above the 80C level. We noticed half-step decreases above 80C or so and watched our core multipliers throttle down to as low as 5.5 when core temps exceeded 90C, then increase back to normal as temperatures were lowered.

    This occurred with stock voltages or with VCore set to 1.5V; in our tests it was dependent on thermals, not voltages or clock speeds. That said, I am still running a battery of tests on the 920 right now, but I have not seen an artificial cap yet. That does not mean one doesn't exist, just that we have not triggered it yet.

    This morning I will try the 920 on the Intel board that Tom's used to see if it operates any differently than the ASUS and MSI boards.
  • Th3Eagle - Monday, November 3, 2008

    I wonder how close you came to those temperatures while overclocking these processors.

    Taking the 920 to 3.6/3.8 is a nice overclock, but I wonder what you mean by proper cooling and how close you came to crossing the 80C "boundary"?
