Thread It Like Its Hot

Hyper Threading was a great technology, simply first introduced on the wrong processor. The execution units of any modern day microprocessor are power hungry and consume a lot of die space, the last thing you want is to have them be idle with nothing to do. So you implement various tricks to keep them fed and working as often as possible. You increase cache sizes to make sure they never have to wait on main memory, you integrate a memory controller to ensure that trips to main memory are as speedy as possible, you prefetch data that you think you'll need in the future, you predict branches, etc...

Enabling simultaneous multi-threaded (SMT) execution is one of the most power efficient uses of a microprocessor's transistor budget, as it requires a very minimal increase in die size but can easily double the utilization of a CPU's execution units. SMT, or as Intel calls it, Hyper Threading does this by simply dispatching two threads of instructions to an individual processor core at the same time without increasing the available execution resources. Parallelism is paramount to extracting peak performance out of any out of order core, double the number of instructions being looked at to extract parallelism from and you increase your likelihood of getting work done without waiting on other instructions to retire or data to come back from memory.

In the Pentium 4 days enabling Hyper Threading required less than a 5% increase in die size but resulted in anywhere from a 0 - 35% increase in performance. On the desktop we rarely saw a big boost in performance except in multitasking scenarios, but these days multithreaded software is far more common than it was six years ago when Hyper Threading first made its debut.


This table shows what needed to be added, partitioned, shared or unchanged to enable Hyper Threading on Intel's Core microarchitecture

When the Pentium 4 made its debut however all we really had to worry about was die size, power consumption had yet to become a big issue (which the P4 promptly changed). These days power efficiency, die size and performance all go hand in hand and thus the benefits of Hyper Threading must also be looked at from the power perspective.

I took a small sampling of benchmarks ranging from things like POV-Ray which scales very well with more threads to iTunes, an application that couldn't care less if you had more than two cores. What we're looking at here are the performance and power impact due to Hyper Threading:

Intel Core i7-965 (Nehalem 3.2GHz) POV-Ray 3.7 Beta 29 Cinebench R10 1CPU Race Driver GRID
HT Disabled 3239 PPS 207W 4671 CBMarks 161.8W 103 fps 300.7W
HT Enabled 4202 PPS 233.7W 4452 CBMarks 159.5W 102.9 fps 302W

 

Looking at POV-Ray we see a 30% increase in performance for a 12% increase in total system power consumption, that more than exceeds Intel's 2:1 rule for performance improvement vs. increase in power consumption. The single threaded Cinebench test shows a slight decrease in both performance and power consumption (negligible) and the same can be said for Race Driver GRID.

When Hyper Threading improves performance, it does so at a reasonable increase in power consumption. When performance isn't impacted, neither is power consumption. This time around Hyper Threading has no drawbacks, while before the only way to get it was with a processor that was too hot and barely competitive, today Intel offers it on an architecture that we actually like. Hyper Threading is actually the first indication of Nehalem's true strength, not performance, but rather power efficiency...

Intel's Warning on Memory Voltage Is Nehalem Efficient?
POST A COMMENT

74 Comments

View All Comments

  • Kaleid - Monday, November 3, 2008 - link

    http://www.guru3d.com/news/intel-core-i7-multigpu-...">http://www.guru3d.com/news/intel-core-i...and-cros... Reply
  • bill3 - Monday, November 3, 2008 - link

    Umm, seems the guru3d gains are probably explained by them using a dual core core2dou versus quad core i7...Quad core's run multi-gpu quiet a bit better I believe.

    Reply
  • tynopik - Monday, November 3, 2008 - link

    what about those multi-threading tests you used to run with 20 tabs open in firefox while running av scan while compressing some files while converting something else while etc etc?

    this might be more important for daily performance than the standard desktop benchmarks
    Reply
  • D3SI - Monday, November 3, 2008 - link


    So the low end i7s are OC'able?

    what the hell is toms hardware talking about lol
    Reply
  • conquerist - Monday, November 3, 2008 - link

    Concerning x264, Nehalem-specific improvements are coming as soon as the developers are free from their NDA.
    See http://x264dev.multimedia.cx/?p=40">http://x264dev.multimedia.cx/?p=40.
    Reply
  • Spectator - Monday, November 3, 2008 - link

    can they do some CUDA optimizations?. im guessing that video hardware has more processors than quad core intel :P

    If all this i7 is new news and does stuff xx faster with 4 core's. how does 100+ core video hardware compare?.

    Yes im messing but giant Intel want $1k for best i7 cpu. when likes of nvid make bigger transistor count silicon using a lesser process and others manufacture rest of vid card for $400-500 ?

    Where is the Value for money in that. Chukkle.
    Reply
  • gramboh - Monday, November 3, 2008 - link

    The x264 team has specifically said they will not be working on CUDA development as it is too time intensive to basically start over from scratch in a more complex development environment. Reply
  • npp - Monday, November 3, 2008 - link

    CUDA Optimizations? I bet you don't understand completely what you're talking about. You can't just optimize a piece of software for CUDA, you MUST write it from scratch for CUDA. That's the reason why you don't see too much software for nVidia GPUs, even though the CUDA concept was introduced at least two years ago. You have the BadaBOOM stuff, but it's far for mature, and the reason is that writing a sensible application for CUDA isn't exactly an easy task. Take your time to look at how it works and you'll understand why.

    You can't compare the 100+ cores of your typical GPU with a quad core directly, they are fundamentaly different in nature, with your GPU "cores" being rather limited in functionality. GPGPU is a nice hype, but you simply can't offload everything on a GPU.

    As a side note, top-notch hardware always carries price premium, and Intel has had this tradition with high-end CPUs for quite a while now. There are plenty of people who need absolutely the fastest harware around and won't hesitate paying it.
    Reply
  • Spectator - Monday, November 3, 2008 - link

    Some of us want more info.

    A) How does the integrated Thermal sensor work with -50+c temps.

    B) Can you Circumvent the 130W max load sensor

    C) what are all those connection points on the top of the processor for?.

    lol. Where do i put the 2B pencil to. to join that sht up so i dont have to worry about multiply settings or temp sensors or wattage sensors.

    Hey dont shoot the messenger. but those top side chip contacts seem very curious and obviously must serve a purpose :P

    Reply
  • Spectator - Monday, November 3, 2008 - link

    Wait NO. i have thought about it..

    The contacts on top side could be for programming the chips default settings.

    You know it makes sence.Perhaps its adjustable sram style, rather than burning connections.

    yes some technical peeps can look at that. but still I want the fame for suggesting it first. lmao.

    Have fun. but that does seem logical to build in some scope for alteration. alot easier to manufacture 1 solid item then mod your stock to suit market when you feel its neccessary.

    Spectator.
    Reply

Log in

Don't have an account? Sign up now