Is Nehalem Efficient?

At this year's IDF in San Francisco, Intel revealed a little-discussed but extremely important aspect of Nehalem's circuit design:

Nehalem is Intel's first microprocessor in the past two decades to feature absolutely no domino logic; it's a fully static CMOS design. I've explained the differences between dynamic domino and static CMOS design in the past, but simply put: domino logic is a clock speed play. It's incredibly useful for implementing very high speed circuit paths on a chip, and Intel's use of it hit an all-time peak in the Pentium 4 days. The downside to such high speed logic is that it requires a lot of power, but in microprocessor design there are always tradeoffs to be made.


There are many other energy efficiency plays within Nehalem

With Nehalem, Intel took the new architecture as an opportunity to revamp its design and removed all remaining domino logic, without impacting the peak clock speed of the architecture. The tradeoff here is one of die size: by using more parallel logic, Intel was able to convert some serial, high speed paths into larger, slower circuits that removed the need for domino logic. Details are unfortunately light and a bit beyond the scope of this review, but the move to an all static CMOS design is bound to reduce power consumption. Do you smell a comparison coming?

Both Nehalem and Penryn are built on the same 45nm process, are available at the same clock speeds and are capable of running the very same applications. In theory, Nehalem should be more power efficient at the same clock speed across the board, thanks to its static CMOS design. To find out, I measured average power consumption over the duration of a handful of the benchmarks I used in this review.

| Performance | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 2238 PPS | 11502 CBMarks | 61.5 fps | 34.0 fps |
| Intel Core i7-920 (Nehalem - 2.66GHz) | 3528 PPS | 16211 CBMarks | 74.8 fps | 33.2 fps |
| Nehalem Performance Advantage | 57.6% | 40.9% | 21.6% | -2% |

I picked these four benchmarks because they show us the range of Nehalem's performance, going from no performance improvement all the way up to a gain of nearly 60%. Now let's look at the power consumption in each of these four benchmarks:

| Power Consumption | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 168.1W | 175.2W | 167.5W | 220.8W |
| Intel Core i7-920 (Nehalem - 2.66GHz) | 202.2W | 208.6W | 176.6W | 230.8W |
| Nehalem Power Disadvantage | +34.1W | +33.4W | +9.1W | +10W |

If you actually go through and do the math you'll find that Nehalem, despite using more power, is more efficient than Penryn. Performance per watt is around 31% better in POV-Ray, 18% better in Cinebench and 15% better in the x264 HD test (put the other way, Penryn delivers roughly 24%, 15.5% and 13% less performance per watt than Nehalem). In Crysis, the only benchmark where Nehalem actually falls behind, it also draws more power, and thus Nehalem loses the efficiency battle there.
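For anyone who wants to check the math, here is a minimal Python sketch that recomputes performance per watt from the 2.66GHz tables on this page. The dictionary layout and the names `perf_per_watt`, `gains` and `gaps` are my own, chosen for illustration; the scores and wattages are the measured values above, taken as-is.

```python
# Scores and average power draw at 2.66GHz, copied from the tables on this page.
# Each entry maps benchmark -> (score, watts).
penryn = {
    "POV-Ray 3.7": (2238, 168.1),
    "Cinebench XCPU": (11502, 175.2),
    "x264 HD": (61.5, 167.5),
    "Crysis": (34.0, 220.8),
}
nehalem = {
    "POV-Ray 3.7": (3528, 202.2),
    "Cinebench XCPU": (16211, 208.6),
    "x264 HD": (74.8, 176.6),
    "Crysis": (33.2, 230.8),
}


def perf_per_watt(score, watts):
    """Performance delivered per watt of average power draw."""
    return score / watts


gains = {}  # Nehalem's perf/watt change, as a percentage of Penryn's perf/watt
gaps = {}   # the same difference, as a percentage of Nehalem's perf/watt
for name in penryn:
    old = perf_per_watt(*penryn[name])
    new = perf_per_watt(*nehalem[name])
    gains[name] = (new / old - 1) * 100
    gaps[name] = (1 - old / new) * 100
    print(f"{name}: {gains[name]:+.1f}% vs. Penryn, "
          f"{gaps[name]:+.1f}% of Nehalem's perf/watt")
```

The two columns differ only in the baseline. Measured against Penryn, Nehalem's gain works out to roughly 31%, 18% and 15% in the first three tests. Measured against Nehalem, the same gap is roughly 24%, 15.5% and 13%. Crysis comes out negative either way.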

It seems as if Nehalem is even more polarizing than I had thought. Despite the move to a fully static CMOS design, the changes aren't enough to make up for the scenario where Nehalem can't offer more performance; power consumption still goes up, albeit not terribly.

It's also worth noting that the power comparison really depends on the CPU used. Here we've got the same comparison but with the Core i7-965 vs. the Core 2 Extreme QX9770, both clocked at 3.2GHz:

| Performance | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 2641 PPS | 14065 CBMarks | 73.2 fps | 41.7 fps |
| Intel Core i7-965 (Nehalem - 3.2GHz) | 4202 PPS | 18810 CBMarks | 85.8 fps | 40.5 fps |

| Power Consumption | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 230.7W | 227.6W | 230.3W | 293.6W |
| Intel Core i7-965 (Nehalem - 3.2GHz) | 233.7W | 230.7W | 196.2W | 248.5W |

It's tough to draw any conclusions based on two CPUs, but it is possible that at higher clock speeds Nehalem's efficiency advantage kicks in. The QX9770 has always been a bit high on the power consumption side, whereas the i7-965, even in situations where it is slower than the QX9770, offers better power efficiency here.
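The 3.2GHz tables don't include summary rows, so here is a rough Python sketch that derives them from the numbers above. The `compare` helper and the `summary` dictionary are hypothetical names of my own; only the scores and wattages come from the measurements on this page.

```python
# 3.2GHz results copied from the tables on this page.
# Each entry maps benchmark -> (score, average watts).
qx9770 = {
    "POV-Ray 3.7": (2641, 230.7),
    "Cinebench R10 - XCPU": (14065, 227.6),
    "x264 HD": (73.2, 230.3),
    "Crysis": (41.7, 293.6),
}
i7_965 = {
    "POV-Ray 3.7": (4202, 233.7),
    "Cinebench R10 - XCPU": (18810, 230.7),
    "x264 HD": (85.8, 196.2),
    "Crysis": (40.5, 248.5),
}


def compare(new, old):
    """Return (performance change %, power change in W, perf-per-watt change %)."""
    new_score, new_watts = new
    old_score, old_watts = old
    perf_pct = (new_score / old_score - 1) * 100
    watts_delta = new_watts - old_watts
    ppw_pct = ((new_score / new_watts) / (old_score / old_watts) - 1) * 100
    return perf_pct, watts_delta, ppw_pct


summary = {name: compare(i7_965[name], qx9770[name]) for name in qx9770}
for name, (perf, watts, ppw) in summary.items():
    print(f"{name}: perf {perf:+.1f}%, power {watts:+.1f}W, perf/watt {ppw:+.1f}%")
```

On these numbers the i7-965 comes out ahead on performance per watt in all four tests, including Crysis, where it is about 3% slower but draws about 45W less.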

73 Comments

  • Kaleid - Monday, November 3, 2008

    http://www.guru3d.com/news/intel-core-i7-multigpu-...
  • bill3 - Monday, November 3, 2008

    Umm, seems the guru3d gains are probably explained by them using a dual core Core 2 Duo versus a quad core i7... Quad cores run multi-GPU quite a bit better, I believe.

  • tynopik - Monday, November 3, 2008

    what about those multi-threading tests you used to run with 20 tabs open in firefox while running av scan while compressing some files while converting something else while etc etc?

    this might be more important for daily performance than the standard desktop benchmarks
  • D3SI - Monday, November 3, 2008


    So the low end i7s are OC'able?

    what the hell is toms hardware talking about lol
  • conquerist - Monday, November 3, 2008

    Concerning x264, Nehalem-specific improvements are coming as soon as the developers are free from their NDA.
    See http://x264dev.multimedia.cx/?p=40.
  • Spectator - Monday, November 3, 2008

    can they do some CUDA optimizations?. im guessing that video hardware has more processors than quad core intel :P

    If all this i7 is new news and does stuff xx faster with 4 core's. how does 100+ core video hardware compare?.

    Yes im messing but giant Intel want $1k for best i7 cpu. when likes of nvid make bigger transistor count silicon using a lesser process and others manufacture rest of vid card for $400-500 ?

    Where is the Value for money in that. Chukkle.
  • gramboh - Monday, November 3, 2008

    The x264 team has specifically said they will not be working on CUDA development as it is too time intensive to basically start over from scratch in a more complex development environment.
  • npp - Monday, November 3, 2008

    CUDA Optimizations? I bet you don't understand completely what you're talking about. You can't just optimize a piece of software for CUDA, you MUST write it from scratch for CUDA. That's the reason why you don't see too much software for nVidia GPUs, even though the CUDA concept was introduced at least two years ago. You have the BadaBOOM stuff, but it's far from mature, and the reason is that writing a sensible application for CUDA isn't exactly an easy task. Take your time to look at how it works and you'll understand why.

    You can't compare the 100+ cores of your typical GPU with a quad core directly, they are fundamentally different in nature, with your GPU "cores" being rather limited in functionality. GPGPU is a nice hype, but you simply can't offload everything on a GPU.

    As a side note, top-notch hardware always carries a price premium, and Intel has had this tradition with high-end CPUs for quite a while now. There are plenty of people who need absolutely the fastest hardware around and won't hesitate paying for it.
  • Spectator - Monday, November 3, 2008

    Some of us want more info.

    A) How does the integrated Thermal sensor work with -50+c temps.

    B) Can you Circumvent the 130W max load sensor

    C) what are all those connection points on the top of the processor for?.

    lol. Where do i put the 2B pencil to. to join that sht up so i dont have to worry about multiply settings or temp sensors or wattage sensors.

    Hey dont shoot the messenger. but those top side chip contacts seem very curious and obviously must serve a purpose :P

  • Spectator - Monday, November 3, 2008

    Wait NO. i have thought about it..

    The contacts on top side could be for programming the chips default settings.

    You know it makes sense. Perhaps it's adjustable sram style, rather than burning connections.

    yes some technical peeps can look at that. but still I want the fame for suggesting it first. lmao.

    Have fun. but that does seem logical to build in some scope for alteration. a lot easier to manufacture 1 solid item then mod your stock to suit market when you feel it's necessary.

    Spectator.
