Is Nehalem Efficient?

At this year's IDF in San Francisco, Intel revealed a little-discussed but extremely important aspect of Nehalem's circuit design:

The Nehalem design is Intel's first microprocessor in the past two decades to feature absolutely no domino logic; it's a fully static CMOS design. I've explained the differences between dynamic domino and static CMOS design in the past, but simply put: domino logic is used as a clock speed play. It's incredibly useful for implementing very high speed circuit paths on a chip, and Intel's use of it hit its all-time peak in the Pentium 4 days. The downside to using such high speed logic is that it requires a lot of power, but in microprocessor design there are always tradeoffs to be made.


There are many other energy efficiency plays within Nehalem

With Nehalem, Intel took the new architecture as an opportunity to revamp its circuit design and removed all remaining domino logic, without impacting the peak clock speed of the architecture. The tradeoff here is one of die size: by using more parallel logic, Intel was able to convert some serial, high speed paths into larger, slower circuits that removed the need for domino logic. Details are unfortunately light and a bit beyond the scope of this review, but the move to an all static CMOS design is bound to reduce power consumption. Do you smell a comparison coming?

Both Nehalem and Penryn are built on the same 45nm process, available at the same clock speeds and capable of running the very same applications. In theory, Nehalem should be more power efficient, at the same clock speed, across the board thanks to its static CMOS design. To find out I measured average power consumption over the duration of a handful of benchmarks I used in this review.

| Performance | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 2238 PPS | 11502 CBMarks | 61.5 fps | 34.0 fps |
| Intel Core i7-920 (Nehalem - 2.66GHz) | 3528 PPS | 16211 CBMarks | 74.8 fps | 33.2 fps |
| Nehalem Performance Advantage | 57.6% | 40.9% | 21.6% | -2% |

I picked these four benchmarks because they show us the range of Nehalem's performance, going from no performance improvement all the way up to a gain of nearly 60%. Now let's look at the power consumption in each of these four benchmarks:

| Power Consumption | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 168.1W | 175.2W | 167.5W | 220.8W |
| Intel Core i7-920 (Nehalem - 2.66GHz) | 202.2W | 208.6W | 176.6W | 230.8W |
| Nehalem Power Disadvantage | +34.1W | +33.4W | +9.1W | +10W |

If you actually go through and do the math you'll find that Nehalem, despite using more power, is more efficient than Penryn. Relative to Penryn, performance per watt is around 31% better in POV-Ray, 18% better in Cinebench and 15% better in the x264 HD test (put the other way around, Penryn delivers roughly 24%, 15.5% and 13% less performance per watt than Nehalem). In Crysis, the only benchmark where Nehalem actually falls behind, it also draws more power, so Nehalem loses the efficiency battle there by about 7%.
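The math here is simple enough to sketch in a few lines of Python. This is only an illustration built from the scores and average power figures in the two tables above; the dictionary layout and the function names are my own, not part of any tool used in the review. Note that the percentage you get depends on the baseline: Nehalem's gain measured against Penryn is a larger number than Penryn's deficit measured against Nehalem.

```python
# (score, average watts) pairs copied from the 2.66GHz tables above.
# The layout and names are assumptions made for this sketch.
BENCHMARKS = {
    "POV-Ray 3.7":    {"penryn": (2238, 168.1),  "nehalem": (3528, 202.2)},
    "Cinebench XCPU": {"penryn": (11502, 175.2), "nehalem": (16211, 208.6)},
    "x264 HD":        {"penryn": (61.5, 167.5),  "nehalem": (74.8, 176.6)},
    "Crysis":         {"penryn": (34.0, 220.8),  "nehalem": (33.2, 230.8)},
}


def perf_per_watt(score, watts):
    """Benchmark score delivered per watt of average power draw."""
    return score / watts


def nehalem_gain(penryn, nehalem):
    """Percent change in performance per watt, using Penryn as the baseline."""
    return (perf_per_watt(*nehalem) / perf_per_watt(*penryn) - 1) * 100


def penryn_deficit(penryn, nehalem):
    """Percent by which Penryn trails, using Nehalem as the baseline."""
    return (1 - perf_per_watt(*penryn) / perf_per_watt(*nehalem)) * 100


if __name__ == "__main__":
    for name, chips in BENCHMARKS.items():
        gain = nehalem_gain(chips["penryn"], chips["nehalem"])
        deficit = penryn_deficit(chips["penryn"], chips["nehalem"])
        print(f"{name}: Nehalem {gain:+.1f}% vs. Penryn, "
              f"Penryn {deficit:.1f}% behind Nehalem")
```

With Penryn as the baseline the gains work out to roughly 31%, 18% and 15% for the first three tests; the 24%, 15.5% and 13% figures correspond to Penryn's deficit with Nehalem as the baseline. Crysis comes out negative either way.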

It seems as if Nehalem is even more polarizing than I had thought. Despite the move to a fully static CMOS design, the changes aren't enough to help in the scenario where Nehalem can't offer more performance; power consumption still goes up, albeit not terribly.

It's also worth noting that the power comparison really depends on the CPU used. Here we've got the same comparison but with the Core i7-965 vs. the Core 2 Extreme QX9770, both clocked at 3.2GHz:

| Performance | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 2641 PPS | 14065 CBMarks | 73.2 fps | 41.7 fps |
| Intel Core i7-965 (Nehalem - 3.2GHz) | 4202 PPS | 18810 CBMarks | 85.8 fps | 40.5 fps |

| Power Consumption | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
|---|---|---|---|---|
| Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 230.7W | 227.6W | 230.3W | 293.6W |
| Intel Core i7-965 (Nehalem - 3.2GHz) | 233.7W | 230.7W | 196.2W | 248.5W |

It's tough to draw any conclusions based on two CPUs, but it is possible that at higher clock speeds Nehalem's efficiency advantage kicks in. The QX9770 has always been a bit high on the power consumption side, whereas the i7-965, even in situations where it is slower than the QX9770, offers better power efficiency here.
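The 3.2GHz tables don't include advantage rows, so here is a rough sketch of how they could be derived, along with the change in performance per watt. It uses only the scores and wattages from the two tables above; the `compare` helper and the tuple layout are assumptions made for this illustration.

```python
# (score, average watts) pairs copied from the 3.2GHz tables above.
# The layout and the helper name are assumptions made for this sketch.
QX9770 = {
    "POV-Ray 3.7": (2641, 230.7),
    "Cinebench R10 - XCPU": (14065, 227.6),
    "x264 HD": (73.2, 230.3),
    "Crysis": (41.7, 293.6),
}
I7_965 = {
    "POV-Ray 3.7": (4202, 233.7),
    "Cinebench R10 - XCPU": (18810, 230.7),
    "x264 HD": (85.8, 196.2),
    "Crysis": (40.5, 248.5),
}


def compare(penryn, nehalem):
    """Return (performance change %, power change W, perf-per-watt change %)."""
    (p_score, p_watts), (n_score, n_watts) = penryn, nehalem
    perf_pct = (n_score / p_score - 1) * 100
    power_delta = n_watts - p_watts
    ppw_pct = ((n_score / n_watts) / (p_score / p_watts) - 1) * 100
    return perf_pct, power_delta, ppw_pct


if __name__ == "__main__":
    for name in QX9770:
        perf, watts, ppw = compare(QX9770[name], I7_965[name])
        print(f"{name}: perf {perf:+.1f}%, power {watts:+.1f}W, "
              f"perf/W {ppw:+.1f}%")
```

On these numbers the i7-965 comes out ahead on performance per watt in all four tests, including Crysis, where it is slightly slower but draws about 45W less.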


74 Comments

  • Spectator - Monday, November 03, 2008

    that sht is totally logical.

    And Im proper impressed. I would do that.

    you can re-process your entire stock at whim to satisfy the current market. that sht deserves some praise, even more so when die shrinks happen. Its an apparently seemless transition. Unless world works it out and learns how to mod existing chips?

    Chukkle. but hey im drunk; and I dont care. I just thought that would be a logical step. Im still waiting for cheap SSD's :P

    Spectator.
  • tential - Monday, November 03, 2008

    We already knew nehalem wasn't going to be that much of a game changer. The blog posts you guys had up weeks ago said that because of the cache sizes and stuff not to expect huge gains in performance of games if any. However because of hyperthreading I think there also needs to be some tests to see how multi tasking goes. No doubt those gains will be huge. Virus scanning while playing games and other things should have extremely nice benefits you would think. Those tests would be most interesting although when I buy my PC nehalem will be mainstream.
  • npp - Monday, November 03, 2008

    I'm very curious to see some scientific results from the new CPUs, MATLAB and Mathematica benchmarks, and maybe some more. It's interesting to see if Core i7 can deliver something on these fronts, too.
  • pervisanathema - Monday, November 03, 2008

    I was afraid Nehalem was going to be a game changer. My wallet is grateful that its overall performance gains do not even come close to justifying dumping my entire platform. My x3350 @ 3.6GHz will be just fine for quite some time yet. :)

    Additionally, its relatively high price means that AMD can still be competitive in the budget to low mid range market which is good for my wallet as well. Intel needs competition.
  • iwodo - Monday, November 03, 2008

    Since there are virtually no performance lost when using Dual Channel. Hopefully we will see some high performance DDR3 with low Latency next year?
    And which means apart from having half the core, Desktop version doesn't look so bad.

    And since you state the Socket 1366 will be able to sit a Eight Core inside, i expect the 11xx socket will be able to suit a Quad Core as well?

    So why we dont just have 13xx Socket to fit it all? Is the cost really that high?
  • QChronoD - Monday, November 03, 2008

    How long are they going to utilize this new socket??
    $284 for the i7-920 isn't bad, but will it be worth the extra to buy a top end board that will appreciate a CPU upgrade 1-2 years later? Or is this going to be useless once Intel Ticks in '10?
  • steveyballme - Monday, November 03, 2008

    We worked side by side with Intel to be sure that Vista was optimised for running on this thing!

    http://fakesteveballmer.blogspot.com
  • Strid - Monday, November 03, 2008

    Great article. I enjoyed reading it. One thing I stumbled upon though.

    "The PS/2 keyboard port is a nod to the overclocking crowd as is the clear CMOS switch."

    What makes a PS/2 port good for overclockers? I see the use for the clear CMOS switch, but ...
  • 3DoubleD - Monday, November 03, 2008

    In my experience USB keyboards do not consistently allow input during the POST screen. If you are overclocking and want to enter the BIOS or cancel an overclock you need a keyboard that works immediately once the POST screen appears. I've been caught with only a USB keyboard and I got stuck with a bad overclock and had to reset the CMOS to gain control back because I couldn't cancel the overclock.
  • Clauzii - Monday, November 03, 2008

    I thought the "USB Legacy support" mode was for exactly that? So legacy mode is for when the PC are booted in DOS, but not during pre?
