Intel's Secret: Nehalem Can Be Very Power Efficient

While I was testing Nehalem I tried an experiment: I recorded power consumption during every single benchmark I ran for the review. I did the same for Intel's Core 2 Extreme QX9770 and compared the two. I published an abridged version of these results in the review, basically showing that the Core i7-965 drew much less power, across the board, than the equivalently clocked QX9770, while the Core i7-920 was outshone by the Q9450, which drew less total system power. Both data points were valid, but there were too many unanswered questions to draw any serious conclusions at that point. I've met with Intel several times since the review went live, tested and retested processors, and I believe I now understand what's going on with Nehalem from a power standpoint.

All three of Intel's Core i7 CPUs that will be available at launch this month are 130W TDP parts. At 3.2GHz that's expected, but at 2.66GHz that's a bit high compared to Intel's other quad-core 2.66GHz processors on the market. The Core 2 Quad Q9450, for example, has a 95W TDP and runs at 2.66GHz. The lower TDP is made possible by a lower core voltage, which in turn is possible because Intel has been building quad-core Penryns for a while and yields are high enough that core voltage can be driven down. The same will eventually happen to the Core i7, but it's such a new design, such a radical departure from Intel's previous Core-based CPUs, and so early in its manufacturing life that there simply hasn't been time to get yields high enough to produce < 100W TDP 2.66GHz parts.


Multiple sample points are necessary for proper analysis...


...plus, lots of Nehalems are more fun

The Q9450 can operate at voltages down to 0.85V and as high as 1.3625V, while the Core i7-920 currently appears to be limited to a minimum of around 1.137V. Power consumption of a CPU at a fixed clock speed is proportional to the square of the voltage, so whatever power efficiencies Intel has built into Nehalem, they will not outweigh a Penryn running at a lower core voltage. So we'd expect the Core 2 Quad Q9450 to have lower power consumption than the Core i7-920, at least today, until Intel can get a competitively low TDP 920 out on the market. But what about the i7-965?
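To put a rough number on that voltage-squared relationship, here is a minimal sketch using the two minimum voltages quoted above. It isolates the voltage term only: it assumes the same switched capacitance and clock on both sides, which is not actually true of two different architectures, so treat the result as an illustration of scale and not as a power prediction. The function and constant names are mine.

```python
def dynamic_power_ratio(v_a: float, v_b: float) -> float:
    """Ratio of dynamic power at voltage v_a to that at v_b.

    Assumes identical switched capacitance and clock frequency,
    so only the V^2 term of P ~ C * V^2 * f differs.
    """
    return (v_a / v_b) ** 2


I7_920_MIN_V = 1.137  # approximate minimum voltage quoted for the Core i7-920
Q9450_MIN_V = 0.85    # minimum voltage quoted for the Core 2 Quad Q9450

ratio = dynamic_power_ratio(I7_920_MIN_V, Q9450_MIN_V)
print(f"Voltage-squared factor at minimum voltage: {ratio:.2f}x")
```

Under those assumptions the voltage term alone is worth a large factor, which is why architectural efficiency gains struggle to make up for a higher voltage floor.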

The Core 2 Extreme QX9770 has a 136W TDP, slightly higher than the 130W TDP of the Core i7-965, and both run at the same 3.2GHz frequency. This comparison gave me some very interesting data. Look at the power consumption numbers across all of the benchmarks (note that this is average system power, recorded over the entire benchmark run for each test):

CPU                          Intel Core 2 Extreme QX9770 (3.2GHz)   Intel Core i7-965 (3.2GHz)
Idle                         138.7W                                 105.5W
POV-Ray                      230.7W                                 240.4W
Cinebench (1 thread)         194.3W                                 168.3W
Cinebench (max threads)      227.6W                                 230.7W
3dsmax 9 SPECapc CPU test    220.1W                                 209.4W
x264 HD Encode Test          230.3W                                 196.2W
DivX 6.8.3                   221.7W                                 202.1W
Windows Media Encoder        249W                                   201.2W
Age of Conan                 306.2W                                 267.3W
Race Driver GRID             348.8W                                 302W
Crysis                       293.6W                                 248.5W
FarCry 2                     324.2W                                 271.9W
Fallout 3                    303.2W                                 225W

 

Compared to the QX9770, the Core i7-965 at worst draws about the same or slightly more power, and at best delivers a significant reduction in total system power consumption. There are only two cases where the QX9770 draws less power than the i7-965: POV-Ray and the multithreaded Cinebench run.

Note that the idle power on the i7-965 is very low. One thing that must be enabled to achieve this is the QPI power management option in the X58 BIOS, which for whatever reason was disabled by default in our original review.

If you want to look at performance, here is the corresponding performance data to that power data:

CPU                          Intel Core 2 Extreme QX9770 (3.2GHz)   Intel Core i7-965 (3.2GHz)
POV-Ray                      2641 PPS                               4202 PPS
Cinebench (1 thread)         3937 CBMarks                           4475 CBMarks
Cinebench (max threads)      14065 CBMarks                          18810 CBMarks
3dsmax 9 SPECapc CPU test    13.1                                   17.6
x264 HD Encode Test          73.2 fps                               85.8 fps
DivX 6.8.3                   42.4 seconds                           32.8 seconds
Windows Media Encoder        29 seconds                             24 seconds
Age of Conan                 107.9 fps                              123 fps
Race Driver GRID             103.0 fps                              102.9 fps
Crysis                       41.7 fps                               40.5 fps
FarCry 2                     102.6 fps                              115.1 fps
Fallout 3                    77.2 fps                               83.2 fps

 

When the i7-965 significantly outperforms the QX9770, its power consumption is around the same - thus giving us much better performance per watt. When the i7-965 can't really outperform the QX9770, for example in some of the gaming benchmarks, the total system power consumption is much lower.
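If you want to turn the two tables into performance per watt yourself, the arithmetic can be sketched like this. The numbers are copied from the tables above (a subset of the benchmarks); the function names and the split into "throughput" and "timed" tests are my own framing. Keep in mind these are average total system power figures, so the GPU, chipset and drives are all included and the results say nothing precise about the CPU in isolation.

```python
# (benchmark, QX9770 score, i7-965 score, QX9770 watts, i7-965 watts)
# Higher-is-better results, copied from the tables above.
THROUGHPUT = [
    ("POV-Ray", 2641, 4202, 230.7, 240.4),
    ("Cinebench (1 thread)", 3937, 4475, 194.3, 168.3),
    ("Cinebench (max threads)", 14065, 18810, 227.6, 230.7),
    ("x264 HD Encode Test", 73.2, 85.8, 230.3, 196.2),
    ("Race Driver GRID", 103.0, 102.9, 348.8, 302.0),
    ("Crysis", 41.7, 40.5, 293.6, 248.5),
]

# (benchmark, QX9770 seconds, i7-965 seconds, QX9770 watts, i7-965 watts)
# Lower-is-better results: run time in seconds.
TIMED = [
    ("DivX 6.8.3", 42.4, 32.8, 221.7, 202.1),
    ("Windows Media Encoder", 29, 24, 249.0, 201.2),
]


def perf_per_watt_gain(score_old, score_new, watts_old, watts_new):
    """How many times more work per watt the new chip's system delivers."""
    return (score_new / watts_new) / (score_old / watts_old)


def energy_ratio(sec_old, sec_new, watts_old, watts_new):
    """Energy for the whole run (watts * seconds), new system relative to old."""
    return (sec_new * watts_new) / (sec_old * watts_old)


gains = {name: perf_per_watt_gain(*rest) for name, *rest in THROUGHPUT}
energy = {name: energy_ratio(*rest) for name, *rest in TIMED}

for name, gain in gains.items():
    print(f"{name}: {gain:.2f}x performance per watt")
for name, ratio in energy.items():
    print(f"{name}: {ratio:.2f}x the energy per run")
```

For the timed tests, energy per run is the fairer metric: a system that finishes sooner at similar power has used less total energy for the same job.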

I confirmed that I didn't have a particularly low power Core i7-965 by testing multiple chips, and Intel confirmed that my QX9770 fell within the middle of the distribution of power characteristics for all QX9770s. It looks extremely probable that at the same TDP level, Nehalem has the ability to be much more power efficient than even Penryn, all without so much as a die shrink. Remember that both of these CPUs are built on the same 45nm process.


23 Comments


  • lemonadesoda - Wednesday, November 19, 2008 - link

    Anand. Fantastic article, but:

    1./ You didn't mention whether your tests were on 32bit or 64bit. We know that 32bit Core 2 is more efficient due to microcode fusion, whereas that isn't true for 64bit. On i7, opcode fusion is there on 64bit.

    2./ I think you should execute a CPU HALT to observe deep down idle. This figure, say 110W, should then be SUBTRACTED from all other results. Why? Because this is essentially the mainboard/HDD/system power draw excluding the CPU. I see from your figures that the power used (as a delta from idle) on i7 is actually HIGHER than QX9770. So I actually have a very different view than you. I think X58 is much more efficient, and that the internal memory controller uses less power than the older northbridge. But when the i7 is crunching, it is using more power AT THE CPU than the QX9770.
  • prodystopian - Monday, November 10, 2008 - link

    While this limit is a non-issue for anyone getting an X58 motherboard, what about those looking for the e2xxx of this generation? When looking for a cheap CPU to heavily OC to get an extreme price/performance, it would be best to pair it with a cheap motherboard such as the next P series (not X). I'm assuming we don't know whether this BIOS switch will be on the P series motherboards, but if it is not, that is where the real problem occurs.
  • Live - Sunday, November 09, 2008 - link

    I don't know if this has been answered yet but what are the advantage of the i7-965 higher QPI? Can you overclock the QPI and if so dose it make a difference?
  • Live - Sunday, November 09, 2008 - link

    Live I think you meant to write:

    I don't know if this has been answered yet, but what is the advantage of the i7-965 higher QPI? Can you overclock the QPI and if so does it make a difference?
  • CEO Ballmer - Saturday, November 08, 2008 - link

    Made for Vista!

    http://fakesteveballmer.blogspot.com
  • Rev1 - Saturday, November 08, 2008 - link

    Maybe I'm missing something, but given that the multiplier was not unlocked, how did he get it that high?
  • frazz - Saturday, November 08, 2008 - link

    Surely CPU power at a fixed voltage is proportional to the square of the voltage, not the cube? I thought the formula was this:

    Power dissipation = C.V^2.f where C is the capacitance being switched per clock cycle
  • frazz - Saturday, November 08, 2008 - link

    Sorry I meant CPU power at a fixed FREQUENCY is proportional to the square of the voltage. D'oh.
  • HolyFire - Saturday, November 08, 2008 - link

    I agree. This surely was a misinterpretation of Intel's slide, which actually meant: If the frequency is increased proportionally to the voltage, the power will go like voltage cubed. But for a fixed frequency, power goes like voltage squared.

    In either case, I find that slide a little suspicious, as I have not yet seen any theoretical or experimental result suggesting that frequency should be linearly proportional to voltage.
  • ltcommanderdata - Friday, November 07, 2008 - link

    Great article. It's nice to see someone do a more in depth analysis of Nehalem's characteristics rather than just printing a bunch of benchmarks.

    In regards to your Hyperthreading tests, it might be interesting to isolate the causes of HT performance increases in Nehalem. HT quite often was a hindrance for Netburst and it would be interesting to see whether the cause was primarily HT's implementation in Netburst or just due to the maturity of HT compatible software at the time. It's an odd coincidence that the last processor to carry HT, besides Atom, was the Pentium Extreme Edition 965 while the first desktop processor to reintroduce HT is again numbered 965 as part of the Core i7 family.

    For instance, you could try to compare the speedup that 965EE receives going from 2 to 4 threads against the i7-965 doing the same. It would also be interesting to see if HT's performance delta improves going from Windows XP to Windows Vista, which would imply that Vista's scheduler is smarter about dispatching tasks to logical cores that don't share resources.

    And in regards to mobile Nehalem, I agree that the power consumption improvements could really benefit notebooks, but it's kind of curious that Nehalem won't come to notebooks until Q3 2009. I believe previous Core 2 rollouts for Merom and Penryn were pretty fast, like a quarter spread between the desktop, notebook, and UP/DP server markets, but this looks to be a 3 quarter spread. I wonder what the delay is? With a Q3 2009 mobile Nehalem launch, they might as well just wait a quarter and do a strong roll out of Westmere on mobile first.
