Power Consumption: Hot Hot HOT

I won’t rehash the full ongoing issue with how companies report power vs TDP in this review – we’ve covered it a number of times before. But in a quick sentence, Intel uses one published value for sustained performance, and an unpublished ‘recommended’ value for turbo performance, the latter of which is routinely ignored by motherboard manufacturers. Most high-end consumer motherboards also ignore the sustained value, often 125 W, and allow the CPU to consume as much as it needs, with the real limits being the power draw at full turbo, the thermals, or the power delivery.
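To make the two-value scheme concrete, here is a minimal sketch of how a sustained limit and a turbo limit might interact. It is a simplified model, not Intel's actual algorithm: the limits are commonly referred to as PL1 (sustained), PL2 (turbo) and Tau (time window), but the function name, the moving-average form, and the 250 W / 56 s figures below are illustrative assumptions. Only the 125 W sustained value comes from the text above.

```python
def simulate_power_limit(demand_w, pl1_w, pl2_w, tau_s, dt_s=1.0):
    """Return the power granted at each time step (simplified model).

    Assumption: the CPU may draw up to pl2_w while a running average of
    its power stays below pl1_w, and is held to pl1_w otherwise.
    """
    avg_w = 0.0           # running average of granted power
    alpha = dt_s / tau_s  # weight given to the newest sample
    granted = []
    for want_w in demand_w:
        cap_w = pl2_w if avg_w < pl1_w else pl1_w
        got_w = min(want_w, cap_w)
        avg_w += alpha * (got_w - avg_w)
        granted.append(got_w)
    return granted


# A workload that asks for 250 W for two minutes, sampled once per second.
demand = [250.0] * 120

# Board that enforces the limits (illustrative PL2 and Tau values).
limited = simulate_power_limit(demand, pl1_w=125.0, pl2_w=250.0, tau_s=56.0)

# 'Infinite turbo' board: modelled here by raising the sustained limit to
# match the turbo limit, so the cap never drops.
unlimited = simulate_power_limit(demand, pl1_w=250.0, pl2_w=250.0, tau_s=56.0)

print(limited[0], limited[-1])      # turbo at first, sustained limit later
print(unlimited[0], unlimited[-1])  # turbo throughout
```

In this model the enforced board falls back to 125 W once the running average catches up, while the ‘infinite turbo’ board holds the higher value for the whole run.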

One of the dimensions of this we don’t often talk about is that the power consumption of a processor is always dependent on the actual instructions running through the core. A core can be ‘100%’ active while sitting around waiting for data from memory or doing simple addition; however, a core has multiple ways to run instructions in parallel, with the most complex instructions consuming the most power. This was noticeable in the desktop consumer space when Intel introduced vector extensions, AVX, to its processor design. The subsequent introduction of AVX2 and AVX-512 means that running these instructions draws the most power.

AVX-512 comes with its own discussion, because even going into an ‘AVX-512’ mode causes additional issues. Intel’s introduction of AVX-512 on its server processors showcased that in order to remain stable, the core had to reduce the frequency and increase the voltage while also pausing the core to enter the special AVX-512 power mode. This made the advantage of AVX-512 suitable only for strong high-performance server code. But now Intel has enabled AVX-512 across its product line, from notebook to enterprise, allowing these chips to run AI code faster and enabling new use cases. We’re also a couple of generations on from then, and AVX-512 doesn’t take quite the same hit as it did, but it still requires a lot of power.

For our power benchmarks, we’ve taken several tests that represent a real-world compute workload, a strong AVX2 workload, and a strong AVX-512 workload. Note that Intel lists the Core i7-11700K as a 125 W processor.

Motherboard 1: Microcode 0x2C

Our first test using Agisoft Photoscan 1.3 shows a peak power consumption around 180 W, although depending on the part of the test, we have sustained periods at 155 W and 130 W. Peak temperatures flirt with 70ºC, but the CPU spends most of the time at around the 60ºC mark.

For the AVX2 workload, we enable POV-Ray. This is the workload on which we saw the previous generation 10-core processors exceed 260 W.

At idle, the CPU is consuming under 20 W while touching 30ºC. When the workload kicks in after 200 seconds or so, the power consumption rises very quickly to the 200-225 W band. This motherboard implements the ‘infinite turbo’ strategy, and so we get a sustained 200-225 W for over 10 minutes. Through this time, our CPU peaks at 81ºC, which is fairly reasonable for some of the best air cooling on the market. During this test, all cores sustained 4.6 GHz.

Our AVX-512 workload is 3DPM. This is a custom in-house test, accelerated to AVX2 and AVX512 by an ex-Intel HPC guru several years ago (for disclosure, AMD has a copy of the code, but hasn’t suggested any changes).

This tests for 10-15 seconds and then idles for 10 seconds, and does rapidly go through any system that doesn’t run an infinite turbo. What we see here in this power-only graph are alarming peaks of 290-292 W. Looking at our data, the all-core turbo under AVX-512 is 4.6 GHz, sometimes dipping to 4.5 GHz. Ouch. But that’s not all.

Our temperature graph looks quite drastic. Within a second of running AVX-512 code, we are in the high 90s ºC, or in some cases, at 100ºC. Our temperatures peak at 104ºC, and here’s where we get into a discussion about thermal hotspots.

There are a number of ways to report CPU temperature. We can either take the instantaneous value of a singular spot of the silicon while it’s currently going through a high-current density event, like compute, or we can consider the CPU as a whole with all of its thermal sensors. While the overall CPU might accept operating temperatures of 105ºC, individual elements of the core might actually reach 125ºC instantaneously. So what is the correct value, and what is safe?

The cooler we’re using on this test is arguably the best air cooling on the market – a 1.8 kilogram full copper ThermalRight Ultra Extreme, paired with a 170 CFM high static pressure fan from Silverstone. This cooler has been used for Intel’s 10-core and 18-core high-end desktop variants over the years, even the ones with AVX-512, and not skipped a beat. Because we’re seeing 104ºC here, are we failing in some way?

Another issue we’re coming across with new processor technology is the ability to effectively cool a processor. I’m not talking about cooling the processor as a whole, but more about those hot spots of intense current density. We are going to get to a point where we can’t remove the thermal energy fast enough, or with this design, we might be there already.

Smaller Packaging

I will point out an interesting fact down this line of thinking though, which might go unnoticed by the rest of the press – Intel has reduced the total vertical height of the new Rocket Lake processors.

The z-height, or total vertical height, of the previous Comet Lake generation was 4.48-4.54 mm. This number was taken from a range of 7 CPUs I had to hand. However, this Rocket Lake processor is over 0.1 mm thinner, at 4.36 mm. The smaller height of the package plus heatspreader could be a small indicator of the required thermal performance, especially if the airgap (filled with solder) between the die and the heatspreader is smaller. If it aids cooling and doesn’t disturb how coolers fit, then great; however, at some point in the future we might have to consider different, better, or more efficient ways to remove these thermal hotspots.

Motherboard 2: Microcode 0x34

As an addendum to this review a week after our original numbers, we obtained a second motherboard that offered a newer microcode version from Intel.

On this motherboard, the AVX-512 response was different enough to warrant mentioning. Rather than hold a 4.6 GHz all-core turbo for AVX-512, it initially ramped up that high, peaking at 276 W, before dropping to 4.4 GHz all-core at 225 W. This is quite a substantial change in behaviour:

This means that at 4.4 GHz, we are running 200 MHz slower (which gives a 3% performance decrease), but we are saving 60-70 W. This is indicative of how far away from the peak efficiency point these processors are.
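As a rough worked check of that trade-off, the quoted figures can be plugged into a few lines of Python. This is only a back-of-the-envelope sketch: it takes 290 W (the lower end of the 290-292 W peaks on the 0x2C board) and 225 W as the two power points, uses the 3% performance figure as stated, and the helper name `percent_drop` is purely illustrative.

```python
# Figures quoted in the text for the AVX-512 workload.
FREQ_OLD_GHZ = 4.6   # 0x2C all-core AVX-512 frequency
FREQ_NEW_GHZ = 4.4   # 0x34 sustained all-core AVX-512 frequency
POWER_OLD_W = 290.0  # 0x2C peak power (lower end of 290-292 W)
POWER_NEW_W = 225.0  # 0x34 sustained power
PERF_LOSS = 0.03     # performance decrease as stated in the text


def percent_drop(old, new):
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100.0


freq_drop_pct = percent_drop(FREQ_OLD_GHZ, FREQ_NEW_GHZ)
power_drop_pct = percent_drop(POWER_OLD_W, POWER_NEW_W)
power_saved_w = POWER_OLD_W - POWER_NEW_W

# Performance per watt at the new point, relative to the old point.
perf_per_watt_ratio = (1.0 - PERF_LOSS) * POWER_OLD_W / POWER_NEW_W

print(f"Frequency drop: {freq_drop_pct:.1f}%")
print(f"Power drop:     {power_drop_pct:.1f}% ({power_saved_w:.0f} W)")
print(f"Perf per watt:  {perf_per_watt_ratio:.2f}x")
```

On these inputs the clock falls by a little over 4%, while power falls by more than a fifth, so performance per watt improves by roughly a quarter for a small loss in throughput.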

There was hope that this would adjust the temperature curve a little. Unfortunately we still see peaks at 103ºC when AVX-512 is first initiated; however, during the 4.4 GHz period we are closer to 90ºC, which is far more palatable.

On AVX2 workloads with the new 0x34 microcode, the results were very similar to the 0x2C microcode. The workload ran at 4.6 GHz all-core, reached a peak power of 214 W, and the processor temperature was sustained around 82ºC.

Peak Power Comparison

For completeness, here is our peak power consumption graph. These are the peak power consumption numbers taken from a series of benchmarks on which we run our power monitoring tools.

(0-0) Peak Power


541 Comments


  • blppt - Saturday, March 13, 2021 - link

    They did try to at least 'ride it out' until Zen could get done, and that required smoothing out the rough edges, so they did devote some resources.

    BD/PD never did any better than a low-end solution for the desktop/laptop market, but they had to offer something until Zen was done.
  • Oxford Guy - Sunday, March 28, 2021 - link

    'They did try to at least 'ride it out' until Zen could get done, and that required smoothing out the rough edges, so they did devote some resources.'

    Wow... watch the goal posts move.

    Riding out = doing nothing. Piledriver was not improved. The entire higher-performance & supercomputer market was unchanged from Piledriver to Zen. All AMD did was ship cheap knock-off APU rubbish and console trash.

    The fact that AMD succeeded with Zen is probably mostly a testament to one largely ignored feature of monopoly power: the monopolist can become so slow and inefficient that a nearly dead competitor can come back to best it. That's not symptomatic of a well-run economic system. It's a trainwreck.

    AMD should have been wealthy enough to do proper R&D and bulldozer would have never happened in the first place. But, Intel was a huge abusive monopolist and everyone went right along, content to feed the problem. After AMD did Bulldozer and Piledriver the company should have been dead. If there had been adequate competition it would have been. So, ironically, AMD can thank Intel for being its only competition, for resting on its laurels because of its extreme monopolization.
  • GeoffreyA - Wednesday, March 10, 2021 - link

    Oxford Guy. I don't remember the exact details and am running largely from memory here. Yes, I agree, Bulldozer had far lower IPC than Phenom, but, according to their belief, was supposed to restore them to the top and knock Intel down. In practice, it failed miserably and was worse even than Netburst. Credit must be given, however, for their raising Bulldozer's IPC a lot each generation (something like 20-30% if I remember right), and curtailing power. It also addressed weaknesses in K10 and surpassed K10's IPC eventually. Anyway, working against such a hopeless design surely taught them a lot; and pouring that knowledge into a classic x86 design, Zen, took it further than Skylake after just one iteration.

    AMD would have done better had they just persisted with K10, which wasn't that far behind Nehalem. But, perhaps we wouldn't have had Zen: it took AMD's going through the lowest depths, passing through the fire as it were, to become what they are today, leaving Intel baffled. I agree, they were truly idiotic in the last decade but no more. May it stay that way!

    Concerning CMT, I don't know much about it to comment, but think Bulldozer's principal weakness came from sharing execution units---the FP units I believe and others---between modules. Zen kept each core separate and gave it full (and weighty) resources, along with a micro-op cache and other improvements. As for Jaguar, it may be junk from a desktop point of view, yes, but was excellent in its domain and left Atom in the dust.
  • Oxford Guy - Sunday, March 28, 2021 - link

    'Credit must be given, however, for their raising Bulldozer's IPC a lot each generation (something like 20-30% if I remember right), and curtailing power.'

    Piledriver was a small IPC improvement and regressed in AVX. Piledriver's AVX was so extremely poor that it was faster to not use it. Piledriver was a massive power hog. The 32nm SOI process node, according to 'TheStilt' was improved over time which is probably the main source of power efficiency improvement in Piledriver versus Bulldozer. I do not recall the IPC improvement of Piledriver over Bulldozer but it was nothing close to 20% I think. Instead, it merely made it possible to raise clocks further, along with the aforementioned node improvement. And, 'TheStilt' said the node got better after Piledriver's first generation. The 'E' parts, for instance, were quite a lot improved in leakage — but the whole line (other than the 9000 series which he said should have been sent to the scrapper) improved in leakage. What didn't improve, sadly, is the bad Piledriver design. AMD never bothered to fix it.

    While Piledriver, when clocked high (like 4.7 GHz) could be relevant against Sandy in multi-thread (including well-threaded games like Desert of Kharak) it was extremely pitiful in single-thread. And, it sucked down boatloads of power to get to 4.7, even with the best-leakage chips.

    And, going back to your 20–30% claim. Steamroller, which was considered a serious disappointment, featured only 4 of the CMT quasi cores, not 8. Excavator cut things in cache land even further. Both were cost-cutting parts, not performance improvements. Piledriver killed both of them simply by turning up the clocks high. The multi-thread performance of Steamroller and Excavator was not competitive because of the lack of cache, lack of cores, and lack of clock. Single-thread was a bit improved but, again, the only thing one could really do was blast current through Piledriver. It was a disgusting situation due to the single-threaded performance, which was unacceptable in 2012 and an abomination for the later years AMD kept peddling Piledriver in.

    The only credit AMD deserves for the construction core period is not going out of business, despite trying so hard to do that.
  • GeoffreyA - Sunday, March 28, 2021 - link

    Oxford Guy, while I respect your view, I do not agree with it, and still stand by my statement that AMD deserves credit for improving Bulldozer and executing yearly. Agreed, my 20-30% claim was not sober but I just meant it as a recollection and did qualify my statement.

    I don't think it's fair to put AMD down for embarking on Bulldozer. When they set out, quite likely they thought it was going to go further than the aging Phenom/K10 design, and the fact is, while falling behind in IPC compared with K10, it improved on a lot of points and laid the foundation. Its chief weakness was the idea of sharing resources, like the fetch, decode, and FP units, as well as going for a deeper pipeline. (The difference from Netburst is that Bulldozer was decently wide.)

    Piledriver refined the foundation, raising IPC and adding a perceptron branch predictor, still used in Zen by the way, and I believe finally surpassed K10's IPC (and that of Llano). While being made on the same 32 nm process, it dropped power by switching to hard-edge flip flops, which took some work to put in. They used that lowered power to raise clock speeds, bringing power to the same level as Bulldozer. And Trinity, the Piledriver APU, surpassed Llano. I need to learn more about Steamroller and Excavator before I comment, but note in passing that SR improved the architecture again, giving each integer core its own fetch/decode units, among other things; and Excavator switched to GPU libraries in laying out the circuitry, dropping power and area, the tradeoff being lower frequency.
  • GeoffreyA - Sunday, March 28, 2021 - link

    Also, the reviews show that things were not as bad as we remember, though power was terrible.

    https://www.anandtech.com/show/6396/the-vishera-re...

    https://www.anandtech.com/show/5831/amd-trinity-re...
  • Oxford Guy - Tuesday, April 6, 2021 - link

    I don't need to look at reviews again. I know how bad the IPC was in Bulldozer, Piledriver, Steamroller, and Excavator. Single-thread in Cinebench R15, for instance, was really low even at 5.2 GHz in Piledriver. It takes chilled water to get it to bench at that clock.
  • GeoffreyA - Wednesday, March 10, 2021 - link

    Lack of competition, high prices, lack of integrity. I agree it's one big mess, but there's so little we can do, except boycotting their products. As it stands, the best advice is likely: find a product at a decent price, buy it, be happy, and let these rotten companies do what they want.
  • Oxford Guy - Sunday, March 28, 2021 - link

    'find a product at a decent price, buy it, be happy'

    Buy a product you can't buy so you can prop up monopolies that cause the problem of shortage + bad pricing + low choice (features to choose from/i.e. innovation, limited).
  • GeoffreyA - Sunday, March 28, 2021 - link

    The only solution is a worldwide boycott of their products, till they drop their prices, etc.
