Power Improvements

Although Haswell’s platform power is expected to drop considerably in mobile, particularly with Haswell U and Y SKUs (Ultrabooks and ultrathins/tablets), there are benefits to desktop Haswell parts as well.

There’s more fine-grained power gating and lower chipset power, and the CPU cores can transition between power states about 25% quicker than in Ivy Bridge, allowing the power control unit to be more aggressive in selecting lower power modes. We’ve also seen considerable reductions in platform power consumption at the motherboard level. Using ASUS’ Z77 Deluxe and Z87 Deluxe motherboards for the Haswell, Ivy and Sandy Bridge CPUs, I measured significant improvements in idle power consumption:

[Chart: Idle Power]

These savings are beyond what I’d expect from Haswell alone. Intel isn’t the only one trying to squeeze out every last gain in the absence of any low-hanging fruit. The motherboard makers are aggressively polishing their designs in order to grow their market share in a very difficult environment.

Under load, there’s no escaping the fact that Haswell can burn more power in pursuit of higher performance:

[Chart: Load Power - x264 HD 5.0.1 Benchmark]

Here I’m showing an 11.8% increase in power consumption, and in this particular test the Core i7-4770K is 13% faster than the i7-3770K. Power consumption goes up, but so does performance per watt, if only slightly.
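To make the performance per watt claim concrete, here is a minimal sketch of the arithmetic using only the two percentages quoted above. The function and constant names are my own and purely illustrative, and the calculation assumes both figures come from the same x264 HD 5.0.1 run.

```python
# Relative figures taken from the x264 HD 5.0.1 load test above
# (Core i7-4770K vs. Core i7-3770K). Names are illustrative only.
POWER_INCREASE = 0.118  # 11.8% higher power consumption under load
PERF_INCREASE = 0.13    # 13% higher performance in the same test


def perf_per_watt_gain(perf_increase: float, power_increase: float) -> float:
    """Return the fractional change in performance per watt.

    Performance per watt scales as (1 + perf) / (1 + power), so the
    gain is that ratio minus one.
    """
    return (1.0 + perf_increase) / (1.0 + power_increase) - 1.0


gain = perf_per_watt_gain(PERF_INCREASE, POWER_INCREASE)
print(f"Performance per watt change: {gain:+.1%}")
```

With these inputs the ratio is 1.13 / 1.118, so the improvement in performance per watt works out to roughly 1%: positive, but small.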

The other big part of the Haswell power story is what Intel is calling FIVR: Haswell’s Fully Integrated Voltage Regulator. Through a combination of on-die and on-package circuitry (mostly inductors on-package), Haswell assumes responsibility for distributing voltages to individual blocks and controllers (e.g. PCIe controller, memory controller, processor graphics, etc.). With FIVR, it’s easy to implement tons of voltage rails, which is why Intel doubled the number of internal voltage rails. With more independent voltage rails, there’s finer-grained control over the power delivered to various blocks of Haswell.

Thanks to a relatively high input voltage (on the order of 1.8V), it’s possible to generate quite a bit of current on-package and efficiently distribute power to all areas of the chip. Voltage ramps are 5 - 10x quicker with FIVR than with a traditional on-board voltage regulator implementation.

In order to ensure broad compatibility with memory types, there’s a second input voltage for DRAM as well.

FIVR also comes with a reduction in board area and component cost. I don’t suppose this is going to be a huge deal for desktops (admittedly the space and cost savings are basically non-existent), but it’ll mean a lot for mobile.

No S0ix for Desktop

You’ll notice that I didn’t mention any of the aggressive platform power optimizations in my sections on Haswell power management. That’s because they pretty much don’t apply here. The new active idle (S0ix) states are not supported by any of the desktop SKUs. It’s only the forthcoming Y and U series parts that support S0ix.



View All Comments

  • bji - Sunday, June 02, 2013

    TSX is so esoteric in its applicability that I think you'd be very hard pressed to a) find a benchmark that could actually exercise it in a meaningful way and b) have any expectation that this benchmark would translate into any actual perceived performance gain in any application run by 99.999% of users.

    In other words - TSX is only going to help performance in some very rare and obscure types of software that "normal" users will never even come close to using, let alone caring about the performance of.

    However I am intrigued by your speculation that TSX will be beneficial for physics simulation, which I guess could translate to perceivable performance increases for software that end users might actually use in the form of game physics. I found a paper that described techniques for using transactional memory to improve performance for physics simulation but it only found a 27% performance increase, which is not exactly earth shattering (I wouldn't call it "huge for game physics" personally).
  • Amaranthus - Monday, June 03, 2013

    One of the main (and already implemented) uses of TSX is hardware lock elision. I'd guess the hypothesis is that physics code takes locks defensively but rarely actually has contention because the threads are working on different parts of the world. In this scenario more fine-grained locks on sections of the world would let you scale better but that is a lot of work and HLE gives you the same benefit for free.
  • Jaybus - Monday, June 03, 2013

    No. HLE (XACQUIRE and XRELEASE) do nothing by themselves. They reuse REPNE/REPE prefixes and on CPUs that do not support TSX are ignored on instructions that would be valid for XACQUIRE/XRELEASE if TSX were available. It is a backward compatibility method. Since all of those instructions may have a LOCK prefix, without TSX capability, a normal lock is used, NOT the optimistic locking provided by TSX that allows other threads to see the lock as already free.

    Without TSX the code is still (software) lock-free, but there is no possibility of multiple threads accessing the same memory simultaneously (as there is with TSX), so one or more threads will see a pipeline stall due to the LOCK prefix.
  • bji - Monday, June 03, 2013

    I can't imagine that lock elision is that beneficial to very many applications. Lock contention is almost never a significant performance bottleneck; yeah there are poorly designed applications where lock contention can have a more significant effect, but proper multithreaded coding has the contended sections of code reduced to the smallest number of instructions possible, at which point the effects of lock contention are minimized.

    In order to take advantage of transactional memory and get the full benefits of TSX you have to write such radically different algorithms that I doubt that it's worth it except in the most unusual and specific cases. OK so you can use TSX instructions to make a hashtable or other container class suffer slightly less from lock contention, but that is oh so very rarely a significant aspect to the performance of any program.
  • klmccaughey - Monday, June 03, 2013

    As a programmer, I disagree. This is a very useful feature set that, if it was more widely adopted, would prove very useful for many workaday tasks that the CPU performs.
  • bji - Monday, June 03, 2013

    As a programmer, I am pretty sure that the benefits of TSX are limited to a very unusual and uncommon set of problems the performance increase of which will mean very very little to 99.99% of users 99.99% of the time. Also fully transactional memory algorithms require significant rework from their non-transactional counterparts meaning that taking full advantage of TSX takes developer effort which will not be worth it except in very rare circumstances.

    The HLE instructions may have some very minor benefit because they can be used with algorithms that don't need to be reworked at all (you just get a little bit more parallelism for free), but even then you're going to be avoiding some lock contention; even if you completely eliminated lock contention from most algorithms they would only be fractionally faster in real world usage. Lock contention just isn't that big of a deal in normal circumstances.
  • klmccaughey - Monday, June 03, 2013

    Exactly. It would be the ubiquity of these features that would cause them to be useful - splitting them into segments defeats the adoption and use of said features. Intel are pushing segmentation too hard (too greedily?)
  • bill5 - Saturday, June 01, 2013

    kind of a weird, scattered review.

    loved the q6600 though, since i still have one. and the 8350, since i have my eye on it.

    be interesting if this pushed 8350 prices down enough to be more attractive (it's currently only 180 on newegg). if not i'll probably go with i5 4670 (even though i'm getting tired of these faux msrp's, bet money that chip will be 229 on newegg forget 213)

    ps, my bill4 account was apparently banned (it kept saying i was posting spam wouldn't allow me to post) i post controversial things that probably get downvoted, but they arent spam. please stop doing that.
  • bill5 - Saturday, June 01, 2013

    this really shows where the 8350 fails, single thread.

    it looks like clock for clock its ipc may be similar to my q6600. it only gains in single thread due to the gaudy 4.0 clock speed.

    otoh go to multithread and it holds its own against the other ~200 intel chips.
  • Nexus-7 - Saturday, June 01, 2013

    In for one 4770k --

    I'm coming from an i5-750 running at 4GHz. I'm thinking this will be a sufficiently large leap forward although I'm tempted to wait for Ivy Bridge-E's 6c12t monsters.
