Speed Shift v2: Speed Harder
As originally reported at Kaby Lake-Y/U Launch

One of the new features for Skylake was Speed Shift. With the right OS driver, the system could relinquish control of CPU turbo to the CPU itself. Using internal metric collection combined with access to system-level sensors, the CPU could adjust the frequency with more granularity, and faster, than the OS could. The purpose of Speed Shift was to allow the system to respond more quickly to requests for performance (such as interacting with a touch screen or browsing the web), reducing delays and improving the user experience. So while the OS was limited to predefined P-state options, a Speed Shift-enabled processor with the right driver had a near-contiguous selection of CPU multipliers to choose from across a wide range.

The first iteration of Speed Shift reduced the time for the CPU to hit peak frequencies from ~100 milliseconds down to around 30. The only limitation was the OS driver, which is now a part of Windows 10 and comes by default. We extensively tested the effects of the first iteration of Speed Shift at launch.

With Kaby Lake, the hardware control around Speed Shift has improved. Intel isn’t technically giving this a new name, but it is an iterative update which I prefer to call ‘v2’, if only because the adjustment from v1 to v2 is big enough to note. There is no change in the OS driver, so the same Speed Shift driver works for both v1 and v2, but the hardware improvement means that a CPU can now reach peak frequency in 10-15 milliseconds rather than 30.

The light green and yellow lines show the difference between v1 and v2, with the yellow Kaby Lake processor getting up to 3.5 GHz incredibly quickly. This will have an impact on latency-limited interactions as well as situations where delays occur, such as asynchronous web page loading. Speed Shift is a play for user experience, so I’m glad to see it is being worked on. We will obviously have to test this ourselves when we can.

A note about the graph, to explain why the lines seem to zig-zag between lower and higher frequencies, as I have encountered this issue in the past. Intel’s test, as far as we were told, relies on reading register counters that increment as instructions are processed. By monitoring the value of these registers, the frequency can be extrapolated. Depending on the polling interval, or on adjacent-point averaging (a common issue with counter-based timing benchmarks that I’ve experienced academically), this can result in statistical variation depending on the capability of the code.
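
To make the counter-based approach concrete, here is a minimal Python sketch of how a frequency could be extrapolated from polled counter values. This is not Intel's test code, which we have not seen. The function names `estimate_frequency` and `adjacent_average` are hypothetical, and the sketch assumes a counter that ticks once per clock cycle, so the count delta divided by the time delta gives a frequency in Hz. An instruction counter would additionally need a known instructions-per-clock figure.

```python
from typing import List, Sequence, Tuple


def estimate_frequency(samples: Sequence[Tuple[float, int]]) -> List[float]:
    """Estimate a frequency (Hz) for each polling interval.

    `samples` holds (timestamp_seconds, counter_value) pairs.
    Assumption: the counter increments once per clock cycle.
    """
    estimates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        estimates.append((c1 - c0) / dt)
    return estimates


def adjacent_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Centered adjacent-point average; the window shrinks at the edges."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

With a short polling interval, each estimate rests on a small count delta, so any jitter in when the counter is actually read shows up as a swing between neighboring points. Averaging adjacent points hides some of that at the cost of blurring a fast ramp, which is the trade-off behind the zig-zag in the graph.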

While this graph uses the i7-7500U, which was released back in September, Speed Shift v2 is a feature for all Kaby Lake processors in the stack with the right OS. We still have not received an official word if Intel is working closely with Apple to bring the feature to macOS, or even if it will be promoted if it ever makes the transition. Apple may never promote it, so as not to confuse the non-technical media that follow Apple, and may also not allow Intel to state that Apple is using it. Or it’ll be part of a presentation; we don’t know.


43 Comments


  • Lolimaster - Tuesday, January 3, 2017 - link

    Considering the minimum cores you get per module is 4, I see AMD selling months later a 3c/6t cpu for $99.

    They will make a tweak for the raven ridge APU since the core count for those is 4c max.
  • jjj - Friday, January 6, 2017 - link

    Every segment they don't cover (and they don't have Zen APUs yet) is business left on the table - the budget segment is big enough and in regions they care about.

    Maybe they should go to 49$ with quads and disable HT, some cache but it is likely that if they don't do that, most would make an effort to get the 99$ quad. Just hope they don't get too greedy and start way higher, Intel can make quads without a GPU too, won't take too long and AMD needs to exploit this window of opportunity and gain,not just revenue, but hearts and minds.
  • name99 - Tuesday, January 3, 2017 - link

    "We still have not received an official word if Intel is working closely with Apple to bring the feature to macOS, or even if it will be promoted if it ever makes the transition"

    Could some more-or-less unexpected interaction between Speed Shift 2 and the rest of MacOS be the reason for the apparently random dramatic swings in the battery lifetime of the new MacBook Pros? We hardly know enough to point fingers at either Apple or Intel, but I could certainly imagine that each side has a certain mental model of what the other side is/"should" be doing, and the mismatch between those models means that the CPU is randomly being told to run at maximum speed when the OS actually wants it to dramatically slow down.
    I agree that this sounds kinda dumb on the surface, but I could imagine that there are enough layers between UI/framework code, the power driver, the core OS, and EFI, that something gets confused along the way including, perhaps, exposing a bug (again either on the Apple side or the Intel side) that just didn't get triggered (or at least not very often) on either previous x86 CPUs or on Linux/Windows.
  • rodmunch69 - Tuesday, January 3, 2017 - link

    My 5 year old 3930k can still basically keep up with Intel's latest and greatest with stock voltage OC. Hum... I used to buy new stuff every year, or every two years at most, because there was normally a good gain to be had. It's legit been 5 years now and my PC with a little work, in multi core tests, is just as fast as anything out there. That's pitiful on Intel's behalf. They've gotten fat and lazy and the consumer is paying for it. Trump needs to tell AMD to put the A back into their chips and actually put out some products at the high-end that actually pushes Intel to be great again.
  • Laststop311 - Wednesday, January 4, 2017 - link

    Is it really worth saving 60 dollars to get an unlocked i3 vs the unlocked i5? I really can't see any situation where 60 dollars is the difference between being able to afford a new pc or not. With DX12 it HIGHLY benefits from having 4 cores (really 6 cores is optimum with 8 only slightly improving). Being stuck with 2 cores in this day is severely crippling your lifespan of the pc. You will waste GPU power and be constrained by the 2 cores all in the name to save 60 dollars. Nah it's not worth it.

    Kaby lake in general is not worth it. Everyone with quad core sandy bridge and above is going to see very minimal gains from a quad core cpu. You really need to go to 6 cores to get any real performance increase and you also need to be playing in dx 12 mode. Your best bet is to wait for the 2019 tock of 10nm coffee lake. Intel will be moving to pci-e 4.0 which doubles the bandwidth so an 8x pci-e 4.0 is the same as a 16x pci-e 3.0. Since gpu's only lose a few percentage points of performance on 8x pci-e 3.0, 8x pci-e 4.0 will give them all the breathing room they need. This leaves you 16x lanes of the 24 lanes to use for m2 storage devices or capture cards without having to use the higher latency PCH pci-e lanes. Or with multi GPU you still have 8x cpu pci-e lanes and you only need 2x pci-e 4.0 lanes to give you 4GB/s (32gbps) so you can fit dual gpu's and 4 pci-e storage devices all connected to the cpu directly and both gpu's will get 16GB/s (128gbps) bandwidth. This gives you massive future proofing. With intel optane maturing you can go single gpu at 16x pci-e 4.0 lanes 32GB/s bandwidth (256gbps) stick an optane drive on 4x lanes giving you a massive 8GB/s (64gbps) and 2 m2 nvme ssd's on 2x lanes each 4GB/s (32gbps) each, with all devices connected directly to CPU for the lowest latency leaving all the PCH lanes free for external ports like TB3 USB 3.1 gen 2 etc.

    By waiting till 2019 you get a real upgrade instead of a sidegrade. pci-e 4.0 will unlock the true potential of Intel optane as i expect by then the optane drives will be maxing out the 4x pci-e 3.0 lanes at 4GB/s and pci-e 4.0 will allow optane to really shine and most likely hit 7GB/s or more. With that kinda storage speed you can transfer an entire blu ray disc image in about 7 seconds.

    Now by all means if you are still on the Q series quad cores then kaby lake is a compelling upgrade and isn't a total waste of money to upgrade. But even in that circumstance I would say try to stick it out another year so you can have a 6 core coffee lake as 6 cores is incredibly useful in dx12.
  • Lolimaster - Wednesday, January 4, 2017 - link

    You mean upgrade to the 8c/16t Ryzen or wait 2018-2019 for the 7nm Zen+?
  • gopher1369 - Wednesday, January 4, 2017 - link

    The only thing that occurs to me is game emulators. Dolphin and PCSX2 require high clock speeds and high IPC, not more cores. It's quite niche, but if you're building an emulator box then the unlocked Anniversary Edition Haswell Pentium is currently the go-to processor, the new i3 should be even better.
  • Laststop311 - Wednesday, January 4, 2017 - link

    What applications use AVX instructions? I wonder how much it will hurt performance for some applications by decreasing AVX to 4.0ghz so you can hit 5.0ghz on everything else. The highest overclock i've seen talked about is 5.1ghz on the i7-7700k using the corsair 115i
  • johnp_ - Wednesday, January 4, 2017 - link

    (3) Embedded DisplayPort* (eDP) 1.4 and PSR2 under evaluation

    I seriously didn't expect that! This means that they actually changed the display pipeline slightly :)
    Now, hopefully laptop vendors will make use of PSR2 to further improve battery life.

    On a side-note: Does anyone know how to overclock the 7820HK when there's no mobile chipset that supports overclocking? Will laptop vendors have to include the Z270 desktop chipset on their platform?
  • keeepcool - Friday, January 6, 2017 - link

    You open intel XTU and press on the arrows till it BSOD's.
    Laptop chipsets are "different" in a lot of senses.
