Modern processors constantly change their operating frequency (and voltage) depending on workload. On Intel processors, this is often handled by the operating system, which requests a particular level of performance, known as the Performance State or P-state, from the processor. The processor then adjusts its frequency and voltage to accommodate, in a DVFS (dynamic voltage and frequency scaling) fashion, but only at the P-states fixed at the time of production. Running the system at maximum frequency all the time would be best for performance, but because of the high voltage required it is also the least efficient way to run a processor, which on mobile devices means shorter battery life or thermal throttling. With the P-state model, the operating system can request lower P-states to save power; if a task requires more performance and the power and thermal budgets allow, the P-state can be raised to accommodate. On Intel processors this technology has historically been called 'SpeedStep'.

With Skylake, Intel's newest 6th generation Core processors, this changes. The processor has been designed so that, with the right commands, the OS can hand control of frequency and voltage back to the processor. Intel is calling this technology 'Speed Shift'. We’ve discussed Speed Shift before in Ian’s Skylake architecture analysis, but despite the in-depth talk from Intel, Speed Shift was noticeably absent when the processors launched. The reason is one of its requirements: Speed Shift needs operating system support to hand control of processor performance over to the CPU, and Intel had to work with Microsoft to get this functionality enabled in Windows 10. As of right now, anyone with a Skylake processor is not yet getting the benefit of the technology. A patch that enables it will be rolled out for Windows 10 in November, but it is worth noting that it will take a while for it to reach new Windows 10 purchases.

Compared to SpeedStep / P-state transitions, Intel's new Speed Shift technology changes the game by having the operating system relinquish some or all control of the P-states and hand that control to the processor. This has a couple of noticeable benefits. First, the processor can ramp frequency up and down much faster than the OS can. Second, the processor has much finer control over its states, allowing it to choose the optimal performance level for a given task and use less energy as a result. Specific jumps in frequency are reduced to around 1 ms under Speed Shift's CPU control, from 20-30 ms under OS control, and going from an efficient power state to maximum performance takes around 35 ms, compared to around 100 ms with the legacy implementation. As seen in the images below, neither technology can jump from low to high instantly, because maintaining data coherency through frequency/voltage changes requires a gradual ramp while data is realigned.

The ability to quickly ramp up performance is there to increase the overall responsiveness of the system, rather than have it linger at lower frequencies waiting for the OS to pass commands through a translation layer. Speed Shift cannot increase absolute maximum performance, but on short workloads that require a brief burst of performance, it can make a big difference in how quickly the task gets done. Much of what we do falls into this category, such as web browsing or office work. Web browsing, for example, is all about getting the page loaded quickly and then getting the processor back down to idle.
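To get a feel for why short bursts benefit and long runs do not, here is a minimal sketch of the ramp times quoted above. It assumes the clock rises linearly from 400 MHz to 3.4 GHz over the ramp (35 ms for Speed Shift, 100 ms for the legacy path) and then holds at maximum; the linear shape, the function name and the burst sizes are illustrative assumptions, not Intel's actual behavior or measured data.

```python
import math

def completion_time_s(cycles, ramp_s, f_idle_hz=0.4e9, f_max_hz=3.4e9):
    """Seconds needed to retire `cycles` clock cycles when the frequency ramps
    linearly (an assumption) from f_idle_hz to f_max_hz over ramp_s seconds,
    then stays at f_max_hz."""
    slope = (f_max_hz - f_idle_hz) / ramp_s               # Hz gained per second
    ramp_cycles = 0.5 * (f_idle_hz + f_max_hz) * ramp_s   # cycles retired during the whole ramp
    if cycles <= ramp_cycles:
        # Task finishes mid-ramp: solve f_idle*t + slope*t^2/2 = cycles for t.
        return (-f_idle_hz + math.sqrt(f_idle_hz ** 2 + 2 * slope * cycles)) / slope
    # Task outlasts the ramp: the remainder runs at full speed.
    return ramp_s + (cycles - ramp_cycles) / f_max_hz

LEGACY_RAMP_S = 0.100       # "around 100 ms" from the text
SPEED_SHIFT_RAMP_S = 0.035  # "around 35 ms" from the text

# Hypothetical bursts, sized by how long they would take at a constant 3.4 GHz.
for burst_ms in (10, 50, 1000, 10000):
    cycles = 3.4e9 * burst_ms / 1000
    legacy = completion_time_s(cycles, LEGACY_RAMP_S)
    shifted = completion_time_s(cycles, SPEED_SHIFT_RAMP_S)
    saved_pct = 100 * (legacy - shifted) / legacy
    print(f"{burst_ms:>6} ms burst: legacy {legacy * 1000:8.1f} ms, "
          f"Speed Shift {shifted * 1000:8.1f} ms, saved {saved_pct:4.1f}%")
```

Under these assumptions the absolute time saved levels off at a few tens of milliseconds once a task outlasts both ramps, so the relative gain shrinks as the task gets longer.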

For this short piece, Intel was able to provide us with the Windows 10 patch for Speed Shift ahead of time, so that we could test and see what kind of gains it can achieve. This gives us a somewhat unique situation, since we can isolate this one variable on a new processor and measure its impact on various workloads.

To test Speed Shift, I’ve chosen several tasks whose workloads could show some gain from it. Tests which run the processor at its maximum frequency for long periods of time are not going to show any significant gain, since you are not limited by the responsiveness of the processor in those cases. The first test is PCMark 8, a benchmark which attempts to represent real-life tasks with a workload that is not constant. In addition, I’ve run the system through several JavaScript tests, which are the best-case scenario for something like Speed Shift, since the processor has to quickly complete a task in order to allow you to enjoy a website.

The processor in question is an Intel Core i7-6600U, with a base frequency of 2.6 GHz and a turbo frequency of 3.4 GHz. Despite the base frequency being rated on the box at 2.6 GHz, the processor can go all the way down to 400 MHz when idle, so being able to ramp up quickly could make a big impact even on the U-series Skylake processors. My guess is that it will be even more beneficial to the Y-series Core m3/m5/m7 parts, since they have a larger dynamic range and typically more thermal constraints.

PCMark 8

PCMark 8 - Home

PCMark 8 - Work

Both the Home and Work tests show a very small gain with Speed Shift enabled. The length of these benchmarks, which run for between 30 and 50 minutes, would likely mask any gains on short workloads. I think this illustrates that Speed Shift is just one more tool, and not a holy grail for performance. The gain on Home is just under 3%, and the difference on the Work test is negligible.

JavaScript Tests

JavaScript is one of the use cases where short burst workloads are the name of the game, and here Speed Shift has a much bigger impact. All tests were done with the Microsoft Edge browser.

Mozilla Kraken 1.1

Google Octane 2.0

WebXPRT 2015

WebXPRT 2013

The time to complete the Kraken 1.1 test is the least affected, with just a 2.6% performance gain, while Octane's score shows an increase of over 4%. The big win here, though, is WebXPRT. WebXPRT includes subtests, and the Photo Enhancement subtest in particular can see up to a 50% improvement in performance. This bumps the scores up significantly, with WebXPRT 2015 showing an almost 20% score increase and WebXPRT 2013 a 26% gain. These leaps in performance are certainly the kind that would be noticeable to an end user manipulating photographs in something like Picasa or watching web-page based graph adjustments such as live stock feeds.

Power Consumption

The other side of the coin is power consumption. A processor that can quickly ramp up to its maximum frequency could consume more power because of the greater penalty of increasing the voltage. But if it can complete the task quickly and get back to idle, there is a chance to be more efficient: the work is done in tens of milliseconds rather than hundreds, with the frequency ramping up and back down before the old P-state method would have decided to do anything. The principle of 'work fast, finish now' was the backbone of Intel's 'Race To Sleep' strategy during the ultrabook era, which focused on response-related performance. However, the drive for battery life means that efficiency has tended to matter more, especially as devices and batteries get smaller.
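The trade-off described above can be sketched with the usual first-order model in which dynamic power scales as C·V²·f. The sketch below compares finishing a fixed amount of work slowly at low voltage against racing through it at high voltage and then idling, over the same time window. Every number in it (effective capacitance, voltages, frequencies, overhead and idle power) is a made-up illustrative value, not a measurement of the Core i7-6600U.

```python
def window_energy_j(cycles, freq_hz, volts, window_s,
                    c_eff_f=1e-9, active_overhead_w=1.0, idle_w=0.1):
    """Energy in joules to retire `cycles` at a fixed frequency and voltage,
    then idle for the rest of the window. All default values are hypothetical."""
    t_active = cycles / freq_hz
    if t_active > window_s:
        raise ValueError("task does not finish inside the window")
    # First-order CMOS dynamic power: P = C_eff * V^2 * f
    p_dynamic_w = c_eff_f * volts ** 2 * freq_hz
    # While active we also pay a fixed overhead (uncore, memory, platform).
    active_j = (p_dynamic_w + active_overhead_w) * t_active
    idle_j = idle_w * (window_s - t_active)
    return active_j + idle_j

CYCLES = 1e9     # hypothetical amount of work
WINDOW_S = 2.0   # hypothetical window: work plus idle time

for overhead_w in (1.0, 2.0):
    slow = window_energy_j(CYCLES, 1.0e9, 0.7, WINDOW_S, active_overhead_w=overhead_w)
    fast = window_energy_j(CYCLES, 2.0e9, 1.0, WINDOW_S, active_overhead_w=overhead_w)
    winner = "race to sleep" if fast < slow else "slow and steady"
    print(f"overhead {overhead_w:.1f} W: slow {slow:.2f} J, fast {fast:.2f} J -> {winner}")
```

In this toy model the higher voltage always costs more dynamic energy for the same work, so racing only pays off when the fixed cost of staying awake is large enough. Which side wins depends entirely on the assumed numbers.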

Due to the way modern processors work, we don’t have the tools to directly measure the SoC power. Intel has told us that Speed Shift does not impact battery life very much, one way or the other, so to verify this, I've run our light battery life test with the option disabled and enabled.

Core i7-6600U Battery Efficiency

This task is likely one of the best-case scenarios for Speed Shift. It consists of launching four web pages per minute, with plenty of idle time in between. Although Speed Shift seems to have a slight edge, it is very small and would fall within the margin of error on this test. Some tasks may see a slight improvement in efficiency, and others may see a slight regression, but Speed Shift is less of a power-saving tool than other pieces of Skylake. Looking at it another way, if, for example, the XPS 13 with Skylake were to get 15 hours of battery life, Speed Shift would only change the result by about 7 minutes. Responsiveness increases, but net power use remains about the same.
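To put the 15-hour example in proportion, the arithmetic is simple enough to write out. The helper name below is made up for illustration; the inputs are the figures quoted above.

```python
def runtime_change_pct(baseline_hours, delta_minutes):
    """Change in battery runtime as a percentage of the baseline runtime."""
    return 100.0 * delta_minutes / (baseline_hours * 60.0)

# 7 minutes on a hypothetical 15-hour (900-minute) runtime
print(f"{runtime_change_pct(15, 7):.2f}% of total runtime")
```

That works out to well under 1% of the total runtime, which fits with the difference being inside the margin of error of the test.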

Final Words

Skylake did not bring the large leap in clock-for-clock performance that we have become accustomed to with new Intel microarchitectures, but when you look at the overall package, there was a decent net gain in performance combined with new technologies. For example, being able to maintain higher Turbo frequencies on multiple cores has increased stock-to-stock performance more than the smaller IPC gains.

Speed Shift is just one small part of the overall performance gain, and one that we have not been able to look at until now. It does lead to some pretty big gains in task completion, if the workloads are bursty and short enough for it to make a difference. It can’t increase the absolute performance of the processor, but it can get it to maximum performance in a much shorter amount of time, as well as get it back down to idle quicker. Intel is billing it as improved responsiveness, and it’s pretty clear that they have achieved that.

The one missing link is operating system support. We’ve been told that the patch to enable this is coming to Windows 10 in November. While this short piece looks at what Speed Shift can bring to the table in terms of performance, if you'd like to read more about how it is implemented, please check out the Skylake architecture analysis which goes into more detail.

Update: Daniel Rubino at Windows Central has tested the latest Windows 10 Insider build 10586, and it appears to enable Speed Shift on his Surface Pro 4, which is in line with the November timeline we were provided.

54 Comments

  • willis936 - Friday, November 6, 2015

    Sorry? You should be. Bad design would be saying "welp. I can dissipate 20W of heat sustained. Better make the absolute maximum heat output 20W." Nice attempt though.
  • xthetenth - Friday, November 6, 2015

    Exactly, you can't make use of the full capability of the thermal solution in non-sustained workloads unless you can burst above the power use of a sustained workload.
  • xthetenth - Friday, November 6, 2015

    Let's turn this on its head. Let's say you've got a mobile device where the cooling is sufficient to sustain full load without throttling. That means that if you're running a bursty workload, you're leaving performance on the table because it won't be using the maximum the chip is capable of all the time, so by definition the device is going to be underperforming whenever the load isn't constantly the maximum.

    In order to attain maximum performance in both bursty and sustained workloads, the device needs to be able to handle a given amount of heat. Both bursty and sustained workloads need to average that heat, and therefore bursty workloads need to be able to spike above the sustained maximum power.

    So not only is it not bad design to make a device that can clock higher in short bursts than in sustained workloads, it is bad design to make one that cannot because it is leaving power on the table.

    And this is why we shifted toward clocks that can be dynamically increased over the designed maximum sustained clock.
  • PaulHoule - Saturday, November 7, 2015

    It depends on what kind of device you are making.

    I have a desktop replacement laptop that got its fan full of dust and it was getting hot and going into thermal throttling. I took it apart and puffed some freon into the fan and when I put the machine back together the CPU started running above 3GHz again.

    For tablet applications you are going to go fanless so the ultimate limit on heat removal is how much you can dissipate through the exterior surface so there isn't a lot of room for "good engineering". (Put the fan back in and you might as well put in a spinning HDD, open up more holes in the case, and pretty soon you have one of these convertable computers that nobody wants to buy and that exist just to confuse people like flight attendants.)
  • TomWomack - Sunday, November 8, 2015

    If your mobile computer is regularly under load, you're running inappropriate software. For a more realistic situation - say, you apply a complicated Photoshop filter or MSVC compile, which loads the processor to 100% for twenty seconds, then look at the result for twenty seconds to see what to do next - then a cooling solution which lets the chip get hot over the twenty seconds and cool again during the thinking time is a pretty neat idea.

    Leave video transcoding to the farms of non-mobile processors at Netflix, who only have to do it once.
  • emn13 - Sunday, November 8, 2015

    In principle you're right, but your examples are too heavy a load (i.e. more like kraken than webxprt). If your workload takes significantly longer than 100ms (the time needed to increase clockrate without speedshift, apparently), then the approximately 65ms speedshift saves isn't going to matter. MSVC compiles and even complicated photoshop filters likely fall in that territory.

    It's going to matter doing a web-page load, however - that takes a fraction of a second on a modern PC, so saving 65ms might matter. It'll matter doing a simple photoshop filter.

    But anything that takes a second or more? You're not going to notice the (at best) 65ms improvement.
  • danwat1234 - Saturday, October 15, 2016

    You calling Rosetta@home and Folding@home inappropriate software? lol
  • FalcomPSX - Monday, November 9, 2015

    its not so much bad design, as poor system choice if your workloads exceed what the machine can realistically handle. A thin and light ultrabook shouldn't be expected to sit at full turbo frequency for extended periods of time, they don't have the cooling capacity for it. The burst performance is there when needed, but if you are running tasks that will max the cpu for extended periods of time, perhaps a different choice of system is more appropriate than complaining that physics of power consumption and cooling requirements don't match your desire for something thin and light. Mobile stuff is getting better all the time but it will ALWAYS lag behind a machine with better cooling, be it a desktop, or more substantial laptop.
  • Shadowmaster625 - Friday, November 6, 2015

    So if I disable speed step on my machine and force it to run at max clock all the time, I should see massively increased performance in WebXPRT because I am simulating what speed shift is doing, and then some.
  • Ian Cutress - Friday, November 6, 2015

    You'll also lose a lot of power by virtue of remaining at high clocks when idle, as well as producing extra heat which limits your sustained performance time when at load. In a desktop, that might not matter much if you have sufficient cooling, but it becomes important in a mobile device.