Thread Director: Windows 11 Does It Best

Every operating system runs what is called a scheduler – a low-level program that dictates where workloads should run on the processor depending on factors like performance, thermals, and priority. A naïve scheduler that only has to deal with a single core or a homogeneous design has it pretty easy, managing only power and thermals. Since those single-core days though, schedulers have grown more complex.

One of the first issues that schedulers faced in monolithic silicon designs was simultaneous multi-threading, whereby a single core can run more than one thread at once. Running two threads on a core usually improves throughput, but the relationship is not linear: one thread might use a core at 100%, while two threads on that same core might raise overall throughput to 140%, with each individual thread only running at 70%. As a result, schedulers had to distinguish between physical cores and hyperthreads, prioritizing new software onto an idle core before filling up the hyperthreads. If a piece of software doesn't need all of the performance and is happy to run in the background, and the scheduler knows enough about the workload, it might put it on a hyperthread instead. This is, at a simple level, what Windows 10 does today.
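
As a rough illustration of the distinction the scheduler has to make, here is a minimal sketch – my own, not Microsoft's scheduler code – that enumerates physical cores and checks which of them expose a second SMT thread, using the documented GetLogicalProcessorInformationEx API on Windows:

```c
/*
 * Sketch: count physical cores and how many of them expose an SMT sibling,
 * the same core-versus-hyperthread distinction a scheduler works with.
 * Uses documented Win32 APIs; error handling kept minimal.
 */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    DWORD len = 0;
    /* First call fails with ERROR_INSUFFICIENT_BUFFER and reports the size needed. */
    GetLogicalProcessorInformationEx(RelationProcessorCore, NULL, &len);

    SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *buf = malloc(len);
    if (!buf || !GetLogicalProcessorInformationEx(RelationProcessorCore, buf, &len))
        return 1;

    int cores = 0, smt_cores = 0;
    for (char *p = (char *)buf; p < (char *)buf + len;) {
        SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *info =
            (SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *)p;
        cores++;
        if (info->Processor.Flags & LTP_PC_SMT)  /* this core has two hardware threads */
            smt_cores++;
        p += info->Size;
    }

    printf("%d physical cores, %d of them with SMT siblings\n", cores, smt_cores);
    free(buf);
    return 0;
}
```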

This way of doing things maximizes performance, but could have a negative effect on efficiency, as ‘waking up’ a core to run a workload on it may incur extra static power costs. Going beyond that, this simple view assumes each core and thread has the same performance and efficiency profile. When we move to a hybrid system, that is no longer the case.

Alder Lake has two sets of cores (P-cores and E-cores), but it actually has three levels of performance and efficiency: P-cores, E-cores, and the hyperthreads on P-cores. In order to ensure that the cores are used to their maximum, Intel had to work with Microsoft to implement a new hybrid-aware scheduler, one that interacts with an on-board microcontroller on the CPU for more information about what is actually going on.
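
Independent of Thread Director, the hardware itself will tell software which kind of core the current thread happens to be scheduled on. Here is a small sketch, assuming an MSVC-style compiler for the __cpuidex intrinsic, that queries CPUID leaf 0x1A ("Hybrid Information"):

```c
/*
 * Sketch: ask the CPU which type of core this thread is currently running on.
 * On hybrid parts, CPUID leaf 0x1A reports the core type in EAX[31:24]:
 * 0x40 = Core (P-core), 0x20 = Atom (E-core).
 */
#include <intrin.h>
#include <stdio.h>

int main(void)
{
    int regs[4];

    /* Check the hybrid flag first: CPUID.(EAX=7,ECX=0):EDX bit 15. */
    __cpuidex(regs, 7, 0);
    if (!(regs[3] & (1 << 15))) {
        printf("Not a hybrid CPU\n");
        return 0;
    }

    /* The answer is only a snapshot: the OS can migrate the thread at any time. */
    __cpuidex(regs, 0x1A, 0);
    unsigned core_type = ((unsigned)regs[0] >> 24) & 0xFF;
    printf("Running on %s\n", core_type == 0x40 ? "a P-core" :
                              core_type == 0x20 ? "an E-core" : "an unknown core type");
    return 0;
}
```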

That microcontroller on the CPU is what Intel calls Thread Director. It has a full-scope view of the whole processor – what is running where, which instructions are running, and what appears to be the most important. It monitors the instructions at the nanosecond level and communicates with the OS at the microsecond level. It takes thermals and power settings into account, identifies which threads can be promoted to higher performance modes, and flags those that can be bumped if something of higher priority comes along. It can also adjust its recommendations based on frequency, power, thermals, and additional sensor data not immediately available to the scheduler at that resolution. All of that gets fed to the operating system.

The scheduler is Microsoft's part of the arrangement, and as it lives in software, it is the one that ultimately makes the decisions. The scheduler continuously takes all of the information from Thread Director as a guide: if a user comes in with a more important workload, Thread Director tells the scheduler which cores are free, or which threads to demote. The scheduler can also override Thread Director, especially if the user has a specific request, such as making background tasks a higher priority.
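
Applications can also push hints in the other direction. The sketch below – a plain OS-level quality-of-service hint, not Thread Director itself – uses the documented ThreadPowerThrottling API to mark a thread as efficiency-preferred (EcoQoS), which on Windows 11 makes the scheduler more willing to keep it on E-cores:

```c
/*
 * Sketch: mark a thread as "efficiency preferred" (EcoQoS). This is a
 * software-side hint the Windows 11 scheduler weighs alongside the
 * hardware feedback; error handling omitted for brevity.
 */
#include <windows.h>

static BOOL mark_thread_background(HANDLE thread)
{
    THREAD_POWER_THROTTLING_STATE state = {0};
    state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
    state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
    state.StateMask   = THREAD_POWER_THROTTLING_EXECUTION_SPEED; /* throttling on = EcoQoS */

    return SetThreadInformation(thread, ThreadPowerThrottling, &state, sizeof(state));
}

/* Usage: mark_thread_background(GetCurrentThread()); */
```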

What makes Windows 11 better than Windows 10 in this regard is that Windows 10 focuses mostly on the performance of certain cores, whereas Windows 11 expands that to efficiency as well. While Windows 10 considers the E-cores as lower performance than the P-cores, it doesn't know how well each core does at a given frequency with a given workload, whereas Windows 11 does. Combine that with an instruction prioritization model, and Intel states that under Windows 11, users should expect much better consistency in performance on hybrid CPU designs.

Under the hood, Thread Director runs a pre-trained algorithm based on millions of hours of data gathered during the development of the feature. It identifies the effective IPC of a given workload and applies that to the performance/efficiency metrics of each core variation. If there's an obvious potential for better IPC or better efficiency, then it suggests moving the thread. Workloads are broadly split into four classes:

  • Class 3: Bottleneck is not in the compute, e.g. IO or busy loops that don’t scale
  • Class 0: Most Applications
  • Class 1: Workloads using AVX/AVX2 instructions
  • Class 2: Workloads using AVX-VNNI instructions

Anything in Class 3 is recommended for E-cores. Anything in Class 1 or 2 is recommended for P-cores, with Class 2 having the higher priority. Everything else falls into Class 0, with frequency adjustments to optimize for IPC and efficiency if placed on the P-cores. The OS may force any class of workload onto any core, depending on the user.
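
Purely for illustration, the mapping above could be imagined as something like the following sketch. The structure and thresholds here are invented for the example – the real classification happens in hardware from the live instruction mix – but the class-to-core recommendations mirror the list above:

```c
/*
 * Illustrative only: a toy version of the classification described above.
 * The fields and thresholds are made up for the example, not Intel's.
 */
enum td_class { CLASS_0 = 0, CLASS_1, CLASS_2, CLASS_3 };

struct inst_mix {
    double vnni_ratio;   /* fraction of retired ops that are AVX-VNNI */
    double avx_ratio;    /* fraction that are AVX/AVX2                */
    double stall_ratio;  /* cycles stalled on IO or spin-waiting      */
};

static enum td_class classify(const struct inst_mix *m)
{
    if (m->stall_ratio > 0.90) return CLASS_3;  /* not compute bound -> E-cores          */
    if (m->vnni_ratio  > 0.10) return CLASS_2;  /* VNNI heavy -> P-cores, highest priority */
    if (m->avx_ratio   > 0.10) return CLASS_1;  /* AVX/AVX2 heavy -> P-cores              */
    return CLASS_0;                             /* everything else                        */
}
```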

There was some confusion in the press briefing as to whether Thread Director can ‘learn’ during operation, and how long that would take – to be clear, Thread Director doesn’t learn; it already knows from the pre-trained algorithm. It analyzes the instruction flow coming into a core, identifies the class as listed above, calculates where it is best placed (which takes microseconds), and communicates that to the OS. I think the confusion came from the difference between the words ‘learning’ and ‘analyzing’. In this case, Thread Director is ‘learning’ the instruction mix to feed into the algorithm, but the algorithm itself isn’t being updated or adjusting its classes. Even if you wanted the algorithm to learn your workflow, it can’t actually see which thread relates to which program or utility – that knowledge lives at the operating system level, and is down to Microsoft. Ultimately, Thread Director can suggest a series of things, and the operating system can choose to ignore them all, although that’s unlikely to happen in normal operation.

One of the situations where this might rear its head concerns in-focus operation. As showcased by Intel, the default behavior of Windows changes depending on the power plan in use.

When a user is on the Balanced power plan, Microsoft will move any software or window that is in focus (i.e. selected) onto the P-cores. If you then click away from that window to another, the threads for the first window move to an E-core, and the newly focused window gets P-core priority. This makes perfect sense for the user who has a million windows and tabs open and doesn’t want them taking immediate performance away.

However, this way of doing things might be a bit of a concern, or at least it is for me. The demonstration that Intel performed had a user exporting video content in one application, then moving to another application to do image processing. When the user switched to the image processing application, the video export threads were moved to the E-cores, allowing the image editor to use the P-cores as needed.

Now, usually when I’m dealing with video exports, video throughput is my limiting factor. I need the export to complete, regardless of what I’m doing in the interim. But by defocusing the video export window, the export now moves to the slower E-cores. If I want to keep it on the P-cores in this mode, I have to keep the window in focus and not do anything else. The way this is described also means that if you use any software that’s fronted by a GUI but spawns a background process to do the actual work, that background process never gets focus in normal operation, and so it will stay on the E-cores.

In my mind, this is a bad oversight. I was told that this is explicitly Microsoft’s choice on how to do things.

The solution, in my mind, is for some sort of software to exist that lets a user flag programs to the OS which they want to keep on the high-performance track. Intel technically made something similar when it first introduced Turbo Boost Max 3.0, however it was unclear whether this was something that had to come from Intel or from Microsoft to work properly. I assume the latter, given that the OS has ultimate control here.
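
A sketch of what such a utility could look like, under the assumption that it simply restricts a process to the CPU sets with the highest EfficiencyClass (the P-cores on a hybrid part), using the documented Windows CPU-set APIs:

```c
/*
 * Sketch of a "keep this program on the fast cores" tool. The policy here
 * is my own assumption of how such a utility could work, not a described
 * Intel or Microsoft feature; the APIs used are documented Win32 calls.
 */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    ULONG len = 0;
    /* First call reports the buffer size needed. */
    GetSystemCpuSetInformation(NULL, 0, &len, GetCurrentProcess(), 0);

    SYSTEM_CPU_SET_INFORMATION *info = malloc(len);
    if (!info || !GetSystemCpuSetInformation(info, len, &len, GetCurrentProcess(), 0))
        return 1;

    /* First pass: find the highest efficiency class (the P-cores on hybrid CPUs). */
    BYTE best = 0;
    for (char *p = (char *)info; p < (char *)info + len;) {
        SYSTEM_CPU_SET_INFORMATION *c = (SYSTEM_CPU_SET_INFORMATION *)p;
        if (c->CpuSet.EfficiencyClass > best)
            best = c->CpuSet.EfficiencyClass;
        p += c->Size;
    }

    /* Second pass: collect those CPU set IDs and make them the process default. */
    ULONG ids[256], count = 0;
    for (char *p = (char *)info; p < (char *)info + len;) {
        SYSTEM_CPU_SET_INFORMATION *c = (SYSTEM_CPU_SET_INFORMATION *)p;
        if (c->CpuSet.EfficiencyClass == best && count < 256)
            ids[count++] = c->CpuSet.Id;
        p += c->Size;
    }

    if (!SetProcessDefaultCpuSets(GetCurrentProcess(), ids, count))
        return 1;

    printf("Pinned process to %lu high-performance CPU sets\n", count);
    free(info);
    return 0;
}
```

A real tool would open a handle to the target process rather than pinning itself, and a process's default CPU sets can still be overridden per thread, but the plumbing for this kind of control already exists in the OS.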

I was, however, told that if the user changes the Windows power plan to High Performance, this behavior stops. In my mind this isn’t a proper fix, but it does mean that we might see some users and reviews of the hardware showing lower performance if the workload doing the work is in the background and the reviewer is using the default Balanced power plan as installed. If the same policy is going to apply to laptops, that’s a bigger issue.

Comments

  • DigitalFreak - Wednesday, October 27, 2021 - link

    Fanboi says what?
  • Kangal - Friday, October 29, 2021 - link

    What?

    But to get a little serious, I don't think Intel is going to win with their big.LITTLE architecture. I feel like ARM has a huge lead on the 15W (or less) demographic. So it would make sense for x86 designers to double-down on their performance lead in the higher thermal envelope. That's what AMD is (seemingly) going for with its focus on lower-latency Infinity Fabric, +5nm node to clock higher, and their 3D-Stacking of Cache. Not to mention all the help from DDR5, PCIe 5, NVMe, WiFi 6 etc etc.

    Intel's approach will win them back the Laptop segment, but they won't be winning the tablet segment back from ARM. And even the Gaming Laptop segment won't be an outright victory against AMD's offerings, not to mention the New MacBook Pros. If anything, Intel should have capitalised on their Atom efficiency cores, and do little.BIG computing in like 2018.

    Servers is a position where Intel may see improvements. But it's still in favour of AMD for now and the near future. The bigger threat comes from next-gen ARM-servers. I doubt anything from the left-field will come, RISC-V is still a paperlaunch/niche for the next few years.

    So while I think Intel is (FINALLY) becoming competitive against AMD, I don't think they have enough to go on. Their node is still inferior. Their Xe-Graphics are still inferior to RDNA-2. And they still lag behind AMD's Cores when you factor in Infinity-Fabric and 3D-Cache. Not to mention that the system/kernel is not quite optimised yet (let alone individual programs) when thinking about Windows11.

    For now, we have to choose from:
    Android, iOS, macOS, Windows
    RISC-V, ARM, Apple-ARM, Intel, AMD.
    ARM-Mali, PowerVR, Apple-Graphics, Nvidia, AMD RDNA.
  • Silver5urfer - Thursday, October 28, 2021 - link

    What is this fanboy junk...sigh.

    ADL demands Windows 11 POS, you want to shill for the HW which demands installing a strictly mobile junk copied OS with zero respect to computing factor on top where they are saying VBS is mandatory on all OEM machines and purposefully nuked AMD L3 performance ?

    I have a positive opinion on this ADL but it has insane changes, like Intel ITD drama who wants to endure that band aid solution of Intel with 2 layer system in between the OS and CPU. On top the major issue being socket longevity. How long this socket will retain its value and will Intel release another Z790 next year ? No idea.

    Now for your AMD bashing, Zen 3 wiped the floor with 2 generations of processors yeah they have bugs while OCing and DRAM tuning, but if you run at stock no issues and performs very well competitively. And for the ADL performance, it's honestly a joke. Because ADL has small trash cores since Intel wants to sell more BGA junk and they cannot beat the performance with more cores due to 10nm heat.

    Raptor Joke lol so ADL CPU is going to be EOLed under a year lmao, just like 11900K ? 2 CPUs in succession. While 10900K still stands strong. That's Intel for you. Meanwhile AMD's Zen 3 is now ready for 2022 action as well with AM4 and 3D V Cache. Keep using the yearly socket refresh and chipset refresh and CPU refresh while coming here and spout nonsensical load.

    Finally pay up DDR5 tax and premium premature trash DDR5 quality, by 2023-2024 DDR5 will be matured and all ADL buyers will weep hard.

    Now for the closure. Zen 4 is going to steamroll over Raptor Joke, 100% guaranteed. Do you think these companies operate without knowing what their competitor is doing ? they operate 2 years ahead of cycle internally. Plus AM4 experience is very important for AMD to fix the bugs from Platform to CPU. Ultimately they cleared out saying we are not using joker big little design. A full fat Zen 4, massive price increase is also coming from them, the IPC boost and the ST SMT performance will send fanboys to the dark ages.
  • Silver5urfer - Thursday, October 28, 2021 - link

    I forgot to post important thing, be happy that you have AMD as competition else Intel would have been selling you 4C CPUs even in 2021 and AMD is pushing x86 to next level, if that design dies or stagnates PC will die. Keep the x86 alive if you want to own a computer not a consumable garbage ARM product.
  • MaxIT - Thursday, October 28, 2021 - link

    That works both ways: AMD dominion is not welcomed in the same way. Did you see what AMD did with prices ? AMD and Intel are the same: when they think they are above competitors, they start taxing customers. Let them fight to prevail: we customers will be the winners
  • Qasar - Thursday, October 28, 2021 - link

    " Did you see what AMD did with prices ? " you referring to the $50 price increase between zen 2 and 3 ? 50 bucks is nothing, compared to how much intel kept raising their prices over the years before zen 1 was released. but yet, very few complained about that.
  • Oxford Guy - Friday, October 29, 2021 - link

    That $50 is a response to the inflation that has been happening from all of the Covid money printing.
  • mode_13h - Friday, October 29, 2021 - link

    It's not only money-printing. There are legit shortages due to outbreaks in factories, and worker-shortages in certain sectors.

    I suspect one reason for the trucker shortage, in the US, is that truck drivers tend to be older and overweight, which are both risk factors for complications from Covid-19 (which the nature of their job also increased their exposure towards). So, I truly wonder how much the US truck driver shortage is due to drivers unable to continue performing their duties due to complications (or death).
  • Spunjji - Friday, October 29, 2021 - link

    @mode_13h - It's a good point. A lot of chatter about the effects of COVID seems to ignore how many people more than usual died. It's not world-war levels of death, but systems subject to stress have to eat into margins to cope, and a lot of the world's financial and supply-chain systems were already under stress from tariffs and sustained economic strife when COVID hit - so there weren't a lot of margins left.
  • mode_13h - Saturday, October 30, 2021 - link

    @Spunjji a lot more people have long-term effects from Covid-19 than the ones who died. Death is just the worst outcome, but there are many people unable to function at the same level as before. And I'm not only talking about "long Covid", where the immune system seems to be stuck in an overstimulated state, but other sorts of cardiovascular and organ damage it can cause.
