It seems only fitting that one of the two hardware-based exploits to rock the CPU world this week was named Meltdown. Because for the last 24 hours or so, it feels like I’ve been on the verge of one just trying to keep up with all of the new information that has come out on this and the also aptly named Spectre exploit. Suffice it to say, it’s the kind of week we haven’t seen for a long time in the technology industry. But I’m getting ahead of myself, so let’s start at the beginning.

Security researchers working for Google’s Project Zero group, along with other research groups and academic institutions, have discovered a series of far-ranging security risks involving speculative execution. Speculative execution is one of the cornerstones of high-performance execution on modern CPUs, and is found in essentially all CPU designs more performant than an embedded microcontroller. As a result, essentially every last high-performance CPU on the market or that has been produced in the last couple of decades is vulnerable to one or more of a few different exploit scenarios.

The immediate concern is an exploit being called Meltdown, which primarily affects Intel’s CPUs, but has also been confirmed to affect some ARM CPU designs. With Meltdown it is possible for malicious code to abuse Intel and ARM’s speculative execution implementations to get the processor to leak information from other processes – particularly the all-knowing operating system kernel. As a result, Meltdown can be readily used to spy on other processes and sneak out information that should be restricted to the kernel, other programs, or other virtual machines.

Meanwhile a second class of attacks is being called Spectre, and the number of processors at risk for exploitation is even wider. Essentially every high-performance processor ever made – Intel, AMD, ARM, and POWER – is thought to be vulnerable here. Like Meltdown, a Spectre attack abuses speculative execution in order to glean information that should be restricted. What makes Spectre different however is that it’s a less-straightforward but much more insidious attack; whereas Meltdown is based on abusing specific implementations of speculative execution, Spectre can be thought of as a (previously unknown) fundamental risk of speculative execution, one that can now be weaponized. Spectre requires more setup work to coerce a target application to leak information, but the fundamental nature of the risk means that Spectre is currently considered harder to mitigate, and in general is not as well understood.
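
To make the idea of leaking data through speculation a little more concrete, here is a toy model of the bounds-check style of Spectre attack. It is a pure simulation: nothing here speculates, measures real cache timing, or exploits anything. The ToyCPU class, its method names, and the 256-entry probe table are all invented for illustration, and the assumption baked into it is that a load issued before a bounds check resolves can leave a cache footprint even when its result is thrown away.

```python
# Toy model only: nothing here speculates or measures real cache timing.
PROBE_LINES = 256  # one pretend probe "cache line" per possible byte value


class ToyCPU:
    """A pretend CPU whose cache state survives a squashed speculative read."""

    def __init__(self, public, secret):
        self.public = public            # bytes the caller is allowed to read
        self._memory = public + secret  # the secret sits right after the public data
        self.cached_lines = set()       # probe lines currently "in cache"

    def flush_probe_lines(self):
        """Evict every probe line, as an attacker would before each attempt."""
        self.cached_lines.clear()

    def bounds_checked_read(self, index):
        # "Speculative" step: the load runs before the bounds check resolves,
        # and the byte it fetched selects a probe line that ends up cached.
        if 0 <= index < len(self._memory):
            self.cached_lines.add(self._memory[index])
        # "Architectural" step: the bounds check wins, so an out-of-range
        # read hands nothing back to the caller.
        if 0 <= index < len(self.public):
            return self.public[index]
        return None

    def probe_is_fast(self, line):
        # Stand-in for timing a memory access: a cached line would be fast.
        return line in self.cached_lines


def leak_byte(cpu, index):
    """Recover one byte purely from which probe line ended up cached."""
    cpu.flush_probe_lines()
    cpu.bounds_checked_read(index)  # returns None, but leaves a footprint
    for line in range(PROBE_LINES):
        if cpu.probe_is_fast(line):
            return line
    return None


def leak_secret(cpu, secret_length):
    """Walk past the end of the public data one byte at a time."""
    start = len(cpu.public)
    return bytes(leak_byte(cpu, start + i) for i in range(secret_length))
```

In this model the out-of-bounds read never returns the secret directly; the attacker only learns it by checking which probe line became "fast" afterward. Real attacks have to contend with noise, branch predictor training, and genuine timing measurements, none of which are modeled here.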

Between Meltdown and Spectre, the end result is that prior to patching and mitigation efforts, virtually every PC and every mobile device is thought to be vulnerable to these attacks. And because the root causes are based in hardware architecture rather than software, these are not as easily fixed as software bugs. The good news is that it looks like the worst of these attacks can be mitigated through a combination of software and CPU microcode updates. However, the scale of the problem – virtually all computers and mobile devices currently in use – means that it will take quite a bit of effort to mitigate.

An Early Release Leads To An Incomplete Picture

The release of the news about Meltdown and Spectre was unplanned, and as a result researchers, hardware vendors, software vendors, and the public at large have been trying to catch up on everything that is going on. We now know that the Meltdown and Spectre attacks were discovered last summer, and since then vendors and security researchers have been coordinating their response in order to better understand the ramifications of the exploits and to ensure everyone had time to develop the necessary fixes and guidance. Until yesterday, information on Meltdown and Spectre was not intended to be published until Tuesday, January 9th, which would have been the first Patch Tuesday of the year.

Instead, what happened is that as the mitigation patches for the exploits were committed to the Linux kernel repository, people quickly began piecing together the idea that something was wrong. With both correct and incorrect speculation quickly taking over, the vendors moved up their information and patch releases on the exploits to yesterday. The result, as you might expect from a premature publication, was somewhat haphazard – I was reading new advisories well into the evening – and even now not everyone has published advisories (or at least, not at the level of detail that would be normal for full advisories).

As a result, I’m still piecing together information, and it’s likely by the time I’ve finished writing this, something in this article will be out of date or wrong. But this is everything we know as of this morning.

DON’T PANIC

It is said that despite its many glaring (and occasionally fatal) inaccuracies, the Hitchhiker's Guide to the Galaxy itself has outsold the Encyclopedia Galactica because it is slightly cheaper, and because it has the words 'DON'T PANIC' in large, friendly letters on the cover.

Before diving into some of the deeper technical aspects of Meltdown and Spectre, I think it’s best to start with a high level overview of where things stand. This includes the risks posed by these exploits, how they’re being mitigated, and what individual users need to do (if anything) about the issue.

The good news is that unless you're a cloud service provider, the immediate risk from these attacks is very low. The bad news is that because these exploits are based on hardware vulnerabilities, they will take some time to fix. And there are a lot of devices running a lot of different OSes out there that need to be fixed.

Things We Know

These are local attacks: Both Meltdown and Spectre are local attacks that require executing malicious code on a target machine. This means that these attacks are not (directly) drive-by style remote code execution attacks – think Nimda or Code Red – and that systems cannot be attacked merely by being connected to a network. Conceptually, these are closer to privilege escalation attacks, a class of attacks that help you get deeper into a system you already have access to. With that said however, researchers have shown that they can perform Spectre-based attacks using JavaScript, so it is possible for a web browser to pull a malicious JavaScript file and then attack itself in that fashion.

These are read-only (information disclosure) attacks: Along with not directly being remotely exploitable, even if Meltdown and Spectre attacks are executed on a local system, the nature of the exploit is that these are read-only attacks. That is, they can only read information from a system. They cannot directly force code execution in the OS kernel, in other virtual machines, or other programs. These sorts of information disclosure attacks can still be devastating depending on what information is leaked – and there is always the risk of using that information to then chain it into a code execution attack – which is why they’re still concerning. But the real risk is in hostile parties using these attacks to steal information, not to control a system.

The principal threat is to shared hosting environments: Given the above, these attacks are most threatening to shared hosting environments, where multiple users are all capable of executing code on a single system. As a result, cloud service providers like Amazon and Microsoft have already deployed attack mitigation efforts to their services. Individual systems and devices, by comparison, have a much lower practical risk. An attacker still needs to get malicious code executing on an individual system before they can run an attack; and if an attacker can do that, then an individual system is already in a bad position to begin with.

Meltdown and Spectre can be mitigated in software: Because the root issues at the heart of Meltdown and Spectre are at the hardware level, ideally, that hardware needs to be replaced. However, as replacing 20 years of systems isn’t remotely practical, these issues can instead be mitigated, like other CPU errata, through a combination of CPU microcode and operating system updates. Vendors like Microsoft, Apple, and the Linux distros are already in the process of rolling out some of these fixes, including an ultra-rare out-of-band security update from Microsoft on Wednesday evening.

…but it will take some time: However because information on the exploit was released earlier than planned, not all mitigation efforts are ready. Full mitigation requires both software and microcode updates, and as Intel has noted in their own announcements, mitigation efforts will take days and weeks.

Mitigating Meltdown will have a variable performance impact: In a nutshell, the mitigation efforts for Meltdown involve better separating user space programs from the OS kernel. As a result, context switches between the user space and the kernel will get more expensive. However the actual performance impact of this process is going to vary with the workload and the CPU architecture.
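
As a rough way to see the cost of entering the kernel on a given machine, the sketch below times a call that stays in user space against one that needs a system call. The function names and the iteration count are arbitrary choices for illustration, and it assumes a Linux-like system where os.getppid() performs a real system call on every invocation. Python's interpreter overhead is a large part of both numbers, so only the difference between the two, and how that difference moves after patching, is of any interest.

```python
import os
import time


def time_loop(fn, iterations=200_000):
    """Return the average number of seconds per call of fn."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


def user_only():
    # Stays entirely in user space: no kernel transition is needed.
    return 1 + 1


def enters_kernel():
    # Asking for the parent process ID requires a trip into the kernel
    # (assumption: a platform where this is a plain system call).
    return os.getppid()


if __name__ == "__main__":
    user_cost = time_loop(user_only)
    kernel_cost = time_loop(enters_kernel)
    print(f"user-space call:   {user_cost * 1e9:8.1f} ns")
    print(f"kernel round trip: {kernel_cost * 1e9:8.1f} ns")
    print(f"extra cost of entering the kernel: {(kernel_cost - user_cost) * 1e9:8.1f} ns")
```

Running this before and after a Meltdown mitigation is installed should give a crude feel for how much more expensive each kernel transition has become on that system, though a microbenchmark like this says little about real workloads.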

It’s a bad week to be Intel: Intel has been the dominant CPU supplier for cloud service providers, who in turn are at the greatest risk from Meltdown and Spectre. Coupled with the fact that Intel is far more broadly affected by the more pressing Meltdown attack than any other vendor, this means that Intel’s high-margin server customers are shouldering the brunt of the risk of these attacks. And that in turn reflects poorly on Intel.

Meanwhile, as for what CPUs are affected, Intel's most recent note confirms that everything Core architecture-based back to the first generation (Nehalem) is affected. Note that this doesn't exclude earlier processors either.

It’s a better week to be AMD (but not great): Conversely, AMD is having a much better week. As best as can be determined, their CPUs aren’t vulnerable to Meltdown attacks – the only vendor among the Big 3 not to be impacted by it. And with Meltdown being the more pressing risk, this means that AMD and its users aren’t nearly as exposed as Intel users. However AMD’s CPUs are still vulnerable to Spectre attacks, and in the long run there are still a lot of unknowns about how dangerous Spectre really is, and how well it can be mitigated.

Things We Don’t Know

When the full mitigation updates will be available for any one platform: As noted above, rolling out updates to mitigate the exploits will take days and weeks. On the PC side of matters, Microsoft’s update is just one piece of the puzzle – though arguably the most important one – as it can mitigate Meltdown without a microcode update. However we’re still waiting on microcode updates to better mitigate Spectre on Intel processors (and it looks that way for AMD processors as well).

Otherwise on the mobile side of matters, Google has announced that they’ve already rolled out Android updates with ARM’s recommended mitigations to supported Nexus and Pixel devices, but these updates don’t include all of the necessary upstream fixes from the Linux kernel.

The performance impact of these mitigations is unclear: Following up on the point about mitigating Meltdown, it’s not clear what the full performance impact of this will be. Individual operations and workloads could potentially be upwards of 30% slower, but it heavily depends on how often a task is context switching to the kernel. I expect the average real-world impact to be less, particularly for desktop users. However server users and their unique-but-narrowly-focused workloads could be much more affected if they're particularly unlucky.
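
The dependence on context switch frequency can be put into a simple back-of-the-envelope model. The function below is only a sketch: it assumes a workload's runtime splits cleanly into a share spent on kernel transitions and a share that is unaffected, and that the mitigation multiplies the cost of the former by a fixed factor. Both parameters, and the example figures, are hypothetical rather than measured.

```python
def estimated_relative_runtime(kernel_transition_share, transition_cost_multiplier):
    """Estimate runtime after mitigation, relative to before (1.0 = unchanged).

    kernel_transition_share: fraction of the original runtime spent
        switching into and out of the kernel (hypothetical input).
    transition_cost_multiplier: how many times more expensive each
        transition becomes after mitigation (hypothetical input).
    """
    if not 0.0 <= kernel_transition_share <= 1.0:
        raise ValueError("kernel_transition_share must be between 0 and 1")
    if transition_cost_multiplier < 0.0:
        raise ValueError("transition_cost_multiplier must not be negative")
    unaffected_share = 1.0 - kernel_transition_share
    return unaffected_share + kernel_transition_share * transition_cost_multiplier


if __name__ == "__main__":
    # Made-up examples: a compute-heavy desktop task versus an I/O-heavy
    # server task, both with transitions assumed to double in cost.
    for label, share in [("desktop-like", 0.02), ("server-like", 0.30)]:
        relative = estimated_relative_runtime(share, 2.0)
        print(f"{label}: about {(relative - 1.0) * 100:.0f}% longer runtime")
```

With these made-up inputs, a task that spends 2% of its time on kernel transitions barely changes, while one that spends 30% there takes roughly 30% longer, which is why the same patch can look negligible on a desktop and painful on a server.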

Meanwhile, the performance impact of Spectre mitigations is even less understood, in part because efforts to mitigate Spectre are ongoing. Based on what the hardware vendors have published, the impact should be minimal. But besides the need for empirical testing, that could change if Spectre requires more dramatic mitigation efforts.

On our end we’re still waiting on some microcode updates, and even then it’s a process of trying to figure out what an average case will even entail. It’s definitely something we intend to dig into in true AnandTech style once we have the requisite updates.

Finally, for their part, Microsoft's Azure group has published their own figures in their advisory to customers, noting that the performance impact they've found thus far is quite limited.

The majority of Azure customers should not see a noticeable performance impact with this update. We’ve worked to optimize the CPU and disk I/O path and are not seeing noticeable performance impact after the fix has been applied. A small set of customers may experience some networking performance impact.

It’s not clear just what the full security ramifications of Spectre are: While Meltdown is the more immediate threat, how it works and how to mitigate it are fairly well documented. Spectre however is a definite wildcard right now. There are multiple proof of concept attacks as it stands, but more broadly speaking, Spectre attacks are a new class of attacks not quite like anything vendors have seen before. As a result no one is completely confident that they understand the full security ramifications of the exploit. There is a risk that Spectre attacks can be used for more than what’s currently understood.

It’s also not clear just how well Spectre can be mitigated: The corollary to not fully understanding the attack surface of Spectre is that defending against it is not fully understood either. The researchers behind the attack for their part are not convinced that software or microcode updates are enough to fully resolve the problem, and are advising that they should be treated as stop-gap solutions for now. Specific types of Spectre attacks can be mitigated with care, but those protections may not help against other types of Spectre attacks. It’s an area where a lot more research needs to be done.

What Can Users Do Right Now?

Finally, there’s the question of what system and device owners should do about the Meltdown and Spectre attacks. The fundamental weakness that allows these speculative execution attacks is in the hardware itself, so short of replacing devices, the problem cannot be truly solved. The only thing that can be done right now is to mitigate the problem with software and microcode patches that attempt to work around the problem.

The solution then is a double-edged sword for users: there’s not much they can do, but there’s also not much they have to do. The software and microcode updates to mitigate these exploits will be distributed as software updates, so keeping your systems and mobile devices up-to-date with the latest OS version is the single most important step one can take. As mentioned earlier, everyone has rolled out, or is in the process of rolling out, the necessary software updates to help mitigate this. The flip side is that, short of a focused ion beam tool, there’s not much else for users to do. This is a problem whose resolution is going to be vendor-driven.
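
For Linux users who want to confirm that an update actually took effect, kernels from 4.15 onward (and distribution kernels carrying the backported patches) report their mitigation status as small text files under /sys/devices/system/cpu/vulnerabilities. The sketch below simply reads whatever files are present there; the function name is made up for illustration, and on older kernels or other operating systems the directory will not exist and the result will be empty.

```python
from pathlib import Path

# Directory where newer Linux kernels report CPU vulnerability status.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(directory=SYSFS_DIR):
    """Return {vulnerability name: status text} for each file in directory.

    Returns an empty dict when the directory is missing, which is the
    case on kernels that predate this interface and on non-Linux systems.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return {}
    status = {}
    for entry in sorted(directory.iterdir()):
        if not entry.is_file():
            continue
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("No vulnerability status reported by this kernel.")
    for name, text in report.items():
        print(f"{name}: {text}")
```

An entry that starts with "Mitigation:" or reads "Not affected" is the outcome to hope for, while "Vulnerable" means that particular fix is not yet in place.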

More to Come

Be sure to check back later today for some additional information on these attacks. In particular, I’m looking to further explore the importance of speculative execution, and why an attack against it may have some significant ramifications for CPU designs down the line. Speculative execution is a rather important feature for boosting performance on modern processors, but as these attacks have shown, even when (seemingly) ideally implemented, it can have some security tradeoffs. Which in this current era of computing, may no longer be acceptable tradeoffs.

I’ll also be taking a deeper look at Meltdown and Spectre, and how although both of them are attacks on speculative execution, they are very different kinds of attacks. Meltdown is the biggest concern right now due to the relative ease of exploiting it. But as the authors of the Spectre paper note, Spectre is a more fundamental vulnerability that will be with us a lot longer. Just what it means for system security remains to be seen, but from a computer science point of view, I expect these latest discoveries are going to have a big impact on how future CPUs are designed.

210 Comments

  • linuxgeex - Saturday, January 06, 2018 - link

    It doesn't matter the methodology that the feature is implemented by. The failure is that speculative lookups are happening without access controls being validated before the code is executed. The validations are instead happening after the code is executed but before it is retired. This is done for latency reasons, which affects pipeline depth and overall CPU efficiency. It's easy to do the access control checks up front but CPUs will run slower and use more power... and that is how it will be from now on for general-purpose processors which may run untrusted code. This creates a new market for less secure, higher performing, more efficient general-purpose processors down the road. There's lots of tasks we can safely run on today's processors... just not an operating system or a web browser with a JIT.
  • BillyONeal - Thursday, January 04, 2018 - link

    The timing attack isn't on when the exception is delivered for the invalid access. It's on an ordinary data access afterward
  • linuxgeex - Saturday, January 06, 2018 - link

    Exactly, for example in x86 it's a result of TSC timestamps either side of a perfectly valid read, determining whether or not it is already cached, after using the side-channel attack to cache (or not) the target code. Hammer the cache and you can make it reveal code location down to a single cache line.
  • crazylocha - Thursday, January 04, 2018 - link

    Thank you Ryan.
    I know you couldn't have had much sleep last night or this morning. The forum was wondering how long it would take.

    Now I need to grab my beach towel and put on my bathrobe and will be ready.
  • watersb - Thursday, January 04, 2018 - link

    :-)
  • mobutu - Thursday, January 04, 2018 - link

    Is there any list of the new parts that are 100% protected from this?
    Like from what intel cpu generation upwards are protected? amd? etc
  • Pork@III - Thursday, January 04, 2018 - link

    VIA processors? :D
  • Lord of the Bored - Thursday, January 04, 2018 - link

    Via's probably vulnerable. Just assume if it has cache and out-of-order execution that it is vulnerable.
  • phoenix_rizzen - Friday, January 05, 2018 - link

    Quick, everyone replace their desktops with Intel Atom and ARM Cortex-A53/A55 CPUs! In-order CPUs for the win! :)
  • linuxgeex - Saturday, January 06, 2018 - link

    It's speculation (branch/return prediction) that's the problem, not out-of-order, but yes the combination of the two, plus NUMA access delays, is what makes Spectre so powerful. big.LITTLE has large enough NUMA access delays that even the A53 is vulnerable because o-o-o isn't required if the speculation window is large enough. o-o-o just enhances the size of the speculation window by letting more work be done.
