TSMC announced this week that it suffered a computer malware outbreak, resulting in a roughly three-day outage for parts of its fabs while systems were restored. As a consequence of the downtime, the company expects certain shipment delays and additional charges. Specifically, because of the interruptions and costs, TSMC’s Q3 revenue and gross margin will be 2% and 1% lower than anticipated, respectively. TSMC later clarified that the outbreak was caused by “misoperation” during software installation for a new piece of equipment.

What Happened?

TSMC’s personnel set up a new manufacturing tool on Friday, August 3, and then installed software for the device. The machine was not isolated and confirmed to be malware-free before being connected to TSMC’s internal network. As a result, the infected machine allowed the malware to spread quickly, hitting computers, production equipment, and automated materials handling systems across TSMC’s fabs.

According to the chipmaker, the malware was a variant of the WannaCry ransomware cryptoworm. WannaCry, though over a year old at this point, can still propagate among any remaining unpatched systems, which is what happened here: the malware infected Windows 7-based machines “without patched software for their tool automation interface.” As a consequence, the affected equipment either crashed or rebooted continuously, rendering it essentially inoperable.
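For context, WannaCry spreads laterally over SMB (TCP port 445) via the EternalBlue exploit that Microsoft patched in MS17-010, which is why only unpatched machines were hit. As a rough illustration of the kind of inventory check an IT team might run before letting a machine onto a production network, here is a minimal sketch — the `smb_port_open` helper and the subnet are hypothetical, not TSMC's actual tooling — that flags hosts exposing that port:

```python
import socket

def smb_port_open(host: str, timeout: float = 0.5) -> bool:
    """Return True if TCP port 445 (SMB, the service WannaCry's
    EternalBlue exploit targets) accepts connections on `host`."""
    try:
        with socket.create_connection((host, 445), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical shop-floor subnet; a real audit would also verify
# whether each flagged host actually has the MS17-010 patch applied.
for host in ("192.168.0.%d" % n for n in range(1, 5)):
    if smb_port_open(host):
        print(host, "exposes SMB -- verify patch level")
```

An open port 445 does not by itself mean a machine is vulnerable, but it is the attack surface the worm needs, so it is a reasonable first filter.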

TSMC has stressed that not all of its tools and automated materials handling systems were affected, and that the degree of infection varied by fab. The company had to shut down infected equipment and apply patches. By 2 PM Taiwan time on Monday, 80% of the impacted tools had been recovered, and TSMC said it would restore all of them by Tuesday.

The Impact

Since the affected tools are located across multiple fabs and are used to process wafers on a variety of process technologies for different customers, the outbreak clearly affected delivery schedules for many chips. As a consequence, the company had to notify its customers and reschedule their wafer delivery dates. Some of the delayed wafers will be delivered not in Q3 but in Q4, thus affecting product launch plans.

None of TSMC's well-known customers are currently commenting on the matter, but the event occurred during what is widely believed to be the ramp-up period for new chips from Apple and NVIDIA. Since at least some of TSMC’s production tools were offline for four to five days, there will clearly be an impact, though it is hard to estimate how significant it will be.

What remains to be seen is how a several-day outage of numerous semiconductor production tools will affect TSMC’s customers in general. After all, 2% of TSMC’s Q3 revenue is between $169 million and $171 million, which is a lot of money. We will likely learn more about the effect of the malware outbreak in the coming months.
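Working backwards from those figures, the 2% hit implies a Q3 revenue guidance of roughly $8.45–8.55 billion (this range is back-derived from the article's numbers, not a separately confirmed figure):

```python
# Back-deriving TSMC's implied Q3 revenue guidance from the
# stated figures: a 2% hit equals $169M-$171M.
hit_low, hit_high = 169e6, 171e6
guidance_low = hit_low / 0.02    # $8.45 billion
guidance_high = hit_high / 0.02  # $8.55 billion
print(f"Implied Q3 guidance: ${guidance_low/1e9:.2f}B - ${guidance_high/1e9:.2f}B")
```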

(ed: As an aside, I find it very interesting that this entire episode was essentially happenstance, rather than the kind of targeted attack that would typically be the case. WannaCry is over a year old and is self-propagating; as a proper worm, it goes wherever it can, whenever it can. In fact, with the release of patches over a year ago, WannaCry's primary function is done. So for TSMC this is the IT equivalent of stepping on a landmine from a long-forgotten war, and it reinforces the fact that advanced malware can remain dangerous to the public long after it has done its job. -Ryan)

Sources: TSMC, TSE MOPS

42 Comments

  • Alexvrb - Friday, August 10, 2018 - link

    I was referring specifically to TSMC's systems. In addition I qualified that they were generally secure from OUTSIDE attacks. Unfortunately they rely too much on employees not failing at their job.

    That said, airgapping a system from any outside network does boost security. By itself it does not secure a system, nor did I claim it magically secured the system by itself. If any system *at all* can be considered to be generally secure, then an offline system can be at least as secure, and then some. If you don't believe that is possible period (a different argument altogether), well then it can at least be built to be as secure as the most secure online system, and then some.
    Reply
  • dshess - Thursday, August 09, 2018 - link

It's easy to say "OMG, you're running unpatched Windows 7!!!1!1!oneoneone!", but ... imagine the joy of having to qualify an individual patch on a $10B fab. You can't really canary it, you probably don't have a second preproduction system to qualify on, etc. And the individual bits and bobs running on Windows 7 can often have a worldwide installed base in the dozens, so you can't rely on bake time to prove things out, either. So once you've qualified things at a particular patch level, you most likely leave it at that patch level FOREVER, and introduce an elaborate procedure for vetting new systems to make sure they don't introduce any unknowns.

    I think it's a real stretch to assume that they're just winging it on this. I wouldn't want to touch this problem with a ten foot pole.
    Reply
  • Alexvrb - Thursday, August 09, 2018 - link

Agreed. For most use cases I would encourage people to keep their systems patched and run a fairly current (and supported) OS. But mission critical industrial systems running custom software? Yeah, it's a little more complicated. Especially when the systems are offline - the risk is very low. The only reason they got hit was their personnel didn't scan the machine before tossing it on their network. As you said, vet new systems. I'd be willing to bet they had such a procedure in place, and the human element failed.

    I will say that if and when possible, run your software on VMs.
    Reply
  • HollyDOL - Friday, August 10, 2018 - link

    It has been a long time since I have been anywhere near that topic, but are VMs these days capable of running applications with hard/soft realtime requirements? It used to be quite an issue. Reply
  • mapesdhs - Friday, August 10, 2018 - link

    Someone in the movie industry told me this week that pro apps running on VMs are now doing a better job at allocating machine resources than the native OS, ie. it's actually slower running on bare metal. I guess it's easier to add the nuances of a hw platform into a VM than it is into an OS. Reply
  • HollyDOL - Friday, August 10, 2018 - link

    You got me wrong, I don't mean overall performance, but the latency contract.
    In industrial applications, you are required to perform certain operations at precise times (imagine assembly line operations, for example): you want to run your device every x ms for y ms, so those operations must be scheduled at exact intervals. A normal scheduler is not able to do that; you need a realtime scheduler. The trouble I'm talking about is that with a VM you have pretty much one scheduler scheduling another scheduler, which brings quite a few issues for applications like these. I can only imagine high-end chip manufacturing is significantly more demanding than a petrochemical plant (where I have seen these requirements).
    Reply
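The stacked-scheduler concern above is easy to demonstrate on any general-purpose OS. This minimal sketch naively schedules a 10 ms periodic task with plain `time.sleep()` and records how far each wake-up drifts past its deadline; a hard-realtime controller needs that drift strictly bounded, which a general-purpose scheduler (let alone one running inside another under a hypervisor) does not guarantee:

```python
import time

def measure_wakeup_drift(period_s: float = 0.01, iterations: int = 50):
    """Schedule a task every `period_s` seconds with plain time.sleep()
    and record how far each wake-up drifts past its ideal deadline.
    A general-purpose scheduler only promises to sleep *at least* the
    requested time; a realtime scheduler also bounds the overshoot."""
    drifts = []
    deadline = time.perf_counter() + period_s
    for _ in range(iterations):
        time.sleep(max(0.0, deadline - time.perf_counter()))
        drifts.append(time.perf_counter() - deadline)
        deadline += period_s
    return drifts

drifts = measure_wakeup_drift()
print(f"worst overshoot over {len(drifts)} periods: {max(drifts)*1000:.3f} ms")
```

On a loaded desktop OS the worst-case overshoot can easily reach milliseconds, which is exactly the kind of jitter that is unacceptable when a tool must actuate on a fixed cycle.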
  • e_sandrs - Friday, August 10, 2018 - link

    Not sure it solves all problems, but I know recent VMware has functionality to kinda skip the virtualization layer -- allowing target VMs to directly address hardware. I would think that would fix the double scheduler layer and latency, but again, I haven't had to implement it. Activating direct hardware access reduces some of the portability of the VM, but when the purpose is quicker restore from VM failure on the same or similar hardware, those compromises are probably tolerable. Reply
  • JBrickley - Friday, August 10, 2018 - link

    Um... VMs do not run as fast as bare metal. There is a performance hit due to the overhead of virtualizing hardware. It only works because Intel CPUs added a lot of black magic to make it work effectively. What you gain with VMs is allocating all available physical resources such as RAM, CPU, and storage. Say you have a server running something and you only use 25% of the resources on that physical server. Well, set up a hypervisor on the server and install multiple VMs, each an optimized server with just the necessary allocated RAM, CPU, and storage, and now you can use up all the physical server's resources with multiple VMs. Nothing wasted. This saves on rack space, cooling, electricity, heat, etc.

    But with these manufacturing machines each with it's own operating system and custom software cannot be virtualized. Why do they run vulnerable Windows versions? Because it's easy to write code for them, and once a machine is built and installed you really never need to change the software, so they go many years running an unpatched Windows version until the whole machine is replaced. You can't upgrade the OS or even patch it without potentially breaking the machine. This is a serious problem now that all these machines are networked together. The worm jumped from machine to machine and killed a whole lot of them on the chip fab assembly lines. It was a nightmare scenario. Personally, I think they should all be running a hardened embedded Linux that requires software to be signed before it's allowed to execute. The only reason Windows is being used is that it is easier to find programmers for it. I bet they still have Visual Basic applications powering these machines. I know of other manufacturing environments where a real old version of WinXP is in use across older machines. They cannot be easily upgraded, the software is likely doing things outside the norm of best practices, etc. If you were to even patch them you'd risk breaking them.
    Reply
  • Alexvrb - Friday, August 10, 2018 - link

    "But with these manufacturing machines each with it's own operating system and custom software cannot be virtualized."
    Why not? If you've got a Win7 machine driving software that's driving a piece of equipment, why would it be impossible to virtualize that Win7 machine (and have newer/more secure underlying software beneath the VM)?
    Reply
  • Icehawk - Friday, August 10, 2018 - link

    Real or virtual doesn’t affect the patchability of an OS. My work uses tons of outdated software either frozen at a certain version or no longer updateable. That has been my experience at the prior two companies I worked for - it gets very expensive in time and money to update. Reply
