TSMC: Short Supply of HPC GPUs to Persist for 1.5 Yearsby Anton Shilov on September 7, 2023 1:30 PM EST
The reports about an insufficient supply of compute GPUs used for artificial intelligence (AI) and high-performance computing (HPC) servers became common in recent months as demand for GPUs to power generative AI applications exploded. TSMC admits that the biggest compute GPU supply bottleneck is its chip-on-wafer-on-substrate (CoWoS) packaging capacity, as it is used by virtually everyone in the AI and HPC business. The company is expanding CoWoS capacity but believes that its shortage will persist for 1.5 years.
"It is not the shortage of AI chips," said Mark Liu, the chairman of TSMC, in a conversation with Nikkei. "It is the shortage of our CoWoS capacity. […] Currently, we cannot fulfill 100% of our customers' needs, but we try to support about 80%. We think this is a temporary phenomenon. After our expansion of [advanced chip packaging capacity], it should be alleviated in one and a half years."
TSMC currently produces the vast majority of processors that power popular AI services, including compute GPUs (such as AMD's Instinct MI250 and NVIDIA's A100 and H100), FPGAs, and specialized ASICs from companies like d-Matrix and Tenstorrent as well as proprietary processors from cloud service providers, such as AWS's Trainium and Inferentia as well as Google's TPU.
It is noteworthy that compute GPUs, FPGAs, and accelerators from CSPs all use HBM memory to get the highest bandwidth possible and use TSMC's interposer-based chip-on-wafer-on-substrate packaging. While traditional outsourced semiconductor assembly and test (OSAT) companies like ASE and Amkor also offer similar packaging technologies, it looks like TSMC is getting the lion's share of the orders, which is why it can barely meet demand for its packaging services.
Industry analysts believe that OSATs are less motivated to offer advanced packaging services because it requires them to invest hefty amounts of capital and poses more financial risks than traditional packaging. For example, if something goes wrong with a mainstream processor that sits on an organic substrate, an OSAT loses only one chip, whereas if something goes wrong with a package carrying four chiplets and eight HBM memory stacks, the company loses hundreds if not thousands of dollars. Since OSATs do not get substantial margins making those chiplets, such risks slow down the expansion of advanced packaging capacity at OSATs, even though advanced packaging costs significantly more money than traditional packaging.
Just like its industry peers, TSMC is spending billions on upcoming advanced packaging facilities. For example, the company recently announced plans to spend nearly $2.9 billion on a packaging fab that is rumored to come online in 2027.
"We are increasing our capacity as quickly as possible," said C.C. Wei, chief executive of TSMC, at the company's earnings call earlier this year. "We expect these tightness somewhat be released in next year, probably towards the end of next year. […] I will not give you the exact number [in terms of processed wafers capacity], but CoWoS [capacity will be doubled in 2024 vs. 2023]."