One of the highlights of Hot Chips 2019 was the presentation of the Cerebras Wafer Scale Engine, an AI processor chip as big as a wafer, containing 1.2 trillion transistors across more than 46,225 square millimetres of silicon. This was enabled through breakthrough techniques in cross-reticle patterning, and the level of redundancy built into the design ensured a yield of 100%, every time. The first WSE system, the CS-1, was put on display at Supercomputing 2019, where we got a chance to bite into the design with Andrew Feldman, the founder and CEO of Cerebras.

Unfortunately I never got around to writing up my discussions with Andrew; however, what we did learn at the time is that the CS-1 is a fully integrated 15U chassis that requires 20 kW of power, delivered to the chip through 12x 4 kW power supplies (with some redundancy built in). The chip is mounted vertically for ease of access, which is quite bizarre in the modern world of computing. Most of the chassis was custom built for the CS-1, including the tooling and a fair amount of commercial 3D printing. Andrew also said at the time that while there was no minimum order quantity for the CS-1, each one would cost ‘a few million’.

Today’s announcement from the Pittsburgh Supercomputing Center (PSC) helps round that number down to perhaps ~$2 million. Through a $5 million grant from the National Science Foundation (NSF) to the PSC, a new AI supercomputer will be built, called Neocortex. At the heart of Neocortex will be hardware built in partnership with Cerebras and Hewlett Packard Enterprise.

Specifically, there will be two CS-1 machines at the heart of Neocortex. The CS-1 supports asynchronous models through TensorFlow and PyTorch, with the software platform able to optimize the size of the workloads for the available area on the CS-1 Wafer Scale Engine.
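As a purely illustrative sketch of that workflow: researchers write ordinary framework code, and the Cerebras software stack is responsible for mapping it onto the wafer. The PyTorch snippet below is not Cerebras' API (the model, names, and dimensions are placeholders of my own); it simply shows the framework-level starting point that, per the company, its graph compiler takes and sizes to fit the available area on the WSE.

```python
# Hypothetical example: a plain PyTorch model of the sort a researcher would
# hand to the CS-1 toolchain. Nothing here is Cerebras-specific; the vendor's
# compiler (not shown) is what partitions and places the work on the wafer.
import torch
import torch.nn as nn

class SmallClassifier(nn.Module):
    """Placeholder model: two linear layers with a ReLU in between."""
    def __init__(self, in_features: int = 784, hidden: int = 256, classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

if __name__ == "__main__":
    model = SmallClassifier()
    dummy = torch.randn(32, 784)   # a batch of 32 flattened 28x28 inputs
    logits = model(dummy)
    print(logits.shape)            # torch.Size([32, 10])
```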


Each front panel half is machined from a single piece of aluminium

The pair of CS-1 machines will be coupled with an ‘extreme’ shared-memory HPE Superdome Flex server, which contains 32 Xeon CPUs, 24 TB of DDR4, 205 TB of storage, and 1.2 Tbps of network interfacing. Neocortex is expected to enable AI researchers to train their models, covering areas such as healthcare, disease research, power generation, and transportation, as well as other pressing issues of the day.

The machine will be installed in late 2020. PSC has stated that access to Neocortex will be available to researchers in the US at no cost.

When we spoke to Cerebras last year, the company stated that they already had orders in the ‘strong double digits’. When pressed, I managed to narrow that down to ‘from 12 to several dozen’. A number of machines were ordered for Argonne National Laboratory at the time, and I suspect others are now investing.

Interestingly enough, at Hot Chips 2020 this year, the company is set to disclose its second-generation Wafer Scale Engine. At a guess, I would suggest that it is slightly further from commercialization than the first WSE was when it was announced, but the company seems to have had substantial interest in its technology.

Comments

  • Duncan Macdonald - Tuesday, June 9, 2020 - link

    Even with the high level of redundancy in the design, I am surprised that they get close to 100% yield; I would have expected a fatal error (e.g. a mask misalignment making an individual chip unusable) in a high proportion of wafers. From the earlier description on AnandTech, there is redundancy inside each chip but not between chips (i.e. a dead chip cannot be bypassed).
    There might have been a miscommunication in this article - what might have been meant is that each wafer supplied to customers is 100% functional, not that every wafer produced in the fab is functional.
  • jbrukardt - Tuesday, June 9, 2020 - link

    M1 is pretty damn big in a patterning environment, and that's where you'd see discontinuity. It's not so hard to avoid messing up patterning across a whole wafer at that scale.

    Now down in the transistor layers where feature size is tiny? Yeah, basically impossible.
  • brucethemoose - Tuesday, June 9, 2020 - link

    IIRC there are redundant chips on each row, so they can indeed bypass dead chips with the interconnect. I assume some healthy chips are sacrificed to get a rectangular array.

    Also, it's TSMC 16nm. Interconnect black magic aside, yields are probably pretty good by now.
  • yeeeeman - Wednesday, June 10, 2020 - link

    100% yield in this case, I think, means that every single wafer-scale chip is functional. The percentage is probably variable, but they must have set the specs for this while taking into account the losses incurred by defects on the wafer.
  • Spunjji - Monday, June 15, 2020 - link

    That's the impression I got, too. Anything less would imply a painfully high cost for production.
  • PeachNCream - Wednesday, June 10, 2020 - link

    Ian hungers for wafer!
  • Oxford Guy - Wednesday, June 10, 2020 - link

    How many wafers would it take to produce a perfect one, with not a single defect?

    Someone here said it's impossible but I don't agree. It has to be possible, however improbable.
  • FreckledTrout - Thursday, June 11, 2020 - link

    LOL Ian you have to take the CPU out of its wrapper before you eat it.
  • hafizmajid - Sunday, June 14, 2020 - link

    No wonder we have a chip shortage, it's Ian eating them all
