Changing How Chips are Made: 3D Packaging with FOVEROS

Anyone who has followed any amount of chip design in semiconductors should well-aware that most of the CPUs and SoCs in production today are based on monolithic dies – single pieces of silicon the have all what is needed inside, before being placed on a package and into a system. While rarer, there are multi-chip packages, were multiple chips go in one package with a shared connection. Beyond this we have seen interposers or embedded bridges designed to bring different chips together with high-speed interconnects. Now Intel is ready to bring 3D stacking into the mass market.

One of the biggest challenges in modern chip design is minimizing die area. A small die decreases costs – typically power too – and can make it easier to implement in a system. When it comes down to extracting performance however, we are at the limits of scale – one of the downsides of big monolithic dies, or even multi-chip packages, is that memory is too far away. Intel today is now ready to talk about its Foveros technology, which involves active interposers in small form factors to bring differentiated technologies together.

Foveros: It’s Greek for Awesome, Apparently

The way Raja introduced this technology started with discussing process technologies. Intel has for many years/decades been focused on high performance process nodes, trying to extract as much as possible from its high-performance cores. Alongside this, Intel also runs an IO optimized process node on a similar cadence but more suitable for PCH or SoC-type functions.

The 126x and 127x are the internal numbering systems for Intel’s process node technologies, although they don’t differentiate between BKM updates for the "+" node variants it turns out. But the point here is that Intel already knows that it needs certain process optimizations in place depending on the type of transistors, performance, and power needed. Going forward, Intel is going to be expanding its node base out so it can cover more power and performance points.

So for this example, Raja pulled out the current set of process technologies for 2019. For a manufacturing process, compute has the 1274 process on 10nm, IO has the 1273 process (14nm), while this new special Foveros technology is under P1222. Alongside the manufacturing, Intel will be working on optimizations focusing on the compute aspects of the manufacturing node. There will also be a set of developments for future node technologies, and the final column shows that Intel has path-finding research to look into future technologies and determine what capabilities will be possible on future designs. This sounds pretty much what a company like Intel should be doing, so no argument from me so far. The goal here is that each type of transistor use case can be different, and there isn’t a one-size fits all approach.

One way to assist with this is through chiplets and packaging. By picking the best transistor for the job in each case, whether it is CPU, GPU, IO, FPGA, RF, or anything else, with the correct packaging, it can be put together to get the best optimizations available.

So here’s where Foveros fits in. Foveros is Intel’s new active interposer technology designed as a step above its own EMIB designs for small form-factor implementations, or those with extreme memory bandwidth requirements. For these designs the power per bit of data transferred is super low, however the packaging technology has to deal with the decreased bump pitch, the increased bump density, and also the chip stacking technology. Intel says that Foveros is ready for prime time, and they can produce it at scale.

So this Foveros ‘3D’ packaging just sounds like a silicon interposer, such as what we’ve seen on AMD’s Fiji or NVIDIA’s high-end datacenter GPUs. However Intel is going above what those products are doing by actually making the interposer part of the design. The interposer contains the through-silicon vias and traces required to bring power and data to the chips on top, but the interposer also carries the PCH or IO of the platform. It is, in effect, a fully working PCH, but with vias to allow chips to be connected on top.

The first iteration of this technology is less complicated that the slide above, just using a set of CPU cores attached to the PCH below, but the idea is that a large interposer can have select functions on it and those can be removed from the chips above to save space. This also lets Intel use the different transistor types in different chips – the example we were given uses an interposer built on the 22FFL process node, with a 10nm set of CPUs on the top die. Above this, DRAM is provided in a POP package. Sounds cool, right?

In actual fact, Intel had a Foveros chip or two working in the demo area. These, Intel explained, were hybrid x86 designs that combined a single big Core with four smaller Atom cores on the same 10nm piece of silicon. I’m sure I’ve heard about big.Little before, but I was shocked that Intel is actually going to do it! We managed to take a photo of the block diagram, which Intel removed from its slide deck before sending it out to the press after the presentations. All details on the next page.

Going Beyond Gen11: Announcing the XE Discrete Graphics Brand Intel’s First Fovoros and First Hybrid x86 CPU: Core plus Atom in 7 W on 10 nm
Comments Locked

148 Comments

View All Comments

  • Raqia - Thursday, December 13, 2018 - link

    Your point is taken and Keller did say it was in its infancy, but I am interested in whether what we're seeing here will be a competitive product or will remain an interesting science experiment. There are theoretical benefits of stacking high performance dies on low leakage ones like this but also substantial challenges and deficiencies which the current iteration doesn't show that it has overcome. What we might see in benefit in terms of better overall area, lower package level fab rejection rates, and better net power characteristic could be offset by a worse concentration of heat and hence more throttling when both elements are running or more expensive packaging. Perhaps in the end, a monolithic die is a better compromise despite losing out on some metrics for mobile.
  • nico_mach - Wednesday, December 12, 2018 - link

    So the GPU is going to be called ... Ten to the Eeth power? Is that right?

    I reject all these Xes used in unpredictable ways. The iPhones are pronounced exar and excess. This is ecksee, and I still use oh ess ecks on my emm bee eh at home.
  • Jon Tseng - Wednesday, December 12, 2018 - link

    >Intel actually says that the reason why this product came about is because a customer
    >asked for a product of about this performance but with a 2 mW standby power state.

    Huh wonder who the customer for that Core/Atom hybrid is. Seems a bit overpowered for a tablet. A bit underpowered for a MacBook (or for a car). Chromebooks maybe but most are too low volume to demand a custom part (maybe the education market is taking off?). PC OEMs don't normally take such custom parts for their laptops. But the graphics loadout implies some kind of PC-type application?

    Any ideas??
  • HStewart - Wednesday, December 12, 2018 - link

    From the diagram, it appears that hybrid cpu - has single Core CPU with 4 small (Atom) CPU's - such technology is done with Samsung Processors - this would mean it still lower power - but still have primary single thread core speed.

    Most interesting would be how the smaller core are used in scheduling system. Most like means and enhancement in OS for proper usage.
  • A5 - Wednesday, December 12, 2018 - link

    There aren't a ton of companies big enough to make Intel create a new product line just for them.

    The whole list is probably Apple/HP/Dell. Maybe Microsoft.
  • The_Assimilator - Wednesday, December 12, 2018 - link

    Microsoft Surface, obviously. It's become a very profitable line for MS but the current models are either too battery-hungry (Core CPUs) or too slow (Atom CPUs). Fovoros will give the best of both these worlds while also being x86... priced right, a Fovoros-based Surface will essentially end any argument for iPads in a business environment, especially considering most software remains firmly single-threaded. But it remains to be seen whether (a) Intel can get the power down even further (7W is still double most smartphones) and (b) whether their big.LITTLE implementation is good enough.
  • Raqia - Wednesday, December 12, 2018 - link

    Windows on ARM will do just fine now that Visual Studio emits ARM native code. Once Chrome gets ported (and that will be soon https://www.neowin.net/news/both-chromium-and-fire... the platform should address 95% of typical daily use cases and provide substantial compatibility with legacy software / file formats. This is better value than iPads and upcoming dedicated SoCs like the 8cx should offer better performance and battery / heat characteristics than what Intel has planned for next year in the same power envelope.
  • The_Assimilator - Thursday, December 13, 2018 - link

    I think you missed the part where Windows on ARM is horribly slow and therefore shitty. As a result, Microsoft has no plans to port anything useful (e.g. Office) to ARM, which means Windows on ARM is stuck being the lowest of the low-end. And that's not a space that Surface is intended to play in; Surface is an iPad competitor, and an iPad competitor can't be slow and shitty. Business devices can't be slow and shitty, and they absolutely need to be able to run Office.

    I expect that either Windows on ARM will be allowed to wither and die once Fovoros ships, or it will languish in a dead zone whereby only the cheapest of the cheap devices by no-name-brand OEMs (think $100 Lenovo tablets) use ARM chips and hence need it.

    So unless Qualcomm's 8cx is a game-changer in terms of performance, Fovoros should be the end of ARM on desktop, and thank fucking God for that.
  • Spunjji - Thursday, December 13, 2018 - link

    Microsoft already have an Office code base on ARM, so I'm not sure what you're talking about there.

    What would worry me about an Intel BIG.little style design is that if Windows doesn't assign your performance-critical application to the correct (big) core, performance will mostly suck just as hard as if all your cores were Atom.

    As such, I'd be cautious on calling a winner just yet.
  • gamerk2 - Thursday, December 13, 2018 - link

    Agreed with this; Microsoft has been let down by Intel not having a good mobile platform. If it were up to them, they wouldn't bother with ARM, but they have to due to battery/power/heat requirements.

Log in

Don't have an account? Sign up now