The Future of Lakefield

Lakefield as a product is a lateral move for Intel. The company is taking some of its new and popular IP, and placing it into a novel form factor that has required a significant amount of R&D from a manufacturing and construction perspective. The goal of Lakefield was to meet particular customer requirements, which we understand to be around battery life, performance, and multi-screen support, and according to Intel, those goals have been met, and they will be producing future generations of Lakefield products.

In particular, Intel has produced this slide at a couple of conferences.

This slide essentially states that the Lakefield product in the yellow box has two silicon dies – one optimized for compute on Intel’s P1274 process (10+ nm) and the Foveros layer (the active interposer layer) on Intel’s 22FFL process.

The next product with heterogeneous manufacturing integration will be Intel’s big Xe-HPC product, Ponte Vecchio, which will use Intel’s P1276 process (7nm) as a compute die and Intel’s P1274 (10+) process as a base interposer layer.

Beyond this, Intel looks to continue with its multi-layered products by having the compute layer on the most advanced process node, with the interposer layer one generation behind, on a ‘Foveros’ optimized variant.
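As a minimal sketch of that cadence, the pairing rule (compute layer on the newest node, base layer one step behind) can be written as a sliding window over an ordered list of nodes. The list below contains only the three nodes named above; the function name and the dictionary keys are hypothetical, and anything beyond P1276 is deliberately left out because it is not specified here.

```python
# Process nodes named on Intel's slide, oldest first (labels as used in this article).
NODES = ["22FFL", "P1274 (10+)", "P1276 (7nm)"]


def stack_pairings(nodes):
    """Pair each node (compute layer) with the node just before it (base layer).

    Hypothetical helper: it only encodes the 'one generation behind' rule,
    not any real product plan.
    """
    return [
        {"compute": newer, "base": older}
        for older, newer in zip(nodes, nodes[1:])
    ]


pairings = stack_pairings(NODES)
for pairing in pairings:
    print(f"compute on {pairing['compute']}, base layer on {pairing['base']}")
```

Adding a newer node to the end of the list would produce the next pairing automatically, which is all the cadence really says.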

So the first generation Lakefield is essentially a product that combines P1274 and 22FFL, and a future product is likely to be built on P1276 on the compute layer and P1274 for the interposer layer. Keeping this sort of cadence makes a lot of sense. However, Intel is going to have to learn from Lakefield in a number of ways, especially as we look at ways in which the heterogeneous layering concept can expand. I’ve split this into several areas that I feel are critical to where layered processors can really make a difference.

Growing a Stacked Die to Higher TDP and Core Count

I’ve combined these two points because they essentially go together. Implementing two simple silicon die together in a small form factor product, while interesting on the power side of the equation, doesn’t probe the question of scaling the product up. It’s easy enough to scale the product out by adding some form of connectivity to each stack and then connecting the stacks together (which is what’s happening in Ponte Vecchio), but at some point the stack itself has to move to a higher level of power consumption if it wants to scale upwards.

This means that thermals become a bigger issue, if they weren’t already. If we take the current Lakefield design, with one compute die over an active interposer, then with the right routing, moving to a physically larger floorplan and a higher power shouldn’t be too much of an issue – if anything, making the base die larger should help spread a lot of that IO about, making the interposer a functionally less active interposer. Alternatively, Intel could implement the next generation of its die-to-die stacking technology, where the top dies can be larger than the base dies, in a cantilevered fashion.

The bigger deal with the thermals is going to be on the top, with the stacked PoP memory. We go more into the memory communications aspect in a bit, but ideally that memory needs to be on the side so the compute die can have access to a proper heatspreader. The only reason it is stacked in Lakefield is because of the size constraints and attempting to get everything into that small form factor. For anything larger, there needs to be a memory controller that looks outside the chip, which is kind of what we’re expecting from Ponte Vecchio with HBM. A desktop-class product would likely be in the middle.

Growing a Stacked Die to More Stacks

The other angle for a stacked silicon product is to put more stacks in place. This again brings about the question on cooling between the stacks, depending on what is actually there. Lakefield is only two stacks right now, with one high-powered stack and one low-powered stack. Intel would have to prove that it could manage multiple high-powered stacks in order to expand compute in the vertical dimension, but that brings about its own problems.

To start, with Lakefield, the main power to the top compute die is provided with TSVs going through the active interposer layer. In a multi-die stack, each compute die would need its own set of TSVs in order to receive individual power. Unless the active interposer also acted as a PMIC, this could become difficult depending on what other TSVs or data paths need to be put in place between the layers.

Note that when we spoke with Intel’s Ramune Nagisetty at IEDM last year and asked whether Intel would ever disclose if a stacked product used ‘dummy’ layers to help with cooling, we were told that this was unlikely to be mentioned, with the focus being only on the layers that actually do any work. But ultimately there could be cause for dummy layers to aid in cooling, such that they provide mass and distance between the thermal hotspots of the compute dies involved. As the number of layers increases, however, something like Lakefield would have to move the PoP memory off the top, as already mentioned.

Memory Communications

One element to the Lakefield design we haven’t really covered here is how the memory communicates. In the current Lakefield design, the compute cores and the memory controllers are located on the compute die. In order for a portion of main memory to be read into the compute die, the communication has to travel down through the active interposer, go into the package, and then loop back up to the stacked memory.

In the following diagram, on the left, we have (1) going from Compute Die to DRAM, and (2) DRAM back to Compute Die.

This path is a lot longer than simply going from the compute die straight up into the memory, which is the theoretical arrangement on the right-hand side, if the two were bonded and had appropriate pathways.
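To make the difference concrete, here is a minimal sketch that models both routes as ordered lists of layers and counts the transitions between them. The layer names follow the description above; the helper names are hypothetical, and a hop count is only a stand-in for path length, not a measure of latency or power.

```python
# Route in the current design: down through the interposer, into the package,
# then back up to the memory stacked on top.
CURRENT_PATH = ["compute die", "active interposer", "package", "stacked DRAM"]

# Theoretical route if the memory were bonded directly onto the compute die.
BONDED_PATH = ["compute die", "stacked DRAM"]


def hops(path):
    """Number of layer-to-layer transitions along a one-way path."""
    return len(path) - 1


def round_trip_hops(path):
    """Request out plus data back, assuming the return trip mirrors the outbound one."""
    return 2 * hops(path)


for name, path in (("current", CURRENT_PATH), ("bonded", BONDED_PATH)):
    print(f"{name}: {hops(path)} hops one way, {round_trip_hops(path)} round trip")
```

In this toy model the current design needs three transitions each way where a bonded design would need one.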

If a future Lakefield product wants to continue down the memory-on-top route, one optimization could be to bond that top memory die in a Foveros-like fashion. One could argue that it means Intel would have to bond the memory on at the manufacturing stage, but this already happens with the current generation of Lakefield designs. The only downside would be getting the bonding pads on the top of the compute die and the bottom of the memory die to line up, and then managing the communications from there. The power for the memory would also have to come through on TSVs.

But if we’re bonding the memory into the stack, then technically it could go at any layer – there are likely benefits to keeping the compute die/dies on top. This could lead to multiple layers of memory as needed.

Power Management

With the current Lakefield design, both the compute die and the active interposer die have their own power management ICs (PMICs) in order to help deliver power. Based on Intel’s own diagrams, these PMIC designs take up more physical PCB space than Lakefield itself.

At some level, Intel is going to have to figure out how to create a unified PMIC solution to cover every layer of the product. It would likely reduce board space and make things a lot simpler, much as laptop processors with the CPU and GPU on the same die manage power for both through an onboard power controller. A PMIC that can scale with layer counts is obviously going to be a plus.

Cooling

Through all of this, as I’ve mentioned several times, cooling is going to be a major concern. There’s no easy way around the physics of dissipating 5-10 W in such a small space, or over 100 W if the product scales up into something in a form factor that has a wider appeal. Previously in the article I mentioned that we had discussed this with Intel, and how areas such as microfluidic channels have obviously had some research put into them, but nothing to the point where it could be done commercially and at scale. It’s a problem worth solving, because the benefits would be tremendous.
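For a rough sense of scale, the sketch below turns those wattages into an average power density over a square footprint. The 12 mm side length is an assumption used purely for illustration (it is not a figure from this section), the function name is hypothetical, and a real scaled-up product would almost certainly have a larger footprint, so the 100 W row is a worst case rather than a prediction.

```python
def power_density_w_per_mm2(power_w, side_mm):
    """Average power density over a square footprint, in W/mm^2."""
    if side_mm <= 0:
        raise ValueError("side_mm must be positive")
    return power_w / (side_mm * side_mm)


# Assumption: a 12 mm x 12 mm footprint, held constant across all three cases.
FOOTPRINT_SIDE_MM = 12.0

for watts in (5, 10, 100):
    density = power_density_w_per_mm2(watts, FOOTPRINT_SIDE_MM)
    print(f"{watts:>3} W over {FOOTPRINT_SIDE_MM:.0f} mm x {FOOTPRINT_SIDE_MM:.0f} mm: "
          f"{density:.3f} W/mm^2")
```

Average density also hides hotspots: the heat in a stacked part is concentrated under the active cores, which is why the placement of the memory and the heatspreader matters so much.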

Beyond Windows and Enabling 5G

One thing to note is that Intel's Lakefield is only planned with Windows 10 support right now. Linux is currently not in the plan for this product, but it would have to be if Intel wants wider adoption of the technology.

Not only this, but as most people are comparing these devices to Qualcomm's hardware, appropriate 5G support will need to be added - the current generation Lakefield is not part of Intel and MediaTek's collaboration on 5G, which only applies to Tiger Lake and beyond. Lakefield customers will have to rely on 4G as an optional extra, or 5G through an external modem.

The Future Of Lakefield

Even if this first generation version of Lakefield gets slammed pretty hard in performance-focused benchmark reviews for being slower than a dual-core Whiskey Lake, Lakefield marks some very big steps for Intel. Hybrid CPU designs, and stacked die-to-die connectivity, are going to feature in Intel’s future roadmaps – at what points will depend on how much Intel is willing to experiment but also how well Intel can execute. There have been discussions on Intel perhaps looking at an 8+8 hybrid CPU design in the future, although nothing we can substantiate, but we do know that Ponte Vecchio with stacked die is coming in late 2021.

One of the key ingredients in all of this is going to be at what points Intel’s technology portfolio is going to intersect its product portfolio. Some of these technologies might be better suited to areas such as 5G networking, or automotive, rather than something we can consume on the desktop. As far as Lakefield goes, this first generation is going to be a rough challenge for Intel – they are pitching a low performance product in a high-cost segment based on technology (and to a certain extent, battery life). Die-to-die stacking will get easier to do as scale ramps, and hopefully new process node technologies will drive the power consumption of those big cores lower to enable 2+4 or bigger designs when in a stacked form factor.

We eagerly await a chance to test 1st Gen Lakefield, but we’re also keeping an eye on what might be in the second and third generations.


221 Comments


  • Quantumz0d - Sunday, July 5, 2020 - link

    PC gaming marketcap is supposed to be at $40Bn by 2022, total gaming market is $120Bn including everything, and Consoles are built on AMD x86 technology and now DX12U and you think that is a niche ?

    ARM is not going to do anything just because Apple did, there are so many trials by so many companies and the best company which is known for it's ROI with R&D, Qualcomm abandoned all of it's Server ARM marketshare dreams with the death of their full custom Centriq. x86 runs blazingly fast and optimized with Linux which is what the world is powered just because ARM is good in thin and light garbage doesn't make it a superstar.

    ARM is not going to get into Desktop at all, no one is going to write their programs again to support that HW, and no company is going to invest in DIY market before Server/DC market. Supercomputer market is not the DIY or Enterprise, look at the Top Supercomputers, Chinese Tianhe and 2 positions are with Chinese only, AMD CRAY Zen based IF supercomputer is about to come as well.
  • Wilco1 - Sunday, July 5, 2020 - link

    The #1 supercomputer is Arm, and Arm servers beat x86 servers on performance, cost and power, so not a single "fact" in your post is correct.
  • lmcd - Sunday, July 5, 2020 - link

    That first statement is hilariously disconnected from the second. Fugaku at 3x the cost per flop of its next competitor hardly backs up your assertion.

    ARM servers might beat x86 servers on performance, cost, and power but it's not looking that good vs x86_64. The latter arch is commodity hardware, software, and talent hiring.
  • Wilco1 - Monday, July 6, 2020 - link

    Just looking at the peak FLOPS in comparisons is deceiving. Fugaku is a very different design as it does not use GPU accelerators like most supercomputers. That means it is far better than the rest in terms of ease of programming and efficiency. So even if the upfront cost is higher, they expect to get far more out of it than other super computers.

    I'd say Arm servers are doing really well in 2020, clearly companies want a change from the x86 duopoly. Much of the talent is at companies that do Arm designs. How else do you think Arm CPUs are getting 20-30% faster per year, and mobile phones already outperform the fastest x86 desktops?
  • Quantumz0d - Tuesday, July 7, 2020 - link

    No company wants to develop an in house IP, that R&D and ROI is not easy, Amazon did it because to chop off some costs and set up a plan for the low end AWS instances with Graviton 2, Altera is still yet to show, Centriq abandoned by Qcomm with so much of marketing done around Cloudflare and top class engineering work, the team which made 820's full custom core.

    AND What the fuck you are babbling on fastest x86 desktops (Like Threadripper 3990X, or 3950X, 10900K) outperformed by mobile phones ? Ooof, you are gulping down the AT's SPEC scores aren't you ?

    ARM servers LMAO, like how AMD upped their DC marketshare with EPYC7742, dude stop posting absolute rubbish. ARM marketshare in data centers is in 0.5% area where IBM also resides.
  • Quantumz0d - Monday, July 6, 2020 - link

    Tiahu is fucking Chinese Sunway Processor based Supercomputer and it's top #3 so what did they do ? jack off to Zen with Hygon or did they make all Chinese use Chinese made processors ? Stop that bullshit of Supercomputer nonsense, IBM has been there since ages and they had SMT8 with Power9 uarch which came in 2017 (Summit which is #2, it was first since 2018) what did they do ? x86 is consumer based and DC market is relying only on that. ARM DC market-share is less than fucking 2%, AMD is at 4.5%, Intel is at 95% that is 2019 Q4.

    I don't know why people hate x86 as if it's like their life is being threatened by them, the fact that x86 machines are able to run vast diverse rich software selection and more freedom based computing, people want ARM based proprietary dogshit more, Apple series trash wich their APIs or the Bootloader locked (much worse like chastity) or Unlocked Android phones, even with GNU GPL v2 and Qcomm's top OSS CAF the godddamned phones do not get latest updates or anything but a Core2Quad from decade ago can run a fucking Linux or Win7 / Win10 without any bullshit issue.

    Wait for the SPEC A series iPhone 12 benchmarks and then you be more proud of that garbage device which cannot compute anything outside what Apple deems it.
  • Wilco1 - Friday, July 3, 2020 - link

    It would be good to run benchmarks on the 2 variants of Galaxy Book S. One comparison I found:

    https://browser.geekbench.com/v5/cpu/compare/25848...

    So Lakefield wins by only 21% on single-threaded (that's a bad result given it is Cortex-A76 vs IceLake at similar clocks), and is totally outclassed on multithreaded...
  • lmcd - Sunday, July 5, 2020 - link

    Current scheduler doesn't even guarantee that's the Sunny Cove core.
  • Wilco1 - Monday, July 6, 2020 - link

    Given Tremont can't get anywhere near Cortex-A76 performance, we can be sure single-threaded result is the Sunny Cove core.
  • PaulHoule - Friday, July 3, 2020 - link

    This is an example of the "Innovator's Dilemma" scenario where it is harder to move upmarket (in terms of performance) than downmarket.

    Put a phone processor into a box with a fan and people will be blown away by how fast it is -- they've never seen an ARM processor cooled by a fan before.

    Put a desktop processor into a thin tablet with little thermal headroom and people will be blown away by how slow it is.

    So first it is a situation that Intel can't win, but second it is a disaster that this low performance (downmarket) chip is expensive to produce and has to be sold upmarket. Sure you can stick any number of dies together and "scale up" a package in a way that looks as if you scaled up the chip by reducing the feature size, but when you reduce the feature size the cost per feature goes down in the long term -- when you stick a bunch of cheap chips together you get an expensive chip.
