The Future of Lakefield

Lakefield as a product is a lateral move for Intel. The company is taking some of its new and popular IP and placing it into a novel form factor that has required a significant amount of R&D from a manufacturing and construction perspective. The goal of Lakefield was to meet particular customer requirements, which we understand to be around battery life, performance, and multi-screen support. According to Intel, those goals have been met, and the company will be producing future generations of Lakefield products.

In particular, Intel has produced this slide at a couple of conferences.

This slide essentially states that the Lakefield product in the yellow box has two silicon dies: one optimized for compute on Intel’s P1274 process (10+ nm), and the Foveros layer (the active interposer) on Intel’s 22FFL process.

The next product with heterogeneous manufacturing integration will be Intel’s big Xe-HPC product, Ponte Vecchio, which will use Intel’s P1276 process (7nm) as a compute die and Intel’s P1274 (10+) process as a base interposer layer.

Beyond this, Intel looks to continue with its multi-layered products by having the compute layer on the most advanced process node, with the interposer layer one generation behind, on a ‘Foveros’ optimized variant.

So the first generation Lakefield is essentially a product that combines P1274 and 22FFL, and a future product is likely to be built with P1276 for the compute layer and P1274 for the interposer layer. Keeping this sort of cadence makes a lot of sense. However, Intel is going to have to learn from Lakefield in a number of ways, especially as we look at ways in which the heterogeneous layering concept can expand. I’ve split this into several areas that I feel are critical to where layered processors can really make a difference.

Growing a Stacked Die to Higher TDP and Core Count

I’ve combined these two points because they essentially go together. Implementing two simple silicon dies together in a small form factor product, while interesting on the power side of the equation, doesn’t probe the question of scaling the product up. It’s easy enough to scale the product out by adding some form of connectivity to each stack and then connecting the stacks together (which is what’s happening in Ponte Vecchio), but at some point the stack itself has to move to a higher level of power consumption if it wants to move upwards in performance.

This means that thermals become a bigger issue, if they weren’t already. If we take the current Lakefield design, with one compute die over an active interposer, then with the right routing, moving to a physically larger floorplan and a higher power shouldn’t be too much of an issue. If anything, making the base die larger should help spread a lot of that IO about, making the interposer functionally less active. Alternatively, Intel could implement the next generation of its die-to-die stacking technology, where the top dies can be larger than the base dies, in a cantilevered fashion.

The bigger deal with the thermals is going to be on the top, with the stacked PoP memory. We go more into the memory communications aspect in a bit, but ideally that memory needs to be on the side so the compute die can have access to a proper heatspreader. The only reason it is stacked in Lakefield is the size constraint, and the attempt to get everything into that small form factor. For anything larger, there needs to be a memory controller that looks outside the chip, which is kind of what we’re expecting from Ponte Vecchio with HBM. A desktop-class product would likely be in the middle.

Growing a Stacked Die to More Stacks

The other angle for a stacked silicon product is to put more stacks in place. This again raises the question of cooling between the stacks, depending on what is actually there. Lakefield is only two stacks right now, with one high-powered stack and one low-powered stack. Intel would have to prove that it could manage multiple high-powered stacks in order to expand compute in the vertical dimension, but that brings about its own problems.

To start, with Lakefield, the main power to the top compute die is provided with TSVs going through the active interposer layer. For each compute die in a multi-die stack, there would have to be TSVs for each one in order to provide individual power. Unless the active interposer also acted as a PMIC, this could become difficult depending on what other TSVs or data paths need to be put in place between the layers.
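As a rough way to see why this gets difficult, the sketch below counts how many sets of power TSVs each layer would have to host if every compute die in a taller stack were fed individually from below. The model (one set of power TSVs per compute die, passing through every layer beneath it) and the function name are assumptions made for illustration, not anything Intel has described.

```python
def power_tsv_sets_per_layer(num_compute_dies):
    """Return, from the base layer upward, how many sets of power TSVs
    must pass through each layer.

    ASSUMED toy model: layer 0 is the active interposer, layers 1..N are
    compute dies, and each compute die gets its own set of power TSVs
    running through every layer beneath it.
    """
    if num_compute_dies < 1:
        raise ValueError("need at least one compute die")
    total_layers = num_compute_dies + 1  # interposer plus compute dies
    # A layer must carry one set of power TSVs for every die stacked above it.
    return [total_layers - 1 - layer for layer in range(total_layers)]


# Lakefield-style stack (one compute die) versus a hypothetical taller stack.
for dies in (1, 2, 4):
    print(dies, "compute die(s):", power_tsv_sets_per_layer(dies))
```

In this simple model the interposer's burden grows linearly with the number of compute dies, which is the area that would otherwise be available for data paths between the layers.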

Note that when we spoke with Intel’s Ramune Nagisetty at IEDM last year and asked whether a stacked product would use ‘dummy’ layers to help with cooling, we were told that this was unlikely to be discussed, with Intel focusing only on the layers that actually do any work. Ultimately, though, there could be cause for dummy layers to aid in cooling, as they can provide mass and distance between the thermal hotspots of the compute dies involved. As the number of layers increases, however, something like Lakefield would have to move the PoP memory off the top, as already mentioned.

Memory Communications

One element to the Lakefield design we haven’t really covered here is how the memory communicates. In the current Lakefield design, the compute cores and the memory controllers are located on the compute die. In order for a portion of main memory to be read into the compute die, the communication has to travel down through the active interposer, go into the package, and then loop back up to the stacked memory.

In the following diagram, on the left, we have (1) going from Compute Die to DRAM, and (2) DRAM back to Compute Die.

This path is a lot longer than simply going from the compute die straight up into the memory, which is the theoretical option on the right-hand side, if the two were bonded and had appropriate pathways.
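To make the difference concrete, here is a minimal sketch that lists the hops on each route as described in the text and counts them. The hop names come from the description above; treating every hop as equal is a simplifying assumption, so the counts indicate path length only, not latency or power.

```python
# Route in the current design: down through the interposer, into the
# package, then back up to the memory stacked on top.
CURRENT_PATH = ["compute die", "active interposer", "package", "stacked DRAM"]

# Hypothetical route if the memory were bonded directly to the compute die.
BONDED_PATH = ["compute die", "stacked DRAM"]


def hops(path):
    """Number of transitions between adjacent elements on a one-way path."""
    return len(path) - 1


def round_trip_hops(path):
    """Hops for a request (compute die to DRAM) plus the reply back."""
    return 2 * hops(path)


for name, path in (("current", CURRENT_PATH), ("bonded", BONDED_PATH)):
    print(f"{name}: {hops(path)} hop(s) one way, "
          f"{round_trip_hops(path)} for the round trip")
```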

If a future Lakefield product wants to continue down the memory-on-top route, one optimization could be to bond that top memory die in a Foveros-like fashion. One could argue that it means Intel would have to bond the memory on at the manufacturing stage, but this already happens with the current generation of Lakefield designs. The only downside would be getting the bonding pads on the top of the compute die and the bottom of the memory die to line up, and then managing the communications from there. The power for the memory would also have to come through on TSVs.

But if we’re bonding the memory into the stack, then technically it could go at any layer, although there are likely benefits to keeping the compute die or dies on top. This could lead to multiple layers of memory as needed.

Power Management

With the current Lakefield design, both the compute die and the active interposer die have their own power management ICs (PMICs) in order to help deliver power. Based on Intel’s own diagrams, these PMIC designs take up more physical PCB space than Lakefield itself.

At some level, Intel is going to have to figure out how to create a unified PMIC solution to cover every layer of the product. This would likely reduce board space and make things a lot simpler, as it does in laptops that can manage power to the CPU and GPU on the same die with an onboard power controller. A PMIC that can scale with layer counts is obviously going to be a plus.

Cooling

Through all of this, as I’ve mentioned several times, cooling is going to be a major concern. There’s no easy way around the physics of dissipating 5-10 W in such a small space, or over 100 W if the product scales up into a form factor that has wider appeal. Earlier in the article I mentioned that we had discussed this with Intel, and that areas such as microfluidic channels have obviously had some research put into them, but nothing to the point where it could be done commercially and at scale. It’s a problem worth solving, because the benefits would be tremendous.
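For a sense of scale, the following sketch works out average power density for the power levels mentioned above. The 12 mm x 12 mm footprint is an assumed, illustrative figure rather than a number taken from this page, and a real stack has hotspots that an average does not capture.

```python
def power_density(power_w, area_mm2):
    """Average power density in W/mm^2 for a given power and footprint."""
    if area_mm2 <= 0:
        raise ValueError("area must be positive")
    return power_w / area_mm2


def area_for_same_density(power_w, ref_power_w, ref_area_mm2):
    """Footprint (mm^2) needed to dissipate power_w at the same average
    density as a reference design."""
    return power_w / power_density(ref_power_w, ref_area_mm2)


# ASSUMPTION: illustrative 12 mm x 12 mm footprint, not a figure from the text.
ASSUMED_AREA_MM2 = 12 * 12

for watts in (5, 10, 100):
    density = power_density(watts, ASSUMED_AREA_MM2)
    print(f"{watts:>3} W over {ASSUMED_AREA_MM2} mm^2: {density:.3f} W/mm^2")

scaled = area_for_same_density(100, 10, ASSUMED_AREA_MM2)
print(f"Area for 100 W at the 10 W density: {scaled:.0f} mm^2")
```

Under that assumption, holding the 10 W density constant at 100 W would take ten times the footprint, which is why scaling the power up without changing the cooling approach is not straightforward.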

Beyond Windows and Enabling 5G

One thing to note is that Intel's Lakefield is only planned with Windows 10 support right now. Linux is currently not in the plan for this product, but it would have to be if Intel wants wider adoption of the technology.

Not only this, but as most people are comparing these devices to Qualcomm's hardware, appropriate 5G support will be needed. The current generation of Lakefield is not part of Intel and MediaTek's collaboration on 5G, which only applies to Tiger Lake and beyond. Lakefield customers will have to rely on 4G as an optional extra, or 5G through an external modem.

The Future Of Lakefield

Even if this first generation version of Lakefield gets slammed pretty hard in performance-focused benchmark reviews for being slower than a dual-core Whiskey Lake, Lakefield marks some very big steps for Intel. Hybrid CPU designs, and stacked die-to-die connectivity, are going to feature in Intel’s future roadmaps – at what points will depend on how much Intel is willing to experiment but also how well Intel can execute. There have been discussions on Intel perhaps looking at an 8+8 hybrid CPU design in the future, although nothing we can substantiate, but we do know that Ponte Vecchio with stacked die is coming in late 2021.

One of the key ingredients in all of this is going to be the points at which Intel’s technology portfolio intersects its product portfolio. Some of these technologies might prove better suited to areas such as 5G networking or automotive, rather than something we can consume on the desktop. As far as Lakefield goes, this first generation is going to be a rough challenge for Intel: they are pitching a low-performance product in a high-cost segment based on technology (and, to a certain extent, battery life). Die-to-die stacking will get easier to do as scale ramps, and hopefully new process node technologies will drive the power consumption of those big cores lower to enable 2+4 or bigger designs in a stacked form factor.

We eagerly await a chance to test 1st Gen Lakefield, but we’re also keeping an eye on what might be in the second and third generations.


221 Comments


  • Alistair - Thursday, July 2, 2020 - link

    The only point of this is if it is dirt cheap. You don't pay a premium for 1 core and 4 atom cores. DOA. I'll take a 7nm 4 core Zen 2 laptop instead thanks.

    Make this a $50 CPU? Then I'm interested.
  • lmcd - Thursday, July 2, 2020 - link

    I don't think it'll be $50 but if it is, I hope to see it on single-board computers. That'd be slick.
  • serendip - Thursday, July 2, 2020 - link

    It's meant for $1000 computers, not cheap sub-$500 devices. I wouldn't pay that much money for 4 Atom cores and a big core that sits around like an unwanted appendage.
  • lmcd - Friday, July 3, 2020 - link

    It's meant for $1000 tablets and ultralights, not traditional computer form factors.

    Glad you won't buy any smartphone then!
  • yeeeeman - Friday, July 3, 2020 - link

    Qualcomm is fighting in the same space with 8cx for the same huge prices and that one doesn't even run x64 apps. This market wants very light laptops with very good battery life and LKF does just that, wherever you like it or not.
  • Spunjji - Monday, July 6, 2020 - link

    We'll see what the market actually wants when this launches. I have a strong suspicion that the market doesn't want the absolutely miserable performance/$ on offer here, even for the quoted battery life benefits, but I've been wrong before.
  • justing6 - Thursday, July 2, 2020 - link

    Amazing article! I learned so much about Lakefield and 3D stacking in general. The technology and engineering is really incredible, but as a consumer product it looks extremely lacking.

    I'm a proud owner and heavy user of a Surface Pro X, and the 8cx/SQ1 is generally "good enough" when running ARM workloads. Going to a 15W Intel chip that can turbo to 25-40W feels noticeably snappier, but when considering the SQ1 is 7W-15W it's really impressive. The 4+4 also allows for very good multitasking performance, it takes 10GB+ of heavy web browser tabs running on an external 3440x1440p display before it really feels like it starts to slow down.

    However, that's when I live inside Chromium Edge running native ARM64 code. Performance is still laughable compared to Apple Silicon, especially for translated x86 code. On Geekbench the A12z on the dev kits running translated x86 code is just as fast as the 8cx running native ARM code, while the 8cx's performance really suffers when it has to run anything more complicated than a text editor or video player written in x86. I expect Apple's successor to the A12z to mop the floor with this whole market at the same price points, even for x86 code. On top of this, Apple has unparalleled leverage over developers by controlling its entire hardware stack. I wouldn't be surprised if in 2 years, all major MacOS applications will be compiled for native ARM64 code. On the other hand, Windows on ARM64 is almost 2 years old now and has very few natively compiled apps.

    I really prefer Windows, but it's going to be a hard choice for me and a lot of consumers if a Lakefield/8cx ultraportable running Windows costs the same as an Apple ultraportable on Silicon that has somewhere around double the performance for the same price, while still keeping a thin and light design with great battery life. Intel and Qualcomm will be fighting for a distant second place.
  • jeremyshaw - Thursday, July 2, 2020 - link

    So you're saying a desktop with desktop TDP outperforms a fanless tablet with tablet TDP?

    I do agree Apple has a stronger push (and will have to, since they are moving their entire ecosystem over, and anyone that isn't fully onboard will simply be left behind).

    Qualcomm got too greedy, Nvidia doesn't want to fight in that market anymore (remember the original Surface tablets with Nvidia Tegra chips?), and nobody else is really eyeing the laptop/consumer segment outside of Apple.

    Oh, well. Some people are propping up the PINE64 as if it's worth anything.
  • lmcd - Thursday, July 2, 2020 - link

    No one's come up with an exciting killer app beyond video decoding for smartphones or tablets so might as well "prop up" the PINE64 :)

    Hopefully Broadcom will get interested in SoCs again with the work they're doing with the RPi foundation. ARM is going toward powerful CPU cores anyway so it shouldn't take an Apple-sized company to come up with competitive ARM designs.
  • justing6 - Thursday, July 2, 2020 - link

    Considering an iPad Pro (a fanless tablet) running an A12z puts up Geekbench 5 scores 30% to 60% higher that the SQ1/8cx single/multi core respectively when running native ARM code, it's safe to say its a generation or two ahead of anything Qualcomm has. I also doubt they changed much with the TDP of the chip in the Arm transition Dev kit, if anything they made it less powerful by disabling the 4 small cores and leaving it only with the 4 large cores to give them more time to work out the big.LITTLE scheduling in MacOS. A 30% hit to performance when running x86 code sounds about right, its just that the chip has so much more raw power than the 8cx that it will be able to give users a much better experience.

    I'm not an Apple fan by any means, but I am a fan of innovation. Apple has been pretty stagnant on that front the past decade, but with the move to ARM they have a chance to really get ahead of the market like the Apple of the 2000s.
