SoC Tile, Part 1: Low-Power Island E-Cores, Designed for Ultimate Efficiency

Diving a little deeper into the SoC tile within Intel's Meteor Lake architecture, Intel hasn't just opted for a minor change but has made a significant leap forward, especially regarding I/O fabric scalability. The SoC tile itself isn't built on Intel 4 like the compute tile but is made by TSMC on their N6 node. Intel has ditched the old limitations of mesh routing by implementing a Network-On-Chip (NOC) on the silicon. This isn't just about making data lanes faster; it's about outlining smarter and more power-efficient access to memory. Likely an innovation from Intel's acquisition of NetSpeed back in 2018, which specialized in NoC and Fabric IP for SoCs, Intel opting for a physical NOC allows Intel to reduce the limits on bandwidth. Using EMIB and the nature of 2D scaling, the data paths are a lot shorter, translating into less power loss, but shorter wires also help reduce overall latency penalties.

Switching gears to low-power workload efficiency, Meteor Lake incorporates E-cores directly into the SoC tile, which Intel calls Low Power Island (LP) E-cores. Think of it as Intel's way of saying, "Why use a sledgehammer when a scalpel will do?". This means that the LP E-Cores are driven purely from a power-saving perspective. Having these LP E-cores available for workloads with the aid of Intel's Thread Director means lighter threads and background tasks that don't require the grunt of the P and E-cores on the compute tile can be directed onto the lower-powered LP E-cores.

While both the E and LP E-cores are based on the same Crestmont microarchitecture, the E-cores on the compute tile are built on Intel 4, along with the P-cores. The LP E-cores are made on TSMC's N6 node, like the rest of the SoC tile. These low-power island E-cores are tuned for finer-grained voltage control through an integrated Digital Linear Voltage Regulator (DLVR), and they also have a lower voltage-to-frequency (V/F) curve than the big E-cores on the compute tile, meaning they can operate with a lower power cost, thus saving power when transitioning low-intensity workloads off of the compute tile and onto the LP E-cores.

Part of the disaggregated nature of Foveros combined with individual power management controllers (PMC) within each tile means that IP blocks can be independently powered on or off when required.

SoC Tile: Bandwidth Scalability, Can't Stop The NOC

So adding a variety of tiles requires a highly competent pathing to ensure bandwidth is adequately structured. As I/O bandwidth bottlenecking was a major concern in previous iterations, Intel aims to solve bandwidth flow restrictions with a couple of solutions.

The first is through the scalability of the I/O bandwidth, which Intel does by adding what it calls 'Scalable Fabric,' which is configured for up to 128 GB/s of bandwidth throughput. All of the I/O ordering and address translation is fed through the IOC, while Intel has implanted a Network-on-Chip (NOC) to interconnect many of the different areas of the SoC.

The Network-on-Chip or NOC is designed to be coherent, and for Meteor Lake on Intel 4, this uses unordered processing, which moves data in an unordered fashion. Connecting all the tiles together through the NOC and independently through the IOC gives plenty of bandwidth headroom for devices or agents requiring it. The NOC is directly connected to the compute and graphics tile, while other elements, including the traffic fed directly through the LP E-cores on the SoC tile, media, display, the NPU, and the imaging processing unit (IPU), all going through the NOC. In terms of the connection to the I/O tile, this is connected directly to the IO fabric and then is fed through the IOC, which then goes directly to the NOC.

The SoC tile also integrates Wi-Fi 6E and can be made to support the latest Wi-Fi 7 standard. Having a future-proof method of including Wi-Fi 7 and Bluetooth 5.4 can add the next level of wireless connectivity to Intel's mobile platform. Wi-Fi 7 offers 320 MHz of bandwidth, doubling the channel width compared to its predecessor, Wi-Fi 6. It also uses 4096-QAM (4K QAM) to enable transmission speeds capable of hitting 5.8 Gbps.

We're still waiting for clarification on what this actually means. Whether it's supportive of Wi-Fi 7 or if there's some underlying compatibility within the Wi-Fi MAC integrated into the silicon. One option could be that Intel is adding a full external controller into the silicon to get to Wi-Fi 7 instead of CNVio splitting up parts of the radio stack. We have asked Intel for more details and will update you when we have a response.

That being said, Intel discloses 'support' for Wi-Fi 7 and BT 5.4, but there's a chance Intel could differentiate which wireless MAC is implemented into different chip segments. An example would be an Intel 9 Ultra SKU, bolstered by Wi-Fi 7 support, whereas a lower-end SKU like a Core 3 might utilize Wi-Fi 6E to save on cost.

Additional features include Multi-Resource Unit (RU) Puncturing and Military-grade security with WPA3 that supports GCMP-256 encryption to ensure both speed and security when connected to a wireless network. Unique features like Multi-Link Operation (MLO) in Wi-Fi 7 are designed to reduce latency and jitter by up to 60%, making it a competent solution for various user's connectivity needs. Adding Bluetooth 5.4 further complements this by improving audio quality, and it is claimed to offer up to 50% lower power consumption for longer battery life.

Also present on the SoC tile is the display controller and the media engine from the GPU. These are always-on (or at least, mostly-on) blocks that do not need to be built on a cutting-edge process node, making them good candidates to place on the SoC tile. Meteor Lake offers support for 8K HDR and AV1 video playback and contains native HDMI 2.1 and DisplayPort 2.1 connectivity.

Finally, the SoC tile also includes other key platform components, such as PCIe lanes, which are integral for connectivity to external devices such as discrete graphics cards and the platform's I/O capabilities, such as USB 4 and 3, as well as offering a direct interconnect to a separate I/O tile with Thunderbolt 4 and additional PCIe lanes. While we've touched on wireless connectivity, the SoC tiler also includes Ethernet support, although Intel hasn't disclosed yet which PHY will be included; it is likely to be capable of 2.5 GbE at the minimum.

A Note on Meteor Lake's Security Features: New Silicon Security Engine (ISSE)

Security has also been given closer attention in Meteor Lake. The architecture introduces the Intel Silicon Security Engine (ISSE), a dedicated component focused solely on securing things at a silicon level. Various vulnerabilities have been at the forefront of media over the last few years, including Meltdown, Spectre, and Foreshadow.

With real threats around the world, securing data is ever prevalent, and CPU architects and designers not only need to consider performance and efficiency, but security and doing some architecturally is just as important as a competent software stack. The Converged Security and Manageability Engine (CSME) has also been partitioned to further enhance platform security. These features collectively work to give a wide range of on-chip and off-chip securities designed to mitigate attacks on multiple fronts.

Compute Tile: New P and E-Cores on Intel 4 SoC Tile, Part 2: Neural Processing Unit (NPU) Adds AI Inferencing on Chip
Comments Locked

107 Comments

View All Comments

  • FWhitTrampoline - Wednesday, September 20, 2023 - link

    I'm more focused the on eGPU usage for OCuLink so I'm not stating that TB4/USB4 connectivity does not have its usage model for your use case. But pure PCIe is lowest latency for eGPU usage and can be easily adopted by more OEMs than just GPD for their handhelds as that OCuLink will work with any makers' GPUs as long as one is using an OCuLink capable eGPU adapter or enclosure.

    And ETA Prime has extensively tested OCuLink adapters with plenty of Mini Desktop PCs and even the Steam Deck(M.2 slot is only PCIe 3.0 capable). It's the 64Gbs on any PCIe 4.0/x4 connection(M.2/NVMe or other) that's what good for eGPUs via OCuLink relative to the current bandwidth of TB4/USB4 40Gbs.
  • Exotica - Wednesday, September 20, 2023 - link

    I’ve seen those videos and the performance advantages for EGPUs. But most of the EGPUs in the market use alpine ridge. A chipset known to reserve bandwidth for DP and have less available for PCIe (22 Gbps). Perhaps there may be one or two based on Titan ridge with slightly more pcie bandwidth. It’s hard to say how barlow ridge will perform in terms of the amount of pcie bandwidth made available to peripherals. But a 64 Gbps pcie connection will not saturate the 80 Gbps link so hopefully we can have most of the available 64 Gbps pcie bandwidth. Another problem with occulink is that there’s no power delivery so you need to have a separate wire for power.

    So Barlow ridge TB5 has the potential to be a one cable solution, power upto 240W, pcie up to 64 Gbps, and it will also tunnel DisplayPort. Occulink is cool. But thunderbolt tunnels more capabilities over the wire.
  • FWhitTrampoline - Wednesday, September 20, 2023 - link

    OCuLink is lower latency as was stated in the earlier posts! And TB4/TB# or USB4/USB# will not be able to beat Pure PCIe connectivity for low latency and latency is the bigger factor for gaming workloads. TB tunneling protocol encapsulation of PCIe/Any other Protocol will add latency the result of having to do the extra encoding/encapsulation and decoding/de-encapsulation steps there and back whereas OCuLink is just unadulterated PCIe passed over an external cable.

    More Device makers need to be adding OCuLink capability to their systems as that's simple to do and requires no TB#/USB4-V# controller chip to be hung off of MB PCI lanes as the OCuLink port is just passing PCIe signals outside of the device. And TB5/USB4-V2 is more than 64Gbs but that will require more PCIe lanes be attached to the respective TB5/USB4-V2 controller and use more overheard to do that whereas if one has the same numbers of PCIe lanes connected via OCuLink then that's always going to be lower overhead with more available/usable bandwidth and lower latency for OCULink.

    Most likely the PCIe lane counts will remain at 4 lanes Max and that will just go from PCIe 4.0 to PCIe 5.0 instead to support TB5 and USB4-V2 bandwidth but whatever PCIe standard utilized OCuLink will always have lower overhead and lower latency than TB/Whatever or USB4/Whatever as with OCuLink that's skipping the extra tunneling protocol steps required.

    Plus by extension and with any OCuLink Ports being pure PCIe Protocol Based, that opens up the possibility of OCuLink to TB/USB/Whatever Adapters being utilized for maximum flexibility for other use cases as well.
  • Exotica - Wednesday, September 20, 2023 - link

    OCulink has merit for sure, but again, it is clunky. Unlike thunderbolt, it doesn't tunnel displayport or provide power delivery. It also doesn't support hotplugging. That is why it will most likely remain a niche offering. Also you're saying OcCulink is lower latency, but by how much? Where is the test data to prove that ?

    And does it really matter? Operating systems can be run directly off of thunderbolt NVME storage, the latency is low enough for a smooth experience. And even if OcCulink is technically faster, a GPU such as a 4080 or 4090 or 7900XTX in a PCIe4x4 or even PCIe5x4 eGPU thunderbolt 5 enclosure will be much faster than the iGPU or even internal graphics. And if the eGPU enclosure is thunderbolt enabled, it can power the laptop or host device and probably act as a dock and provide additional downstream thunderbolt ports and possibly USB as well. Thunderbolt provides flexibility that OcCulink does not. Both standards have merit.

    But I have a feeling Thunderbolt 5, if implemented properly in terms of bug-free firmware NVMs from Intel, will gain mass market appeal. The mass market is hungry for the additional bandwidth. AsMedia will probably do extremely well as well with its USB4 and upcoming USB4v2 offerings.
  • TheinsanegamerN - Thursday, September 21, 2023 - link

    Dont waste your time, Trampoline is an OCUlink shill who will ignore any criticism for his beloved zuckertech. The idea that most people dont want to disassemble a laptop to use a dock is totally alien to him.
  • FWhitTrampoline - Thursday, September 21, 2023 - link

    LOL, OCuLink's creator PCI-SIG is a not for profit Standards Organization that's responsible for the PCIe standards so it's not like they are any Business Interest with a Fiduciary responsibility to any investors.

    OCuLink is just a Port/electrical PCIe extension cabling standard that was in fact originally intended to be used in consumer products but Intel, a member of PCI-SIG along with other industry members, had a vested interest in that Intel/Apple co-developed Thunderbolt IP, because of TB controllers and sales of TB controllers related interests.

    And TB4/Later and USB4/Later will never have as low latency owing to the fact that any PCIe signalling will have to be intercepted and encapsulated by the TB/USB/Whatever protocol controller in order to be sent down the TB cabling whereas over the OCuLink ports/cabling that's just the PCIe signalling/packets there and no extra delays there related to any extra tunneling protocol encoding/encapsulation and decoding/de-encapsulating steps required.

    So OCuLink represents the maximum flexibility as that's the better lowest latency solution for eGPUs being just pure unadulterated PCIe signaling. And because it's just PCIe that opens up the possibility of all sorts of external adapters that take in PCIe and can convert that to Display Port/HDMI/USB/TB/Whatever the end users need because all Motherboard external I/O, for the most part, is in the from of PCIe and OCulink just brings that PCIe directly out of devices via Ports/External cables.

    And to be so dogmitacilly opposed to OCulink is the same as being opposed to PCIe! And does any rational person think that that's logical! OCuLink is External PCie and that's all there is to that and it's the lowest latency method to interface with GPUs via any PCIe Slot or externally via an OCuLink connection(PCIe is PCIe).

    Give me a Laptop with at least One OCuLink PCIe X4/4.0 port and with that I can interface to an eGPU at 64Gbs bandwidth/lowest latency possible! And there can and will be adapters that can be plugged into that One OCulink port that can do what any other ports on the laptop can do because those ports are all just connected to some MB PCIe lanes in the first place.
  • Kevin G - Wednesday, September 20, 2023 - link

    The main advantage of the TB4 is that the form factor is USB-C which can be configured for various other IO. This is highly desirable in a portable form factor like laptops or tablets. Performance is 'good enough' for external GPU usage. OCuLink maybe faster but doesn't have the flexibility like TB4 over the USB-C connector does. OCuLink has its niche but a mainstream consumer IO solution is not one of them.
  • FWhitTrampoline - Thursday, September 21, 2023 - link

    OCuLink is just externally routed PCIe lanes and really there can be one OCuLink port on every laptop specifically for the best and lowest latency eGPU interfacing and even OCuLink to HDMI/Display Port/whatever adapters that can make the OCuLink port into any other port at the end users discretion. So for eGPUs/Enclosures that have OCuLink ports that's 64Gbs/Lowest latency there and for any Legacy TB4/USB only external eGPU devices just get an OCuLink to TB4/USB4 adapter in the interim and live with the lower bandwidth and higher latency.

    GPD already has a line of Handheld Gaming devices that utilize a dedicated OCuLink port and a portable eGPU that supports both OCUlink interfacing and TB4/USB4 interfacing. And I do hope that GPD Branches out into the regular laptop market as GPD's external portable eGPU works with other makers products and even products that have M.2/NVMe capable slots available via an M.2/NVMe to OCuLink adapter! LOL, only Vested Interests would Object to OCuLink in the consumer market space, specifically those Vested Interests with Business Models that do not like any competition.
  • TheinsanegamerN - Thursday, September 21, 2023 - link

    Because most people dont want to disassemble their laptop to plug in a m.2 adapter, you knucklehead.
  • FWhitTrampoline - Thursday, September 21, 2023 - link

    No one is forcing you to do that and for others that's an option, albeit and inconvenient one. But really the adapters are not meant for Laptops in the first place and even for Mini Desktop PCs is not an easy task there but still more manageable that doing that with a laptop. It would just better if there was more Mini Desktop PC OEMs/Laptops OEMs where those OEMs would adopt an OCuLink PCie 4.0/x4 Port for eGPU usage like GPD has done with their line of handheld gaming devices. And with mass adoption of OCuLink there could also be adapters as well to support all the other standards as OCuLink being PCIe based by extension will support that as well.

Log in

Don't have an account? Sign up now