Thunderbolt 3: Now on the CPU*

One of the big failures of the Thunderbolt technology since its inception has been its adoption beyond that Apple ecosystem. In order to use it, both the host and the device needed TB controllers supplied by Intel. It wasn’t until Thunderbolt 3 started to use USB Type-C, and Thunderbolt 3 having enough bandwidth to support external graphics solutions, that we started to see the number of available devices start to pick up. The issue still remains that the host and device need an expensive Intel-only controller, but the ecosystem was starting to become more receptive to its uses.

With Ice Lake, that gets another step easier.

Rather than bundle TB3 support into the chipset, Intel has integrated it on the die of Ice Lake, and it takes up a sizable amount of space. Each Ice Lake CPU can support up to four TB3 ports, with each TB3 port getting a full PCIe 3.0 x4 root complex link internally for full bandwidth. (For those keeping count, it means Ice Lake technically has 32 PCIe 3.0 lanes total).

Intel has made it so each side of the CPU can support two TB3 links direct from the processor. There is still some communication back and forth with the chipset (PCH), as the Type-C ports need to have USB modes implemented. It’s worth noting that TB3 can’t be directly used out of the box, however.

Out of the four ports, it will be highly OEM dependent on how many of those will actually make it into the designs – it’s not as simple as just having the CPU in the system, but other chips (redrivers) are needed to support the USB Type-C connector. Power delivery too requires extra circuitry, which costs money. So while Intel advertises TB3 support on Ice Lake, it still needs something extra from the OEMs. Intel states that a retimer for the integrated solution is only half the size compared to the ones needed with the TB3 external chips, as well as supporting two TB3 ports per retimer, therefore halving the number of retimers needed.

Here’s a more detailed schematic, showing the complexities of adding in TB3 into a chip, with the four PCIe x4 complexes shown moving out to each of the individual PHYs at the bottom, and connected back into the main SoC interconnect fabric. The display engine also has to control what mode the TB3 ports are in, and what signals are being sent. Wake up times for TB3 in this fashion, according to Intel, are actually slightly longer compared to a controller implementation, because the SoC is so tightly integrated. This sounds somewhat counterintuitive, given that the requisite hardware blocks are now closer together, but it all comes down to power domains – in a separate chip design, each segment has a separate domain with individual power up/down states. In an integrated SoC, Intel has unified the power domains to reduce complexity and die area, which means that more careful management is required but latency ultimately increases a little bit.

The other upside to the tightly coupled integration is that Intel stated that this method of TB3 is a lot more power efficient that current external chip implementations. However they wouldn’t comment on the exact power draw of the TB3 block on the chip as it corresponds to the full TDP of the design, especially in relation to localized thermal density (Intel was initially very confused by my question on this, ultimately saying that the power per bit was lower compared to the external chip, so overall system power was lower – they seemed more interested in discussing system power over chip power). Intel did state that the difference between an idle and a fully used link was 300 mW, which suggests that if all four links are in play, we’re looking at 1.2 W. When asked, Intel stated that there are three different power delivery domains within the TB3 block depending on the logic, that the system uses integrated voltage regulation, and the TB3 region has an internal power rail that is shared with some of the internal logic of the CPU. This has implications when it comes to time-to-wake and idle power, but Intel believes it has found a good balance.

Regarding USB4 support, Intel stated that it is in the design, and they are USB4 compliant at this point, but there might be changes and/or bugs which stop it from being completely certified further down the line. Intel said that it ultimately comes down to the device side of the specification, although they have put as much in as they were able given the time constraints of the design. They hope to be certified, but it’s not a guarantee yet.

Depending on who you speak to, this isn’t Intel’s first crack at putting TB3 into CPU silicon: the chip that Intel never wants to talk about, Cannon Lake, supposedly also had an early TB3 design built inside that never worked. But Intel is confident in its Ice Lake implementation, especially with supporting four ports. I wouldn’t be surprised if this comes to desktop when Intel releases its first generation 10nm desktop processors.

*The asterisk in the title of this page is because you still need external hardware in order to enable TB3.

Using Power More Efficiently: Dynamic Tuning 2.0 Wi-Fi 6: Implementing AX over AC*
POST A COMMENT

107 Comments

View All Comments

  • repoman27 - Tuesday, July 30, 2019 - link

    “Each CPU has 16 PCIe 3.0 lanes for external use, although there are actually 32 in the design but 16 of these are tied up with Thunderbolt support.”

    This isn’t quite right. The ICL-U/Y CPU dies do not expose any PCIe lanes externally. They connect to the ICL PCH-LP via OPI and the PCH-LP exposes up to 16 PCIe 3.0 lanes in up to 6 ports via HSIO lanes (which are shared with USB 3.1, SATA 6Gbps, and GbE functions). So basically no change over the 300 Series PCH.

    The integrated Thunderbolt 3 host controller may well have a 16-lane PCIe back end on-die, and I’m sure the CPU floorplan can accommodate 16 more lanes for PEG on the H and S dies, but that’s not what’s going on here.
    Reply
  • voicequal - Friday, August 2, 2019 - link

    The SoC architecture shows a direct path for the Thunderbolt3 PCIe lanes to the CPU, with only USB2 going across OPI.. Whatever PCIe lanes are available on the PCH are in addition those available via TB3.

    https://images.anandtech.com/doci/14514/Blueprint%...
    Reply
  • repoman27 - Tuesday, August 6, 2019 - link

    The Thunderbolt 3 controller is part of the CPU die. There are four PCIe 3.0 x4 root ports connected to the CPU fabric that feed the Thunderbolt protocol converters connected to the Thunderbolt crossbar switch (the Converged I/O Router block in that diagram). The CPU exposes up to three (for Y-Series) or four (for U-Series) Thunderbolt 3 ports. The only way you can leverage the PCIe lanes on the back-end of the integrated Thunderbolt 3 controller is via Thunderbolt.

    The PCH is a separate die on the same package as the CPU die. The two are connected via an OPI x8 link operating at 4 GT/s which is essentially the equivalent of a PCIe 3.0 x4 link. The PCH contains a sizable PCIe switch internally which connects to the back-ends of all of the included controllers and also provides up to 16 PCIe 3.0 lanes in up to 6 ports for connecting external devices. These 16 lanes are fed into a big mux which Intel refers to as a Flexible I/O Adapter (FIA) along with all the other high-speed signals supported by the PCH including USB 3.1, SATA 6Gbps, and GbE to create 16 HSIO lanes which are what is exposed by the SoC. So there are up to 16 PCIe lanes available from the Ice Lake SoC package, all of which are provided by the PCH die, but they come with the huge asterisk that they are exposed as HSIO lanes shared with all of the other high-speed signaling capabilities of the PCH and provisioned by a PCIe switch that effectively only has a PCIe 3.0 x4 connection to the CPU.

    This is not at all what Ian seemed to be describing, but it is the reality.

    And the USB 2.0 signals for the Thunderbolt 3 ports do indeed come from the PCH, but they do not cross the OPI, they're simply routed from the SoC package directly to the Thunderbolt port. The Thunderbolt 3 host controller integrated into the CPU includes a USB 3.1 xHCI/xDCI but does not include a USB 2.0 EHCI.
    Reply
  • poohbear - Tuesday, July 30, 2019 - link

    I was looking at buying Dell's XPS 15.6" (7590 model), but with Project Athena laptops a few months away, i think i'll wait. Intel parts for solid reliability and unified drivers, and "4 hours of battery life with <30min of charging", those 2 on their own make the wait worth it for me! Reply
  • repoman27 - Tuesday, July 30, 2019 - link

    “The connection to the chipset is through a DMI 3.0 x4 link...”

    Should be OPI x8 for U/Y Series.

    “...Ice Lake will support up to six ports of USB 3.1 (which is now USB 3.2 Gen 1 at 5 Gbps)...”

    They’re USB 3.1 Gen 2 ports, so it’s six USB 3.2 Gen 2 x 1 (10 Gbit/s) ports.
    Reply
  • Roel9876 - Tuesday, July 30, 2019 - link

    Well, for one, it is certainly not realistic to run single thread benchmarks on application that support multi threading. Realistically, most (all?) people will run the application multi threaded? Reply
  • HStewart - Tuesday, July 30, 2019 - link

    As developer for many years, multiple threads are useful for handling utility threads and such - but IO is typically area which still has to single thread. Unless it has significantly change in API, it is very difficult to multi-thread the actual screen. And similar for disk io as resource. Reply
  • Arnulf - Tuesday, July 30, 2019 - link

    "Our best guess is that these units assist Microsoft Cortana for low-powered wake-on voice inference algorithms ..."

    Our best guess is that these are designed for use by assorted three-letter agencies.
    Reply
  • PeachNCream - Tuesday, July 30, 2019 - link

    Open mics are totally okay. There is absolutely no privacy risk to you at all and you should never give it a second thought. Reply
  • ToTTenTranz - Tuesday, July 30, 2019 - link

    With 4x TB3 connections available, I wonder if the maker of an external GPU box could develop a multiplexer that combined two TB3 connections into a PCIe 3.0 8x.

    This would significantly decrease some problems that eGPU owners are having due to relatively low CPU-GPU bandwidth.
    Reply

Log in

Don't have an account? Sign up now