Tiger Lake IO and Power

As part of Tiger Lake, other enhancements have been made to the chip outside of the traditional CPU/GPU components. In this article, as they directly impact core performance, we’ve already discussed improvements to the fabric, enabling a doubling of bandwidth with the dual bidirectional ring design, and the new LPDDR5-5400 support on the memory controller – go back a couple of pages to find information on these.

PCIe 4.0 Support

We lightly touched upon it in the graphics section, but the four core Tiger Lake processor will be the first mobile processor to support PCIe 4.0 directly from the CPU. Intel hasn’t specifically stated how many lanes of PCIe 4.0 the processor will support, which is kind of frustrating at this time, but they have made it clear that they have not experienced a power penalty moving from PCIe 3.0 in Ice Lake to PCIe 4.0 in Tiger Lake.

As it stands, Intel expects the PCIe 4.0 lanes to be used in these mobile processors primarily for PCIe 4.0 storage, however given the status of the current PCIe 4.0 NVMe SSDs on the market today and the high power requirements of the Phison E16 controller (~8W), we might have to wait a bit for other controllers to come in volume.

Intel did state that the quantity of PCIe 4.0 lanes did have a direct correlation with the CPU count and the power of the chip, but refused to state what the scaling is. Based on comments made by Intel during the Architecture Day, such as Tiger Lake supporting 24 MiB of L3 cache which would require an 8-core CPU, we suspect that a full 16 PCIe 4.0 lane version (or more?) to align with that product instead. That would mean the 4-core Tiger Lake version would be more akin to an 8-lane processor, which would gel with what we’ve seen with other mobile processors in the past.

However, there is part of me that suspects that this processor has only four PCIe 4.0 lanes. Intel’s quote about remaining iso-power between Ice Lake (PCIe 3.0 x8) and Tiger Lake on PCIe 4.0 might actually be that tradeoff – moving down to four lanes keeps that iso-power. Even with four PCIe 4.0 lanes, that’s still enough for a discrete Thunderbolt graphics card and a super-fast NVMe SSD, or dual NVMe 4.0 x2 drives. The higher data rates on PCIe 4.0 do require more power per lane, assuming an iso-process, as we’ve presumed with other products, but there are always silicon improvements that might help with that.

Update: Another element to support the PCIe 4.0 x4 theory – in multiple places during Architecture Day, Intel states that devices accessing memory over PCIe will have ‘8 GB/s bandwidth. Each PCIe 4.0 x1 link is approx. ~2 GB/sec, which would imply there is only four.

Gaussian and Neural Accelerator 2.0 (GNA)

One of the accelerators that Intel offered in Ice Lake was the GNA - a simple low power inferencing engine that enables the system to offload basic analysis or workloads such as noise reduction for calls or voice recording. In a previous guise, the GNA built upon the Gaussian Mixture Model, which we believe was IP dedicated to accelerating Microsoft’s Cortana in voice recognition. With Tiger Lake, we now get GNA 2.0.

No specifics were necessarily given as to what has changed this time around, aside from having the benefits of the 10SF process technology. Intel did quote some handy numbers though, stating that GNA 2.0 can perform 1 GigaOP at 1 milliwatt, and this can scale linearly up to 38 GigaOPs for 38 milliwatts. Intel never released similar performance/efficiency numbers for Ice Lake, stating only that GNA 2.0 is ‘enhanced’ for Tiger Lake.

Display and Image Processing Unit

We’ve covered the Display aspects of the Tiger Lake in the graphics section, but to reiterate, there are four 4K display pipelines: DP1.4, HDMI 2.0, Thunderbolt 4, and USB4 Type-C can be used simultaneously. The display engine also supports HDR10, 12-bit BT2020 color, Adaptive Sync, and support for monitors up to 360 Hz, and Intel states that the display engine can support up to 64 GB/s to memory, suggesting there is some overhead or bottleneck compared to the 86.4 GB/s supported by LPDDR5-5400. Tiger Lake also supports direct-to-memory data transfer for the display engine, bypassing the CPU – a feature first introduced with Skylake.

For the image processing unit, Intel has used the 10SF transistor budget to increase the size of its imagine pipelines in hardware. There is still support for six cameras, the same as Ice Lake, but the Tiger Lake silicon will eventually be capable of 4K90 video and 42 MP imaging support. Notice the ‘will eventually be capable’ in that last sentence – Intel has specified that this four core Tiger Lake will only support 4K30 and 27MP for video and imaging respectively. It wasn’t clarified at the time why there was this discrepancy and what it means, but our best guess is one of two things: the larger 8-core version of Tiger Lake (the one with the 24 MiB L3 cache that Intel kept talking about) will have the full support, or the full support can only be enabled with faster memory such as LPDDR5-5400, which won’t be available until mid-way through Tiger Lake’s product cycle.

Thunderbolt 4

Tiger Lake will be Intel’s first deployment of Thunderbolt 4 hardware, and the company will follow up with TB4 controllers for non-TGL systems later this year. TB4 is a superset of the USB4 standard, and thus Tiger Lake will also support USB 4. The way the Tiger Lake chip is built, two Thunderbolt 4 ports will be supported on each side of the laptop, and each port will support the full 40 Gb/s bandwidth. In order to qualify for next generation Athena specifications, one of those will need to be a quick-charging port.

We covered Thunderbolt 4 a few weeks ago, as Intel wanted to discuss TB4 ahead of the Tiger Lake launch. One of the key requirements for TB4 certification is that the processor must support some form of DMA write protection to prevent physical attacks. Intel does this through its processors supporting VT-d instructions, and when TB4 controllers come out, other processor vendors will have to enable similar technologies. Another TB4 certification requirement is going to be supporting wake-from-sleep through any TB4 device, such as a dock.

Power Management and Frequency/Voltage Scaling

One of the most important drivers with mobile processors is idle and sleep power – the more parts of the chip that can be put into a low power state when not in use, the better the battery life.

At a high level this means that if a laptop is playing a video, on the CPU we have the display engine is on and the video decode on, but most/all of the cores are in a low power state or a deep sleep mode, and the graphics are essentially tuned off, and the fabric is powered down as much as possible. As we move to denser process nodes with bigger transistor budgets, more of those transistors are being used to create individual power and frequency domains in order to manage how a processor deals with sub-dividing its parts for low powered modes.

On top of that, logic needs to be applied to manage all the different domains, and it needs to be designed such that when the parts that are turned off are needed again, they can be powered up with no noticeable delay to the end user.

With every generation of laptop product, both Intel and AMD continually introduce new features and better control over the different compute and interconnect blocks within mobile processors where it matters the most. For Tiger Lake, Intel has an updated its autonomous dynamic voltage/frequency scaling (DVFS) algorithms to take into account bandwidth requirements for a given workload.

This is done on top of other power optimizations at an SoC level, such as even better clock gating for the CPU cores and better voltage regulator efficiency for the integrated regulators. With Tiger Lake, even the PCIe, USB and thermal sensors now occupy their own domains for sleep states. When a component needs to be put in sleep, if it contains important data that often needs to be ‘saved’ somewhere for when it is restored: Intel now has improved hardware-based save and restore logic for this purpose, going beyond Ice Lake’s offerings. Exactly how much change has been made hasn’t been quantified, but the idea is that all these small adjustments will add up over time.

What is Xe-LP? Tiger Lake Performance and Products
Comments Locked

71 Comments

View All Comments

  • Mark242 - Thursday, August 13, 2020 - link

    Is the SuperFin tech really a generational impovement of the 10nm process or is it a backport from 7nm?
  • Sahrin - Thursday, August 13, 2020 - link

    So basically Intel had to re-engineer the entire technology stack to get 10nm to work.

    Are they still using EUV on all layers?
  • IanCutress - Thursday, August 13, 2020 - link

    EUV for Intel is on 7nm. There's no EUV on 10 or 10SF.
  • trivik12 - Thursday, August 13, 2020 - link

    Does the comment that 10 SFE is optimized for DC and so Alderlake will not see any xtor improvements from TGL and its just microarchitecture changes to improve performance.
  • Thunder 57 - Thursday, August 13, 2020 - link

    "As for the L3 cache on a quad-core Willow Cove system, Intel has moved from an 8 MiB non-inclusive shared L3 cache to a 12 MiB shared L3 cache."

    Pretty sure you meant inclusive L3, which the "Cache Comparison" chart got right.
  • WaltC - Thursday, August 13, 2020 - link

    Ian eating more silicon...ah...like a breath of fresh air...;) I find silicon wafers are best enjoyed with a set of Unobtanium™ dentures topped with the diamond tooth inserts and platinum alloy tips--chews up nice, goes down smooth! I have to agree in this really nice write up making the most of the sparseness Intel supplied, that all of this stuff looks incremental to me. Bits and pieces improved. Reading between the lines it looks like Intel is still struggling with its process nodes--the fact that they cannot ship even this right now is certainly telling...nor can they even supply a ship date, apparently.
  • Eliadbu - Thursday, August 13, 2020 - link

    I sure hope to see TGL scaled to 8 cores CPUs. I feel like the biggest drawback of ICL is it was scaled up to 4 cores only making it underpowered to even comet lake u (with up to 6 cores).
  • harobikes333 - Saturday, August 15, 2020 - link

    "If you’ve skipped to the end of this article without reading the pages in between,...."
    ^ You caught me - I'm a sucker for summaries. If I have time, I go back & read through the full articles. Alas, there's only so much time in a day!
  • ksec - Saturday, August 15, 2020 - link

    In the previous driver update I was under the impression AV1 decode is only partly ASIC accelerated. But the slides here seems to imply it is fully Hardware Decoded.
  • Farfolomew - Saturday, August 15, 2020 - link

    What's Intel playing at here with ignoring the 8-core offering of Renoir Mobile and just going for 4-core with Tiger Lake? This will be in effect for an entire generation (11th Gen) of mobile products, that they'll have a 50% core deficit. I get that Tiger Lake will have ~20% better ST performance over Renoir, and it could be argued that 4 cores is all one needs nowadays on Mobile, but still, it seems like a calculated and potentially dangerous move by Intel to ignore AMD's core-count advantage.

Log in

Don't have an account? Sign up now