Intel’s Ice Lake and Sunny Cove

A Welcome Update, with Questions on Execution

When launching a new mobile product using a new processor design, all the parts have to come together to make it work. This means all of the following:

  • The microarchitecture design
  • SoC implementation
  • Manufacturing
  • Co-operation with partners/OEMs
  • Time to Market
  • Messaging
  • Marketing
  • Interactions with the press and consumers
  • Promotion
  • Price
  • Competitiveness

Any one of these segments could make or break the next wave of innovation (a key phrase Intel likes to use). With Intel’s 10nm manufacturing process and the 10th Gen Core 'Ice Lake' design, we’re moving through each of these points in turn, and it will be interesting to see what the final verdict is several years down the line. From our perspective, we now have extensive details on the core and the SoC, and Intel is slowly moving towards the product phase by the end of the year. We can evaluate what we’ve seen so far.

The Core and the SoC

Having ‘not another Skylake’ is a huge relief. With every new microarchitecture update, especially significant ones, we like to see how the power budget is being used and which key structures within the microarchitecture have been improved and expanded. The team behind the Sunny Cove core has listed some impressive hard numbers: doubling the L1 store bandwidth, making significant L1 and L2 cache adjustments, increasing the reorder buffer by a whopping 57%, and moving to a 10-port execution unit design. The inclusion of AVX-512 is welcomed by parts of the community for sure, although I expect not as many as Intel would like (this leads onto Intel’s DL Boost strategy, which I’ll mention later). Assuming we can reproduce the +18% IPC uplift that Intel claims, this is a great improvement, and it will be exciting to see where we go from here.
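To put the structure sizes in context, here is a minimal sketch that works out the relative growth from Skylake to Sunny Cove. The values are the client figures from the microarchitecture comparison table in this piece; the dictionary names and the `uplift_percent` helper are purely illustrative.

```python
# Structure sizes for the Skylake and Sunny Cove (client) cores, as listed in
# the microarchitecture comparison table of this article.
SKYLAKE = {
    "L1-D cache (KiB)": 32,
    "L2 cache (KiB)": 256,
    "uOP cache (k entries)": 1.5,
    "Reorder buffer (entries)": 224,
    "Execution ports": 8,
}
SUNNY_COVE = {
    "L1-D cache (KiB)": 48,
    "L2 cache (KiB)": 512,
    "uOP cache (k entries)": 2.25,
    "Reorder buffer (entries)": 352,
    "Execution ports": 10,
}


def uplift_percent(old, new):
    """Relative increase from old to new, expressed in percent."""
    return (new - old) / old * 100.0


for name, old in SKYLAKE.items():
    new = SUNNY_COVE[name]
    print(f"{name}: {old} -> {new} (+{uplift_percent(old, new):.0f}%)")
```

The reorder buffer line is where the 57% figure comes from: 224 entries growing to 352.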

On the SoC, the big improvements to the graphics and the memory controller are the important ones for me. Intel is taking integrated graphics seriously again, with an improved 64 EU design that promises to match AMD’s best at 25W. Intel has also improved its memory controller significantly, now supporting LPDDR4-3733 and DDR4-3200, which helps given that the graphics hardware is always craving memory bandwidth.

Adding Thunderbolt 3 to the SoC is going to help drive adoption in the mobile space. However, having spoken to a number of other press, we think that because TB3 still requires an expensive controller on the device end, the standard still has a high cost barrier to entry, even if it ends up being cheaper to implement on the host. Until non-Intel TB3 device controllers hit the market, the biggest benefit here is going to be potential support for the USB4 standard. With respect to the Wi-Fi 6 enablement, I’m not so enthused, as it still requires an additional Intel-only proprietary CNVi module, forcing OEMs to go down an Intel route. If Intel made the CNVi standard open to all, I’d be making a bigger deal about this.

Manufacturing and Time To Market

The fact that Intel is going to end up competing against itself, with Ice Lake-U against Comet Lake-U, is akin to shooting itself in the foot. Ultimately we expect Ice Lake-U to be focused on the premium market through Intel’s Project Athena, but Comet Lake-U is likely to span from budget to premium as well, especially if it ends up supporting more cores than Ice Lake. Ice Lake should have been a 2017 product, and Intel is only now at a point where it believes its 10nm products are viable for the market. That raises questions about whether the process is even profitable, and about how many chips are going to be made available to OEMs. Again, because of Project Athena, only key OEM customers going for those premium devices are going to launch initially, tailoring the messaging towards that premium feel.

Questions still surround Intel’s 10nm viability, especially given that the company has promised it is going to be producing high core count Xeons on this process node. Hopefully we can get an update on that.

Ice Lake Messaging, and Project Athena

As mentioned, the whole deal with Whiskey Lake and Comet Lake single-thread CPU performance being similar to Ice Lake is going to be a mess. Part of this is down to the messaging for sure, whereby both Ice Lake and Comet Lake are going to be Intel 10th Gen, with Core i7 and Core i5/i3 variants. The only way to distinguish between the two is that Ice Lake has a G in the SKU and Comet Lake has a U, a detail that some of Intel’s own partners don’t disclose on their product pages.
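As a rough illustration of why the single-thread picture is so close, the sketch below multiplies relative IPC by peak frequency. It is a first-order model only, and it rests on several assumptions: it uses the 3.9 GHz (Ice Lake-U) and 4.9 GHz (Comet Lake-U) peak figures quoted for the two families in this article, takes Intel's +18% IPC claim at face value, treats Comet Lake as the IPC baseline, and ignores memory, sustained turbo, and power limits entirely.

```python
# First-order single-thread proxy: relative IPC multiplied by peak frequency.
# All inputs are assumptions taken from figures quoted in this article.
IPC_UPLIFT = 0.18            # Intel's claimed Sunny Cove IPC gain (unverified here)
ICE_LAKE_PEAK_GHZ = 3.9      # Ice Lake-U peak frequency on 10nm
COMET_LAKE_PEAK_GHZ = 4.9    # Comet Lake-U peak frequency on 14nm


def single_thread_proxy(relative_ipc, peak_ghz):
    """Crude performance proxy in 'baseline-core-equivalent GHz'."""
    return relative_ipc * peak_ghz


ice = single_thread_proxy(1.0 + IPC_UPLIFT, ICE_LAKE_PEAK_GHZ)
comet = single_thread_proxy(1.0, COMET_LAKE_PEAK_GHZ)

print(f"Ice Lake-U proxy:   {ice:.2f}")
print(f"Comet Lake-U proxy: {comet:.2f}")
print(f"Ice Lake-U / Comet Lake-U: {ice / comet:.2f}")
```

Under these assumptions the two parts land within roughly 6% of each other at peak, which is why a G or a U in the SKU name matters so much for buyers.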

One thing that Intel could lean on is the AVX-512 in Ice Lake, and its DL Boost strategy. Despite AVX-512 being inside Intel’s Xeons, the amount of ‘consumer-grade’ software that uses it is minimal. By moving it into the consumer platforms, and enabling features like VNNI, Intel wants to drive AI-software solutions into the market. One example we were given was the ability for image software to sort images by what it detects in them, with all of this done locally (and securely) on the CPU rather than in the cloud. If Intel can execute on DL Boost for consumers, it could be a big win, and an easy benefit over the competition.
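For readers unfamiliar with what VNNI does, here is a minimal pure-Python sketch of the arithmetic behind its core 8-bit operation (the non-saturating VPDPBUSD form): four unsigned 8-bit values are multiplied by four signed 8-bit values, the products are summed, and the result is added to a 32-bit accumulator. This models a single 32-bit lane only; the function names are hypothetical, and real software would use compiler intrinsics or a library rather than anything like this.

```python
def wrap_int32(value):
    """Wrap an integer to signed 32-bit, as a non-saturating accumulate would."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def dot4_u8s8_accumulate(acc, a_u8, b_s8):
    """Emulate one 32-bit lane: acc += sum(a[i] * b[i]) over four byte pairs.

    a_u8 holds unsigned bytes (0..255); b_s8 holds signed bytes (-128..127).
    """
    if len(a_u8) != 4 or len(b_s8) != 4:
        raise ValueError("each lane consumes exactly four byte pairs")
    if not all(0 <= a <= 255 for a in a_u8):
        raise ValueError("a_u8 values must fit in an unsigned byte")
    if not all(-128 <= b <= 127 for b in b_s8):
        raise ValueError("b_s8 values must fit in a signed byte")
    return wrap_int32(acc + sum(a * b for a, b in zip(a_u8, b_s8)))


def quantized_dot(activations, weights):
    """Dot product of quantized vectors, four byte pairs at a time."""
    if len(activations) != len(weights) or len(activations) % 4:
        raise ValueError("inputs must have equal length, a multiple of four")
    acc = 0
    for i in range(0, len(activations), 4):
        acc = dot4_u8s8_accumulate(acc, activations[i:i + 4], weights[i:i + 4])
    return acc


print(quantized_dot([1, 2, 3, 4, 5, 6, 7, 8], [1, -1, 1, -1, 2, -2, 2, -2]))
```

The hardware performs this across every 32-bit lane of a vector register in one instruction, which is what makes quantized inference workloads such as the image-sorting example a natural fit.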

Turning to Project Athena – I’m all in favor of driving innovation forward and defining specifications that help push new and innovative form factors to enhance the user experience. But as you might have guessed, as it stands, I’m not its biggest fan. The Athena ecosystem is its own closed playground – in order for OEMs to play ball, they have to meet a number of requirements, a sizable chunk of which are only beneficial through Intel-only hardware. For example, Athena devices need Wi-Fi 6, which can be provided by an M.2 module or a CNVi RF chip. Athena devices also need substantial battery life, and the best way to achieve that is by using a CNVi RF option – but Intel is the only one that can make them because the CNVi standard is proprietary.

If an OEM doesn’t want to use Intel’s Wi-Fi 6 solution, then it becomes harder to develop an Athena-certified device. It means that OEMs have to play Intel’s game in order to compete in the space. For me, that’s a high entry fee. Sure, the certified devices at the end of the day are likely to be great, but if they all have Intel Wi-Fi 6, then where’s the variation? Perhaps I’m a little altruistic here: Intel is trying to create its own certification program, and playing by the rules might require investing in Intel-only designed controllers. But I feel that an effort like this works best when there’s competition at every level.

Competition

On a core-to-core level, recent designs from Intel and AMD have the following configurations:

Microarchitecture Comparison

                Skylake       Cannon Lake   Sunny Cove*   Zen           Zen 2
L1-D Cache      32 KiB/core   32 KiB/core   48 KiB/core   32 KiB/core   32 KiB/core
                8-way         8-way         12-way        8-way         8-way
L1-I Cache      32 KiB/core   32 KiB/core   32 KiB/core   64 KiB/core   32 KiB/core
                8-way         8-way         8-way         4-way         8-way
L2 Cache        256 KiB/core  256 KiB/core  512 KiB/core  512 KiB/core  512 KiB/core
                4-way         4-way         8-way         8-way         8-way
L3 Cache        2 MiB/core    2 MiB/core    2 MiB/core    2 MiB/core    4 MiB/core
                16-way        16-way        16-way
L3 Cache Type   Inclusive     Inclusive     Inclusive     Non-Inclusive Non-Inclusive
Decode          4 + 1         4 + 1         4 + 1         4             4
uOP Cache       1.5k          1.5k          2.25k         2k            4k
Reorder Buffer  224           224           352           192           224
Execution Ports 8             8             10            10            11
AGUs            2 + 1         2 + 1         2 + 2         1 + 1         2 + 1
AVX-512         -             1 x FMA       1 x FMA       -

* Sunny Cove numbers for Client. Server will have different L2/L3 cache and FMA, like Skylake

Where AMD has reduced the size of its L1-I cache (to fit in a 4k micro-op cache), Intel has increased the size of its L1-D cache. Both AMD and Intel now sit with 512 KiB L2 caches, although AMD has 4 MiB of non-inclusive L3 cache per core to Intel's 2 MiB of inclusive L3 cache per core. Intel has a much larger reorder buffer, and has made improvements to its address generation units to help feed the cores. It's becoming ever more important to feed the beast.
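The size and associativity figures in the table also imply how each cache is organised. The sketch below derives the number of sets per cache, assuming 64-byte cache lines (the usual line size on these x86 designs, but an assumption here since the table does not state it). The `num_sets` helper is illustrative only.

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines; not stated in the table


def num_sets(size_kib, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache of the given size and ways."""
    return size_kib * 1024 // (ways * line_bytes)


# (cache, size in KiB, associativity), taken from the comparison table.
CACHES = [
    ("Skylake L1-D", 32, 8),
    ("Sunny Cove L1-D", 48, 12),
    ("Skylake L2", 256, 4),
    ("Sunny Cove L2", 512, 8),
    ("Zen L1-I", 64, 4),
    ("Zen 2 L1-I", 32, 8),
]

for name, size_kib, ways in CACHES:
    print(f"{name}: {size_kib} KiB, {ways}-way -> {num_sets(size_kib, ways)} sets")
```

Under that assumption, Sunny Cove's larger L1-D and L2 keep the same set counts as Skylake (64 and 1024 respectively) and grow purely by adding ways, whereas AMD's L1-I change from Zen to Zen 2 goes from 256 sets down to 64 while doubling the associativity.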

Final Thought of the Day

Looking through the Ice Lake and Sunny Cove design, I have to give kudos to Intel’s engineers. The core microarchitecture looks solid, and there is no doubt that Intel will have a raw single thread performance advantage in the mobile space. I’m glad that Intel is taking its graphics solutions seriously again, and between the CPU and GPU, it’s good to see that extra power budget going to good use. I can’t wait to have the hardware on hand.

If you’re listening, Intel, please consider the following: I haven’t seen any plans to bring Ice Lake to the desktop, but can we get a quad-core Ice Lake-U at 35W in a desktop processor form factor, for $179?

107 Comments

  • name99 - Wednesday, July 31, 2019 - link

    That’s an idiotic chain of reasoning.
    ARM Macs will ship with macOS, not iOS. To believe otherwise only reveals that you know absolutely nothing of how Apple thinks.

    As for comparison, the rough number is A12X gets ~5200 on GB4, Intel best (non-OC’d) gets ~5800. That’s collapsing lots of numbers down to one, but comparing benchmark by benchmark you see Apple does very well (almost matching Intel) across an awful lot.

    If Apple can maintain its past pace (and there is no reason why not...) we can expect A13X to be anywhere from 20% to 35% faster, which puts it well into “fastest [non-OC’d] CPU on earth” territory for most single-threaded use cases. Can they achieve this? Absolutely.
    Just process improvement can get them 10% frequency. I expect A13X to clock around 2.8GHz.
    Then there is LPDDR5 which I expect they will be using, so substantially improved memory bandwidth. Then I expect they'll have SVE (2x256) and accompanying that basically double the bandwidth all the way out from L1 to DRAM.
    These are just the obvious basics. There are a bunch of things they can still do that represent “fairly easy” improvements to get to that 25% or so. (These include more aggressive fusion, a double-pumped ALU, attached ALUs to load/store to allow load-ok and op-store fusion, a micro-op cache, long-term-parking, criticality prediction, ...)

    So, if it’s so easy, why doesn’t Intel also do it? Why indeed? That’s why I occasionally post my alternative rant about how INTC is no longer an engineering company, it is now pretty much purely a finance company...
  • ifThenError - Friday, August 2, 2019 - link

    Sorry, but both these comments seem mighty uninformed. The MacBook Air and Pro currently, and for the foreseeable future, all run on Intel CPUs. The Apple A12/A13 chips are used in the iPhone, iPad and the like.

    And regarding your prediction, your enthusiasm seems way over the top. What are you even talking about? Micro-op cache on a RISC processor? Think again. Aren't RISC commands all micro ops already?
  • name99 - Sunday, August 4, 2019 - link

    Strong the Dunning-Kruger is with this one...
    Dude, seriously, learn something about MODERN CPU design, more than just buzz-words from the 80s.
    To get you started, how about you read
    https://www.anandtech.com/show/14384/arm-announces...
    and concentrate on understanding EVERY aspect of what's being added to the CPU and why.
    Note in particular that 1.5K Mop cache...

    More questions to ask yourself:
    - Why was 80s RISC obsessed with REDUCED instructions?
    - Why was ARM (especially ARMv8) NOT obsessed with that? Look at the difference between ARMv8 and, say, RISC-V.
    - Why is op-fusion so important a part of modern high performance CPUs (both x86 and ARM [and presumably RISC-V if they EVER ship a high-performance part, ha...])?
    - which are the fast (shallow logic, even if it's wide) and which are the slow (deep logic) parts of a MODERN pipeline?
  • ifThenError - Monday, August 5, 2019 - link

    Oh my, this is so entertaining you should charge for the reading.

    You demand to go beyond just buzz words (which would be good) while your posts look like entries to a contest on how many marketing phrases can be fit into a paragraph.
    Then you even manage to combine this with highly rude idiom. Plus you name a psychological effect but fail to apply it in self-reflection. And as the cherry on top you obviously claim for yourself to understand "EVERY aspect" of a CPU (an unimaginably complex bit of engineering) but even manage to confuse micro- and macro-op caches and the conceptual differences between them.

    I'm really impressed by your courage. Publicly posting so boldly on such a thin basis is brave.
    Your comments add near zero information but are definitely worth the read. Pure comedy gold!

    Please see this as an invitation to reply. I'm looking forwards to some more of your attempts to insult.
  • Techgeek43 - Tuesday, July 30, 2019 - link

    Fantastic article Ian, I for one, cannot wait for ice lake laptops
    Wonderful in-depth analysis, with an interesting insight into the Intel brand
  • repoman27 - Tuesday, July 30, 2019 - link

    "The high-end design with 64 execution units will be called Iris Plus, but there will be a ‘UHD’ version for mid-range and low-end parts, however Intel has not stated how many execution units these parts will have."

    Ah, but they have: Ice Lake-U Iris Plus (48EU, 64EU) 15 W, Ice Lake-U UHD (32EU) 15 W. So their performance comparisons may even be to the 15 W Iris Plus with 64 EUs, rather than the full fat 28 W version.

    I know you have access to the media slide decks, but Intel has also posted product briefs for the general public that contain a lot of this info: https://www.intel.com/content/www/us/en/products/d...

    "On display pipes, Gen11 has access to three 4K pipes split between DP1.4 HBR3 and HDMI 2.0b. There is also support for 2x 5K60 or 1x 4K120 with a 10-bit color depth."

    The three display pipes are not limited to 4K, and are agnostic of transport protocol—each of them can be output via the eDP 1.4b port, one of the 3 DDI interfaces which can support either DisplayPort 1.4 or HDMI 2.0b, or one of the up to 4 Thunderbolt 3 ports. Both HDMI and DP support HDCP 2.2, and DisplayPort also supports DSC 1.1. The maximum single pipe, single port resolution for HDMI is 4K60 10bpc (4:2:2), and for DisplayPort it's 4K120/5K60 10bpc (with DSC).

    Thunderbolt 3 integration for Ice Lake-Y is only up to 3 ports.
  • abufrejoval - Tuesday, July 30, 2019 - link

    What I personally liked most about the GT3e (48 EU) and GT4e (72 EU) Skylake variant SoCs was that they didn't cost the extra money they should have, especially when you consider that the iGPU part completely dwarfs the CPU cores (which Intel makes you bleed for) and is much bigger than everything else combined. Have a look at the WikiChip layouts:
    https://en.wikichip.org/wiki/intel/microarchitectu...

    Of course, a significantly better graphics performance is never a bad thing, especially when it also doesn't cost extra electrical power: The bigger iGPUs might have actually been more energy efficient than their GT2 brethren at a graphics load that pushed the GT2 towards its frequency limits. And in any case if you don't crunch it on graphics, the idle consumption is near perfect: One of the reasons most laptop dGPU designs won't even bother to run 2D on the dGPU any more but leave that to Intel.

    The biggest downside was that you couldn't buy them outside an Apple laptop or Intel NUC.

    But however much Intel goes into Apple mode (the major customer for these beefier iGPUs) in terms of "x times faster than previous", the results aren't going to turn ultrabooks with this configuration into "THD gaming machines".

    To have a good feel as to where these could go and whether they are worth the wait, just have a look at the Skull Canyon nuc6i7kyk review on this site: That SoC uses 72 EUs and 128MB of eDRAM and should put a pretty firm upper limit to what a 64 EU Ice Lake can do: Most of the games in that review are somewhat dated yet fail to reach 20FPS at THD.

    So if you want to game on the device, you'd be much better off with a dGPU, however small, and should choose the smallest iGPU variant available. No reason to wait, Whiskey Lake + Nvidia will do better.

    If you want real gaming performance, you need to put real triple digit Watts and the bandwidth only GDDR5/6 or HBM can deliver to work even at THD, but with remote gaming perhaps it doesn't have to be on your elegant slim ultrabook. There again anything but the GT2 configuration is wasted, because you only need the VPU part for decoding Google Stadia (or Steam Remote) streams, which is the same for all configurations.

    For some strange reason, Intel has been selling GT3/4 NUCs at little or no premium over GT2 variants and in that case I have been seriously tempted. And only once did I even manage to find a GT3e laptop for a GT2 price (while the SoC is literally twice as big and the die carrier even adds eDRAM at zero markup), which I still cherish.

    But if prices are anywhere related to the surface area of the chip (as they are for the server parts), these high powered GTs are something that only Apple users would buy.

    That's another reason I (sadly) don't expect them to be sold in anything but Macs and some NUCs: no ChuWi notebooks or Mini-ITX boards.
  • abufrejoval - Tuesday, July 30, 2019 - link

    ...(need edit)

    Judging from the first 10nm generation, GPUs were the part where obtaining economically feasible yields didn't work out. Unless they have really, really fixed 10nm it's not hard to imagine that Intel could be selling high-count EU SoCs to Apple below cost, to keep them for another generation as a flagship customer and perhaps due to long-term contractual obligations.

    But maintaining GT2/3/4 price egality for the rest of the market seems suicidal even if you have a fab lead.

    Not that I expect we'll ever be told: In near monopoly situations the so-called market economy becomes surprisingly complex.
  • willis936 - Wednesday, July 31, 2019 - link

    What the hell is a THD in this context?
  • jospoortvliet - Monday, August 5, 2019 - link

    Probably full HD (True HD)?
