Instruction Sets: Alder Lake Dumps AVX-512 in a BIG Way

One of the big questions to address here is how the P-cores and E-cores have been adapted to work inside a hybrid design. A critical aspect of any hybrid design is whether both types of core support the same instructions. It is possible to build a processor with unbalanced instruction support, but that requires hardware to trap unsupported instructions and migrate the thread to a capable core mid-execution. The simpler way around this is to ensure that both types of core have the same level of instruction support, and that is what Intel has done in Alder Lake.

In order to get to this point, Intel had to cut some features from its P-core and improve some features on the E-core. The biggest casualty is AVX-512: Alder Lake loses support for it entirely. When we say losing support, we mean that AVX-512 is physically fused off, so even if you run the processor with the E-cores disabled at boot time, AVX-512 remains disabled.
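Because instruction support is uniform across both core types, software only needs a single runtime capability check rather than per-core dispatching. A minimal sketch in C++ of what such a check might look like, using the GCC/Clang `__builtin_cpu_supports` builtin (the dispatch structure here is purely illustrative):

```cpp
#include <cstdio>

int main() {
    __builtin_cpu_init();  // populate the compiler's CPU feature cache

    // On Alder Lake both checks below reflect the whole package:
    // AVX2 is available on P-cores and E-cores alike, AVX-512 is fused off.
    bool has_avx2   = __builtin_cpu_supports("avx2");
    bool has_avx512 = __builtin_cpu_supports("avx512f");

    std::printf("AVX2:    %s\n", has_avx2   ? "yes" : "no");
    std::printf("AVX-512: %s\n", has_avx512 ? "yes" : "no");

    // Dispatch accordingly: pick the AVX2 code path, never assume AVX-512.
    return 0;
}
```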

Intel’s journey with AVX-512 has been long and fragmented. Some workloads can be vectorised – multiple pieces of consecutive data all require the same operation, so you can pack them into a single wide register and process them all at once with a single instruction. AVX-512 is Intel’s widest set of vector instructions to date, operating on 512-bit registers (SSE uses 128-bit registers, while AVX and AVX2 use 256-bit registers). It was initially found on server processors, then mobile, and then on the previous generation of desktop processors (Rocket Lake). At the time, Intel stated that by enabling AVX-512 across its processor line from top to bottom it would encourage greater adoption, and it was leaning hard into this message.
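To make that packing concrete, here is a brief sketch contrasting a scalar loop with the same loop written using AVX2 intrinsics (function names are our own, and the array length is assumed to be a multiple of eight to keep the example short):

```cpp
#include <immintrin.h>
#include <cstdint>

// Scalar: one addition per loop iteration.
void add_scalar(const int32_t* a, const int32_t* b, int32_t* out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// AVX2: eight packed 32-bit additions per instruction (256-bit registers).
// Compile with -mavx2; assumes n is a multiple of 8 for brevity.
void add_avx2(const int32_t* a, const int32_t* b, int32_t* out, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        __m256i vc = _mm256_add_epi32(va, vb);  // add all eight lanes at once
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), vc);
    }
}
```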

But that all changes with Alder Lake. Both desktop and mobile processors will now have AVX-512 disabled in all scenarios. The silicon will still be physically present in the core, because Intel uses the same core in its next-generation server processor, Sapphire Rapids. One could argue that if the AVX-512 unit were removed from the desktop cores, they would be a lot smaller; however, Intel has disagreed with this point at previous launches. What it means is that the consumer parts carry some extra dark silicon in the design, which ultimately might help thermals, or absorb defects.

But it does mean that AVX-512 is probably dead for consumers.

Intel isn’t even supporting AVX-512 by cracking the instructions into dual-issue AVX2 operations - it simply won’t work on Alder Lake. If AMD’s Zen 4 processors support some form of AVX-512 as has been theorized, even if implemented as dual-issue AVX2 operations, we might be in some dystopian processor environment where AMD sells the only consumer processors on the market to support AVX-512.

On the E-core side, Gracemont will be Intel’s first Atom core to support AVX2. In our testing of the previous generation Tremont Atom core at 2.9 GHz, it performed similarly to a 2.9 GHz Haswell-based Celeron processor, i.e. essentially identical in non-AVX2 situations. By adding AVX2, plus fundamental performance increases, we’re told to expect ‘Skylake-like performance’ from the new E-cores. Intel also stated that both the P-core and E-core will offer ‘Haswell-level’ AVX2 support.

By enabling AVX2 on the E-cores, Intel is also integrating support for VNNI instructions for neural network calculations. In the past, VNNI (and VNNI2) was built on top of AVX-512; this time around, Intel has implemented an AVX2-based version of VNNI for both the P-core and E-core designs in Alder Lake. So while AVX-512 might be dead here, at least some of those AI acceleration features carry over, albeit in AVX2 form.
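At its heart, VNNI is a fused multiply-accumulate on small integers: the VPDPBUSD instruction multiplies unsigned 8-bit values by signed 8-bit values and accumulates the products into 32-bit lanes, which is the inner loop of many quantized neural network kernels. A rough scalar sketch of what one 32-bit lane computes (function and argument names are our own, and 32-bit overflow behaviour is ignored for brevity):

```cpp
#include <cstdint>

// Reference semantics of one 32-bit lane of VPDPBUSD, the VNNI dot-product
// instruction: four u8*s8 products summed into a 32-bit accumulator.
// The AVX2-based form performs eight such lanes per 256-bit register.
int32_t vnni_lane(const uint8_t a[4], const int8_t b[4], int32_t acc) {
    for (int i = 0; i < 4; ++i)
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    return acc;
}
```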

For the data center versions of these big cores, Intel does retain AVX-512 support, along with new matrix extensions, which we will cover in that section.

Comments

  • TristanSDX - Thursday, August 19, 2021 - link

    "decreasing the manufacturing cost for Alder Lake, by using all the defect chips and reserving the good ones for Sapphire Rapids."
    Alder Lake and Sapphire Rapids are two totally different chips
  • mode_13h - Thursday, August 19, 2021 - link

    > Designed as its third generation of vector instructions

    Depends on how you're counting. First is definitely MMX. That was extended in a few subsequent CPUs, but they didn't call those extensions MMX2 or anything. MMX was strictly integer, however, and total vector width was 64 bits. MMX had the annoying feature of reusing the FPU registers, which complicated mixing it with x87 code and basically required a state reset when going from MMX -> x87 code.

    Then, SSE came along and added single-precision floating-point. It also added a distinct set of vector registers, which were 128 bits. Finally, it included scalar single-precision arithmetic operations, beginning the era of x87's obsolescence.

    SSE2 followed with double-precision and integer operations, making MMX obsolete and further replacing x87 functionality.

    SSE3, the wonderfully-named SSSE3, and a couple of rounds of SSE4 came along, but all were basically just rounds of various additions to flesh out what SSE/SSE2 introduced.

    Then, AVX was introduced as something of a replacement for SSE. AVX registers are 256 bits. Like SSE, AVX initially just included single-precision floating-point support. And like SSE2, AVX2 added double-precision and integer operations.

    Then, Xeon Phi (2nd gen) and Skylake-SP introduced the first variations on AVX-512 support. You can see what a mess AVX-512 is, here:

    https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AV...

    Anyway, AVX-512 should be considered Intel's FOURTH family of vector computing instructions, in x86. I think the first time they dabbled with vector instructions was in the venerable i860 - a very cool, but also fairly problematic step in the history of computing.

    > (AVX is 128-bit, AVX2 is 256-bit, AVX512 is 512-bit),

    No, not at all. The register width for AVX and AVX2 is 256 bits, as I explained above.

    However, even that is a slight simplification. AVX introduced some refinements in vector programming, such as a more compiler-friendly 3-operand format. Therefore, it was meant to subsume SSE usage, and included support for 128-bit operations. Similarly, AVX-512 introduced further refinements and the capability to use it on 128-bit and 256-bit operands.

    For more, see: https://en.wikipedia.org/wiki/AVX-512#Encoding_and...
  • mode_13h - Thursday, August 19, 2021 - link

    One more correction:

    > Some workloads can be vectorised – multiple bits of consecutive data all require
    > the same operation, so you can pack them into a single register and perform it
    > all at once with a single instruction.

    Intel's vector instruction extensions aren't strictly SIMD. They include horizontal operations that you don't see in classical SIMD processors or most GPUs.
  • mode_13h - Thursday, August 19, 2021 - link

    > One could argue that if the AVX-512 unit was removed from the desktop
    > cores that they would be a lot smaller

    That's what I thought, but the area overhead it added to a Skylake-SP core was estimated at a mere 11%.

    https://www.realworldtech.com/forum/?threadid=1932...

    Of course, we can't yet know how much of Golden Cove it occupies, but still probably somewhere in that ballpark.
  • mode_13h - Thursday, August 19, 2021 - link

    > Intel isn’t even supporting AVX-512 with a dual-issue

    Perhaps because AVX-512 doubled the number and size of vector registers. So, just the vector register file alone would grow 4x in size.
  • Schmide - Thursday, August 19, 2021 - link

    64-bit packed doubles are in AVX, as are some 64-bit ints. AVX2 filled in a lot of gaps, such as full vector operands and reorders. So as much as AVX2 finished off the 32- and 64-bit int (epi) functions, there was already a fair amount in AVX.
  • Schmide - Thursday, August 19, 2021 - link

    Not to be misleading: there were really no usable int functions in AVX other than load and store.
  • maroon1 - Thursday, August 19, 2021 - link

    Gracemont beats Skylake???? Really? Am I reading the article correctly?

    So these small cores are actually very powerful !!
  • vegemeister - Thursday, August 19, 2021 - link

    The hypothetical 8% increase in peak performance seems like wishful thinking to me. The chart looks like "graphic design" marketing wank, not plotted data. I would only go by the printed numbers. That is, at an operating point that matches Skylake peak performance, Gracemont cores use less than 60% of Skylake's power, and if you ran Skylake at that same power, it would have less than 60% of Gracemont's performance.
  • mode_13h - Thursday, August 19, 2021 - link

    > I would only go by the printed numbers.

    Okay, so are those numbers you used hypothetical, or where did you see 60%?

    Also, there's no fundamental reason why the ISO-power and ISO-performance deltas should match.
