SoC Tile, Part 2: NPU Adds a Physical AI Engine

The last major block on the SoC tile is a full-featured Neural Processing Unit (NPU), a first for Intel's client-focused processors. The NPU brings AI capabilities directly to the chip and is compatible with standardized programming interfaces like OpenVINO. The NPU's architecture is multi-engine: it comprises two neural compute engines that can either collaborate on a single task or operate independently. This flexibility matters for diverse workloads, and it could also benefit future workloads that are still in development or haven't yet been optimized for AI acceleration. Two primary components of these neural compute engines stand out: the Inference Pipeline and the SHAVE DSP.

The Inference Pipeline is primarily responsible for executing neural network workloads. It minimizes data movement and relies on fixed-function hardware for operations that require high computational throughput. The pipeline comprises a sizable array of Multiply Accumulate (MAC) units, an activation function block, and a data conversion block. In essence, the inference pipeline is a dedicated block optimized for ultra-dense matrix math.
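To make the three blocks named above more concrete, here is a minimal software sketch of the same data flow: a multiply-accumulate stage, an activation function, and a data conversion step. This is purely illustrative and is not Intel's implementation. The function names, the choice of ReLU as the activation, and int8 as the output format are all assumptions made for the example.

```python
# Illustrative sketch only: a software analogy for the data flow described
# above, not a model of Intel's actual hardware.

def mac_array(inputs, weights, bias):
    """Multiply-accumulate: each output sums input * weight into an accumulator."""
    outputs = []
    for row, b in zip(weights, bias):
        acc = b
        for x, w in zip(inputs, row):
            acc += x * w  # one multiply-accumulate step
        outputs.append(acc)
    return outputs


def relu(values):
    """Activation block (ReLU is an assumed choice): clamp negatives to zero."""
    return [max(0, v) for v in values]


def to_int8(values, scale):
    """Data conversion (int8 is an assumed format): rescale and clamp."""
    return [max(-128, min(127, round(v * scale))) for v in values]


def inference_step(inputs, weights, bias, scale):
    """Chain the three stages in the order the article lists them."""
    return to_int8(relu(mac_array(inputs, weights, bias)), scale)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise each stage.
    print(inference_step([1, 2, 3], [[1, 0, -1], [2, 2, 2]], [0, 10], 0.5))
```

In hardware, the inner loop is what the MAC array performs in parallel across many units at once, which is why the block suits dense matrix math.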

The SHAVE DSP, or Streaming Hybrid Architecture Vector Engine, is designed specifically for AI applications and workloads. It can be pipelined with the Inference Pipeline and the Direct Memory Access (DMA) engine, enabling parallel computing on the NPU to improve overall performance. The DMA engine, for its part, is designed to manage data movement efficiently.
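A rough way to see why pipelining the DMA engine, the Inference Pipeline, and the SHAVE DSP helps is to compare an idealized serial schedule with an idealized pipelined one. The sketch below assumes a simple linear pipeline with no stalls, where each stage handles one tile of work at a time. The stage timings are made-up numbers for illustration, not measurements or Intel figures.

```python
# Idealized pipeline arithmetic. Stage times are hypothetical placeholders,
# and the model ignores stalls, memory contention, and scheduling overhead.

def serial_time(stage_times, n_tiles):
    """Every tile passes through all stages before the next tile starts."""
    return n_tiles * sum(stage_times)


def pipelined_time(stage_times, n_tiles):
    """Stages overlap: after the first tile fills the pipeline, one tile
    completes every max(stage_times) time units."""
    if n_tiles == 0:
        return 0
    return sum(stage_times) + (n_tiles - 1) * max(stage_times)


if __name__ == "__main__":
    # Assumed per-tile costs for (DMA, inference pipeline, DSP), arbitrary units.
    stages = (2, 4, 3)
    print(serial_time(stages, 10), pipelined_time(stages, 10))
```

Under these assumptions the slowest stage sets the steady-state rate, so overlapping the stages pays off most when their per-tile costs are roughly balanced.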

On the device management side, the NPU is designed to be fully compatible with Microsoft's new compute driver model, known as MCDM. This isn't merely a checkbox feature; it's an optimized implementation with a strong emphasis on security. The Memory Management Unit (MMU) complements this by offering multi-context isolation and facilitating rapid, power-efficient transitions between different power states and workloads.

As part of building an ecosystem that can capitalize on its NPU, Intel has been courting developers with a number of tools. One of these is the open-source OpenVINO toolkit, which supports models from frameworks such as TensorFlow, PyTorch, and Caffe. Supported APIs include Windows Machine Learning (WinML) along with its DirectML component, the ONNX Runtime accelerator, and OpenVINO.
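For developers, one practical consequence of having several inference targets on one chip is choosing which device to run on. The helper below is a hypothetical, dependency-free sketch of a preference-ordered fallback. The function name `pick_device` and the NPU-first ordering are assumptions for illustration. With OpenVINO installed, the list of names could plausibly come from `openvino.Core().available_devices`, and the chosen name could be passed to `Core.compile_model`, but that wiring is left out here so the example runs on its own.

```python
# Hypothetical helper: choose an inference device by preference order.
# The preference order is an assumption, not a recommendation from Intel.

def pick_device(available, preferred=("NPU", "GPU", "CPU")):
    """Return the first available device matching the preference order.

    Matches exact names ("NPU") as well as indexed names ("GPU.0").
    """
    for name in preferred:
        for dev in available:
            if dev == name or dev.startswith(name + "."):
                return dev
    raise RuntimeError("no preferred device available: %r" % (available,))


if __name__ == "__main__":
    # Example device lists; real names depend on the system and drivers.
    print(pick_device(["CPU", "GPU.0", "NPU"]))
    print(pick_device(["CPU", "GPU.0"]))
```

A fallback like this keeps the same application working on older machines that lack an NPU.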

One example of the NPU's capabilities was provided through a demo using Audacity during Intel's Tech Tour in Penang, Malaysia. During this live demo, Intel Fellow Tom Peterson used Audacity to showcase a new plugin called Riffusion. The demo fed a funky audio track with vocals through Audacity and separated it into two tracks, vocals and music. With the tracks separated by the Riffusion plugin, Tom Peterson was then able to change the style of the music track to a dance track.

The Riffusion plugin for Audacity uses Stable Diffusion, which is an open-source AI model that traditionally generates images from text. Riffusion goes one step further by generating images of spectrograms, which can then be converted into audio. We touch on Riffusion and Stable Diffusion because this was Intel's primary showcase of the NPU during its Tech Tour 2023 in Penang, Malaysia.

Although the demo did require resources from both the compute and graphics tiles, everything was brought together by the NPU, which processes multiple elements to produce an EDM-flavored track featuring the same vocals. As an example of how applications pool the various tiles together, workloads going through WinML, which has been part of Microsoft's operating systems since Windows 10, typically run on the CPU through the MLAS library, while those going through DirectML use both the CPU and GPU.

Other developers include Microsoft, which uses the NPU in tandem with the OpenVINO inferencing engine to provide features like speech-to-text transcripts of meetings, audio improvements such as background noise suppression, and even background enhancement and focusing capabilities. Another big name using AI with NPU support is Adobe, which adds a host of features for users of Adobe Creative applications. These include generative AI capabilities and photo manipulation techniques in Photoshop such as refining hair, editing elements, and neural filters.


107 Comments

  • GeoffreyA - Saturday, September 23, 2023 - link

    I agree with most of what you're saying. What I was trying to get at is that there seems to be a belief that Apple has superior engineering ingenuity than Intel and AMD, when really, it is the difference between fixed- and variable-length instruction sets and all that entails. What I'd like to see is all of them on the same playing field and where each then stands, from a CPU point of view. Quite likely, there won't be much of a difference because good design principles are always the same. It's trying to be out of the ordinary that leads to Pentium 4s and Bulldozers.
  • GeoffreyA - Saturday, September 23, 2023 - link

    And yes, I'd like to see RISC-V winning in the end, rather than ARM.
  • GeoffreyA - Saturday, September 23, 2023 - link

    The thing is, ARM is almost fully ready on the Windows side of the coin. Windows on ARM appears to be working well, x64 emulation is up and running, increasingly more programs are getting ARM compiles, and Microsoft's VS and compilers now have ARM on an equal footing with x64. So, if Intel or AMD decided to make an ARM CPU, people could go over quite easily, similar to the early days of x64.
  • FWhitTrampoline - Thursday, September 21, 2023 - link

    Edit: royalist/encumberments to royalty/encumberments!

    And Firefox's Spell Checker is so bad that The Mozilla Foundation should be stripped of their Tax Exempt status until they fully comply and fix that.
  • Bluetooth - Saturday, September 23, 2023 - link

    Intel has proposed X86-S ISA, to get rid of all the legacy code and boot directly into 64 bit, (the proposal is available on their website). But I don't know, if this is enough to allow them to build wider decoders to improve the single thread performance.
  • GeoffreyA - Saturday, September 23, 2023 - link

    I took a look at x86-S and it certainly would be welcome, getting rid of unnecessary legacy features. From my understanding, I don't think it would help to build wider decoders. The problem in x86 is that the length of each instruction varies and is not known beforehand. At execution time, length has got to be worked out in predecode, and I imagine this constrains how much can be sent through the decoders, as well as taking up a great deal of power. In the fixed-width ISA, it is trivial to know where each instruction starts and send them off to the decoders in mass. A bit like comparing a linked list with an array.
  • FWhitTrampoline - Tuesday, September 19, 2023 - link

    up to clocked 2Ghz+ should read: Clocked up to.
  • Bluetooth - Saturday, September 23, 2023 - link

    He may overstate the power, but don't diss his remark by only focusing on that error, as the mobile processor is running at much lower frequencies.
  • tipoo - Tuesday, September 19, 2023 - link

    It sounds like you carried forward 3W from 2008. The A17 Pro draws more power than ever.

    https://www.youtube.com/watch?v=TX_RQpMUNx0
  • StevoLincolnite - Tuesday, September 19, 2023 - link

    He is nothing but a liar.
