ASUS Zenbook 14 OLED UX3405MA: AI Performance

As technology progresses at a breakneck pace, so do the demands of modern applications and workloads. As artificial intelligence (AI) and machine learning (ML) become increasingly intertwined with our daily computational tasks, it's paramount that our reviews evolve in tandem. To this end, we have added AI and inferencing benchmarks to our CPU test suite for 2024.

Traditionally, CPU benchmarks have focused on various tasks, from arithmetic calculations to multimedia processing. However, with AI algorithms now driving features within some applications, from voice recognition to real-time data analysis, it's crucial to understand how modern processors handle these specific workloads. This is where our newly incorporated benchmarks come into play.

Given that makers such as AMD (with Ryzen AI) and Intel (with the Meteor Lake mobile platform and its aptly named Intel AI Boost engine) now build AI-focused hardware into their silicon, AI and inferencing benchmarks will be a mainstay in our test suite as we go further into 2024 and beyond.

The Intel Core Ultra 7 155H includes a dedicated Neural Processing Unit (NPU) embedded within the SoC tile, which is capable of providing up to 11 TeraOPS (TOPS) of matrix math computational throughput. You can find more architectural information on Intel's NPU in our Meteor Lake architectural deep dive. While AMD's and Intel's AI engines within their Phoenix and Meteor Lake architectures are much simpler than dedicated AI inferencing hardware, these NPUs are designed to provide a high-efficiency processor for handling light-to-modest AI workloads, rather than a boost to overall inferencing performance. For all of these mobile chips, the GPU is typically the next step up for truly heavy AI workloads that need maximum performance.

[Graphs: TensorFlow 2.12 CPU inference – VGG-16 at batch sizes 16 and 64; GoogLeNet at batch sizes 16, 64, and 256]

Looking at performance in the TensorFlow 2.12 inferencing benchmarks from our CPU suite, using both the VGG-16 and GoogLeNet models, the Intel Core Ultra 7 155H is no match for any of the AMD Phoenix-based chips.
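For readers who want to get a feel for this kind of test at home, a simple timing loop captures the gist of a CPU-bound TensorFlow inference benchmark. This is only a minimal sketch rather than the harness behind our charts; the batch sizes, run count, and dummy input data are assumptions, and GoogLeNet is omitted since it isn't bundled with Keras.

```python
# Minimal sketch of a CPU-only inference throughput test, roughly in the
# spirit of the charts above. Not our actual benchmark harness.
import time
import numpy as np
import tensorflow as tf

def images_per_second(model, batch_size, runs=20):
    data = np.random.rand(batch_size, 224, 224, 3).astype("float32")  # dummy input images
    model.predict(data, verbose=0)                 # warm-up pass: graph build, buffer allocation
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(data, verbose=0)
    elapsed = time.perf_counter() - start
    return (runs * batch_size) / elapsed           # throughput in images per second

with tf.device("/CPU:0"):                          # pin execution to the CPU, as in the (CPU) results
    vgg16 = tf.keras.applications.VGG16(weights=None)   # weights=None skips the large download
    for bs in (16, 64):
        print(f"VGG-16, batch {bs}: {images_per_second(vgg16, bs):.1f} img/s")
```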

[Graphs: UL Procyon AI Computer Vision (int) – MobileNet V3, Inception V4, ResNet 50]

Meanwhile, looking at inference performance on the hardware actually optimized for it – NPUs and, to a lesser extent, GPUs – UL's Procyon AI Computer Vision benchmark collection supports multiple execution backends, allowing it to be run on CPUs, GPUs, or NPUs. For Intel chips we're using the Intel OpenVINO backend, which enables access to Intel's NPU. AMD, meanwhile, does not offer a custom execution backend for this test; Windows ML is available as a fallback to access the CPU and the GPU, but it cannot reach AMD's NPU.
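To give a sense of what choosing an execution backend looks like in practice, the snippet below shows how OpenVINO enumerates the available devices and compiles the same model for each. This is a rough illustration rather than how Procyon itself is driven; the model file name is a placeholder, and the "NPU" device string assumes a recent OpenVINO release with the Meteor Lake NPU driver installed.

```python
# Rough illustration of OpenVINO device selection; not how Procyon is actually driven.
from openvino.runtime import Core

core = Core()
print("Available devices:", core.available_devices)   # e.g. ['CPU', 'GPU', 'NPU'] on Meteor Lake

model = core.read_model("resnet50_int8.xml")           # placeholder: an INT8 OpenVINO IR model
for device in ("CPU", "GPU", "NPU"):
    if device not in core.available_devices:
        continue
    compiled = core.compile_model(model, device_name=device)
    request = compiled.create_infer_request()
    # ...feed identical inputs and time request.infer() on each device to compare throughput
```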

With Meteor Lake's NPU active and running the INT8 version of the Procyon Computer Vision benchmarks, in Inception V4 and ResNet 50 we saw good gains in AI inferencing performance compared to using the CPU alone. The Meteor Lake Arc Xe LPG graphics also did well, although the NPU is designed to be more power efficient with these workloads, and more often than not it significantly outperforms the GPU at the same time.
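Those INT8 "(int)" variants are not separate benchmarks so much as quantized versions of the same networks, which is the precision these NPUs are primarily built to run. As a rough illustration of how a float model gets an INT8 counterpart, below is a minimal post-training quantization sketch using TensorFlow Lite; this is not the toolchain Procyon uses, and the model and calibration data are placeholders.

```python
# Minimal post-training INT8 quantization sketch with TensorFlow Lite.
# Not the toolchain Procyon uses; model and calibration data are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV3Small(weights=None)   # stand-in float model

def representative_data():
    # A real calibration set would use sample images; random data is a stand-in.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]            # enable quantization
converter.representative_dataset = representative_data          # calibration samples for INT8 ranges
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
int8_model = converter.convert()                                # serialized INT8 model bytes
```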

This is just one test in a growing universe of NPU-accelerated applications. But it helps to illustrate why hardware manufacturers are so interested in NPUs: they deliver a lot of performance for the power, at least as long as a workload is restricted enough that it can be run on an NPU.

That all being said, even with the dedicated Intel AI Boost NPU within the SoC tile, the use cases remain very specific. When we tried generative AI within Adobe Photoshop using Neural Filters, Adobe relied far more on the CPU than on the GPU or the NPU, which shows that just because a task is related to generative AI or inferencing, the NPU isn't guaranteed to be used. These are still the very early days for NPUs, and even benchmarking them on an active task remains an interesting challenge.


69 Comments


  • Gavin Bonshor - Friday, April 12, 2024 - link

I refer to it as a major gain; the other victories weren't huge. That was my point.
  • sjkpublic@gmail.com - Thursday, April 11, 2024 - link

Comparing the 155H to the 7940HS is apples to oranges. A better comparison would be the 185H.
  • Bigos - Thursday, April 11, 2024 - link

What happened to SPECint rate-N 502.gcc_r results? I do not believe desktop Raptor Lake is 16-40x faster than the mobile CPUs...
  • SarahKerrigan - Thursday, April 11, 2024 - link

No, something is clearly wrong there. I deal with SPEC a lot for work and that's an abnormally low result unless it was actually being run in rate-1 mode (ie, someone forgot to properly set the number of copies.)
  • Gavin Bonshor - Friday, April 12, 2024 - link

I am currently investigating this. Thank you for highlighting it. I have no idea how I missed this. I can only apologize.
  • mode_13h - Monday, April 15, 2024 - link

    While we're talking about SPEC2017 scores, I'd like to add that I really miss the way your reviews used to feature the overall cumulative SPECfp and SPECint scores. It was useful in comparing overall performance, both of the systems included in your review and those from other reviews.

    To see what I mean, check out the bottom of this page: https://www.anandtech.com/show/17047/the-intel-12t...
  • Ryan Smith - Monday, April 15, 2024 - link

That's helpful feedback. It's a bit late to add it to this article, but that's definitely something I'll keep in mind for the next one. Thanks!
  • mode_13h - Wednesday, April 17, 2024 - link

    You're quite welcome!

    BTW, I assume there's a "standard" way that SPEC computes those cumulative scores. Might want to look up how they do it, if the benchmark doesn't just compute them for you. If you come up with your own way of combining them, your cumulative scores probably won't be comparable to anyone else's.
  • Ryan Smith - Wednesday, April 17, 2024 - link

    "BTW, I assume there's a "standard" way that SPEC computes those cumulative scores."

    Yes, there is. We don't run all of the SPEC member tests for technical reasons, so there is some added complexity there.
  • mczak - Thursday, April 11, 2024 - link

Obviously, the cluster topology description is wrong in the core-to-core latency measurements section, along with the hilarious conclusion that the first two E-cores have only 5 ns latency to each other. (The first two threads of course belong to a P-core, albeit I have no idea why the core enumeration is apparently 1P-8E-5P-2LPE.)
