NVIDIA's Carmel CPU Core - SPEC2006 Speed

While the Xavier’s vision and machine processing capabilities are definitely interesting, its use-cases will be largely out-of-scope for the average AnandTech reader. One of the aspects of the chip that I was personally more interested in was NVIDIA’s newest generation Carmel CPU cores, as they represent one of the rare custom Arm CPU efforts in the industry.

Memory Latency

Before going into the SPEC2006 results, I wanted to see how NVIDIA’s memory subsystem compares against some comparable platforms in the Arm space.

In the first logarithmic latency graph, we see the exaggerated latency curves which make it easy to determine the various cache hierarchy levels of the systems. As NVIDIA advertises, we see the 64KB L1D cache of the Carmel cores. What is interesting here is that NVIDIA is able to achieve quite a high performance L1 implementation with just under 1ns access times, representing a 2-cycle access, which is quite uncommon. The second level of the hierarchy is the L2, which extends to the 2MB depth, after which we see the 4MB L3 cache. The L3 cache here looks to be of a non-uniform-access design, as its latency steadily rises the further we go.
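As a rough sanity check on the 2-cycle figure, a cycle count converts to time by dividing by the clock frequency. The sketch below assumes a Carmel clock of roughly 2.26GHz, which is not stated in the text above and is used purely for illustration; the function and constant names are likewise hypothetical.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency expressed in core cycles to nanoseconds."""
    # One cycle lasts 1 / clock_ghz nanoseconds.
    return cycles / clock_ghz


# Assumption (not given in the article): the cores run at about 2.26GHz.
ASSUMED_CARMEL_CLOCK_GHZ = 2.26

l1_latency_ns = cycles_to_ns(2, ASSUMED_CARMEL_CLOCK_GHZ)
print(f"2-cycle L1 access at {ASSUMED_CARMEL_CLOCK_GHZ}GHz: {l1_latency_ns:.2f}ns")
```

Under that assumed clock, a 2-cycle access lands a little below 1ns, which is in line with the measured L1 latency.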

Switching back to a linear graph, NVIDIA does have a latency advantage over Arm’s Cortex-A76 and the DSU L3 of the Kirin 980; however, it loses out at deeper test depths and in latencies at the memory controller level. The Xavier SoC comes with 8x 32bit (256bit total) LPDDR4X memory controller channels, representing a peak bandwidth of 137GB/s, significantly higher than the 64 or 128bit interfaces on the Kirin 980 or the Apple A12X. Apple overall still has an enormous memory latency advantage over the competition, as its massive 8MB L2 cache as well as the 8MB SLC (System level cache) allow for significantly lower latencies across all test depths.
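The 137GB/s figure follows from the bus width multiplied by the transfer rate. The sketch below assumes an LPDDR4X data rate of 4266MT/s, which the text above does not state; the function name is hypothetical, and applying the same rate to the 64bit and 128bit widths only illustrates how bandwidth scales with width rather than giving measured figures for the other chips.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mtps: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * transfer_rate_mtps / 1000


# Assumption (not given in the article): LPDDR4X running at 4266MT/s.
ASSUMED_RATE_MTPS = 4266

for width in (64, 128, 256):
    bandwidth = peak_bandwidth_gbs(width, ASSUMED_RATE_MTPS)
    print(f"{width}bit interface: {bandwidth:.1f}GB/s")
```

With that assumed rate, the 256bit case works out to about 136.5GB/s, which rounds to the quoted 137GB/s.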

SPEC2006 Speed Results

A rarity whenever we're looking at Arm SoCs and products built around them, NVIDIA’s Jetson AGX comes with a custom image for Ubuntu Linux (18.04 LTS). On one hand, including a Linux OS gives us a lot of flexibility in terms of test platform tools; but on the other hand, it also shows the relative immaturity of Arm on Linux. One of the more regrettable aspects of Arm on Linux is browser performance; to date the available browsers are still lacking optimised JavaScript JIT engines, resulting in performance that is far worse than any commodity mobile device.

While we can’t really test our usual web workloads, we do have the flexibility of Linux to just simply compile whatever we want. In this case we’re continuing our use of SPEC2006 as we have a relatively established set of figures on all relevant competing ARMv8 cores.

To best mimic the setup of the iOS and Android harnesses, we chose the Clang 8.0.0 compiler. To keep things simple, we didn’t use any special flags other than -Ofast and a scheduling model targeting the Cortex-A53 (it performed better overall than no model or A57 targets). We also have to remind readers that SPEC2006 has been retired in favour of SPEC2017, and that the results published here are not officially submitted scores, but rather internal figures that we have to describe as estimates.

The power efficiency figures presented for the AGX, much like all other mobile platforms, represent the active workload power usage of the system. This means we’re measuring the total system power under a workload, and subtracting the idle power of the system under similar circumstances. The Jetson AGX has a relatively high idle power consumption of 8.92W in this scenario, much of which can simply be attributed to a board that is not particularly power optimised, as well as the fact that we’re actively outputting via HDMI while having the board connected to GbE.
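The subtraction described above can be sketched as follows. Only the 8.92W idle figure comes from the text; the load measurement and runtime are made-up placeholders, the function names are hypothetical, and deriving energy as active power times runtime is an assumption about how such figures could be turned into an efficiency metric, not a description of the actual test harness.

```python
IDLE_POWER_W = 8.92  # idle figure quoted for the Jetson AGX in this scenario


def active_power_w(total_power_w: float, idle_power_w: float = IDLE_POWER_W) -> float:
    """Active workload power: total system power minus idle power."""
    return total_power_w - idle_power_w


def energy_joules(active_w: float, runtime_s: float) -> float:
    """Energy attributed to the workload, assuming constant active power."""
    return active_w * runtime_s


# Hypothetical measurement, for illustration only (not from the article).
total_under_load_w = 12.50
runtime_s = 600.0

active = active_power_w(total_under_load_w)
energy = energy_joules(active, runtime_s)
print(f"active power: {active:.2f}W, workload energy: {energy:.0f}J")
```

Because the idle figure is subtracted, a high idle draw from the board does not by itself inflate the active power number, although any load-dependent inefficiency in power delivery still would.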

In the integer workloads, the Carmel CPU cores' performance is quite average. Overall, the performance across most workloads is extremely similar to that of Arm’s Cortex-A75 inside the Snapdragon 845, with the only outlier being 462.libquantum which showcases larger gains due to Xavier’s increased memory bandwidth.

In terms of power and efficiency, the NVIDIA Carmel cores again aren’t quite the best. The fact that the Xavier module is targeted at a totally different industry means that its power delivery possibly isn’t quite as power optimised as on a mobile device. We also must not forget that the Xavier has an inherent technical disadvantage of being manufactured on a 12FFN TSMC process node, which should be lagging behind Samsung’s 10LPP process of the Exynos 9810 and the Snapdragon 845, and most certainly represents a major disadvantage against the newer 7nm Kirin 980 and Apple A12.

On the floating point benchmarks, Xavier fares better overall because some of the benchmarks are characterised by their sensitivity to the memory subsystem; this is most obvious in 433.milc. 470.lbm also sees the Carmel cores perform relatively well. In the other workloads, however, we again see Xavier having trouble differentiating itself much from the performance of a Cortex-A75.

Here’s a wider performance comparison across SPEC2006 workloads among the most recent and important ARMv8 CPU microarchitectures:

Overall, NVIDIA’s Carmel core seems like a big step up for NVIDIA and their in-house microarchitecture. However, when compared against the most recent cores from the competition, we see the new core having trouble really distinguishing itself in terms of performance. Power efficiency of the AGX also lags behind; however, this is something that was to be expected given the fact that the Jetson AGX is not a power optimised platform, on top of the fact that the chip’s 12FFN manufacturing process is a generation or two behind the latest mobile chips.

The one aspect of NVIDIA’s Carmel cores which we can’t quantify is their features: this is a shipping CPU with ASIL-C functional safety features that we have in our hands today. The only competition in this regard would be Arm’s new Cortex-A76AE, which we won’t see in silicon for at least another year. When taking this into account, it could possibly make sense for NVIDIA to have gone with its in-house designs; however, as Arm starts to offer more designs for this space, I’m having a bit of a hard time seeing a path forward in following generations after Xavier, as competitively, the Carmel cores don’t position themselves too well.


51 Comments


  • xype - Friday, January 4, 2019 - link

    AnandTech is my reminder to turn the ad blocker back on if I turned it off for some reason. It’s insane how big of an improvement in experience it is to block ads on AnandTech.
  • Cellar Door - Friday, January 4, 2019 - link

    It is just a matter of time before we will get a message 'turn off your adblocker to proceed' - at that point I will abandon this site. For now, ublock origin keeps this site in check for me.
  • DanNeely - Friday, January 4, 2019 - link

    FYI, 99% of the time I've found I could block the notice complaining about having blocked various 3rd party malware distribution domains and still read the site with my crap blockers running.
  • TheinsanegamerN - Friday, January 4, 2019 - link

    Or just use the anti ad blocker blocker in ublock origin.
  • HollyDOL - Friday, January 4, 2019 - link

    I have to admit, AT taught me to install adblock, the level of ad annoyance climbed too high for me.
    I am still willing to pay a sub for a spam-free AT access.
  • linuxgeex - Friday, November 8, 2019 - link

    It was THG that got me using AdBlock, but these days I turn off AdBlock on most of the sites I frequent and instead rely on ScriptSafe and Stylus to selectively disable the cruft. It's a little more work for me, but it allows sites I care about to still get revenue from the less annoying ad content, and I cross my fingers that they will learn to insert less annoying ads. Animated = blocked. Sound = blocked. Video = blocked. Causes content to jump around while loading = blocked. Inserts ads that look like navigation features = blocked (I'm looking at You, Google)
  • Ryan Smith - Friday, January 4, 2019 - link

    "Why are there video ads automatically playing on each one of the Anandtech pages?"

    Our publisher (Future) has decided that they want to have this ad unit on every page. Unfortunately there's not much more I can say than that; it's their call.
  • thesavvymage - Friday, January 4, 2019 - link

    :(
  • thesavvymage - Friday, January 4, 2019 - link

    Could you at least speak to them on ad appropriateness? Mine are the usual low effort clickbait spam ads, or "The One Thing All Cheaters Have In Common" and "Seattle: Cable Companies are furious over this tiny device".

    Like I understand your publishers have to advertise, but crappy advertising like this gets the adblock treatment, point blank. It's an extremely frustrating experience for what is supposed to be a professional site.
  • Ryan Smith - Saturday, January 5, 2019 - link

    "Could you at least speak to them on ad appropriateness?"

    It's something we discuss on a regular basis. Like any other ad-supported operation we're largely at the whims of the overall advertising market: who is willing to buy ads and at what price. On the whole, advertisers are being very cautious right now, especially with written publications.

    Future's size helps a lot with this, since they're a top publisher and can move some very large deals. Not that it's a dire situation or anything nearly like that, but continual erosion in ad rates makes it difficult to get any ads rolled back.
