NVIDIA's Carmel CPU Core - SPEC2006 Speed

While the Xavier’s vision and machine learning processing capabilities are definitely interesting, its use cases will be largely out-of-scope for the average AnandTech reader. One of the aspects of the chip that I was personally more interested in is NVIDIA’s newest generation Carmel CPU cores, as they represent one of the rare custom Arm CPU efforts in the industry.

Memory Latency

Before going into the SPEC2006 results, I wanted to see how NVIDIA’s memory subsystem compares against some comparable platform in the Arm space.

In the first logarithmic latency graph, we see the exaggerated latency curves which make it easy to determine the various cache hierarchy levels of the systems. As NVIDIA advertises, we see the 64KB L1D cache of the Carmel cores. What is interesting here is that NVIDIA has achieved quite a high-performance L1 implementation, with just under 1ns access times representing a 2-cycle access, which is quite uncommon. The next level in the hierarchy is the L2 cache, which extends to a depth of 2MB, after which we see the 4MB L3 cache. The L3 cache here looks to be of a non-uniform-access design, as its latency steadily rises the deeper we test.
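
For readers curious how such latency curves are typically produced, the following is a minimal pointer-chase sketch in C (our own illustration of the general technique, not the actual tool used for these measurements). Each element of a buffer stores the index of the next element to load, arranged as a random cycle so that every load depends on the previous one and the prefetchers can’t hide the latency; the average time per access is then reported for each buffer size.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ITERS (1 << 24)   /* dependent loads measured per buffer size */

int main(void)
{
    /* Sweep buffer sizes from 4KB (well inside L1) to 64MB (well into DRAM). */
    for (size_t bytes = 4096; bytes <= (64u << 20); bytes *= 2) {
        size_t n = bytes / sizeof(size_t);
        size_t *next = malloc(n * sizeof(size_t));
        if (!next)
            return 1;

        /* Build a single random cycle over all elements (Sattolo's algorithm),
         * so the chase touches the whole buffer in an unpredictable order. */
        for (size_t i = 0; i < n; i++)
            next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = rand() % i;
            size_t tmp = next[i];
            next[i] = next[j];
            next[j] = tmp;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t p = 0;
        for (int i = 0; i < ITERS; i++)
            p = next[p];          /* every load depends on the previous one */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        /* Printing 'p' keeps the compiler from optimising the chase away. */
        printf("%8zu KB : %6.2f ns/access (%zu)\n", bytes / 1024, ns / ITERS, p);
        free(next);
    }
    return 0;
}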

Switching to a linear graph, NVIDIA does have a latency advantage over Arm’s Cortex-A76 and the DSU L3 of the Kirin 980; however, it loses out at deeper test depths, where latency is dominated by the memory controllers. The Xavier SoC comes with 8x 32-bit (256-bit total) LPDDR4X memory controller channels, representing a peak bandwidth of 137GB/s, significantly higher than the 64-bit or 128-bit interfaces of the Kirin 980 or the Apple A12X. Apple overall still has an enormous memory latency advantage over the competition, as its massive 8MB L2 cache as well as the 8MB SLC (system level cache) allow for significantly lower latencies across all test depths.
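
As a quick sanity check on that 137GB/s figure (assuming LPDDR4X-4266, i.e. a 2133MHz clock with two transfers per cycle, which is what the quoted number implies):

\[ \frac{256\,\text{bit}}{8\,\text{bit/byte}} \times 4266\,\text{MT/s} = 32\,\text{B} \times 4.266\,\text{GT/s} \approx 136.5\,\text{GB/s} \]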

SPEC2006 Speed Results

A rarity whenever we're looking at Arm SoCs and products built around them, NVIDIA’s Jetson AGX comes with a custom Ubuntu Linux image (18.04 LTS). On one hand, including a Linux OS gives us a lot of flexibility in terms of test platform tools; on the other hand, it also shows the relative immaturity of Arm on Linux. One of the more regrettable aspects of Arm on Linux is browser performance; to date, the available browsers still lack optimised JavaScript JIT engines, resulting in performance that is far worse than that of any commodity mobile device.

While we can’t really test our usual web workloads, we do have the flexibility under Linux to simply compile whatever we want. In this case we’re continuing our use of SPEC2006, as we have a relatively established set of figures for all relevant competing ARMv8 cores.

To best mimic the setup of the iOS and Android harnesses, we chose the Clang 8.0.0 compiler. To keep things simple, we didn’t use any special flags other than -Ofast and a scheduling model targeting the Cortex-A53 (it performed better overall than no model or an A57 target). We also have to remind readers that SPEC2006 has been retired in favour of SPEC2017, and that the results published here are not officially submitted scores, but rather internal figures that we have to describe as estimates.

The power efficiency figures presented for the AGX, much like those of all other mobile platforms, represent the active workload power usage of the system. This means we’re measuring the total system power under a workload and subtracting the idle power of the system under similar circumstances. The Jetson AGX has a relatively high idle power consumption of 8.92W in this scenario, much of which can simply be attributed to a board that isn’t optimised for low power, as well as the fact that we’re actively outputting via HDMI while having the board connected to GbE.
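
In formula form (simply restating the methodology above; the runtime term is our notation, not a new measurement), the active power for a workload is the measured load power minus the 8.92W idle figure, and the corresponding energy figure follows from multiplying the average active power by the benchmark’s runtime:

\[ P_{\text{active}} = P_{\text{load}} - P_{\text{idle}}, \qquad E_{\text{workload}} = P_{\text{active}} \times t_{\text{runtime}} \]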

In the integer workloads, the Carmel CPU cores' performance is quite average. Overall, the performance across most workloads is extremely similar to that of Arm’s Cortex-A75 inside the Snapdragon 845, with the only outlier being 462.libquantum, which showcases larger gains thanks to Xavier’s increased memory bandwidth.

In terms of power and efficiency, the NVIDIA Carmel cores again aren’t quite the best. The fact that the Xavier module is targeted at a totally different industry means that its power delivery possibly isn’t as optimised as on a mobile device. We also must not forget that Xavier has an inherent technical disadvantage in being manufactured on TSMC’s 12FFN process node, which should be lagging behind the Samsung 10LPP process of the Exynos 9810 and the Snapdragon 845, and most certainly represents a major disadvantage against the newer 7nm Kirin 980 and Apple A12.

On the floating point benchmarks, Xavier fares better overall, as some of the workloads are characterised by their sensitivity to the memory subsystem; this is most obvious in 433.milc. 470.lbm also sees the Carmel cores perform relatively well. In the other workloads, however, we again see Xavier having trouble differentiating itself much from the performance of a Cortex-A75.

Here’s a wider performance comparison across SPEC2006 workloads among the most recent and important ARMv8 CPU microarchitectures:

Overall, NVIDIA’s Carmel core seems like a big step up for NVIDIA and their in-house microarchitecture. However, when compared against the most recent cores from the competition, the new core has trouble really distinguishing itself in terms of performance. Power efficiency of the AGX also lags behind; however, this was to be expected, given that the Jetson AGX is not a power-optimised platform and that the chip’s 12FFN manufacturing process is a generation or two behind the latest mobile chips.

The one aspect of NVIDIA’s Carmel cores that we can’t quantify is their feature set: this is a shipping CPU with ASIL-C functional safety features that we have in our hands today. The only competition in this regard would be Arm’s new Cortex-A76AE, which we won’t see in silicon for at least another year or more. Taking this into account, it may well have made sense for NVIDIA to go with its in-house design; however, as Arm starts to offer more designs for this space, I’m having a bit of a hard time seeing a path forward for the generations following Xavier, as the Carmel cores don’t position themselves too well competitively.

51 Comments

  • CheapSushi - Friday, January 4, 2019 - link

    This is very minor but I'm surprised the ports/connectors aren't more secure on something meant to be in a car. I would expect cables to be screwed in like classic DVI or twist locked in or some other implementation. I feel like the vibration of the car, or even a minor accident, could loosen the cables. Or maybe I got the wrong impression from the kit.
  • KateH - Friday, January 4, 2019 - link

    afaik the generic breakout boards included in dev kits are just for the "dev" part: development and one-offs. A final design would probably use a custom breakout board with just the interfaces needed, in a more rugged form factor that's integrated into the product.
  • mode_13h - Friday, January 4, 2019 - link

    Would've loved to see a Denver2 (Tegra TX2) in that comparison. According to this, they're actually faster than Carmel:

    https://openbenchmarking.org/result/1809258-RA-180...

    Note that the benchmark results named "TX2-6cores-enabled-gcc-5.4.0" refer to the fact that TX2 had the Denver2 cores disabled by default! Out of the box, it just ran everything on the quad-A57 cluster.
  • edatech - Saturday, January 5, 2019 - link

    The same results also say the TX2 is running at a higher frequency (TX2 @ 2.04GHz while Jetson Xavier @ 1.19GHz), so it's not quite an apples-to-apples comparison.
  • mode_13h - Saturday, January 5, 2019 - link

    I'm not sure how much to read into that number. Would they really run the A57 and Denver2 cores at the same frequency? Is the Xavier figure really the boost, and not just the base clock?

    There's also this (newer) result:

    https://openbenchmarking.org/result/1812170-SK-180...

    Again, my point is that I wish the article had looked at Denver2. It sounds like an interesting, if mysterious core.

    Jetson TX2 boards are still available - and at much lower prices than Xavier. So, it's still a worthwhile and relevant question how it compares - especially for those not needing Xavier's Volta and Tensor cores.
  • LinuxDevice - Monday, January 7, 2019 - link

    It isn't so much that the cores are "disabled" (which to me would imply something not intended to be turned on) as it is offering multiple power consumption profiles. The whole Jetson market started with the intent to offer it as an OEM reference board, but the reference boards were rather good all by themselves and ended up being a new market. The TX2 Denver cores are simple to turn off or on...but the default is off.

    Xavier has something similar with the "nvpmodel" tool for switching around various profiles. To see full performance you need to first run "sudo nvpmodel -m 0", and then max out the clocks with the "~nvidia/jetson_clocks.sh" script.
  • SanX - Saturday, January 5, 2019 - link

    Change the ad publisher ASAP. The most stupid and insulting ads are found only at AT. It smells dirty and cheap. Yuck...

    I don't get such a bad impression from YouTube, for example; talk to the Google guys.
  • TheJian - Sunday, January 6, 2019 - link

    Double the gpu side at 7nm and throw it in a 100-250W box the size of an xbox/ps and I'm in for a new game console. Was hoping they'd re-enter the mobile space with an Intel/Qcom/Samsung modem at 10 or 7nm since they can be included easily without the same watt issues as before. NV doesn't need their own modem today (please come back, mobile gaming is getting great!). We need NV gpus in mobile :)

    Also, I refuse to buy old tech in your android tv system. Upgrade the soc, or no sale. COMPETE with msft/sony dang it! It's already a great streamer, but you need the gaming side UP and it needs to be a 150w+ box today or just another streamer (sonly msft are going 250w+ in their next versions probably) or why not just buy a $35-50 roku? Sure you can turn off most of it while streaming (or playing bluray), but power needs to be there for the gaming side. The soc is the only thing holding me back from AndroidTV box from NV for years now. I wanted 2 socs in it when it first launched, then they shrunk it and gave no more power. You're turning me off NV, you should be turning me ON...LOL. I have no desire for another msft/sony console, but I'd buy a HIGH WATT android model. None of this 15-25w crap is worth it. Roku take note too, as in add a gaming soc (call NV!) and gamepad support or no more sales to anyone in our family (we're going HTPC, because streamers suck as anything but streaming). We need multi-function at this point or you don't make it to our living room. HTPC fits everything I guess (thus we're building 3...LOL). Streaming, gaming, ripping, well, heck, EVERYTHING in one box with mass storage inside too. ShieldTV units will sell a LOT better (roku too) if you get better gaming in them. Angry birds alone doesn't count Roku!

    A 7nm Tegra without all the crap for cars, etc, would be VERY potent. You have the money to make a great gaming box today. Move it into mobile (a single soc one of course) if the tech takes off by adding a modem. Either way, ShieldTV needs an soc upgrade ASAP. Not looking for RTX type stuff here, just a great general android gaming machine that streams. You have to start here to make a gaming PC on ARM stuff at some point. Use cheap machines to make the bigger ones once entrenched. Make sure it can take a discrete NV card at some point as an upgrade (see what I did there, selling more gpu cards, with no wintel needed). At some point it turns into a full PC :)

    That said, I can’t wait for my first car that will drive me around while drinking ;) Designated drivers for all  Oh and, our tests are completely invalidated by testing a 12nm vs. 10 & 7nm (and outputting with Ethernet hooked up), but but but….Look at our dumb benchmarks. Note also, cars want a MUCH longer cycle than pc’s or worse, mobile devices. These people don’t upgrade their soc yearly (more like 5-7 tops). So a box you plop in with most of the software done, is great for many car models. We are talking ~81-90mil sold yearly globally (depending on who you believe). Even 10mil of those at $100 a box would be a great add to your bottom line and I’m guessing they get far more than that, but you have to make a point at some price here ;) We are talking 1B even if it’s just $100 Net INCOME per box. That would move NV’s stock price for sure. Something tells me it’s 30%+ margins (I’d guess 50%+ really), but I could be wrong. Has anyone else done this job for less than $1500? Also note, as more countries raise incomes, more cars will be sold yearly.
    https://www.statista.com/statistics/200002/interna...
    Just as you see here, and the world still needs more cars (heck roads in some places still needed…LOL). Growth. There is room for more than one player clearly for years. Until L5 becomes a commodity there is good money to be had by multiple companies in this space IMHO. Oh and 35mil of those are cars are EU/USA (17.5ea for both). Again, much growth to come as more places get roads/cars, and how many of them have driverless so far? Not many.

    At $1500 or under anyone can add this on to a car, as that is cheaper than the $7500 subsidy they have to add to an electric car just to even JOKE about making a dime on them right? And this would NOT be a subsidy. Electric cars are for the rich or stupid. I don’t remember voting for $7500 per car giveaways to make green people happy either! Please KILL THIS ASAP TRUMP! That is 1.5B per car maker (200K cars can be subsidized by each maker). I want a freaking WALL NOW not renewable subsidy crap for products that can’t make money on their own and I am UN-interested in completely as long as gas is available cheaper overall! Screw 5B, tell them $25B or the govt shuts down completely (still a joke, most stays open anyway) for your next 2yrs. Let them pound sand in discretionary spending. :) Only NON-ESSENTIAL people even go home. Well heck, why do I need a NON-essential employee anyway in govt? Let private sector take on all their crap, or just leave it state to state, where they are much better able to handle problems they are versed in.

    “The one aspect which we can’t quantize NVIDIA’s Carmel cores is its features: This is a shipping CPU with ASIL-C functional safety features that we have in our hands today. The only competition in this regard would be Arm’s new Cortex A76AE, which we won’t see in silicon for at least another year or more.”
    “the Carmel cores don’t position themselves too well.”

    Er, uh, would you be saying that at 7nm vs. 7nm?? I’m guessing NV could amp the speeds a bit if they simply took the EXACT core and 7nm’d it right (a new verb?)? Can’t see a way forward? Nobody will have its safety features for a year in the segment it targets DIRECTLY, but you can’t see a way forward?...LOL. Never pass up a chance for an AMD portal site to knock NV. Pause for a sec, while I test it with my 2006 tests that well, aren’t even the target market…Jeez. Possibly make sense to go IN-HOUSE? So you’re saying on the one hand that there was NO OTHER CHOICE for a YEAR, but it’s only POSSIBLY a good idea they went in house? I think you mean, it was ONLY POSSIBLE to go in-house, and thus a BRILLIANT decision to go IN HOUSE, and I can see how this chip really goes FORWARD. There, fixed it. Intel keeps offering GPU designs, and they keep failing correct (adopting AMD tech even)? You go in house until someone beats you at your own game, just ask apple. No reason to give a middle man money unless he is soundly beating you, or you are not making profit as is.

    So it’s really good at what it was designed to do, and is a plop in component for cars for ~$1000-1500 with software done pretty much for most? But NV has challenges going forward making money on it…LOL. Last I checked NV has most of the car market sewn up (er, signed up? Pays to be early in many things). Cars are kind of like Cuda. It took ~7yrs before that really took off, but look at it now. Owning everything else, and OpenCL isn’t even on the playing field as AMD can’t afford to FORCE it onto the field alone.

    “But for companies looking to setup more complex systems requiring heavy vision processing, or actually deploying the AGX module in autonomous applications (no spellchecker before hitting the website?) for robotics or industrial uses, then Xavier looks quite interesting and is definitely a more approachable and open platform than what tends to exist from competing products.”

    Translation: When you use it as it was designed, nobody has a competing offering…LOL. You could have just put the last P as the whole article and forgot the rest. Pencils work great as a writing tool, but when we try to run games on them, well, they kind of suck. I’m shocked. Pencils can’t run crysis? WTH?? I want my money back…LOL. Don’t the rest of the guys have the challenge, of trying to be MORE OPEN and APPROACHABLE? Your article is backwards. You have to dethrone the king, not the other way around. Where will NV be in a year when the competition finally gets something right? How entrenched will they be by then? Cars won’t switch on a dime like I will for my next vid card/cpu…LOL. They started this affair mid 2015 or so, and it will pay off 2021+ as everyone wants a autonomous cars on the road by then.

    https://www.thestreet.com/investing/stocks/nvidia-...
    https://finance.yahoo.com/news/nvidia-soars-ai-mar...
    “we believe that the company is well poised to grow in the driverless vehicle technology space”
    Arm makes under 500m (under 400 actually), NV makes how much (9x-10x this?)? Good luck. I do not believe off the shelf will beat a chip designed for auto, so someone will have to CUSTOM their way to victory over NV here IMHO.
    https://www.forbes.com/sites/moorinsights/2018/09/...
    BMW chose Intel, Tesla switches (and crashes, ½ a million sold so far?? Who cares), but I wonder for how long. I guess it depends on how much work they both want to do, or just plop in Nvidia solutions. I’ll also venture to guess Tesla did it merely to NOT be the same as Volvo, Toyota etc who went with NV. Can’t really claim your different using what everyone else uses. MOOR Insights isn’t wrong much. They have covered L2-L4 and even have built the chip to handle L5 (2 socs in Pegasus). How much further forward do you need to go? It seems they’re set for a bit, though I’m sure they won’t sit idle while everyone else catches up (they don’t have a history of that). TL:DR? It's a sunday morning, I had time and can type 60wpm...LOL.
  • gteichrow - Sunday, January 6, 2019 - link

    FWIW, and I know this has been discussed internally at your fine operation (no sarc): but a pay option? I'd pay $1-$2/mo to be ad-free. I fully realize it's a PITA to manage that model. I already do this on Medium (barely, barely, barely worth it) and Patreon for others. The time is right, methinks. Let's pick the Winners from the Losers and be done with it. You folks are in the winning camp, IMO.
    It almost goes without saying, but y'all do a great job, and thanks for all the work you folks do!
  • gteichrow - Sunday, January 6, 2019 - link

    Sorry, I meant this to be in the comments below, under the discussion about ads that had started. Oops. But the thoughts still apply. Cheers.
