NVIDIA's Carmel CPU Core - SPEC2006 Rate

Something we have a tough time testing on other platforms is multi-threaded CPU performance. On mobile devices, thermal constraints largely prevent a platform from showcasing its true microarchitectural scaling characteristics, as it's not possible to maintain high clock speeds. NVIDIA's Jetson AGX dev kit doesn't have these issues thanks to its oversized cooler, and this makes it a fantastic opportunity to test Xavier's eight Carmel cores, as all of them are able to run at peak frequency without any thermal worries (in our test bench scenario).

SPEC2006 Rate uses the same workloads as the Speed benchmarks; the only difference is that we launch multiple instances (processes) simultaneously.

Starting off with the SPECint2006 rate results:

Tegra Xavier AGX - SPECint2006 Speed vs Rate Estimate 

SPEC2006 rate scores are calculated by measuring the execution time of all instances against the same reference time scale as the Speed score; the only difference is that the final score is multiplied by the number of instances.
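
To make the math concrete, here's a minimal sketch of the score calculation; the reference and elapsed times are hypothetical placeholder values, not actual SPEC reference times or our measured results.

```python
# Minimal sketch of the SPEC2006 Speed vs. Rate score math described above.
# All times are hypothetical placeholders, not actual SPEC reference times.

def speed_ratio(reference_s: float, elapsed_s: float) -> float:
    """Speed score for a single instance: reference time / measured time."""
    return reference_s / elapsed_s

def rate_ratio(reference_s: float, elapsed_s: float, instances: int) -> float:
    """Rate score: the same ratio, multiplied by the instance count.
    elapsed_s is the wall time with all instances running concurrently."""
    return instances * speed_ratio(reference_s, elapsed_s)

speed = speed_ratio(9000.0, 450.0)    # hypothetical single-copy run -> 20.0
rate8 = rate_ratio(9000.0, 600.0, 8)  # hypothetical 8-copy run -> 120.0
print(f"8C scaling factor vs. Speed: {rate8 / speed:.2f}x")  # 6.00x vs. the ideal 8x
```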

I chose to run the rate benchmarks with both 4 and 8 instances to showcase the peculiarity of NVIDIA's Carmel CPU core configuration. Here we see the four-core run performing roughly as expected; the eight-core scores, however, come in a bit lower than you'd expect. To showcase this better, let's compare the scaling factors relative to the single-core Speed results:

Tegra Xavier AGX - SPECint2006 Speed vs Rate Scaling

The majority of the 4C workloads scale near the optimal 4x factor that we'd come to expect, with only a few exceptions showing slightly worse scaling figures. The 8C workloads, however, showcase significantly worse scaling factors, and it's here where things get interesting:

Because Xavier's CPU complex consists of four CPU clusters, each with a pair of Carmel cores and 2MB of L2 cache shared between the pair, we have a scenario where the CPU cores can be resource constrained. In fact, by default the kernel scheduler on the AGX tries to populate one core in every cluster first, before it schedules anything on the secondary cores within a cluster.

In effect, this means that in the 4C rate scenarios each core essentially has exclusive use of its cluster's 2MB L2 cache, while in the 8C rate workloads the L2 cache has to be actively shared between the two cores, resulting in resource contention and worse performance per core.
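
To make the placement concrete, here's a minimal sketch of how one could pin rate instances by hand on Linux; the cluster-to-core mapping and the benchmark command are illustrative assumptions, not the AGX scheduler's actual implementation.

```python
import os
import subprocess

# Assumed topology: four clusters with core pairs (0,1), (2,3), (4,5), (6,7),
# each pair sharing a 2MB L2. Filling the first core of every cluster before
# any sibling mirrors the default scheduler behavior described above.
CLUSTERS = [(0, 1), (2, 3), (4, 5), (6, 7)]

def core_order():
    """Yield core IDs: one core per cluster first, then the cluster siblings."""
    for first, _ in CLUSTERS:
        yield first
    for _, second in CLUSTERS:
        yield second

def launch_pinned(cmd, instances):
    """Launch `instances` copies of cmd, each pinned to its own core."""
    procs = []
    for core, _ in zip(core_order(), range(instances)):
        # Pin each child to a single core before exec (Linux-only call).
        p = subprocess.Popen(cmd,
                             preexec_fn=lambda c=core: os.sched_setaffinity(0, {c}))
        procs.append(p)
    for p in procs:
        p.wait()

# Hypothetical usage: 4 instances get one core per cluster (exclusive L2),
# while 8 instances force both cores in every cluster to share their L2.
# launch_pinned(["./run_one_spec_instance.sh"], instances=4)
```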

The only workloads that aren't nearly as affected by this are those that are largely execution-unit bound and put less stress on the data plane of the CPU complex; this can be seen in the great scaling of workloads such as 456.hmmer. On the opposite end, workloads such as 462.libquantum suffer dramatically under this kind of CPU setup: the cores become severely bandwidth starved and can't process things any faster than if just one core per cluster were active.

Tegra Xavier AGX - SPECfp2006(C/C++) Speed vs Rate Estimate
Tegra Xavier AGX - SPECfp2006(C/C++) Speed vs Rate Scaling

The same analysis applies to the floating point rate benchmarks: some of the less memory-sensitive workloads have little issue scaling well with core count, while others such as 433.milc and 470.lbm again showcase quite bad performance figures when run on all cores of the system.

Overall, NVIDIA's design choices for Xavier's CPU clusters are quite peculiar. It's possible that NVIDIA either expects the majority of workloads targeted at the AGX to have no scaling issues, or that in actual automotive use-cases each core in a cluster would operate in lock-step with its sibling, theoretically eliminating possible resource contention at the level of the shared L2 cache.

I didn't get to measure full power figures for all rate benchmarks, but in general the power of an additional core in a separate cluster scales near-linearly with the power figures of the previously discussed single-core Speed runs, meaning the 4-core rate benchmarks showcase active power usage of around 12-15W. Because performance doesn't scale linearly when adding the secondary cluster cores, the power increase was also more limited: here I saw total system power consumption of up to ~31W for workloads such as 456.hmmer (~22W active), while more bottlenecked workloads ran at around 21-25W for the total platform (~12-16W active).
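
As a quick sanity check on those figures, active power is simply total platform power minus the idle floor; the ~9W idle value in this sketch is inferred from the numbers above rather than a directly quoted measurement.

```python
# Active power = total platform power - idle platform floor.
# The 9W idle floor is an assumption inferred from the figures above
# (~31W total vs. ~22W active for 456.hmmer).
IDLE_FLOOR_W = 9.0

def active_power(total_platform_w: float) -> float:
    return total_platform_w - IDLE_FLOOR_W

print(active_power(31.0))  # ~22W active: 456.hmmer running on all 8 cores
print(active_power(25.0))  # ~16W active: upper end of bottlenecked workloads
```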

Comments

  • CheapSushi - Friday, January 4, 2019 - link

    This is very minor but I'm surprised the ports/connectors aren't more secure on something meant to be in a car. I would expect cables to be screwed in like classic DVI or twist locked in or some other implementation. I feel like the vibration of the car, or even a minor accident, could loosen the cables. Or maybe I got the wrong impression from the kit.
  • KateH - Friday, January 4, 2019 - link

    afaik the generic breakout boards included in dev kits are just for the "dev" part- development and one-offs. a final design would probably use a custom breakout board with just the interfaces needed and in a more rugged form factor that's integrated into the product.
  • mode_13h - Friday, January 4, 2019 - link

    Would've loved to see a Denver2 (Tegra TX2) in that comparison. According to this, they're actually faster than Carmel:

    https://openbenchmarking.org/result/1809258-RA-180...

    Note that the benchmark results named "TX2-6cores-enabled-gcc-5.4.0" refer to the fact that TX2 had the Denver2 cores disabled by default! Out of the box, it just ran everything on the quad-A57 cluster.
  • edatech - Saturday, January 5, 2019 - link

    The same results also say the TX2 is running at a higher frequency (TX2 @ 2.04GHz while Jetson Xavier @ 1.19GHz), so it's not quite an apples-to-apples comparison.
  • mode_13h - Saturday, January 5, 2019 - link

    I'm not sure how much to read into that number. Would they really run the A57 and Denver2 cores at the same frequency? Is the Xavier figure really the boost, and not just the base clock?

    There's also this (newer) result:

    https://openbenchmarking.org/result/1812170-SK-180...

    Again, my point is that I wish the article had looked at Denver2. It sounds like an interesting, if mysterious core.

    Jetson TX2 boards are still available - and at much lower prices than Xavier. So, it's still a worthwhile and relevant question how it compares - especially for those not needing Xavier's Volta and Tensor cores.
  • LinuxDevice - Monday, January 7, 2019 - link

    It isn't so much that the cores are "disabled" (which to me would be something not intended to be turned on) as it is offering multiple power consumption profiles. The whole Jetson market started with the intent to offer it as an OEM reference board, but the reference boards were rather good all by themselves and ended up being a new market. The TX2 Denver cores are simple to turn off or on...but default is off.

    Xavier has something similar with the "nvpmodel" tool for switching around various profiles. To see full performance you need to first run "sudo nvpmodel -m 0", and then max out the clocks with the "~nvidia/jetson_clocks.sh" script.
  • SanX - Saturday, January 5, 2019 - link

    Change the publisher asap. The most stupid and insulting ads you will find only at AT. Smells dirt and cheap. Yuck...

    I don't have such bad impression from YouTube for example, talk to Google guys.
  • TheJian - Sunday, January 6, 2019 - link

    Double the gpu side at 7nm and throw it in a 100-250W box the size of an xbox/ps and I'm in for a new game console. Was hoping they'd re-enter mobile space with Intel/Qcom/samsung modem at 10 or 7nm since they can be included easily without the same watt issues as before. NV doesn't need their own modem today (please come back, mobile gaming is getting great!). We need NV gpus in mobile :)

    Also, I refuse to buy old tech in your android tv system. Upgrade the soc, or no sale. COMPETE with msft/sony dang it! It's already a great streamer, but you need the gaming side UP and it needs to be a 150w+ box today or just another streamer (sony/msft are going 250w+ in their next versions probably) or why not just buy a $35-50 roku? Sure you can turn off most of it while streaming (or playing bluray), but power needs to be there for the gaming side. The soc is the only thing holding me back from AndroidTV box from NV for years now. I wanted 2 socs in it when it first launched, then they shrunk it and gave no more power. You're turning me off NV, you should be turning me ON...LOL. I have no desire for another msft/sony console, but I'd buy a HIGH WATT android model. None of this 15-25w crap is worth it. Roku take note too, as in add a gaming soc (call NV!) and gamepad support or no more sales to anyone in our family (we're going HTPC, because streamers suck as anything but streaming). We need multi-function at this point or you don't make it to our living room. HTPC fits everything I guess (thus we're building 3...LOL). Streaming, gaming, ripping, well, heck, EVERYTHING in one box with mass storage inside too. ShieldTV units will sell a LOT better (roku too) if you get better gaming in them. Angry birds alone doesn't count Roku!

    A 7nm Tegra without all the crap for cars, etc, would be VERY potent. You have the money to make a great gaming box today. Move it into mobile (a single soc one of course) if the tech takes off by adding a modem. Either way, ShieldTV needs an soc upgrade ASAP. Not looking for RTX type stuff here, just a great general android gaming machine that streams. You have to start here to make a gaming PC on ARM stuff at some point. Use cheap machines to make the bigger ones once entrenched. Make sure it can take a discrete NV card at some point as an upgrade (see what I did there, selling more gpu cards, with no wintel needed). At some point it turns into a full PC :)

    That said, I can’t wait for my first car that will drive me around while drinking ;) Designated drivers for all  Oh and, our tests are completely invalidated by testing a 12nm vs. 10 & 7nm (and outputting with Ethernet hooked up), but but but….Look at our dumb benchmarks. Note also, cars want a MUCH longer cycle than pc’s or worse, mobile devices. These people don’t upgrade their soc yearly (more like 5-7 tops). So a box you plop in with most of the software done, is great for many car models. We are talking ~81-90mil sold yearly globally (depending on who you believe). Even 10mil of those at $100 a box would be a great add to your bottom line and I’m guessing they get far more than that, but you have to make a point at some price here ;) We are talking 1B even if it’s just $100 Net INCOME per box. That would move NV’s stock price for sure. Something tells me it’s 30%+ margins (I’d guess 50%+ really), but I could be wrong. Has anyone else done this job for less than $1500? Also note, as more countries raise incomes, more cars will be sold yearly.
    https://www.statista.com/statistics/200002/interna...
    Just as you see here, and the world still needs more cars (heck roads in some places still needed…LOL). Growth. There is room for more than one player clearly for years. Until L5 becomes a commodity there is good money to be had by multiple companies in this space IMHO. Oh and 35mil of those are cars are EU/USA (17.5ea for both). Again, much growth to come as more places get roads/cars, and how many of them have driverless so far? Not many.

    At $1500 or under anyone can add this on to a car, as that is cheaper than the $7500 subsidy they have to add to an electric car just to even JOKE about making a dime on them right? And this would NOT be a subsidy. Electric cars are for the rich or stupid. I don’t remember voting for $7500 per car giveaways to make green people happy either! Please KILL THIS ASAP TRUMP! That is 1.5B per car maker (200K cars can be subsidized by each maker). I want a freaking WALL NOW not renewable subsidy crap for products that can’t make money on their own and I am UN-interested in completely as long as gas is available cheaper overall! Screw 5B, tell them $25B or the govt shuts down completely (still a joke, most stays open anyway) for your next 2yrs. Let them pound sand in discretionary spending. :) Only NON-ESSENTIAL people even go home. Well heck, why do I need a NON-essential employee anyway in govt? Let private sector take on all their crap, or just leave it state to state, where they are much better able to handle problems they are versed in.

    “The one aspect which we can’t quantize NVIDIA’s Carmel cores is its features: This is a shipping CPU with ASIL-C functional safety features that we have in our hands today. The only competition in this regard would be Arm’s new Cortex A76AE, which we won’t see in silicon for at least another year or more.”
    “the Carmel cores don’t position themselves too well.”

    Er, uh, would you be saying that at 7nm vs. 7nm?? I’m guessing NV could amp the speeds a bit if they simply took the EXACT core and 7nm’d it right (a new verb?)? Can’t see a way forward? Nobody will have its safety features for a year in the segment it targets DIRECTLY, but you can’t see a way forward?...LOL. Never pass up a chance for an AMD portal site to knock NV. Pause for a sec, while I test it with my 2006 tests that well, aren’t even the target market…Jeez. Possibly make sense to go IN-HOUSE? So you’re saying on the one hand that there was NO OTHER CHOICE for a YEAR, but it’s only POSSIBLY a good idea they went in house? I think you mean, it was ONLY POSSIBLE to go in-house, and thus a BRILLIANT decision to go IN HOUSE, and I can see how this chip really goes FORWARD. There, fixed it. Intel keeps offering GPU designs, and they keep failing correct (adopting AMD tech even)? You go in house until someone beats you at your own game, just ask apple. No reason to give a middle man money unless he is soundly beating you, or you are not making profit as is.

    So it’s really good at what it was designed to do, and is a plop in component for cars for ~$1000-1500 with software done pretty much for most? But NV has challenges going forward making money on it…LOL. Last I checked NV has most of the car market sewn up (er, signed up? Pays to be early in many things). Cars are kind of like Cuda. It took ~7yrs before that really took off, but look at it now. Owning everything else, and OpenCL isn’t even on the playing field as AMD can’t afford to FORCE it onto the field alone.

    “But for companies looking to setup more complex systems requiring heavy vision processing, or actually deploying the AGX module in autonomous applications (no spellchecker before hitting the website?) for robotics or industrial uses, then Xavier looks quite interesting and is definitely a more approachable and open platform than what tends to exist from competing products.”

    Translation: When you use it as it was designed, nobody has a competing offering…LOL. You could have just put the last P as the whole article and forgot the rest. Pencils work great as a writing tool, but when we try to run games on them, well, they kind of suck. I'm shocked. Pencils can't run crysis? WTH?? I want my money back…LOL. Don't the rest of the guys have the challenge of trying to be MORE OPEN and APPROACHABLE? Your article is backwards. You have to dethrone the king, not the other way around. Where will NV be in a year when the competition finally gets something right? How entrenched will they be by then? Cars won't switch on a dime like I will for my next vid card/cpu…LOL. They started this affair mid 2015 or so, and it will pay off 2021+ as everyone wants autonomous cars on the road by then.

    https://www.thestreet.com/investing/stocks/nvidia-...
    https://finance.yahoo.com/news/nvidia-soars-ai-mar...
    “we believe that the company is well poised to grow in the driverless vehicle technology space”
    Arm makes under 500m (under 400 actually), NV makes how much (9x-10x this?)? Good luck. I do not believe off the shelf will beat a chip designed for auto, so someone will have to CUSTOM their way to victory over NV here IMHO.
    https://www.forbes.com/sites/moorinsights/2018/09/...
    BMW chose Intel, Tesla switches (and crashes, ½ a million sold so far?? Who cares), but I wonder for how long. I guess it depends on how much work they both want to do, or just plop in Nvidia solutions. I'll also venture to guess Tesla did it merely to NOT be the same as Volvo, Toyota etc who went with NV. Can't really claim you're different using what everyone else uses. MOOR Insights isn't wrong much. They have covered L2-L4 and even have built the chip to handle L5 (2 socs in Pegasus). How much further forward do you need to go? It seems they're set for a bit, though I'm sure they won't sit idle while everyone else catches up (they don't have a history of that). TL:DR? It's a sunday morning, I had time and can type 60wpm...LOL.
  • gteichrow - Sunday, January 6, 2019 - link

    FWIW and I know this has been discussed internally at your fine operation (no sarc): But pay option? I'd pay $1-$2/mo to be ad-free. I fully realize it's a PITA to manage that model. I already do this on Medium (barely, barely, barely worth it) and Patreon for others. The time is right, me thinks. Let's pick the Winners from the Losers and be done with it. You folks are in the winning camp, IMO.
    It almost goes without saying, but, you'all do a great job and thanks for all the work you folks do!
  • gteichrow - Sunday, January 6, 2019 - link

    Sorry, but meant this to be in the comments below under the discussion about ads that had started. Oops. But thoughts still apply. Cheers.
