Rosetta2: x86-64 Translation Performance

Because the new Apple Silicon Macs are based on a new ISA, the hardware isn’t capable of running the existing x86-based software that has been developed over the past 15 years. At least, not without help.

Apple’s Rosetta2 is a new ahead-of-time binary translation system which translates existing x86-64 software to AArch64, and then runs that code on the new Apple Silicon CPUs.

So, what do you have to do to run Rosetta2 and x86 apps? The answer is pretty much nothing. As long as a given application has an x86-64 code path using at most SSE4.2 instructions, Rosetta2 and the new macOS Big Sur take care of everything in the background, without you noticing any difference from a native application beyond its performance.
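The translation is transparent enough that even the application itself has to go out of its way to notice it. As a minimal sketch of how a program can check, macOS exposes a sysctl key, sysctl.proc_translated, that reports whether the current process is running under Rosetta translation; the small C program below queries it (this is an illustrative example, not part of the review's test setup):

#include <errno.h>
#include <stdio.h>
#include <sys/sysctl.h>

/* Returns 1 if the current process is being translated by Rosetta,
 * 0 if it is running natively, and -1 if the answer is unknown. */
static int process_is_translated(void) {
    int ret = 0;
    size_t size = sizeof(ret);
    if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1) {
        /* The key doesn't exist on older macOS versions or on Intel Macs. */
        if (errno == ENOENT)
            return 0;
        return -1;
    }
    return ret;
}

int main(void) {
    switch (process_is_translated()) {
    case 1:  puts("Running as x86-64 under Rosetta2 translation"); break;
    case 0:  puts("Running natively"); break;
    default: puts("Unable to determine translation status"); break;
    }
    return 0;
}

Compiled as an x86-64 binary and run on an Apple Silicon Mac, this reports translation; compiled for AArch64 (or run on an Intel Mac), it reports native execution.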

Actually, Apple’s transparent handling of things is maybe a little too transparent, as currently there’s no way to even tell whether an application on the App Store supports the new Apple Silicon or not. Hopefully this is something that we’ll see improved in future updates, as it would also serve as an incentive for developers to port their applications to native code. Of course, developers can now target both x86-64 and AArch64 via “universal binaries”, which are essentially just the two architecture-specific binaries glued together into one file.
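To make that “glued together” structure a bit more concrete, here is a minimal, illustrative C sketch that reads the fat (universal) Mach-O header at the start of a file and lists which architecture slices it contains. It only handles the classic 32-bit fat header (FAT_MAGIC / FAT_CIGAM) and ignores the newer 64-bit variant, so treat it as a demonstration of the layout rather than a complete inspection tool:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>      /* ntohl */
#include <mach-o/fat.h>     /* struct fat_header, struct fat_arch, FAT_MAGIC */
#include <mach/machine.h>   /* CPU_TYPE_X86_64, CPU_TYPE_ARM64 */

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <path-to-binary>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    struct fat_header fh;
    if (fread(&fh, sizeof(fh), 1, f) != 1) { fclose(f); return 1; }

    /* Fat headers are stored big-endian; FAT_CIGAM is the byte-swapped magic
     * as seen on a little-endian host. 64-bit fat headers are ignored here. */
    if (fh.magic != FAT_MAGIC && fh.magic != FAT_CIGAM) {
        puts("Not a universal binary (single-architecture Mach-O or other file)");
        fclose(f);
        return 0;
    }

    uint32_t count = ntohl(fh.nfat_arch);
    for (uint32_t i = 0; i < count; i++) {
        struct fat_arch arch;
        if (fread(&arch, sizeof(arch), 1, f) != 1) break;
        cpu_type_t type = (cpu_type_t)ntohl((uint32_t)arch.cputype);
        if (type == CPU_TYPE_X86_64)
            puts("x86_64 slice");
        else if (type == CPU_TYPE_ARM64)
            puts("arm64 slice");
        else
            printf("other slice (cputype 0x%x)\n", (unsigned)type);
    }
    fclose(f);
    return 0;
}

In practice the bundled lipo and file tools report the same information, but the sketch shows that a universal binary really is just a small header table followed by the independent per-architecture binaries.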

We didn’t have time to investigate what software runs well and what doesn’t; I’m sure other publications will do a much better job across a wider variety of workloads. I did, however, want to post some more concrete numbers on how performance scales across different types of workloads, by running SPEC both natively and in x86-64 binary form through Rosetta2:

SPECint2006 - Rosetta2 vs Native Score %

In SPECint2006, there’s a wide range of performance scaling depending on the workload, with some doing quite well while others not so much.

The workloads that do best under Rosetta2 look primarily to be those with a larger memory footprint that interact more with memory, scaling to above 90% of the performance of the native AArch64 binaries.

The workloads that do worst are execution- and compute-heavy, with the absolute worst scaling in the L1-resident 456.hmmer test, followed by 464.h264ref.

SPECfp2006(C/C++) - Rosetta2 vs Native Score %

In the fp2006 workloads, things do relatively well except for 470.lbm, which has a tight instruction loop.

SPECint2017(C/C++) - Rosetta2 vs Native Score %

In the int2017 tests, what stands out is the horrible performance of 502.gcc_r, which achieves only 49.87% of the native score, probably due to its high code complexity and overall uncommon code patterns.

SPECfp2017(C/C++) - Rosetta2 vs Native Score %

Finally, in fp2017, it looks like we’re again averaging in the 70-80% performance range, depending on the workload’s code.

Generally, all of these results should be considered outstanding given the feat Apple is achieving here in terms of code translation technology. This is not a lacklustre emulator, but a full-fledged compatibility layer which, combined with the excellent performance of the Apple M1, allows for very real and usable performance from the existing software repertoire in Apple’s macOS ecosystem.

Comments

  • andrewaggb - Tuesday, November 17, 2020 - link

    Pretty much. There's no reason to think the cores will be better on a chip with more of them. The only thing that is a possibility (certainly not a given) is that the clock speed will be substantially higher, which should put Apple in the lead. That said, the previous review showed a very modest IPC improvement this time around even with huge reorder buffers and an 8-wide design. So I suspect Apple's best course for improved performance is higher clocks, but that always runs counter to power usage, so we'll see. AMD and Intel will probably have to go wider to compete with Apple on single-threaded IPC in the long run.

    GPU-wise it's pretty decent for integrated graphics but if you want to play games you shouldn't be running Mac OS or using integrated graphics. It'll be interesting to see if Apple's market share jumps enough to pull in some game development.
  • Eric S - Tuesday, November 17, 2020 - link

    I don’t think any of these benchmarks are optimized for TBDR. Memory-bound operations could be significantly faster if optimized for the chip. Many render pipelines could run 4X faster. I’m curious to see iOS graphics benchmarks that are more representative run on this. Of course, I hope we see apps and games optimized for TBDR as well.
  • Spunjji - Thursday, November 19, 2020 - link

    @andrewaggb - Agreed entirely. The cores themselves aren't going to magically improve, and it's not clear from the meagre scaling between the A14 at 5-10W and the M1 at 10-25W that they can make them a lot faster with clock speed increases. But a chip with 12 Firestorm cores and 4 Icestorm cores would be an interesting match for the 5900X, and if they beef the GPU up to 12 cores with a 192-bit memory interface and/or LPDDR5 then they could have something that's actually pretty solid for the vast majority of workloads.

    I don't think games are going to be moving en-masse from Windows any time soon, but I guess we'll see as time goes on.
  • Stephen_L - Tuesday, November 17, 2020 - link

    I feel very lucky that I didn’t use your mindset when I decided to buy an AMD R5-1600X instead of an Intel i5 for my PC.
  • Spunjji - Thursday, November 19, 2020 - link

    @YesYesNo - you responded to a comment about how they *will* be releasing faster chips by talking about how they haven't done so yet. This is known. You're kind of talking past the people you're replying to - nobody's asking you to reconsider how you feel about the M1 based on whatever comes next, but it doesn't make sense to assume this is the absolute best they can do, either.
  • andreltrn - Tuesday, November 17, 2020 - link

    This is not their high-end chip! This is a chip for low-end devices such as fanless laptops. They attacked that market first because this is where they will make the most money. High-end pro users won't go for a new platform until it is proven and they are 100% sure they will be able to port their workflow to it. They are starting with the low end and will probably follow up with a 10 or 12 core chip in the spring for the high-end laptops and the iMac.
  • vlad42 - Tuesday, November 17, 2020 - link

    I just do not see Apple using anything but a low-power mobile chip for consumer devices.

    Think about it: about half the time we did not see Apple release a tablet-optimized A#X chip for the iPad. In their recent earnings reports, the combined iPad and Mac revenue is still only half that of the iPhone. By using the same chip for the iPad and all Mac machines except the Mac Pro, maybe Apple will actually update the SoC every year.

    If Apple were to provide a higher-performing chip for consumer devices, then it would probably be updated only once every few years. Apple just does not make enough money from high-end laptops and the iMac to justify dedicated silicon for those products without pulling an Intel and reusing the SoC for far too many product cycles. Just look at the Mac Pros. The engineering resources needed to design the most recent x86 Mac Pro are a drop in the bucket compared to designing and taping out a new SoC. Despite this, Apple has only been updating the Mac Pro lineup once every 5-7 years!

    The problem is that by the time they are willing to update those theoretical high-end consumer chips, they will long since have been made obsolete. Who in their right mind would purchase a "high end" laptop or an iMac if it is outperformed by an entry-level Air or an iPad, or is lacking important features (hardware codec support, the next stupid version of HDCP needed for movies/TV shows, etc.)? Even worse for Apple is if their customers buy a non-Apple product instead. Much of Apple's current customer base does not actually need a Mac. They would be fine with any decent-quality high-end laptop or any all-in-one with a screen that is not hot garbage.
  • Eric S - Tuesday, November 17, 2020 - link

    They are working on updates for the high end. I expect they will be amazing. At least two higher end chips are in late design or early production.
  • Eric S - Tuesday, November 17, 2020 - link

    You are probably right in that they may only be updated every few years, but the same can be said of the Xeon which also skips generations.
  • vlad42 - Tuesday, November 17, 2020 - link

    But the Xeon chips are a bad example because Intel shot themselves in the foot through a combination of complacency, tying their next-gen products too tightly to the manufacturing process, and a shortage of 14nm capacity. We used to get new Xeons if not every year, then at least every time there was an architecture update.

    A better, more recent comparison would be with AMD, which has always updated the Threadripper lineup. Granted, we technically do not know if the Threadripper Pro lineup will be updated every year, but it very likely will be.
