SoC Analysis: On x86 vs ARMv8

Before we get to the benchmarks, I want to spend a bit of time talking about the impact of CPU architecture at a moderate level of technical depth. At a high level, there are a number of peripheral issues when it comes to comparing these two SoCs, such as the quality of their fixed-function blocks. But when you look at what consumes the vast majority of the power, the CPU is competing with blocks like the GPU and the modem/RF front-end for that power budget, which makes the CPU one of the main levers for both power and performance.


x86-64 ISA Registers

Probably the easiest place to start when we’re comparing things like Skylake and Twister is the ISA (instruction set architecture). This subject alone is probably worthy of an article, but the short version for those who aren't really familiar with this topic is that an ISA defines how a processor should behave in response to certain instructions, and how those instructions should be encoded. For example, if you were to add two integers together in the EAX and EDX registers, x86-32 dictates that this would be encoded as 01 D0 in hexadecimal. In response to this instruction, the CPU would add whatever value is in the EDX register to the value in the EAX register and leave the result in the EAX register.
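As a concrete (and heavily simplified) illustration, the snippet below picks apart that two-byte encoding the way a decoder's first stage might: opcode 0x01 selects "ADD r/m32, r32", and the ModRM byte 0xD0 names EDX as the source and EAX as the destination. This toy only handles the register-direct form shown here; it is nowhere near a real x86 decoder.

```c
/* A minimal sketch of how a decoder might pick apart the two-byte
 * x86-32 instruction 01 D0 ("add eax, edx"). Real decoders also handle
 * prefixes, SIB bytes, displacements and immediates; this only covers
 * the register-direct form discussed above. */
#include <stdio.h>

static const char *reg32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

int main(void) {
    unsigned char insn[] = { 0x01, 0xD0 };      /* add eax, edx */

    unsigned char opcode = insn[0];             /* 0x01: ADD r/m32, r32 */
    unsigned char modrm  = insn[1];
    unsigned mod = (modrm >> 6) & 0x3;          /* 11  -> register-direct */
    unsigned reg = (modrm >> 3) & 0x7;          /* 010 -> edx (source)    */
    unsigned rm  =  modrm       & 0x7;          /* 000 -> eax (dest)      */

    if (opcode == 0x01 && mod == 0x3)
        printf("add %s, %s   ; %s += %s, result stays in %s\n",
               reg32[rm], reg32[reg], reg32[rm], reg32[reg], reg32[rm]);
    return 0;
}
```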


ARMv8 A64 ISA Registers

The fundamental difference between x86 and ARM is that x86 is a relatively complex ISA, while ARM is relatively simple by comparison. One key difference is that ARM dictates that every instruction is a fixed number of bits. In the case of ARMv8-A and ARMv7-A, all instructions are 32 bits long unless you're in Thumb mode, in which case instructions are a fixed 16 bits long; Thumb-2 mixes 16-bit and 32-bit encodings, so it is variable length in a limited sense, but the same basic trade-offs of a (mostly) fixed-length encoding still apply. It’s important to make a distinction between instruction and data here, because even though AArch64 uses 32-bit instructions, the register width is 64 bits, and it's the register width that determines things like how much memory can be addressed and the range of values that a single register can hold. By comparison, Intel’s x86 ISA has variable length instructions. In both x86-32 and x86-64/AMD64, each instruction can be anywhere from 8 to 120 bits (1 to 15 bytes) long depending upon how it is encoded.
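To make the instruction-versus-data point concrete, here is a small sketch that treats an A64 instruction as what it is, a single 32-bit word, and pulls out its three 5-bit register fields. The constant 0x8B020020 is assumed here to be the standard A64 "ADD (shifted register)" encoding of add x0, x1, x2, with Rd in bits [4:0], Rn in bits [9:5], and Rm in bits [20:16]; the instruction itself is 4 bytes even though the registers it names hold 64-bit values.

```c
/* A minimal sketch showing that an A64 instruction is always one 32-bit
 * word, even though the registers it names are 64 bits wide. The field
 * layout used here is the A64 "ADD (shifted register)" encoding. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t insn = 0x8B020020u;        /* assumed encoding of "add x0, x1, x2" */

    unsigned rd = insn & 0x1F;          /* destination register number */
    unsigned rn = (insn >> 5) & 0x1F;   /* first source register       */
    unsigned rm = (insn >> 16) & 0x1F;  /* second source register      */

    /* Every A64 instruction occupies exactly sizeof(insn) == 4 bytes,
     * but x0..x30 each hold a full 64-bit value. */
    printf("add x%u, x%u, x%u  (%zu-byte instruction, 64-bit registers)\n",
           rd, rn, rm, sizeof insn);
    return 0;
}
```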

At this point, it might be evident that on the implementation side of things, a decoder for x86 instructions is going to be more complex. For a CPU implementing the ARM ISA, the fixed instruction length means the decoder can simply read instructions 2 or 4 bytes at a time, since it always knows where the next instruction begins. A CPU implementing the x86 ISA, on the other hand, has to work out how many bytes belong to the current instruction from the bytes it has already seen (prefixes and opcode) before it even knows where the next one starts.
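The sketch below contrasts the two fetch loops under that assumption. The x86 side uses a deliberately toy length function that only recognizes the one-byte nop and the two-byte add from the earlier example; filling in that function for the full ISA (prefixes, ModRM, SIB, displacements, immediates) is precisely the complexity being described.

```c
/* A rough sketch, not a real decoder: with a fixed-length encoding the
 * next instruction boundary is known in advance, while on x86 the length
 * of each instruction has to be worked out from the bytes already seen. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for an x86 length decoder: it only recognizes the
 * single-byte nop (0x90) and the "01 /r" register-direct add used
 * earlier (2 bytes). Everything else is the hard part a real decoder
 * must handle. */
static size_t x86_insn_length(const uint8_t *p) {
    if (p[0] == 0x90) return 1;            /* nop            */
    if (p[0] == 0x01) return 2;            /* add r/m32, r32 */
    return 1;                              /* placeholder    */
}

int main(void) {
    const uint32_t a64_code[] = { 0x8B020020u, 0xD503201Fu };  /* add; nop */
    const uint8_t  x86_code[] = { 0x90, 0x01, 0xD0, 0x90 };    /* nop; add; nop */

    /* Fixed length: boundaries fall every 4 bytes, no decoding needed. */
    for (size_t i = 0; i < sizeof a64_code / sizeof a64_code[0]; i++)
        printf("A64 insn at byte %zu\n", i * 4);

    /* Variable length: each boundary depends on decoding the previous one. */
    for (size_t pc = 0; pc < sizeof x86_code; pc += x86_insn_length(&x86_code[pc]))
        printf("x86 insn at byte %zu\n", pc);

    return 0;
}
```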


A57 Front-End Decode; note the lack of a µop cache

While it might sound like the x86 ISA is just clearly at a disadvantage here, it’s important to avoid oversimplifying the problem. Because every ARM instruction occupies a full 2 or 4 bytes whether or not it needs all of those bits, simple operations end up carrying wasted bits. While it may not seem like a big deal to “waste” a byte here and there, the lower code density means the same program takes up more room in the L1 instruction cache, and that can become a significant bottleneck in how quickly instructions get from the cache to the front-end instruction decoder of the CPU. The major issue here is that due to RC delay in the metal wire interconnects of a chip, increasing the size of an instruction cache inherently increases the number of cycles that it takes for an instruction to get from the L1 cache to the instruction decoder. And if the cache doesn’t have the instruction that you need at all, it can take hundreds of cycles for it to arrive from main memory.


x86 Instruction Encoding

Of course, there are other issues worth considering. For example, in the case of x86, individual instructions can be quite complex. One of the simplest examples is the add instruction, where either the source or the destination (but not both) can be a memory operand. An example of this might be addq (%rax,%rbx,2), %rdx, which loads the value at address %rax + 2*%rbx and adds it to %rdx, and which could take around 5 cycles of latency on something like Skylake. Of course, pipelining and other tricks can make the throughput of such instructions much higher, but that's another topic that can't be properly addressed within the scope of this article.


ARM Instruction Encoding

By comparison, the ARM ISA has no direct equivalent to this instruction. Looking at our example of an add with a memory operand, ARM would require a separate load instruction before the add instruction. This has two notable implications. The first is that this is once again an advantage for an x86 CPU in terms of instruction density, because one instruction (and usually fewer total bits) expresses an operation that takes ARM two. The second is that on a “pure” CISC CPU, one that executes such a combined instruction as a single unit, this becomes a barrier to a number of performance and power optimizations, as any instruction that depends on the result of the current instruction can't easily be pipelined or executed in parallel with it.
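To make that concrete, here is a small, hedged illustration: a trivial C function that adds a value loaded from memory to an accumulator, with comments showing the kind of instruction sequences a compiler might plausibly emit for x86-64 and AArch64. The assembly is illustrative only, not the verified output of any particular compiler, and moves of the result into the return register are omitted.

```c
#include <stdint.h>

/* Illustrative only: the assembly in the comments is the kind of code a
 * compiler might emit for this function, not actual compiler output. */
int64_t add_from_memory(const int64_t *table, int64_t index, int64_t acc) {
    /* x86-64 (AT&T syntax), roughly one instruction for the load + add:
     *     addq (%rdi,%rsi,8), %rdx      # rdx += mem[rdi + 8*rsi]
     *
     * AArch64, roughly two instructions, the second waiting on the first:
     *     ldr x8, [x0, x1, lsl #3]      // x8 = mem[x0 + 8*x1]
     *     add x2, x2, x8                // x2 += x8
     */
    return acc + table[index];
}
```

Either way the hardware ultimately performs a load followed by an add; the difference is whether that split is visible in the instruction stream or made internally by the CPU.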

The final issue here is that x86 simply has an enormous number of instructions that have to be supported for backwards compatibility. Part of the reason why x86 became so dominant in the market was that code compiled for the original Intel 8086 would work with any future x86 CPU, but the original 8086 didn’t even have memory protection. As a result, all x86 CPUs made today still have to start in real mode and support the original 16-bit registers and instructions, in addition to the 32-bit and 64-bit registers and instructions. Of course, running a program in 8086 real mode on a modern system is a non-trivial task, but even in the x86-64 ISA it isn't unusual to see instructions that are identical to their x86-32 equivalents. By comparison, ARMv8 is designed such that switching between AArch64 and AArch32 (ARMv7-compatible) code can only happen at exception boundaries, so in practice a given program runs entirely as one or the other.

From the 1980s into the 1990s, this became one of the major reasons why RISC was rapidly gaining ground, as CISC ISAs like x86 tended to produce CPUs that used more power and die area for the same performance. However, today the ISA is basically irrelevant to this discussion, due to a number of factors. The first is that beginning with the Intel Pentium Pro and AMD K5, x86 CPUs have really been RISC-like cores behind microcode or other logic that translates x86 instructions into the internal micro-operations that are actually executed. The second is that instruction decoding has been increasingly optimized around the relatively small set of instructions that compilers commonly emit, which makes the x86 ISA less complex in practice than the specification might suggest. The final change is that ARM and other RISC ISAs have grown increasingly complex as well, as it became necessary to add instructions supporting floating point math, SIMD operations, CPU virtualization, and cryptography. As a result, the RISC/CISC distinction is mostly irrelevant when it comes to discussions of power efficiency and performance; microarchitecture is really the main factor at play now.
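As a purely conceptual sketch of that translation step, the snippet below cracks the memory-operand add from earlier into two invented internal micro-operations, a load into a temporary and a register-register add, which is essentially the same split the ARM ISA forces the compiler to make explicit. The µop structure and field names here are made up for illustration and do not correspond to any actual Intel or AMD design.

```c
/* Conceptual sketch only: how a front end might crack the x86 instruction
 * "addq (%rax,%rbx,2), %rdx" into two internal micro-operations. The µop
 * format is invented for illustration. */
#include <stdio.h>

enum uop_kind { UOP_LOAD, UOP_ALU_ADD };

struct uop {
    enum uop_kind kind;
    const char *dst;       /* destination (register or internal temp) */
    const char *src_a;     /* first source                            */
    const char *src_b;     /* second source (index register or addend)*/
    int scale;             /* address scale factor for loads          */
};

int main(void) {
    /* addq (%rax,%rbx,2), %rdx  ->  two dependent µops */
    struct uop cracked[] = {
        { UOP_LOAD,    "tmp0", "rax", "rbx",  2 },  /* tmp0 = mem[rax + 2*rbx] */
        { UOP_ALU_ADD, "rdx",  "rdx", "tmp0", 0 },  /* rdx  = rdx + tmp0       */
    };

    for (size_t i = 0; i < sizeof cracked / sizeof cracked[0]; i++)
        printf("uop %zu: kind=%d dst=%s\n", i, cracked[i].kind, cracked[i].dst);
    return 0;
}
```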

Comments

  • lilmoe - Friday, January 22, 2016

    ok......
  • Sc0rp - Friday, January 22, 2016

    Well, I have to disagree with you on one thing here. I don't think Apple has any blame here when it comes to software. iOS 9 is faaaaaaaar more powerful and capable than the Mac OS 8 and 9 that I used to run on my PowerPCs back in the late '90s. Those computers were certainly productive. There's nothing on a software level that's really stopping developers from making productive software for the iPad Pro or even the Air. There is an interface challenge, much as there was an interface challenge when GUIs first came out. As I recall, people lambasted GUIs and mice as toys that weren't for serious work back then. The endless whining over the iPad Pro is just a reverberation of that. People don't like change and they don't like things that rub against their doctrine. But, consider this... While many adults actually have some difficulty adapting to this new computing paradigm, youngsters adapt to it like a fish to water.

    I think it is a wild boast to call an iPad Pro a 'useless toy'. I certainly have made a ton of use of mine. Of course, I'm an artist so there's that. Not to mention that my iPads have been my primary communication hub for the last five years.
  • Jumangi - Friday, January 22, 2016

    iOS blows as an actual productivity system. It is made for smartphones first (Apple's cash cow) and everything else second. Put a version of Mac OS X on this and you have something. Right now this is an expensive artist's toy.
  • strangis - Friday, January 22, 2016

    > While many adults actually have some difficulty adapting to this new computing paradigm, youngsters adapt to it like a fish to water.

    That's why I, as someone of the Commodore VIC-20 era, have to show relatives and clients 25 years younger than me how to use their phones, tablets and computers every week. Regardless of age, some people get it, some don't.

    Similarly, I've never seen the value of an iPad Pro when, as an artist, I need to finish in Photoshop or After Effects. The creative tools available on the iPad Pro are limiting for those of us used to more, and considering its price, it's better to buy something that will get the job done.
  • Murloc - Saturday, January 23, 2016

    I have no doubt people will only use tablets once they're able to interact with the interface with their brains.
  • Relic74 - Saturday, February 27, 2016

    Yeah, but at least Mac OS had a proper file system, allowed its users to select their own default apps, apps didn't require APIs in order to talk to the system, all applications used the same resolution, when a new feature was added to the system every app was able to utilize it immediately without requiring its developer to update the app, the user was able to customize their desktop and even the UI, it supported widgets, and applications were windowed and ran desktop software. Actually, I take that back: Mac OS's UI was a lot more powerful, the system not so much, which is reversed in iOS, where the UI isn't very powerful, it's actually pretty vanilla, though its BSD underpinnings are extremely powerful. If I were able to access the BSD system, I would dump iOS's UI in a heartbeat and install an X desktop environment like Gnome 3, which actually works fairly well as a tablet OS. Then maybe the iPad Pro would actually be a Pro device. I'm running Arch Linux on a Xiaomi MiPad 2, love it.
  • NEDM64 - Friday, January 22, 2016

    Dude!

    If you were in the '80s, you'd be advocating text user interfaces instead of graphical user interfaces.

    If you were in the '70s, you'd be advocating separate terminals connected to computers, as opposed to "all-in-ones" or "intelligent terminals" like the Apple II, Commodore PET, and TRS-80.

    Opinions like yours, with due respect, don't matter, because people like you already have their rigs in place and aren't in the market.

    Apple's market position is for people who want the next thing, not the same ol' thing…
  • RafaelHerschel - Saturday, January 23, 2016

    Apparently the next thing is a larger iPad. I'm going to be bold and predict the next next thing. It's going to be a slightly thinner version of the larger iPad. Awesome.
  • Murloc - Saturday, January 23, 2016

    You aren't understanding lilmoe's posts.

    You can spend millions developing software for a super-powerful tablet.

    You will still never be able to fit Photoshop's whole interface and its abundance of options and menus into a tablet in a way that lets the user reach them easily, without scrolling through pages of big buttons.

    At the end of the day, you'll get a crippled version of Photoshop, and the user will have to get on a traditional computer (a WORKstation, not because it's more powerful, not because software houses invest more in it, but because it has human interaction devices and a big screen that let humans get work done faster) to get stuff done.

    Tablets are mostly content consumption products exactly because of their limited interfaces. They have the advantage of portability and ease of use, you just open apps while on the couch, and that's why they handle content consumption better than, say, laptops.
  • Constructor - Saturday, January 23, 2016

    It's by now become a quasi-religious belief system for some that "mobile devices cannot ever be used for any professional purposes whatsoever!".

    At the same time, more and more people (and businesses!) don't care about such beliefs in the slightest and simply use those devices very much professionally, in many cases with more success and higher productivity than they'd have with conventional computers.

    Part of the reason is that agility and flexibility often beat feature count, all the more so since professional workflows very often just can't afford to even consider most of the myriad theoretical options some desktop programs offer. Heck, most professional uses actually don't need much more than a browser interface anyway!

    Yes, there are some uses for which desktop or mainframe computers will be the only really viable option. But what you and many others don't seem to have noticed is that those domains have been shrinking rapidly over the last decade or two.
