It’s been nearly 10 years since Arm had first announced the Armv8 architecture in October 2011, and it’s been a quite eventful decade of computing as the instruction set architecture saw increased adoption through the mobile space to the server space, and now starting to become common in the consumer devices market such as laptops and upcoming desktop machines. Throughout the years, Arm has evolved the ISA with various updates and extensions to the architecture, some important, some maybe glanced over easily.

Today, as part of Arm’s Vision Day event, the company is announcing the first details of the company’s new Armv9 architecture, setting the foundation for what Arm hopes to be the computing platform for the next 300 billion chips in the next decade.

The big question that readers will likely be asking themselves is what exactly differentiates Armv9 to Armv8 to warrant such a large jump in the ISA nomenclature. Truthfully, from a purely ISA standpoint, v9 probably isn’t an as fundamental jump as v8 was over v7, which had introduced a completely different execution mode and instruction set with AArch64, which had larger microarchitectural ramifications over AArch32 such as extended registers, 64-bit virtual address spaces and many more improvements.

Armv9 continues the usage of AArch64 as the baseline instruction set, however adds in a few very important extensions in its capabilities that warrants an increment in the architecture numbering, and probably allows Arm to also achieve a sort of software re-baselining of not only the new v9 features, but also the various v8 extensions we’ve seen released over the years.

The three new main pillars of Armv9 that Arm sees as the main goals of the new architecture are security, AI, and improved vector and DSP capabilities. Security is a very big topic for v9 and we’ll go into the new details of the new extensions and features into more depth in a bit, but getting DSP and AI features out of the way first should be straightforward.

Probably the biggest new feature that is promised with new Armv9 compatible CPUs that will be immediately visible to developers and users is the baselining of SVE2 as a successor to NEON.

Scalable Vector Extensions, or SVE, in its first implementation was announced back in 2016 and implemented for the first time in Fujitsu’s A64FX CPU cores, now powering the world’s #1 supercomputer Fukagu in Japan. The problem with SVE was that this first iteration of the new variable vector length SIMD instruction set was rather limited in scope, and aimed more at HPC workloads, missing many of the more versatile instructions which still were covered by NEON.

SVE2 was announced back in April 2019, and looked to solve this issue by complementing the new scalable SIMD instruction set with the needed instructions to serve more varied DSP-like workloads that currently still use NEON.

The benefit of SVE and SVE2 beyond addition various modern SIMD capabilities is in their variable vector size, ranging from 128b to 2048b, allowing variable 128b granularity of vectors, irrespective of what the actual hardware is running on. Purely from a view of vector processing and programming, it means that a software developer would only ever have to compile his code once, and if in the future a CPU would come out with say native 512b SIMD execution pipelines, the code would be able to already take advantage of the full width of the units. Similarly, the same code would be able to run on more conservative designs with a lower hardware execution width capability, which is important to Arm as they design CPUs from IoT, to mobile, to datacentres. It also does this all whilst remaining within the 32b encoding space of the Arm architecture, whereas alternative implementations such as on x86 have to add on new extensions and instructions depending on vector size.

Machine learning is also seen as an important part of Armv9 as Arm sees more and more ML workloads to become common place in the next years. Running ML workloads on dedicated accelerators naturally will still be a requirement for anything that is performance or power efficiency critical, however there still will be vast new adoption of smaller scope ML workloads that will run on CPUs.

Matrix multiplication instructions are key here and will represent an important step in seeing larger adoption across the ecosystem as being a baseline feature of v9 CPUs.

Generally, I see SVE2 as probably the most important factor that would warrant the jump to a v9 nomenclature as it’s a more definitive ISA feature that differentiates it from v8 CPUs in every-day usage, and that would warrant the software ecosystem to go and actually diverge from the existing v8 stack. That’s actually become quite a problem for Arm in the server space as the software ecosystem is still baselining software packages on v8.0, which unfortunately is missing the all-important v8.1 Large System Extensions.

Having the whole software ecosystem move forward and being able to assume new v9 hardware has the capability of the new architectural extensions would help push things ahead, and probably solve some of the current situation.

However v9 isn’t only about SVE2 and new instructions, it also has a very large focus on security, where we’ll be seeing some more radical changes.

Introducing the Confidential Compute Architecture
POST A COMMENT

74 Comments

View All Comments

  • mdriftmeyer - Thursday, April 1, 2021 - link

    Considering EPYC Genoa is 96 cores /192 threads and will include Xilinx specialty processors for Zen 4 I would have just left that as the comment. Intel's new CEO will ratchet up specialty processing onto future Intel solutions as well. Reply
  • mdriftmeyer - Thursday, April 1, 2021 - link

    Sorry, but that's actually not even remotely close. Just head over to Phoronix and see how bad Milan whips the competition across the board. And yes, Phoronix has a much large process suite of applications than Anandtech. Reply
  • Wilco1 - Friday, April 2, 2021 - link

    Anandtech is one of the few sites that produces accurate benchmark results across different ISAs. SPEC is an industry standard benchmark to compare servers, and I don't see anything like it on Phoronix. Phoronix just runs a bunch of mostly unknown benchmarks without even checking that the results are meaningful across ISAs (they are not in many cases). Quantity does not imply quality. Reply
  • RSAUser - Saturday, April 3, 2021 - link

    Spec is quite flawed, you can go read up on it, it basically only cares about cache and cache latency, it is not an accurate representation of how stuff performs between different architectures.

    It's actually quite difficult to compare between architectures unless you know the specific use case,and Apple has done really well with the interpretation layer and I think dotnet core/5 from MS will also help MS quite a bit with that over the next few years when they start moving a lot of their products to their own architecture.
    Reply
  • Wilco1 - Saturday, April 3, 2021 - link

    SPEC consists of real applications like the GCC compiler. More cache, lower latency memory and higher IPC*frequency give better scores just like any other code. SPEC is not perfect by any means, but it is the best cross-ISA benchmark that exists today.

    What Phoronix does is testing how well code is optimized. If you see x86 being much faster than AArch64 then clearly that code hasn't been optimized for AArch64. SimdJson treated AArch64 as first-class from the start and thus has had similar optimization effort as x86, and you can see that in the results. But that's not the case for many other random projects that are not popular (yet) on AArch64. So Phoronix results are completely useless if you are interested in comparing CPU performance.
    Reply
  • mdriftmeyer - Thursday, April 1, 2021 - link

    Considering EPYC Genoa is 96 cores /192 threads and will include Xilinx specialty processors for Zen 4 I would have just left that as the comment. Intel's new CEO will ratchet up specialty processing onto future Intel solutions as well. Reply
  • Wilco1 - Saturday, April 3, 2021 - link

    Genoa is 2022, Altra Max has 128 cores in 2021. Reply
  • abufrejoval - Wednesday, March 31, 2021 - link

    I just hope they put CCA also in client side SoCs. So far all those 'realm', 'enclave' or VM encryption enhancements have only targeted server-side chips, but I don't think the vendor-favored walled garden approach has much of a future, there is an urgent need for more federation. Reply
  • bobwya - Wednesday, March 31, 2021 - link

    "The benefit of SVE and SVE2 beyond addition various modern SIMD capabilities is in their variable vector size" - que?!! :-) Reply
  • Matthias B V - Thursday, April 1, 2021 - link

    Glad to see. At least with the new arch they finally have to update their small cores. Was so tired of A55... Where only big cores are in focus though in my opinion the small ones are as or even more important.

    SVE 2 is great wonder how Intel and AMD react to this. They should work on similar features and also create a Lean86 getting rid of legacy if they want to defend market share. That and more flexible features like SVE would benefit them a lot.

    I am quite excited what ARM v9.x can do in tablets and Ultrabooks etc.
    Reply

Log in

Don't have an account? Sign up now