Today’s Arm announcement is a bit out of the norm for the company, as it’s the first in a series of staggered releases of information. For this first announcement Arm is publicly unveiling “Project Trillium” – a group of software solutions as well IP for object detection and machine learning.

Machine learning is indeed the hot new topic in the semiconductor business and has particularly seen a large focus in the mobile world over the last couple of months, with announcements from various IP companies as well as consumer solutions from the likes of Huawei. We’ve most recently had a more in-depth look and exploration of the topic of machine learning and neural network processing in a dedicated section of our review of the Kirin 970.

Whilst we had a great amount of noise from many industry players on the topic of machine learning IPs. Arm was conspicuously absent from the news and until now the focus has been on the CPU ISA extensions of Armv8.2, which introduce specialised instructions which simplify and accelerate implementations of neural networks with the help of half-precision floating point and integer dot products.

Alongside the CPU improvements we've also seen GPU improvements for machine learning in the G72. While both of these improvements help, they are insufficient in use-cases where maximum performance and efficiency are required. For example, as we’ve seen in the our test of the Kirin 970’s NPU and Qualcomm’s DSP – the efficiency of running inferencing on specialized IPs is above an order of magnitude higher than running it on a CPU.

As Arm explains it, the Armv8.2 and GPU improvements were only the first results towards establishing solutions for machine learning, while in parallel they’ve examined the need for dedicated solutions. Industry pressure from partners made it clear that the performance and efficiency requirements made dedicated solutions inevitable and started work on its machine learning (ML) processors.

Today’s announcement covers the new ML processors as well as object detection processors (OD). The latter IP is a result of Arm’s Apical acquirement in 2016 which saw the company add solutions for the display and camera pipelines to their IP portfolio.

Starting with the ML processor – what we’re talking about here is a dedicated IP for neural network model inferencing acceleration. As we’ve emphasised in our NN related announcements of late, Arm also emphasises that having an architecture which is specifically designed for such workloads can have significant advantages over traditional CPU and GPU architectures. Arm also made a great focus on the need to design an architecture which is able to do optimised memory management of the data that flows through a processor when executing ML workloads. These workloads have high data reusability and minimising the in- and out-bound data through the processor is a key aspect of reaching high performance and high efficiency.

Arm’s ML processor promises to reach theoretical throughput of over 4.6TOPs (8-bit integer) at target power envelopes of around 1.5W, advertising up to 3TOPs/W. The power and efficiency estimates are based on a 7nm implementation of the IP.

In regards to the performance figures, Arm agrees with me that the TOPs figure alone might not be the best figure to represent performance of an IP; however it’s still useful until the industry can work towards some sort of standardisation for benchmarking on popular neural network models. The ML processor can act as a fully dedicated and standalone IP block with its own ACE-Lite interface for incorporation into a SoC, or it can be integrated within DynamiQ cluster, which is a lot more novel in terms of implementation. Arm wasn’t ready to disclose more architectural information of the processor and reserves that for future announcements.

An aspect that seemed confusing is Arm’s naming of the new IP. Indeed Arm doesn’t see that the term “accelerator” is appropriate here as traditionally accelerators for Arm meant things such as packet handling accelerators in the networking space. Instead Arm sees the new ML processor as a more fully-fledged processor and therefore deserving of that naming.

The OD processor is a more traditional vision processor and is optimised for the task of object detection. There is still a need for such IP as while the ML processor could do the same task via neural networks, the OD processor can do it faster and more efficiently. This showcases just how far the industry is going to make dedicated IP for extremely specialised tasks to be able to extract the maximum amount of efficiency.

Arm envisions use-cases where the OD and ML processors are integrated together, where the OD processor would isolate areas of interest within an image and forward them to the ML processor where more fine-grained processing is executed on. Arm had a slew of fun examples as ideas, but frankly we still don’t know for sure how use-cases in the mobile space will evolve. The same can’t be said about camera and surveillance systems where we see the opportunity and need for continuous use of OD and ML processing.

Arm’s first generation of ML processors is targeted at mobile use while variants for other spaces will follow on in the future. The architecture of the IP is said to be scalable both upwards and downwards from the initial mobile release.

As part of Project Trillium, Arm also makes available a large amount of software that will help developers implement their neural network models into different NN frameworks. These are going to be available starting today on Arm’s developer website as well as Github.

The OD processor is targeted for release to partners in Q1 while the ML processor is said to be ready mid 2018. Again this is highly unusual for Arm as usually public announcements happen far after IP availability to customers. Due to the nature of SoC development we should thus not expect silicon based on the new IP until mid to late 2019 at the earliest, making Arm one of the slow-adopters among the semiconductor IP vendors who offer ML IP.

Related Reading



View All Comments

  • jjj - Tuesday, February 13, 2018 - link

    Long way to go if we are gonna have glasses with virtual keyboards but it's a start.

    Hopefully they announce this now because they have bigger news at MWC. They got to show us the big core aimed at 7nm and ARM must be under pressure to deliver a bigger core (maybe even an entire mm!2 lol). Even in mobile their partners need more since Samsung leaped ahead this year and ofc server needs to start pushing next year.
  • Wardrive86 - Tuesday, February 13, 2018 - link

    While I'm not saying you are wrong, you very well could be right. I'm withholding judgement on Samsung's Mongoose 3 core until we have actual is very wide, but it's proportionally bigger than A75 as Mongoose 2 was to A73 and we know how that went.. sometimes better performance but not always, but always higher energy consumption. Personally I enjoy sustainable performance for my use cases. However I hope Mongoose 3 is everything it is hyped to be, I really do! Reply
  • jjj - Tuesday, February 13, 2018 - link
    Single-Core Score 3717
    Crypto Score 1915
    Integer Score 3735
    Floating Point Score 3440
    Memory Score 4543

    Integer and FP added up are about 2x vs Exynos 8895 so they deliver on their 2x claim, in Geekbench at least.
    If they do it with 1 core turbo at 5W , remains to be seen but they do get there somehow.
    Oh and I was not saying that I favor huge cores, would rather see more focus on efficiency but ARM could use a bigger core in server and some other markets and , from a marketing perspective, in mobile too.
    We also had A73 and A75 both from the Sophia family and that's unusual as ARM tends to alternate Sophia and Austin so the Austin team had an extra year to work on w/e they are working on.
  • Threska - Tuesday, February 13, 2018 - link

    Wouldn't be surprised if information is locked behind NDAs for those who want to use all this wiz-bang technology. Reply
  • Wardrive86 - Tuesday, February 13, 2018 - link

    It's impressive that they can add it inside a DynamiQ cluster, DynamiQ is more advanced than I originally thought. I wonder what else can be modified to be included inside the cluster Reply

Log in

Don't have an account? Sign up now