ARM moves at an aggressive pace, pushing out new processor IP on a yearly cadence. It needs to move fast partly because it has so many partners across so many industries to keep happy and partly because it needs to keep up with the technology its IP comes into contact with, everything from new process nodes to higher quality displays to artificial intelligence. To keep pace, ARM keeps multiple design teams in several different locations all working in parallel.

At its annual TechDay event last year, held at one such facility in Austin, Texas, ARM introduced the Mali-G71 GPU—the first to use its new Bifrost GPU architecture—and the Cortex-A73 CPU—a new big core to replace the A72 in mobile. Notably absent, however, was a new little core.

Another year, another TechDay, and another ARM facility (this time in Cambridge, UK) can only mean new ARM IP. Over the span of several days, we got an in-depth look at its latest technologies, including DynamIQ, the Mali-G72 GPU, the Cortex-A75, and (yes, finally) the successor to the A53: Cortex-A55.

The A53 was announced alongside the A57 and has been in use for several years, either on its own or as the little core in a big.LITTLE configuration. It’s been hugely successful, with more than 40 licensees and 1.7 billion units shipped in just 3 years. But during this time ARM introduced new big cores on a yearly cadence, moving from A57 to A72 to A73. The A53 remained unchanged, however, even as the performance gap between the big and little cores continued to grow.

Predictably then, the focus for A55 was on improving performance. The A53’s dual-issue, in-order core, which serves as the starting point for A55, already delivers good throughput, so ARM focused on improving the memory system. A new data prefetcher, an integrated L2 cache that reduces latency by 50%, and an extra level of L3 cache (among other changes) give the A55 significantly better memory performance, quantified by a nearly 2x improvement in the LMBench memory copy test. The numbers provided by ARM also show an 18% performance gain in SPECint 2006 and an even bigger 38% gain in SPECfp 2006 relative to the A53. These numbers, like the others shown in the chart, compare the A55 and A53 at the same frequency, with the same L1/L2 cache sizes, the same compiler, and so on, and are meant to be a fair comparison. The gains in shipping products should be a little higher, because partner SoCs will benefit from adding the L3 cache, which these numbers do not include.

The additional performance does not come for free, however. Power consumption is up 3% relative to the A53 (iso-process, iso-frequency), but power efficiency still improves by 15% when running SPECint 2006 because of its higher performance.
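
The efficiency figure is consistent with the two ratios if efficiency is taken to mean performance per watt, which is an assumption here since ARM's exact definition is not given. A quick sketch using the 18% SPECint 2006 gain and the 3% power increase quoted above (the function name is ours, purely for illustration):

```python
def efficiency_gain(perf_gain: float, power_increase: float) -> float:
    """Relative change in performance per watt.

    Both arguments are fractional changes (0.18 means +18%).
    Assumes efficiency = performance / power.
    """
    return (1.0 + perf_gain) / (1.0 + power_increase) - 1.0


# Figures quoted for A55 vs. A53 at the same process and frequency.
a55_gain = efficiency_gain(perf_gain=0.18, power_increase=0.03)
print(f"{a55_gain:.1%}")  # 14.6%, which rounds to the quoted 15%
```

In other words, an 18% performance gain for 3% more power works out to roughly 15% more work per watt.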

The A55 also includes several new features that will help it expand into new markets. Virtualization Host Extensions (VHE) are very important for the automotive market, and the advanced safety and reliability features, including architectural RAS support and ECC/parity for all levels of cache, are critical for many applications, including automotive and industrial. There are new features for infrastructure applications too, including a new Int8 dot product instruction (useful for accelerating neural networks). Because A55 is compatible with DynamIQ, it also gets cache stashing and access to a 256-bit AMBA 5 CHI port.
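
To make the Int8 dot product concrete, here is a minimal model of what one such operation computes. The article only says "Int8 dot product instruction"; the grouping of four 8-bit elements, the 32-bit accumulator, and the wraparound behavior below are assumptions based on how instructions of this kind are usually defined, and the function name is hypothetical.

```python
def int8_dot4_accumulate(acc: int, a: list[int], b: list[int]) -> int:
    """Model one lane: acc += sum(a[i] * b[i]) over four signed 8-bit values."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each operand holds exactly four int8 elements")
    for x in (*a, *b):
        if not -128 <= x <= 127:
            raise ValueError(f"{x} does not fit in a signed 8-bit integer")
    total = acc + sum(x * y for x, y in zip(a, b))
    # Assumption: the accumulator is a signed 32-bit value that wraps on overflow.
    return (total + 2**31) % 2**32 - 2**31


# Example values chosen for illustration only.
weights = [12, -7, 100, 3]
activations = [5, 9, -2, 127]
print(int8_dot4_accumulate(0, weights, activations))  # 178
```

Neural network inference spends most of its time in exactly this multiply-and-accumulate pattern, so doing four of them per lane in a single instruction is where the speedup comes from.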

When ARM announced the A73 last year, it talked a lot about improving sustained performance and working within a tight thermal envelope. In other words, the A73 was all about improving power efficiency. The A75 goes in a different direction: Taking advantage of the A73’s thermal headroom, ARM focused on improving performance while maintaining the same efficiency as the A73.

Our previous performance testing revealed mixed results when comparing the A73 to the A72—not too surprising given the significant differences in microarchitecture—with the A73 generally outpacing the A72 by a small margin for integer tasks but falling behind the older CPU in floating point workloads. Things look better for the A75, at least based on ARM’s numbers, which show noticeable gains over the A73 in both integer and floating-point workloads as well as memory streaming.

The graph above shows that the A75 operating at 3GHz on a 10nm node achieves better performance and the same efficiency as an A73 operating at 2.8GHz on a 10nm node, which means the A75 consumes more power. How much more is difficult to tell based on this one simple graph. We know that the A73 is thermally limited when using 4 cores (albeit less so than the A72), so the A75 definitely will be as well. This is not a common scenario, however. Most mobile workloads only fire up 1-2 cores at a time and usually only in short bursts. ARM obviously felt comfortable enough using the A73’s extra thermal headroom to boost performance without negatively impacting sustained performance.

ARM also wants to push the A75 into larger form-factor devices, with power budgets beyond mobile’s 750mW/core, by pushing frequency higher. Something like a Chromebook or a 2-in-1 ultraportable comes to mind. At 1W/core the A75 delivers 25% higher performance than the A73 and at 2W/core the A75’s advantage bumps up to 30% when running SPECint 2006. If anything, these numbers highlight why it’s not a good idea to push performance with frequency alone, as dynamic power grows much faster than frequency once voltage has to rise along with it.
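
To see why frequency is an expensive lever, recall the usual first-order model for CMOS switching power, P = C × V² × f. The sketch below is illustrative only: it assumes supply voltage has to rise in proportion to frequency, which is a simplification and not a figure from ARM, and the helper names are ours.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """First-order CMOS switching power: P = C * V^2 * f."""
    return capacitance * voltage**2 * frequency


def power_ratio_for_frequency(freq_ratio: float,
                              voltage_tracks_frequency: bool = True) -> float:
    """Power multiplier when frequency is scaled by freq_ratio.

    Assumption: if voltage_tracks_frequency is True, voltage scales
    linearly with frequency, so power grows with the cube of frequency.
    """
    voltage_ratio = freq_ratio if voltage_tracks_frequency else 1.0
    return dynamic_power(1.0, voltage_ratio, freq_ratio)


for ratio in (1.1, 1.2, 1.3):
    print(f"{ratio:.1f}x frequency -> {power_ratio_for_frequency(ratio):.2f}x power")
```

Under this simplified model a 20% frequency bump costs about 73% more dynamic power, which is why widening the power budget from 1W to 2W per core buys comparatively little extra performance.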

ARM targeted the A73 specifically at mobile by focusing on power efficiency and removing some features useful for other applications to simplify the design: there is no ECC on the L1 cache and no option for a 256-bit AMBA 5 CHI port. With A75, there’s now a clear upgrade path from A72. For the server and infrastructure markets, A75 supports ECC/parity for all levels of cache and AMBA 5 CHI for connecting to larger CCI, CCN, or CMN fabrics, and for automotive and other safety-critical applications there’s architectural RAS support, protection against data poisoning, and improved error management.

On the next few pages, we’ll dive deeper into the technical details and features of ARM’s new IP, including DynamIQ (the next iteration of big.LITTLE), Cortex-A75, and Cortex-A55.

104 Comments

  • Matt Humrick - Wednesday, May 31, 2017 - link

    The L1/L2 cache sizes for A53/A55 are stated in the article.
  • Great_Scott - Tuesday, May 30, 2017 - link

    Fantastic article, Matt. Best CPU tech article I've read in years, and I read most of them.
  • Alexvrb - Tuesday, May 30, 2017 - link

    "ARM wants to push the A75 into larger form-factor devices with power budgets beyond mobile’s 750mW/core too by pushing frequency higher. Something like a Chromebook or a 2-in-1 ultraportable come to mind. At 1W/core the A75 delivers 25% higher performance than the A73 and at 2W/core the A75’s advantage bumps up to 30% when running SPECint 2006. If anything, these numbers highlight why it’s not a good idea to push performance with frequency alone, as dynamic power scales exponentially."

    Perhaps, but it gives it a lot more headroom for use in things like tablets... and laptops. I'm thinking Windows on ARM could use an even faster SoC than the SD 835, and 2W is perfect. Right in Atom ULP territory, and there's no modern Atoms left to compete in the lower-price territory. Perhaps Intel will be forced to release cheaper gimped Core-based "Atoms" in the future? Or Celerons/Pentiums. ;)
  • LiverpoolFC5903 - Wednesday, May 31, 2017 - link

    Meh.

    Incremental update with no radical changes. Would LOVE to see a huge fat ARM core with a 5+ wide front end for premium devices, with single threaded throughput approaching that of the Core M series. Now that would be progress.

    No reason why a dual core with two fat cores cannot work great on android, especially given the idea of race to sleep. Off load background tasks to DSPs, Microcontrollers etc or even use a third big core clocked at about half the frequency of the main two cores.

    Sure, will be expensive and big, but you can be sure there will be customers for it, especially in the 700 USD plus market segment. As of now, manufacturers barely have any choice apart from qualcomm chipsets.
  • lizanosi - Wednesday, May 31, 2017 - link

    I ask you, why have Samsung and Apple continued to have great success deviating from ARM's reference designs, while Qualcomm has been married to them and paying the performance price (specifically looking at you, 808)
  • melgross - Wednesday, May 31, 2017 - link

    For the most part, Samsung's designs were straight from ARM. They didn't have an architectural license. It's only very recently that they've gotten one.

    But Snapdragon has been Qualcomm's own designs, because they do have an architectural license, as does Apple. But, like the rest of the industry, they were discombobulated when Apple came out with the 64 bit A7.

    They've never gotten totally back into the race. Their first one was an ARM design, and it had heat problems. The second was their design, but performance was fairly poor. The 835 is not much better than the preceding model. Samsung has fared no better. The problem they all have is that Apple is two years ahead there, and likely took their time with the A7, because there was no competition. These guys are rushing to catch up, and they are likely restrained by the expectation by Android buyers that more cores are better, rather than having better cores.
  • StrangerGuy - Friday, June 2, 2017 - link

    Now that Apple's GPU is a mostly fully custom part, expect the A11 to start another A7-esque domination over Android SoCs on graphics. I also expect an Apple custom LTE baseband to debut this year too, since Apple is definitely too paranoid to depend solely on Qualcomm and Intel's baseband proved to be donkey balls.

    Besides, iPhones probably outsell everyone else's flagships combined yearly in a single launch quarter. The economies of scale for an Android flagship SoC make far less sense.
  • Suraj tiwari - Thursday, June 1, 2017 - link

    Dynamiq is a welcome move, it should be adopted by SOC manufacturers immediately. No other cpu manufacturer (intel, AMD) has a technology like this!
  • Anato - Saturday, June 3, 2017 - link

    I would prefer 2+2 over 8 A55 cores any day and pay for it, but marketing disagrees :-(
  • slee915 - Wednesday, June 28, 2017 - link

    This article shows A73 has a 3-stage AGU LD/ST memory pipeline but last year's A73 article http://www.anandtech.com/show/10347/arm-cortex-a73... shows it has a 4-stage AGU LD/ST. So which one is correct ?
