CPU Performance & Efficiency: SPEC2006

We’re moving on to SPEC2006, analysing the single-threaded performance of the new Cortex-A77 cores. As the new CPU is running at the same clock as the A76-derived design of the Snapdragon 855, any improvements we’ll be seeing today are likely due to the IPC improvements of the core, the doubled L3 cache, and the enhancements to the memory controllers and memory subsystem of the chip.

Disclaimer About Power Figures Today:

The power figures presented today were captured using the same methodology we generally use on commercial devices; however, this year we’ve noted a large discrepancy, by a factor of roughly 3x, between the figures reported by the QRD865’s fuel gauge and the actual power consumption of the device. We’ve reached out to Qualcomm, and they confirmed in some very quick testing of their own that there’s a discrepancy of >2.5x. Furthermore, the QRD865 phones this year again suffered from excessive idle power figures of >1.3W.

I’ve attempted to compensate for this in the data as best I could; however, the figures published today are merely preliminary and of lower confidence than usual. For what it’s worth, last year’s QRD855 data was within 5% of the commercial phones’ measurements. Naturally, we’ll be re-testing everything once we get our hands on final commercial devices.

In the SPECint2006 suite, we’re seeing noticeable performance improvements across most of the benchmarks, with some posting larger than expected increases. The biggest improvements are seen in the memory-intensive workloads. 429.mcf is DRAM latency bound and sees a massive improvement of up to 46% compared to the Snapdragon 855.

What’s interesting to see is that some execution-bound benchmarks such as 456.hmmer are seeing a 28% uplift. The A77 has an added 4th ALU, which represents a 33% throughput increase in simple integer operations, and I don’t doubt this is a major reason for the improvements seen here.
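The 33% figure follows directly from going from three simple-integer ALUs to four. A minimal sketch of that arithmetic is below; the function name is purely illustrative, and the result is a peak issue-rate ceiling, not a prediction for any particular benchmark (456.hmmer's 28% sits under it).

```python
def peak_throughput_gain(old_units: int, new_units: int) -> float:
    """Relative gain in peak throughput when the number of execution units grows.

    Assumes every unit can issue one simple operation per cycle and that
    nothing else in the pipeline is the bottleneck (an idealisation).
    """
    if old_units <= 0 or new_units <= 0:
        raise ValueError("unit counts must be positive")
    return new_units / old_units - 1.0


# Three ALUs to four, as implied by the "added 4th ALU" in the text.
alu_gain = peak_throughput_gain(3, 4)
print(f"Peak simple-integer throughput gain: {alu_gain:.0%}")  # 33%
```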

The improvements aren’t uniform, however, with 400.perlbench in particular even seeing a slight degradation for some reason. 403.gcc also saw a smaller 12% increase – it’s likely these benchmarks are bound by other aspects of the microarchitecture.

The power consumption and energy efficiency, if the numbers are correct, roughly match our expectations of the microarchitecture. Power has gone up with performance, but because of the higher performance and shorter runtime of the workloads, energy usage has remained roughly flat. In several tests, efficiency has actually improved compared to the Snapdragon 855, but we’ll have to wait for commercial devices in order to draw any definitive conclusions here.
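The reasoning above rests on energy being average power multiplied by runtime: if power and performance rise by the same factor, the energy per workload stays flat. A rough sketch follows; the power ratios are hypothetical placeholders, not measured values, and only the 1.25x speedup echoes the SPECint2006 figure quoted in this article.

```python
def energy_ratio(power_ratio: float, speedup: float) -> float:
    """Energy of the new chip relative to the old one for a fixed workload.

    energy = average power * runtime, and runtime scales as 1 / speedup,
    so the ratio is power_ratio / speedup. Values below 1.0 mean less energy.
    """
    if power_ratio <= 0 or speedup <= 0:
        raise ValueError("ratios must be positive")
    return power_ratio / speedup


# Hypothetical power ratios, for illustration only.
flat = energy_ratio(power_ratio=1.25, speedup=1.25)    # power rises as much as performance
better = energy_ratio(power_ratio=1.20, speedup=1.25)  # power rises less than performance
print(f"flat case: {flat:.2f}x energy, better case: {better:.2f}x energy")
```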

In the SPECfp2006 suite, we’re also seeing some very varied improvements. The biggest change happened in 470.lbm, which has a very big hot loop and is memory bandwidth hungry. I think the A77’s new MOP-cache would help a lot here in regards to instruction throughput, and the improved memory subsystem makes the massive 65% performance jump possible.

Arm had advertised IPC improvements of ~25% and ~35% for the int and FP suites of SPEC2006. On the int side, we’re indeed hitting 25% on the Snapdragon 865 compared to the S855; on the FP side, however, we’re a bit short, as the increase comes in at around 29%. The performance increases here strongly depend on the SoC and in particular on its memory subsystem. Compared to the Kirin 990’s A76 implementation, the increases are only 20% and 24%, but HiSilicon’s chip also has a stronger memory subsystem, which allows it to extract noticeably more performance from its A76 cores than the S855 does.
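The two sets of percentages above imply how far the Kirin 990's A76 cores sit ahead of the S855's. A small sketch of that derivation, using only the rounded figures quoted in the paragraph (the function name is illustrative, and rounding in the inputs limits the precision of the result):

```python
def implied_gain(gain_over_a: float, gain_over_b: float) -> float:
    """If X is gain_over_a faster than A and gain_over_b faster than B,
    return how much faster B is than A (as a fraction)."""
    return (1.0 + gain_over_a) / (1.0 + gain_over_b) - 1.0


# S865 vs S855 and S865 vs Kirin 990, as quoted in the text.
kirin_over_s855_int = implied_gain(0.25, 0.20)
kirin_over_s855_fp = implied_gain(0.29, 0.24)
print(f"Kirin 990 over S855: int {kirin_over_s855_int:.1%}, fp {kirin_over_s855_fp:.1%}")
```

With these inputs the Kirin 990 comes out roughly 4% ahead of the S855 in both suites.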

The overall results for SPEC2006 are very good for the Snapdragon 865. Performance is exactly where Qualcomm advertised it would land, and we’re seeing a 25% increase in SPECint2006 and a 29% increase in SPECfp2006. On the integer side, the A77 still trails Apple’s Monsoon cores in the A11, but the new Arm design has now been able to trounce them in the FP suite. We’re still a fair way from Arm’s microarchitectures catching up to Apple’s latest designs, but if Arm keeps up this 25-30% yearly improvement rate, we should be getting there in a few more iterations.
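How many iterations "a few" means depends on the size of the gap and on how fast the leader keeps improving, neither of which is quantified here. The sketch below only illustrates the compounding; the 2.0x gap and the stationary rival are hypothetical inputs, not figures from this review.

```python
def generations_to_close(gap: float, rate: float, rival_rate: float = 1.0) -> int:
    """Count yearly generations until a chaser improving by `rate` per generation
    closes a performance `gap` (leader / chaser) on a rival improving by `rival_rate`.

    All inputs are multipliers, e.g. rate=1.25 means +25% per generation.
    """
    if gap <= 1.0:
        return 0
    if rate <= rival_rate:
        raise ValueError("the gap never closes unless rate > rival_rate")
    generations = 0
    while gap > 1.0:
        gap *= rival_rate / rate  # the gap shrinks by the ratio of the two rates
        generations += 1
    return generations


# Hypothetical: a 2.0x deficit, +25% per year, rival standing still.
print(generations_to_close(gap=2.0, rate=1.25))
```

Passing a `rival_rate` above 1.0 shows how quickly the estimate grows once the leader is also improving each year.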

The power and energy efficiency figures, again to be taken with a grain of salt, are also very much in line with expectations. Power has slightly increased with performance this generation; however, due to the performance increase, energy efficiency has remained relatively flat or has even seen a slight improvement.


176 Comments


  • UglyFrank - Monday, December 16, 2019

    I imagine the Tab S7 will have this.
    Meanwhile the iPad Pro 2020 will most likely have more than double the GPU power.
  • Kishoreshack - Monday, December 16, 2019

    That's not how it works bro
  • UglyFrank - Monday, December 16, 2019

    It is. The A12X has more than double the S855's GPU performance and we can expect a ~20% increase in GPU performance (A12X to A13X) as the A12 to A13 had a similar increase.
  • generalako - Monday, December 16, 2019

    Ok, but then again the SD875 (or whatever it will be called) is expected to be on a new architecture after 3 generations, which generally means a 50%+ jump just there. With the transition over to 5nm, you can expect even more performance from that. That would, after all, be the most fair comparison to the A14 (or A14X) on 5nm later this year, due to process node comparisons. Same with CPUs (don't forget, the A77 in the SD865 was released in the summer before by ARM, and even presented in the SD865 in December).
  • close - Tuesday, December 17, 2019

    Over the past few years Apple has been doing a consistently better job than Qualcomm regardless of process node. Probably they can afford to since they are in full control of the whole technology stack, including the software which means they can squeeze additional performance and efficiency like that. But this doesn't change the fact that year after year A-chips are better than their counterparts.
  • tuxRoller - Wednesday, December 18, 2019

    I'm not sure that Apple is much, if at all, more optimized than the Android BSPs. If you're aware of proof to the contrary I'd be interested in reading it.
  • michael2k - Wednesday, December 18, 2019

    It doesn’t mean optimized the way you envision it. It means more tailored to the design, since Apple has a fixed number of systems it has to support. There are three ways to see it: how many years does Apple push iOS updates? That is a function of performance as well as the OS.

    Another way to see it is knowing that Apple ships iPhones with much less RAM, meaning their OS and apps have to be designed to use less RAM too.

    Likewise their iPhone usually ships with smaller batteries; by designing the OS, SoC, and RAM synergistically they can use a smaller battery too. RAM happens to use energy even when idle, so less RAM does translate to lower energy usage.
  • michael2k - Tuesday, December 17, 2019

    Yeah, but anything Qualcomm does to boost performance, Apple will be doing too.

    The 865 is going to compete with the A14 in 2020, and the 875 will compete with the A15 in 2021. So if we expect the A14 to boost perf by 15% and the A14X to boost perf by 40%, and the A15 to boost perf again by 10% and A15X to boost perf again by 25%, you'll see:
    855 = 1.00
    865 = 1.25
    875 = 1.50

    A13 = 1
    A13X = 1.4
    A14 = 1.15
    A14X = 1.96
    A15 = 1.26
    A15X = 2.45

    Technically Qualcomm has more room to improve when you compare transistor budgets: the A13 is approximately 8.5b transistors, the A12 7b transistors.

    In comparison, the 855 only had 6b transistors, per Qualcomm itself:
    https://www.qualcomm.com/media/documents/files/sna...
  • id4andrei - Tuesday, December 17, 2019

    The 865 competes with the A13, not with the future A14. Apple sets the cadence in the SoC space and has done so since breaking rank with sheer performance and the transition to a 64-bit arch.
  • generalako - Tuesday, December 17, 2019

    This is just misrepresentative. The past two generations, ARM's architecture has been closing the gap to Apple. It closed the gap by around 30% in IPC with A76, and is doing so by around 15% in IPC with A77 (A77 had 27% IPC gain vs. A13's 12% IPC gain). The gap has been getting smaller, and hopefully it will continue. But the fact is still that it's closing for the performance cores.

    Also, your comparisons are way off. The SD855 was comparable to the A12, just as the SD865 is to the A13, and so on and so forth. This with process node and the actual release date of the Cortex core in mind.
