SPEC2006 Performance & Efficiency

HiSilicon made some big promises for the Kirin 980, claiming up to 75% higher performance while improving efficiency by 58%. This time around, I’m starting with the results right off the bat and going into the more detailed analysis later.

We’re picking up our mobile SoC SPEC2006 results where we left off in our Apple A12 analysis, and adding the scores of the new Kirin 980 to the set.

As a reminder, because the scores aren’t submitted to SPEC, we have to note that they are only estimates, as they aren’t officially validated. Naturally, we verify in-house that the tests are run correctly.

When measuring performance and efficiency, it’s important to take three metrics into account. The first is, evidently, the performance and runtime of a benchmark, which in the graphs below is represented on the right axis, growing from the right. Here, the bigger the figure, the better the SoC/CPU performed. The labels represent the SPECspeed scores.
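
For readers unfamiliar with how such scores are formed: SPEC CPU2006 expresses each subtest as a ratio of a fixed reference runtime to the measured runtime, and the suite score is the geometric mean of those ratios. The sketch below assumes that scheme; the function names are hypothetical, and the runtimes in the example are made up rather than taken from our measurements or from SPEC's reference table.

```python
import math
from typing import Sequence


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Per-subtest ratio: how many times faster than the reference machine."""
    if reference_seconds <= 0 or measured_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return reference_seconds / measured_seconds


def suite_score(ratios: Sequence[float]) -> float:
    """Suite score as the geometric mean of per-subtest ratios (SPEC-style, assumed)."""
    if not ratios:
        raise ValueError("need at least one ratio")
    log_sum = sum(math.log(r) for r in ratios)
    return math.exp(log_sum / len(ratios))


if __name__ == "__main__":
    # Hypothetical (reference_s, measured_s) pairs, not real SPEC data.
    runs = [(9000.0, 1000.0), (8000.0, 500.0)]
    ratios = [spec_ratio(ref, t) for ref, t in runs]
    print(ratios, suite_score(ratios))
```

One consequence of the geometric mean: with n subtests, doubling the score of just one of them lifts the overall score by a factor of 2^(1/n), so a single standout subtest cannot dominate the total.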

On the left axis, the bars represent the energy usage for the given workload. The bars grow from the left, and a longer bar means more energy used by the platform, so a platform is more energy efficient when its bars are shorter. The labels show the average power in watts, which is still an important secondary metric in thermally constrained devices, as well as the total energy used in joules, which is the primary efficiency metric.
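
The two label values are tied together by a simple relation: energy is average power multiplied by runtime. A minimal sketch, with hypothetical function names and made-up example figures, shows why a platform drawing more power can still use less energy if it finishes the workload sooner:

```python
def energy_joules(power_w: float, runtime_s: float) -> float:
    """Total energy for a run: average power (W = J/s) times runtime (s)."""
    return power_w * runtime_s


def average_power_watts(energy_j: float, runtime_s: float) -> float:
    """Inverse relation: recover average power from measured energy and runtime."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return energy_j / runtime_s


if __name__ == "__main__":
    # Illustrative numbers only: a slower, lower-power chip vs. a faster, hungrier one.
    slow = energy_joules(power_w=2.0, runtime_s=300.0)  # 600 J
    fast = energy_joules(power_w=3.0, runtime_s=150.0)  # 450 J
    print(f"slow: {slow:.0f} J, fast: {fast:.0f} J")
```

The faster chip here draws 50% more power yet spends 25% less energy on the same work, which is why energy is treated as the primary metric and average power as the secondary, thermal one.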

Both the SPECint2006 and SPECfp2006 overall scores paint a much better picture than HiSilicon led us to believe in their estimates, and the figures are also ahead of what I had personally estimated using Arm’s marketing materials during the Cortex-A76 launch.

Against the Kirin 970, the Kirin 980 delivers roughly double the SPEC2006 performance. Naturally, this is a comparison between SoCs two CPU generations apart, but it’s still an outstanding showing because the improvements benefit not only performance but also energy efficiency.

The Kirin 980’s energy usage while completing the workloads is among the best in the space, landing in roughly the same range as Apple’s A12 SoC and slightly edging it out in both SPECint2006 and SPECfp2006.

We have to remember that energy usage isn’t the same as power efficiency: while the energy used by a workload is an immensely important metric that correlates closely with a device’s battery life, the power efficiency of a CPU and SoC also needs to take the actual performance into account. In this case, the actual perf/W of the Kirin 980 is only about 28% better than that of the Kirin 970. So if performance has doubled but energy usage has dropped only modestly, something had to give, and that is power usage.

The new Cortex-A76 cores and the memory subsystem of the Kirin 980 are a lot more power-hungry, averaging 2.14W in SPECint and 2.65W in SPECfp, a notable increase over the 1.38W and 1.72W of the Kirin 970. In a sense, Arm’s new microarchitectures, including the Cortex-A75 of the Snapdragon 845, have increased their performance in a more linear fashion alongside power.
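
The arithmetic behind the roughly 28% perf/W figure can be checked against these power numbers. For a fixed workload, runtime scales with 1/performance, so the energy ratio equals the power ratio divided by the performance ratio, and the perf/W ratio is its inverse. The sketch below assumes performance exactly doubled (the text says "roughly"), so it lands near, not exactly on, the quoted figure; the function and type names are hypothetical.

```python
from typing import NamedTuple


class EfficiencyChange(NamedTuple):
    power_ratio: float          # new average power / old average power
    perf_per_watt_ratio: float  # new perf/W / old perf/W
    energy_ratio: float         # new energy per workload / old energy per workload


def efficiency_change(perf_ratio: float, old_power_w: float, new_power_w: float) -> EfficiencyChange:
    """Relate performance, average power and energy for one fixed workload.

    Runtime scales as 1 / perf_ratio, and energy = power * runtime.
    """
    power_ratio = new_power_w / old_power_w
    energy_ratio = power_ratio / perf_ratio
    return EfficiencyChange(power_ratio, perf_ratio / power_ratio, energy_ratio)


if __name__ == "__main__":
    # Assumption: exactly 2x performance; average power figures from the text.
    for suite, old_w, new_w in [("SPECint2006", 1.38, 2.14), ("SPECfp2006", 1.72, 2.65)]:
        c = efficiency_change(2.0, old_w, new_w)
        print(f"{suite}: power x{c.power_ratio:.2f}, perf/W x{c.perf_per_watt_ratio:.2f}, "
              f"energy x{c.energy_ratio:.2f}")
```

Under that assumption, average power rises by roughly 54 to 55%, perf/W improves by roughly 29 to 30%, and energy per run falls by roughly 22 to 23%: performance doubled, energy went down, and power is what gave.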

Now, this is not really a negative as long as the relationship between gained performance and added power is kept in check. The best example is obviously Apple’s SoCs, which do sport very high power figures but also come with very high performance. The best counter-example is the Exynos 9810’s higher frequency states, which come with similarly high power requirements but don’t show an equivalent increase in performance, resulting in a big efficiency disadvantage.

Looking at the wider range of historical SPEC2006 scores, we see the Kirin 980 just losing out to Apple’s A10 in terms of performance. As I had expected some months ago, the A76 largely puts the vastly bigger and more complex Exynos M3 to shame: it posts better performance while using much less power and about half the energy to complete the benchmark.

While Arm has already disclosed the key aspects and improvements of the Cortex-A76 microarchitecture, we can still go over the more detailed SPEC2006 subtests to see if we can extract any further meaningful information:

In SPECint2006, the Kirin 980’s gains are relatively even across the board, possibly reflecting a more balanced approach to the different aspects of the microarchitecture. The biggest generational gain was in 403.gcc, where we see a 2.67x improvement over the Kirin 970. It’s a bit unfortunate that we don’t have a better “apples-to-apples” comparison to the Cortex-A75; the Snapdragon 845’s DRAM latency isn’t very good due to its “L4” system cache block, which does handicap it a tad in SPEC.
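
Assuming the overall score is the geometric mean of per-subtest ratios, as in SPEC CPU2006, the reference times cancel out: the generational gain in the overall score equals the geometric mean of the per-subtest speedups, and the ratio between the largest and smallest speedup gives a quick read on how even the gains are. A sketch with hypothetical subtest names and made-up scores (only the structure matters here):

```python
import math
from typing import Dict, Iterable, Mapping, Tuple


def geomean(values: Iterable[float]) -> float:
    """Geometric mean of positive values."""
    vals = list(values)
    if not vals:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def speedups(old: Mapping[str, float], new: Mapping[str, float]) -> Dict[str, float]:
    """Per-subtest generational speedup: new score / old score."""
    return {name: new[name] / old[name] for name in old}


def summarize(old: Mapping[str, float], new: Mapping[str, float]) -> Tuple[float, float]:
    """Return (overall speedup, max/min spread) across subtests."""
    s = speedups(old, new)
    overall = geomean(s.values())  # equals geomean(new) / geomean(old)
    spread = max(s.values()) / min(s.values())  # 1.0 would mean perfectly uniform gains
    return overall, spread


if __name__ == "__main__":
    # Hypothetical subtest names and scores, not measured results.
    old = {"subtest_a": 10.0, "subtest_b": 20.0}
    new = {"subtest_a": 20.0, "subtest_b": 80.0}
    overall, spread = summarize(old, new)
    print(f"overall x{overall:.3f}, spread {spread:.2f}")
    print(f"check: x{geomean(new.values()) / geomean(old.values()):.3f}")
```

Both printed factors agree, which is the point: a per-subtest view such as the 2.67x in 403.gcc can be compared directly against the overall generational gain.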

456.hmmer and 464.h264ref are the two most execution-backend-bound tests in the suite, and the Cortex-A76 again performs excellently here, with scores roughly in line with what you’d expect from the 4-wide microarchitecture at its clock frequency. I might sound like a broken record by now, but again this stands in stark contrast to Samsung’s M3 core, which in theory should perform much better than it does.

Something that’s also obvious here is that even though the Cortex-A76 and Kirin 980 show good improvements, they’re not enough to even remotely close the big performance gap in memory-latency- and bandwidth-sensitive tests; here Apple’s monstrous memory subsystem is simply that much further ahead.

In the SPECfp2006 results, we again see some really big improvements of the Kirin 980 over the 970. Again, the generational improvement over the A75 is a bit clouded by the comparison to the Snapdragon 845, which in these memory-sensitive tests didn’t manage to separate itself much from the previous Cortex-A73-based Snapdragon 835.

Still, all in all, we see well-rounded improvements across the board, which bodes very well for the Kirin 980 in terms of both performance gains and energy usage improvements.

Overall, the Kirin 980 and Arm’s Cortex-A76 both delivered on their promises on the CPU side, and even managed to surpass my initial performance projections for the new core. No, the Kirin 980 certainly is not able to match Apple’s A12, or even the A11 for that matter, and it’s likely this situation won’t change all that much in the next few generations, at least until Android SoC vendors invest in significantly better and more robust memory subsystems.

The Kirin 980’s performance here should largely represent what we’ll see in the next-generation Snapdragon as well. I expect Qualcomm to be able to push the clocks slightly higher, but the big question is what they will do on the memory subsystem side, and whether they’ll be able to get rid of the latency penalty that was introduced with the L4 system cache.

For Samsung, the Cortex-A76 is just scary (Apple aside). If the next-generation M4 core is just an iterative microarchitecture, I have a really hard time seeing it compete. We’ll need to see major improvements in both performance and power efficiency for the next-gen Exynos to match the Kirin 980, let alone beat it.

Comments

  • name99 - Friday, November 16, 2018

    Andrei, you are concentrating on the wrong thing. I don't care about the inadequacies of GB4's memory bandwidth test, or the device uncore; I care about the DRAM part of this.

    I understand you and anonomouse are both claiming that LPDDR4-2133 means 4266 MT/s.
    OK, if that's true it's a dumb naming convention, but whatever. The point is, this claim goes directly against the entire thrust of the AnandTech DDR5 article from a few days ago that I keep referring to, which states very clearly that something like DDR4-3200 means 3200MT/s.

    THAT is the discrepancy I am trying to resolve.
  • ternnence - Friday, November 16, 2018

    name99, for mobile, LPDDR4X has a 4266 spec, whereas desktop DDR4 rarely reaches such frequencies. So it is not that LPDDR4-2133 has 4266MT/s; it is LPDDR4-4266 that has 4266MT/s.
  • ternnence - Friday, November 16, 2018

    FYI, you could check this site: https://www.samsung.com/semiconductor/dram/lpddr4x...
  • name99 - Friday, November 16, 2018

    FWIW Wikipedia sees things the same way, saying that
    https://en.wikipedia.org/wiki/DDR4_SDRAM
    e.g. DDR4-2133 means 2133MT/s.

    This follows the exact same pattern as all previous SDRAM numbering. Up to DDR3 the multiplier was 2 (DDR), 4 (DDR2) or 8 (DDR3); with DDR4 the multiplier stays at 8 but the base clock doubles, so from a minimum of 100MHz it's now a minimum of 200MHz.

    But these are internal details; the part that matters is that most authorities seem to agree that DDR4-2133 means 2133MT/s, with each transaction normally 64 bits wide.

    Now there are SOME people claiming no, DDR4-2133 means 4266 MT/s:
    - https://www.androidauthority.com/lpddr4-everything...
    claims this (but couches the claim in so much nonsensical techno-double-speak that I don't especially trust them)
    - so do you and anonomouse.

    So, like I said, WTF is going on here? We have a large pool of sources saying the sky is blue, and a different pool insisting that, no, the sky is green.
  • anonomouse - Friday, November 16, 2018

    I never claimed that DDR4-2133 means 4266MT/s. I am instead claiming that there is no LPDDR4-2133.
  • anonomouse - Friday, November 16, 2018

    I think the discrepancy here is just that you/they are mixing the naming conventions. DDR4-3200 means 3200MT/s. After an admittedly brief and cursory search, I don't see any references to Micron using the term LPDDR4-2133. I instead see every indication that they have LPDDR4 running at 2133MHz. Perhaps people here and there are mixing up the terminology, but when in doubt you may as well just look at the actual memory clock or bandwidth being listed, as that's ultimately what's important.
  • name99 - Friday, November 16, 2018

    Yeah, I think you are correct. After looking in a few different places I think the following are all true:
    - The DDR4 guys tend to talk about MT/s and give the sorts of numbers I gave
    - The LPDDR4 guys tend to talk about Mb/s per pin (same as MT/s, but just shows a different culture) and tend to be working with substantially higher numbers.

    I *THINK* (corrections welcome) that
    (a) the way LPDDR4 is mounted (no DIMMs and sockets, rather it's direct mounting, either on the SoC as PoP, or extremely close to it on a dedicated substrate), allows for substantially higher frequencies than DDR4.
    (b) one's natural instinct (mine, and likely other people's) is that "of course DDR4 runs faster [fewer power concerns, etc]" so when you see LPDDR4 running faster (at say "4266") you assume this has to mean some sort of "silent" multiplication by 2, and what's actually meant is the equivalent of DDR4-2133 at 2133MT/s.
    (c) It certainly doesn't help that Micron at least is calling the 4266MT/s LPDDR4 as having a "2133MHz clock". I have no idea what that is supposed to mean given that the DDR4 "clock" runs at 1/8th transaction speed, so for DDR4 the clock of a 4266MT/s device would be 533MHz.

    So I think we have established that the actual speeds ARE 4266MT/s (or so) for LPDDR4.
    Left unresolved
    - these are generally higher than DDR4? Meaning that, sooner or later, PC users are going to have to choose between flexible RAM (DIMMs and sockets) or high speed RAM (PoP mounting, or superclose to the SoC on a substrate --- look at the A12X)?

    - Why is Micron describing something like LPDDR4-4266 as having a 2133MHz clock? What does that refer to? I would assume that, like normal DDRx, the "low frequency clock" (what I've said would be 533MHz) is the speed for control transactions, and the 8x speed (4266Mb/s per pin) is the speed for bulk data flow?
  • ternnence - Friday, November 16, 2018

    Where do you get this "Micron lists their LPDDR4, for example, as LPDDR4-2133, NOT as LPDDR4-4266"? Just check Micron's official site: they label their 2133MHz RAM as LPDDR4-4266, not LPDDR4-2133.
  • ternnence - Friday, November 16, 2018

    DDR means double data rate. 2133MHz means the RAM's clock runs at 2133 million cycles per second, but each cycle produces two data transfers. MT/s means million transfers per second, so LPDDR4-4266 = 4266 million transfers per second = a 2133MHz clock.
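
    The double-data-rate relationship described here can be sketched in a few lines. This assumes the quoted clock is the I/O bus clock, with one transfer on each clock edge; the bus width is a parameter because it depends on the memory configuration (64 bits for the conventional DDR4 channel mentioned earlier, narrower per-channel widths for LPDDR4), and the function names are hypothetical.

```python
def transfer_rate_mts(io_clock_mhz: float) -> float:
    """Double data rate: data moves on both clock edges, so MT/s = 2 x I/O clock (MHz)."""
    return 2 * io_clock_mhz


def peak_bandwidth_gb_s(rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak in GB/s (1e9 bytes/s): transfers per second x bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return rate_mts * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # An LPDDR4 part quoted with a 2133 MHz clock is rated at 4266 MT/s ("LPDDR4-4266").
    print(transfer_rate_mts(2133))  # 4266
    # DDR4-2133 already names the transfer rate; on a 64-bit channel:
    print(f"{peak_bandwidth_gb_s(2133, 64):.2f} GB/s")  # 17.06 GB/s
```

    The 1/8 ratio raised earlier in the thread corresponds to DDR4's internal array clock (8n prefetch); the clock figure vendors quote for a DDR part is normally the I/O clock, which runs at half the transfer rate.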
  • name99 - Friday, November 16, 2018

    The Micron datasheets, for example, numdram.pdf,
    https://www.micron.com/~/media/documents/products/...
    do exactly this.
