CPU Performance

We’ll begin our Kirin 960 performance evaluation by investigating the A73’s integer and floating-point IPC with some synthetic tests. Then we’ll see how the changes to its memory system affect memory latency and bandwidth. Finally, after completing the lower-level tests, we’ll see how Huawei’s Mate 9 and its Kirin 960 SoC perform when running some real-world workloads.

Our first look at the A73’s integer performance comes from SPECint2000, the integer component of the SPEC CPU2000 benchmark developed by the Standard Performance Evaluation Corporation. This collection of single-threaded tests allows us to compare IPC for competing CPU microarchitectures. The scores below are not officially validated numbers, because official validation requires the test to be supervised by SPEC, but we’ve done our best to choose appropriate compiler flags and to get the tests to pass internal validation.

SPECint2000 - Estimated Scores
ARMv8 / AArch64

Test          Kirin 960   Kirin 950 (% Advantage)   Exynos 7420 (% Advantage)   Snapdragon 821 (% Advantage)
164.gzip      1217        1094 (11.3%)              940 (29.5%)                 1273 (-4.4%)
175.vpr       4118        3889 (5.9%)               2857 (44.1%)                1687 (144.1%)
176.gcc       2157        1864 (15.7%)              1294 (66.7%)                1746 (23.5%)
181.mcf       1118        664 (68.3%)               928 (20.5%)                 1200 (-6.8%)
186.crafty    2222        2083 (6.7%)               1176 (88.9%)                1613 (37.8%)
197.parser    1395        1208 (15.5%)              933 (49.5%)                 1059 (31.8%)
252.eon       3421        3333 (2.6%)               2453 (39.5%)                3714 (-7.9%)
253.perlbmk   1748        1651 (5.8%)               1216 (43.8%)                1513 (15.5%)
254.gap       1930        1667 (15.8%)              1264 (52.6%)                1594 (21.1%)
255.vortex    2111        1863 (13.3%)              1473 (43.3%)                1712 (23.3%)
256.bzip2     1402        1220 (15.0%)              1079 (29.9%)                1172 (19.6%)
300.twolf     2479        2521 (-1.7%)              1887 (31.4%)                847 (192.6%)
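For readers who want to reproduce the "% Advantage" figures, here is a minimal Python sketch of the arithmetic as we read it from the table: the Kirin 960's score divided by the other SoC's score, minus one. The function name and constants are ours, chosen for illustration. Because the table lists rounded scores, the result can differ from the published percentage by about a tenth of a point; we assume the published figures were computed from unrounded scores.

```python
def pct_advantage(kirin960_score: float, other_score: float) -> float:
    """Return the Kirin 960's lead over another SoC as a percentage.

    Positive means the Kirin 960 is faster; negative means it is slower.
    """
    return (kirin960_score / other_score - 1.0) * 100.0


# Rounded 181.mcf scores copied from the table above.
KIRIN_960_MCF = 1118
OTHER_MCF = {"Kirin 950": 664, "Exynos 7420": 928, "Snapdragon 821": 1200}

if __name__ == "__main__":
    for soc, score in OTHER_MCF.items():
        print(f"{soc}: {pct_advantage(KIRIN_960_MCF, score):+.1f}%")
```

A negative value, as with the Snapdragon 821 in 181.mcf, simply means the other SoC posted the higher score.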

The Kirin 960’s A73 CPU is about 11% faster on average than the Kirin 950’s A72. In addition to the front-end changes discussed on the previous page and the changes to the memory system discussed in the next section, the A73’s integer pipelines have undergone a few tweaks as well. Where the A72 had 3 integer ALUs (2 simple ALUs for basic operations such as addition and shifting, and 1 dedicated multi-cycle ALU for complex operations such as multiplication, division, and multiply-accumulate), the A73 has only 2 integer ALUs that are capable of performing both basic and complex operations.

This affects performance in different ways. For example, because only one of the A73’s ALUs can handle multiplication while the other handles division, the time to execute multiply or divide operations sees no change; however, while an ALU is occupied with a multi-cycle instruction, it cannot execute simple instructions the way the A72’s dedicated simple pipelines can, leading to a potential performance loss. Multiply-accumulate operations, which require both of the A73’s pipelines, incur a similar penalty. It’s not all bad, however. Workloads that perform parallel arithmetic or use certain other complex instructions can see double the execution throughput on the A73 versus the A72.

Note that the table above does not account for differences in CPU frequency. The Kirin 960’s frequency advantage over the Kirin 950 and Snapdragon 821 is less than 3%, making these numbers easier to compare, but its advantage over the Exynos 7420 is a little over 12%. The chart below accounts for this by dividing the estimated SPECint2000 ratio score by CPU frequency, making IPC comparisons easier.
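To make the normalization concrete, the sketch below divides a score by CPU frequency and compares the result across SoCs. It uses the rounded 176.gcc scores from the table rather than the overall SPECint2000 ratio. The clock speeds are assumptions on our part: they are not listed in this section, and were chosen only to be consistent with the frequency gaps described above, so substitute the actual peak clocks of the devices being compared.

```python
# ASSUMPTION: peak big-core clocks in MHz. These values are illustrative and
# are not given in this section; replace them with the clocks of the tested devices.
ASSUMED_MHZ = {
    "Kirin 960": 2360,
    "Kirin 950": 2300,
    "Exynos 7420": 2100,
    "Snapdragon 821": 2340,
}

# Rounded 176.gcc scores from the table above.
GCC_SCORE = {
    "Kirin 960": 2157,
    "Kirin 950": 1864,
    "Exynos 7420": 1294,
    "Snapdragon 821": 1746,
}


def score_per_mhz(soc: str) -> float:
    """Frequency-normalized score, used here as a rough proxy for IPC."""
    return GCC_SCORE[soc] / ASSUMED_MHZ[soc]


def advantage_pct(a: float, b: float) -> float:
    """Percentage by which a exceeds b."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    for soc in GCC_SCORE:
        if soc == "Kirin 960":
            continue
        raw = advantage_pct(GCC_SCORE["Kirin 960"], GCC_SCORE[soc])
        norm = advantage_pct(score_per_mhz("Kirin 960"), score_per_mhz(soc))
        print(f"vs {soc}: raw {raw:+.1f}%, per-MHz {norm:+.1f}%")
```

Under these assumed clocks, the normalization barely moves the comparison against the Kirin 950 and Snapdragon 821 but noticeably shrinks the lead over the lower-clocked Exynos 7420.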

[Chart: SPECint2000 64b/32b Estimated Ratio/MHz]

Despite the substantial microarchitectural differences between the A73 and A72, the A73’s integer IPC is only 11% higher than the A72’s. This is likely the result of improvements in one area being partially offset by regressions in another. Still, assuming ARM’s power reduction claims hold true, this is not a bad result.

The gap between the A73 and A57 increases to 29%. The integer performance for Qualcomm’s custom Kryo core is well behind ARM’s A73 and A72 cores, essentially matching the A57’s IPC.

Geekbench 4 - Integer Performance
Single Threaded

Test                    Kirin 960          Kirin 950 (% Advantage)     Exynos 7420 (% Advantage)   Snapdragon 821 (% Advantage)
AES                     911.3 MB/s         935.6 MB/s (-2.59%)         795.8 MB/s (14.52%)         559.1 MB/s (63.00%)
LZMA                    3.03 MB/s          2.87 MB/s (5.69%)           2.28 MB/s (33.33%)          2.20 MB/s (38.09%)
JPEG                    16.1 Mpixels/s     15.5 Mpixels/s (3.66%)      14.1 Mpixels/s (13.95%)     21.6 Mpixels/s (-25.62%)
Canny                   22.5 Mpixels/s     26.8 Mpixels/s (-16.06%)    23.6 Mpixels/s (-4.80%)     30.3 Mpixels/s (-25.77%)
Lua                     1.70 MB/s          1.55 MB/s (10.13%)          1.20 MB/s (41.94%)          1.47 MB/s (16.14%)
Dijkstra                1.53 MTE/s         1.14 MTE/s (33.53%)         0.92 MTE/s (65.12%)         1.39 MTE/s (9.57%)
SQLite                  51.6 Krows/s       43.5 Krows/s (18.62%)       34.0 Krows/s (51.99%)       36.7 Krows/s (40.73%)
HTML5 Parse             8.30 MB/s          6.79 MB/s (22.19%)          6.37 MB/s (30.25%)          7.61 MB/s (9.02%)
HTML5 DOM               2.17 Melems/s      1.92 Melems/s (12.82%)      1.26 Melems/s (72.91%)      0.37 Melems/s (489.09%)
Histogram Equalization  48.7 Mpixels/s     57.0 Mpixels/s (-14.56%)    50.6 Mpixels/s (-3.66%)     51.2 Mpixels/s (-4.82%)
PDF Rendering           44.8 Mpixels/s     45.5 Mpixels/s (-1.47%)     39.7 Mpixels/s (12.93%)     53.0 Mpixels/s (-15.36%)
LLVM                    194.4 functions/s  167.9 functions/s (15.76%)  128.6 functions/s (51.14%)  113.5 functions/s (71.20%)
Camera                  5.45 images/s      5.45 images/s (0.00%)       4.95 images/s (10.17%)      7.19 images/s (-24.12%)

The updated Geekbench 4 workloads give us a second look at integer IPC. Similar to the SPECint2000 results, we see Kirin 960 showing 5% to 15% gains over Kirin 950 in several of the tests, but there’s a bit more variation overall. The Kirin 960 is actually slower than Kirin 950 in some tests, and, in the case of Canny and Histogram Equalization, its A73 is even slower than the Exynos 7420’s A57. It also falls behind Qualcomm’s Kryo in the JPEG, PDF Rendering, and Camera tests. The tests where the Kirin 960 does well—HTML5 Parse, HTML5 DOM, and SQLite—are very common workloads, though, which should translate into better real-world performance.

[Chart: Geekbench 4 (Single Threaded) Integer Score/MHz]

The chart above accounts for differences in CPU frequency, making it easier to directly compare IPC. Overall the A73 shows only about a 4% improvement over the A72 and about a 12% improvement over the A57 in this group of workloads, considerably less than what we saw in SPECint2000; however, with margins ranging from 33.5% in Dijkstra to -16.1% in Canny, it’s impossible to make any sweeping statements about the A73’s integer performance being better or worse than the A72’s.
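The spread between workloads is easy to see by ranking the per-test margins. The short sketch below does this for the Kirin 960 versus the Kirin 950 using the rounded single-threaded results from the Geekbench 4 integer table. The names are ours, and the margins are raw, with no frequency normalization applied.

```python
# Single-threaded Geekbench 4 integer results from the table above, stored as
# (Kirin 960, Kirin 950) pairs. The units cancel when we take the ratio.
INT_RESULTS = {
    "AES": (911.3, 935.6),
    "LZMA": (3.03, 2.87),
    "JPEG": (16.1, 15.5),
    "Canny": (22.5, 26.8),
    "Lua": (1.70, 1.55),
    "Dijkstra": (1.53, 1.14),
    "SQLite": (51.6, 43.5),
    "HTML5 Parse": (8.30, 6.79),
    "HTML5 DOM": (2.17, 1.92),
    "Histogram Equalization": (48.7, 57.0),
    "PDF Rendering": (44.8, 45.5),
    "LLVM": (194.4, 167.9),
    "Camera": (5.45, 5.45),
}


def margins_pct(results: dict) -> dict:
    """Map each test name to the Kirin 960's percentage margin over the Kirin 950."""
    return {name: (a / b - 1.0) * 100.0 for name, (a, b) in results.items()}


if __name__ == "__main__":
    ranked = sorted(margins_pct(INT_RESULTS).items(), key=lambda kv: kv[1])
    for name, margin in ranked:
        print(f"{name:<24}{margin:+6.1f}%")
    print(f"Spread: {ranked[-1][1] - ranked[0][1]:.1f} percentage points")
```

With the rounded inputs the extremes land on Canny and Dijkstra, though the exact percentages can differ slightly from the table's.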

Qualcomm’s Kryo CPU falls just behind the A57 once again despite posting better results in many of the Geekbench integer tests. Its poor performance in LLVM and HTML5 DOM weighs heavily on its overall score.

I’ve also included results for ARM’s in-order A53 companion core. The A73’s integer IPC is 1.7x to 2x higher overall, which illustrates why octa-core A53 SoCs are so much slower, particularly in Web browsing, than designs that use 2-4 big cores (A73/A72/A57) instead of 4 additional A53s.

Geekbench 4 - Floating Point Performance
Single Threaded

Test                Kirin 960        Kirin 950 (% Advantage)    Exynos 7420 (% Advantage)  Snapdragon 821 (% Advantage)
SGEMM               10.7 GFLOPS      13.9 GFLOPS (-23.44%)      11.9 GFLOPS (-10.36%)      12.2 GFLOPS (-12.57%)
SFFT                2.89 GFLOPS      2.26 GFLOPS (27.73%)       2.62 GFLOPS (10.39%)       3.21 GFLOPS (-10.07%)
N-Body Physics      838.4 Kpairs/s   896.9 Kpairs/s (-6.52%)    634.5 Kpairs/s (32.14%)    1156.7 Kpairs/s (-27.51%)
Rigid Body Physics  5891.4 FPS       6497.4 FPS (-9.33%)        4662.7 FPS (26.35%)        7171.3 FPS (-17.85%)
Ray Tracing         221.9 Kpixels/s  216.9 Kpixels/s (2.30%)    136.1 Kpixels/s (63.07%)   298.3 Kpixels/s (-25.59%)
HDR                 7.46 Mpixels/s   7.57 Mpixels/s (-1.45%)    7.17 Mpixels/s (4.09%)     10.8 Mpixels/s (-30.90%)
Gaussian Blur       23.6 Mpixels/s   28.6 Mpixels/s (-17.37%)   24.4 Mpixels/s (-2.94%)    48.5 Mpixels/s (-51.27%)
Speech Recognition  12.8 Words/s     8.9 Words/s (44.14%)       10.2 Words/s (25.49%)      10.9 Words/s (17.43%)
Face Detection      501.2 Ksubs/s    518.9 Ksubs/s (-3.42%)     435.5 Ksubs/s (15.09%)     685.0 Ksubs/s (-26.83%)

With the exception of SFFT and Speech Recognition, the Kirin 960 is generally a little slower than the Kirin 950 in Geekbench 4’s floating-point workloads. This is a bit of a surprise considering that the A73’s NEON execution units are relatively unchanged from the A72’s design, with reduced latency for specific instructions improving NEON performance by 5%, according to ARM. These results are even harder to interpret after factoring in the A73’s lower-latency front end and improvements to its fetch block and memory subsystems. It’s possible that some of these tests are limited by the A73’s narrower decode stage, but given the variation in workloads, this is probably not true for every case. It will be interesting to see if A73 implementations from other SoC vendors show similar results.

[Chart: Geekbench 4 (Single Threaded) Floating Point Score/MHz]

After accounting for the differences in CPU frequency, floating-point IPC for the Kirin 960’s A73 is 3% to 5% lower overall than the A72’s but about 3% higher than the older A57’s. These results, which are a geometric mean of the floating-point subtest scores, are certainly closer to what I would expect, but they hide the large performance variation from one workload to the next.
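As a rough illustration of how a geometric mean folds those swings into one number, the sketch below takes the rounded Kirin 960 and Kirin 950 results from the floating-point table, computes the geometric mean of the per-test ratios, and then divides by the clock ratio. The clock speeds are assumptions not stated in this section, and this is not Geekbench's actual scoring pipeline, so treat the output as approximate.

```python
from statistics import geometric_mean

# Single-threaded Geekbench 4 floating-point results from the table above,
# stored as (Kirin 960, Kirin 950) pairs. The units cancel in the ratio.
FP_RESULTS = {
    "SGEMM": (10.7, 13.9),
    "SFFT": (2.89, 2.26),
    "N-Body Physics": (838.4, 896.9),
    "Rigid Body Physics": (5891.4, 6497.4),
    "Ray Tracing": (221.9, 216.9),
    "HDR": (7.46, 7.57),
    "Gaussian Blur": (23.6, 28.6),
    "Speech Recognition": (12.8, 8.9),
    "Face Detection": (501.2, 518.9),
}

# ASSUMPTION: peak big-core clocks in MHz; these are not given in this section.
KIRIN_960_MHZ = 2360
KIRIN_950_MHZ = 2300


def geomean_ratio(results: dict) -> float:
    """Geometric mean of the per-test Kirin 960 / Kirin 950 ratios."""
    return geometric_mean([a / b for a, b in results.values()])


def per_mhz_ratio(results: dict, mhz_a: float, mhz_b: float) -> float:
    """Geometric-mean ratio after dividing out the clock-speed difference."""
    return geomean_ratio(results) / (mhz_a / mhz_b)


if __name__ == "__main__":
    raw = geomean_ratio(FP_RESULTS)
    norm = per_mhz_ratio(FP_RESULTS, KIRIN_960_MHZ, KIRIN_950_MHZ)
    print(f"Raw geometric-mean ratio: {raw:.3f}")
    print(f"Per-MHz ratio:            {norm:.3f}")
```

A ratio below 1.0 means the A73 trails the A72 overall. The large gains in SFFT and Speech Recognition roughly cancel the losses elsewhere, which is exactly the kind of variation a single summary number conceals.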

It’s pretty obvious that floating-point performance was Qualcomm’s focus for its custom Kryo core. While integer IPC was no better than ARM’s A57, Kryo’s floating-point IPC is 23% higher than the A72 in Geekbench 4, with particularly strong results in the Gaussian Blur and HDR tests.

86 Comments

  • MajGenRelativity - Tuesday, March 14, 2017

    I'm a dunce sometimes. I totally missed that. Thank you Ian!
  • fanofanand - Tuesday, March 14, 2017

    I love that you have begun moderating (to a degree) the comments section! It's nice to have someone with so much knowledge there to dispel the FUD! Not saying his question was bad, but I really do like that you are getting in the mud with us plebs :)
  • MajGenRelativity - Tuesday, March 14, 2017

    My question wasn't bad, just stupid :P Should have read that page a little more closely.
  • fanofanand - Tuesday, March 14, 2017

    I didn't mean to imply your question was bad at all, and I certainly wasn't lumping you in with those spreading FUD, but Ian has become a growing presence in the comments section and I for one like what he's doing. The comments section in nearly every tech article has become ugly, and having a calming, logical, rational presence like Ian only helps to contribute to a more polite atmosphere where disagreement can be had without presuming that the person with an opposing viewpoint is Hitler.
  • MajGenRelativity - Tuesday, March 14, 2017

    I thought this was the Internet, where the opposing viewpoint is always Hitler? :P
  • fanofanand - Tuesday, March 14, 2017

    Hitler has become omnipresent; now the barista who underfoams your latte must be Hitler!
  • lilmoe - Tuesday, March 14, 2017

    Shouldn't this provide you with even more evidence that max frequency workloads are super artificial, and are completely unrepresentative of normal, day-to-day workloads? This further supports my claim in earlier article comments that chip designers are targeting a certain performance target, and optimizing efficiency for that point in particular.

    I keep saying this over and over (like a broken record at this point), but I do firmly believe that the benchmarking methodology for mobile parts of the entire blogosphere is seriously misleading. You're testing these processors the same way you would normally do for workstation processors. The author even said it himself, but the article contradicts his very statement. I believe further research/investigations should be done as to where that performance target is. It definitely differs from year to year, with different popular app trends, and from one OS upgrade to another.

    Spec, Geekbench and browser benchmarks, if tested in context of same device, same OS upgrades, are a good indication of what the chip can artificially achieve. But the real test, I believe, is launching a website, using facebook, snapchat, etc., and comparing power draw of various chips, since that's what these chips were designed to run.

    There's also the elephant in the room that NO ONE is accounting for when testing and benchmarking, and that's touch input overhead. Most user interaction is through touch. I don't know about iOS, but everyone knows that Android ramps up the clock when the touchscreen detects input to reduce lag and latency. Your browser battery test DOES NOT account for that, further reducing its potential credibility as a valid representation of actual usage.

    I mention touch input clock ramps in particular because I believe this is the clock speed that OEMs believe delivers optimal efficiency on the performance curve for a given SoC, at least for the smaller cluster. A better test would be logging the CPU clocks of certain workloads, and taking the average, then calculating the power draw of the CPU on that particular average clock.

    This is where I believe Samsung's SoCs shine the most. I believe they deliver the best efficiency for common workloads, evident in the battery life of their devices after normalization of screen size/resolution to battery capacity.

    Worth investigating IMO.
  • fanofanand - Tuesday, March 14, 2017

    If you can come up with a methodology where opening snapchat is a repeatable scientific test, send your hypothesis to Ryan, I'm sure he will indulge your fantasy.
  • lilmoe - Tuesday, March 14, 2017

    Yea, we all love fantasies. Thing is, in the last couple of paragraphs, Matt literally said that the entirety of the review does not match with the actual real-world performance and battery life of the Mate 9.

    But sure, go ahead and keep testing mobile devices using these "scientific" conventional methods anyway, since it makes readers like fanofanand happy.
  • close - Tuesday, March 14, 2017

    That is, of course, an awesome goal. Now imagine the next review the battery life varies between 10 and 18 hours even on the same phone. Now judge for yourself if this kind of result is more useful to determine which phone has a better battery life. Not only is your real world usage vastly different from mine (thus irrelevant) but you yourself can't even get through 2 days with identical battery life or identical usage. If you can't determine one phone's battery life properly how do you plan on comparing that figure to the ones I come up with?

    If you judged your comment by the same standards you judge the article you wouldn't have posted it. You implicitly admit there's no good way of testing in the manner you suggest (by refusing or being unable to provide a clearly better methodology) but still insisted on posting it. I will join the poster above in asking you to suggest something better. And don't skimp on the details. I'm sure that if you have a reasonable proposal it will be taken into consideration not for your benefit but for all of ours.

    Some of these benchmarks try to simulate a sort of average real world usage (a little bit of everything) in a reproducible manner in order to be used in a comparison. That won't be 100% relevant but there is a good overlap and it's the best comparative tool we've got. Your generic suggestion would most likely provide even less relevant figures unless you come up with that better scenario that you insist on keeping to yourself.
