SPEC2006 - Full Results

The below chart might be a bit crowded, but it's the only correct way to get a complete overview of the performance-power-efficiency triad of measurement metrics. The left axis dataset scales with efficiency (subtest total energy in joules / subtest SPECspeed score) and also includes the average active power usage (watts) over the duration of the test. Here the shorter the bars, the better the efficiency; average power is a secondary metric, but it should still stay below a certain value and well within the thermal envelope of a device. The right axis scales simply with the estimated SPECspeed score of the given test; the longer the bar, the better the performance.
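
For readers who want to work with the left-axis figures themselves, the relationship between the three metrics is simple; the sketch below uses hypothetical values (not taken from the chart) just to show how energy, efficiency and average power relate:

    # Minimal sketch of how the chart's metrics relate; all values are hypothetical.
    runtime_s = 420.0      # subtest wall-clock time in seconds
    avg_power_w = 1.8      # average active platform power during the run (secondary left-axis metric)
    specspeed = 22.5       # estimated SPECspeed score for the subtest (right axis)

    energy_j = avg_power_w * runtime_s     # total energy consumed over the run (joules)
    efficiency = energy_j / specspeed      # left axis: joules per SPECspeed point, lower is better
    print(f"energy = {energy_j:.0f} J, efficiency = {efficiency:.1f} J/point")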

While the article is focused on the Kirin 970's improvements, this is an invaluable opportunity to look back at the last two generations of devices from Qualcomm and Samsung. There is an immediately striking difference in the efficiency of the Snapdragon 820 and Snapdragon 835 across almost all subtests. The comparison between the Exynos 8890 and Snapdragon 820 variants of the S7 was an interesting debate at the time, and we came to the conclusion that the Exynos 8890 variant was the better unit as it offered longer battery life at higher performance. We see this represented in this dataset as well: the Exynos 8890 manages to have a measurable performance lead in a variety of tests while having higher energy efficiency, albeit at a higher power envelope.

2017’s Galaxy S8 reversed this position, as the Snapdragon 835 was clearly the better performing unit while having a slight battery life advantage. This efficiency delta can again be seen here, as the Exynos 8895 isn't able to compete with the lower power consumption of the Snapdragon 835, even though the performance differences between the Exynos M2 and Cortex A73 are much more of a wash than the previous generation's battle between the Exynos M1 and Kryo CPUs.

Switching over to the Kirin SoCs, I included devices as far back as the Kirin 955 with the Cortex A72, as it was a very successful piece of silicon that definitely helped Huawei's device portfolio for 2016. Remembering our coverage of the Cortex A73 microarchitecture, we saw a lot of emphasis from ARM on the core's floating point and memory subsystem performance. These claims can be easily confirmed when looking at the massive IPC gains in the memory access sensitive tests. When it comes to pure integer execution throughput, the A72's three-wide decoder, as expected, still managed to outpace the two-wide unit of the A73, as seen in the 445.gobmk and 456.hmmer subtests.

The Kirin 960 was not always able to demonstrate ARM's claimed A73 efficiency gains, as again in the more execution-bound tests the Kirin 955 was equal or slightly more efficient. But thanks to the new memory subsystem, the A73 is able to distance itself well from the A72, with massive gains in 429.mcf, 433.milc, 450.soplex and 482.sphinx3. Again, the power figures here are total platform active power, so it's also very possible that the Kirin 960's memory controller has a hefty part in the generational improvement.

The Kirin 970 doesn't change the CPU IP; however, we see the introduction of LPDDR4X on the memory controller side, which improves I/O power to the DRAM by lowering the voltage from 1.1V down to 0.6V. While performance should be the same, power efficiency should thus be higher by the promised 20% that HiSilicon quotes for the switch to the TSMC 10nm process, plus some percentage due to LPDDR4X.
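
As a rough back-of-the-envelope check (assuming DRAM I/O switching power scales with the square of the interface voltage, and ignoring termination and static power), the voltage drop alone removes a large share of the interface power, though only of the I/O portion of total platform power:

    # Hypothetical back-of-envelope: dynamic switching power scales roughly with V^2.
    v_lpddr4 = 1.1    # LPDDR4 I/O voltage (V)
    v_lpddr4x = 0.6   # LPDDR4X I/O voltage (V)

    ratio = (v_lpddr4x / v_lpddr4) ** 2
    print(f"I/O switching power at 0.6V is ~{ratio:.0%} of the 1.1V figure "
          f"(a ~{1 - ratio:.0%} reduction)")   # roughly 30% / 70%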

Performance is indeed within spitting distance of the Kirin 960; however, it manages to be a few percentage points slower. On the power efficiency side we see large gains, averaging up to 30% across the board. It looks like HiSilicon decided to invest all of the process improvement into lowering overall power, as the Kirin 970 manages to shave off a whole watt from the Kirin 960 in both integer and floating point benchmarks.
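
To illustrate how a roughly one watt platform power reduction at near-identical performance translates into an efficiency gain of that magnitude, here is a small sketch with assumed, purely illustrative power and score figures (not the measured values):

    # Hypothetical illustration: ~1W lower platform power at near-equal performance.
    p_k960, p_k970 = 3.0, 2.0              # assumed average active power (W)
    score_k960, score_k970 = 21.0, 20.5    # assumed SPECspeed estimates (970 slightly slower)
    runtime_k960 = 400.0                   # assumed subtest runtime on the Kirin 960 (s)

    e_k960 = p_k960 * runtime_k960
    e_k970 = p_k970 * runtime_k960 * (score_k960 / score_k970)   # slower score -> slightly longer run
    eff_gain = 1 - (e_k970 / score_k970) / (e_k960 / score_k960)
    print(f"efficiency improvement: ~{eff_gain:.0%}")            # ~30% with these assumptions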

An interesting comparison here is the duel between the Snapdragon 835 and Kirin 970 – both are A73 CPUs running at almost identical clocks, one manufactured on Samsung's 10LPE process and the other on TSMC's 10FF process. Again, by making use of the various workload types we can extract information on the CPU and the memory subsystem. In 445.gobmk and 456.hmmer we see the Kirin 970 have a very slight efficiency advantage at almost identical performance. This could be used as an indicator that TSMC's process has a power advantage over Samsung's process, something not too hard to imagine as the latter silicon was brought to market over half a year later.

However, when we take a look at the more memory bound tests, we see the Snapdragon 835 overtake the Kirin 970 by ~20%. The biggest difference is in 429.mcf, which is by far the most demanding memory test, where the Snapdragon 835 is ahead by 32% in performance and by a larger amount in efficiency. We can thus strongly assume that, between the K970 and S835, Qualcomm has the better and more efficient memory controller and subsystem implementation.

The memory subsystem generally seems to be the weak point of Samsung's Exynos 8895. The M2 core remains competitive in execution-bound tests, however it quickly falls behind in anything more memory demanding. The odd thing here is that I'm not sure the reason is memory controller inefficiency; it may rather be something related to the un-core of the M2 cluster. Firing up even integer power viruses always shows an enormous one-core power overhead compared to the incremental power cost of additional threads on the remaining three cores. A hypothesis here is that, given Samsung's new Exynos 9810 makes use of a completely new cache hierarchy (all but confirmed to be a DynamIQ cluster), the existing implementation in the M1 and M2 cores just didn't see as much attention and design effort as the CPU core itself. Using a new, efficient cluster design and continuing to improve the core might be how Samsung has managed to find a way (gaining power and efficiency headroom) to double single-threaded performance in the Exynos 9810.

When overviewing IPC for SPEC2006, we see the Kirin 960 and Snapdragon 835 neck and neck, with the Kirin 970 being just slightly slower due to memory differences. The Exynos 8895 shows a 25% IPC uplift in CINT2006 and a 21% uplift in CFP2006, whilst leading the A73 in overall IPC by a slight 3%.
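
For context, comparing IPC across SoCs running at different clocks is usually done by normalising the estimated score to frequency; a minimal sketch of that score-per-GHz proxy, with placeholder numbers rather than the measured results, would look like this:

    # Score-per-GHz as a rough IPC proxy across SoCs; all figures are placeholders.
    socs = {
        "Kirin 960 (A73)":      (22.0, 2.36),   # (estimated SPECspeed, peak clock in GHz)
        "Snapdragon 835 (A73)": (23.0, 2.45),
        "Exynos 8895 (M2)":     (25.0, 2.31),
    }
    for name, (score, ghz) in socs.items():
        print(f"{name}: {score / ghz:.2f} points/GHz")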

The Snapdragon 820 still has a good showing in terms of floating point performance thanks to Kryo's four main "fat" execution pipelines, which can all handle integer as well as floating point operations. This theoretically should allow the core to have far more floating point execution power than ARM's and Samsung's cores, and is the explanation as to why 470.lbm sees such massive performance advantages on Kryo and brings up the overall IPC score.

The Final Overview

For a final overview of performance and efficiency, we arrive at a mixed bag. If we solely look at the right axis, with the overall SPECspeed estimated results of CINT2006 and CFP2006, we see that performance hasn't really moved much, if at all, over the last two generations. The Kirin 970 is a mere 10% faster than the Kirin 955 in CINT, over two years later. CFP sees larger gains over the A72, but again we come back to a small performance regression compared to the Kirin 960. If one were to leave it at that, it would be understandable to raise the question as to what exactly is happening with Android SoC performance advancements.

For the most part, we’ve seen efficiency go up significantly in 2017. The Snapdragon 835 was a gigantic leap over the Snapdragon 820, doubling efficiency at a higher performance point in CINT and managing a 50% efficiency increase in CFP. The Exynos 8895 and Kirin 970 both managed to increase efficiency by 55% in CINT and the latter showed the same improvements in CFP.

This year's SoCs have also seen a large decrease in average power usage. This bodes well for thermal throttling and for low thermal envelope devices, as ARM had touted at the launch of the A73. The upcoming Snapdragon 845 and its A75 cores promise no efficiency gains over the A73, so the improved performance comes with a linear increase in power usage.

I'm also not too sure what to make of Samsung's Exynos 9810 claiming such large performance jumps, and I just hope that those peak 2.9GHz clocks don't come with outrageous power figures just for the sake of benchmark battling with Apple. The Exynos 8890's 2-core boost feature was in my opinion senseless, as the performance benefit of the additional 300MHz was not worth the efficiency penalty (the above results were run at the full 2.6GHz; 2.3GHz is only 10% slower but 25% more efficient), and the whole thing probably had more to do with matching the Snapdragon 820's scores in the flawed GeekBench 3.
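
A quick sanity check of that trade-off, assuming "25% more efficient" means 25% less energy per unit of work, shows just how much average power the boost clock costs for its 10% performance gain:

    # Hypothetical sketch of the 2-core boost trade-off: 2.3GHz vs the 2.6GHz boost clock.
    perf_ratio = 0.90        # 2.3GHz performance relative to 2.6GHz ("only 10% slower")
    efficiency_gain = 1.25   # assumed meaning of "25% more efficient" (less energy per point)

    energy_ratio = 1.0 / efficiency_gain     # energy per run at 2.3GHz vs 2.6GHz (~0.80)
    time_ratio = 1.0 / perf_ratio            # the run takes ~11% longer at 2.3GHz
    power_ratio = energy_ratio / time_ratio  # average power ratio (~0.72)
    print(f"2.3GHz uses ~{energy_ratio:.0%} of the energy and ~{power_ratio:.0%} "
          f"of the average power of 2.6GHz")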

I'm not too sure how to feel about that, as I think the current TDPs of the Snapdragon 835 and Kirin 970 (in CPU workloads) are sweet spots that the industry should maintain, as they simply give a better mobile experience to the average user. So I do really hope the Snapdragon 845 offers some tangible process improvements to counteract the micro-architectural power increase as well as the clock increases; otherwise we'll see power shooting up again above 2W.

116 Comments

  • lilmoe - Monday, January 22, 2018 - link

    Unfortunately, they're not "fully" vertical as of yet. They've been held back since the start by Qualcomm's platform, because of licensing and "other" issues that no one seems to be willing to explain. Like Andrei said, they use the lowest common denominator of both the Exynos and Snapdragon platforms, and that's almost always lower on the Snapdragons.

    Where I disagree with Andrei, and others, is on the efficiency numbers and the type of workloads used to reach those results. Measuring efficiency at MAX CPU and GPU load is unrealistic, and frankly, misleading. Under no circumstance is there a smartphone workload that demands that kind of constant load from either the CPU or GPU. A better measure would be running an actual popular game for 30 mins in airplane mode and measuring power consumption accordingly, or loading popular websites, using the native browser, and measuring power draw at set intervals for a set period of time (not even a benchmarking web application).

    Again, these platforms are designed for actual, real world, modern smartphone workloads, usually running Android. They do NOT run workstation workloads and shouldn't be measured as such. Such notions, like Andrei has admitted, are what push OEMs to be "benchmark competitive", not "experience competitive". Apple is also guilty of this (proof is in the latest events, where their power delivery can't handle the SoC, or the SoC is designed well above sustainable TDP). I can't stress this enough. You just don't run SPEC and then measure "efficiency". It just doesn't work that way. There is no app out there that stresses a smartphone SoC this much, not even the leading game. As a matter of fact, there isn't an Android (or iPhone) game that saturates last year's flagship GPU (probably not even the year before).

    We've reached a point of perfectly acceptable CPU and GPU performance for flagships running 1080p and 1440p resolution screens at this point. Co-processors, such as the decoder, ISP, DSP and NPU, in addition to software optimization, are far, FAR more important at this time, and what Huawei has done with their NPU is very interesting and meaningful. Kudos to them. I just hope these co-processors are meant to improve the experience, not collect and process private user data in any form.
  • star-affinity - Monday, January 22, 2018 - link

    Just curious about your claims about Apple – so you think it's a design fault? I'm thinking that the problem arises only when the battery has been worn out, and a healthy battery won't have the problem of not sustaining enough juice for the SoC.
  • lilmoe - Monday, January 22, 2018 - link

    Their batteries are too small, by design, so that's the first design flaw. But that still shouldn't warrant unexpected slowdowns within 12-18 months of normal usage; their SoCs are too power hungry at peak performance, and the constant bursts were taking their toll on the already smaller batteries, which weren't protected with a proper power delivery system. It goes both ways.
  • Samus - Monday, January 22, 2018 - link

    Exactly this. Apple still uses 1500mAh batteries in 4.7" phones. When more than half the energy is depleted in a cell this small, the nominal voltage drops to 3.6-3.7V from the 3.9-4.0V peak. A sudden spike in demand for a cell hovering around 3.6V could cause it to hit the low-voltage cutoff, normally 3.4V for Li-Ion and 3.5V for Li-Polymer; to prevent damage to the chemistry, the internal power management will shut the phone down, or slow the phone down to prevent these voltage drops.

    Apple designed their software to protect the hardware. It isn't necessarily a hardware problem, it's just an inherently flawed design. A larger battery that can sustain voltage drops, or even a capacitor, would have helped, but both take up "valuable space" according to Apple, like that headphone jack that was erroneously eliminated for no reason. A guy even successfully reinstalled a headphone jack in an iPhone 7 without losing any functionality... it was just a matter of relocating some components.
  • ZolaIII - Wednesday, January 24, 2018 - link

    Try the Dolphin emulator & you will see not only how stressed the GPU is but also how much more performance it needs.
  • Shadowfax_25 - Monday, January 22, 2018 - link

    "Rather than using Exynos as an exclusive keystone component of the Galaxy series, Samsing has instead been dual-sourcing it along with Qualcomm’s Snapdragon SoCs."

    This is a bit untrue. It's well known that Qualcomm's CDMA patents are the stumbling block for Samsung. We'll probably see Exynos-based models in the US within the next two versions once Verizon phases out their CDMA network.
  • Andrei Frumusanu - Monday, January 22, 2018 - link

    Samsung has already introduced a CDMA capable Exynos in the 7872 and also offers a standalone CDMA capable modem (S359). Two years ago when I talked to SLSI's VP, they openly said that it's not a technical issue of introducing CDMA and that it would take them two years to bring it to market once they decide they need to do so (hey, maybe I was the catalyst!), but they didn't clarify the reason why it wasn't done earlier. Of course the whole topic is a hot mess and we can only speculate as outsiders.
  • KarlKastor - Thursday, January 25, 2018 - link

    Uh, how many devices have shipped with the 7872 yet?
    Why do you think they came with an MDM9635 in the Galaxy S6 in all CDMA2000 regions? In all other regions they used their integrated Shannon modem.
    The other option is to use a Snapdragon SoC with a QC modem. They also opt for this alternative, but in the S6 they didn't want to use the crappy Snapdragon 810.

    It is possible that Qualcomm today skips their politics concerning CDMA2000 because it is obsolete.
  • jjj - Monday, January 22, 2018 - link

    Don't forget that Qualcomm is a foundry customer for Samsung and that could be why they still use it.
    Also, cost is a major factor when it comes to vertical integration, at sufficient scale integration can be much cheaper.
    What Huawei isn't doing is prioritizing the user experience and using their high end SoCs in lower end devices too; that's a huge mistake. They have much lower costs than others in the high end, and gaining scale by using these SoCs in lower end devices would decrease costs further. It's an opportunity for much more meaningful differentiation that they fail to exploit. Granted, the upside is being reduced nowadays by upper mid range SoCs with big cores, and Huawei might be forced into using their high end SoCs more as the competition between Qualcomm and Mediatek is rather ferocious and the upper mid range becomes better and better.

    Got to wonder about A75 and the clocks it arrives at ... While at it, I hope that maybe you take a close look at the SD670 when it arrives as it seems it will slightly beat SD835 in CPU perf.

    On the GPU side, the biggest problem is the lack of real world tests. On PC we have that and we buy what we need; in mobile, somehow, being anything but first is a disaster, and that's nuts. Not everybody needs a Ferrari, but mobile reviews are trying to sell one to everybody.
  • HStewart - Monday, January 22, 2018 - link

    This could be a good example of why Windows 10 for ARM will fail - it only works with Qualcomm CPUs, and could explain why Samsung created Intel based Windows tablets.

    I do believe that ARM, and especially Samsung, has a good market in phones and tablets - I love my Samsung Tab S3 but I also love my Samsung TabPro S; both have different purposes.
