The Snapdragon 855 SoC - A Recap

Although the Galaxy S10 is the first Snapdragon 855 device we’re reviewing, Qualcomm’s new chipset shouldn’t come with any major surprises. We had the opportunity to conduct an extensive performance preview back in January at CES, which answered a lot of our initial questions about Qualcomm's new flagship SoC. Indeed, the Snapdragon 855 largely met our expectations: the new CPU on the new process node performs very similarly to the other 7nm + A76 design we've already seen – the Kirin 980 chipset from HiSilicon – with only minor differences in the CPU complex.

Where Qualcomm strongly differs from the competition is in the auxiliary accelerator blocks such as the GPU, DSP and the new tensor units. Let’s go over the specifications again:

Qualcomm Snapdragon Flagship SoCs 2018-2019

| SoC | Snapdragon 855 | Snapdragon 845 |
|---|---|---|
| CPU | 1x Kryo 485 Gold (A76 derivative) @ 2.84GHz, 1x512KB pL2; 3x Kryo 485 Gold (A76 derivative) @ 2.42GHz, 3x256KB pL2; 4x Kryo 485 Silver (A55 derivative) @ 1.80GHz, 4x128KB pL2; 2MB sL3 @ 1612MHz | 4x Kryo 385 Gold (A75 derivative) @ 2.8GHz, 4x256KB pL2; 4x Kryo 385 Silver (A55 derivative) @ 1.80GHz, 4x128KB pL2; 2MB sL3 @ 1478MHz |
| GPU | Adreno 640 @ 585MHz | Adreno 630 @ 710MHz |
| Memory Controller | 4x 16-bit CH @ 2092MHz LPDDR4X, 33.4GB/s; 3MB system level cache | 4x 16-bit CH @ 1866MHz LPDDR4X, 29.9GB/s; 3MB system level cache |
| ISP/Camera | Dual 14-bit Spectra 380 ISP; 1x 48MP or 2x 22MP | Dual 14-bit Spectra 280 ISP; 1x 32MP or 2x 16MP |
| Encode/Decode | 2160p60 10-bit H.265; HDR10, HDR10+, HLG; 720p480 | 2160p60 10-bit H.265; 720p480 |
| Integrated Modem | Snapdragon X24 LTE (Category 20); DL = 2000Mbps, 7x20MHz CA, 256-QAM, 4x4; UL = 316Mbps, 3x20MHz CA, 256-QAM | Snapdragon X20 LTE (Category 18/13); DL = 1200Mbps, 5x20MHz CA, 256-QAM, 4x4; UL = 150Mbps, 2x20MHz CA, 64-QAM |
| Mfc. Process | TSMC 7nm (N7) | Samsung 10nm LPP |
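
The peak bandwidth figures listed for the memory controller can be reproduced from the channel count, channel width and clock. This is a minimal sketch, assuming the listed MHz value is the memory clock and that LPDDR4X moves data on both clock edges; the function name is purely illustrative.

```python
def lpddr_peak_bandwidth_gbs(channels: int, bits_per_channel: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a double-data-rate bus."""
    transfers_per_second = clock_mhz * 1e6 * 2          # DDR: two transfers per clock
    bytes_per_transfer = channels * bits_per_channel / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# 4x 16-bit channels, clocks taken from the specification table
sd855 = lpddr_peak_bandwidth_gbs(4, 16, 2092)
sd845 = lpddr_peak_bandwidth_gbs(4, 16, 1866)

print(f"Snapdragon 855: {sd855:.2f} GB/s")
print(f"Snapdragon 845: {sd845:.2f} GB/s")
```

Under these assumptions the results come out at roughly 33.47GB/s and 29.86GB/s, in line with the 33.4GB/s and 29.9GB/s quoted in the table.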

The Snapdragon 855 is Qualcomm’s first SoC powered by Arm’s new Cortex-A76 CPU core, which we also saw in the Kirin 980. Qualcomm still makes use of Arm’s “Built on Cortex Technology” license, where it requests changes to the CPU IP to be delivered by Arm. The end product ends up marketed as a Kryo CPU – in the case of the Snapdragon 855 the new “Kryo 485” CPU.

In past iterations it’s not always been clear exactly what changes Qualcomm had made to the CPU cores, so it was a surprising and much welcomed change to have the company actually provide concrete examples in the case of the new Snapdragon 855 CPU cores: The two big disclosed changes are an increase of the core’s reorder buffer from 128 entries to a higher, unspecified amount, as well as tuning the prefetchers to better work with floating point workloads.

The one thing that makes the Snapdragon 855 unusual, though, is the new physical CPU configuration. The Kirin 980 and the Snapdragon 855 both contain four Cortex-A76 cores, but the two companies implement them in completely different ways.

While HiSilicon opted for a 2+2 core configuration where one pair clocks up to 2.6GHz and the other only reaches 1.92GHz, Qualcomm went with a 1+3 setup: one core is clocked up to 2.84GHz, while the other three cores reach up to 2.42GHz. While at first glance this makes sense, things get confusing when accounting for the fact that the Snapdragon still only has a single voltage plane for all four CPU cores, whereas the Kirin’s CPU pairs each have their own dedicated rail.

Qualcomm has explained that this was a deliberate choice which took into account the actual benefits, as well as (most importantly) the costs of the platform. Having an additional voltage rail means your PMIC needs an additional buck converter and you need to have additional inductors and capacitors on the motherboard, a cost not only in terms of actual component costs but also in terms of valuable PCB space.

What this means is that the power difference between the two CPU groups is much less than one would expect, but most interestingly it will be a difference that is solely dictated by the different physical implementations of the two cores.
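
To see why a shared voltage plane narrows the power gap, it helps to plug numbers into the textbook CMOS dynamic power approximation, P ≈ C·V²·f. The sketch below is a simplified model only: it ignores leakage and the physical implementation differences between the cores, and every voltage and capacitance value is a made-up assumption, not a measured Snapdragon 855 figure.

```python
def dynamic_power(c_eff: float, voltage: float, freq_ghz: float) -> float:
    """Textbook CMOS dynamic power approximation: P = C_eff * V^2 * f (arbitrary units)."""
    return c_eff * voltage ** 2 * freq_ghz


# Hypothetical values, for illustration only (not measured Snapdragon 855 data).
C_EFF = 1.0        # assume the same effective capacitance for both core types
V_AT_2_84 = 1.00   # assumed voltage needed to sustain 2.84GHz
V_AT_2_42 = 0.85   # assumed voltage that would suffice for 2.42GHz

prime = dynamic_power(C_EFF, V_AT_2_84, 2.84)
mid_own_rail = dynamic_power(C_EFF, V_AT_2_42, 2.42)      # dedicated rail: lower voltage
mid_shared_rail = dynamic_power(C_EFF, V_AT_2_84, 2.42)   # shared rail: prime core sets voltage

print(f"mid core saving, dedicated rail: {1 - mid_own_rail / prime:.0%}")
print(f"mid core saving, shared rail:    {1 - mid_shared_rail / prime:.0%}")
```

In this toy model, a mid core on the shared rail only saves power in proportion to its lower frequency, because it cannot also benefit from the quadratic voltage term while the prime core holds the rail at its peak voltage.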

In later sections we’ll address the efficiency difference between the two groups of cores, and one thing that was surprising is that the “middle” cores weren’t that much more efficient than the “prime” core. I extracted the power curves out of the scheduler, as dictated by Qualcomm, and this reveals a bit more information and clarification.

What we see is that the middle “Gold” cores’ power curve shape is shifted down towards lower power, meaning it starts growing at an exponential rate earlier than what we see on the “Prime” core. This would explain why, at peak performance, the efficiency difference between the two cores is minor. When we look at the middle frequency points in particular, though, we see that this power difference is more notable, and at its greatest point it represents up to 20% lower power on the mid cores compared to the prime core.
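
A comparison like this can be sketched by interpolating both power curves onto common frequency points and looking for the largest relative saving. The curve points below are hypothetical placeholders chosen only to mimic the shape described above; they are not the values extracted from the device, and the helper names are illustrative.

```python
def interpolate(curve, freq):
    """Linearly interpolate power (mW) at freq (MHz) from sorted (freq, power) points."""
    for (f0, p0), (f1, p1) in zip(curve, curve[1:]):
        if f0 <= freq <= f1:
            return p0 + (p1 - p0) * (freq - f0) / (f1 - f0)
    raise ValueError(f"{freq} MHz is outside the curve's range")


def max_relative_saving(mid_curve, prime_curve, freqs):
    """Return (freq, saving) where the mid core's power advantage is largest."""
    savings = [
        (f, 1 - interpolate(mid_curve, f) / interpolate(prime_curve, f))
        for f in freqs
    ]
    return max(savings, key=lambda fs: fs[1])


# Hypothetical (MHz, mW) points, NOT the values extracted from the scheduler.
prime_curve = [(800, 100), (1600, 300), (2400, 800), (2840, 1300)]
mid_curve = [(800, 90), (1600, 240), (2400, 760)]

freq, saving = max_relative_saving(mid_curve, prime_curve, [800, 1200, 1600, 2000, 2400])
print(f"largest saving: {saving:.0%} at {freq} MHz")
```

With real data, the two input lists would be replaced by the per-frequency power entries of each core cluster.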

Qualcomm also differentiates the large CPU cores by their cache configuration: the Prime core gets 512KB of L2 while the middle cores make do with 256KB. The Cortex-A55 cores have the usual 128KB, and Qualcomm clocks them conservatively at 1.78GHz.

Finally, the DSU’s L3 cache comes in at 2MB. A big question I had was exactly how fast Qualcomm had clocked the cache, and the answer is 1.6GHz. This represents a slight increase over the 1478MHz of the L3 cache found in the Snapdragon 845.

Another large architectural change in the Snapdragon 855 is the new Adreno 640 GPU. Here Qualcomm has supposedly increased the execution units by 50%, yet only advertises a 20% boost in performance. The explanation lies in the clock frequency of the new GPU: the Adreno 640 in the Snapdragon 855 runs at only 585MHz, markedly slower than the 710MHz of the Adreno 630. I suspect that Qualcomm saw some of the increasing power usage of the higher clock frequencies and decided it’s better to go wider and slower. Indeed, we’ll see that the Snapdragon 855 has managed to reduce power usage in 3D workloads ever so slightly compared to the Snapdragon 845 – something which should definitely help thermals and sustained performance.
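
The gap between 50% more execution units and a 20% advertised gain is easy to sanity-check. This is a back-of-the-envelope sketch, assuming throughput scales linearly with both unit count and clock frequency, which real workloads will not follow exactly.

```python
alu_scaling = 1.5             # +50% execution units, per Qualcomm's claim
clock_scaling = 585 / 710     # Adreno 640 clock relative to Adreno 630

# Assumption: throughput ~ units * clock
theoretical_uplift = alu_scaling * clock_scaling - 1

print(f"clock ratio: {clock_scaling:.3f}")
print(f"theoretical throughput uplift: {theoretical_uplift:.1%}")
```

Under that assumption the wider, slower GPU lands at an uplift of a little under 24%, which is in the same ballpark as the advertised 20%.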

Finally, the new Hexagon 690 DSP block has seen its biggest change in several years. Scalar performance has gone up by 20% through microarchitectural and clock frequency bumps, but most importantly the DSP's vector unit count has been doubled up from two to four units, doubling the HVX performance of the new cores. This is something that will be particularly visible in the AI workloads we’ll cover shortly.

The new tensor accelerator block in the Hexagon IP is a new fixed function unit that is meant to be used for machine inferencing. Currently this unit should likely be exclusively used by Samsung’s first-party software such as the camera app, as Qualcomm won't make it available to NNAPI until later in the year in Android Q. As we’ll see later on, API compatibility and availability these days is going to be a bigger worry than actual hardware performance for these SoCs.


228 Comments


  • Jhereck - Tuesday, April 23, 2019 - link

    Hi Andrei, another question regarding the patch designed to increase PELT responsiveness: is there any way a third-party kernel can include it, therefore making the S9 and S10 the devices they should be?

    You know, like last year when you tried to play with the S9 Exynos kernel in order to match Snapdragon power and power efficiency?

    Thanks in advance
  • Rixos - Thursday, May 02, 2019 - link

    It's kind of sad. I was actually looking at the S10e as a replacement device for my Galaxy S7, but as I live in Europe I would be getting the Exynos variant. Worse audio quality, less processing power and worse camera results. Basically, seeing this kind of ruined the purchase for me. In some sense I wish I had not seen it; the S10e is likely still a great upgrade for my S7, but knowing that there is a better version out there just ruins it for me. I guess ignorance sometimes really is bliss.
  • theblitz707 - Thursday, May 23, 2019 - link

    I see this in every review. I actually went to stores and used my phone's ambient light sensor and another phone's flashlight to measure display brightness. Although slightly inaccurate, the LG G7 gave a 1050 lux reading with boost on (all tests at APL100). Taking that as a base, the S9 Plus did 1020, the S10 Plus did 1123 and the P20 Pro did around 900 when I shone my flashlight at each sensor. So why does everyone make it seem like they are less bright than they actually are? Is using a flashlight to trigger high brightness impossible to imagine? Let me tell you, those OLED screens get very bright with high ambient light, like outside on a sunny day.
  • ballsystemlord - Monday, June 03, 2019 - link

    Spelling and grammar corrections. I did not read the whole thing, so there may be more.

    Samsung new L3 cache consists of two different structures
    Possessive:
    Samsung's new L3 cache consists of two different structures

    Similarly, the A75's should be a ton more efficient the A55 cores at the upper performance points of the A55's.
    Missing "than":
    Similarly, the A75's should be a ton more efficient than the A55 cores at the upper performance points of the A55's.

    Arm states that the new Cortex A76 has new state-of-the-art prefetchers and looking at what the CPU is able to do one my patterns I'd very much agree with this claim.
    Missing "to":
    Arm states that the new Cortex A76 has new state-of-the-art prefetchers and looking at what the CPU is able to do to one my patterns I'd very much agree with this claim.

    The nature of region-based prefetchers means that fundamentally any patterns which has some sort of higher-level repeatability will get caught and predicted, which unfortunately means designing a structured test other than a full random pattern is a bit complicated to achieve.
    "have" not "has" and a missing y:
    The nature of region-based prefetchers means that fundamentally any patterns which have some sort of higher-level repeatability will get caught and predicted, which unfortunately means designing a structured test other than a fully random pattern is a bit complicated to achieve.

    Switching over from linear graphs to logarithmic graphs this makes transitions in the cache hierarchies easier to analyse.
    Excess "this" and analyze is with a "z":
    Switching over from linear graphs to logarithmic graphs makes transitions in the cache hierarchies easier to analyze.

    Indeed one of the bigger microarchitectural changes of the core was the addition of a second data store unit.
    Missing comma:
    Indeed, one of the bigger microarchitectural changes of the core was the addition of a second data store unit.

    ...we see that in the L3 memory region store curve is actually offset by 1MB compared to the flip/load curves, which ending only after 3MB.
    "ed" not "ing":
    ...we see that in the L3 memory region store curve is actually offset by 1MB compared to the flip/load curves, which ended only after 3MB.

    "Traditionally such misses are tracked by miss status holding registers (MSHRs), however I haven't seen Arm CPUs actually use this nomenclature."
    This is almost certainly a run on sentence with missing punctuation. Try:
    "Traditionally, such misses are tracked by miss status holding registers (MSHRs). However, I haven't seen Arm CPUs actually use this nomenclature."

    "Again to have a wider range of performance comparison across ARMv8 cores in mobile here's a grand overview of the most relevant SoCs we've tested:"
    Missing comma:
    "Again, to have a wider range of performance comparison across ARMv8 cores in mobile here's a grand overview of the most relevant SoCs we've tested:"
  • theblitz707 - Monday, August 19, 2019 - link

    i discovered something about display brightness on oleds recently. I did a test with a7 with auto brightness on.

    Lets assume, on a slightly dark room you set your brightness to 25nits(whites), so when you go out to the sun phone boosts around 750-800 nits.

    Now lets assume on a slightly dark room you set your brightness to 250 nits, now when you go out to the sun phone boosts to 900nits. (what i actually did was not go in a dark room but while i was outside i covered the sensor with my hand so it thought i was in a dim place)

    I used to assume everytime you go out to sun it would get maxed but apparently it still depends on what you set your phone before.(dumb a bit if you ask me, cuz you know, its THE sun, brightest thing..) I believe this might be the reason why you didnt reach to 100APL 1200nits.

    P.s. I know every brightness sensor is different but i had tested lg on full white and i had gotten 1050 lux, i also tested s10 or plus, all white and i had gotten 1120lux on white,100APL.(It was painfully hard to find the sensor to shine the flashlight, its somewhere around upper part of the phone under the display).

    It would be cool if you retested the brightness in this way:

    1- After you put auto brightness on, go into a very dark room or cover the sensor so the phone sets itself to a dark brightness. After that happens, set the brightness to max while you are still in the dark room (auto is still on).
    2- Now go under the sun or shine a phone flashlight at the sensor and test the brightness on white APL100. That would be really nice.
  • theblitz707 - Monday, August 19, 2019 - link

    lg is g7 on boosted, forgot to mention
