I don’t think there’s any way to sugarcoat this: 2015 has not been a particularly great year for Qualcomm in the high-end SoC business. The company remains a leading SoC developer, but Snapdragon 810, the company’s first ARMv8 AArch64-capable SoC, did not live up to expectations. Seemingly held back by design issues and a rough 20nm planar manufacturing process – a problem shared by many vendors in the last year – Snapdragon 810 couldn’t make good use of its highly clocked ARM Cortex-A57 cores, and ultimately struggled in the face of SoCs built on better processes such as Samsung’s surprisingly early Exynos 7420.

But the purpose of today’s article isn’t to reminisce about the past; rather, it’s to look towards the future. Qualcomm knows all too well what has happened in the past year and what it has cost the company, so now they need to dust themselves off and try again. With Samsung’s more advanced 14nm FinFET process in hand, a new CPU core, a new GPU, and a number of other advancements, Qualcomm is ready to do just that, and to try to recapture the good old days of 28nm and their Krait CPU architecture.

To that end Qualcomm started talking about Snapdragon 820 early and loudly. Last month the company held their first press demonstration of the SoC, showing it in action and going into more detail than ever before on their performance and power projections for their next-generation SoC.

If there is any unfortunate aspect to any of this, it’s that while Qualcomm is showing off Snapdragon 820 today, it won’t be ready for the holidays; instead it lines up with what we expect will be the typical spring smartphone refreshes. But some of this is clearly driven by Qualcomm’s business needs and the aforementioned effort at Qualcomm to quickly pick themselves up and try again.

Meanwhile after last month’s demonstrations, this month Qualcomm is ready to move on to the next phase in what has become their traditional roll-out process for a new SoC: giving the press access to the company’s Mobile Development Platform (MDP) devices. Designed for software developers to begin building apps and (for lack of a better word) experiences around the new SoC, the MDP is something of the home-stretch in SoC development, as it means Qualcomm is ready to let the press and developers see the hardware and near-final software stack. We’ve previously previewed the Snapdragon 800, 805, and 810 via their MDPs, and for Snapdragon 820 Qualcomm has once again opted to do the same. So without further ado, let’s take our first look at Snapdragon 820.

Qualcomm Snapdragon SoC Specifications

| SoC | Snapdragon 820 | Snapdragon 810 | Snapdragon 800 |
|---|---|---|---|
| CPU | 2x Kryo @ 1.593GHz, 512KB(?) L2 cache; 2x Kryo @ 2.150GHz, 1MB(?) L2 cache | 4x A53 @ 1.555GHz, 512KB L2 cache; 4x A57 @ 1.958GHz, 2MB L2 cache | 4x Krait 400 @ 2.45GHz, 4x512KB L2 cache |
| Memory Controller | 2x 32-bit LPDDR4 @ 1803MHz, 28.8GB/s b/w | 2x 32-bit LPDDR4 @ 1555MHz, 24.8GB/s b/w | 2x 32-bit LPDDR3 @ 933MHz, 14.9GB/s b/w |
| GPU | Adreno 530 @ 624MHz | Adreno 430 @ 600MHz | Adreno 330 @ 600MHz |
| Mfc. Process | Samsung 14nm LPP | TSMC 20nm SoC | TSMC 28nm HPm |
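
As a rough sanity check, the memory bandwidth figures in the table can be derived from the bus width and memory clock. The sketch below assumes that LPDDR transfers data twice per clock and that GB means 10^9 bytes; the function name and parameters are ours for illustration, not anything from Qualcomm. Under those assumptions the results land within a tenth of a GB/s of the listed values (the table appears to truncate rather than round).

```python
def peak_bandwidth_gbs(channels, width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second).

    Assumes double data rate (two transfers per clock) by default.
    """
    bytes_per_transfer = channels * width_bits / 8  # total bus width in bytes
    return bytes_per_transfer * clock_mhz * transfers_per_clock / 1000


# Memory configurations as listed in the specification table.
configs = {
    "Snapdragon 820": (2, 32, 1803),
    "Snapdragon 810": (2, 32, 1555),
    "Snapdragon 800": (2, 32, 933),
}

for soc, (channels, width_bits, clock_mhz) in configs.items():
    bandwidth = peak_bandwidth_gbs(channels, width_bits, clock_mhz)
    print(f"{soc}: {bandwidth:.2f} GB/s")
```

These are theoretical peaks; sustained bandwidth in real workloads will be lower.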

Taking a trek down to sunny San Diego, Qualcomm handed to us the Snapdragon 820 MDP/S. A 6.2” phablet, the MDP/S is a development kit designed for function over form, containing a full system implementation (sans cellular) in an otherwise utilitarian design. Along with the Snapdragon 820 SoC, the 820 MDP/S also includes a 6.2” 2560x1600 display, 3GB of LPDDR4 memory running at a slightly higher 1804MHz instead of the 1555MHz we've seen on the Snapdragon 810 and Exynos 7420, a 64GB Universal Flash Storage package, a 21MP rear camera, 802.11ac WiFi, and a Sense ID ultrasonic fingerprint scanner. Overall the aesthetics of the MDP/S differ significantly from what retail phones will go for, but internally the MDP/S won’t be far removed from the kinds of configurations we’ll see in 2016 smartphones.

Overall there’s little to report on the MDP/S experience itself. Qualcomm is still sorting out some driver bugs – only one device in our group was ready to run PCMark – and to be sure, like past Qualcomm MDP previews, this is very much a preview. However the experience was otherwise unremarkable (in a good way), with our unit completing all of our tests bar part of SPEC CPU 2000, which will require further analysis.

More interesting from a testing perspective is that Qualcomm opted to demonstrate Snapdragon 820 using the MDP/S smartphone development kit, instead of a larger MDP/T tablet development kit. Qualcomm used the MDP/T for the press demonstrations of both Snapdragon 800 and Snapdragon 810, so the fact that they are returning to the MDP/S is very notable. From a pure performance perspective the MDP/T allowed Qualcomm to show off previous Snapdragon designs at their best – these are just performance previews, after all – but after Snapdragon 810 I don’t doubt that, had this been another MDP/T, the 820’s thermals and power consumption would have been called into question. So instead we are looking at 820 in a phablet, and while this may not put 820 in the best possible light, the end result is that we get to see what performance in a large phone looks like, and for Qualcomm there isn’t any doubt about 820’s suitability for a smartphone.

As for Snapdragon 820 itself, we’ve already covered the SoC in some depth in past articles – and this week’s preview doesn’t come with much in the way of new architectural information – but here’s a quick recap of what we know so far. 820 uses a new Qualcomm-developed CPU core called Kryo. The quad-core CPU is best described as an HMP solution with two high-performance cores clocked at 2150MHz and two low-power cores clocked at 1593MHz. The CPU architectures of both clusters are identical, but with differences in cache configuration and their power/frequency tuning.

Meanwhile the GPU inside 820 is the Adreno 530. This is a next-generation design from Qualcomm and includes functionality that until now has only been found on desktop PCs, such as shared virtual memory with the CPU. This allows an OpenCL host program and a device kernel to share a virtual address space, so that pointer-based data structures like lists and trees can be easily shared between the host and GPU. The underlying architecture is capable of Renderscript and OpenCL 2.0 on the compute side – a significant step up from Adreno 400 – and on the graphics side supports OpenGL ES 3.1 + AEP and Vulkan. We know the 530 should be powerful, but like past Qualcomm designs the company is saying virtually nothing about the underlying architecture.

Finally, while it’s not something that can be covered in our brief testing, the 820 contains a new DSP block, the Hexagon 680. Hexagon 680 and its Hexagon Vector Extensions (HVX) are designed to handle significant compute workloads in applications such as virtual reality, augmented reality, image processing, video processing, and computer vision. This means that tasks that might otherwise be running on a relatively power-hungry CPU or GPU can run on a comparatively efficient DSP instead. The HVX has 1024-bit vector data registers, with the ability to address up to four of these slots per instruction, which allows for up to 4096 bits per cycle.
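
To put the 4096 bits per cycle figure in perspective, the short sketch below works out how many data elements that peak would cover at a few common element widths. The element widths (such as 8-bit pixels) and the function names are our own illustrative assumptions; Qualcomm has only quoted the 1024-bit register width and the four slots per instruction.

```python
VECTOR_BITS = 1024   # HVX vector register width, as quoted by Qualcomm
VECTOR_SLOTS = 4     # vector slots addressable per instruction


def peak_bits_per_cycle(vector_bits=VECTOR_BITS, slots=VECTOR_SLOTS):
    """Peak bits touched per cycle, assuming one instruction issues per cycle."""
    return vector_bits * slots


def elements_per_cycle(element_bits):
    """Number of elements of the given width that fit in the peak bit count."""
    return peak_bits_per_cycle() // element_bits


# Element widths here are illustrative, e.g. 8-bit pixels or 16-bit samples.
for width in (8, 16, 32):
    print(f"{width}-bit elements per cycle: {elements_per_cycle(width)}")
```

This is a best-case figure; real kernels will be limited by memory access and instruction mix.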

  • lilmoe - Friday, December 11, 2015 - link

    He's not talking about clock speeds. He's talking about the whole package. Samsung has lots more experience with big.LITTLE and their implementations are far superior than competing chips on the SAME process node. Both the Exynos 5433 and the Snapdragon 808 are built on 20nm, yet the Exynos performs AND sustains its performance better than the Snapdragon.
  • testbug00 - Sunday, December 13, 2015 - link

    yes, Qualcomm's memory controller was busted. Doesn't make the A57 core any better.

    It's a pretty bad core compared to just about everything else ARM offers currently. A7, A9, A12/17, A53, A72. All far superior to the A57 overall.
  • tipoo - Thursday, December 10, 2015 - link

    On Samsung's 14nm process, yeah they ran ok. But it effectively cancelled out a generation's worth of fabrication process advantage, just to be able to run the things without throttle hell.
  • melgross - Thursday, December 10, 2015 - link

    This should be interesting. Phones delivering this chip will be seen, mostly, during the April-May period. That leaves them about 4 months, on average, before the iPhone 7 with the new A10 comes out. With this behind the A9 in many areas, that doesn't give them much leeway in performance or time.

    So most of the year leaves Apple's chips basically unchallenged. It seems to me that shipping schedules for flagship Android phones needs to shift, along with the release of high end SoCs to more closely match Apple's release dates, or there will always be this disparity.

    While it's often said that Android phone manufacturers are competing against one another more than they are competing against Apple, that's only true because they have a hard time competing against Apple at the higher end. Having phones that better compete in performance on the same release schedule would help somewhat.

    This chip really needed to come out last August, not next spring.
  • Refuge - Friday, December 11, 2015 - link

    I disagree, they have no problem competing with Apple at the high end. They won me easily.

    The A9 is a nice chip, but running iOS it's like having a Camaro SS with a limiter set at 75mph.

    I'm sorry, I just can't and won't consider the two eco-systems in any way similar. People buy the OS first and the device second. Like iphone, but want an android OS? Someone has an iPhone clone out right now just for you.
  • mdriftmeyer - Friday, December 11, 2015 - link

    Wow. The voice of one dot speaking against reality. Apple's SoC designs and implementations are only expanding their leads on the competition. That ecosystem they also dominate in is building ever greater loyalty: they deliver and the software matches the hardware.

    Move along and hope for the future.
  • mdriftmeyer - Friday, December 11, 2015 - link

    Above comment should have embedded below Refuge.
  • bug77 - Thursday, December 10, 2015 - link

    Nice preview, but, as it happens lately, what matters more is sustained performance, not some burst numbers during a single benchmark run.
  • jjj - Thursday, December 10, 2015 - link

    Interesting that they seem to be going with a small cache and the memory score is rather nuts for just 2x32bit.

    "And though one could have a spirited argument about whether single-threaded or multi-threaded performance is more important, I’m firmly on the side of ST for most use cases."
    Do note that SD820 has 2 cores clocked lower, it's not just 4 vs 8, it's 4+4 vs 2+2. Everybody in the dumb press will be tempted to forget that 2 cores are clocked lower here.
    As for ST perf , the thing is that at this perf level ST is more than enough so it loses relevance. Would be nice if you guys would compare ST perf with Nehalem and newer desktop cores.

    Anyway, it blows that you insist on using the same empty synthetic benchmarks that have no relevance at all. SPECint2000 and Geekbench are fine but all else is irrelevant.

    "Where the 820 MDP/S makes up for it is in the photo editing score, which is through the roof. Here Qualcomm’s development device holds a 34% performance lead over the next-fastest device, the 810/A57 based Mi Note Pro."
    So using the GPU or DSP? If so , is it cheating or (all) actual apps will use the GPU/DSP too, as they should. How about the behaviour of all other SoCs. Long live synthetic! It's like begging them to cheat....

    "Apple’s commanding lead in ARM CPU performance."
    How is that exactly? Have you actually done any math , at the very least at equal per core power? In die area Apple is far behind but you don't like that being mentioned.
    In Geekbench Apple does 2.5k in each of the 3 segments, Kryo does about 2.1k in FP and integer and well over 3k in memory. So 20% higher clocks could enable Kryo to match Apple's core in FP and integer. It's not impossible that in a dual core config Kryo could clock 20% higher. Same for A72. In the end if MTK can clock 2xA72 at 2.5GHz on 20nm, they could do much better on 16ff+. In theory 16ff+ can provide up to 40% higher speed over 20nm but only some 30% is needed. Ofc A72 is also much much smaller than Apple's core and you can actually make a cheap SoC with it for 150$ phones.

    vs A72 ,it's hard to assume things. If A72 goes to 2.5 GHz in quad config and matches the SD820 in power ,then it's somewhat even and not really.
    In Geekbench Kryo at 2.15 vs A72 at 2.5Ghz should be about even in integer with Kryo having some 10% lead in FP but Kryo would be at higher per core power.
    You got core 3 and 4 at likely half the power (or even less) at max load, so total power is like having 3 cores at max clocks. Folks could do that with A72 too.
    Ofc remains to be seen if A72 can reach 2.5GHz or even more with fewer cores and how everybody does in power.

    Will be very interesting to see Kryo in server. Assuming it will be a slightly tuned Kryo and not something very different.
    A72 does enable others to provide a multitude of configs in different price ranges and that could be interesting. Just today a Xiaomi device showed up in Geekbench with SD618 and just 2GB of RAM. 2GB of RAM would be too little for anything above Redmi 3 and Redmi 3 couldn't be priced above 699CNY (109$). Sure it would be dual A72 at low clocks on 28nm but it's a start.
  • tipoo - Thursday, December 10, 2015 - link

    You're assuming it will happily clock 20% higher with no disproportionate power draw increases. This is what Qualcomm provided, so it only makes sense for the reviewer to test it as they got it, rather than speculating on what it would be while higher clocked.

    I don't see how Apples die area matters to an end user. The cost is spread through the entirety of the product, they are premium products, but really all that matters in the end to a user is performance and battery life.
