The new Hexagon 690 now plays an even bigger role

The new Hexagon 690 both iterates on past-generation units and brings a totally new aspect to the IP block. The new DSP block doubles the number of HVX vector pipelines compared to the Hexagon 680 and 685 (as a reminder, the 680/685 contained four scalar units along with two 1024b vector processing units). On the scalar side, the new unit sees a 20% increase in performance, which might be due to microarchitectural changes or simply a higher clock speed. On the vector side, the new Hexagon now contains four 1024b vector pipelines, doubling the processing power of those units.
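To put the doubling in concrete terms, the sketch below counts how many elements the vector units could process per cycle at different element widths. It assumes each HVX pipeline issues one full 1024-bit operation per cycle, which is a simplification and not something Qualcomm has detailed; the function name is purely illustrative.

```python
VECTOR_WIDTH_BITS = 1024  # HVX vector width quoted in the article


def lanes_per_cycle(pipelines: int, element_bits: int) -> int:
    """Elements processed per cycle, assuming one full-width op per pipeline per cycle."""
    return pipelines * (VECTOR_WIDTH_BITS // element_bits)


for element_bits in (8, 16, 32):
    old = lanes_per_cycle(2, element_bits)  # Hexagon 680/685: two vector pipelines
    new = lanes_per_cycle(4, element_bits)  # Hexagon 690: four vector pipelines
    print(f"{element_bits:>2}-bit elements: {old} -> {new} per cycle")
```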

What is entirely new for the new Hexagon IP is the introduction of a dedicated “tensor accelerator” unit that is seemingly tightly coupled within the larger block. This would most likely be a simpler MAC array resembling the neural network inferencing engines of other vendors. Effectively, this represents Qualcomm’s first generation dedicated ML inferencing engine, although Qualcomm does market it as a 4th generation “AI engine” (previously this nomenclature referred to ML inferencing on CPU, GPU and DSP).

One of the interesting limitations that Qualcomm has wrangled with in previous architectures was that the DSP couldn’t process traditional DSP workloads alongside ML inferencing work. For the S855, however, Qualcomm has confirmed that the new Tensor units are able to work concurrently and independently alongside the vector units. This means that traditional image processing tasks can now be done in parallel with inferencing tasks, vastly increasing the real-world usability of inferencing on the new Snapdragon platform.

Alongside the vector and tensor units is a new Voice Assistant AI module, designed to accelerate processing for common AI assistants – Qualcomm specifically names Baidu and Google in this case, although it’s unlikely to be exclusive to these two.

New Spectra 380 ISP brings a new Computer Vision accelerator

Qualcomm is pairing the AI Engine with an upgraded Spectra 380 image signal processor (ISP). It also gets a boost to its processing power with a tri-core Computer Vision (CV) ISP. This might seem a little odd – Computer Vision is a family of algorithms that we used to have before the latest advent of machine learning acceleration, so why is it here?

Computer Vision IP is still very useful for various tasks, and compared to ML it is computationally a lot lighter. As such, Qualcomm has deepened its ISP acceleration. With the CV-ISP, Qualcomm believes it can cut power consumption by 4x (presumably a 75% reduction) for image object classification, object segmentation, depth sensing (at 60 FPS), augmented reality body tracking, and image stabilisation.
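The “4x” and “75%” figures are two ways of stating the same ratio, on the assumption that “4x” means the CV-ISP draws a quarter of the power for the same task. A minimal sketch of the conversion in both directions (the helper names are hypothetical):

```python
def factor_to_saving(factor: float) -> float:
    """Convert an 'N times lower power' claim into a fractional saving."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 - 1.0 / factor


def saving_to_factor(saving: float) -> float:
    """Convert a fractional saving (0.30 for 30%) into an 'N times lower' factor."""
    if not 0 <= saving < 1:
        raise ValueError("saving must be in [0, 1)")
    return 1.0 / (1.0 - saving)


print(f"4x lower power = {factor_to_saving(4):.0%} saving")
print(f"30% saving = {saving_to_factor(0.30):.2f}x lower power")
```

Stated this way, a percentage saving and a multiplier can be compared directly, which helps when a vendor mixes the two styles in one presentation.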

It’s to be noted that Qualcomm’s claim about being the first computer vision ISP is a bit confusing – it’s definitely not the first SoC with a computer vision IP block, as at least Samsung had this two generations ago. Maybe Qualcomm is referencing some better integration within the ISP pipeline?

With the new accelerator, Qualcomm states that rather than using the AI Engine for everything, the CV-ISP is preferred for these tasks due to the power savings. The company expects many of the features we normally take for granted in photography will now be available for video as well, such as 4K HDR at 60 FPS with both portrait mode enhancements and depth sensing for live bokeh. Power consumption for 4K HDR capture will be 30% lower than on the Snapdragon 845.

Also in the cards is support for new generations of image formats. Currently JPEG rules the roost in smartphone photography; however, the new S855 will support the High Efficiency Image File Format, or HEIF, which will offer a large number of new features over JPEG:

  • Regular Images Up to 50% Smaller
  • HDR Support for multiple color spaces
  • Computer Vision data support
  • RAW data
  • Depth Map support
  • Alpha Masking data
  • Burst Photography data
  • Animated Photography data
  • HEVC Video data

The idea here is to enable new use cases for the photos that users take every day. A single HEIF file can store a photo from every camera: a telephoto, a wide angle photo, and a super-wide angle photo for example. This allows for post-processing such as bokeh adjustment, object segmentation, quality adjustments, and to a certain extent even basic 3D modelling.

Snapdragon 855 Connectivity

With day one being about Qualcomm’s approach to 5G, connectivity is going to be at the heart of the S855. Built into the chip is the Snapdragon X24 modem, which offers LTE Category 20 class download speeds good for up to 2 Gbps on supported devices and networks. The increased speeds are achieved by using up to 7x carrier aggregation with 4x4 MIMO and 256QAM. On the upload side, the modem now supports 256QAM for the first time, doubling the peak upload rate to up to 300Mbps with 3xCA.
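As a rough sanity check on the 2 Gbps figure, the sketch below sums per-layer throughput across aggregated carriers. Two inputs are assumptions rather than details from Qualcomm’s disclosure: roughly 100 Mbps per spatial layer on a 20 MHz carrier at 256QAM, and a hypothetical seven-carrier layout totalling 20 spatial layers (4x4 MIMO on three carriers, 2x2 on the remaining four).

```python
# Assumption: ~100 Mbps per spatial layer on one 20 MHz LTE carrier at 256QAM.
MBPS_PER_LAYER = 100


def peak_downlink_mbps(layers_per_carrier: list[int]) -> int:
    """Peak rate as the sum of spatial layers over all aggregated carriers."""
    return sum(layers_per_carrier) * MBPS_PER_LAYER


# Hypothetical 7xCA layout: 4x4 MIMO on three carriers, 2x2 on four.
carriers = [4, 4, 4, 2, 2, 2, 2]
print(f"{len(carriers)} carriers, {sum(carriers)} layers: "
      f"{peak_downlink_mbps(carriers) / 1000:.1f} Gbps peak")
```

Under the same per-layer assumption, 4x4 MIMO on all seven carriers would come to 28 layers and 2.8 Gbps, so the quoted 2 Gbps suggests that full MIMO is only available on a subset of the aggregated carriers.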

Meanwhile, when it comes to 5G, it’s not the S855 SoC itself which delivers this, but rather the external Snapdragon X50 modem. When combined with the S855, the platform will support both mmWave (>24 GHz) and sub-6 GHz 5G NR technologies. Qualcomm also mentioned its entry into the RF front-end (RFFE) field with its QTM052 RFFE/antenna mini modules.

Not to be left behind, Qualcomm has also improved its local wireless networking capabilities. The Snapdragon 855 supports the new WiFi 6 standard (802.11ax) as well as the new 60GHz “Wi-Gig” 802.11ay standard, enabled by an external WiFi combo chipset.

First impressions

Overall the new Snapdragon 855 promises some great improvements in pretty much every single aspect of the SoC. The biggest generational improvements seem to be on the CPU side, where Qualcomm promises a 45% jump in performance. The figure is pretty much in line with what we’ve seen on the Kirin 980, and that SoC performed excellently. And I’m certain Qualcomm will be able to squeeze out even more performance through software.

One big uncertainty that I have is regarding the CPU configuration. At first I thought the 1+3 configuration seemed reasonable, however the disclosure that the different cores are running on the same power/voltage plane has given me some doubts about how this will pan out in terms of power efficiency. Power efficiency of the CPU and the SoC in general use-cases seems to be something that Qualcomm hasn’t talked about at all today and this worries me quite a bit.

On the GPU side, as expected the new Adreno 640 is just an iterative design on the Adreno 630 – increasing the number of ALUs versus its predecessor by 50%. The performance gains were quoted at 20%, which is a bit on the conservative side but not something to be particularly worried about. Power efficiency is claimed to still be leading edge, and at least based on the sustained performance metrics shown, it looks like the Snapdragon 855 will continue to lead in this regard.
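If shader throughput scaled linearly with ALU count times clock frequency (a simplifying assumption, not something Qualcomm has stated), the two quoted figures would imply a lower GPU clock than on the Adreno 630. A rough sketch of that arithmetic, with an illustrative function name:

```python
def implied_clock_ratio(perf_gain: float, alu_gain: float) -> float:
    """Relative clock implied by the model: performance ~ ALU count x clock.

    Both arguments are fractional increases, e.g. 0.20 for +20%.
    """
    return (1.0 + perf_gain) / (1.0 + alu_gain)


ratio = implied_clock_ratio(perf_gain=0.20, alu_gain=0.50)
print(f"Implied Adreno 640 clock: {ratio:.0%} of the Adreno 630's")
```

The quoted 20% is more likely a benchmark result than a peak-throughput figure, so the real clock could well differ from this estimate.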

The Snapdragon 855 is Qualcomm’s first chipset to include a dedicated neural network inference engine / “NPU”. Here Qualcomm claims to have a lead over its nearest competitor, the Kirin 980, however it’s something that will need a thorough investigation in the future, as we’ve seen that both vendors' software stacks aren’t quite mature enough to be able to fully extract the capabilities of the hardware. Still, this is an important step for Qualcomm as it now allows concurrent usage of the DSP for image processing while having a dedicated hardware block for inferencing tasks.

The ISP and the new computer vision hardware seem fantastic. The new features really raise the bar for what will be possible with computational photography, and introduce new aspects such as computational videography. Here we’ll have to see what OEM vendors will be able to actually implement on final commercial devices.

Last but not least, the Snapdragon 855 is the first platform that promises to provide a complete solution of next generation connectivity technologies. Here the features aren’t integrated directly into the SoC but rather provided by accompanying chipsets, so I do expect that there will be many devices that do not choose the full set of next-generation solutions, but we might also see some devices which implement the new X50 5G modem alongside the new WiFi 6/802.11ax and 802.11ay WiFi solutions.

Today’s announcement is just the first wave of information on the Snapdragon 855. To really judge the new chipset, we’re eagerly waiting on Qualcomm to allow us to do a performance preview on the reference designs. Most importantly, we’re still a few months away from the first commercial devices – which will represent the definitive evaluation of the new chipset.


  • Desierz - Wednesday, December 05, 2018

    "one thing I did note in my Huawei Mate 20 review is that the Pixel 3 and OnePlus 3 still felt faster in terms of application launch time"

    I think it would be better to actually have numbers, rather than what you 'felt'..
  • III-V - Monday, December 10, 2018

    It's a pre-dive. Who cares?
  • Wardrive86 - Wednesday, December 05, 2018

    So you estimate 128 ALUs per core/ 256 ALUs total for the A630? I must have read the article wrong because I thought you had estimated 256 ALUs per core /512 total. I was under the impression also that the Adreno 540 was a quad core GPU with 64 ALUs per core/ 256 Total
  • Wardrive86 - Wednesday, December 05, 2018

    The 727 Gflop Fmadds you quoted in the earlier article would assume 512 ALUs at 710 mhz
  • Andrei Frumusanu - Thursday, December 06, 2018

    I corrected this, it was a brain fart on my part.
  • icalic - Thursday, December 06, 2018

    hi Andrei, could you test Adreno 630 on clpeak GFLOPs benchmark and share the result?

    https://play.google.com/store/apps/details?id=kr.c...

    i think 727GFLOPs is too high for Adreno 630 and i believe it only slightly above tegra x1(>512GFLOPs) after i compare score in 3dmark sling shot extreme unlimited graphics. Adreno 630 maybe only have 384 ALU in total.
  • Andrei Frumusanu - Thursday, December 06, 2018

    clPeak isn't accurate.
  • Wardrive86 - Thursday, December 06, 2018

    Thank you for the clarification
  • ZolaIII - Thursday, December 06, 2018

    A540 is a quad cluster GPU, A630 is dual cluster with 2x more ALU's & it seams A640 is three cluster one on lower clock speed. Also meaning it will have a bit higher mm2 size.
  • Rudde - Monday, December 10, 2018

    According to wikipedia (unverified source), adreno 630 has 256 ALUs managing 4x 16-bit FMA calculations per cycle yielding 727 gflops at 710MHz. Assuming adreno 640 is similar, it has 384 ALUs managing 4x 16-bit FMA. 1.2x performance puts it at 872gflops, which suggests a 568MHz frequency.
