Memory Subsystems Compared

On the memory subsystem side, there are quite a few big changes for both the Snapdragon 865 and the Exynos 990, as these are the first commercial SoCs on the market to use LPDDR5. Qualcomm in particular is said to have made huge progress in its memory subsystem, and we’re now able to verify on a production device the promising initial results we saw on the QRD865 back in December.

And indeed, the news keeps getting better for Qualcomm, as the Galaxy S20 shows even better memory results than we had measured on the reference device. The improvements over the Snapdragon 855 are enormous: Qualcomm not only catches up but now clearly beats the Exynos chips in memory subsystem performance.

Arm famously quotes that a 5ns improvement in memory latency corresponds to an increase of around 1% in performance. If that holds, Qualcomm will have gained ~12% in CPU performance purely by virtue of the new memory controller and SoC memory subsystem design. Our structural estimate of the memory latency comes in at around 106 vs 124ns. Most of the improvement seems to be due to how Qualcomm now handles accesses to the DRAM chips themselves, with the Snapdragon 855’s poor latencies previously having been attributed to power management mechanisms.
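Arm's rule of thumb is simple linear arithmetic. The sketch below converts between a latency saving and an estimated performance gain, assuming the quoted ratio of roughly 1% per 5ns holds linearly (a simplification); the function names are illustrative only.

```python
# Arm's rule of thumb as quoted above: ~1% performance per 5ns of memory latency saved.
# Assumes the relationship is linear, which is only a rough approximation.
NS_PER_PERCENT = 5.0


def estimated_gain_pct(latency_reduction_ns: float) -> float:
    """Estimated CPU performance gain (%) for a given memory latency reduction."""
    return latency_reduction_ns / NS_PER_PERCENT


def implied_latency_reduction_ns(gain_pct: float) -> float:
    """Latency reduction (ns) that the rule of thumb implies for a given gain (%)."""
    return gain_pct * NS_PER_PERCENT


if __name__ == "__main__":
    # A ~12% gain implies that roughly 60ns of latency was shaved off.
    print(implied_latency_reduction_ns(12.0))  # 60.0
    # The 124 - 106 = 18ns gap between the two quoted figures is worth about 3.6%.
    print(round(estimated_gain_pct(124 - 106), 2))  # 3.6
```

By the same rule, the ~12% estimate corresponds to a latency reduction of about 60ns, while the 18ns difference between the two figures quoted above would be worth about 3.6%.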

Samsung’s Exynos 990 also improves memory latency compared to the Exynos 9820, but by a smaller margin than the Snapdragon 865. All latency patterns here are still clearly worse than on the Qualcomm chip, and there are some oddities in the results. Let’s zoom in on a logarithmic graph:

 

Comparing the Exynos 990 results against the Exynos 9820, it’s now clearly visible that the L2 cache has grown dramatically in size, as described on the previous page, with the cache available to a core doubling from 1MB to 2MB. Samsung’s cores still have some advantages; for example, they retain a 3-cycle L1 latency design, whereas the Arm cores make do with 4-cycle accesses. In other regards, however, the design just falls apart.
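An L1 latency quoted in cycles only becomes a time once the clock frequency is fixed. The sketch below converts cycles to nanoseconds; the 2.5GHz clock is a hypothetical value chosen so the two designs can be compared at equal frequency, not a measured figure for either chip.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds (at 1GHz, 1 cycle = 1ns)."""
    return cycles / clock_ghz


if __name__ == "__main__":
    # Hypothetical clock shared by both cores, purely to isolate the cycle-count difference.
    clock_ghz = 2.5
    m5_l1 = cycles_to_ns(3, clock_ghz)   # 3-cycle L1 (Samsung cores)
    arm_l1 = cycles_to_ns(4, clock_ghz)  # 4-cycle L1 (Arm cores)
    print(f"{m5_l1:.2f} ns vs {arm_l1:.2f} ns")  # 1.20 ns vs 1.60 ns
```

At equal clocks, one cycle fewer means a 25% lower L1 hit latency; in practice the two core types run at different frequencies, so the nanosecond gap will differ.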

The TLB issues that we described last year in the M4 are still very much present in the M5 core, leading to some absurd results, such as random accesses over a 2MB region actually being faster than over a 1MB region. Cache-line accesses with TLB miss penalties now actually have lower latencies in the L3 region than in the L2 region, and I have no idea what’s happening in the 16-64MB region of that test, as it behaves worse than the 9820.
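Latency tests of this kind are commonly built as a pointer chase. Each load's address depends on the previous load, so the core can't overlap them, and a random order defeats the prefetchers. The sketch below shows one common way to build such a chain: Sattolo's algorithm, which guarantees a single cycle through every cache line of the region. It illustrates the technique only; it is not the tool behind the graphs above, and the 64-byte line and 4KB page sizes are assumptions. Python can't time individual loads meaningfully, so the sketch only produces the access order that a native test would lay out in memory and time.

```python
import random


def random_cycle(n: int, seed: int = 0) -> list[int]:
    """Return nxt[] such that following nxt[i] from any slot visits all n slots once.

    Sattolo's algorithm: like Fisher-Yates, but j is drawn strictly below i,
    which guarantees the permutation forms a single cycle.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase_offsets(region_bytes: int, line_bytes: int = 64, seed: int = 0) -> list[int]:
    """Byte offsets visited by one full pointer chase over region_bytes, one per cache line."""
    n = region_bytes // line_bytes
    nxt = random_cycle(n, seed)
    offsets, slot = [], 0
    for _ in range(n):
        offsets.append(slot * line_bytes)
        slot = nxt[slot]
    return offsets


if __name__ == "__main__":
    offsets = chase_offsets(2 * 1024 * 1024)  # 2MB region, as in the text
    pages = {off // 4096 for off in offsets}  # assuming 4KB pages
    print(len(offsets), "lines spread over", len(pages), "pages")  # 32768 lines spread over 512 pages
```

Because consecutive loads land on random pages, a region spanning more pages than the TLB covers forces frequent page-table walks. This is the kind of penalty the TLB-miss patterns above are exposing.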

Examining the A76 cores of the Exynos 990, we see a much cleaner set of results, more akin to what you’d expect from a CPU. Here we also see the 2MB SLC in the 1-3MB region, meaning the Arm core cluster does have access to this cache, whereas the M5 cores bypass it for better latency. Last year I noted that the A76’s prefetchers had seen some massive improvements, and this is again evident when comparing the two CPU types on the same chip: the middle cores actually handle some access patterns better than the M5 cores.
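Why would a 2MB SLC show up in the 1-3MB region rather than at 0-2MB? One plausible explanation, sketched below, assumes the hierarchy behaves exclusively, so that a line lives in only one level and each cache's capacity stacks on top of the levels in front of it. Whether the Exynos 990's SLC is actually exclusive isn't stated here, and the 1MB figure for the caches in front of it is a hypothetical stand-in.

```python
KiB = 1024
MiB = 1024 * KiB


def serving_level(working_set: int, levels: list[tuple[str, int]]) -> str:
    """Return the level that serves a working set, assuming exclusive caches.

    With exclusive (victim-style) caches, capacities add up, so each level
    covers the size range just past the cumulative capacity of the levels before it.
    """
    cumulative = 0
    for name, capacity in levels:
        cumulative += capacity
        if working_set <= cumulative:
            return name
    return "DRAM"


if __name__ == "__main__":
    # Only the 2MB SLC size comes from the text; the 1MB front level is hypothetical.
    a76_path = [("CPU caches (hypothetical 1MB)", 1 * MiB), ("SLC", 2 * MiB)]
    for size in (512 * KiB, 2 * MiB, 4 * MiB):
        print(size // KiB, "KiB ->", serving_level(size, a76_path))
```

Under this model, working sets from just over 1MB up to 3MB are served by the SLC, matching the region where it appears in the A76 results. With inclusive caches, the SLC would instead only add capacity up to its own 2MB.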

Samsung has had large issues with its memory subsystem ever since the M3 design, and unfortunately it seems they were never addressed, even in the newer M5 core.

The Snapdragon 865’s results are quite straightforward. The biggest difference from the 855, besides the improved DRAM latency, is the doubling of the L3 from 2MB to 4MB, which is also immediately visible. It still pales in comparison to the Apple A13’s cache hierarchy, but we do hope that the Arm vendors will be able to catch up in the next few years.

137 Comments

  • toyeboy89 - Friday, April 3, 2020 - link

    I'm really amazed by the fact that the iPhone XR still beats the Snapdragon 865 in GFXBench in both peak and sustained performance. I'm hoping the OnePlus 8 has better sustained performance.
  • TMCThomas - Friday, April 3, 2020 - link

    Amazing review! I always wait for this one before getting a new Samsung. And I won't be getting any of the S20 phones. To me they kind of feel like "beta" phones: the 120Hz mode which isn't quite ready for 1440p yet, the underutilized 108MP camera, the Space Zoom which is blurry, the camera hole still being there, the big camera bump and so on. I think all these features and more could be far more refined in the next Galaxy S, which I'll be waiting for. Also, the poor Exynos 990 performance, especially the GPU part, is just unacceptable to me, especially since it will probably be a lot better next year, so I'll skip this year.
  • wheeliebin - Friday, April 3, 2020 - link

    Thanks Andrei, really good review!

    I have read many users complain about extra crazy post-processing on the S10/S20 series when there is a face detected in the frame. i.e. the phone will apply an aggressive 'smooth skin' filter that you can't disable unless you shoot RAW. I was hoping that your review might touch on this however there were no people in your example shots so perhaps you didn't get a chance to experience the problem. I wonder if you have heard of this issue and can replicate it yourself with the S20 range?
  • anonomouse - Friday, April 3, 2020 - link

    Hi Andrei, did you also run the bandwidth and MLP sweeps from previous reviews? Last year you noted the Snapdragon 855/A76 had peculiar behavior in the L1, and it would also be interesting to see if there are any MLP changes in both the SD865 and the Exynos.
  • anonomouse - Friday, April 3, 2020 - link

    Also, any idea why the new 403.gcc scores for these seem to be worse than those of their previous-generation products? In particular, the SD865 score in these S20s is substantially worse than the SD865 score from the QRD preview article.
  • Andrei Frumusanu - Saturday, April 4, 2020 - link

    Yes, I know. I don't know why that happens. I also have a V60 now and the scores there are higher; I'm wondering if there's something with Samsung's shared libraries.
  • anonomouse - Sunday, April 5, 2020 - link

    What type of compile flags are used for these binaries? Are they the same for all of the tested binaries (or even same binary on each given platform)? Are LTO or PGO used (and if not why not)?

    I'm also not convinced of this statement from the article:

    "I had mentioned that the 7LPP process is quite a wildcard in the comparisons here. Luckily, I’ve been able to get my hands on a Snapdragon 765G, another SoC that’s manufactured on Samsung’s EUV process. It’s also quite a nice comparison as we’re able to compare that chip’s performance A76 cores at 2.4GHz to the middle A76 cores of the Exynos 990 which run at 2.5GHz. Performance and power between the two chips here pretty much match each other, and a clearly worse than other TSMC A76-based SoCs, especially the Kirin 990’s. The only conclusion here is that Samsung’s 7LPP node is quite behind TSMC’s N7/N7P/N7+ nodes when it comes to power efficiency – anywhere from 20 to 30%."

    Both the energy consumed and the performance scores for both of these A76s also seem to very closely track the "mid" 2.43GHz A76s on the TSMC-fabbed SD855, all of which have similar L2s and similar frequencies, but possibly differ significantly (to the point of being suboptimal on latency) in the memory hierarchy and SoC beyond that, which greatly affects many of the SPEC workloads. All of these may also have implementation targets. Given this, is it really conclusive that the Samsung process is truly 20-30% worse in energy efficiency? Granted, things will probably not look pretty next year when TSMC is on a true 5nm and Samsung is not.
  • Andrei Frumusanu - Monday, April 6, 2020 - link

    The test is just -Ofast without any other addition. LTO wasn't/isn't in a good state on the Android NDK - it's something to look into in maybe a new binary revision.

    As for the 855 figures, well, that's also on an earlier 7nm. HiSilicon did a lot better in terms of their physical implementation. If not against N7, 7LPP clearly has a disadvantage against N7P/N7+.
  • Andrei Frumusanu - Saturday, April 4, 2020 - link

    I'll add them in; that test takes a whole day and I needed the phones for battery tests and other stuff.
  • dad_at - Saturday, April 4, 2020 - link

    Again, your S10+ Exynos results in PCMark are false as of 2020. In performance mode I easily get 9500 in Work 2.0 overall, about 9600 in web browsing and 21K in photo editing. PCMark in general is an inconsistent, irrelevant benchmark, not representative of actual performance in daily usage. The same goes for these ancient SPEC synthetics. No one uses these for performance evaluation now.
