Memory Subsystems Compared

On the memory subsystem side, there are quite a few big changes for both the Snapdragon 865 and the Exynos 990, as these are the first commercial SoCs on the market to use LPDDR5. Qualcomm in particular is said to have made huge progress in its memory subsystem, and with a production device in hand we’re now able to verify the initially promising results we saw on the QRD865 back in December.

And indeed, the news keeps getting better for Qualcomm, as the new Galaxy S20 showcases even better memory results than we had measured on the reference device. The improvements over the Snapdragon 855 are enormous: Qualcomm not only catches up but is now able to beat the Exynos chips in terms of memory subsystem performance.

Arm famously quotes that an improvement of 5ns in memory latency corresponds to an increase of around 1% in performance. If that’s the case, Qualcomm will have gained a ~12% improvement in CPU performance just by virtue of the new memory controller and SoC memory subsystem design. Our structural estimate of the memory latency falls at around 106ns for the Snapdragon 865 versus 124ns for the Snapdragon 855. Most of the improvement seems to be due to how Qualcomm is now handling accesses to the DRAM chips themselves; the company had previously attributed the poor latencies on the Snapdragon 855 to power management mechanisms.
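As a worked version of that rule of thumb, here is a minimal sketch that treats the 5ns-per-1% relationship as strictly linear. The linearity and the function names are assumptions made for illustration. Applied to the structural estimate alone (124ns down to 106ns) it gives roughly 3.6%, while a ~12% gain corresponds to a saving of about 60ns, so the larger figure has to rest on a bigger latency delta than the structural estimate by itself.

```python
def estimated_uplift_percent(old_latency_ns, new_latency_ns, ns_per_percent=5.0):
    """Rule-of-thumb performance gain (%) for a given memory latency reduction.

    Assumes the quoted "5ns is worth about 1%" relationship is linear.
    """
    return (old_latency_ns - new_latency_ns) / ns_per_percent


def latency_saving_for_uplift(uplift_percent, ns_per_percent=5.0):
    """Latency reduction (ns) that the same rule of thumb needs for a given gain."""
    return uplift_percent * ns_per_percent


# Structural latency estimates quoted in the text: 124ns before, 106ns after.
structural_gain = estimated_uplift_percent(124.0, 106.0)
print(f"structural estimate alone: ~{structural_gain:.1f}%")

# Latency saving that a ~12% gain would imply under the same rule.
print(f"saving implied by ~12%: {latency_saving_for_uplift(12.0):.0f}ns")
```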

Samsung’s Exynos 990 also improves in memory latency compared to the Exynos 9820, but by a smaller margin than what the Snapdragon 865 was able to achieve. All latency patterns here are still clearly worse than on the Qualcomm chip, and there are some oddities in the results. Let’s zoom in on a logarithmic graph:

 

Comparing the Exynos 990 results against the Exynos 9820, it’s now quite visible that the L2 cache has increased dramatically in size, as we described on the previous page: the cache available to a core has doubled from 1MB to 2MB. Samsung’s cores still have some advantages, for example they’re still on a 3-cycle L1 latency design whereas the Arm cores make do with 4-cycle accesses. In other regards, however, the design just falls apart.
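To put the L1 cycle counts into time terms, access latency in nanoseconds is simply cycles divided by the clock in GHz. The sketch below uses a hypothetical 2.5GHz clock for both designs; that frequency is an assumption for illustration, not the shipping clock of either core, and the function names are ours.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds at a given clock (GHz)."""
    return cycles / clock_ghz


def break_even_clock_ratio(slow_cycles, fast_cycles):
    """How much higher the slower design must clock to match the faster one in ns."""
    return slow_cycles / fast_cycles


HYPOTHETICAL_CLOCK_GHZ = 2.5  # assumed for illustration only

three_cycle_l1_ns = cycles_to_ns(3, HYPOTHETICAL_CLOCK_GHZ)
four_cycle_l1_ns = cycles_to_ns(4, HYPOTHETICAL_CLOCK_GHZ)
print(f"3-cycle L1: {three_cycle_l1_ns:.2f}ns, 4-cycle L1: {four_cycle_l1_ns:.2f}ns")

# A 4-cycle design would need about a third more clock to match a 3-cycle design.
print(f"break-even clock ratio: {break_even_clock_ratio(4, 3):.2f}x")
```

At equal clocks the 3-cycle design always wins on L1 latency; the ratio shows how much of a frequency advantage would be needed to cancel that out.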

The TLB issues that we had described last year in the M4 are still very much present in the M5 core, which leads to some absurd results, such as random accesses over a 2MB region actually being faster than over a 1MB region. Cache-line accesses with TLB miss penalties now actually have lower access latencies in the L3 than in the L2 regions, and I have no idea what’s happening in the 16-64MB region in that test, as it behaves worse than the 9820.
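For readers unfamiliar with how "random accesses over a region" are usually produced, latency tests of this kind are commonly built as a pointer chase: every slot stores the index of the next slot to visit, so each load depends on the previous one and cannot be overlapped. The sketch below is not the tool used for these graphs. It only illustrates building such a chain with Sattolo's algorithm so that the walk covers the whole region in a single cycle. The 64-byte cache line size and all names are assumptions, and a real latency measurement would time the walk in C or assembly, since Python's interpreter overhead swamps memory latency.

```python
import random

CACHE_LINE_BYTES = 64  # assumed line size; one chain slot per cache line


def slots_for_region(region_bytes, line_bytes=CACHE_LINE_BYTES):
    """Number of cache-line-sized slots in a test region."""
    return region_bytes // line_bytes


def build_pointer_chain(num_slots, seed=0):
    """Return a next-index table that forms one random cycle over all slots.

    Uses Sattolo's algorithm: swapping slot i only with a strictly lower
    index guarantees a single cycle rather than several short loops.
    """
    rng = random.Random(seed)
    chain = list(range(num_slots))
    for i in range(num_slots - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def walk(chain, steps, start=0):
    """Follow the chain for a number of dependent steps; return the final index."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


def is_single_cycle(chain):
    """True if following the chain from slot 0 visits every slot before returning."""
    idx = 0
    for step in range(1, len(chain) + 1):
        idx = chain[idx]
        if idx == 0:
            return step == len(chain)
    return False


num_slots = slots_for_region(2 * 1024 * 1024)  # a 2MB region
chain = build_pointer_chain(num_slots, seed=1)
print(num_slots, is_single_cycle(chain))
```

Running the same walk over progressively larger regions is what produces the latency steps at each cache level boundary.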

Examining the A76 cores of the Exynos 990, we see a much cleaner set of results more akin to what you’d expect to see from a CPU. Here we also see the 2MB SLC cache hierarchy in the 1-3MB region, meaning the Arm core cluster does have access to this cache, with the M5 cores bypassing it for better latency. Last year I had noted that the A76’s prefetchers had seen some massive improvements, and this is again evident here in the result sets of the two CPUs on the same chip as the middle cores actually handle some access patterns better than the M5 cores.

Samsung has had large issues with its memory subsystem ever since the M3 design, and unfortunately it seems they never addressed them, even with the more recent M5 core.

The Snapdragon 865 here is quite straightforward. The biggest difference from the 855, besides the improved DRAM latency, is the doubling of the L3 from 2MB to 4MB, which is also immediately visible. It still pales in comparison to the Apple A13’s cache hierarchy, but we do hope that the Arm vendors will be able to catch up in the next few years.


137 Comments


  • MAGAover9000 - Tuesday, April 7, 2020

    I have the s10+. Fantastic device. Very happy
  • id4andrei - Friday, April 3, 2020

    No need for me to praise this review any longer. Still, I must nitpick. The 3dmark GPU test always has caveats in your reviews. Drop it if you feel it is detected by OEMs or it's a false GPU test like the physics one.

    On web tests. I read on wiki that JetStream is an Apple made test, literally. Wouldn't you say that's a big caveat when testing against ios? Similarly Speedometer is developed by the webkit team at Apple. With Android webview based on Blink, not webkit, wouldn't Android smartphones be at a disadvantage against iphones? I don't see Kraken(Firefox) or Octane(Google) being used.

    Kraken would actually be neutral to both. Other 3rd party tests might be Testdrive(Microsoft) or Basemark.
  • Andrei Frumusanu - Friday, April 3, 2020

    I don't think that the fact that the WebKit team made those tests is a valid argument against using them. You can go and read the source JS yourself if you wish, and they're industry accepted benchmarks. Both Kraken and Octane are ancient and outdated and we dropped them just like we dropped SunSpider of the early days.
  • id4andrei - Friday, April 3, 2020

    Thank you for the prompt answer.
  • s.yu - Friday, April 3, 2020

    Thank you Andrei, again the most comprehensive and reliable set of samples anywhere!
    There seems to be considerable sample variation again (last time with Samsung was the main module since S9 with the variable aperture) in the UWA, S20+E and S20UE should have absolutely identical UWA performance but the S20UE seems to have far worse sagittal resolution than the S20+E, and Samsung's processing isn't that good in the first place, considering the 12MP 1.4μm could produce incredibly sharp pictures as that been the specs of the Pixels' main module for generations.
    I don't regret their switch to f/1.8 because the old module that went up to f/1.5 wasn't sharp wide open, especially in the corners, but a further two stops' variation to f/3.3 could be useful for more DoF in closeups provided inserting that physical aperture into the tiny module doesn't compromise the optical design otherwise.
    This time around the E seems to generally outperform the S, except in color as E doesn't seem to have proper color fidelity...almost as if chroma NR is set too high even in broad daylight, and the "hybridization" of the digital zoom, in which the E clearly uses a smaller portion from the periscope's readout than the S in the resulting merge. Speaking of the zoom, S20+ still performs slightly worse at 2x(16MP readout) than S10's native 12MP, though the difference is small and could be down to lens variation. Considering S10U's Z height, they could've easily fixed the S20U like Xiaomi, going 1/2.3" f/2 12MP with the 2x. Xiaomi used it despite a 4-1 bin, all the more reason to use it with a 9-1 bin. S20U's corner performance at 3x would also be much improved.
    Regarding the comparison with the Fuji though, I suspect your unit has trouble focusing to infinity correctly, because the train and forest samples show clear superiority of the Fuji's zoom. I especially recognize that kind of slight haziness as being very responsive to dehaze and low radius sharpening in LR and would result in far more detail with extraction in post. Also, with an ILC, there's always stopping down a little for more sharpness and more DoF.
    Regarding the full res modes, it's not worth storing 108MP of data with the CFA asking for a 9-1 bin, of course the 64MP would be better, without the RAW it's hard to say for sure, but the 64MP seems to be quad bayer.
  • s.yu - Friday, April 3, 2020

    I don't agree with your remark about the night comparison with Mate30P though, the UWA is not "UW" so it has better image quality, that's true, and the night mode of the Mate30P is far superior, that's also true, but not auto mode, nor any aspect of the telephoto as it's clearly using a crop of the main for 3x. Samsung does attempt to use the 4x for telephoto and although there's a significant issue of chroma noise, it's far sharper than Mate30P's crop, with at least twice 3 times the effective resolution in night mode. With S20U you could also crop out a single shot 3-4x of similar brightness to the Mate30P crop, but it's just a crop.
    As for the potential of P40P surpassing S20U, that model operates on a 9.4MP crop by default, interpolated to 12.5MP which clearly has consequences. In daylight it's often a regression compared to P30P (much less match Mate30P), and in night shots using the current firmware it has severe color issues of rendering large portions of the scene as a crimson red, so it's hard to say at this point too.
  • s.yu - Friday, April 3, 2020

    Oh, there's exception of the Mate30P auto mode in the last sample, but the night mode isn't constantly superior either.
  • RealBeast - Friday, April 3, 2020

    I've been looking forward to getting one of these, not sure which yet. The fly in the ointment now is that I won't see my Mom (who gets my old S9+) until the Fall due to the whole COVID problem, not to mention less income. That will weigh heavily on sales of what is otherwise an amazing looking phone for me.
  • 29a - Friday, April 3, 2020

    How large are the picture file sizes created by this thing?
  • BedfordTim - Sunday, April 5, 2020

    The same size as any other 12MP camera. They will depend on content, hdr, motion and compression options but I would expect about 36MB for a raw image and 8MB for a high quality jpeg.
