Memory Subsystems Compared

On the memory subsystem side, there are quite a few big changes for both the Snapdragon 865 and the Exynos 990, as these are the first commercial SoCs on the market using LPDDR5. Qualcomm especially is said to have made huge progress in its memory subsystem, and with a production device we’re now able to verify the initially promising results we saw on the QRD865 back in December.

And indeed, the news keeps getting better for Qualcomm, as the new Galaxy S20 showcases even better memory results than we had measured on the reference device. The improvements over the Snapdragon 855 are enormous: Qualcomm not only catches up, but is now able to beat the Exynos chips in terms of memory subsystem performance.

Arm famously quotes that an improvement of 5ns in memory latency corresponds to an increase of around 1% in performance. If that’s the case, Qualcomm will have had a ~12% improvement in CPU performance just by virtue of the new memory controller and SoC memory subsystem design. Our structural estimate of the memory latency comes in at around 106ns vs 124ns. Most of the improvement seems to be due to how Qualcomm is now handling accesses to the DRAM chips themselves; the company had previously attributed the bad latencies on the Snapdragon 855 to power management mechanisms.
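As a rough worked example of that rule of thumb, here is a minimal Python sketch. It assumes the 5ns-per-1% figure scales linearly, which is a simplification of what is only an approximate quote, and the function names are made up for illustration. Under that assumption, the 124ns to 106ns structural estimate alone would account for about 3.6%, while a ~12% uplift would correspond to roughly 60ns of saved latency, so the larger figure presumably reflects the bigger gaps seen in the measured access patterns rather than the structural estimate.

```python
def estimated_uplift_percent(latency_before_ns, latency_after_ns, ns_per_percent=5.0):
    """Linear reading of the rule of thumb: each `ns_per_percent` ns saved ~ 1% performance.

    The linear scaling is an assumption made for this sketch.
    """
    saved_ns = latency_before_ns - latency_after_ns
    return saved_ns / ns_per_percent


def latency_saving_for_uplift(uplift_percent, ns_per_percent=5.0):
    """Inverse of the above: how many ns must be saved for a given uplift."""
    return uplift_percent * ns_per_percent


# Structural estimate quoted in the text: 124ns down to 106ns.
structural = estimated_uplift_percent(124.0, 106.0)
# Latency reduction that a ~12% uplift would imply under the same rule.
implied_ns = latency_saving_for_uplift(12.0)

print(f"structural estimate: {structural:.1f}%")                 # 3.6%
print(f"12% uplift implies about {implied_ns:.0f} ns saved")     # 60 ns
```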

Samsung’s Exynos 990 also improves in memory latency compared to the Exynos 9820, but by a smaller margin than what the Snapdragon 865 was able to achieve. All latency patterns here are still clearly worse than on the Qualcomm chip, and there are some oddities in the results. Let’s zoom in with a logarithmic graph:

 

Comparing the Exynos 990 results vs the Exynos 9820, it’s now quite visible that the L2 cache has increased dramatically in size, as we described on the previous page, corresponding to the doubling of the cache available to a core from 1MB to 2MB. Samsung’s cores still have some advantages: for example, they’re still on a 3-cycle L1 latency design whereas the Arm cores make do with 4-cycle accesses. In other regards, however, the design just falls apart.

The TLB issues that we had described last year in the M4 are still very much present in the M5 core, which leads to some absurd results, such as random accesses over a 2MB region actually being faster than over a 1MB region. Cache-line accesses with TLB miss penalties now actually have lower access latencies in the L3 than in the L2 regions, and I have no idea what’s happening in the 16-64MB region in that test, as it behaves worse than the 9820.
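For readers unfamiliar with how "random accesses over a region" of a given size are typically produced, the following is a minimal sketch of a dependent pointer chase. It is not the test tool used for these results; the function names are hypothetical, and in Python the interpreter overhead dominates the timings, so the numbers it prints only illustrate the access pattern and say nothing about real cache or DRAM latency. The ring is built with Sattolo's algorithm so that the walk visits every slot once before repeating, which keeps each access dependent on the previous one.

```python
import random
import time


def build_chase_ring(num_slots, seed=0):
    """Return a list where ring[i] is the next slot to visit.

    Sattolo's algorithm yields a permutation with a single cycle, so a walk
    starting anywhere touches every slot exactly once before repeating.
    """
    rng = random.Random(seed)
    ring = list(range(num_slots))
    for i in range(num_slots - 1, 0, -1):
        j = rng.randrange(i)  # j < i, never i itself: this is what forces one cycle
        ring[i], ring[j] = ring[j], ring[i]
    return ring


def chase(ring, steps):
    """Follow the ring for `steps` dependent hops; return (final slot, ns per hop)."""
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        idx = ring[idx]  # each load depends on the result of the previous one
    elapsed = time.perf_counter_ns() - start
    return idx, elapsed / steps


# Walk regions of increasing size, as a latency test would sweep test depth.
for num_slots in (1 << 10, 1 << 13, 1 << 16):
    ring = build_chase_ring(num_slots)
    _, ns_per_hop = chase(ring, 100_000)
    print(f"{num_slots:>6} slots: {ns_per_hop:.1f} ns per hop (interpreter overhead included)")
```

A native implementation would store the ring as raw pointers spaced at least a cache line apart, so that the region size in bytes maps directly onto the cache and TLB capacities being probed.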

Examining the A76 cores of the Exynos 990, we see a much cleaner set of results more akin to what you’d expect to see from a CPU. Here we also see the 2MB SLC cache hierarchy in the 1-3MB region, meaning the Arm core cluster does have access to this cache, with the M5 cores bypassing it for better latency. Last year I had noted that the A76’s prefetchers had seen some massive improvements, and this is again evident here in the result sets of the two CPUs on the same chip as the middle cores actually handle some access patterns better than the M5 cores.

Samsung has had large issues with its memory subsystem ever since the M3 design, and unfortunately it seems they never addressed them, even with the more recent M5 core.

The Snapdragon 865 here is quite straightforward. The biggest difference from the 855, besides the improved DRAM latency, is the doubling of the L3 from 2MB to 4MB, which is also immediately visible. It still pales in comparison to the Apple A13’s cache hierarchy, but we do hope that the Arm vendors will be able to catch up in the next few years.

135 Comments

  • FunBunny2 - Friday, April 3, 2020

    this is what I mean.

    "If you can run a game at 100 frames per second, you may see a tangible benefit from playing it on a monitor that can refresh that many times per second. But if you’re watching a movie at a classic 24 FPS (frames per second), a higher refresh rate monitor won’t make any difference."

    here: https://www.digitaltrends.com/computing/do-you-nee...

    IOW, unless the processor sending either video or coded application images does so 120 per second, all the 120hz screen does is re-scan each image multiple times. how can the refresh rate create modified images, between those sent by the processor? or do 90/120hz screens do just that?

    do you disagree with that author?
  • krazyfrog - Friday, April 3, 2020

    The screen refreshes at a set rate regardless of the content being sent to it. In this case, it always refreshes at 120Hz. If the content is in 24fps, each frame of the video persists for 5 refreshes of the display. To the eye, it looks no different than watching the same 24fps video on a 60Hz display.
  • surt - Saturday, April 4, 2020

    Not true. It does not look the same to your eye, and the difference is the latency from the time that information is ready to display to the time it reaches your eye. The 120hz display will show that transition from e.g. the 23rd to the 24th frame significantly faster.
  • FunBunny2 - Sunday, April 5, 2020

    " It does not look the same to your eye"

    that's a may be. years ago I worked in a manufacturing plant, no windows and only florescent lights. one of the guys I worked with wore glasses that looked like very weak sunglasses, but no prescription. I asked him about them and he said his eye doctor prescribed them for his constant headaches. turns out that some folks rectify the 60hz flash of florescent light, and it hurts. the same phenomenon would occur with monitors. if you're not among the rectifiers, it's hard to see how you would see different at 120hz.
  • surt - Sunday, April 5, 2020

    And yet, it's not hard to see at all. Response tests are undeniable. People's reactions are unquestionably faster on 120hz. Whether you notice the difference or not, it exists.
  • surt - Saturday, April 4, 2020

    It matters to any game. If your game updates at 30fps, the 120hz display will get that information to your eye a fraction faster than the 60hz display, because the 'time to next frame' + 'time to display next frame' is always smaller on the 120hz.
  • eastcoast_pete - Friday, April 3, 2020

    Great review, thanks Andrei! Question: just how much power draw does the 5G modem add, especially the mm ones for us in the US? Along those lines, can the 5G function disabled in software, so not just deselected, but actually shut off? I imagine that the phone hunting for mm connectivity when it's not there could eat quite a bit of battery life.
  • Andrei Frumusanu - Friday, April 3, 2020

    I don't even have 5G coverage here so I wouldn't know!

    Yes, 5G can be disabled in the options. I would assume that actually shuts off the extra RF. Similarly, I don't know how the mmWave antenna power management works.
  • eastcoast_pete - Friday, April 3, 2020

    Thanks for the reply! mm 5G coverage is supposedly "available" in some places here in the US, but I don't believe the carriers here have set up anywhere near enough cells for it to be viable. Plus, even if I'd get Gb download rates, they still have caps on their plans, unless one shells out for the premium unlimited ones. And those make the 20 Ultra's price tag look like a bargain (:
  • Reflex78 - Friday, April 3, 2020

    I live in Europe, I like Samsung and have S9 at the moment.
    But I will never pay +1000€ for lower quality Exynos S20 version in Europe!
    This is a big mistake from the company management to allow such a difference between this 2 variants at the same price!
    And I just read that they have chosen to sell Snapdragon version even for their home country:
    https://www.phonearena.com/news/Samsung-chip-divis...
