Core-to-Core, Cache Latency, Ramp

For some of our standard tests, we look at how the CPU performs in a series of synthetic workloads to examine any microarchitectural changes or differences. This includes our core-to-core latency test, a cache latency sweep across the memory space, and a ramp test to see how quickly a system goes from idle to load.

Core-to-Core

Inside the chip are eight cores connected through a bi-directional ring, each direction capable of transmitting 32 bytes per cycle. In this test we measure how long it takes to probe an L3 cache line from a different core on the chip and return the result.

For two threads on the same core, we’re seeing a latency of 7 nanoseconds, whereas for two separate cores we’re seeing a latency from 15.5 nanoseconds up to 21.2 nanoseconds, which is a wide gap. Finding out exactly how long each hop takes is a bit tricky, as the overall time depends on the frequency of the core, of the cache, and of the fabric over the duration of the test. It also doesn’t tell us whether there is anything else on the ring aside from the cores, as there is also going to be some form of external connectivity to other elements of the SoC.
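Since the result depends on clocks that move during the test, one way to put the quoted latencies on a common footing is to convert them into cycle counts at a fixed clock. The sketch below only does that arithmetic. The 4.0 GHz clock and the name `ns_to_cycles` are assumptions for illustration, not values measured on this system.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to clock cycles at a fixed clock.

    1 GHz is one cycle per nanosecond, so cycles = nanoseconds * GHz.
    """
    return latency_ns * clock_ghz


# Latencies quoted in the text. The clock is an assumed value, not a measurement.
ASSUMED_CLOCK_GHZ = 4.0

for label, latency_ns in [
    ("two threads, same core", 7.0),
    ("fastest core pair", 15.5),
    ("slowest core pair", 21.2),
]:
    cycles = ns_to_cycles(latency_ns, ASSUMED_CLOCK_GHZ)
    print(f"{label}: {latency_ns} ns is about {cycles:.1f} cycles")
```

A different assumed clock scales every figure by the same factor, so the ratio between the fastest and slowest core pair stays the same.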

Compared to the Zen 3 numbers we saw on the Ryzen 9 5980HS, however, these results are practically the same.

Cache Latency Ramp

This test showcases the access latency at all points in the cache hierarchy for a single core. We start at 2 KiB and probe the latency all the way through to 256 MiB, which for most CPUs lands in DRAM.

Part of this test helps us understand the range of latencies for accessing a given level of cache, while the transitions between cache levels give insight into how different parts of the cache microarchitecture work, such as the TLBs. As CPU microarchitects look at interesting and novel ways to design caches upon caches inside caches, this basic test proves to be very valuable.
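A latency sweep of this kind is often built as a pointer chase, where each load depends on the result of the previous one. The sketch below shows that structure only: a doubling list of buffer sizes, plus a single-cycle permutation (Sattolo's algorithm) so the chase visits every slot before repeating. This is an assumed approach, not necessarily the one used for our test, and all function names are hypothetical. Both bounds are treated as binary units here, which is also an assumption. Python's interpreter overhead would swamp real cache latencies, so actual timing needs native code.

```python
import random


def sweep_sizes(start_bytes: int = 2 * 1024, stop_bytes: int = 256 * 1024 * 1024):
    """Return buffer sizes from start to stop inclusive, doubling each step."""
    sizes = []
    size = start_bytes
    while size <= stop_bytes:
        sizes.append(size)
        size *= 2
    return sizes


def single_cycle_permutation(n: int, rng: random.Random):
    """Sattolo's algorithm: a random permutation made of one cycle of length n.

    next_index[i] is the slot visited after slot i, so a chase starting
    anywhere touches all n slots before returning to its start.
    """
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def chase(next_index, steps: int, start: int = 0) -> int:
    """Follow the chain for a number of dependent steps and return the final slot."""
    idx = start
    for _ in range(steps):
        idx = next_index[idx]
    return idx


sizes = sweep_sizes()
print(len(sizes), sizes[0], sizes[-1])

chain = single_cycle_permutation(16, random.Random(0))
print(chase(chain, 16))  # a full lap of the cycle ends back at slot 0
```

The randomized order is there to defeat hardware prefetchers, which would otherwise hide the latency of a simple linear walk.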

The data here again mirrors what we saw with the previous generation, Zen 3.

Frequency Ramp

Over the past few years, both AMD and Intel have introduced features that shorten the time a CPU takes to move from idle into a high-powered state. This means users get peak performance sooner, but the biggest knock-on effect is on battery life in mobile devices: a system that can turbo up quickly and turbo down quickly can stay in its lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although it was not enabled until Skylake.

One of the issues with this technology, though, is that the adjustments in frequency can sometimes be so fast that software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency every few milliseconds (or seconds), then quick changes will be missed. Not only that, but as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency of the whole core.
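The sampling problem can be illustrated with a toy trace. In the sketch below, the frequency trace is entirely hypothetical (a 200 microsecond burst to 4000 MHz on top of a 1400 MHz idle clock), as are the function names. It simply shows that a probe sampling once per millisecond can miss a burst that a probe sampling every 10 microseconds catches.

```python
def synthetic_frequency_mhz(t_us: int) -> int:
    """Hypothetical trace: 1400 MHz idle, with a burst to 4000 MHz
    lasting from t = 1300 us up to (but not including) t = 1500 us."""
    return 4000 if 1300 <= t_us < 1500 else 1400


def observed_peak_mhz(sample_interval_us: int, duration_us: int = 10_000) -> int:
    """Highest frequency seen when sampling the trace at a fixed interval."""
    return max(
        synthetic_frequency_mhz(t)
        for t in range(0, duration_us, sample_interval_us)
    )


print(observed_peak_mhz(1000))  # 1400: samples at 1000 us and 2000 us straddle the burst
print(observed_peak_mhz(10))    # 4000: a sample lands inside the burst
```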

We wrote an extensive analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, prompted by an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probe itself the workload that triggers the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how quickly a system reaches its boost frequencies. Our Frequency Ramp tool has already been used in a number of reviews.
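The general idea of letting the workload double as the probe can be sketched as a loop that runs fixed-size chunks of work and times each one: as the clock rises, the same chunk finishes sooner. This is not our Frequency Ramp tool, and every name below is hypothetical. Python's interpreter overhead and timer granularity mean this cannot resolve microsecond-scale changes; a real tool would be native code reading hardware counters.

```python
import time


def timed_chunks(num_chunks: int = 200, iterations_per_chunk: int = 2_000):
    """Run the same fixed amount of work repeatedly and time each chunk.

    Returns a list of (elapsed_since_start_ns, chunk_duration_ns) tuples.
    """
    samples = []
    start = time.perf_counter_ns()
    for _ in range(num_chunks):
        chunk_start = time.perf_counter_ns()
        acc = 0
        for i in range(iterations_per_chunk):
            acc += i  # the work is both the load and the measurement
        chunk_end = time.perf_counter_ns()
        # Clamp to 1 ns so a coarse timer can never produce a zero duration.
        samples.append((chunk_end - start, max(chunk_end - chunk_start, 1)))
    return samples


def relative_speed(samples):
    """Scale each chunk against the fastest one seen (1.0 = fastest chunk)."""
    fastest = min(duration for _, duration in samples)
    return [(t, fastest / duration) for t, duration in samples]


for t_ns, speed in relative_speed(timed_chunks(num_chunks=10)):
    print(f"{t_ns / 1000:.0f} us: {speed:.2f} of fastest chunk")
```

Relative speed is only a proxy for frequency here, since interrupts and scheduling also stretch a chunk's duration.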

A ramp time of under one millisecond is as expected for modern AMD platforms, although we didn’t see the 4.9 GHz peak that AMD lists for this processor. We saw it hit that frequency in a number of other tests, but not this one. AMD’s previous generation took a couple of milliseconds to hit around the 4.0 GHz mark, but then another 16 milliseconds to reach full speed. We didn’t see it in this test, perhaps due to some of the new measurements AMD is doing on core workload and power. We will have to try this on a different AMD Ryzen 6000 Mobile system to see if we get the same result.
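Reading a ramp time off a trace comes down to finding the first sample that reaches a target frequency. The sketch below uses made-up samples, not our measured data, and the helper name `ramp_time_us` is hypothetical. It also shows the case where a target is never reached within the trace.

```python
def ramp_time_us(trace, target_mhz):
    """Time from the first sample until the frequency first reaches target_mhz.

    trace is a list of (time_us, frequency_mhz) pairs in time order.
    Returns None if the target is never reached (or the trace is empty).
    """
    if not trace:
        return None
    t0 = trace[0][0]
    for t_us, mhz in trace:
        if mhz >= target_mhz:
            return t_us - t0
    return None


# Hypothetical samples (microseconds, MHz) for illustration only.
example_trace = [
    (0, 1400),
    (250, 1400),
    (500, 2900),
    (750, 4000),
    (1000, 4600),
    (1250, 4600),
]

print(ramp_time_us(example_trace, 4000))  # 750
print(ramp_time_us(example_trace, 4600))  # 1000
print(ramp_time_us(example_trace, 4900))  # None: never reached in this trace
```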

92 Comments


  • DannyH246 - Wednesday, March 2, 2022 - link

    For a laugh.
  • Speedfriend - Wednesday, March 2, 2022 - link

    Seriously, how old are you?
  • abufrejoval - Friday, March 4, 2022 - link

    It's a slow season (for computers) so they have to spread it out some. The other pieces evidently have been prepared already as parting gifts by Ian.
  • vegemeister - Tuesday, March 1, 2022 - link

    >Per-Thread Power/Clock Control: Rather than being per core, each thread can carry requirements

    Does that imply the core can change its voltage and clocking on the same timescale as switching SMT thread? I thought modern SMT was fine-grained enough that there are instructions from both threads in-flight at once.

    Or is it just for simplifying the OS's cpufreq driver?

    >For example, if a core is idle for a few seconds, would it be better to put in a sleep state?

    A few hundred microseconds, surely?
  • Arnulf - Tuesday, March 1, 2022 - link

    "... following AMD’s cadence of naming its mobile processors after painters"

    As opposed to what, their desktop lineup naming (also named after painters)? Consumer processors are named after painters.
  • syxbit - Tuesday, March 1, 2022 - link

    >>While we haven’t touched battery life or graphics in this article

    that's pretty critical for a Laptop review.
    I'm pretty tired of Intel reviews constantly covering their 12th gen superiority without talking about power. It's easy to beat a competitor if you just double the power budget. It's laughable that Intel is pretending they've caught up to Apple.
  • Oxford Guy - Tuesday, March 1, 2022 - link

    I am sure those producing the Steam handheld would like reviewers to not test battery life.
  • ninjaquick - Tuesday, March 1, 2022 - link

    How fast do these chips perform vp9 4k decode? A major use case moving forward will be game streaming, and I'm struggling to find hardware acceleration numbers.
  • dwillmore - Tuesday, March 1, 2022 - link

    Error on page 3: "yCrundher" is a misspelling
  • YukaKun - Tuesday, March 1, 2022 - link

    Writing this from a 5900HX (Asus G17 Strix) and upgrading from a i7 7700HQ that, I have to say is really efficient for what it is, the AMD laptop is just in another league of its own. Both have a 90Wh battery and the Intel, not even new, would break the 4h mark. This thing has as much usage as my tablets with normal usage. It's really impressive and, for the go stuff, it's so SO nice. Then you need to game and it just works. The 6800M is quite the beast in its own right. Sad this thing doesn't have a mux switch, but it still works amazingly well.

    This preamble was just to say, I'm surprised the 6000HK isn't a lot better, but I guess it's to be expected. On paper, the 6000 mobile series has a lot of potential with PCIe4 and slightly better process. DDR5 is too new IMO to show a definitive advantage on mobile, but maybe next gen will leap. I have DDR4L 3200 with my 5900HX and I put DDR4L 2666 to the i7 7700HQ, so DDR5L needs to be way faster than the crappy 4800 MT/s JEDEC spec we have currently.

    Regards.
