Ryzen 5, Core Allocation, and Power

In our original review of Ryzen 7, we showed that the underlying silicon design of the Ryzen package consists of a single eight-core Zeppelin die with Zen microarchitecture cores.

The silicon design consists of two core complexes (CCXes) of four cores apiece. Each core comes with 512 KB of L2 cache, which is disabled when that core is disabled, and each CCX has 8 MB of L3 cache, which can remain enabled even when cores are disabled. This L3 cache is an exclusive victim cache: it is only filled with lines evicted from L2, rather than having data loaded straight into it (which is how Intel builds its current L3 cache designs).
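To make the "exclusive victim cache" behaviour concrete, here is a minimal Python sketch of a two-level hierarchy in which lines fetched from memory are placed only in L2, and L3 is filled only by lines that L2 evicts. The class name, the tiny capacities, and the fully associative LRU replacement are illustrative assumptions, not a model of the real Zen cache geometry.

```python
from collections import OrderedDict


class ExclusiveVictimHierarchy:
    """Toy model: private L2 plus an exclusive victim L3.

    Both levels are fully associative and track line addresses only;
    sizes are in cache lines and purely illustrative (hypothetical).
    """

    def __init__(self, l2_lines: int, l3_lines: int) -> None:
        self.l2 = OrderedDict()  # line address -> None, oldest first
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def _fill_l2(self, addr: int) -> None:
        # Every fill lands in L2; the line L2 evicts becomes an L3 victim.
        self.l2[addr] = None
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)
            self.l3[victim] = None
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)  # drop L3's oldest victim

    def access(self, addr: int) -> str:
        if addr in self.l2:
            self.l2.move_to_end(addr)  # refresh LRU position
            return "L2 hit"
        if addr in self.l3:
            # Exclusive: a hit moves the line back up to L2 and out of L3.
            del self.l3[addr]
            self._fill_l2(addr)
            return "L3 hit"
        # Miss: data from memory goes straight into L2, bypassing L3.
        self._fill_l2(addr)
        return "miss"


h = ExclusiveVictimHierarchy(l2_lines=2, l3_lines=4)
for addr in (1, 2, 3, 1):
    print(addr, h.access(addr))
# 1 miss / 2 miss / 3 miss (evicts 1 into L3) / 1 L3 hit (1 moves back to L2)
```

The point of the exclusive design is visible after the last access: line 1 lives in L2 only, never in both levels at once, so the combined capacity is the sum of L2 and L3.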

One of the suggestions regarding Ryzen 7’s performance concerned thread migration and scheduling across the core design, especially as core-to-core latency varies depending on where the cores are located (with a jump in latency between CCXes). Despite the use of AMD’s new Infinity Fabric, which is ultimately a superset of HyperTransport, there is still a slightly longer delay when crossing the CCX boundary, although the default Windows scheduler knows how to manage that boundary, as demonstrated by Allyn at PCPerspective.

So when dealing with a four-core or six-core CPU whose base design has eight cores, how does AMD cut them up? AMD could offer 4+0, 3+1 or 2+2 designs for its quad-core parts, or 4+2 and 3+3 variants for its hexacore parts, similar to the way that Intel cuts down its integrated graphics for GT1 variants.
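The possible layouts follow from simple counting: with two CCXes of four cores each, an N-core part can be any split (a, b) with a + b = N and neither side above four. The short Python sketch below (the function name is ours, purely for illustration) reproduces the 4+0/3+1/2+2 and 4+2/3+3 options listed above; it says nothing about which of them AMD actually ships.

```python
def ccx_splits(enabled_cores: int, cores_per_ccx: int = 4) -> list[tuple[int, int]]:
    """List distinct (larger CCX, smaller CCX) core counts for a two-CCX die.

    3+1 and 1+3 are the same layout with the CCX labels swapped,
    so only the ordering with the larger CCX first is returned.
    """
    splits = []
    for first in range(cores_per_ccx, -1, -1):
        second = enabled_cores - first
        if 0 <= second <= first:
            splits.append((first, second))
    return splits


for n in (4, 6):
    print(n, [f"{a}+{b}" for a, b in ccx_splits(n)])
# 4 ['4+0', '3+1', '2+2']
# 6 ['4+2', '3+3']
```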

Each configuration has positives and negatives, some of which we were able to observe in this review. At a high level, the main downside of a configuration split across CCXes, such as 2+2 or 3+3, is the CCX boundary itself. Because the Windows scheduler knows how to deal with it, this is less of an issue, but it is still present.
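One rough way to size the CCX-boundary downside is to count how many pairs of active cores sit on opposite CCXes, since any communication between such a pair has to cross the boundary. The sketch below assumes every pair of threads is equally likely to communicate, which real workloads will not honour; it is only meant to compare the layouts against each other.

```python
from fractions import Fraction
from math import comb


def cross_ccx_pair_fraction(a: int, b: int) -> Fraction:
    """Share of active-core pairs that sit on different CCXes in an a+b layout."""
    total_pairs = comb(a + b, 2)
    cross_pairs = a * b  # each core on one CCX paired with each core on the other
    return Fraction(cross_pairs, total_pairs)


for a, b in [(4, 0), (3, 1), (2, 2), (4, 2), (3, 3)]:
    print(f"{a}+{b}: {cross_ccx_pair_fraction(a, b)}")
# one line each: 4+0: 0 | 3+1: 1/2 | 2+2: 2/3 | 4+2: 8/15 | 3+3: 3/5
```

Under this crude model, two thirds of the core pairs in a 2+2 part straddle the boundary, versus none in a 4+0 part, which is why the scheduler's awareness of the boundary matters.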

There are a couple of upsides. The first relates to binning. If the 2+2 chips didn’t exist and AMD only supported 4+0 configurations, then, to the extent that binning is driven by silicon defects, fewer dies would be usable, as one CCX would have to be perfect. Depending on yield this may or may not be an issue to begin with, but offering 2+2 parts (and AMD states that all 2+2 configurations will be performance equivalent) means more usable silicon, driving down cost by getting more viable CPUs per wafer out of the fabs.
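The binning argument can be illustrated with a toy yield model. Assume, purely for illustration, that each of the eight cores is independently defective with some probability p (a hypothetical parameter; real defect distributions and AMD's binning criteria are not given here). A die can be sold as a 4+0 quad-core only if one CCX is flawless, whereas a 2+2 part just needs two good cores in each CCX. Enumerating all 2^8 defect patterns shows how the usable pool grows when both layouts are allowed.

```python
from itertools import product


def usable_fractions(p_defect: float) -> dict[str, float]:
    """Probability that a die is sellable as a quad-core under each rule.

    Assumes independent per-core defects with probability p_defect.
    Cores 0-3 form CCX-1 and cores 4-7 form CCX-2.
    """
    result = {"4+0": 0.0, "2+2": 0.0, "4+0 or 2+2": 0.0}
    for pattern in product((True, False), repeat=8):  # True = core is good
        good = sum(pattern)
        prob = (1 - p_defect) ** good * p_defect ** (8 - good)
        good_ccx1 = sum(pattern[:4])
        good_ccx2 = sum(pattern[4:])
        can_4_0 = good_ccx1 == 4 or good_ccx2 == 4  # one perfect CCX
        can_2_2 = good_ccx1 >= 2 and good_ccx2 >= 2  # two good cores per CCX
        if can_4_0:
            result["4+0"] += prob
        if can_2_2:
            result["2+2"] += prob
        if can_4_0 or can_2_2:
            result["4+0 or 2+2"] += prob
    return result


print(usable_fractions(0.1))  # p = 0.1 is an arbitrary example value
```

The absolute numbers mean nothing; the useful part is that the "4+0 or 2+2" figure is always at least as large as either rule alone, which is the cost argument the paragraph makes.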

Secondly, there’s the power argument. Logic inside a processor expends energy, and more energy at a higher voltage/frequency. When lots of high-energy logic is placed close together, its behavior becomes erratic and the logic has to drop in voltage/frequency to remain stable. This is why Intel’s cores run at a lower frequency when executing AVX/AVX2 code than when running other code. A similar thing can occur within a CCX: if all four cores of a CCX are loaded (and, going by the Windows scheduler, cores are loaded in order), then the power available to each core has to be reduced to remain stable. Ideally, if there is no cross-communication between threads, you want the computation spread onto opposite cores as the thread count increases. This is not a new concept: some core designs intentionally include ‘dark silicon’, silicon that serves no purpose other than providing extra space between high-power logic. Placing the cores in 2+2 and 3+3 designs for Ryzen 5 allows them to run at a higher power than they could in 4+0 and 4+2 configurations.
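The difference between filling one CCX first and alternating between CCXes can be sketched as two thread-placement policies. Both functions below are hypothetical illustrations, not the Windows scheduler's actual algorithm; they assume a 4+4 die with cores 0-3 on CCX-1 and 4-7 on CCX-2, place one busy thread per physical core, and report how many busy cores each CCX ends up with.

```python
def fill_in_order(threads: int, cores_per_ccx: int = 4) -> list[int]:
    """Hypothetical policy: load core 0, then 1, 2, ... so one CCX fills first."""
    if threads > 2 * cores_per_ccx:
        raise ValueError("more threads than physical cores")
    return list(range(threads))


def spread_across_ccx(threads: int, cores_per_ccx: int = 4) -> list[int]:
    """Hypothetical policy: alternate between CCX-1 and CCX-2 (0, 4, 1, 5, ...)."""
    if threads > 2 * cores_per_ccx:
        raise ValueError("more threads than physical cores")
    cores = []
    for i in range(threads):
        ccx, slot = i % 2, i // 2
        cores.append(ccx * cores_per_ccx + slot)
    return cores


def busy_cores_per_ccx(cores: list[int], cores_per_ccx: int = 4) -> tuple[int, int]:
    """Count loaded cores on (CCX-1, CCX-2)."""
    on_first = sum(1 for core in cores if core < cores_per_ccx)
    return on_first, len(cores) - on_first


for n in (2, 4):
    print(n, busy_cores_per_ccx(fill_in_order(n)), busy_cores_per_ccx(spread_across_ccx(n)))
# 2 (2, 0) (1, 1)
# 4 (4, 0) (2, 2)
```

With four threads, filling in order loads every core of one CCX, while spreading leaves each CCX half busy. That is the arrangement the paragraph argues can sustain more power per core, at the cost of cross-CCX latency whenever those threads do need to talk to each other.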

Here are some power numbers to show this. First, let’s start with a core diagram.

Exactly where the 0/1/2/3 core labels sit is not really important, except that cores 0-3 are in one CCX and cores 4-7 are in the other. As we load up the cores with two threads each, we can see how the power allocation changes between them. It is worth noting that the Ryzen cores have a realistic voltage/frequency limit near 4.0-4.1 GHz due to the manufacturing process: getting near or above this frequency requires a lot of voltage, which translates into power.

First up is the 1800X, which is a 4+4 configuration with a TDP of 95 W. One fully loaded core draws 22.6 W, which represents the core at its maximum frequency with XFR also enabled. The same thing happens with two cores fully loaded, but at 20.6 W apiece. With three cores loaded, XFR no longer applies, and the drop to 3.7 GHz saves power: total core power is only 1.33 W higher than with two cores loaded. Going from three to four loaded cores, still all on the same CCX, decreases the power per core.
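The per-core figures can be turned into totals with some quick arithmetic. The sketch below uses only the numbers quoted above (22.6 W for one core, 20.6 W per core for two, and +1.33 W going to three cores); the per-core value for three cores is derived, not measured, and assumes the three loaded cores share power equally.

```python
one_core_total = 22.6                     # W, one core loaded, XFR active
two_core_total = 2 * 20.6                 # W, two cores at 20.6 W apiece
three_core_total = two_core_total + 1.33  # W, three cores at 3.7 GHz

three_core_per_core = three_core_total / 3  # assumption: an even split

print(f"1 core:  {one_core_total:.2f} W total")
print(f"2 cores: {two_core_total:.2f} W total")
print(f"3 cores: {three_core_total:.2f} W total, ~{three_core_per_core:.2f} W per core")
# 1 core:  22.60 W total
# 2 cores: 41.20 W total
# 3 cores: 42.53 W total, ~14.18 W per core
```

In other words, adding a third core costs only about 1.3 W in total because the cores leave XFR clocks for 3.7 GHz, so the average per-core draw falls from 20.6 W to roughly 14.2 W.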

As we load up the first core of the second CCX, we see an interesting change: the core on CCX-2 gets a bigger power allocation than any core on CCX-1. This can be interpreted in two ways: either there is more dark silicon around it, giving this core on CCX-2 more headroom, or more power is required because the core is on its own. Technically it is still running at the same frequency as the cores on CCX-1. As we populate the remaining cores on CCX-2, they continue to consume more power per core until all cores are loaded and the allocation is more or less equal.

Moving to the Ryzen 5 1600X, which is a 3+3 configuration, nets more of the same. With XFR active and one or two cores loaded, power consumption is high. As we move onto the second CCX, the cores on CCX-2 consume more power per core than those already loaded on CCX-1.

It is worth noting here that going from two loaded cores to three on the 3+3 design actually reduces the total power consumption of the cores. My raw data shows that this also translates into a drop in total package power, showing how much extra effort it takes to run these cores near 4.0 GHz with XFR enabled.

On the Ryzen 5 1500X, which uses a 2+2 configuration, the pattern repeats. The hard comparison here is between the 2+2 of the 1500X and the 4+0 loading on the 1800X, because the two processors have different TDPs.

It should be noted, however, that total package power consumption (cores plus I/O, memory controller and so on) is roughly another 10 W above these numbers for each chip.

Power: Cores Only (Full Load)

The cache configurations play an important role in the power consumption numbers as well. In a 3+3 or 2+2 configuration, despite one or two cores per CCX being disabled, the L3 cache is still fully enabled. As a result, disabling 25% of the cores (as on a 3+3 part) does not necessarily cut total core power by 25%, depending on how the L3 cache is being used.
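The reasoning can be made explicit with a simple additive model, which is our assumption rather than a measured breakdown: let n be the number of cores on the full part, P_core the power of each core, and P_L3 the power of the L3 cache that stays enabled, so that P = n·P_core + P_L3. Disabling a quarter of the cores then saves a fraction

```latex
\[
\frac{\Delta P}{P}
  = \frac{\tfrac{1}{4}\, n\, P_{\mathrm{core}}}{n\, P_{\mathrm{core}} + P_{\mathrm{L3}}}
  < \frac{1}{4}
  \qquad \text{whenever } P_{\mathrm{L3}} > 0
\]
```

of the total, and that fraction shrinks further as the L3 cache's share of the power budget grows.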

Nonetheless, the Ryzen 5 1600X, despite having the same rated TDP as the Ryzen 7 1800X, does not get close to matching its power consumption. This ties back to the point at the top of the page: usually, parts with fewer cores run at a higher frequency so that their power consumption matches that of parts with more cores. Because the silicon needs so much extra voltage and power to get past 4.0 GHz, AMD has decided that going higher is too big a jump to remain stable, but has still given the 1600X the higher TDP rating anyway. This may be a nod to the fact that it will encourage users to buy bigger cooling solutions, providing sufficient headroom for Turbo modes and XFR and therefore better performance.

Despite this, the 1800X and 1500X each come near their TDP ratings in power consumption (92 W vs 95 W and 67 W vs 65 W respectively).
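Expressed as a ratio of measured power to rated TDP, using only the four figures quoted above, the 1800X sits slightly under its rating and the 1500X slightly over it. A trivial Python sketch of that comparison:

```python
measured_vs_tdp = {
    "Ryzen 7 1800X": (92, 95),  # (measured W, rated TDP W)
    "Ryzen 5 1500X": (67, 65),
}

for chip, (measured, tdp) in measured_vs_tdp.items():
    ratio = measured / tdp
    print(f"{chip}: {measured} W / {tdp} W = {ratio:.1%} of TDP")
# Ryzen 7 1800X: 92 W / 95 W = 96.8% of TDP
# Ryzen 5 1500X: 67 W / 65 W = 103.1% of TDP
```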

However, enough talking about the power consumption. Time for benchmarks!


251 Comments


  • loguerto - Friday, April 21, 2017

    9 is not prime :)
  • LawJikal - Friday, April 21, 2017

    What I'm surprised to see missing... in virtually all reviews across the web... is any discussion (by a publication or its readers) on the AM4 platform's longevity and upgradability (in addition to its cost, which is readily discussed).

    Any Intel Platform - is almost guaranteed to not accommodate a new or significantly revised microarchitecture... beyond the mere "tick". In order to enjoy a "tock", one MUST purchase a new motherboard (if historical precedent is maintained).

    AMD AM4 Platform - is almost guaranteed to, AT LEAST, accommodate Ryzen "II" and quite possibly Ryzen "III" processors. And, in such cases, only a new processor and BIOS update will be necessary to do so.

    This is not an insignificant point of differentiation.
  • systemBuilder - Friday, April 28, 2017

    I believe the Ryzen core is 20% slower than the Intel core, in instructions per clock. A hyperthread is only about 30% as fast as a full core. With both of these factors thrown in, 6 Ryzen Cores = 5 Intel cores. So the advantage of Ryzen is actually miniscule. It's why I sold all of my AMD stocks in February.
  • willis936 - Thursday, July 27, 2017

    "sold all of my AMD stocks in February"

    I'm cringing.
  • systemBuilder - Friday, April 28, 2017

    Ryzen's cores are 20% slower than Intel's. A hyperthread is only worth (at best) 30% as much as a full core. Therefore, Intel offers 4 cores, AMD offers 6 * 0.8 * 1.3 = 6.24 cores, a decent bump but obviously not significant because few if any games are set up to use more than 8 cores, which in the best case for AMD would be (6 + 0.3 + 0.3)*0.8 = 5.28 cores, a small bump.
  • msroadkill612 - Monday, May 1, 2017

    Some thoughts from a ~newb, are that if 8 cores are the new black, then maybe 16GB (or 2GB per core) of ram, isnt as generous as it seems?

    Also, its a new paradigm. Tasks which taxed the cpu and thus historically avoided (software raid e.g.), can now be embraced with ~impunity.

    "Normal" CPUs can handle 16 jobs before a queue forms, commonly, an increase by a factor of 8 for a prospective upgrader.
  • Gothmoth - Tuesday, May 2, 2017

    "...affords a comfortable IPC uplift over Broadwell....."

    yeah does it?

    what is comfortable??.... 10%.... who are you trying to kid here?
  • msroadkill612 - Thursday, May 4, 2017

    I still dont get what the deal w/ am4 mobos and a pair of m.2 pcie3 nand ssdS in raid 0 is?

    the x370 (but not the x350) chipset seems to allow an extra 4x pcie3 lanes, directly linked to the cpu (not shared lanes via the chipset), for one or 2 x onboard m.2 sockets.

    But its never made clear, to me anyway, that if u use 2 m.2 drives, does each get 2 lanes of pcie3, and therefore are perfectly matched, as desired by raid0.

    Surely its not just me that finds a 4GBps storage resource exciting?

    (e.g. see storage in specs on link re m.2)

    https://www.msi.com/Motherboard/X370-XPOWER-GAMING...


    I suspect it translates to 2 x 2 lane pcie3 lanes - 2GBps for each m.2 nvme ssd socket, which surreally, is less than samsung nvme ssdS e.gS maxed out ability of 2.5GB+ ea.

    Drives are now too fast for the interface :)

    A pair of nand nvme ssds could individually max out each of the 2, 2 pci3 lane sockets (2 GB each), for a total of up to 4GBps read AND WRITE (normally write is much slower than read on single drives). Thats just insane storage speed vs historical norms - a true propeller head would kill for that.

    I also hear ssdS are so reliable now, that the risks of raid 0 are considerably diminished.

    IMO, a big question prospective ~server & workstation ryzen users should be asking, is if they can manage w/ 8 lanes of pcie3 for their gpu - which seems entirely possible?

    "Video cards do benefit from faster slots, but only a little. Unless you are swapping huge textures all the time, even 4x is quite close to 16x because the whole point of 8GB VRAM is to avoid using the PCIe at all costs. Plus many new games will pre-load textures in an intelligent manner and hide the latency. So, running two 8x SLI/CF is almost identical to two 16x cards. The M.2 drives are much faster in disk-intensive workloads, but the differences in consumer workloads (load an application, a game level) are often minimal. You really need to understand the kind of work you are doing. If you are loading and processing huge video streams, for example, then M.2 is worth it. NVMe RAID0 is even more extreme. Will the CPU keep up? Are you reaching a point of diminishing returns? And if you do need such power, you should consider a separate controller to offload the checksuming and related overhead, otherwise you will need 1 core just to keep up with the RAID array."

    (interesting last line - w/ 8 cores the new black, who cares?)

    This would free up 8x pcie3 lanes for a high end add in card if a big end of town app requires it.

    So yeah, re a raid 0 using 2 m.2 slots onboard a suitable 2xm.2 slot am4 mobo, do I get what i need for proper raid0?

    i.e.

    each slot is 2GBps, so my raid pair is evenly matched, and the pair theoretically capable of 4GBps b4 bandwidth is saturated?
  • msroadkill612 - Thursday, May 4, 2017

    PS re my prev post

    specifically from the link

    "• AMD® X370 Chipset
    ....
    • 2 x M.2 ports (Key M)
    - M2_1 slot supports PCIe 3.0 x4 (RYZEN series processor) or PCIe 3.0 x2 (7th Gen A-series/ Athlon™ processors) and SATA 6Gb/s 2242/ 2260 /2280/ 22110 storage devices
    - M2_2 slot supports PCIe 2.0 x4 and SATA 6Gb/s 2242/ 2260 /2280 storage devices
    • 1 x U.2 port
    - Supports PCIe 3.0 x4 (RYZEN series processor) or PCIe 3.0 x2 (7th Gen A-series/ Athlon™ processors) NVMe storage
    * Maximum support 2x M.2 PCIe SSDs + 6x SATA HDDs or 2x M.2 SATA SSDs + 4x SATA HDDs."

    it sure seems to be saying the 2nd m.2 port would be a pcie2 port, and the first m.2 port uses the whole 4 pcie3 lanes linked to the cpu.

    thats sad if so - it means no matched pair for raid 0 onboard. only a separate controller would do.

    i cannot see why? why cant the 4 pcie3 lanes be shared evenly?
  • asuchemist - Wednesday, May 17, 2017

    Every review I read has different results but same conclusion.

