Execution, Load/Store, INT and FP Scheduling

Micro-ops are routed into the Integer (INT) and Floating Point (FP) parts of the core, each of which has its own pipes and execution ports. First up is the Integer pipe, which has a 168-entry register file that feeds four arithmetic logic units (ALUs) and two address generation units (AGUs). This allows the core to schedule six micro-ops per cycle, and each execution port has its own 14-entry scheduler queue.

The INT unit can work on two branches per cycle, but it should be noted that not all the ALUs are equal. Only two ALUs are capable of branches, only one can perform IMUL operations (signed multiply), and only one can do CRC operations. There are other limitations as well, but broadly we are told that the ALUs are symmetric except for a few focused operations. Exactly which operations will be disclosed closer to the launch date.
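
To make the port asymmetry concrete, here is a minimal Python sketch of the integer side as described above: four ALUs, two AGUs, and a 14-entry queue per port. The text does not say which ALU carries which special capability, so the placement of branch, IMUL and CRC below, the port names, and the set of "common" operations are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    name: str              # hypothetical port label
    ops: frozenset         # operation classes this port accepts
    queue_entries: int = 14  # each port has its own 14-entry scheduler queue

# Hypothetical set of operations every ALU can handle ("symmetric" ops).
COMMON = {"add", "logic", "shift"}

# ASSUMPTION: which ALU gets branch/IMUL/CRC is not disclosed in the text;
# this placement only respects the stated counts (2 branch, 1 IMUL, 1 CRC).
PORTS = [
    Port("ALU0", frozenset(COMMON | {"branch"})),
    Port("ALU1", frozenset(COMMON | {"imul"})),
    Port("ALU2", frozenset(COMMON | {"crc"})),
    Port("ALU3", frozenset(COMMON | {"branch"})),
    Port("AGU0", frozenset({"agen"})),
    Port("AGU1", frozenset({"agen"})),
]

def eligible_ports(op):
    """Return the names of the ports that could execute this operation class."""
    return [p.name for p in PORTS if op in p.ops]

def total_scheduler_entries():
    """Sum of the per-port scheduler queue depths."""
    return sum(p.queue_entries for p in PORTS)
```

Under these assumptions, a common ALU operation has four candidate ports while an IMUL has one, and the six queues together hold 6 x 14 = 84 micro-ops.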

The INT pipe keeps track of branch instructions with differential checkpoints, which cuts down on storing redundant data between branches (saving queue entries and power). It can also perform Move Elimination. When a simple mov between two registers occurs, instead of spending energy routing the value around the core to physically copy it, the core adjusts the register pointers and essentially applies a new mapping table, which is a lower power operation.
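
The idea behind Move Elimination can be sketched with a toy rename table in Python. This is an illustration of the general technique, not AMD's implementation: the class name, register names and allocation scheme are hypothetical, and a real design also has to track how many architectural names share a physical register before it can free one.

```python
class RenameTable:
    """Toy register alias table: architectural name -> physical register id."""

    def __init__(self, arch_regs):
        # Start with each architectural register backed by its own physical register.
        self.mapping = {reg: i for i, reg in enumerate(arch_regs)}
        self.next_free = len(arch_regs)

    def write(self, dst):
        """A normal result-producing op allocates a fresh physical register."""
        self.mapping[dst] = self.next_free
        self.next_free += 1
        return self.mapping[dst]

    def mov(self, dst, src):
        """Move elimination: point dst at src's physical register, copy nothing."""
        self.mapping[dst] = self.mapping[src]
        return self.mapping[dst]


rat = RenameTable(["rax", "rbx", "rcx"])
rat.write("rbx")       # rbx now lives in a newly allocated physical register
rat.mov("rax", "rbx")  # rax aliases the same physical register; no data is copied
```

After the mov, both names resolve to the same physical register, and a later write to either name simply allocates a new one, leaving the other untouched.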

Both INT and FP units have direct access to the retire queue, which is 192-entry and can retire 8 instructions per cycle. In some previous x86 CPU designs, the retire unit was a limiting factor for extracting peak performance, and so having it retire quicker than dispatch should keep the queue relatively empty and not near the limit.
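
A rough way to reason about how full the retire queue gets is Little's law (entries in flight = arrival rate x average time in the queue). The Python sketch below is a back-of-envelope model only. The 192-entry and 8-per-cycle figures come from the text, while the dispatch rate of six per cycle is an assumption that reuses the six micro-ops/cycle scheduling figure quoted earlier.

```python
RETIRE_QUEUE_ENTRIES = 192  # from the text
RETIRE_WIDTH = 8            # instructions retired per cycle, from the text
DISPATCH_WIDTH = 6          # ASSUMPTION: reuses the six micro-ops/cycle figure

def average_occupancy(dispatch_per_cycle, avg_cycles_in_flight):
    """Little's law: entries in flight = arrival rate * average residency."""
    return dispatch_per_cycle * avg_cycles_in_flight

def max_residency_before_full(entries, dispatch_per_cycle):
    """Average residency (in cycles) at which the queue would be exactly full."""
    return entries / dispatch_per_cycle

def drain_cycles(backlog, retire_per_cycle):
    """Cycles needed to retire a backlog of completed entries (ceiling division)."""
    return -(-backlog // retire_per_cycle)
```

Under these assumptions the queue only fills if instructions spend 32 cycles in flight on average, and a completely full queue of finished work drains in 24 cycles, which is the sense in which a retire width above the dispatch width keeps the queue away from its limit.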

The Load/Store Units are accessible from both AGUs simultaneously, and will support 72 out-of-order loads. Overall, as mentioned before, the core can perform two 16B loads (2x128-bit) and one 16B store per cycle, with the latter relying on a 44-entry store queue. The TLB for already decoded addresses is two-level here, with the L1 TLB holding 64 entries at all page sizes and the L2 TLB holding 1.5K entries with no 1G pages. The TLB and data pipes are split in this design, which relies on tags to determine if the data is in the cache or to start the data prefetch earlier in the pipeline.
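
The numbers above translate into two simple back-of-envelope figures: how much memory each TLB level can map at once ("reach") and the peak bytes per cycle moved by the load/store units. The Python sketch below assumes that "1.5K-entry" means 1536 entries and uses the standard x86-64 4 KiB page size for the example. The function names are illustrative.

```python
KIB = 1024

L1_TLB_ENTRIES = 64     # from the text
L2_TLB_ENTRIES = 1536   # ASSUMPTION: "1.5K-entry" read as 1.5 * 1024

def tlb_reach_bytes(entries, page_size_bytes):
    """Memory covered when every TLB entry maps a distinct page of this size."""
    return entries * page_size_bytes

def peak_l1_bytes_per_cycle(loads=2, stores=1, width_bytes=16):
    """Peak load and store bytes per cycle: two 16B loads plus one 16B store."""
    return {"load": loads * width_bytes, "store": stores * width_bytes}

l1_reach_4k = tlb_reach_bytes(L1_TLB_ENTRIES, 4 * KIB)  # 256 KiB
l2_reach_4k = tlb_reach_bytes(L2_TLB_ENTRIES, 4 * KIB)  # 6 MiB
```

With 4 KiB pages the L1 TLB covers 256 KiB and the L2 TLB covers 6 MiB under these assumptions. Larger pages scale the reach proportionally.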

The data cache here also has direct access to the main L2 cache at 32 Bytes/cycle, with the 512 KB 8-way L2 cache being private to the core and inclusive. Once data is back in L1, it can be passed to either the INT or the FP pipes as required.
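
For a sense of what "512 KB 8-way" means in practice, the Python sketch below derives the number of sets and the address bit split for a set-associative cache. The 64-byte line size is an assumption (the text does not state it), and the helper names are illustrative.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Derive line count, set count and address bit split for a set-associative cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    # The bit arithmetic below is only valid for power-of-two geometries.
    assert sets & (sets - 1) == 0 and line_bytes & (line_bytes - 1) == 0
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # bits selecting a byte in a line
        "index_bits": sets.bit_length() - 1,         # bits selecting a set
    }

def transfer_cycles(line_bytes, bytes_per_cycle=32):
    """Cycles to move one line over the 32 Bytes/cycle L2-to-L1 link (ceiling)."""
    return -(-line_bytes // bytes_per_cycle)

# ASSUMPTION: 64-byte cache lines; the size and associativity come from the text.
L2 = cache_geometry(512 * 1024, ways=8, line_bytes=64)
```

Under the 64-byte assumption the L2 has 8192 lines in 1024 sets, and one line takes two cycles to cross the 32 Bytes/cycle link.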

Moving on to the floating point part of the core, the first thing to notice is that there are two scheduling queues here. These are listed as ‘schedulable’ and ‘non-schedulable’ queues. The arrangement gives lower power operation when certain micro-ops are in play, and also allows the backup queue to sort out parts of the dispatch in advance via the LDCVT. The register file is 160-entry, with direct FP to INT transfers as required, as well as support for accelerated recovery on flushes (when in-flight speculative work has to be discarded, for example after a branch misprediction).

The FP unit uses four pipes, rather than the three on Excavator, and we are told that operation latency in Zen is reduced as well (though more information on this will come at a later date). We have two MUL and two ADD pipes in the FP unit, capable of joining to form two 128-bit FMACs, but not one 256-bit AVX unit. To execute 256-bit AVX, the unit splits the operation accordingly. On the other side, each core will have two AES units for cryptography, as well as decode support for SSE, AVX1/2, SHA and legacy MMX/x87 compliant code.
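
The effect of splitting a 256-bit operation across 128-bit hardware can be illustrated with plain Python lists standing in for vector registers. This sketch assumes single-precision data (eight lanes in 256 bits, four in 128 bits) and a split into a low and a high half. The function names are hypothetical, and Python's `x * y + z` rounds twice, so it models the lane layout rather than true fused arithmetic.

```python
def fma_lanes(a, b, c):
    """Lane-wise multiply-add model: a*b + c for each lane."""
    return [x * y + z for x, y, z in zip(a, b, c)]

def fma256_as_two_128(a, b, c, lanes_per_half=4):
    """Run an 8-lane (256-bit, single-precision) FMA as two 4-lane halves."""
    lo = fma_lanes(a[:lanes_per_half], b[:lanes_per_half], c[:lanes_per_half])
    hi = fma_lanes(a[lanes_per_half:], b[lanes_per_half:], c[lanes_per_half:])
    return lo + hi  # concatenate the halves back into one 8-lane result

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0] * 8
c = [0.5] * 8
result = fma256_as_two_128(a, b, c)
```

The split result is identical to running all eight lanes at once. The cost is that one 256-bit instruction occupies the 128-bit hardware for two operations instead of one.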

574 Comments

  • Meteor2 - Friday, March 3, 2017 - link

    ...In which case you'd be better off with a 7700K, looking at the benchmark results. Cheaper too.
  • ddriver - Thursday, March 2, 2017 - link

    Ryzen offers the same performance at half the cost. More pci-e lanes is good for io, however quad channel memory is pretty much pointless, aside from pointless synthetic benches. Ryzen might not make it to my personal workstation due to the low pci-e lane count, but it has enough to replace my aging 3770k farm nodes, to which it will be a significant upgrade, provided the chip and platform turn out to be stable and bug free.

    Intel has gotten lazy and sloppy, bricking products, chipset bugs, they haven't really done anything new architecture wise for years, milking the same old cow.

    It is rather silly to assume that gaming dictates CPU prices, this IS NOT a gaming product, if your ass-logic is to be followed, then intel needs to drop the 7700k price to 168, because in games it is barely any faster than the i3-7350K, and has the same pathetic, even lower than ryzen, number of pci-e lanes.

    This is a chip for HPC, which gaming is NOT. Go back to the kiddie garden, eight core chips are for grown ups ;)
  • imaheadcase - Thursday, March 2, 2017 - link

    People compare it to gaming, because it's the main driving for these type of CPUs, it's not even gaming specific but VR, Graphics modeling, etc. You honestly think people are buying these for offices or industry for complex math problems? lol
  • ddriver - Thursday, March 2, 2017 - link

    It is not "people" but "fanboys", and they cling to gaming because it is the only workload where intel can offer better performance for the price, albeit by comparing products from different tiers, which is quite frankly moronic.

    Cars are faster than trucks, so who in the world needs to spend money on trucks? That's the kind of retarded logic you are advocating...

    Smart people buy whatever suits their needs. Obviously, if all you do is play games you wouldn't be buying ryzen or a lga2011 system. Just get an unlocked i5 and overclock it, best bang for the buck. You must realize that even if you don't, other people use computers for tasks other than gaming. And for a large portion of them ryzen will be the best deal, because it is versatile - it is good enough for gaming too, while still offering significant performance advantage compared to an intel quad in tasks that are time-consuming, and is very much competitive with intel's 8 and 10 core chips while delivering more than twice the value, which is important for everyone who doesn't have money to throw away.

    Claiming that "gaming is the main driving for these type of CPUs" is foolish to say the least, because games don't benefit from that particular type of CPUs. Most of the games can't even properly utilize 4 threads. And this is not likely to change soon, because the overhead of complexity and thread synchronization is not worth it for non-performance demanding tasks such as games.
  • Lord-Bryan - Thursday, March 2, 2017 - link

    That's one really well thought out argument
  • rarson - Thursday, March 2, 2017 - link

    Ryzen's versatility and price are the two biggest factors that make it so good. It might not beat the very best gaming CPU that Intel has, or the very best multi-threaded monster that Intel has in every scenario, but it's competitive with both at half the price of the high-end stuff. Hence, while I do game some and want to build a computer to use for gaming, I also do other stuff like audio production that benefits greatly from Ryzen's multi-threaded performance. To me, it's a no-brainer: Ryzen right now is the best bang-for-the-buck chip for someone who wants all-around high-end performance, by far. Maybe not the 1800X, I kind of think the 1700X is a better value, but still, for most people who want multiple-use performance instead of absolute maximum gaming performance, Ryzen is the clear choice.

    Ryzen's max clock speeds seem, like Intel's, to be hindered by the total number of cores on chip, so it should be extremely interesting to see how the 4- and 6-core chips overclock once they arrive, and what kind of performance they'll achieve. I actually think that, like Intel, a 4-core Ryzen might be a better gaming chip than the 8-core, and if that's the case, then it might be really darn close to Intel's best Kaby Lake, because like you pointed out, most games aren't threaded well at all.

    Additionally, from a gaming perspective, it seems like AMD has done more to push technology forward in that respect than anyone else. They've worked on Mantle, Vulkan, FreeSync, TrueAudio, and others. They've always tried to give performance value by offering more cores, but software has been slow to take advantage of them. Intel is content to stagnate by offering extremely incremental increases because performance is "good enough" so developers have no reason to really try to take advantage of extra cores aside from outside use cases. With Ryzen, AMD is pushing chips towards higher core counts (much like they did with the Athlon X2) but this time, they're trying harder to get developers on board and help them achieve good results. So while it always takes forever for software to better utilize the hardware, once the hardware becomes more common the software will start to follow and you'll see the actual gaming performance improve. Is that a valid reason to buy Ryzen today if your sole focus is gaming? Of course not, but it does bode well for Ryzen owners in the future. The performance can only get better. Can the same be said about Intel? Well, probably not if you're using one of the 4-core chips. It's pretty much a known quantity.

    I had high hopes for Bulldozer and Ryzen is the exact opposite of what Bulldozer was. I feel like the CPU market has been stagnant for years and now suddenly there's a reason to be excited. This makes AMD competitive again, which will be good for pricing even if you're an Intel fan. It's been a long wait, but it was worth it, this is a good product.
  • Notmyusualid - Friday, March 3, 2017 - link

    http://www.gamersnexus.net/hwreviews/2822-amd-ryze...
  • Makaveli - Thursday, March 2, 2017 - link

    +1 ddriver you destroyed that kid with your logic well done.
  • khanikun - Friday, March 3, 2017 - link

    Gaming definitely isn't the main driving force for CPUs, as use case changes. I bought a 7700K for my gaming rig. I'd get a 1700 for a VM host, as I'd like to start building a lab again. It won't be this round though. I'd rather wait for AMD to iron out any kinks and buy the next generation. It's something the 7700K could do, but more cores would definitely make it a much better lab.
  • Meteor2 - Friday, March 3, 2017 - link

    Nobody buys mid/high-end consumer chips for HPC. They buy them for gaming. A few for video production. That's it.
