Fetch

For Zen, AMD has implemented a decoupled branch predictor. This allows the core to speculate on incoming instruction pointers to fill a queue, as well as to look for direct and indirect targets. The branch target buffer (BTB) for Zen is described as ‘large’, with no numbers as of yet, but it does use an L1/L2 hierarchical arrangement. For comparison, Bulldozer afforded a 512-entry, 4-way L1 BTB with single-cycle latency, and a 5120-entry, 5-way L2 BTB with additional latency; AMD doesn’t state that Zen’s is larger, just that it is large and supports dual branches. Beyond the 32-entry return stack, the structures for indirect targets are also devoid of entry numbers at this point.
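To make the L1/L2 arrangement concrete, here is a minimal sketch of a two-level BTB lookup. It is an illustration only, not AMD's design: the class names, the LRU replacement, the modulo indexing and the 4-cycle L2 latency are all assumptions, and the sizes are the Bulldozer figures quoted above because Zen's are not disclosed.

```python
from collections import OrderedDict

class SetAssociativeBTB:
    """Toy set-associative branch target buffer with LRU replacement (assumed policy)."""

    def __init__(self, entries, ways, latency):
        assert entries % ways == 0
        self.sets = entries // ways
        self.ways = ways
        self.latency = latency  # cycles charged for probing this level
        self.table = [OrderedDict() for _ in range(self.sets)]

    def lookup(self, pc):
        s = self.table[pc % self.sets]  # simple modulo index, an assumption
        if pc in s:
            s.move_to_end(pc)  # mark as most recently used
            return s[pc]
        return None

    def insert(self, pc, target):
        s = self.table[pc % self.sets]
        if pc in s:
            s.move_to_end(pc)
        elif len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used entry
        s[pc] = target


class TwoLevelBTB:
    """Probe the small, fast L1 first and fall back to the larger, slower L2."""

    def __init__(self, l1, l2):
        self.l1, self.l2 = l1, l2

    def predict(self, pc):
        """Return (target, cycles); target is None if both levels miss."""
        target = self.l1.lookup(pc)
        if target is not None:
            return target, self.l1.latency
        cycles = self.l1.latency + self.l2.latency
        target = self.l2.lookup(pc)
        if target is not None:
            self.l1.insert(pc, target)  # promote so the next lookup is fast
        return target, cycles

    def update(self, pc, target):
        """Record a resolved branch in both levels."""
        self.l1.insert(pc, target)
        self.l2.insert(pc, target)


# Bulldozer-like sizes from the text; the L2 latency of 4 cycles is hypothetical.
btb = TwoLevelBTB(SetAssociativeBTB(512, 4, latency=1),
                  SetAssociativeBTB(5120, 5, latency=4))
```

The point of the hierarchy is that hot branches are served at L1 latency, while the L2 keeps a much larger working set at the cost of extra cycles on an L1 miss.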

The decoupled branch predictor also allows it to run ahead of instruction fetches and fill the queues based on the internal algorithms. Going too far into a specific branch that fails will obviously incur a power penalty, but successes will help with latency and memory parallelism.

The Translation Lookaside Buffer (TLB) in the branch prediction stage caches recent virtual-to-physical address translations to reduce load latency, and operates in three levels: L0 with 8 entries of any page size, L1 with 64 entries of any page size, and L2 with 512 entries and support for 4K and 2M pages only. The L2 won’t support 1G pages as the L1 can already support 64 of them, and implementing 1G support at the L2 level is a more complex addition (there may also be power/die area benefits).
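The three-level lookup can be sketched as follows. This is a simplified model under stated assumptions: the `TLBLevel` class, the LRU replacement and the `translate` helper are hypothetical, and the only page-size rule encoded is the one the text is explicit about, namely that the L2 does not hold 1G pages.

```python
from collections import OrderedDict

PAGE_4K = 4 * 1024
PAGE_1G = 1024 ** 3

class TLBLevel:
    """Toy TLB level mapping (virtual page number, page size) to a physical frame base."""

    def __init__(self, name, entries, rejected_sizes=frozenset()):
        self.name = name
        self.entries = entries
        self.rejected_sizes = rejected_sizes  # page sizes this level cannot hold
        self.map = OrderedDict()

    def lookup(self, vaddr):
        # Find an entry whose page covers vaddr, whatever its page size.
        hit = next((k for k in self.map if vaddr // k[1] == k[0]), None)
        if hit is None:
            return None
        self.map.move_to_end(hit)  # LRU bookkeeping (assumed policy)
        _, size = hit
        return self.map[hit] + vaddr % size

    def fill(self, vaddr, page_size, frame_base):
        """Install a translation; returns False if this level rejects the page size."""
        if page_size in self.rejected_sizes:
            return False
        key = (vaddr // page_size, page_size)
        if key not in self.map and len(self.map) >= self.entries:
            self.map.popitem(last=False)  # evict the least recently used entry
        self.map[key] = frame_base
        self.map.move_to_end(key)
        return True


def translate(levels, vaddr):
    """Probe the levels in order; return (physical address, level name) or (None, None)."""
    for level in levels:
        paddr = level.lookup(vaddr)
        if paddr is not None:
            return paddr, level.name
    return None, None


# Entry counts from the text: 8, 64 and 512, with 1G pages excluded from the L2.
levels = [TLBLevel("L0", 8), TLBLevel("L1", 64),
          TLBLevel("L2", 512, rejected_sizes={PAGE_1G})]
```

A real design would also refill the upper levels on a lower-level hit and walk the page tables on a full miss; both are left out to keep the sketch short.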

When the instruction comes through as a recently used one, it acquires a micro-tag and is sent via the op-cache; otherwise it is placed into the instruction cache for decode. The L1 instruction cache can also accept 32 bytes/cycle from the L2 cache, while other instructions go through the load/store unit for another trip around for execution.

Decode

The instruction cache will then send the data through the decoder, which can decode four instructions per cycle. As mentioned previously, the decoder can fuse operations together in a fast-path, such that a single micro-op goes through to the micro-op queue but still represents two instructions; these are split again when they hit the schedulers. This allows the system to fit more into the micro-op queue and afford a higher throughput when possible.

The new Stack Engine comes into play between the queue and the dispatch, allowing for low-power address generation when the address is already known from previous cycles. This saves the power that would otherwise be spent going through the AGU and cycling back around to the caches.
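As a rough illustration of the idea, a stack engine can be thought of as a small tracker that keeps a running offset from the last known stack pointer, so that push and pop addresses are available without an AGU calculation. The sketch below is a generic toy model, not AMD's implementation; the `StackEngine` class, its method names and the 8-byte word size are assumptions.

```python
class StackEngine:
    """Toy front-end stack tracker that resolves push/pop addresses from a running offset."""

    def __init__(self, rsp, word=8):
        self.base = rsp         # last stack pointer value synchronized with the back end
        self.delta = 0          # offset accumulated by pushes and pops since then
        self.word = word        # bytes moved per push/pop (assumed 8 for 64-bit code)
        self.agu_ops_saved = 0  # address calculations that did not need the AGU

    def push(self):
        """Return the address a push writes to."""
        self.delta -= self.word
        self.agu_ops_saved += 1
        return self.base + self.delta

    def pop(self):
        """Return the address a pop reads from."""
        addr = self.base + self.delta
        self.delta += self.word
        self.agu_ops_saved += 1
        return addr

    def sync(self, new_rsp):
        """Resynchronize after an instruction changes the stack pointer in another way."""
        self.base = new_rsp
        self.delta = 0


engine = StackEngine(rsp=0x1000)
first = engine.push()   # 0x0FF8
second = engine.push()  # 0x0FF0
```

Each tracked push or pop is one address the AGU did not have to compute, which is where the power saving in this model comes from.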

Finally, the dispatch unit can issue up to six instructions per cycle, at a maximum rate of 6/cycle to the INT scheduler or 4/cycle to the FP scheduler. We confirmed with AMD that the dispatch unit can simultaneously dispatch to both INT and FP inside the same cycle, which can maximize throughput (the alternative would be to alternate each cycle, which reduces efficiency). We are told that the operations used in Zen for the micro-op cache are ‘pretty dense’, and equivalent to x86 operations in most cases.
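The difference between simultaneous and alternating dispatch can be shown with some simple arithmetic. The sketch below is a back-of-the-envelope model: the function names are hypothetical, program order and scheduler occupancy are ignored, and the combined cap of six per cycle (`total_rate`) is one reading of the figures above rather than something AMD has spelled out. Pass `total_rate=None` to drop that cap.

```python
import math

def cycles_simultaneous(int_ops, fp_ops, int_rate=6, fp_rate=4, total_rate=6):
    """Minimum cycles when INT and FP can both be fed in the same cycle."""
    bounds = [math.ceil(int_ops / int_rate), math.ceil(fp_ops / fp_rate)]
    if total_rate is not None:
        # Assumed overall dispatch width shared by both schedulers.
        bounds.append(math.ceil((int_ops + fp_ops) / total_rate))
    return max(bounds)

def cycles_alternating(int_ops, fp_ops, int_rate=6, fp_rate=4):
    """Minimum cycles when only one scheduler can be fed in any given cycle."""
    return math.ceil(int_ops / int_rate) + math.ceil(fp_ops / fp_rate)

# A small mixed group: 3 INT and 3 FP operations.
mixed = (cycles_simultaneous(3, 3), cycles_alternating(3, 3))  # 1 vs 2 cycles
```

Under these assumptions the gain shows up on mixed groups: 8 INT plus 4 FP operations take two cycles with simultaneous dispatch and three when alternating.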


574 Comments


  • Meteor2 - Friday, March 3, 2017 - link

    ...In which case you'd be better off with a 7700K, looking at the benchmark results. Cheaper too.
  • ddriver - Thursday, March 2, 2017 - link

    Ryzen offers the same performance at half the cost. More pci-e lanes is good for io, however quad channel memory is pretty much pointless, aside from pointless synthetic benches. Ryzen might not make it to my personal workstation due to the low pci-e lane count, but it has enough to replace my aging 3770k farm nodes, to which it will be a significant upgrade, provided the chip and platform turn out to be stable and bug free.

    Intel has gotten lazy and sloppy, bricking products, chipset bugs, they haven't really done anything new architecture wise for years, milking the same old cow.

    It is rather silly to assume that gaming dictates CPU prices, this IS NOT a gaming product, if your ass-logic is to be followed, then intel needs to drop the 7700k price to 168, because in games it is barely any faster than the i3-7350K, and has the same pathetic, even lower than ryzen, number of pci-e lanes.

    This is a chip for HPC, which gaming is NOT. Go back to the kiddie garden, eight core chips are for grown ups ;)
  • imaheadcase - Thursday, March 2, 2017 - link

    People compare it to gaming, because it's the main driving for these type of CPUs, it's not even gaming specific but VR, Graphics modeling, etc. You honestly think people are buying these for offices or industry for complex math problems? lol
  • ddriver - Thursday, March 2, 2017 - link

    It is not "people" but "fanboys", and they cling to gaming because it is the only workload where intel can offer better performance for the price, albeit by comparing products from different tiers, which is quite frankly moronic.

    Cars are faster than trucks, so who in the world needs to spend money on trucks? That's the kind of retarded logic you are advocating...

    Smart people buy whatever suits their needs. Obviously, if all you do is play games you wouldn't be buying ryzen or a lga2011 system. Just get an unlocked i5 and overclock it, best bang for the buck. You must realize that even if you don't, other people use computers for tasks other than gaming. And for a large portion of them ryzen will be the best deal, because it is versatile - it is good enough for gaming too, while still offering significant performance advantage compared to an intel quad in tasks that are time taking, and is very much competitive with intel's 8 and 10 core chips while delivering more than twice the value, which is important for everyone who doesn't have money to throw away.

    Claiming that "gaming is the main driving for these type of CPUs" is foolish to say the least, because games don't benefit from that particular type of CPUs. Most of the games can't even properly utilize 4 threads. And this is not likely to change soon, because the overhead of complexity and thread synchronization is not worth it for non-performance demanding tasks such as games.
  • Lord-Bryan - Thursday, March 2, 2017 - link

    That's one really well thought out argument
  • rarson - Thursday, March 2, 2017 - link

    Ryzen's versatility and price are the two biggest factors that make it so good. It might not beat the very best gaming CPU that Intel has, or the very best multi-threaded monster that Intel has in every scenario, but it's competitive with both at half the price of the high-end stuff. Hence, while I do game some and want to build a computer to use for gaming, I also do other stuff like audio production that benefits greatly from Ryzen's multi-threaded performance. To me, it's a no-brainer: Ryzen right now is the best bang-for-the-buck chip for someone who wants all-around high-end performance, by far. Maybe not the 1800X, I kind of think the 1700X is a better value, but still, for most people who want multiple-use performance instead of absolute maximum gaming performance, Ryzen is the clear choice.

    Ryzen's max clock speeds seem, like Intel's, to be hindered by the total number of cores on chip, so it should be extremely interesting to see how the 4- and 6-core chips overclock once they arrive, and what kind of performance they'll achieve. I actually think that, like Intel, a 4-core Ryzen might be a better gaming chip than the 8-core, and if that's the case, then it might be really darn close to Intel's best Kaby Lake, because like you pointed out, most games aren't threaded well at all.

    Additionally, from a gaming perspective, it seems like AMD has done more to push technology forward in that respect than anyone else. They've worked on Mantle, Vulkan, FreeSync, TrueAudio, and others. They've always tried to give performance value by offering more cores, but software has been slow to take advantage of them. Intel is content to stagnate by offering extremely incremental increases because performance is "good enough" so developers have no reason to really try to take advantage of extra cores aside from outside use cases. With Ryzen, AMD is pushing chips towards higher core counts (much like they did with the Athlon X2) but this time, they're trying harder to get developers on board and help them achieve good results. So while it always takes forever for software to better utilize the hardware, once the hardware becomes more common the software will start to follow and you'll see the actual gaming performance improve. Is that a valid reason to buy Ryzen today if your sole focus is gaming? Of course not, but it does bode well for Ryzen owners in the future. The performance can only get better. Can the same be said about Intel? Well, probably not if you're using one of the 4-core chips. It's pretty much a known quantity.

    I had high hopes for Bulldozer and Ryzen is the exact opposite of what Bulldozer was. I feel like the CPU market has been stagnant for years and now suddenly there's a reason to be excited. This makes AMD competitive again, which will be good for pricing even if you're an Intel fan. It's been a long wait, but it was worth it, this is a good product.
  • Notmyusualid - Friday, March 3, 2017 - link

    http://www.gamersnexus.net/hwreviews/2822-amd-ryze...
  • Makaveli - Thursday, March 2, 2017 - link

    +1 ddriver you destroyed that kid with your logic well done.
  • khanikun - Friday, March 3, 2017 - link

    Gaming definitely isn't the main driving force for CPUs, as use case changes. I bought a 7700K for my gaming rig. I'd get a 1700 for a VM host, as I'd like to start building a lab again. It won't be this round though. I'd rather wait for AMD to iron out any kinks and buy the next generation. It's something the 7700K could do, but more cores would definitely make it a much better lab.
  • Meteor2 - Friday, March 3, 2017 - link

    Nobody buys mid/high-end consumer chips for HPC. They buy them for gaming. A few for video production. That's it.
