Fetch

For Zen, AMD has implemented a decoupled branch predictor. This allows the core to speculate on incoming instruction pointers to fill a queue, as well as to look for direct and indirect targets. The branch target buffer (BTB) for Zen is described as ‘large’, with no numbers as of yet, although it is arranged as an L1/L2 hierarchy. For comparison, Bulldozer afforded a 512-entry, 4-way L1 BTB with single-cycle latency, and a 5120-entry, 5-way L2 BTB with additional latency; AMD doesn’t state that Zen’s is larger, just that it is large and supports dual branches. There is a 32-entry return stack, while the structure for indirect targets is also devoid of entry numbers at this point.

The decoupled branch predictor can also run ahead of instruction fetch and fill the queues based on its internal algorithms. Running too far down a branch that turns out to be mispredicted will obviously incur a power penalty, but successes will help with latency and memory parallelism.
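As a rough illustration of the run-ahead idea, the Python sketch below lets a toy predictor push predicted fetch addresses into a bounded queue and discards them on a redirect. It is a minimal sketch under stated assumptions, not AMD's design: the names `predict_ahead` and `redirect`, the queue depth of 4, the 32-byte fetch block, and the dictionary standing in for a BTB are all hypothetical.

```python
from collections import deque


def predict_ahead(pc, btb, queue, depth=4, block_bytes=32):
    """Fill the fetch queue with predicted addresses until it is full.

    btb is a plain dict (fetch address -> predicted target); a miss is
    treated as "fall through to the next block". All sizes are assumptions.
    Returns the next address the predictor would look at.
    """
    while len(queue) < depth:
        queue.append(pc)
        # Follow a predicted taken branch, else continue sequentially.
        pc = btb.get(pc, pc + block_bytes)
    return pc


def redirect(queue, correct_pc):
    """Handle a misprediction: drop everything queued past the bad branch.

    Returns (number of discarded entries, address to restart from).
    The discarded entries represent wasted work, i.e. the power penalty.
    """
    wasted = len(queue)
    queue.clear()
    return wasted, correct_pc


if __name__ == "__main__":
    fetch_queue = deque()
    toy_btb = {0x40: 0x100}  # one predicted-taken branch
    next_pc = predict_ahead(0x00, toy_btb, fetch_queue)
    print([hex(a) for a in fetch_queue], hex(next_pc))
    print(redirect(fetch_queue, 0x60))
```

The deeper the queue, the more fetch latency a correct prediction can hide, and the more entries a wrong one throws away.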

The Translation Lookaside Buffer (TLB) in the branch prediction stage caches recent translations of virtual addresses to physical addresses to reduce latency, and operates in three levels: L0 with 8 entries of any page size, L1 with 64 entries of any page size, and L2 with 512 entries and support for 4K and 2M pages only. The L2 won’t support 1G pages as the L1 can already hold 64 of them, and implementing 1G support at the L2 level is a more complex addition (there may also be power/die area benefits).
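The three-level lookup can be sketched in Python as follows. The entry counts and the lack of 1G support in the L2 come from the description above; everything else is an assumption made for illustration, including the LRU replacement, the fully associative tables, the policy of filling every level after a miss, and the names `TlbLevel`, `translate` and `make_itlb`.

```python
from collections import OrderedDict


class TlbLevel:
    """One TLB level, modeled as fully associative with LRU (assumptions)."""

    def __init__(self, name, entries, excluded_sizes=()):
        self.name = name
        self.entries = entries
        self.excluded_sizes = set(excluded_sizes)
        self.table = OrderedDict()  # (virtual page, page size) -> physical page

    def lookup(self, vpage, size):
        key = (vpage, size)
        if key in self.table:
            self.table.move_to_end(key)  # mark as most recently used
            return self.table[key]
        return None

    def fill(self, vpage, size, ppage):
        if size in self.excluded_sizes:
            return  # this level cannot hold pages of this size
        self.table[(vpage, size)] = ppage
        self.table.move_to_end((vpage, size))
        if len(self.table) > self.entries:
            self.table.popitem(last=False)  # evict least recently used


def translate(levels, vpage, size, page_walk):
    """Return (physical page, name of the level that hit, or "walk")."""
    for i, level in enumerate(levels):
        ppage = level.lookup(vpage, size)
        if ppage is not None:
            for upper in levels[:i]:  # promote into the faster levels
                upper.fill(vpage, size, ppage)
            return ppage, level.name
    ppage = page_walk(vpage, size)  # slow path: full page-table walk
    for level in levels:
        level.fill(vpage, size, ppage)
    return ppage, "walk"


def make_itlb():
    """Sizes as described in the text; the L2 rejects 1G pages."""
    return [
        TlbLevel("L0", 8),
        TlbLevel("L1", 64),
        TlbLevel("L2", 512, excluded_sizes={"1G"}),
    ]
```

In this model a 1G translation lives only in the L0 and L1, so once it is evicted from both, the next access pays for a full page walk.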

When the instruction comes through as a recently used one, it acquires a micro-tag and is sent via the op-cache; otherwise it is placed into the instruction cache for decode. The L1 instruction cache can also accept 32 bytes/cycle from the L2 cache, as other instructions are placed through the load/store unit for another cycle around for execution.

Decode

The instruction cache will then send the data through the decoder, which can decode four instructions per cycle. As mentioned previously, the decoder can fuse operations together in a fast-path, such that a single micro-op goes through to the micro-op queue while still representing two instructions; these are split when they reach the schedulers. This allows the system to fit more into the micro-op queue and afford a higher throughput when possible.

The new Stack Engine comes into play between the queue and the dispatch, allowing for low-power address generation when the address is already known from previous cycles. This saves the power that would otherwise be spent going through the AGU and cycling back around to the caches.

Finally, the dispatch unit can issue up to six instructions per cycle, at a maximum rate of 6/cycle to the INT scheduler or 4/cycle to the FP scheduler. We confirmed with AMD that the dispatch unit can dispatch to both INT and FP inside the same cycle, which can maximize throughput (the alternative would be to alternate each cycle, which reduces efficiency). We are told that the operations used in Zen for the uOp cache are ‘pretty dense’, and equivalent to x86 operations in most cases.
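The dispatch limits can be made concrete with a small Python sketch. The widths (6 total, 6 to INT, 4 to FP) are the figures quoted above; the in-order behaviour that stops at the first op whose scheduler port is full, and the names `dispatch_cycle` and `cycles_to_drain`, are assumptions for illustration only.

```python
from collections import deque


def dispatch_cycle(queue, total_width=6, int_width=6, fp_width=4):
    """Dispatch ops from the head of the queue for one cycle.

    Each op is the string "int" or "fp". INT and FP ops may go out in the
    same cycle. Stopping at the first blocked op is an assumption.
    """
    limits = {"int": int_width, "fp": fp_width}
    sent = {"int": 0, "fp": 0}
    dispatched = []
    while queue and len(dispatched) < total_width:
        kind = queue[0]
        if sent[kind] >= limits[kind]:
            break  # that scheduler has taken all it can this cycle
        sent[kind] += 1
        dispatched.append(queue.popleft())
    return dispatched


def cycles_to_drain(ops, **widths):
    """Count how many cycles it takes to dispatch every op in order."""
    queue = deque(ops)
    cycles = 0
    while queue:
        dispatch_cycle(queue, **widths)
        cycles += 1
    return cycles
```

With these limits, `cycles_to_drain(["int", "fp"] * 6)` gives 2, since each cycle carries a full six ops split across both schedulers, while `cycles_to_drain(["fp"] * 6)` also gives 2, because the FP side caps at four per cycle.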

574 Comments

  • BurntMyBacon - Friday, March 3, 2017 - link

    @ShieTar: "Well, the point of low-resolution testing is, that at normal resolutions you will always be GPU-restricted."

    If this statement is accepted as true, then by deduction, for people playing at normal (or high) resolutions, gaming is not a differentiator and therefore unimportant to the CPU selection process. If gaming is your only criterion for CPU selection, then that means you can get the cheapest CPU possible until you are not GPU restricted.

    @ShieTar: "The most interesting question will be how Ryzen performs on those few modern games which manage to be CPU-restricted even in relevant resolutions, e.g. Battlefield 1 Multiplayer."

    I agree here fully. Show CPU heavy titles to tease out the difference between CPUs. Artificially low resolutions are academic at best. That said, according to Steam Surveys, just over half of their respondents are playing at resolutions less than 1080P. Over a third are playing at 1366x768 or less. Though, I suspect the overlap between people playing at these resolutions and people using high end processors is pretty small.

    Average frame rate is fairly uninteresting in most games for high end CPUs, due to being GPU bound or using unrealistic settings. Some, more interesting, metrics are min frame rate, frame time distribution (or simply graph it), frame time consistency, and similar. These metrics do more to show how different CPUs will change the experience for the player in a configuration the player is more likely to use.
  • Lord-Bryan - Thursday, March 2, 2017 - link

    Who buys a 500 dollar CPU to play games at 720p res? All that talk is just BS.
  • JMB1897 - Friday, March 3, 2017 - link

    That test is not done for real world testing reasons. At that low resolution, you're not GPU bound, you're CPU bound. That's why the test exists.

    Now advance a few years into the future when you still have your $500 Ryzen 7 CPU and a brand new GPU - you may suddenly become CPU bound even at QHD or 4k, whereas a 7700k might not quite be CPU bound just yet.
  • MAC001010 - Saturday, March 4, 2017 - link

    Or a few years in the future (when you get your new GPU) you find that games have become more demanding but better multi-threaded, in which case your Ryzen 7 CPU works fine and the 7700k has become a bottleneck despite its high single-threaded performance.

    This illustrates the inherent difficulty of comparing high freq. CPUs to high core count CPUs in regards to future potential performance.
  • cmdrdredd - Saturday, March 4, 2017 - link

    "Or a few years in the future (when you get your new GPU) you find that games have become more demanding but better multi-threaded, in which case your Ryzen 7 CPU works fine and the 7700k has become a bottleneck despite its high single-threaded performance."

    Maybe. The overclocking scenario is also important. Most gamers will overclock to get a bit of a boost. I have yet to replace my 4.5GHz 3570k even though new CPUs offer more raw performance; the need hasn't been there yet.

    One other interesting thing is how Microsoft's PlayReady 3.0 will be supported for 4k HDR video content protection. So far I know Kaby Lake supports it, but haven't heard about any of AMD's offerings unless I missed it somewhere.
  • Cooe - Sunday, February 28, 2021 - link

    Lol, except here in reality the EXACT OPPOSITE thing happened. A 6-core/12-thread Ryzen 5 1600 still holds up GREAT in modern titles/game engines thanks to the massive advantage in extra CPU threads. A 4c/4t i5-7600K otoh? Nowadays it performs absolutely freaking TERRIBLY!!!
  • basha - Thursday, March 2, 2017 - link

    All the reviews I read are using an NVidia 1080 gfx card. My understanding is that AMD graphics has a better implementation of DX12, with the ability to use multiple cores. I would like to see benchmarks with something like RX480 CrossFire with a 1700x. This would be in a similar budget to an i7 7700 + GTX 1080.
  • Notmyusualid - Friday, March 3, 2017 - link

    http://www.gamersnexus.net/hwreviews/2822-amd-ryze...
  • cmdrdredd - Saturday, March 4, 2017 - link

    Overclocking will be interesting. I don't use my PC for much besides gaming, and lately it hasn't been a lot of that either due to a lack of compelling titles. However, I would still be interested in seeing what it can offer here too for whenever I finally break down and decide I need to replace my 3570k @ 4.5GHz.
  • Midwayman - Thursday, March 2, 2017 - link

    Here's hoping the 1600x hits the same gaming benches as the 1800x when OC'd. $500 for the 1800x is fine; it's just not the best value for gaming, just like the i5s have been better value gaming systems in the past.
