At its own side event this week, AMD invited select members of the press and analysts to discuss the next layer of Zen details. In this piece, we discuss the microarchitecture announcements that were made and look at how they compare to previous generations of AMD core designs.

AMD Zen

Prediction, Decode, Queues and Execution

First up, let’s dive right into the block diagram as shown:

If we focus purely on the left to start, we can see most of the high-level microarchitecture details, including basic caches, the new inclusion of an op-cache, some details about decoders and dispatch, scheduler arrangements, execution ports and load/store arrangements. A number of slides later in the presentation talk about cache bandwidth.

Firstly, one of the bigger deviations from previous AMD microarchitecture designs is the presence of a micro-op cache (it is worth noting that these slides sometimes say op when they mean micro-op, creating a little confusion). AMD’s Bulldozer design did not have an operation cache, so frequently used instructions had to be fetched from the instruction cache and decoded into micro-ops again each time. Intel has been implementing a similar arrangement for several generations to great effect (some put it as a major stepping stone for Sandy Bridge, where it first appeared), so to see one here is quite promising for AMD. We weren’t told the size or organization of this cache, and AMD will perhaps give that information in due course.
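To make the idea concrete, here is a toy sketch of what a micro-op cache buys in a loop: the first pass decodes each instruction, and later passes reuse the stored micro-ops. Everything in it is an assumption for illustration only. AMD has not disclosed the capacity, indexing or replacement policy, so the `OpCache` class, its `capacity` value and the LRU replacement are hypothetical.

```python
from collections import OrderedDict


class OpCache:
    """Toy micro-op cache keyed by instruction address (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity      # hypothetical size; AMD has not given one
        self.entries = OrderedDict()  # address -> decoded micro-ops
        self.hits = 0
        self.misses = 0

    def fetch(self, address, decode):
        """Return micro-ops for an address, decoding only on a miss."""
        if address in self.entries:
            self.hits += 1
            self.entries.move_to_end(address)  # mark as most recently used
            return self.entries[address]
        self.misses += 1
        uops = decode(address)                 # the slow path through the decoders
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used (assumed policy)
        return uops


def run_loop(cache, addresses, iterations):
    """Fetch the same instruction addresses repeatedly, as a hot loop would."""
    for _ in range(iterations):
        for address in addresses:
            cache.fetch(address, decode=lambda a: [f"uop@{a:#x}"])
    return cache.hits, cache.misses
```

With a four-instruction loop run ten times and room for eight entries, `run_loop(OpCache(8), [0x10, 0x14, 0x18, 0x1C], 10)` decodes each instruction once and serves the other 36 fetches from the cache.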

Aside from the as-expected ‘branch predictor enhancements’, which are as vague as they sound, AMD has not disclosed the decoder arrangements in Zen at this time, but has listed that they can decode four instructions per cycle to feed into the operations queue. This queue, with the help of the op-cache, can deliver 6 ops/cycle to the schedulers. The queue can dispatch more per cycle than the decoders accept because a single decoded instruction can break down into two micro-ops (which makes the instruction vs micro-op definitions even muddier). Nevertheless, this micro-op queue helps feed the separate integer and floating point segments of the CPU. Unlike Intel, which uses a combined scheduler for INT/FP, AMD’s diagram suggests that the two will remain separate, each with its own schedulers, at this time.
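The relationship between the four-wide decode and the six-wide dispatch can be sketched with a small cycle-by-cycle model. This is a simplification under stated assumptions, not AMD's pipeline: decode and dispatch happen in the same cycle, the op-cache is ignored, and the `simulate` function and its input format are hypothetical. Only the two widths come from the disclosure.

```python
from collections import deque

DECODE_WIDTH = 4    # instructions decoded per cycle (from AMD's disclosure)
DISPATCH_WIDTH = 6  # micro-ops dispatched per cycle (from AMD's disclosure)


def simulate(uops_per_instruction):
    """Return the number of micro-ops dispatched in each cycle.

    uops_per_instruction: one int per instruction, giving how many
    micro-ops that instruction decodes into (1 or 2 in this sketch).
    """
    pending = deque(uops_per_instruction)  # instructions not yet decoded
    queued = 0                             # micro-ops waiting in the op queue
    dispatched = []
    while pending or queued:
        # Decode stage: at most DECODE_WIDTH instructions enter the queue.
        for _ in range(min(DECODE_WIDTH, len(pending))):
            queued += pending.popleft()
        # Dispatch stage: at most DISPATCH_WIDTH micro-ops leave the queue.
        sent = min(DISPATCH_WIDTH, queued)
        queued -= sent
        dispatched.append(sent)
    return dispatched
```

Eight single-micro-op instructions give `simulate([1] * 8) == [4, 4]`, so dispatch never exceeds the decode width. Eight instructions that each split in two give `simulate([2] * 8) == [6, 6, 4]`, where the six-wide dispatch becomes the limit.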

The INT side of the core will funnel the ALU operations as well as the AGU/load and store ops. The load/store units can perform two 16-byte loads and one 16-byte store per cycle, making use of the 32 KB 8-way set associative write-back L1 data cache. AMD has explicitly made this a write-back cache rather than the write-through cache we saw in Bulldozer, which was a source of a lot of idle time in particular code paths. AMD is also stating that loads and stores will have lower latency within the caches, but has not explained to what extent they have improved.
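As a rough worked example, the figures above translate into a cache geometry and a peak L1 bandwidth as follows. The 64-byte line size and the clock speed passed to `l1d_bandwidth_gb_s` are assumptions, since AMD has disclosed neither, and the function name is hypothetical.

```python
CACHE_BYTES = 32 * 1024  # 32 KB L1 data cache (from AMD's disclosure)
WAYS = 8                 # 8-way set associative (from AMD's disclosure)
LINE_BYTES = 64          # assumption: typical x86 cache line size

LOAD_BYTES_PER_CYCLE = 2 * 16   # two 16-byte loads per cycle
STORE_BYTES_PER_CYCLE = 1 * 16  # one 16-byte store per cycle

# Number of sets = capacity / (ways * line size)
sets = CACHE_BYTES // (WAYS * LINE_BYTES)


def l1d_bandwidth_gb_s(clock_ghz):
    """Peak (load, store) bandwidth in GB/s at a hypothetical clock speed."""
    return (LOAD_BYTES_PER_CYCLE * clock_ghz,
            STORE_BYTES_PER_CYCLE * clock_ghz)
```

Under these assumptions the cache has 64 sets, and at a purely hypothetical 3.0 GHz the peak would be 96 GB/s of loads and 48 GB/s of stores per core.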

The FP side of the core will afford two multiply ports and two add ports, which should allow for two joined FMAC operations or one 256-bit AVX operation per cycle. The combination of the INT and FP segments means that AMD is going for a wide core and looking to exploit a significant amount of instruction level parallelism. How much it will be able to exploit depends on the caches and the reorder buffers. No real data on the buffers has been given at this time, except that the cores will have a 75% larger instruction scheduler window for ordering operations and a 50% wider issue width for potential throughput. The wider cores, all other things being sufficient, will also allow AMD’s implementation of simultaneous multithreading to potentially take advantage of multiple threads with a linear and naturally low IPC.
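The FP port arrangement implies a peak floating point rate that can be worked out on paper. The sketch below assumes each joined FMAC pipe is 128 bits wide (which is what two FMACs or one 256-bit AVX operation per cycle suggests, but AMD has not confirmed it) and counts a fused multiply-add as two floating point operations. The function name is hypothetical.

```python
def peak_flops_per_cycle(pipe_bits, pipes, element_bits):
    """Peak FLOPs per cycle, counting each fused multiply-add as 2 FLOPs."""
    lanes = pipe_bits // element_bits  # elements processed per pipe per cycle
    return pipes * lanes * 2


# Assumption: two 128-bit FMAC pipes, as the port description suggests.
single_precision = peak_flops_per_cycle(128, 2, 32)
double_precision = peak_flops_per_cycle(128, 2, 64)

# One 256-bit AVX FMA per cycle gives the same single precision peak.
avx_single_precision = peak_flops_per_cycle(256, 1, 32)
```

Under these assumptions the core would peak at 16 single precision or 8 double precision FLOPs per cycle, whether the work is issued as two 128-bit FMACs or as one 256-bit AVX operation.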


216 Comments

  • breweyez - Friday, August 19, 2016

    You sure sound like an intel fanboy
  • smilingcrow - Friday, August 19, 2016

    Recognising and acknowledging that AMD's CPUs were in the doldrums for 10 long years doesn't make you an Intel fanboy but a realist. Ignoring that inconvenient truth does though make you an AMD fanboy.
    Come on Zen although the amount of crap that the fanboys on both sides will spout when it is released will be immense. I will keep off the forums.
  • jjj - Friday, August 19, 2016

    There are no volumes above $350, anything above that might as well not exist. Zen more or less needs to compete with Skylake while offering 2x the cores. If they have some higher clocked SKU above $350, that could work but people need to be able to afford Zen, otherwise what's the point. Zen shouldn't be a huge die so AMD should be able to offer reasonable prices. Ofc there is no need to offer 8 cores at high clocks at $200, that's too far.
  • BMNify - Friday, August 19, 2016

    if AMD can't get far better throughput than Skylake with twice the Zen cores, then they have no right to stay in business after all these missteps and the clamor of Jim Keller PR, a DEC engineer who helped design the Alpha 21164 and 21264 processors, then how can you ever expect to get a UHD1 rec.2020 capable CPU/GPU by even 2020.
  • smilingcrow - Friday, August 19, 2016

    A lot of people are hoping that will be their strategy but it depends also on yields and final clock speeds.
    If they have low yields for the high clock speed parts they might well push that as an FX part and price it at $500 or more. It would still be a good halo product.
    Also if they have a really good 8 core at $350 or under it will impact how much they can ask for the higher volume quad core parts.
    If they sell too cheap they might have trouble matching the demand.
    It's quite a juggling act to balance all that.
  • azazel1024 - Friday, August 19, 2016

    It sounds very good in fact. My biggest thing is overall system cost. Next is performance and finally noise and power consumption. Sure, I'd love what a 10 core Core processor can deliver, but I don't really need it. I can get by with my Ivy Bridge i5-3570, but if I am going to upgrade, I'd like it to be for a nice boost in performance. Compared to my Ivy Bridge, I could be okay with a very small loss in single threaded performance, but I'd like a big gain in multithreaded performance. That to me says that Zen needs to bring, compared to my i5-3570, at least 90% single thread performance and at least 70-80% of the per core performance under multithreaded workloads. Then deliver it with roughly a $400 overall platform cost (between an "entry" mid-grade board and the CPU, ignoring RAM costs). Do that and they have a buyer from me. Don't and I'll probably look at the lowest level Hexacore Skylake-E processor once they come out next year.

    Basically I need 8 core Zen to be at least a little faster, averaged out, than current 6 core Broadwell-E, yet come in somewhat under the price of 6 core Broadwell-E. That would be enough extra performance to justify an upgrade from my current system early next year.
  • AndrewJacksonZA - Thursday, August 18, 2016

    I am disappointed that they are only releasing Zen in 2017 as I really am looking to upgrade my PC towards the end of the year. But hey, what's another few months, I guess? *siiiiiiiiiiiiiiigh*
  • AndrewJacksonZA - Thursday, August 18, 2016

    Aaargh! Where's the edit button please guys????

    Just to be clear, I'm not waiting /to buy Zen/, I'm waiting for it to come out so that proper, independent tests can show what CPU would be better suited to my pocket and my needs.
  • melgross - Thursday, August 18, 2016

    As always, I've hopes that this will be what AMD says it will, but little confidence that it will.
  • silverblue - Thursday, August 18, 2016

    The micro ops cache is a bit of a surprise; I believe the Steamroller preview mentioned that particular design was getting such a cache. Perhaps it didn't in the end.
