In their own side event this week, AMD invited select members of the press and analysts to come and discuss the next layer of Zen details. In this piece, we’re discussing the microarchitecture announcements that were made, as well as a look to see how this compares to previous generations of AMD core designs.

AMD Zen

Prediction, Decode, Queues and Execution

First up, let’s dive right into the block diagram as shown:

If we focus purely on the left to start, we can see most of the high-level microarchitecture details, including basic caches, the new inclusion of an op-cache, some details about decoders and dispatch, scheduler arrangements, execution ports, and load/store arrangements. A number of slides later in the presentation talk about cache bandwidth.

Firstly, one of the bigger deviations from previous AMD microarchitecture designs is the presence of a micro-op cache (it might be worth noting that these slides sometimes say op when they mean micro-op, creating a little confusion). AMD’s Bulldozer design did not have a micro-op cache, so frequently executed instructions had to be re-fetched from the instruction caches and decoded again on every pass through the front end. Intel has been implementing a similar arrangement for several generations to great effect (some put it as a major stepping stone for Conroe), so to see one here is quite promising for AMD. We weren’t told the size or organization of this cache, and AMD will perhaps give that information in due course.
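The benefit can be sketched with a toy model: decoded micro-ops are kept keyed by fetch address, so a hot loop pays the decode cost only on its first iteration. The cycle costs and addresses below are invented for illustration, not AMD figures, and a real op-cache is organized by fetch windows and ways rather than a flat dictionary.

```python
# Toy model of a micro-op cache: store decoded micro-ops by fetch
# address so repeated instructions skip the expensive decode stage.

DECODE_COST = 4   # hypothetical cycles to fully decode an instruction
HIT_COST = 1      # hypothetical cycles to read decoded ops from the cache

def run(addresses):
    """Return total front-end cycles for a stream of fetch addresses."""
    op_cache = {}
    cycles = 0
    for addr in addresses:
        if addr in op_cache:
            cycles += HIT_COST           # micro-ops already decoded
        else:
            op_cache[addr] = f"uops@{addr:#x}"
            cycles += DECODE_COST        # pay the full decode once
    return cycles

# A 4-instruction loop body executed 10 times: decode each instruction
# once, then hit the op-cache on the remaining 9 iterations.
loop = [0x100, 0x104, 0x108, 0x10C] * 10
print(run(loop))  # 4*4 + 36*1 = 52 cycles, versus 160 with no op-cache
```

This is why the structure matters most in loops: the same 40 fetches cost 160 cycles of decode in a Bulldozer-style front end under these made-up numbers.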

Aside from the as-expected ‘branch predictor enhancements’, which are as vague as they sound, AMD has not disclosed the decoder arrangement in Zen at this time, but has stated that the decoders can process four instructions per cycle to feed the operations queue. This queue, with the help of the op-cache, can deliver 6 ops/cycle to the schedulers. The reason the queue can dispatch more per cycle than the decoders accept is that a single instruction can decode into two micro-ops (which makes the instruction vs micro-op distinction even muddier). Nevertheless, this micro-op queue helps feed the separate integer and floating point segments of the CPU. Unlike Intel, which uses a combined scheduler for INT/FP, AMD’s diagram suggests that the two will remain separate, each with its own schedulers, at this time.
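The decode-vs-dispatch mismatch can be made concrete with a small sketch. The widths (4 instructions in, 6 micro-ops out) are from AMD’s slides; the instruction mixes are made up for illustration.

```python
# Why the op queue can dispatch more micro-ops per cycle (6) than the
# decoders accept instructions (4): some x86 instructions crack into
# two micro-ops, so a 4-instruction group can produce up to 8 micro-ops,
# capped by the 6-wide dispatch to the schedulers.

DECODE_WIDTH = 4     # instructions accepted per cycle
DISPATCH_WIDTH = 6   # micro-ops the queue can send to the schedulers

def dispatched_per_cycle(uops_per_instruction):
    """Micro-ops dispatched in one cycle for one decode group."""
    group = uops_per_instruction[:DECODE_WIDTH]
    return min(sum(group), DISPATCH_WIDTH)

print(dispatched_per_cycle([1, 1, 1, 1]))  # 4: simple ops, decode-limited
print(dispatched_per_cycle([2, 2, 1, 1]))  # 6: two cracked ops fill the width
print(dispatched_per_cycle([2, 2, 2, 2]))  # 6: capped at dispatch width
```

In other words, the extra dispatch width is only usable when the instruction mix (or the op-cache) supplies more micro-ops than the decoders alone would.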

The INT side of the core will funnel the ALU operations as well as the AGU load and store ops. The load/store units can perform two 16-byte loads and one 16-byte store per cycle, making use of the 32 KB 8-way set associative write-back L1 data cache. AMD has explicitly made this a write-back cache, rather than the write-through cache we saw in Bulldozer, which was a source of a lot of idle time in particular code paths. AMD is also stating that loads and stores will have lower latency within the caches, but has not explained to what extent they have improved.
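The write-back vs write-through distinction can be illustrated with a rough cost model: write-through forwards every store to the next cache level, while write-back only writes out a dirty line when it is evicted. The latencies below are invented for the example, not AMD figures.

```python
# Rough illustration of why a write-through L1 (Bulldozer) loses more
# time on stores than a write-back L1 (Zen).

L1_WRITE = 1     # hypothetical cycles per L1 store
L2_WRITE = 10    # hypothetical cycles per write to the next cache level

def write_through_cost(n_stores):
    # Every store is propagated to the next level immediately.
    return n_stores * (L1_WRITE + L2_WRITE)

def write_back_cost(n_stores, dirty_evictions):
    # Stores stay in L1; only evicted dirty lines are written out.
    return n_stores * L1_WRITE + dirty_evictions * L2_WRITE

# 1000 stores that mostly hit the same hot lines, with only 20 dirty
# lines ever evicted:
print(write_through_cost(1000))   # 11000 cycles of store traffic
print(write_back_cost(1000, 20))  # 1200 cycles of store traffic
```

The gap grows with store locality, which is exactly the case hot code paths hit, matching the "idle time in particular code paths" the article mentions for Bulldozer.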

The FP side of the core will afford two multiply ports and two add ports, which should allow for two joined 128-bit FMAC operations, or the equivalent of one 256-bit AVX operation, per cycle. The combination of the INT and FP segments means that AMD is going for a wide core, looking to exploit a significant amount of instruction-level parallelism. How much it will be able to extract depends on the caches and the reorder buffers; no real data on the buffers has been given at this time, except that the cores will have a 75% larger instruction scheduler window for ordering operations and a 50% wider issue width for potential throughput. The wider core, all other things being sufficient, will also allow AMD’s implementation of simultaneous multithreading to fill the machine with multiple threads whose individual IPC is naturally low.
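The FP width translates into a simple back-of-the-envelope peak-throughput figure: two fused 128-bit multiply+add pipes behave like one 256-bit FMA per cycle. The clock speed below is a placeholder, since Zen frequencies had not been announced at this point.

```python
# Peak double-precision FLOPs for the FP unit as described: two 128-bit
# multiply ports and two 128-bit add ports, pairable into FMAs, i.e. the
# equivalent of one 256-bit FMA per cycle.

FMA_WIDTH_BITS = 256       # two fused 128-bit MUL+ADD pipes
DOUBLE_BITS = 64
FLOPS_PER_FMA = 2          # a fused multiply-add counts as two FLOPs

def peak_dp_flops_per_cycle():
    lanes = FMA_WIDTH_BITS // DOUBLE_BITS   # 4 double-precision lanes
    return lanes * FLOPS_PER_FMA            # 8 DP FLOPs per cycle per core

ghz = 3.0  # assumed clock, purely illustrative
print(peak_dp_flops_per_cycle())                        # 8
print(peak_dp_flops_per_cycle() * ghz)                  # 24.0 GFLOPs/s per core
```

Whether real code approaches that peak is exactly where the scheduler window, issue width, and cache figures quoted above come into play.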

Deciphering the New Cache Hierarchy: L1, 512 KB L2, 8 or 16 MB L3
Comments

  • MonkeyPaw - Thursday, August 18, 2016 - link

Heck, people are forgetting some significant firsts for AMD -- the first to bring the IMC and HyperTransport to x86. Both were big for the server world, too.
  • smilingcrow - Friday, August 19, 2016 - link

Innovation is good, and AMD have had many moments in the sun, but let's not forget that the Core 2 Duo lacked an IMC and still trounced the AMD competition.
Good products matter more than innovation, and the trick is to turn innovation into good products.
  • The_Countess - Saturday, August 20, 2016 - link

And let's not forget the first with an on-die memory controller for an x86 CPU. A huge improvement in both the desktop and the server space; it's basically what made more-than-2-socket motherboards viable at all.
  • Kevin G - Saturday, August 20, 2016 - link

Intel had a 386 chip with an integrated memory controller back in the '90s, so AMD wasn't the first to do it even in the x86 segment.
  • looncraz - Saturday, August 20, 2016 - link

Pretty sure they did not; they had memory management built in, but not a memory controller. The 286 had memory management capabilities as well, and so did all of AMD's comparable CPUs, naturally.
  • The_Countess - Sunday, August 21, 2016 - link

That would be the first time I've heard of that, so: source?
  • Kevin G - Sunday, August 21, 2016 - link

The 80386EX, for the embedded market. It came out in 1994. By that time it was two generations behind the Pentium, so it generated very little fanfare.

    http://news.softpedia.com/news/Intel-Wants-The-Emb...

    Here is a book that includes some details about the 80386EX, though you'll have to do a bit of digging in it:
    http://userpages.umbc.edu/~squire/intel_book.pdf
  • Gigaplex - Thursday, August 18, 2016 - link

    "AMD was first with CPU + GPU = APU."

    Llano came out June 2011. Intel had Westmere (January 2010) and Sandy Bridge (January 2011) before Llano. Even if you discount Westmere since they were separate dies on the same package, Sandy Bridge still got there before Llano.
  • Mahigan - Thursday, August 18, 2016 - link

    https://www.sec.gov/Archives/edgar/data/2488/00011...

    Intel and AMD have a cross-licensing agreement which covers patents and technologies from both firms allowing both firms to use a rather generous amount of patents from their competitor.

Nobody copied anybody. The two companies are quite intertwined, tbh.
  • Mahigan - Thursday, August 18, 2016 - link

"Advanced Micro Devices has clarified terms of the cross-license agreement with Intel Corp. on Thursday. As it appears, if either AMD or Intel changes its control (i.e., gets acquired), the cross-license agreement between the two companies is automatically terminated for both parties.
AMD and Intel have a cross-licensing agreement, which guarantees that the companies will not pursue litigation for any potential patent infringements. The agreement covers the entire patent portfolio of both companies, which includes general-purpose processors, graphics adapters and other intellectual property. For example, AMD can design and sell microprocessors compatible with Intel’s x86 instruction set architecture and featuring a variety of extensions, whereas Intel can develop and ship central processing units that utilize IP and extensions initially created by AMD. The cross-licensing agreement has a number of terms and limitations. For example, the chip designers are not allowed to build processors that are compatible with the competitor’s infrastructure (e.g., sockets, mainboards, etc.). Besides, if AMD or Intel changes its ownership, the agreement is terminated."

    http://www.kitguru.net/components/cpu/anton-shilov...
