In their own side event this week, AMD invited select members of the press and analysts to come and discuss the next layer of Zen details. In this piece, we’re discussing the microarchitecture announcements that were made, as well as a look to see how this compares to previous generations of AMD core designs.

AMD Zen

Prediction, Decode, Queues and Execution

First up, let’s dive right into the block diagram as shown:

If we focus purely on the left to start, we can see most of the high-level microarchitecture details including basic caches, the new inclusion of an op-cache, some details about decoders and dispatch, scheduler arrangements, execution ports and load/store arrangements.  A number of slides later in the presentation talk about cache bandwidth.

Firstly, one of the bigger deviations from previous AMD microarchitecture designs is the presence of a micro-op cache (it might be worth noting that these slides sometimes say op when it means micro-op, creating a little confusion). AMD’s Bulldozer design did not have an operation cache, requiring it to fetch details from other caches to implement frequently used micro-ops. Intel has been implementing a similar arrangement for several generations to great effect (some put it as a major stepping stone for Conroe), so to see one here is quite promising for AMD. We weren’t told the scale or extent of this buffer, and AMD will perhaps give that information in due course.

Aside from the as-expected ‘branch predictor enhancements’, which are as vague as they sound, AMD has not disclosed the decoder arrangements in Zen at this time, but has listed that they can decode four instructions per cycle to feed into the operations queue. This queue, with the help of the op-cache, can deliver 6 ops/cycle to the schedulers. The reasons behind the queue being able to dispatch more per cycle is if the decoder can supply an instruction which then falls into two micro-ops (which makes the instruction vs micro-op definitions even muddier). Nevertheless, this micro-op queue helps feed the separate integer and floating point segments of the CPU. Unlike Intel who uses a combined scheduler for INT/FP, AMD’s diagram suggests that they will remain separate with their own schedulers at this time.

The INT side of the core will funnel the ALU operations as well as the AGU/load and store ops. The load/store units can perform 2 16-Byte loads and one 16-Byte store per cycle, making use of the 32 KB 8-way set associative write-back L1 Data cache. AMD has explicitly made this a write back cache rather than the write through cache we saw in Bulldozer that was a source of a lot of idle time in particular code paths. AMD is also stating that the load/stores will have lower latency within the caches, but has not explained to what extent they have improved.

The FP side of the core will afford two multiply ports and two ADD ports, which should allow for two joined FMAC operations or one 256-bit AVX per cycle. The combination of the INT and FP segments means that AMD is going for a wide core and looking to exploit a significant amount of instruction level parallelism. How much it will be able to depends on the caches and the reorder buffers – no real data on the buffers has been given at this time, except that the cores will have a +75% bigger instruction scheduler window for ordering operations and a +50% wider issue width for potential throughput. The wider cores, all other things being sufficient, will also allow AMD’s implementation of simultaneous multithreading to potentially take advantage of multiple threads with a linear and naturally low IPC.

Deciphering the New Cache Hierarchy: L1, 512 KB L2, 8 or 16 MB L3
Comments Locked

216 Comments

View All Comments

  • DigitalFreak - Thursday, August 18, 2016 - link

    Microsoft was already in the process of creating a 64bit version of Windows based on AMD's 64bit implementation (hence the reason you see AMD64 everywhere in 64bit Windows). Microsoft basically told Intel they were not going to support two competing implementations of "x64", so Intel caved and adopted the AMD64 implementation.
  • tygrus - Thursday, September 8, 2016 - link

    They license the ISA's ie. use of instructions and the expected output. The whole silicon designs are not cross-licensed. There probably have some IP of the silicon cross-licensed but the major point was they could handle the same instructions and be mostly compatible. AMD could only fully copy 486 and earlier designs. You can copy and implement the same ISA without having the same silicon. Intel had started a design for x86-64 but the front-end decoding and instructions were changed to be cmpatible. With micro/macro ops and microcoding there can be a lot of abstraction between ISA and execution. Intel made at least 1 mistake with their early AMD64 implementation that had to have work arounds and a later fix.
  • frenchy_2001 - Thursday, August 18, 2016 - link

    Opposite.
    Intel was vehement at the time that 64 bits needed to be a clean break from x86 and were pushing for their Itanium processors, implementing IA64 (completely incompatible with x86).
    The market followed AMD, especially sice they had the better architecture at the time (Athlon64, with 64 bits and in processor memory controllers, faster interconnect, better server scaling...).
    Intel then licensed AMD64 and and rebranded it EMT64 or x86-64.
  • wifiwolf - Friday, August 19, 2016 - link

    wow. finally someone who remembers that time correctly. Intel pushed for Itanium for too much time, even after they adopted amd's 64bit implementation. They eventually had to drop it as it never got enough market.
  • Samus - Sunday, August 21, 2016 - link

    Microsoft did make an IA64 edition on NT and 2000 but without x86 compatibility there were no apps. The genius behind AMD's 64 bit implementation is it is simply a memory extension of x86 with 64 bit integer registers, maintaining complete 32-bit compatibility with no real impact on 32 bit performance, while costing very little die space for the extensions.

    Microsoft and software developers saw this and basically told Intel their Itanium dreams were not going to come true.
  • anubis44 - Monday, August 22, 2016 - link

    And the genius behind that 'genius' was none other than Jim Keller, the man who also just designed the upcoming Zen processor family.
  • Visual - Tuesday, August 23, 2016 - link

    No, the IA64 architecture of Itanium does not try to keep any backwards-compatibility with x86, so any mention of it even being considered as an alternative to AMD64 is absurd. At that time the world was just not ready for a compatibility-breaking switch.
  • Kevin G - Tuesday, August 23, 2016 - link

    The ISA didn't directly try to keep backwards compatibility but Intel did put some x86 functionality into the first few generations of Itanium. This was later removed in chips post 2006.

    https://en.wikipedia.org/wiki/IA-32_Execution_Laye...
  • Gigaplex - Thursday, August 18, 2016 - link

    Which is a legal term to describe "copying with permission".
  • pikunsia - Friday, August 19, 2016 - link

    AMD cannot copy ``TM'' Intel technologies as this is a crime with criminal consequences. All is managed through licenses and royalties.

Log in

Don't have an account? Sign up now