At its own side event this week, AMD invited select members of the press and analysts to discuss the next layer of Zen details. In this piece, we cover the microarchitecture announcements that were made and look at how they compare to previous generations of AMD core designs.

AMD Zen

Prediction, Decode, Queues and Execution

First up, let’s dive right into the block diagram as shown:

If we focus purely on the left to start, we can see most of the high-level microarchitecture details, including the basic caches, the new op-cache, some details about decoders and dispatch, scheduler arrangements, execution ports and load/store arrangements. A number of slides later in the presentation talk about cache bandwidth.

Firstly, one of the bigger deviations from previous AMD microarchitecture designs is the presence of a micro-op cache (it is worth noting that these slides sometimes say op when they mean micro-op, creating a little confusion). AMD’s Bulldozer design did not have a micro-op cache, so it had to fetch from other caches and decode again to produce frequently used micro-ops. Intel has been implementing a similar arrangement for several generations to great effect (some put it as a major stepping stone for Sandy Bridge), so to see one here is quite promising for AMD. We weren’t told the size or extent of this buffer, and AMD will perhaps give that information in due course.

Aside from the as-expected ‘branch predictor enhancements’, which are as vague as they sound, AMD has not disclosed the decoder arrangements in Zen at this time, but has stated that the decoders can handle four instructions per cycle to feed into the micro-op queue. This queue, with the help of the op-cache, can deliver six micro-ops per cycle to the schedulers. The queue can dispatch more per cycle than the decoders take in because a single decoded instruction can split into two micro-ops (which makes the instruction vs micro-op definitions even muddier). Nevertheless, this micro-op queue feeds the separate integer and floating point segments of the CPU. Unlike Intel, which uses a combined scheduler for INT/FP, AMD’s diagram suggests that the two segments will remain separate, each with its own scheduler, at this time.
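To make the four-in, six-out relationship concrete, here is a minimal toy model in Python. Only the two widths come from AMD's disclosure; the function name `simulate`, the unbounded queue, the same-cycle decode and dispatch, and the absence of any op-cache modelling are all simplifying assumptions made for illustration, not a description of how Zen actually works.

```python
from collections import deque

DECODE_WIDTH = 4    # instructions decoded per cycle (from the article)
DISPATCH_WIDTH = 6  # micro-ops dispatched per cycle (from the article)


def simulate(uops_per_instruction):
    """Return the number of micro-ops dispatched in each cycle.

    uops_per_instruction: one int per instruction, in program order,
    giving how many micro-ops that instruction expands into.

    ASSUMPTIONS (toy model only): decode and dispatch happen in the same
    cycle, the queue is unbounded, and the op-cache is not modelled.
    """
    pending = deque(uops_per_instruction)  # instructions not yet decoded
    queue = 0                              # micro-ops waiting in the queue
    dispatched_per_cycle = []

    while pending or queue:
        # Decode up to DECODE_WIDTH instructions into micro-ops.
        for _ in range(min(DECODE_WIDTH, len(pending))):
            queue += pending.popleft()
        # Dispatch up to DISPATCH_WIDTH micro-ops to the schedulers.
        sent = min(DISPATCH_WIDTH, queue)
        queue -= sent
        dispatched_per_cycle.append(sent)

    return dispatched_per_cycle


if __name__ == "__main__":
    # Four instructions, two of which split into two micro-ops each.
    print(simulate([2, 1, 2, 1]))
```

In this sketch, four instructions that expand to six micro-ops are all dispatched in a single cycle, which is the case the wider dispatch width is there to cover. A stream of purely single-micro-op instructions never exceeds four per cycle in this model, since the op-cache path is left out.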

The INT side of the core will funnel the ALU operations as well as the AGU/load and store ops. The load/store units can perform two 16-byte loads and one 16-byte store per cycle, making use of the 32 KB 8-way set associative write-back L1 data cache. AMD has explicitly made this a write-back cache rather than the write-through cache we saw in Bulldozer, which was a source of a lot of idle time in particular code paths. AMD also states that loads and stores will have lower latency within the caches, but has not explained to what extent they have improved.
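As a rough back-of-the-envelope check on the load/store figures, the arithmetic can be sketched in Python. The port counts, access size, cache size and associativity are from AMD's disclosure; the 64-byte cache line and any clock speed passed in are assumptions, since AMD has stated neither here, and the function names are purely illustrative.

```python
LOAD_PORTS = 2         # loads per cycle (from the article)
STORE_PORTS = 1        # stores per cycle (from the article)
ACCESS_BYTES = 16      # bytes per load or store (from the article)

L1D_BYTES = 32 * 1024  # 32 KB L1 data cache (from the article)
L1D_WAYS = 8           # 8-way set associative (from the article)
LINE_BYTES = 64        # ASSUMPTION: line size is not stated by AMD here


def bytes_per_cycle():
    """Peak (load, store) bytes moved per cycle."""
    return LOAD_PORTS * ACCESS_BYTES, STORE_PORTS * ACCESS_BYTES


def bandwidth_gb_per_s(clock_ghz):
    """Peak (load, store) bandwidth in GB/s at a HYPOTHETICAL clock speed."""
    load, store = bytes_per_cycle()
    return load * clock_ghz, store * clock_ghz


def l1d_sets():
    """Number of sets implied by the size, ways and assumed line size."""
    return L1D_BYTES // (L1D_WAYS * LINE_BYTES)


if __name__ == "__main__":
    print(bytes_per_cycle())        # peak bytes per cycle
    print(bandwidth_gb_per_s(3.0))  # 3.0 GHz is an illustrative clock only
    print(l1d_sets())               # sets, under the 64-byte line assumption
```

Under these assumptions the peak is 32 bytes of loads plus 16 bytes of stores per cycle, and the cache would be organized as 64 sets. These are theoretical ceilings; sustained figures depend on latency and on details AMD has not yet disclosed.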

The FP side of the core will afford two multiply ports and two add ports, which should allow for two joined FMAC operations or one 256-bit AVX operation per cycle. The combination of the INT and FP segments means that AMD is going for a wide core and looking to exploit a significant amount of instruction level parallelism. How much it will be able to exploit depends on the caches and the reorder buffers. No real data on the buffers has been given at this time, except that the cores will have a 75% bigger instruction scheduler window for ordering operations and a 50% wider issue width for potential throughput. The wider cores, all other things being sufficient, will also allow AMD’s implementation of simultaneous multithreading to potentially take advantage of multiple threads with a linear and naturally low IPC.
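The FP throughput implied by those ports can be estimated with a short Python sketch. The two FMACs per cycle come from AMD's disclosure; the 128-bit width of each FMAC pipe is an assumption inferred from the "one 256-bit AVX per cycle" statement, not something AMD has confirmed, and the function names are illustrative.

```python
FMAC_PER_CYCLE = 2     # joined multiply+add operations per cycle (from the article)
PIPE_WIDTH_BITS = 128  # ASSUMPTION: inferred from "one 256-bit AVX per cycle"
FLOPS_PER_FMA = 2      # one multiply plus one add per lane


def peak_flops_per_cycle(element_bits):
    """Peak FLOPs per cycle per core for a given element width in bits."""
    lanes = PIPE_WIDTH_BITS // element_bits
    return FMAC_PER_CYCLE * lanes * FLOPS_PER_FMA


def avx256_ops_per_cycle():
    """How many 256-bit operations the two assumed 128-bit pipes cover."""
    return (FMAC_PER_CYCLE * PIPE_WIDTH_BITS) // 256


if __name__ == "__main__":
    print(peak_flops_per_cycle(32))  # single precision
    print(peak_flops_per_cycle(64))  # double precision
    print(avx256_ops_per_cycle())
```

If the 128-bit assumption holds, this works out to 16 single-precision or 8 double-precision FLOPs per cycle per core at peak, with a 256-bit AVX operation occupying both pipes.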


  • Kevin G - Saturday, August 20, 2016

    HyperTransport was an AMD creation though they were not the first to use it. Former DEC engineers did help create it but they were employed by AMD at the time. AMD did license the EV6 bus for the first Athlon (not Athlon 64). The first chip that used the HT bus was from Transmeta, due to delays on the first generation Athlon 64/Opteron.
  • slyronit - Tuesday, August 23, 2016

    Ah! Good old days! I used to read all this in "Chip" magazine back in the day. Cyber cafes those days used "Cyrix" CPUs. Cheap.
  • BMNify - Friday, August 19, 2016

    thats the thing, Did AMD actually learn something from their ARM inc partners and put in a real up to date interconnect or two that can lower overall latency and massively improve data throughput (ready with HBM2 perhaps) or did they cheap out again and rehash the usual antiquated suspects
  • nandnandnand - Thursday, August 18, 2016

    Good. I want Zen to perform well. Let's see Intel copy AMD and offer a 8c/16t chip at mainstream prices.
  • akamateau - Thursday, August 18, 2016

    Hmmm...

    AMD was first with 1Ghz and faster processors.

    AMD was first with multi-core processor.

    AMD was first with CPU + GPU = APU. Intel has the laughably poor performing Intel IGP LOL. And to get it Intel had to poach technology from NVidia and then NVidia sued them!!!! LOL

    AMD owns X86-64.

    SO your point?????

    AMD has a license to copy Intel and if like Frank Sinatra chooses to do it their way, it can only be good for the consumer.

    So smarten up. Without AMD Intel would have killed the PC 10 years ago with $2000 CPU's!!!
  • smilingcrow - Thursday, August 18, 2016

    I don't live in the past from a decade ago. When AMD finally release their first decent CPU in 10 years wake me up.
    Even with negligible competition from AMD Intel has chosen to keep the prices of chips for the mainstream socket at low levels for 10 years. It was 2009 with Lynnfield that they last had a $1,000 Extreme chip for consumers and there were plenty of good chips in that range starting at under $300 so the Extreme chips were for rich fools really.
  • The_Countess - Saturday, August 20, 2016

    intel created a entire artificial market segment with the i5's because of lack of competition. they still sell dual cores for christ sake, and havent offered anything above 4 cores on the main stream market, which AMD's had for over 6 years already.

    On 22nm, let alone 14, there is no way they couldn't have made a affordable 6 core. but all we get are ridiculously priced -E variants on a ridiculously overpriced platform.
  • FMinus - Thursday, August 18, 2016

    that is still at most ~20 years back, AMD is on the face of the earth for 47 years and more than half of that they spent innovating nothing, but being contractors and pirates of technology.
  • tamalero - Friday, August 19, 2016

    whats with hardcore intel fanboys just getting out of their caves now that AMD might have a decent cpu to compete with Intel?
  • The_Countess - Saturday, August 20, 2016

    and they went from that to creating the athlon64 and royally kicking intel's ass with superior innovations in just 14 years (counting from the first k5).

    too bad intel's monopoly abuse has already done its damage leaving AMD woefully short on production capacity meaning 80% of people still had to buy intel's crummy shit for too much money. and with AMD not making nearly as much money as it should have from the athlon64, intel could once again copy everything AMD did and then brute force outspend them.
