At its own side event this week, AMD invited select members of the press and analysts to discuss the next layer of Zen details. In this piece, we discuss the microarchitecture announcements that were made, and look at how they compare to previous generations of AMD core designs.

AMD Zen

Prediction, Decode, Queues and Execution

First up, let’s dive right into the block diagram as shown:

Focusing first on the left-hand side of the diagram, we can see most of the high-level microarchitecture details, including the basic caches, the new inclusion of an op-cache, some details about decoders and dispatch, scheduler arrangements, execution ports and load/store arrangements. A number of slides later in the presentation cover cache bandwidth.

Firstly, one of the bigger departures from previous AMD microarchitecture designs is the presence of a micro-op cache (it is worth noting that these slides sometimes say op when they mean micro-op, which creates a little confusion). AMD's Bulldozer design had no such cache, so frequently used instructions had to be fetched from the instruction cache and decoded again every time they ran. Intel has implemented a similar arrangement for several generations to great effect (some consider its decoded micro-op cache a major stepping stone for Sandy Bridge), so seeing one here is quite promising for AMD. We weren't told the size or organization of this cache; AMD may provide that information in due course.
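To make the benefit of an op-cache concrete, here is a minimal Python sketch of a micro-op cache modeled as an LRU table keyed by instruction address. Everything in it is an assumption for illustration: the `OpCache` class, its capacity, the decode table and the loop addresses are hypothetical, since AMD has not disclosed the size or organization of Zen's op-cache. It only shows why a hot loop stops needing the decoders after its first pass.

```python
from collections import OrderedDict

# Hypothetical decode table: each x86 instruction maps to its micro-ops.
# Real decode rules are not public; these entries are purely illustrative.
DECODE_TABLE = {
    "add rax, [rbx]": ["load tmp, [rbx]", "add rax, tmp"],
    "inc rcx": ["inc rcx"],
    "cmp rcx, rdx": ["cmp rcx, rdx"],
    "jne loop": ["jne loop"],
}

class OpCache:
    """Toy LRU micro-op cache keyed by instruction address.

    The capacity is a made-up parameter; AMD has not disclosed Zen's size.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> list of micro-ops
        self.decodes = 0              # times the decoders had to run
        self.hits = 0                 # times decoded micro-ops were reused

    def fetch(self, address, instruction):
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            self.hits += 1
            return self.entries[address]
        # Miss: run the decoders, then remember the result.
        self.decodes += 1
        uops = DECODE_TABLE[instruction]
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry
        return uops

# A four-instruction loop body executed 100 times.
loop = [(0x10, "add rax, [rbx]"), (0x13, "inc rcx"),
        (0x16, "cmp rcx, rdx"), (0x19, "jne loop")]

cache = OpCache(capacity=8)
for _ in range(100):
    for addr, insn in loop:
        cache.fetch(addr, insn)

print(cache.decodes, cache.hits)  # 4 396
```

Without any op-cache, all 400 instruction instances would go through the decoders. With a capacity smaller than the loop body (for example `OpCache(capacity=3)` on the same loop), LRU replacement evicts each entry just before it is needed again and every fetch misses, which is one reason the op-cache's size matters.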

Aside from the expected 'branch predictor enhancements', which are as vague as they sound, AMD has not disclosed Zen's decoder arrangement at this time, but has stated that the decoders can handle four instructions per cycle to feed the micro-op queue. This queue, with the help of the op-cache, can deliver six micro-ops per cycle to the schedulers. The queue can dispatch more per cycle than the decoders supply because a single decoded instruction can split into two micro-ops (which muddies the instruction versus micro-op distinction even further), and because the op-cache can supply micro-ops without going through the decoders at all. This micro-op queue feeds the separate integer and floating point sections of the CPU. Unlike Intel, which uses a combined scheduler for INT/FP, AMD's diagram suggests that the two sections will keep their own separate schedulers at this time.
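The arithmetic behind six micro-ops per cycle from four decoded instructions per cycle can be illustrated with a toy Python model. The widths of 4 and 6 come from AMD's slides; the per-instruction micro-op patterns are assumptions, and the model deliberately ignores the op-cache path, branch redirects and the queue's finite capacity.

```python
from collections import deque

DECODE_WIDTH = 4    # x86 instructions decoded per cycle (from AMD's slides)
DISPATCH_WIDTH = 6  # micro-ops dispatched per cycle (from AMD's slides)

def simulate(uops_per_instruction, cycles):
    """Toy model of decoders feeding an (unbounded) micro-op queue.

    uops_per_instruction is a repeating pattern, e.g. [1, 2] means the
    instruction stream alternates 1-uop and 2-uop instructions. The split
    ratio is an illustrative assumption, not AMD data.
    Returns the average number of micro-ops dispatched per cycle.
    """
    queue = deque()
    pattern_pos = 0
    dispatched = 0
    for _ in range(cycles):
        # Decode stage: up to DECODE_WIDTH instructions add their uops.
        for _ in range(DECODE_WIDTH):
            n = uops_per_instruction[pattern_pos % len(uops_per_instruction)]
            pattern_pos += 1
            queue.extend(["uop"] * n)
        # Dispatch stage: up to DISPATCH_WIDTH micro-ops leave the queue.
        for _ in range(min(DISPATCH_WIDTH, len(queue))):
            queue.popleft()
            dispatched += 1
    return dispatched / cycles

print(simulate([1], 1000))     # 4.0: one uop per instruction caps dispatch at 4
print(simulate([1, 2], 1000))  # 6.0: split instructions keep all 6 slots busy
```

In a real design the queue is bounded, so if the decoders produced more than six micro-ops per cycle for long enough, they would stall rather than let the queue grow without limit as it does here.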

The INT side of the core handles the ALU operations as well as the AGU, load and store operations. The load/store units can perform two 16-byte loads and one 16-byte store per cycle, making use of the 32 KB, 8-way set associative, write-back L1 data cache. AMD has explicitly made this a write-back cache rather than the write-through cache used in Bulldozer, which was a source of a lot of idle time in particular code paths. AMD also states that loads and stores will have lower latency within the caches, but has not said by how much.
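The cache figures above can be turned into a quick worked example. This Python sketch assumes 64-byte cache lines, which AMD did not state in this presentation, and uses a deliberately simplified traffic model (every store hits in L1, and each dirty line is written back exactly once) to show why a write-back L1 sends far less traffic to the next level than a write-through design like Bulldozer's. The helper names are hypothetical.

```python
LINE_SIZE = 64          # bytes per cache line (assumed, not stated by AMD)
CACHE_SIZE = 32 * 1024  # 32 KB L1 data cache (from AMD's slides)
WAYS = 8                # 8-way set associative (from AMD's slides)

NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)  # 32768 / 512 = 64 sets
OFFSET_BITS = LINE_SIZE.bit_length() - 1     # log2(64) = 6 byte-offset bits

# Peak L1D traffic per cycle: two 16-byte loads plus one 16-byte store.
PEAK_BYTES_PER_CYCLE = 2 * 16 + 16           # 48 bytes

def set_index(address):
    """Return which of the NUM_SETS sets a byte address maps to."""
    return (address >> OFFSET_BITS) & (NUM_SETS - 1)

def next_level_writes(store_addresses, write_back):
    """Count writes sent to L2 for a stream of stores (toy model).

    Assumes every store hits in L1. A write-through cache forwards every
    store; a write-back cache marks lines dirty and writes each dirty line
    out once, when it is eventually evicted.
    """
    if not write_back:
        return len(store_addresses)
    dirty_lines = {addr // LINE_SIZE for addr in store_addresses}
    return len(dirty_lines)

# 64 consecutive 8-byte stores covering 512 bytes, i.e. 8 cache lines.
stores = [0x1000 + 8 * i for i in range(64)]

print(NUM_SETS, set_index(0x1040), PEAK_BYTES_PER_CYCLE)  # 64 1 48
print(next_level_writes(stores, write_back=False))         # 64
print(next_level_writes(stores, write_back=True))          # 8
```

Under these assumptions, addresses 4 KB apart (for example 0x1000 and 0x2000) map to the same set, which is why associativity matters for code that strides through memory in page-sized steps.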

The FP side of the core offers two multiply ports and two add ports, which should allow for two fused multiply-accumulate (FMAC) operations or one 256-bit AVX operation per cycle. Taken together, the INT and FP sections show that AMD is going for a wide core and is looking to exploit a significant amount of instruction-level parallelism. How much it can extract depends on the caches and the reorder buffers; no real data on the buffers has been given at this time, except that the cores will have a 75% larger instruction scheduler window for ordering operations and a 50% wider issue width for potential throughput than AMD's previous core. The wider cores, all other things being sufficient, should also let AMD's implementation of simultaneous multithreading (SMT) make better use of threads whose code is largely sequential and therefore naturally low in IPC, since a second thread can fill execution slots that one such thread would leave idle.
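A toy Python model can illustrate the SMT argument. Each thread is a serial chain in which every operation waits for the previous result, so a single thread issues at most one operation every `latency` cycles however wide the core is. The latency, issue width and thread counts are illustrative assumptions rather than Zen figures, and the fixed thread order in the loop means threads beyond the issue width can be starved, which a real SMT arbiter would avoid.

```python
def simulate_smt(num_threads, latency, issue_width, cycles):
    """Return whole-core IPC for threads that are serial dependency chains.

    Each operation needs the previous operation's result from the same
    thread, so one thread issues at most one op every `latency` cycles.
    All parameters are illustrative assumptions, not Zen figures.
    """
    ready_at = [0] * num_threads  # cycle at which each thread's next op is ready
    issued = 0
    for cycle in range(cycles):
        slots = issue_width       # issue slots available this cycle
        for t in range(num_threads):
            if slots > 0 and ready_at[t] <= cycle:
                issued += 1
                slots -= 1
                ready_at[t] = cycle + latency  # result ready `latency` cycles later
        # Any slots left over this cycle go unused.
    return issued / cycles

print(simulate_smt(1, latency=4, issue_width=6, cycles=1000))  # 0.25
print(simulate_smt(2, latency=4, issue_width=6, cycles=1000))  # 0.5
print(simulate_smt(8, latency=1, issue_width=6, cycles=100))   # 6.0
```

The first two lines show whole-core throughput doubling when a second serial thread shares the core; the third shows that once there is enough independent work, the issue width rather than the threads becomes the limit.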

Deciphering the New Cache Hierarchy: L1, 512 KB L2, 8 or 16 MB L3

  • patel21 - Friday, August 19, 2016

    Actually, whom are you asking these questions?
  • Peichen - Friday, August 19, 2016

    Let's hope this isn't another one of AMD's empty claims that we've all seen like 8 times over the last 10 years on CPUs, GPUs and the nonsense APUs.

    The stock tripled over the last 12 months but that's only if Zen can deliver. If Zen is another <fill in AMD product for the last 10 years>, AMD will be a dollar stock again.
  • mxnerd - Friday, August 19, 2016

    Wow. AMD stock climbs 12.5% after the news.
  • jihe - Friday, August 19, 2016

    I pray to god this is a worthwhile processor
  • just4U - Friday, August 19, 2016

    All it really needs to be is competitive on the performance front. It doesn't need to beat Intel, but hey, if it can, well, shoot... that would be interesting. Not expecting that or even hoping for it, since I think that would be unrealistic.
  • cocochanel - Friday, August 19, 2016

    If Zen is good enough, it'll take some market share away from Intel, but not much, since Intel CPUs are pretty much state of the art. However, the real advantages will come with their APUs (Zen + Polaris). The upcoming PlayStation Neo and Xbox Scorpio will use them. AMD will also go after mobile, since they have no competition there with their APUs. Intel has some powerful iGPUs, but they are nowhere near AMD APUs in performance. With the node disadvantage gone, performance and power consumption should be up there. I know the desktop diehards will disagree, but desktop sales have been falling for years. Likely causes are a move by many to mobile devices and cheap, powerful gaming consoles. I don't see that trend changing. The ARM ecosystem is also rolling along and is now beginning to creep into the so far exclusively x86 server markets. VR will also force ARM designers to come up with more powerful hardware. The next 5-10 years should be interesting.
  • Michael Bay - Friday, August 19, 2016

    Problem is, mobile itself has reached saturation and isn't so attractive anymore. Plus, what is AMD going to go there with, x86? Intel tried already.
    Node disadvantage will come back at some point, simply because it's a matter of survival for Intel.

    Where things should get interesting is the server side. ARM is hardly a threat, but AMD might have a good product here with GPU+CPU compute, likely at lower price.
  • cocochanel - Friday, August 19, 2016

    My mistake. By mobile, I meant laptops and not tablets. It's still a big market.
    Intel regaining node advantage? Mmm, from what I have seen in tech reports (and I am not a big expert), both 10nm and 7nm will be a tough nut to crack and will cost huge amounts. Compared to years past, Intel is now up against big giants (Samsung and TSMC) who are making billions every year selling tons of ARM SoCs and have the deep pockets needed for new nodes. The South Koreans and the Chinese are a smart bunch. I mean, look at the SSD market. Intel had a lock on that until Samsung decided that one was too many. Remember Thunderbolt? Nice tech, but now the market is moving away from it. Sadly, even for mighty Intel, the landscape has changed.
    I hope you're right about the server market. AMD can use any sales they can get, but then again, Intel has a lock on that and they will get aggressive and mean if necessary (it's big bucks, you know).
    ARM not a threat? Architecturally speaking, it has advantages; after all, x86 is a dinosaur, and the ARM business model is one of its biggest strengths. But you're right, the installed base for x86 is huge and it will take time. There are some big names, however (Qualcomm and others), pouring some serious money into it.
    Should be interesting.
  • Michael Bay - Saturday, August 20, 2016

    x86 is a very functional dinosaur with A LOT of companies standing behind it. ARM can license all they want; to actually break in and make those huge server monies, you need a full sw/hw/oem stack.
    I'd look at IBM's last-hurrah POWER thing as the real competitor for Intel right now, with AMD hopefully coming in soon as well.
  • BillBear - Saturday, August 20, 2016

    Google has announced their intention to open up competition in the server space by fully adopting IBM's POWER chips over their entire server software/hardware stack and is working with AMD and others to make sure they can do the same thing with the ARM based server chips in development.

    Competition is a good thing.
