Low Power, FinFET and Clock Gating

When AMD launched Carrizo and Bristol Ridge for notebooks, one of the big stories was the number of techniques AMD had implemented to reduce power consumption and thereby improve efficiency. A number of those lessons have carried over into Zen, along with a few new aspects that come into play because of the lithography.

First up is the FinFET effect. Regular readers of AnandTech and those who follow the industry will already be bored to death with FinFET, but the design allows a transistor to run at lower power at a given frequency. Of course, every company using FinFET can have a different implementation with its own power/performance characteristics, but the GlobalFoundries 14nm FinFET process that Zen uses is already a known quantity, since AMD's Polaris GPUs are built on it. FinFET, combined with AMD's confirmation that it will use the density-optimised version of 14nm FinFET (which allows smaller die sizes and more reasonable efficiency points), contributes to a shift towards either higher performance at the same power or the same performance at lower power.
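The "same performance at lower power, or more performance at the same power" choice can be illustrated with the first-order CMOS dynamic power relation, P = α·C·V²·f. The sketch below assumes, purely for illustration, that the newer process reaches a given clock at a lower supply voltage; every voltage, capacitance and activity figure in it is hypothetical and not a measured value for GlobalFoundries 14nm or for Zen, and the function name `dynamic_power` is our own.

```python
# Illustrative sketch: first-order CMOS switching power, P = alpha * C * V^2 * f.
# All numbers are hypothetical and only show the shape of the tradeoff;
# they are not measured values for GlobalFoundries 14nm or for Zen.


def dynamic_power(alpha: float, cap: float, volts: float, freq_hz: float) -> float:
    """Switching power in watts: activity factor * capacitance * V^2 * f."""
    return alpha * cap * volts ** 2 * freq_hz


ALPHA = 0.2   # hypothetical average activity factor
CAP = 1.0e-9  # hypothetical switched capacitance, in farads

# Baseline: assume an older process needs 1.20 V to reach 3.0 GHz.
base = dynamic_power(ALPHA, CAP, 1.20, 3.0e9)

# Option A (same performance, lower power): assume the FinFET process
# reaches the same 3.0 GHz at 1.00 V.
iso_freq = dynamic_power(ALPHA, CAP, 1.00, 3.0e9)

# Option B (more performance, same power): hold 1.00 V and raise the clock
# until power matches the baseline, i.e. solve P = alpha*C*V^2*f for f.
# This is optimistic: in practice higher clocks usually need more voltage.
f_iso_power = base / (ALPHA * CAP * 1.00 ** 2)

print(f"baseline power:       {base:.2f} W")
print(f"same clock, lower V:  {iso_freq:.2f} W ({iso_freq / base:.0%} of baseline)")
print(f"same power, lower V:  {f_iso_power / 1e9:.2f} GHz")
```

Because voltage enters squared, even a modest voltage reduction at a fixed clock gives a disproportionate power saving, which is the headroom a designer can spend on either lower power or higher frequency.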

AMD stated in the brief that power consumption and efficiency were constantly drilled into the engineers, and, as explained in previous briefings, there ends up being a tradeoff between performance and efficiency for a number of elements of the core (e.g. 1% more performance might cost 2% efficiency). For Zen, the micro-op cache will save power by not having to go further out to fetch instruction data, while improved prefetch and a couple of other features such as move elimination will also reduce the work done. AMD also states that cores will be aggressively clock gated to improve efficiency.
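To make the "1% performance might cost 2% efficiency" example concrete, the sketch below assumes efficiency means performance per watt (the briefing did not define it). Under that assumption, power = performance / efficiency, so the example implies roughly 3% more power for 1% more performance. The acceptance rule in `worth_it` is a hypothetical design budget for illustration only; AMD did not disclose its actual criteria.

```python
# Sketch of the perf-vs-efficiency arithmetic. "Efficiency" is assumed to mean
# performance per watt; that definition is an assumption, not AMD's statement.


def implied_power_change(perf_change: float, eff_change: float) -> float:
    """Fractional power change implied by fractional perf and perf/W changes.

    efficiency = performance / power  =>  power = performance / efficiency
    """
    return (1 + perf_change) / (1 + eff_change) - 1


def worth_it(perf_change: float, eff_change: float, max_eff_loss: float) -> bool:
    """Hypothetical design rule: accept a feature only if it improves
    performance and costs no more than `max_eff_loss` of perf/W."""
    return perf_change > 0 and eff_change >= -max_eff_loss


# The example from the text: +1% performance, -2% efficiency.
dp, de = 0.01, -0.02
print(f"implied power change: {implied_power_change(dp, de):+.1%}")
print("accept under a 1% perf/W budget:", worth_it(dp, de, max_eff_loss=0.01))
```

With these inputs the implied power change is about +3.1%, and the feature fails the hypothetical 1% perf/W budget.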

We saw with AMD's 7th Gen APUs that power gating was also a target of that design, especially since remaining at the best efficiency point for a given level of performance is usually the best policy. The way the diagram above is laid out would seem to suggest that different parts of the core could be clock gated independently depending on use (e.g. decode vs FP ports), although we were not able to confirm if this is the case. It also relies on very quick (1-2 cycle) clock gating implementations. Note that clock gating is different from power gating: clock gating stops the clock to idle logic, whereas power gating cuts off its supply entirely and is harder to implement.
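The difference between the two techniques, and why per-unit gating would matter, can be shown with a toy model. It assumes each block of the core can be gated independently (which, as noted, AMD has not confirmed), that a clock-gated block loses its switching power but keeps leaking, and that an idealised power-gated block draws nothing but takes much longer to wake. The unit names and all wattage and wake-latency figures are hypothetical; only the 1-cycle clock-gating wake echoes the 1-2 cycle figure above.

```python
# Toy model contrasting clock gating and power gating for independently gated
# units in a core. Unit names, wattages and wake latencies are hypothetical.
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ACTIVE = 0       # clocked and powered
    CLOCK_GATED = 1  # clock stopped: switching power gone, leakage remains
    POWER_GATED = 2  # supply cut: leakage gone too, but slow to wake


# Hypothetical cycles needed to return to ACTIVE from each state.
WAKE_CYCLES = {State.ACTIVE: 0, State.CLOCK_GATED: 1, State.POWER_GATED: 100}


@dataclass
class Unit:
    name: str
    dynamic_w: float  # switching power while clocked (hypothetical)
    leakage_w: float  # static power while the supply is on (hypothetical)

    def power(self, state: State) -> float:
        if state is State.ACTIVE:
            return self.dynamic_w + self.leakage_w
        if state is State.CLOCK_GATED:
            return self.leakage_w
        return 0.0  # idealised: a power-gated block draws nothing


def core_power(units, states) -> float:
    """Sum per-unit power, each unit in its own independently chosen state."""
    return sum(u.power(states[u.name]) for u in units)


decode = Unit("decode", dynamic_w=0.8, leakage_w=0.2)
fp = Unit("fp_ports", dynamic_w=1.5, leakage_w=0.4)
core = [decode, fp]

# Integer-only workload: decode stays busy while the FP ports sit idle.
for fp_state in State:
    watts = core_power(core, {"decode": State.ACTIVE, "fp_ports": fp_state})
    print(f"fp_ports {fp_state.name:<11} core={watts:.1f} W  "
          f"wake={WAKE_CYCLES[fp_state]} cycles")
```

In this model clock gating recovers most of the idle unit's power at almost no wake-up cost, while power gating saves the remaining leakage but only pays off if the unit stays idle long enough to amortise the slower wake-up.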


216 Comments


  • Ro_Ja - Thursday, August 18, 2016

    I just want Zen and hope people who are waiting for it won't be disappointed.
  • MrSpadge - Thursday, August 18, 2016

    "Unlike Bulldozer, where having a shared FP unit between two threads was an issue for floating point performance, Zen’s design is more akin to Intel’s in that each thread will appear as an independent core and there is not that resource limitation that BD had. With sufficient resources, SMT will allow the core instructions per clock to improve"

    Ian, this section makes no sense! The reason Bulldozer and kids were slow was not the module design, but simply the cores being too weak. What helps Zen is not SMT but rather the fatter cores and the power optimization. If Zen had only 2 FP execution units, the maximum FP throughput per clock would be the same as for Bulldozer, independent of whether 1 or 2 threads run on a core / module. Or similarly if a Bulldozer module would have gotten 4 FPUs.
  • Nagorak - Thursday, August 18, 2016

    As I understand it the issue was that for many purposes Bulldozer cores were really only dual core, not four true cores.
  • TheinsanegamerN - Friday, August 19, 2016

    Bulldozer had one FPU (a weaksauce FPU at that) for two cores; Zen will have one FPU per core, the way Intel does it.
  • jjj - Thursday, August 18, 2016

    You list Broadwell-E L3$ at 1.5MB per core but they got 2.5.

    AMD with less cache and likely 2 mem chans might get away with substantially lower power and smaller die as well as lower BOM for system builders and only a minor perf penalty in consumer.
  • SunnyNW - Thursday, August 18, 2016

    Seems they wanted some press since this week and since they have a Zen presentation at Hot Chips next week anyway the timing doesnt hurt.
  • SunnyNW - Thursday, August 18, 2016

    Wow that got sent ALL wrong lol....
    With them presenting at Hot Chips next week anyway, grabbing some press this week doesn't hurt.
  • extide - Thursday, August 18, 2016

    OMG, SO excited for this. Gotta say that the FinFET GPUs and Zen are some of the most anticipated releases in a long time! I remember when I used to get excited about Intel releasing new archs, but these days that's so boring!
  • SunnyNW - Thursday, August 18, 2016

    "We’ve got another couple of pieces detailing some of the AMD internal/live benchmark numbers during the presentation, as well as the dual socket server platform, the 32-core Naples server CPU, and what we saw at the event in terms of motherboard design. "

    Please hurry up and publish these benchmark numbers!! :)
  • SunnyNW - Thursday, August 18, 2016

    NICE!!
    https://www.youtube.com/watch?v=oQS8s7TOXsE
