Low Power, FinFET and Clock Gating

When AMD launched Carrizo and Bristol Ridge for notebooks, one of the big stories was how AMD had implemented a number of techniques to reduce power consumption and thereby increase efficiency. A number of those lessons have carried through to Zen, along with a few new aspects that come into play due to the lithography.

First up is the FinFET effect. Regular readers of AnandTech and those that follow the industry will already be bored to death with FinFET, but the design allows for a lower power version of a transistor at a given frequency. Of course, everyone using FinFET can have a different implementation that gives specific power/performance characteristics, but the 14nm FinFET process at GlobalFoundries that Zen uses is already a known quantity thanks to AMD’s Polaris GPUs, which are built similarly. AMD also confirmed that it will be using the density-optimised version of 14nm FinFET, which will allow for smaller die sizes and more reasonable efficiency points. Together, these contribute to a shift towards either higher performance at the same power or the same performance at lower power.

AMD stated in the brief that power consumption and efficiency were constantly drilled into the engineers. As explained in previous briefings, there ends up being a tradeoff between performance and efficiency in what can be done for a number of elements of the core (e.g. 1% performance might cost 2% efficiency). For Zen, the micro-op cache will save power by not having to go further out to get instruction data, and improved prefetch plus a couple of other features such as move elimination will also reduce the work. AMD also states that cores will be aggressively clock gated to improve efficiency.
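To put a number on the "1% performance might cost 2% efficiency" example, here is a minimal sketch that assumes efficiency means performance per watt; that definition and the function name `power_ratio` are our assumptions, not something AMD stated.

```python
def power_ratio(perf_gain, efficiency_loss):
    """Relative power after a design change.

    Assumes efficiency = performance / power (performance per watt),
    so power = performance / efficiency. Both inputs are fractions,
    e.g. 0.01 for 1%.
    """
    perf = 1.0 + perf_gain
    efficiency = 1.0 - efficiency_loss
    return perf / efficiency


# The illustrative figures from the text: +1% performance, -2% efficiency.
ratio = power_ratio(0.01, 0.02)
print(f"power rises by {100 * (ratio - 1):.2f}%")
```

Under that assumption, a feature that buys 1% performance at a 2% efficiency cost draws roughly 3% more power, which is why such trades get scrutinized element by element.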

We saw with AMD’s 7th Gen APUs that power gating was also a target with that design, especially when remaining at the best efficiency point (for a given performance level) is usually the best policy. The way the diagram above is laid out would seem to suggest that different parts of the core could be independently clock gated depending on use (e.g. decode vs FP ports), although we were not able to confirm if this is the case. This also relies on having very quick (1-2 cycle) clock gating implementations. Note that clock gating is different from power gating, which is harder to implement.
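The difference between the two techniques can be shown with a toy model. This is only a sketch: the block names, the wattage figures and the assumption that power gating removes leakage entirely are all hypothetical, and nothing here reflects Zen's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    dynamic_w: float  # switching power while clocked (hypothetical figure)
    leakage_w: float  # static power while powered (hypothetical figure)


def block_power(block, clock_gated=False, power_gated=False):
    """Power drawn by one block under the chosen gating state."""
    if power_gated:
        # Supply removed: no switching, and leakage idealised to zero.
        return 0.0
    # Clock gating stops the switching but the block still leaks.
    dynamic = 0.0 if clock_gated else block.dynamic_w
    return dynamic + block.leakage_w


def core_power(blocks, clock_gated_names=()):
    """Total power with the named blocks independently clock gated."""
    return sum(
        block_power(b, clock_gated=b.name in clock_gated_names) for b in blocks
    )


# Hypothetical two-block core: a decode block and an FP block.
blocks = [Block("decode", 1.0, 0.25), Block("fp", 2.0, 0.5)]

print(core_power(blocks))            # everything clocked
print(core_power(blocks, {"fp"}))    # FP idle and clock gated
```

In this model, clock gating an idle FP block removes its switching power but leaves its leakage, whereas power gating would remove both. The real-world catch is that cutting and restoring supply takes far longer than stopping a clock.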


216 Comments

  • looncraz - Thursday, August 18, 2016 - link

    Really, this design is like nothing Intel has.

    Intel uses a unified scheduler, and it looks from the diagram that AMD is using seven schedulers... which is just insane. Beyond both using SMT schemes and executing x86, they are very different designs.
  • e36Jeff - Thursday, August 18, 2016 - link

    Just a quick FYI, Intel is licencing the SMT technology from Sun, as they hold the US patents for it. So Intel, just like AMD, is copying Sun.
  • svan1971 - Thursday, August 18, 2016 - link

    Wow that was a hell of a burn on AMD zingy....Nothing better than rooting for Goliath huh.
  • farmergann - Thursday, August 18, 2016 - link

    Zen is actually an enlarged evolution of the Jag Cores with doubled up pipelines and SMT. Don't take my word for it either, study the link below and pay attention to what we learn about Zen. Jag/Puma+ are actually better Cores than their intel competitors despite a huge node disadvantage. AMD is back.
    http://www.realworldtech.com/jaguar/
  • msx68k - Thursday, August 18, 2016 - link

    AMD did not copy anything from Intel, because Intel did not invent the SMT technique. SMT was developed by IBM in the '60s, while CMT was developed by DEC in the '90s, and both are processor design techniques, something like RISC or CISC.
  • The_Countess - Friday, August 19, 2016 - link

    like intel copied the short pipeline of the athlon64, the on die memory controller, and the larger l1 and l2 caches, in addition to the already mentioned AMD64.
  • medi03 - Friday, August 19, 2016 - link

    That's one silly statement.
    That's the way progress works. When there is a good idea to (re-)use, you do it. Nothing wrong with it.
  • stimudent - Friday, August 19, 2016 - link

    Think about or research what you're about to say before posting.
  • SanX - Friday, August 19, 2016 - link

    I doubt that. Somebody is just pumping AMD stock. Typical bluff, none of these 200 journos have a clue about all these cache speed exchange etc, they understand only cash speed exchange. The 40% increase in processor performance they claim will actually be 20% or even 10%. And compared to Intel in 2017 - 0%. You cannot jump by a factor of 2 anymore, Moore's law is dead. And a 10-20% difference in computing means EQUAL, and all that Zen noise means NOTHING.
  • looncraz - Friday, August 19, 2016 - link

    In order for the feat they demonstrated to be real, they had to have exceeded 40% IPC over Excavator, unless their SMT is scaling unusually well.

    FX-8350 at 3GHz would take well more than twice as long. Even the FX-8350 at 4GHz would probably take twice as long.
