AMD Zen Microarchitecture: Dual Schedulers, Micro-Op Cache and Memory Hierarchy Revealed

Author: Dr. Ian Cutress

by Ian Cutress on August 18, 2016 9:00 AM EST

Deciphering the New Cache Hierarchy

The cache hierarchy is a significant departure from recent AMD designs, most likely to Zen's advantage. Compared to Bulldozer, the L1 data cache doubles in size (16 KB to 32 KB) and in associativity (4-way to 8-way), and it is now write-back rather than write-through. It also uses an asymmetric load/store implementation, recognizing that loads happen more often than stores in the critical paths of most workflows. The instruction cache is no longer shared between two cores and doubles in associativity (2-way to 4-way), which should reduce the proportion of cache misses. AMD states that both the L1-D and L1-I are low latency, with details to come.
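
To illustrate why the move from write-through to write-back matters, here is a minimal Python sketch of a toy cache that counts how many writes reach the next cache level under each policy. Everything in it is an illustrative assumption rather than a description of Zen's or Bulldozer's hardware: the function name next_level_writes, the two-line capacity, the 64-byte line size and the LRU replacement.

```python
from collections import OrderedDict


def next_level_writes(store_addresses, policy, capacity_lines=2, line_bytes=64):
    """Count writes sent to the next cache level for a sequence of stores.

    Toy fully associative cache with LRU replacement. The capacity and
    line size are illustrative assumptions, not real CPU parameters.
    """
    cache = OrderedDict()  # line number -> dirty flag, oldest entry first
    writes = 0
    for addr in store_addresses:
        line = addr // line_bytes
        if policy == "write-through":
            writes += 1          # every store is forwarded immediately
            cache[line] = False  # the cached copy never becomes dirty
        else:  # "write-back"
            cache[line] = True   # the store only marks the line dirty
        cache.move_to_end(line)  # mark this line as most recently used
        if len(cache) > capacity_lines:
            _, dirty = cache.popitem(last=False)  # evict the LRU line
            if dirty:
                writes += 1      # dirty data is written out on eviction
    # Flush whatever is still dirty at the end of the run.
    writes += sum(cache.values())
    return writes


# Eight stores that touch only two distinct 64-byte lines.
stores = [0, 8, 16, 24, 64, 72, 0, 8]
through = next_level_writes(stores, "write-through")
back = next_level_writes(stores, "write-back")
print(through, back)
```

With stores that repeatedly hit the same few lines, the write-back policy forwards one write per dirty line rather than one per store, which is the traffic saving the policy change is aiming for.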

The L2 cache sits at half a megabyte per core with 8-way associativity, double the size and associativity of Intel’s Skylake, which has 256 KB/core and is only 4-way. On the other hand, Intel’s L3/LLC on its high-end Skylake SKUs is 2 MB/core, or 8 MB/CPU, whereas Zen will feature 1 MB/core; both are 16-way associative.

Edit 7:18am: Actually, the slide above is slightly evasive in its description. It doesn't say how many cores the L3 cache is spread over, or whether there is a common LLC between all cores in the chip. However, we have received information from a source (which can't be confirmed via public AMD documents) stating that Zen will feature two sets of 8 MB L3 cache, one for each of two groups of four cores, giving 16 MB of L3 in total. This would mean 2 MB/core, but it also implies that there is no unified last-level cache in silicon across all cores, which Intel has. The reason behind a design like this typically has to do with modularity, and being able to scale a core design from low core counts to high core counts. It would still leave a Zen core with the same L3 cache per core as Intel.
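
The per-core arithmetic in that edit can be checked with a short Python sketch. The helper name l3_summary is hypothetical, and the Zen figures are the unconfirmed ones quoted above (two 8 MB L3 pools, each shared by four cores); the Skylake figures are the 8 MB/CPU and 2 MB/core from the previous paragraph, which imply four cores.

```python
def l3_summary(pools, mb_per_pool, cores_per_pool):
    """Return (total L3 in MB, L3 per core in MB, largest shared pool in MB)."""
    total_mb = pools * mb_per_pool
    total_cores = pools * cores_per_pool
    return total_mb, total_mb / total_cores, mb_per_pool


# Zen as described by the unconfirmed source: 2 pools x 8 MB, 4 cores each.
zen = l3_summary(pools=2, mb_per_pool=8, cores_per_pool=4)

# High-end Skylake: a single 8 MB LLC shared by four cores.
skylake = l3_summary(pools=1, mb_per_pool=8, cores_per_pool=4)

print("Zen:    ", zen)
print("Skylake:", skylake)
```

Under these assumptions both designs come out at 2 MB of L3 per core, and the largest pool any single core shares is 8 MB in both cases. The difference is that Zen's 16 MB total would be split into two separate pools rather than unified across the chip.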

Cache Levels
	Bulldozer FX-8150	Zen	Broadwell-E i7-6950X	Skylake i7-6700K
L1 Instruction	64 KB 2-way per module	64 KB 4-way	32 KB 8-way	32 KB 8-way
L1 Data	16 KB 4-way Write-Through	32 KB 8-way Write-Back	32 KB 8-way Write-Back	32 KB 8-way Write-Back
L2	2 MB 16-way per module	512 KB 8-way	256 KB 8-way	256 KB 4-way
L3	1 MB/core 64-way	1 or 2 MB/core ? 16-way	2.5 MB/core 16/20-way	2 MB/core 16-way
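
To make the size and associativity figures in the table more concrete, the following Python sketch converts them into a number of sets using sets = size / (ways x line size). The 64-byte line size is an assumption that does not appear in the table, and the helper name num_sets is hypothetical.

```python
LINE_BYTES = 64  # assumed cache line size; not stated in the table


def num_sets(size_kb, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache of the given size."""
    return size_kb * 1024 // (ways * line_bytes)


# (size in KB, ways) taken from the table above.
caches = {
    "Bulldozer L1-I": (64, 2),
    "Bulldozer L1-D": (16, 4),
    "Zen L1-I": (64, 4),
    "Zen L1-D": (32, 8),
    "Zen L2": (512, 8),
    "Skylake L2": (256, 4),
}

for name, (size_kb, ways) in caches.items():
    print(f"{name}: {num_sets(size_kb, ways)} sets")
```

Under the 64-byte assumption, Zen's L2 and Skylake's L2 have the same number of sets. Zen's extra capacity then comes entirely from having twice as many ways per set.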

What this means, between the L2 and the L3, is that AMD is putting more lower-level cache nearer the core than Intel, and because it is low level it is private to each core, which can potentially improve single-thread performance. The downside of bigger, lower-level (but separate) caches is that the cores have to snoop each other’s large caches to ensure that clean data is being passed around and that data held in L3 is not stale. AMD’s big headline number overall is that Zen will offer up to 5x the cache bandwidth to a core compared with previous designs.

216 Comments

DigitalFreak - Thursday, August 18, 2016 - link
Microsoft was already in the process of creating a 64bit version of Windows based on AMD's 64bit implementation (hence the reason you see AMD64 everywhere in 64bit Windows). Microsoft basically told Intel they were not going to support two competing implementations of "x64", so Intel caved and adopted the AMD64 implementation.
tygrus - Thursday, September 8, 2016 - link
They license the ISAs, i.e. the use of the instructions and the expected output. The whole silicon designs are not cross-licensed. They probably have some IP of the silicon cross-licensed, but the major point was that they could handle the same instructions and be mostly compatible. AMD could only fully copy 486 and earlier designs. You can copy and implement the same ISA without having the same silicon. Intel had started a design for x86-64, but the front-end decoding and instructions were changed to be compatible. With micro/macro ops and microcoding there can be a lot of abstraction between ISA and execution. Intel made at least one mistake with their early AMD64 implementation that had to have workarounds and a later fix.
frenchy_2001 - Thursday, August 18, 2016 - link
Opposite.
Intel was vehement at the time that 64 bits needed to be a clean break from x86 and were pushing for their Itanium processors, implementing IA64 (completely incompatible with x86).
The market followed AMD, especially since they had the better architecture at the time (Athlon64, with 64 bits and in-processor memory controllers, faster interconnect, better server scaling...).
Intel then licensed AMD64 and rebranded it EM64T or x86-64.
wifiwolf - Friday, August 19, 2016 - link
wow. finally someone who remembers that time correctly. Intel pushed for Itanium for too much time, even after they adopted amd's 64bit implementation. They eventually had to drop it as it never got enough market.
Samus - Sunday, August 21, 2016 - link
Microsoft did make an IA64 edition on NT and 2000 but without x86 compatibility there were no apps. The genius behind AMD's 64 bit implementation is it is simply a memory extension of x86 with 64 bit integer registers, maintaining complete 32-bit compatibility with no real impact on 32 bit performance, while costing very little die space for the extensions.

Microsoft and software developers saw this and basically told Intel their Itanium dreams were not going to come true.
anubis44 - Monday, August 22, 2016 - link
And the genius behind that 'genius' was none other than Jim Keller, the man who also just designed the upcoming Zen processor family.
Visual - Tuesday, August 23, 2016 - link
No, the IA64 architecture of Itanium does not try to keep any backwards-compatibility with x86, so any mention of it even being considered as an alternative to AMD64 is absurd. At that time the world was just not ready for a compatibility-breaking switch.
Kevin G - Tuesday, August 23, 2016 - link
The ISA didn't directly try to keep backwards compatibility but Intel did put some x86 functionality into the first few generations of Itanium. This was later removed in chips post 2006.

https://en.wikipedia.org/wiki/IA-32_Execution_Laye...
Gigaplex - Thursday, August 18, 2016 - link
Which is a legal term to describe "copying with permission".
pikunsia - Friday, August 19, 2016 - link
AMD cannot copy "TM" Intel technologies as this is a crime with criminal consequences. All is managed through licenses and royalties.
