Thoughts and Comparisons

Throughout AMD's road to releasing details on Zen, we have had a chance to examine the information on the microarchitecture often earlier than we had expected to each point in the Zen design/launch cycle. Part of this is due to the fact that internally, AMD is very proud of their design, but some extra details (such as the extent of XFR, or the size of the micro-op cache), AMD has held close to its chest until the actual launch. With the data we have at hand, we can fill out a lot of information for a direct comparison chart to AMD’s last product and Intel’s current offerings.

CPU uArch Comparison
  AMD Intel
  Zen
8C/16T
2017
Bulldozer
4M / 8T
2010
Skylake
Kaby Lake
4C / 8T
2015/7
Broadwell
8C / 16T
2014
L1-I Size 64KB/core 64KB/module 32KB/core 32KB/core
L1-I Assoc 4-way 2-way 8-way 8-way
L1-D Size 32KB/core 16KB/thread 32KB/core 32KB/core
L1-D Assoc 8-way 4-way 8-way 8-way
L2 Size 512KB/core 1MB/thread 256KB/core 256KB/core
L2 Assoc 8-way 16-way 4-way 8-way
L3 Size 2MB/core 1MB/thread >2MB/cire 1.5-3MB/core
L3 Assoc 16-way 64-way 16-way 16/20-way
L3 Type Victim Victim Write-back Write-back
L0 ITLB Entry 8 - - -
L0 ITLB Assoc ? - - -
L1 ITLB Entry 64 72 128 128
L1 ITLB Assoc ? Full 8-way 4-way
L2 ITLB Entry 512 512 1536 1536
L2 ITLB Assoc ? 4-way 12-way 4-way
L1 DTLB Entry 64 32 64 64
L1 DTLB Assoc ? Full 4-way 4-way
L2 DTLB Entry 1536 1024 - -
L2 DTLB Assoc ? 8-way - -
Decode 4 uops/cycle 4 Mops/cycle 5 uops/cycle 4 uops/cycle
uOp Cache Size 2048 - 1536 1536
uOp Cache Assoc ? - 8-way 8-way
uOp Queue Size ? - 128 64
Dispatch / cycle 6 uops/cycle 4 Mops/cycle 6 uops/cycle 4 uops/cycle
INT Registers 168 160 180 168
FP Registers 160 96 168 168
Retire Queue 192 128 224 192
Retire Rate 8/cycle 4/cycle 8/cycle 4/cycle
Load Queue 72 40 72 72
Store Queue 44 24 56 42
ALU 4 2 4 4
AGU 2 2 2+2 2+2
FMAC 2x128-bit 2x128-bit
2x MMX 128-bit
2x256-bit 2x256-bit

Bulldozer uses AMD-coined macro-ops, or Mops, which are internal fixed length instructions and can account for 3 smaller ops. These AMD Mops are different to Intel's 'macro-ops', which are variable length and different to Intel's 'micro-ops', which are simpler and fixed-length.

Excavator has a number of improvements over Bulldozer, such as a larger L1-D cache and a 768-entry L1 BTB size, however we were never given a full run-down of the core in a similar fashion and no high-end desktop version of Excavator will be made.

This isn’t an exhaustive list of all features (thanks to CPU WorldReal World Tech and WikiChip for filling in some blanks) by any means, and doesn’t paint the whole story. For example, on the power side of the equation, AMD is stating that it has the ability to clock gate parts of the core and CCX that are not required to save power, and the L3 runs on its own clock domain shared across the cores. Or the latency to run certain operations, which is critical for workflow if a MUL operation takes 3, 4 or 5 cycles to complete. We have been told that the FPU load is two cycles quicker, which is something. The latency in the caches is also going to feature heavily in performance, and all we are told at this point is that L2 and L3 are lower latency than previous designs.

A number of these features we’ve already seen on Intel x86 CPUs, such as move elimination to reduce power, or the micro-op cache. The micro-op cache is a piece of the puzzle we wanted to know more about from day one, especially the rate at which we get cache hits for a given workload. Also, the use of new instructions will adjust a number of workloads that rely on them. Some users will lament the lack of true single-instruction AVX-2 support, however I suspect AMD would argue that the die area cost might be excessive at this time. That’s not to say AMD won’t support it in the future – we were told quite clearly that there were a number of features originally listed internally for Zen which didn’t make it, either due to time constraints or a lack of transistors.

We are told that AMD has a clear internal roadmap for CPU microarchitecture design over the next few generations. As long as we don’t stay for so long on 14nm similar to what we did at 28/32nm, with IO updates over the coming years, a competitive clock-for-clock product (even to Broadwell) with good efficiency will be a welcome return.

Power, Performance, and Pre-Fetch: AMD SenseMI Chipsets and Motherboards
Comments Locked

574 Comments

View All Comments

  • deltaFx2 - Wednesday, March 8, 2017 - link

    @Meteor2: No. Consumer GPUs have poor throughput for Double precision FP. So you can't push those to the GPU (unless you own those super-expensive Nvidia compute cards). Apparently, many rendering/video editing programs use GPUs for preview but do the final rendering on CPU. Quality, apparently, and might be related to DP FP. I'm not the expert, so if you know otherwise, I'd be happy to be corrected and educated. Also, you could make the same argument about AVX-256.

    The quoted paragraph is probably the only balanced statement in that entire review. Compare the tone of that review with AT review above.

    On an unrelated note, there's the larger question of running games at low res on top-end gpus and comparing frame-rates that far exceed human perception. I know, they have to do something, so why not just do this. The rationale is: " In future a faster GPU in future will create a bottleneck ". If this is true, it should be easy to demonstrate, right? Just dig through a history of Intel desktop CPUs paired with increasingly powerful GPUs and see how it trends. There's not one reviewer that has proven that this is true. It's being taken as gospel. OTOH, plenty of folks seem happy with their Sandy Bridge + Nvidia 1080, so clearly the bottleneck isn't here 5 years after SB. Maybe, just maybe, it's because the differences are imperceptible?

    Ryzen clearly has some bottlenecks but the whole gaming thing is a tempest in a tea-cup.
  • theuglyman0war - Thursday, March 9, 2017 - link

    ZBRUSH

    probably 90% of all 3d assets that are created from concept ( NOT SCANNED )
    Went through Zbrush at some point.

    Which means no GPU acceleration at all.
    Renderman
    Maxwell
    Vray
    Arnold
    still all use CPU rendering As do a mountain of other renderers.
    Arnold will be getting an option
    But the two popular GPU renderers are Otoy Octane and Redshift...
    The have their excellent expensive place. But the majority of rendering out there is still suffered through software rendering. And will always be a valid concern as long as they come FREE built into major DCC applications.
  • theuglyman0war - Thursday, March 9, 2017 - link

    Saw that same GPU trumps CPU render validity concerns...
    Comment and had a good laugh.
    I'll remember to spread that around every time I see Renderman Vray Arnold Maxwell sans GPU rendering going on.
    Or the next time a Mercury engine update negates all non Quadro GPU acceleration.

    To be fair a lot of creative pros and tech artists seem to disagree with me but...
    The only time between pulling vrts in Maya and brushing a surface in Zbrush that I really feel that I am suffering buckets of tears and desire a new CPU ( still on i7-980x ) is when I am cussing out a progress bar that is teasing me with it's slow progress. And that means CORES! encoding... un compressing... Rendering! Otherwise I could probably not notice day to day on a ten year old CPU. ( excluding CPU bound gaming of course... talking bout day to day vrt pulling )
    I was just as productive in 2007 as I am today.
  • MaidoMaido - Saturday, March 4, 2017 - link

    Been trying to find a review including practical benchmarks for common video editing / motion graphics applications like After Effects, Resolve, Fusion, Premiere, Element 3D.

    In a lot of these tasks, the multithreading is not always the best, as a result quad core 6700K often outperforms the more expensive Xeon and 5960X etc
  • deltaFx2 - Saturday, March 4, 2017 - link

    I would recommend this response to the GamersNexus hit piece: https://www.reddit.com/r/Amd/comments/5xgonu/analy...

    The i5 level performance is a lie.
  • Notmyusualid - Saturday, March 4, 2017 - link

    @ deltaFx2

    Sorry, not reading a 4k worded response. I'll wait for Anand to finish its Ryzen reviews before I draw any final conclusions.
  • Meteor2 - Tuesday, March 7, 2017 - link

    @deltaFX2 RE: in the 4k word Reddit 'rebuttal', what that person seems to be saying, is that once you've converted your $500 Ryzen 1800X into a 8C/8T chip, _then_ it beats a $240 i5, while still falling short of the $330 i7. Out-of-the-box, it has worse gaming performance than either Intel chip.

    That's not exactly a ringing endorsement.

    The analysis in the Anandtech forums, which concludes that in a certain narrow and low power band a heavily down-clocked 1800X happens to get excellent performance/W, isn't exactly thrilling either.
  • deltaFx2 - Wednesday, March 8, 2017 - link

    @ Meteor2: The anandtech forum thing: Perf/watt matters for servers and laptop. Take a look at the IPC numbers too. His average is that Zen == Broadwell IPC, and ~10% behind Sky/Kaby lake (except for AVX256 workloads). That's not too shabby at all for a $300 part.

    You completely missed the point of the reddit rebuttal. The GN reviewer drops i5s from plenty of tests citing "methodological reasons", but then says R7==i5 in gaming. The argument is that plenty of games use >4 threads and that puts i5 at a disadvantage.
  • tankNZ - Sunday, March 5, 2017 - link

    yes I agree, it's even better than okay for gaming[img]http://smsh.me/li3a.png[/img]
  • deltaFx2 - Monday, March 6, 2017 - link

    You may wish to see this though: https://forums.anandtech.com/threads/ryzen-strictl... Way, way, more detailed than any tech media review site can hope to get. No, it's got nothing to do with gaming. Gaming isn't the story here. AMD's current situation in x86 market share had little to do with gaming efficiency, but perf/watt.

    I'll quote the author: "850 points in Cinebench 15 at 30W is quite telling. Or not telling, but absolutely massive. Zeppelin can reach absolutely monstrous and unseen levels of efficiency, as long as it operates within its ideal frequency range."

Log in

Don't have an account? Sign up now