Some Final Thoughts and Comparisons
106 Comments

  • Bulat Ziganshin - Wednesday, August 24, 2016 - link

    I think it's obvious from the number of ALUs that the 40% improvement is for scalar single-threaded code that greatly benefits from access to all 4 integer ALUs. Of course, it will get the same benefit for any code running up to 8 threads (for 8-core Zen). But anyway it should be slower than Kaby Lake, since Intel has spent much more time optimizing their CPUs.

    For m/t execution, improvements will be much smaller, 10-20%, I think. Plus, an 8-core CPU will probably run at a lower frequency than 4-core Bulldozers or 4-core Kaby Lake. AFAIK, even Intel 8-core CPUs run at only 3.2 GHz, and that's after many years of power optimization. We also know that *selected* Zen CPUs run at 3.2 GHz in benchmarks. So I expect either a < 3 GHz frequency or a 200 W power budget.
  • atomsymbol - Wednesday, August 24, 2016 - link

    "For m/t execution, improvements will be much smaller, 10-20%, I think."

    There are bottlenecks in the Bulldozer family when a module is running two threads. An improvement of 40% for m/t Zen execution relative to Bulldozer m/t execution is possible. It is a question of what the baseline of measurement is.
  • Bulat Ziganshin - Wednesday, August 24, 2016 - link

    M/t execution in Bulldozer can already use all 4 INT ALUs, so I think that a 40% IPC improvement is impossible. In other words, if s/t IPC improved by 40% by moving from a 2-ALU to a 4-ALU arrangement, m/t performance, which keeps the same 4-ALU arrangement, can hardly improve by more than 20%.
  • looncraz - Wednesday, August 24, 2016 - link

    IPC is NOT MT, it is ST only.

    IPC is per-core, per-thread, per-clock, instruction retire rate... which generally equates to performance per clock per core per thread.
  • Bulat Ziganshin - Thursday, August 25, 2016 - link

    You can measure instructions per cycle for a thread, 3 threads, a core, a CPU, or anything else. What's the problem?

    My point is that s/t speed on Zen is improved much more than m/t speed, compared to the last of the Bulldozer family. So they advertised the improvement in s/t speed, which is 40%. The m/t improvement is much smaller since it's still the same 4 ALUs (although many other parts became wider).
  • atomsymbol - Thursday, August 25, 2016 - link

    AMD's presentation compared Zen to Broadwell in an m/t workload with all CPU cores utilized.

    From

    http://www.cpu-world.com/Compare/528/AMD_A10-Serie...

    you can compute that the Blender-specific speedup of Zen over a previous AMD design is about 100/38.8=2.58
  • Bulat Ziganshin - Thursday, August 25, 2016 - link

    Can you compute the IPC improvement that we are discussing here?
  • looncraz - Thursday, August 25, 2016 - link

    Except you're absolutely wrong, the performance increases will be much higher for MT than ST.

    Bulldozer was hindered by the module design, so you had poor MT scaling - not an issue with Zen. On top of that, Zen has SMT, which should add another 20% or so more MT performance for the same number of cores.

    A 40% ST improvement for Zen could easily mean a 100% performance improvement for MT.
  • Bulat Ziganshin - Saturday, August 27, 2016 - link

    It's not "on top of that". Zen is a pretty simple Bulldozer modification that finally allows a single thread to use all 4 scalar ALUs in the module. That's why scalar s/t performance should be 40% faster. OTOH, two threads in the module still share those 4 scalar ALUs as before, so m/t performance cannot improve much. On top of that, the module was renamed to a core. So there are 2x more cores now, and of course m/t performance of the entire CPU will be 2x higher.
  • Nenad - Thursday, September 8, 2016 - link

    It is possible that AMD already counts SMT (hyperthreading) in that 40%.

    Their slide, which states "40% IPC Performance Uplift", also lists all the things that AMD used to achieve that 40%... and first among those listed things is "Two threads per core". So if AMD already counted that in their 40% IPC uplift, then the 'real' IPC improvement (for a single thread) would be much lower.
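
The arithmetic argued over in this thread can be sketched in a few lines of Python. This is only an illustration of how the quoted numbers combine, not a measurement: the function names are made up for this sketch, the 100 and 38.8 figures are taken as relative scores from the comment above, and the core-count and clock ratios are hypothetical placeholders, since the thread does not give them. It shows why a total speedup is not an IPC figure until core count and clock are divided out, and that a 40% s/t gain plus roughly 20% from SMT multiply to 1.68x, so reaching 2x would need about another 1.19x from better multi-thread scaling.

```python
def speedup(old_score: float, new_score: float) -> float:
    """Ratio of two higher-is-better scores."""
    return new_score / old_score


def per_core_per_clock_gain(total: float, core_ratio: float, clock_ratio: float) -> float:
    """Divide core-count and clock differences out of a total speedup."""
    return total / (core_ratio * clock_ratio)


def required_scaling_gain(target_mt: float, st_gain: float, smt_gain: float) -> float:
    """Extra multi-thread scaling needed for s/t and SMT gains to reach target_mt."""
    return target_mt / (st_gain * smt_gain)


# Relative Blender scores quoted in the thread (100 vs 38.8).
blender = speedup(38.8, 100.0)
print(f"total Blender speedup: {blender:.2f}x")

# HYPOTHETICAL ratios: the thread does not state core counts or clocks.
ipc_like = per_core_per_clock_gain(blender, core_ratio=2.0, clock_ratio=1.0)
print(f"per-core, per-clock gain under those assumptions: {ipc_like:.2f}x")

# Figures from the thread: 40% s/t gain, about 20% from SMT, 100% m/t target.
extra = required_scaling_gain(target_mt=2.0, st_gain=1.4, smt_gain=1.2)
print(f"extra m/t scaling needed to reach 2x: {extra:.2f}x")
```

Changing `core_ratio` and `clock_ratio` to the real values for the two chips being compared is what would turn the Blender number into something comparable to the advertised 40% figure.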
