Benchmarking Performance: CPU Office Tests

The office programs we use for benchmarking aren't specific programs per-se, but industry standard tests that hold weight with professionals. The goal of these tests is to use an array of software and techniques that a typical office user might encounter, such as video conferencing, document editing, architectural modelling, and so on and so forth. At present we have two such tools to use.

PCMark8

Despite originally coming out in 2008/2009, Futuremark has maintained PCMark8 to remain relevant in 2017. On the scale of complicated tasks, PCMark focuses more on the low-to-mid range of professional workloads, making it a good indicator for what people consider 'office' work. We run the benchmark from the commandline in 'conventional' mode, meaning C++ over OpenCL, to remove the graphics card from the equation and focus purely on the CPU. PCMark8 offers Home, Work and Creative workloads, with some software tests shared and others unique to each benchmark set. The Creative workload unfortunately seems to fail from the commandline, as the graphical test gives an output of zero (update 3/8: we've found a way around this; will update when we get CPUs retested).

Office: PCMark8 Home (non-OpenCL)

Office: PCMark8 Work (non-OpenCL)

Addendum on 3/8: Originally we posted PCM8 Home scores for Ryzen that were around 3800. On further inspection, these runs were misconfigured due to circumstances beyond our control, and test data is being re-run. The Ryzen 7 1800X in this instance scores 4515.

SYSmark 2014

SYSmark is developed by Bapco, a consortium of industry CPU companies. The goal of SYSmark is to take stripped down versions of popular software, such as Photoshop and Onenote, and measure how long it takes to process certain tasks within that software. The end result is a score for each of the three segments (Office, Media, Data) as well as an overall score. Here a reference system (Core i3-4130, 4GB DRAM, 500GB HDD) is used to provide a baseline score of 1000. A newer version of the benchmark (2014 SE) will be used in future reviews.

A note on contect for these numbers. AMD left Bapco in the last two years, due to differences of opinion on how the benchmarking suites were angled towards Intel processors and had optimizations to show bigger differences than what AMD felt was present. The following benchmarks are provided as data, but the conflict of opinion between the two companies on the validity of the benchmark is provided as context for the following numbers.

SYSmark 2014 - Office Productivity

SYSmark 2014 - Media Creation

SYSmark 2014 - Data and Financial Analysis

SYSmark 2014 - Overall

 

Benchmarking Performance: CPU Encoding Tests Benchmarking Performance: CPU Legacy Tests
Comments Locked

574 Comments

View All Comments

  • nt300 - Saturday, March 11, 2017 - link

    If AMD hadn't gone with GF's 14nm process, ZEN would probably have been delayed. I think as soon as Ryzen Optimizations come out, these chips will further outperform.
  • MongGrel - Thursday, March 9, 2017 - link


    For some reason making a casual comment about anything bad about the chip will get you banned at the drop of a hat on the tech forums, and then if you call him out they will ban you more.

    https://arstechnica.com/gadgets/2017/03/amds-momen...

  • MongGrel - Thursday, March 9, 2017 - link

    For some reason, MarkFW seems to thinks he is the reincarnation of Kyle Bennet, and whines a lot before retreating to his safe space.
  • nt300 - Saturday, March 11, 2017 - link

    I've noticed in the past that AMD has an issue with increasing L3 cache speed and/or Latencies. Hopefully they start tightening the L3 as much as possible. Can Anandtech do a comparison between Ryzen before Optimizations and after Optimizations. Ty
  • alpha754293 - Friday, March 17, 2017 - link

    Looks like that for a lot of the compute-intensive benchmarks, the new Ryzen isn't that much better than say a Core i5-7700K.

    That's quite a bit disappointing.

    AMD needs to up their FLOPS/cycle game in order to be able to compete in that space.

    Such a pity because the original Opterons were a great value proposition vs. the Intels. Now, it doesn't even come close.
  • deltaFx2 - Saturday, March 25, 2017 - link

    @Ian Cutress: When you do test gaming, if you can, I'd love to have the hypothesis behind the 'generally accepted methodology' tested out. The methodology being, to test it at lowest resolution. The hypothesis is that this stresses the CPU, and that a future, higher performance GPU will be bottlenecked by the slower CPU. Sounds logical, but is it?

    Here's the thing: Typically, when given more computing resources, people scale up their problem to utilize those resources. In other words, if I give you a more powerful GPU, games will scale up their perf requirements to match it, by doing stuff that were not possible/practical in earlier GPUs. Today's games are far more 'realistic' and are played at much higher resolutions than say 5 years ago. In which case, the GPU is always the limiting factor no matter what (unless one insists on playing 5 year old games on the biggest, baddest GPU). And I fully expect that the games of today are built to max out current GPUs, so hardware lags software.

    This has parallels with what happens in HPC: when you get more compute nodes for HPC problems, people scale up the complexity of their simulations rather than running the old, simplified simulations. Amdahl's law is still not a limiting factor for HPC, and we seem to be talking about Exascale machines now. Clearly, there's life in HPC beyond what a myopic view through the Amdahl law lens would indicate.

    Just a thought :) Clearly, core count requirements have gone up over the last decade, but is it true that a 4c/8t sandy bridge paired up with Nvidia's latest and greatest is CPU-bottlenecked at likely resolutions?
  • wavelength - Friday, March 31, 2017 - link

    I would love to see Anand test against AdoredTV's most recent findings on Ryzen https://www.youtube.com/watch?v=0tfTZjugDeg
  • LawJikal - Friday, April 21, 2017 - link

    What I'm surprised to see missing... in virtually all reviews across the web... is any discussion (by a publication or its readers) on the AM4 platform's longevity and upgradability (in addition to its cost, which is readily discussed).

    Any Intel Platform - is almost guaranteed to not accommodate a new or significantly revised microarchitecture... beyond the mere "tick". In order to enjoy a "tock", one MUST purchase a new motherboard (if historical precedent is maintained).

    AMD AM4 Platform - is almost guaranteed to, AT LEAST, accommodate Ryzen "II" and quite possibly Ryzen "III" processors. And, in such cases, only a new processor and BIOS update will be necessary to do so.

    This is not an insignificant point of differentiation.
  • PeterCordes - Monday, June 5, 2017 - link

    The uArch comparison table has some errors for the Intel columns. Dispatch/cycle: Skylake can read 6 uops per clock from the uop cache into the issue queue, but the issue stage itself is still only 4 uops wide. You've labelled Even running from the loop buffer (LSD), it can only sustain a throughput of 4 uops per clock, same 4-wide pipeline width it has been since Core2. (pre-Haswell it has to be a mix of ALU and some store or load to sustain that throughput without bottlenecking on the execution ports.) Skylake's improved decode and uop-cache bandwidth lets it refill the uop queue (IDQ) after bubbles in earlier stages, keeping the issue stage fed (since the back-end is often able to actually keep up).

    Ryzen is 6-wide, but I think I've read that it can only issue 6 uops per clock if some of them are from "double instructions". e.g. 256-bit AVX like VADDPS ymm0, ymm1, ymm2 that decodes to two separate 128-bit uops. Running code with only single-uop instructions, the Ryzen's front-end throughput is 5 uops per clock.

    In Intel terminology, "dispatch" is when the scheduler (aka Reservation Station) sends uops to the execution units. The row you've labelled "dispatch / cycle" is clearly the throughput for issuing uops from the front-end into the out-of-order core, though. (Putting them into the ROB and Reservation Station). Some computer-architecture people call that "dispatch", but it's probably not a good idea in an x86 context. (Unless AMD uses that terminology; I'm mostly familiar with Intel).

    ----

    You list the uop queue size at 128 for Skylake. This is bogus. It's always 64 per thread, with or without hyperthreading. Intel has alternated in SnB/IvB/HSW/SKL between this and letting one thread use both queues as a single big queue. HSW/BDW statically partition their 56-entry queue into two 28-entry halves when two threads are active, otherwise it's a 56-entry queue. (Not 64). Agner Fog's microarch pdf and Intel's optmization manual both confirm this (in Section 2.1.1 about Skylake's front-end improvements over previous generations).

    Also, the 4-uop per clock issue width is 4 fused-domain uops, so I was able to construct a loop that runs 7 unfused-domain uops per clock (http://www.agner.org/optimize/blog/read.php?i=415#... with 2 micro-fused ALU+load, one micro-fused store, and a dec/branch. AMD doesn't talk about "unfused" uops because it doesn't use a unified scheduler, IIRC, so memory source operands always stay with the ALU uop.

    Also, you mentioned it in the text, but the L1d change from write-through to write-back is worth a table row. IIRC, Bulldozer's L1d write-back has a small buffer or something to absorb repeated writes of the same lines, so it's not quite as bad as a classic write-through cache would be for L2 speed/power requirements, but Ryzen is still a big improvement.

Log in

Don't have an account? Sign up now