CPU Performance: Encoding Tests

With the rise of streaming, vlogs, and video content as a whole, encoding and transcoding tests are becoming ever more important. Not only do more home users and gamers need to convert video files into something more manageable for streaming or archival purposes, but the servers that handle the output also deal with data and log files through compression and decompression. Our encoding tasks focus on these important scenarios, with input from the community on the best implementation of real-world testing.

All of our benchmark results can also be found in our benchmark engine, Bench.

Handbrake 1.1.0: Streaming and Archival Video Transcoding

A popular open source tool, Handbrake is the anything-to-anything video conversion software that a number of people use as a reference point. The danger always lies in version numbers and optimization; for example, the latest versions of the software can take advantage of AVX-512 and OpenCL to accelerate certain types of transcoding and algorithms. The version we use here is a pure CPU play, with common transcoding variations.

We have split Handbrake up into several tests, using a Logitech C920 1080p60 native webcam recording (essentially a streamer recording) and converting it into two streaming formats and one archival format. The output settings used are:

  • 720p60 at 6000 kbps constant bit rate, fast setting, high profile
  • 1080p60 at 3500 kbps constant bit rate, faster setting, main profile
  • 1080p60 HEVC at 3500 kbps variable bit rate, fast setting, main profile
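
As a sketch of how these three runs could be scripted, the following assembles HandBrakeCLI argument lists matching the settings above. The binary path and file names are assumptions for illustration; the flags (`-e`, `-b`, `--encoder-preset`, `--encoder-profile`, `-r`, `--height`) are standard HandBrakeCLI options.

```python
# Sketch only: building the three HandBrakeCLI invocations described above.
HANDBRAKE = "HandBrakeCLI"  # assumed to be on PATH

# (label, encoder, output height, avg bitrate kbps, preset, profile)
RUNS = [
    ("720p60 x264 Fast",    "x264", 720,  6000, "fast",   "high"),
    ("1080p60 x264 Faster", "x264", 1080, 3500, "faster", "main"),
    ("1080p60 HEVC Fast",   "x265", 1080, 3500, "fast",   "main"),
]

def build_cmd(src, dst, encoder, height, kbps, preset, profile):
    """Assemble a HandBrakeCLI argument list for one transcode run."""
    return [
        HANDBRAKE, "-i", src, "-o", dst,
        "-e", encoder,
        "-b", str(kbps),                 # average bitrate in kbps
        "--encoder-preset", preset,
        "--encoder-profile", profile,
        "-r", "60",                      # keep the 60 fps source rate
        "--height", str(height),
    ]

# Example: the first (720p60 streaming) run, with hypothetical file names.
cmd = build_cmd("webcam.mp4", "out.mp4", *RUNS[0][1:])
print(" ".join(cmd))
```

The actual benchmark would pass each list to `subprocess.run()` and time it; only the argument construction is shown here.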

[Graphs: Handbrake 1.1.0 - 720p60 x264 6000 kbps Fast; 1080p60 x264 3500 kbps Faster; 1080p60 HEVC 3500 kbps Fast]

7-zip v1805: Popular Open-Source Encoding Engine

Out of our compression/decompression tool tests, 7-zip is the most requested and comes with a built-in benchmark. For our test suite, we’ve pulled the latest version of the software and we run the benchmark from the command line, reporting the compression, decompression, and a combined score.
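
The command-line form of that benchmark is simply `7z b`, which prints summary rows of ratings. As an illustration, this sketch pulls the compression and decompression ratings (in MIPS) out of the "Avr:" row; the column positions are an assumption based on the v18.05 output layout (Speed, Usage, R/U, Rating for compression, then the same four for decompression), and the sample numbers are made up.

```python
import re

def parse_avr_line(output):
    """Extract (compression, decompression, combined) MIPS ratings from
    7-Zip's 'Avr:' summary row. Assumes the v18.05-style column layout:
    four numbers for compression, then four for decompression, with the
    Rating column last in each half."""
    for line in output.splitlines():
        if line.lstrip().startswith("Avr:"):
            nums = [int(n) for n in re.findall(r"\d+", line)]
            comp, decomp = nums[3], nums[7]
            return comp, decomp, (comp + decomp) // 2
    raise ValueError("no 'Avr:' row found in benchmark output")

# Illustrative row only -- these are not real results:
sample = "Avr:  31000  355  8000  28400  |  420000  392  9100  35700"
print(parse_avr_line(sample))  # -> (28400, 35700, 32050)
```

In practice the output would come from `subprocess.run(["7z", "b"], ...)` rather than a hard-coded string.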

It is noted in this benchmark that the latest multi-die processors show very bi-modal performance between compression and decompression, performing well in one and poorly in the other. There are also discussions around how the Windows scheduler is assigning threads. As we get more results, it will be interesting to see how this plays out.

Please note, if you plan to share the Compression graph, please include the Decompression one as well. Otherwise you're only presenting half the picture.

[Graphs: 7-Zip 1805 Compression; 7-Zip 1805 Decompression; 7-Zip 1805 Combined]

WinRAR 5.60b3: Archiving Tool

My compression tool of choice is often WinRAR, having been one of the first tools a number of my generation used over two decades ago. The interface has not changed much, although the integration with Windows right click commands is always a plus. It has no in-built test, so we run a compression over a set directory containing over thirty 60-second video files and 2000 small web-based files at a normal compression rate.

WinRAR is variably threaded and also susceptible to caching, so in our test we run it 10 times and take the average of the last five runs, leaving a result that reflects raw CPU compute performance.
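
A minimal sketch of that run-10-average-5 methodology is below. The averaging helper is generic; the WinRAR invocation itself is left as a commented hypothetical, since the binary name and flags depend on the installation.

```python
import subprocess, time

def avg_of_last(times, keep=5):
    """Average only the last `keep` samples: the first runs warm the OS
    file cache, so the remainder reflects raw CPU compute performance."""
    return sum(times[-keep:]) / keep

def timed_runs(cmd, runs=10):
    """Run `cmd` the given number of times, timing each run with a
    wall clock, and return the cache-warm average in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append(time.perf_counter() - start)
    return avg_of_last(times)

# Hypothetical invocation -- binary name and flags are illustrative
# ("a" = add to archive, "-m3" = normal compression):
# avg_seconds = timed_runs(["rar", "a", "-m3", "out.rar", "testdir/"])
```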

[Graph: WinRAR 5.60b3]

AES Encryption: File Security

A number of platforms, particularly mobile devices, now offer encryption by default at the file-system level in order to protect the contents. Windows-based devices have these options as well, often provided by BitLocker or third-party software. In our AES encryption test, we use the discontinued TrueCrypt for its built-in benchmark, which tests several encryption algorithms directly in memory.

The data we take for this test is the combined AES encrypt/decrypt performance, measured in gigabytes per second. The software does use the AES-NI instructions on processors that offer hardware acceleration, however not AVX-512.
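
The shape of such an in-memory benchmark (apply a transform to a RAM-resident buffer repeatedly, then report throughput) can be sketched as follows. Python's standard library has no AES primitive, so the transform here is an explicitly labeled stand-in rather than real encryption, and the decimal GB/s scaling is a simplifying assumption.

```python
import time

def bench_in_memory(transform, buf_mib=64, seconds=1.0):
    """In-memory throughput benchmark: repeatedly apply `transform` to a
    fixed RAM buffer for roughly `seconds`, then report GB/s (decimal,
    1 GB = 1e9 bytes). In the real test `transform` would be an AES
    encrypt or decrypt; any callable works for this sketch."""
    buf = bytes(buf_mib * 1024 * 1024)  # zero-filled test buffer
    done = 0
    start = time.perf_counter()
    while time.perf_counter() - start < seconds:
        transform(buf)
        done += len(buf)
    return done / (time.perf_counter() - start) / 1e9

# Stand-in transform (NOT AES) -- a cheap whole-buffer pass:
# gbps = bench_in_memory(lambda b: bytes(memoryview(b)))
```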

[Graph: AES Encoding]

More than a slight regression here in our AES testing - this is probably the most severe of all our tests for how the security fixes have affected performance.

79 Comments

  • Thanny - Wednesday, November 27, 2019 - link

    Zen does not support AVX-512 instructions. At all.

    AVX-512 is not simply AVX-256 (AKA AVX2) scaled up.

    Something to consider is that AVX-512 forces Intel chips to run at much slower clock speeds, so if you're mixing workloads, using AVX-512 instructions could easily cause overall performance to drop. It's only in an artificial benchmark situation where it has such a huge advantage.
  • Everett F Sargent - Monday, November 25, 2019 - link

    Obviously, AMD just caught up with Intel's 256-bit AVX2, prior to Ryzen 3 AMD only had 128-bit AVX2 AFAIK. It was the only reason I bought into a cheap Ryzen 3700X Desktop (under $600US complete and prebuilt). To get the same level of AVX support, bitwise.

    I've been using Intel's Fortran compiler since 1983 (back then it was on a DEC VAX).

    So I only do math modeling at 64-bits like forever (going back to 1975), So I am very excited that AVX-512 is now under $1KUS. An immediate 2X speed boost over AVX2 (at least for the stuff I'm doing now).
  • rahvin - Monday, November 25, 2019 - link

    I'd be curious how much AVX512 is used by people. It seems to be highly tailored for only big math operations, which kinda limits its practical usage to science/engineering. In addition, the power use of the unit was massive in the last article I read, to the point that the main CPU throttled when AVX512 was engaged for more than a few seconds.

    I'd be really curious what percentage of people buying HEDT are using it, or if it's just a niche feature for science/engineering.
  • TEAMSWITCHER - Tuesday, November 26, 2019 - link

    If you don't need AVX512 you probably don't need or even want a desktop computer. Not when you can get an 8-core/16-thread MacBook Pro. Desktops are mostly built for show and playing games. Most real work is getting done on laptops.
  • Everett F Sargent - Tuesday, November 26, 2019 - link

    LOL, that's so 2019.
    Where I am from it's smartwatches all the way down.
    Cue Four Yorkshiremen.
  • AIV - Tuesday, November 26, 2019 - link

    Video processing and image processing can also benefit from AVX512. Many AI algorithms can benefit from AVX512. Problem for Intel is that in many cases where AVX512 gives good speedup, GPU would be even better choice. Also software support for AVX512 is lacking.
  • Everett F Sargent - Tuesday, November 26, 2019 - link

    Not so!
    https://software.intel.com/en-us/parallel-studio-x...
    It compiles and runs on both Intel and AMD. Full AVX-512 support on AVX-512 hardware.
    You have to go full Volta to get true FP64, otherwise desktop GPU's are real FP64 dogs!
  • AIV - Wednesday, November 27, 2019 - link

    There are tools and compilers for software developers, but not so much end user software actually use them. FP64 is mostly required only in science/engineering category. Image/video/ai processing is usually just fine with lower precision. I'd add that also GPUs only have small (<=32GB) RAM while intel/amd CPUs can have hundreds of GB or more. Some datasets do not fit into a GPU. AVX512 still has its niche, but it's getting smaller.
  • thetrashcanisfull - Monday, November 25, 2019 - link

    I asked about this a couple of months ago. Apparently the 3DPM2 code uses a lot of 64b integer multiplies; the AVX2 instruction set doesn't include packed 64b integer mul instructions - those were added with AVX512, along with some other integer and bit manipulation stuff. This means that any CPU without AVX512 is stuck using scalar 64b muls, which on modern microarchitectures only have a throughput of 1/clock. IIRC the Skylake-X core and derivatives have two pipes capable of packed 64b muls, for a total throughput of 16/clock.

    I do wish AnandTech would make this a little more clear in their articles though; it is not at all obvious that the 3DPM2 is more of a mixed FP/Integer workload, which is not something I would normally expect from a scientific simulation.

    I also think that the testing methodology on this benchmark is a little odd - each algorithm is run for 20 seconds, with a 10 second pause in between? I would expect simulations to run quite a bit longer than that, and the nature of turbo on CPUs means that steady-state and burst performance might diverge significantly.
  • Dolda2000 - Monday, November 25, 2019 - link

    Thanks a lot, that does explain much.
