Single-threaded Integer Performance: 7-Zip

The profile of a compression algorithm is somewhat similar to many server workloads: it can be hard to extract instruction level parallelism (ILP), and performance is sensitive to memory parallelism and latency. The instruction mix is a bit different, but the overall profile is comparable. Testing single-threaded is also a great way to check how well a CPU's turbo boost feature works.

As one more reason to test performance in this manner, the 7-zip source code is available under the GNU LGPL license. That allows us to recompile the source code on every machine with gcc 4.8.2 at the -O2 optimization level.

We added the 7-zip scores that we could find on the 7-zip benchmark page. However, the numbers on that page come with no software details, so we could not be sure that they were accurate. We therefore arranged a brief session on a POWER8 "for development purposes" server. The hardware specs can be read below:

Yes, we only got access to 1 core (8 threads) and 2 GB of RAM, so real-world server benchmarking was out of the question. Nevertheless, it's a start. We tested with gcc 4.9.1 (which supports POWER8) and recompiled our source with the "-O2 -mtune=power8" options on Ubuntu Linux 14.10 for POWER.
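For readers who want to repeat a single-threaded run, here is a minimal sketch of how the built-in benchmark could be scripted. It assumes a `7z` binary is on the PATH; the helper names `benchmark_command` and `run_benchmark` are hypothetical and are not part of our test setup. The `b` command and the `-mmt` thread switch are standard 7-Zip command line options.

```python
import shutil
import subprocess


def benchmark_command(threads=1, binary="7z"):
    """Build the argument list for 7-Zip's built-in LZMA benchmark."""
    # "b" selects the benchmark; -mmt<N> limits the number of threads used.
    return [binary, "b", f"-mmt{threads}"]


def run_benchmark(threads=1, binary="7z"):
    """Run the benchmark and return its raw text output.

    Returns None when the binary cannot be found (assumed name: "7z").
    """
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        benchmark_command(threads, binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_benchmark(threads=1)` gives the single-threaded case, while `run_benchmark(threads=8)` would correspond to loading all eight threads of one POWER8 core, provided the process is pinned to that core by other means.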

LZMA Single-Threaded Performance: Compression

Let us first focus on the new Haswell core inside the Xeon E7, which offers a solid 10% improvement over the Ivy Bridge core. Turbo boost brings the clock speed of the Haswell core close enough to that of the Ivy Bridge core (3.3 GHz vs 3.4 GHz), and the improved core does the rest. Nevertheless, it is clear that we should not expect huge performance increases with a 10% faster core and 20% more cores.
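To put a number on that expectation, here is a small back-of-the-envelope sketch. It assumes the per-core gain and the core count gain multiply perfectly, which real workloads rarely achieve, so the result is an upper bound rather than a prediction. The function name `combined_speedup` is ours, chosen for illustration.

```python
def combined_speedup(per_core_gain, core_count_gain):
    """Best-case throughput ratio when both gains scale perfectly."""
    return (1 + per_core_gain) * (1 + core_count_gain)


# 10% faster core, 20% more cores (figures from the text above).
upper_bound = combined_speedup(0.10, 0.20)
print(f"Best-case throughput gain: {upper_bound:.2f}x")  # prints 1.32x
```

Even in this ideal case the new chip tops out at roughly a third more throughput than its predecessor.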

Back to the more exciting stuff: the fight between Intel and IBM, between the Xeon "Haswell" and the POWER8 chip. The Haswell core is a lot more sophisticated: single-threaded performance at 3.3 GHz (turbo) is no less than 50% higher than that of the POWER8 at 3.4 GHz. That means that the Haswell core is a lot more capable when it comes to extracting ILP out of that complex code.

However, when the IBM monster is allowed to use 8 simultaneous threads spread out over one core, something magical happens. Something that we have not seen in a long, long time: the Intel chip is no longer on top. When you use all the available threading resources in one core, the 3.4 GHz chip is a tiny bit (2%) faster than the best Intel Xeon at 3.3 GHz.
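The two comparisons above can be normalized with some simple arithmetic. The sketch below takes the 50% and 2% figures from the text at face value and makes one explicit assumption: that the best Xeon score is at least as high as the Xeon's single-threaded score. Under that assumption, the SMT8 figure is a lower bound on how much the POWER8 core gains from running eight threads. All names in the code are ours.

```python
HASWELL_GHZ = 3.3           # Haswell turbo clock from the text
POWER8_GHZ = 3.4            # POWER8 clock from the text
HASWELL_VS_POWER8_1T = 1.50  # Haswell single-thread score / POWER8 single-thread score
POWER8_SMT8_VS_XEON = 1.02   # POWER8 with 8 threads / best Xeon score


def per_clock_ratio(score_ratio, clock_a_ghz, clock_b_ghz):
    """Score ratio of A over B after dividing each score by its clock."""
    return score_ratio * clock_b_ghz / clock_a_ghz


# Per-clock advantage of the Haswell core in single-threaded compression.
haswell_per_clock = per_clock_ratio(HASWELL_VS_POWER8_1T, HASWELL_GHZ, POWER8_GHZ)

# ASSUMPTION: best Xeon score >= Xeon single-thread score, so this is a floor.
power8_smt8_gain = POWER8_SMT8_VS_XEON * HASWELL_VS_POWER8_1T

print(f"Haswell per-clock advantage: {haswell_per_clock:.2f}x")   # 1.55x
print(f"POWER8 SMT8 gain (at least): {power8_smt8_gain:.2f}x")    # 1.53x
```

In other words, the POWER8 core needs its eight threads to make up for a per-clock deficit of roughly 1.5x in this benchmark.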

146 Comments

  • PowerTrumps - Saturday, May 9, 2015 - link

    Oracle has been unable to develop a power core let alone a processor. What they have done is created servers with many cores and many threads albeit weak cores/threads. The S3 core was an improvement and no reason to think the S4 won't be decent either. However, the M7 will come (again, true to form) with 32 cores per socket. It will be like 8 mini clusters of 4 cores because they are unable to develop a single SMP chip with shared resources across all of the cores. As such, these mini clusters will have their own resources which will lead to latency and inefficiencies. Oracle is a software business and their goal is to run software on either the most cores possible or the most inefficient. They have both of these bases covered with their Intel and SPARC business.

    Also, performance per Watt is important for Intel because what you see is what you get. With Power though, when you have strong single thread performance, strong multi-thread performance and tremendous consolidation efficiency due to Power Hypervisor efficiency means ~200W doesn't matter when you can consolidate 2, 4 maybe 10 Intel chips at 135W each into a single Power chip because of this hypervisor efficiency.
  • tynopik - Friday, May 8, 2015 - link

    pg4 - datam ining
  • der - Friday, May 8, 2015 - link

    Woo...we're bout to have another GHz War here!
  • usernametaken76 - Friday, May 8, 2015 - link

    I'm sure you mean figuratively. We've been stuck between 4-5 GHz on POWER architecture for closing in on a decade.
  • zamroni - Friday, May 8, 2015 - link

    My conclusion is Samsung should buy AMD to reduce Intel dominance.
  • alpha754293 - Friday, May 8, 2015 - link

    It would have been interesting to see the LS-DYNA benchmark results again (so that you can compare it against some of the tests that you've ran previously). But very interesting...
  • JohanAnandtech - Friday, May 8, 2015 - link

    Give me some help and we'll do that again on an update version :-)
  • alpha754293 - Tuesday, May 12, 2015 - link

    Not a problem. You have my email address right? And if not, I'll just send you another email and we can get that going again. :) Thanks.
  • andychow - Friday, May 8, 2015 - link

    If Samsung bought AMD, they would lose the licence for both x86 and x86_64 production. It would in fact ensure Intel's dominance of the market.
  • Kevin G - Friday, May 8, 2015 - link

    The x86 license can be transferred as long as Intel signs off on the deal (and it is in their best interest to do so). What will probably happen is that if any company buys AMD, the new owner will enter a cross licensing agreement with Intel.
