Single-threaded Integer Performance: 7-Zip

The profile of a compression algorithm is somewhat similar to many server workloads: it can be hard to extract instruction level parallelism (ILP), and it is sensitive to memory parallelism and latency. The instruction mix is a bit different, but the overall profile is still comparable. Testing single-threaded is also a great way to check how well the turbo boost feature works in a CPU.

As one more reason to test performance in this manner, the 7-zip source code is available under the GNU LGPL license. That allows us to recompile the source code on every machine with gcc 4.8.2 at the -O2 optimization level.

We added the 7-zip scores that we could find on the 7-zip benchmark page, but there is more. The numbers on that page come with no software details, so we could not be sure that they were accurate. So we managed to get a brief session on a POWER8 "for development purposes" server. The hardware specs can be read below:

Yes, we only got access to 1 core (8 threads) and 2 GB of RAM, so real-world server benchmarking was out of the question. Nevertheless, it's a start. To that end we tested with gcc 4.9.1 (which supports POWER8) and recompiled our source with the "-O2 -mtune=power8" options on Ubuntu Linux 14.10 for POWER.
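For readers who want to repeat this kind of measurement, the sketch below shows one way to drive 7-Zip's built-in benchmark with a fixed thread count. It assumes a `7z` binary on the PATH and uses the `b` command with the `-mmt` switch. The helper names are hypothetical, and the article does not state the exact command line that was used for these results.

```python
import shutil
import subprocess


def benchmark_command(threads, binary="7z"):
    """Build the argument list for 7-Zip's built-in benchmark.

    The function name and the default binary name are assumptions
    made for this sketch, not something taken from the article.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # "b" selects benchmark mode; "-mmtN" limits it to N threads.
    return [binary, "b", f"-mmt{threads}"]


def run_benchmark(threads, binary="7z"):
    """Run the benchmark and return its raw text output."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    result = subprocess.run(
        benchmark_command(threads, binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_benchmark(1)` would correspond to the single-threaded case, while `run_benchmark(8)` on a machine that exposes only one POWER8 core would exercise all eight hardware threads of that core.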

LZMA Single-Threaded Performance: Compression

Let us first focus on the new Haswell core inside the Xeon E7, which offers a solid 10% improvement. Turbo boost brings the clock speed of the Haswell core close enough to the Ivy Bridge core (3.3GHz vs 3.4GHz) and the improved core does the rest. Nevertheless, it is clear that we should not expect huge performance increases with a 10% faster core and 20% more cores.
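To put a rough ceiling on that expectation, the two gains can simply be multiplied. This is a back-of-the-envelope sketch that assumes perfect scaling across cores, which real server workloads rarely achieve; the function name is ours.

```python
def ideal_throughput_gain(core_gain, core_count_gain):
    """Upper bound on the throughput gain, assuming perfect multi-core scaling."""
    return (1 + core_gain) * (1 + core_count_gain) - 1


# 10% faster core and 20% more cores, as quoted in the text.
gain = ideal_throughput_gain(0.10, 0.20)
print(f"Best-case throughput gain: {gain:.0%}")  # prints "Best-case throughput gain: 32%"
```

Even in this ideal case the generational gain stays around a third, and anything that limits scaling (memory bandwidth, lower all-core turbo clocks) pulls it below that.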

Back to the more exciting stuff: the fight between Intel and IBM, between the Xeon "Haswell" and the POWER8 chip. The Haswell core is a lot more sophisticated: single threaded performance at 3.3 GHz (turbo) is no less than 50% higher than the POWER8 at 3.4 GHz. That means that the Haswell core is a lot more capable when it comes to extracting ILP out of that complex code.
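The clock speeds quoted above allow a quick per-clock estimate. The sketch below assumes both cores actually sustain the stated frequencies during the run and treats the 50% figure as a lower bound, since the text says "no less than"; the function name is hypothetical.

```python
def per_clock_advantage(perf_ratio, clock_a_ghz, clock_b_ghz):
    """Work-per-cycle ratio of chip A over chip B.

    perf_ratio is chip A's score divided by chip B's score.
    """
    return perf_ratio * clock_b_ghz / clock_a_ghz


# Haswell (3.3 GHz turbo) scores at least 1.5x the POWER8 (3.4 GHz).
ratio = per_clock_advantage(1.50, 3.3, 3.4)
print(f"Haswell per-clock advantage: {ratio - 1:.0%}")  # prints "Haswell per-clock advantage: 55%"
```

Because the POWER8 runs at the slightly higher clock, the single-threaded gap per cycle is a little wider than the raw 50% suggests.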

However, when the IBM monster is allowed to use 8 simultaneous threads spread out over one core, something magical happens. Something that we have not seen in a long, long time: the Intel chip is no longer on top. When you use all the available threading resources in one core, the 3.4 GHz chip is a tiny bit (2%) faster than the best Intel Xeon at 3.3 GHz.


146 Comments


  • TheSocket - Friday, May 8, 2015 - link

    They sure wouldn't lose the x86-64 license since they own it and Intel is licensing it from AMD.
  • melgross - Saturday, May 9, 2015 - link

    But without the license from Intel, it is worthless. There's also the question of how that works. I believe that Intel doesn't need to license back the 64 bit extensions.
  • Kevin G - Monday, May 11, 2015 - link

    This is one of the reasons why it would be in Intel's best interest to let AMD be bought out with the 32 bit license intact. The 64 bit license/patents going to a third party that doesn't want to share would be a doomsday scenario for Intel. Legally it wouldn't affect anything currently on the market but it'd throw Intel's future roadmap into the trash.
  • Death666Angel - Saturday, May 9, 2015 - link

    Pretty sure some regulatory bodies would step in if Intel were the only x86 game in town. And x86-64 is AMD property.
  • JumpingJack - Saturday, May 9, 2015 - link

    Any patents on x86 are long expired, AMD only owns the IP related to the extension of the x86 not the instruction set.
  • patrickjp93 - Monday, May 11, 2015 - link

    Not true. The U.S. government has them locked up under special military-based protections. Absolutely no one can make and sell x86 without Intel's and the DOD's permission.
  • Kevin G - Monday, May 11, 2015 - link

    Got a source for that?

    I know that DoD did some validation on x86 many years ago. (The Pentium core used by Larrabee had the DoD changes incorporated.)
  • haplo602 - Friday, May 8, 2015 - link

    hmm ... where's the RAS feature comparison/test ? did I miss it in the article ?
  • TeXWiller - Friday, May 8, 2015 - link

    In the E7v3 vs POWER comparison table, there should be 32 PCIe lanes instead of 40 in the Xeon column.
  • TeXWiller - Friday, May 8, 2015 - link

    Additionally, it is the L3 in POWER8 that runs at half of the core speed. L2 runs at the core speed.
