Single-threaded Integer Performance: 7-Zip

The profile of a compression algorithm is somewhat similar to many server workloads: it can be hard to extract instruction level parallelism (ILP), and performance is sensitive to memory parallelism and latency. The instruction mix is a bit different, but the overall behavior is comparable. Testing single-threaded is also a great way to check how well a CPU's turbo boost feature works.

One more reason to test performance in this manner is that the 7-zip source code is available under the GNU LGPL. That allows us to recompile the source code on every machine with gcc 4.8.2 at the -O2 optimization level.

We added the 7-zip scores that we could find on the 7-zip benchmark page. But there is more. The numbers on that page come with no software details, so we could not be sure that they were accurate. So we managed to get a brief session on a POWER8 "for development purposes" server. The hardware specs can be read below:

Yes, we only got access to 1 core (8 threads) and 2 GB of RAM, so real-world server benchmarking was out of the question. Nevertheless, it's a start. We tested with gcc 4.9.1 (which supports POWER8) and recompiled our source with the "-O2 -mtune=power8" options on Ubuntu Linux 14.10 for POWER.

LZMA Single-Threaded Performance: Compression

Let us first focus on the new Haswell core inside the Xeon E7, which offers a solid 10% improvement over its Ivy Bridge predecessor. Turbo boost brings the clock speed of the Haswell core close enough to the Ivy Bridge core (3.3 GHz vs 3.4 GHz), and the improved core does the rest. Nevertheless, it is clear that we should not expect huge performance increases with a 10% faster core and 20% more cores.
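To put a number on that expectation, here is a minimal back-of-the-envelope sketch. It assumes perfect scaling across cores, which real server workloads rarely achieve, so the result is an optimistic ceiling rather than a prediction. The function name `ideal_speedup` is ours, purely for illustration.

```python
# Rough upper bound on the throughput gain from a faster core plus more cores.
# Assumption: perfect scaling across cores (no memory or interconnect limits).

def ideal_speedup(per_core_gain: float, core_count_gain: float) -> float:
    """Combine two fractional gains multiplicatively (0.10 means +10%)."""
    return (1.0 + per_core_gain) * (1.0 + core_count_gain)

# Figures from the text: a 10% faster core and 20% more cores.
bound = ideal_speedup(per_core_gain=0.10, core_count_gain=0.20)
print(f"Ideal combined speedup: {bound:.2f}x")  # prints 1.32x
```

Even under these generous assumptions the ceiling is roughly a third more throughput, which fits the modest expectations stated above.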

Back to the more exciting stuff: the fight between Intel and IBM, between the Xeon "Haswell" and the POWER8 chip. The Haswell core is a lot more sophisticated: single-threaded performance at 3.3 GHz (turbo) is no less than 50% higher than that of the POWER8 at 3.4 GHz. That means that the Haswell core is a lot more capable when it comes to extracting ILP out of that complex code.
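Because the two chips run at slightly different clocks, the per-clock gap is a little larger than the raw gap. The sketch below normalizes the figures quoted above. It treats "no less than 50%" as exactly 50% and takes the stated clocks at face value, so read the result as a rough estimate. The helper name `per_clock_ratio` is hypothetical.

```python
# Normalize a performance ratio by clock speed to compare work done per cycle.
# Assumptions: the 50% advantage is taken as exact, and both cores actually
# sustain the clocks quoted in the text during the run.

def per_clock_ratio(perf_ratio: float, clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Return chip A's per-GHz performance relative to chip B's.

    perf_ratio is A's score divided by B's score (1.5 means A is 50% faster).
    """
    return perf_ratio * clock_b_ghz / clock_a_ghz

# Haswell at 3.3 GHz (turbo) vs POWER8 at 3.4 GHz, single-threaded.
ratio = per_clock_ratio(perf_ratio=1.50, clock_a_ghz=3.3, clock_b_ghz=3.4)
print(f"Haswell per-clock advantage: {ratio:.2f}x")
```

Under those assumptions the Haswell core does about 1.5 times the work per cycle on a single thread, which is the ILP advantage the paragraph above describes.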

However, when the IBM monster is allowed to use 8 simultaneous threads spread out over one core, something magical happens. Something that we have not seen in a long, long time: the Intel chip is no longer on top. When you use all the available threading resources in one core, the 3.4 GHz chip is a tiny bit (2%) faster than the best Intel Xeon at 3.3 GHz.


146 Comments


  • PowerTrumps - Saturday, May 9, 2015 - link

    I'm sure the author will update the article unless this was a Intel cheerleading piece.
  • name99 - Friday, May 8, 2015 - link

    The thing is called E7-8890. Not E7-5890?
    WTF Intel? Is your marketing team populated by utter idiots? Exactly what value is there in not following the same damn numbering scheme that your product line has followed for the past eight years or so?

    Something like that makes the chip look like there's a whole lot of "but this one goes up to 11" thinking going on at Intel...
  • name99 - Friday, May 8, 2015 - link

    OK, I get it. The first number indicates the number of glueless chips, not the micro-architecture generation. Instead we do that (apparently) with a v2 or v3 suffix.
    I still claim this is totally idiotic. Far more sensible would be to use the same scheme as the other Intel processors, and use a suffix like S2, S4, S8 to show the glueless SMP capabilities.
  • ZeDestructor - Friday, May 8, 2015 - link

    They've been using this convention since Westmere-EX actually, at which point they ditched their old convention of a prefix letter for power tier, followed by one digit for performance/scalability tier, followed by another digit for generation then the rest for individual models. Now we have 2xxx for dual socket, 4xxx for quad socket and 8xxx for 8+ sockets, and E3/E5/E7 for the scalability tier. I'm fine with either, though I have a slight preference for the current naming scheme because the generation is no longer mixed into the main model number.
  • Morawka - Saturday, May 9, 2015 - link

    man the power 8 is a beefy cpu... all that cache, you'd think it would walk all over intel.. but intel's superior cpu design wins
  • PowerTrumps - Saturday, May 9, 2015 - link

    please explain
  • tsk2k - Saturday, May 9, 2015 - link

    Where are the gaming benchmarks?
  • JohanAnandtech - Saturday, May 9, 2015 - link

    Is there still a game with software rendering? :-)
  • Gigaplex - Sunday, May 10, 2015 - link

    Llvmpipe on Linux gives a capable (feature wise) OpenGL implementation on the CPU.
  • Klimax - Saturday, May 9, 2015 - link

    Don't see POWER getting anywhere with that kind of TDP. There will be dearth of datacenters and other hosting locations retooling for such thing. And I suspect not many will even then take it as cooling and power costs will be damn too high.

    Problem is, IBM can't go lower with TDP as architecture features enabling such performance are directly responsible for such TDP. (Just L1 consumes 2W to keep few cycles latency at high frequency)
