Single-threaded Integer Performance: 7-Zip

The profile of a compression algorithm is somewhat similar to many server workloads: it can be hard to extract instruction level parallelism (ILP), and performance is sensitive to memory parallelism and latency. The instruction mix is a bit different, but the overall behavior is close enough to be a useful proxy. Testing single-threaded is also a great way to check how well the turbo boost feature works in a CPU.

And as one more reason to test performance in this manner, the 7-zip source code is available under the GNU LGPL license. That allows us to recompile the source code on every machine with gcc 4.8.2 at the -O2 optimization level.

We added the 7-zip scores that we could find at the 7-zip benchmark page. But there is more. The numbers on the 7-zip benchmark page come with no software details, so we could not be sure that they were accurate. We therefore managed to get a brief session on a POWER8 "for development purposes" server. The hardware specs can be read below:

Yes, we only got access to 1 core (8 threads) and 2 GB of RAM, so real-world server benchmarking was out of the question. Nevertheless, it's a start. We tested with gcc 4.9.1 (which supports POWER8) and recompiled our source with the "-O2 -mtune=power8" options on Ubuntu Linux 14.10 for POWER.

LZMA Single-Threaded Performance: Compression

Let us first focus on the new Haswell core inside the Xeon E7, which offers a solid 10% improvement. Turbo boost brings the clockspeed of the Haswell core close enough to the Ivy Bridge core (3.3GHz vs 3.4GHz) and the improved core does the rest. Nevertheless, it is clear that we should not expect huge performance increases with a 10% faster core and 20% more cores.
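As a rough back-of-the-envelope check on that expectation, the two figures above can be combined under the assumption of perfect scaling across cores. That assumption is an idealization rather than a measurement, and the function name below is purely illustrative.

```python
def ideal_throughput_gain(per_core_gain: float, core_count_gain: float) -> float:
    """Upper bound on total throughput gain, assuming perfect multi-core scaling."""
    return (1.0 + per_core_gain) * (1.0 + core_count_gain) - 1.0


# Figures taken from the text: a 10% faster core and 20% more cores.
gain = ideal_throughput_gain(0.10, 0.20)
print(f"Ideal combined gain: {gain:.0%}")
```

Even in this best case the combined gain stays at roughly a third, and real workloads rarely scale perfectly with core count.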

Back to the more exciting stuff: the fight between Intel and IBM, between the Xeon "Haswell" and the POWER8 chip. The Haswell core is a lot more sophisticated: single threaded performance at 3.3 GHz (turbo) is no less than 50% higher than the POWER8 at 3.4 GHz. That means that the Haswell core is a lot more capable when it comes to extracting ILP out of that complex code.
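To see what that gap means per clock cycle, the relative scores can be normalized by clock speed. This is only a sketch: it treats the quoted "50% higher" as exactly 1.5x, and it assumes both chips actually ran at the quoted clocks (3.3 GHz turbo and 3.4 GHz) for the whole test. The helper name is illustrative.

```python
def score_per_ghz(relative_score: float, clock_ghz: float) -> float:
    """Relative benchmark score divided by clock speed (a crude per-clock metric)."""
    return relative_score / clock_ghz


# POWER8 single-threaded score is the baseline (1.0) at 3.4 GHz.
power8 = score_per_ghz(1.00, 3.4)
# Haswell is taken as 50% higher at 3.3 GHz (the text says "no less than 50%").
haswell = score_per_ghz(1.50, 3.3)

advantage = haswell / power8 - 1.0
print(f"Haswell per-clock advantage: {advantage:.1%}")
```

Under these assumptions the single-threaded per-clock advantage comes out slightly larger than the raw 50%, because the Haswell part reaches its score at a marginally lower clock.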

However, when the IBM monster is allowed to use 8 simultaneous threads spread out over one core, something magical happens. Something that we have not seen in a long, long time: the Intel chip is no longer on top. When you use all the available threading resources in one core, the 3.4 GHz chip is a tiny bit (2%) faster than the best Intel Xeon at 3.3 GHz.


146 Comments


  • PowerTrumps - Saturday, May 9, 2015 - link

    Ok, yes a data center like Verizon or ATT might not "qualify" but the point is accurate. I work with IBM's Power servers and have absolutely consolidated 5 racks of x86 into a single Power server - it was 54 Intel 2S & 4S servers into a single 64c Power7. Part of this is due to the "performance" of Power but most of the credit goes to the efficiency of the Power Hypervisor. PHYP can provide a QoS to each workload while weaving a greater amount of workloads onto fewer Power servers/cores than what the benchmarks imply.
  • newtrekemotion - Friday, May 8, 2015 - link

    I wouldn't discount Oracle so quickly. The T5 was a pretty big step forward from the T4 and the new M7 chip sounds like it could be quite the competitor with 2 TB of memory per socket and 32 cores, especially for highly threaded loads since an octo-socket system would have 2048 threads and support 16 TB of memory. Hopefully this can bring some more competition to the market, though with only Oracle and Fujitsu (maybe?) selling systems it won't have quite the impact that multiple POWER8 vendors could bring. Love them, hate them, or anywhere in between, it seems Oracle is not ready to give up in this arena and it looks like they are putting more effort in than Sun was (or are at least executing on effort more than Sun did).

    Something else to note here is the process advantage that Intel has over everyone else. I might have missed it in the article, but especially for performance/watt this is important.

    In all I think the statement at the beginning of the article that this area is getting more exciting is very true. Just seems like it might be a 3 way race instead of a 2. The recent AMD announcement that they wanted to focus on HPC is interesting too though of the 4 (Intel, IBM, Oracle and AMD) they have the furthest to go and the fewest resources to do it with. The next few years are going to be very interesting and hopefully someone, or a combination can push Intel and drive the whole market forward.
  • JohanAnandtech - Friday, May 8, 2015 - link

    I was writing from a "who will be able to convert Intel Xeon people" point of view. As I wrote in the Xeon E7v2 article, Oracle's T processors have indeed vastly improved. That is all nice and well but there is no reason why someone considering a Xeon E7 would switch. Oracle's sales seem to be mostly about people who are long-time Oracle users. As far as I can see, OpenPOWER servers are the only real threat to Intel's server hegemony.
  • Kevin G - Saturday, May 9, 2015 - link

    Oracle does offer one reason to switch to SPARC: massive licensing discounts on Oracle software.

    If you're not using Oracle's software, then yeah, the SPARC platform is a very tough sell over x86 or POWER.
  • JohanAnandtech - Saturday, May 9, 2015 - link

    exactly. Good point.
  • PowerTrumps - Saturday, May 9, 2015 - link

    If you are running Oracle software you should know that IBM and Power are the largest platform which Oracle software runs on. Secondly, if running Oracle products licensed by the core, the only platform to control Oracle licensing is Power (not including Mainframe in this assertion). I have reduced Oracle licensing for customers anywhere from 4X to 10X. Do the math on that to appreciate those savings. Lastly, when I upgrade customers from one generation to another we talk about how much Oracle they can reduce. You don't hear that when upgrading from Sandy Bridge to Ivy Bridge to Haswell.
  • kgardas - Friday, May 8, 2015 - link

    I'm not sure about T5, but certainly Fujitsu's latest SPARC64-X+ is able to over-run POWER8 and by a wide margin also older Xeons. Just look for the spec. rate. It also won some SAP S&D 2-tier benchmark on absolute performance so I'm glad that SPARC is still competitive too...
  • Kevin G - Saturday, May 9, 2015 - link

    The top SPARC benchmarks I've seen are using far more sockets, cores, threads and memory to get to that top spot. It is nice that the system can scale to such high socket counts (40) but only if you can actually fund a project that needs that absolute performance. Drop down to 16 socket where you can get twice the performance from POWER than SPARC with the same licensing cost, what advantage does SPARC have to make people switch?

    Even then, a system like SGI's UV2000 would fall into the same niche due to its ability to scale to insane socket counts, software licensing fees be damned.
  • kgardas - Tuesday, May 12, 2015 - link

    Kevin G, actually you are right and I made a mistake. It was not intentional, I was misled by spec site claiming "24 cores, 4 chips, 6 cores/chip, 8 threads/core" for "IBM Power S824 (3.5 GHz, 24 core, RHEL)" so I thought this was a 4 socket setup and I compared it with Fujitsu M10-4 which won. Now, I've just found IBM is two socket which means it wins on socket/spec rate basis of course. Price-wise IBM is also way cheaper than SPARC (if you don't run Oracle DB of course) so I keep my fingers crossed for OpenPOWER.
    Honestly, although this is really nice to see I still have kind of a feeling that this is IBM hardware division's swan song. I would really like to be wrong here. Anyway, I still think that ARMv8 does have higher chances in getting into Intel's business and being really a pain for Intel. On the other hand if OpenPOWER is successful in Chinese business, that would be good and some chance for us too to see lower-cost POWER machines...
  • PowerTrumps - Saturday, May 9, 2015 - link

    yes, take a look at those benchmark results and you see the Fuji M10-4S requires 640 & 512 cores. Even the Oracle M6-32 uses 384 cores. The Fuji 512c example had 33% higher SAPS with 2X the cores. The M6-32 has 50% more cores to get 21% higher SAPS. Further, looking at the SAP benchmark as an indicator of core, chip & server performance shows that SPARC & Intel are roughly 1600 - 2200 SAPS per core compared to Power8 which is 5451 SAPS for the 80 core E870. To put this into context, the 80 core Power8 has slightly less than 1/2 the SAPS of the 640 core Fujitsu M10-4S. Think of ALL the costs associated with 640 cores vs 80...ok, 160 if we want to get the SAPS roughly equal. 4X more cores to get less than 2X the results.
