Single-Threaded Integer Performance

The LZMA compression benchmark measures only part of the performance of some real-world server applications (file server, backup, etc.). We keep using this benchmark because it lets us isolate the kind of integer performance where instruction level parallelism (ILP) is hard to extract and where the code is sensitive to memory parallelism and latency. That is the kind of integer performance you need in most server applications.

One more reason to test performance in this manner is that the 7-zip source code is available under the GNU LGPL license. That allows us to recompile the source code on every machine with gcc 4.8.2 at the -O2 optimization level.

LZMA Single-Threaded Performance: Compression

The Xeon E5-2650L Haswell core is only able to boost to 2.5 GHz, while the Xeon D has a newer core (Broadwell) and is capable of 2.6 GHz. Still, the Xeon E5 is 6% faster. The most likely explanation is that the Xeon E5-2650L (65W TDP) sustains its Turbo Boost clock speed for longer than the Xeon D (45W TDP).
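To get a feel for how much sustained clock speed that explanation requires, here is a minimal sketch. It assumes both cores deliver the same work per clock on this workload and that performance scales linearly with sustained clock speed; both are simplifications, and the function name is purely illustrative.

```python
def implied_clock_ratio(perf_ratio: float, ipc_ratio: float = 1.0) -> float:
    """Sustained clock ratio (chip A / chip B) implied by a performance ratio.

    Assumes performance = IPC * sustained clock, which is a simplification.
    """
    return perf_ratio / ipc_ratio


# Figures taken from the text above.
E5_PEAK_GHZ = 2.5          # Xeon E5-2650L maximum turbo clock
XEON_D_PEAK_GHZ = 2.6      # Xeon D maximum turbo clock
E5_PERF_ADVANTAGE = 1.06   # Xeon E5 is 6% faster

# Assumption: the Xeon E5 holds its full 2.5 GHz for the whole run.
clock_ratio = implied_clock_ratio(E5_PERF_ADVANTAGE)
xeon_d_sustained_ghz = E5_PEAK_GHZ / clock_ratio
shortfall = 1 - xeon_d_sustained_ghz / XEON_D_PEAK_GHZ

print(f"Implied Xeon D average clock: {xeon_d_sustained_ghz:.2f} GHz")
print(f"Shortfall versus its 2.6 GHz peak: {shortfall:.1%}")
```

Under those assumptions, the Xeon D would have to average roughly 2.36 GHz, about 9% below its peak, for the 6% gap to come from clock speed alone.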

The Xeon D and Atom C2750 run at the same clock speed in this single-threaded task (2.6 GHz), but you can see how much difference a wide, complex architecture makes. The Broadwell core is able to run about twice as many instructions in parallel as the Silvermont core. The Haswell/Broadwell core results clearly show that well-designed wide architectures remain quite capable, even in "low ILP" (Instruction Level Parallelism) code.

Let's see how the chips compare in decompression. Decompression is an even lower IPC (Instructions Per Clock) workload, as it is pretty branch intensive and depends on the latencies of the multiply and shift instructions.

LZMA Single-Threaded Performance: Decompression

The Xeon E5 runs at 2.5 GHz, the Xeon D at 2.6 GHz, and the Xeon E3-1230L at 2.8 GHz, while the Xeon E3-1265L can reach 3.7 GHz. The decompression results follow the same logic. There does not seem to be a difference between a Broadwell, Haswell or Ivy Bridge core: performance is almost linear with (Turbo Boost) clock speed. The only exception is the Xeon E3-1240, which boosts to 3.8 GHz but outperforms the others by a larger margin than expected. The explanation is pretty simple: the higher TDP (80 W) allows the chip to sustain Turbo Boost clock speeds for much longer.
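One way to check a "linear with clock speed" claim is to divide each score by the chip's turbo clock and look for chips that stray from the typical value. The sketch below is only an illustration: the clock speeds come from the paragraph above, but the scores are made-up placeholders (not the measured results), and the helper name and the 5% tolerance are assumptions.

```python
from statistics import median

# Turbo clock speeds in GHz, as quoted in the text.
TURBO_GHZ = {
    "Xeon E5-2650L": 2.5,
    "Xeon D": 2.6,
    "Xeon E3-1230L": 2.8,
    "Xeon E3-1265L": 3.7,
    "Xeon E3-1240": 3.8,
}


def per_clock_outliers(scores, clocks, tolerance=0.05):
    """Return chips whose score per GHz deviates from the median by more
    than `tolerance`, mapped to their ratio against that median."""
    per_ghz = {chip: scores[chip] / clocks[chip] for chip in scores}
    typical = median(per_ghz.values())
    return {
        chip: value / typical
        for chip, value in per_ghz.items()
        if abs(value / typical - 1) > tolerance
    }


# PLACEHOLDER scores, not benchmark data: perfectly linear with clock,
# except for one chip that is given an arbitrary 10% bonus.
placeholder_scores = {chip: 1000 * ghz for chip, ghz in TURBO_GHZ.items()}
placeholder_scores["Xeon E3-1240"] *= 1.10

outliers = per_clock_outliers(placeholder_scores, TURBO_GHZ)
print(outliers)
```

With real benchmark scores substituted for the placeholders, a chip that sustains its turbo clock for longer than the rest would show up here with a ratio above 1.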


90 Comments


  • Flunk - Tuesday, June 23, 2015 - link

    Yes, but it's still bad marketing. -D is associated with inferior, overly hot, bad performing Intel chips.
  • IanHagen - Tuesday, June 23, 2015 - link

    Certainly. From a marketing standpoint it's a pretty poor choice. I agree with wussupi, E4 would have been a far better name.
  • karpodiem - Tuesday, June 23, 2015 - link

    does anyone know where to buy these online? I'm looking for just the board/processor, model # 'X10SDV-TLN4F'

    All these random/small Supermicro resellers are selling it now, based on some Google searches. They're marking it up in price by at least a hundred bucks, because availability is limited. Anyone know when Newegg might get it in stock?

    Looking to do a FreeNAS build - this board + IBM M1015 card in an ATX motherboard (6x4TB drives in RAIDZ2).
  • ats - Tuesday, June 23, 2015 - link

    The TLN4F is the one in most demand and almost no place is able to keep it in stock. There are multiple places that will order it for you for ~1K but wait times can be anywhere from 1 week to 1 month.
  • Jon Tseng - Tuesday, June 23, 2015 - link

    > And the reality is that the current SoCs with an ARM ISA do not deliver the necessary per core
    > performance: they are still micro server SoCs, at best competing with the Atom C2750. So
    > currently, there is no ARM SoC competition in the scale out market until something better than
    > the A57 hits the market for these big players.

    Dude... You really want to have a look at the latest ThunderX parts or the X-Gene 16nm shrinks before you start making unwise statements like that. These aren't waiting around for A57; they are custom ARM architecture designs. Per core performance might not be as hot as Xeon but once you start to throw 48 cores on a die I wouldn't quite call that "at best competing with Avoton".
  • smoohta - Tuesday, June 23, 2015 - link

    Link to reviews?
  • ats - Tuesday, June 23, 2015 - link

    X-Gene is in the article, any further shrinks are still entirely vapor. ThunderX isn't currently available and is likely to have significantly worse per core performance than Atom C2k series and worse than A57. All the cores in the world don't do jack if the ST isn't there. And ST performance IS a barrier even in scale out. For general scale out, C2750 was found fairly wanting because of the ST performance, and neither X-Gene nor ThunderX even compete with C2750 in ST performance... QED.
  • mczak - Tuesday, June 23, 2015 - link

    He said "currently". The X-Gene 16nm cores might offer some competition who knows - but those are X-Gene 3 whereas you can't even buy anything with X-Gene 2 28nm ones right now... Likewise, ThunderX servers have been announced, but I haven't seen any reviews yet.
  • name99 - Tuesday, June 23, 2015 - link

    Look at the ThunderX parts HOW? Cavium releases fsck-all information about them. No-one knows if they are even OoO, how wide they are, etc.
    Yes, there are 48 cores on a SoC; and presumably they will do well for tasks like memcached that like lots of low-performance parallelism. But right now, we have ZERO evidence that a ThunderX part is a better single-threaded core than A57, let alone that it's comparable to Broadwell.
  • der - Tuesday, June 23, 2015 - link

    NOICE FAM!
