Single-Threaded Integer Performance

I admit, the following two benchmarks are almost irrelevant for anyone buying a Xeon E5 based machine. But still, we have to quench our curiosity: how much have the new cores been improved? There is a lot that can be said about the sophisticated "uncore" improvements (cache coherency policies, low latency rings, and so on) that allow this multi-core monster to scale, but at the end of the day, good performance starts with a good core. And since we have listed the many subtle core improvements, we could not resist the opportunity to see how each core compares.

The results aren't totally meaningless either, as the profile of a compression algorithm is somewhat similar to many server workloads: it can be hard to extract instruction level parallelism (ILP) and it's sensitive to memory parallelism and latency. The instruction mix is a bit different, but it's still somewhat similar to many server workloads. And as one more reason to test performance in this manner, the 7-zip source code is available under the GNU LGPL license. That allows us to recompile the source code on every machine with the -O2 optimization with gcc 4.8.1.

Single Threaded LZMA Compression

It looks more boring than it is. First of all, judging by the reactions on forums, many people expected that an 18-core E5-2699 v3 at 2.3GHz would be slower than a 3.2GHz Xeon E5-2667 v3. However you actually can have it all. The Xeon E5-2699 v3 and 2695 v3 boost their clock speed to no less than 3.6GHz when only one or two cores are active. The Xeon E5-2667 v3's maximum Turbo Boost is also the same 3.6GHz, so when only a few threads are active, the Xeon E5-2667 v3 has no clock advantage over the "mega/expensive SKUs" other than the fact that the clock speed will not drop lower than 3.2GHz if all cores are running at full bore.

Despite the fact that the Xeon E5-2690 core has lower IPC, it is able to keep up as it can boost the standard clock speed from 2.9 to 3.8GHz. As it is very hard to extract more IPC out of this kind of code, the extra 200MHz is enough to keep up.

Let's see how the chips compare in decompression. Decompression is an even lower IPC (Instructions Per Clock) workload, as it is very branch intensive and depends on the latencies of the multiply and shift instructions.

Single threaded LZMA decompression

The older Xeon E5 takes the lead as decompression runs at very low IPC and is mostly depended on clock speed and low latency accesses. The new Xeon E5 v3 has slightly higher latency in both L3 cache and memory, so it falls behind.

What makes this benchmark interesting is that it proves that Turbo Boost works very well, even on an 18-core chip with a massive die. This is a big bonus, as especially in situations where you are setting up/preparing a system to be productive, it is very likely that you will be waiting for some single-threaded application to end. It also means that if one heavy request hits the server while it is running at very low load, the response time of the request will be low, keeping the impatient users happy.

Memory Subsystem: Latency Multi-Threaded Integer Performance
Comments Locked

85 Comments

View All Comments

  • MorinMoss - Friday, August 9, 2019 - link

    Hello from 2019.
    AMD has a LOT of ground to make up but it's a new world and a new race
    https://www.anandtech.com/show/14605/the-and-ryzen...
  • Kevin G - Monday, September 8, 2014 - link

    As an owner of a dual Opteron 6376 system, I shudder at how far behind that platform is. Then I look down and see that I have both of my kidneys as I didn't need to sell one for a pair of Xeons so I don't feel so bad. For the price of one E5-2660v3 I was able to pick up two Opteron 6376's.
  • wallysb01 - Monday, September 8, 2014 - link

    But the rest of the system cost is about the same. So you get 1/2 the performance for a 10% discount. YEPPY!
  • Kevin G - Monday, September 8, 2014 - link

    Nope. Build price after all the upgrades over the course of two years is some where around $3600 USD. The two Opterons accounted for a bit more than a third of that price. Not bad for 32 cores and 128 GB of memory. Even with Haswell-E being twice as fast, I'd have to spend nearly twice as much (CPU's cost twice as much as does DDR4 compared to when I bought my DDR3 memory). To put it into prespective, a single Xeon E5 2999v3 might be faster than my build but I was able to build an entire system for less than the price Intel's flagship server CPU.

    I will say something odd - component prices have increased since I purchased parts. RAM prices have gone up by 50% and the motherboard I use has seemingly increased in price by $100 due to scarcity. Enthusiast video card prices have also gotten crazy over the past couple of years so a high end video card is $100 more for top of the line in the consumer space.
  • wallysb01 - Tuesday, September 9, 2014 - link

    Going to the E5 2699 isn’t needed. A pair of 2660 v3s is probably going to be nearly 2x as fast the 6376, especially for floating point where your 32 cores are more like 16 cores or for jobs that can’t use very many threads. True a pair of 2660s will be twice as expensive. On a total system it would add about $1.5K. We’ll have to wait for the workstation slanted view, but for an extra $1.5K, you’d probably have a workstation that’s much better at most tasks.
  • Kevin G - Friday, September 12, 2014 - link

    Actually if you're aiming to double the performance of a dual Opteron 6376, two E5-2695v3's look to be a good pick for that target according to this review. A pair of those will set you pack $4848 which is more than what my complete system build cost.

    Processors are only one component. So while a dual Xeon E5-2695v3 system would be twice as fast, total system cost is also approaching double due to memory and motherboard pricing differences.
  • Kahenraz - Monday, September 8, 2014 - link

    I'm running a 6376 server as well and, although I too yearn for improved single-threaded performance, I could actually afford to own this one. As delicious as these Intel processors are, they are not priced for us mere mortals.

    From a price/performance standpoint, I would still build another Opteron server unless I knew that single-threaded performance was critical.
  • JDG1980 - Tuesday, September 9, 2014 - link

    The E5-2630 v3 is cheaper than the Opteron 6376 and I would be very surprised if it didn't offer better performance.
  • Kahenraz - Tuesday, September 9, 2014 - link

    6376s can be had very cheaply on the second-hand market, especially bundled with a motherboard. Additionally, the E5-2630 v3 requires both a premium on the board and DDR4 memory.

    I'd wager you could still build an Opteron 6376 system for half or less.
  • Kevin G - Tuesday, September 9, 2014 - link

    It'd only be fair to go with the second hand market for the E5-2630v3's but being new means they don't exist. :)

    Still going by new prices, an Opteron 6376 will be cheaper but roughly 33% from what I can tell. You're correct that the new Xeon's have a premium pricing on motherboards and DDR4 memory.

Log in

Don't have an account? Sign up now