Our Benchmark Choices

To make the comparison more interesting, we decided to include both the quad Xeon "Westmere-EX" and the "Nehalem-EX". Remember, these heavy-duty, high-RAS servers continue to be used for much longer in the data center than their dual-socket counterparts. Many people considering the newest Xeon E7-4800 v2 probably still own a Xeon 7500 series system.

Of course, the comparison would not be complete without the latest dual Xeon E5-2600 v2 server and at least one Opteron-based server. Due to the large number of platforms and the fact that we developed a brand new HPC test (discussed later), we quickly ran out of time. These time constraints, and the fact that we have neglected our Linux testing in recent reviews in favor of Windows 2012 and ESXi, led to the decision to limit ourselves to testing on top of Ubuntu Linux 13.10 (kernel 3.11). You'll see our typical ESXi and Windows benchmarks in a later review.

Benchmark Configuration

There are some differences in the RAM and SSD configurations. The use of different SSDs was due to time constraints as we wanted to test the servers as much as possible in parallel. The RAM configuration differences are a result of the platforms: for example, the quad Intel CPUs only perform at their best when each CPU gets eight DIMMs. The Opteron and Dual Xeon E5-2680 v2 server perform best with one DIMM per channel (1 DPC).

None of these differences has a tangible influence on the results of our benchmarks, as none of them were bottlenecked by the storage system or by the amount of RAM. The minimum of 64GB of RAM was more than enough for every benchmark in this review.

We also did not attempt to do power measurements. We will try to do an apples-to-apples power comparison at a later time.

Intel S4TR1SY3Q "Brickland" IVT-EX 4U Server

The latest and greatest from Intel consists of the following components:

CPU            4x Xeon E7-4890 v2 (D1 stepping) at 2.8GHz (15 cores, 37.5MB L3, 155W TDP)
RAM            256GB (32x8GB) Samsung DDR3 M393B1K70DH0-YK0 at 1333MHz
Motherboard    Intel CRB Baseboard "Thunder Ridge"
Chipset        Intel C602J
PSU            2x 1200W (2+0)

The total number of DIMM slots is 96; with 64GB LRDIMMs, this server can offer up to 6TB of RAM! In some cases we have tested the E7-4890 v2 at a lower maximum clock in order to do clock-for-clock comparisons with the previous generation, and in a few cases we have also disabled three of the cores in order to simulate the performance of some of the 12-core Ivy Bridge EX parts. For example, an E7-4890 v2 at 2.8GHz with three cores disabled (12 cores total) gives you a good idea of how the much less expensive E7-8857 v2 at 3GHz would perform: roughly 7% faster than our simulated 12-core configuration, thanks to its higher clock.
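As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic behind the 6TB capacity figure and the ~7% estimate (the latter assumes performance scales roughly linearly with clock speed at equal core counts):

    # Maximum memory capacity: 96 DIMM slots filled with 64GB LRDIMMs
    dimm_slots = 96
    lrdimm_capacity_gb = 64
    print(dimm_slots * lrdimm_capacity_gb / 1024, "TB")  # 6.0 TB

    # Simulated 12-core E7-4890 v2 locked at 2.8GHz versus the E7-8857 v2
    # at its 3.0GHz clock, assuming linear scaling with clock speed
    clock_gain_pct = (3.0 / 2.8 - 1) * 100
    print(round(clock_gain_pct, 1), "%")  # ~7.1%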

Intel Quanta QSSC-4R Benchmark Configuration

This is the previous-generation quad Xeon server, as reviewed here.

CPU            4x Xeon X7560 at 2.26GHz or 4x Xeon E7-4870 at 2.4GHz
RAM            128GB (16x8GB) Samsung DDR3 M393B1K70DH0-YK0 at 1066MHz
Motherboard    QCI QSSC-S4R 31S4RMB00B0
Chipset        Intel 7500
BIOS version   QSSC-S4R.QCI.01.00.S012,031420111618
PSU            4x 850W Delta DPS-850FB A S3F E62433-004

The server accepts up to 64 DIMMs; with 32GB load-reduced DIMMs (LRDIMMs), that works out to a maximum of 2TB of RAM.

Intel's Xeon E5 server R2208GZ4GSSPP (2U Chassis)

This is the server we used in our Xeon "Ivy Bridge EP" review.

CPU            2x Xeon E5-2680 v2 (2.8GHz, 10 cores, 25MB L3, 115W TDP)
RAM            128GB (8x16GB) Micron MT36JSF2G72PZ DDR3-1866
Internal Disks 2x Intel SSD 710 200GB (MLC)
Motherboard    Intel Server Board S2600GZ "Grizzly Pass"
Chipset        Intel C600
BIOS version   SE5C600.86B (August 6, 2013)
PSU            Intel 750W DPS-750XB A (80 Plus Platinum)

The Xeon E5 CPUs have four memory channels per socket and support speeds up to DDR3-1866, so our dual-CPU configuration gets eight DIMMs, one per channel, for maximum bandwidth.
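For reference, here is a minimal sketch of the peak theoretical bandwidth this one-DIMM-per-channel configuration delivers, assuming all four channels run DDR3-1866 at full speed (sustained bandwidth will of course be lower):

    # Peak theoretical memory bandwidth of one Xeon E5-2680 v2 socket
    channels_per_socket = 4        # DDR3 memory channels per socket
    transfers_per_second = 1866e6  # DDR3-1866 runs at 1866 MT/s
    bytes_per_transfer = 8         # each channel is 64 bits wide
    per_socket = channels_per_socket * transfers_per_second * bytes_per_transfer / 1e9
    print(round(per_socket, 1), "GB/s per socket")       # ~59.7 GB/s
    print(round(per_socket * 2, 1), "GB/s dual-socket")  # ~119.4 GB/s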

Supermicro A+ Opteron server 1022G-URG (1U Chassis)

This Opteron server is not directly comparable with the featured Intel systems, as it is not targeted at the same market and costs a fraction of what the other machines do. Nevertheless, here's our test configuration.

CPU            2x Opteron 6376 "Abu Dhabi" at 2.3GHz
RAM            64GB (8x8GB) Samsung M393B1K70DH0-CK0 DDR3-1600
Motherboard    Supermicro H8DGU-F
Internal Disks 2x Intel SSD 710 200GB (MLC)
Chipset        AMD SR5670 + SP5100
BIOS version   R3.5
PSU            Supermicro PWS-704P-1R 750W

The Opteron server in this review is here only to satisfy our curiosity: we want to see how well the Opteron fares in our new Linux benchmarks.

Comments

  • Brutalizer - Tuesday, February 25, 2014

    Clusters cannot replace SMP servers. Clusters cannot run SMP workloads.
  • Kevin G - Tuesday, February 25, 2014

    I'm sorry, but it is considered best practice to run databases in pairs for redundancy. For example, here is an Oracle page explaining how clustering is used to maintain high availability: http://docs.oracle.com/cd/B28359_01/server.111/b28...

    Other databases like MySQL and MS SQL Server have similar offerings.

    There is a reason why big hardware like this is purchased in pairs or sets of three.
  • EmmR - Friday, March 14, 2014

    Kevin G, you are actually correct. We are in the process of comparing the performance of Power7+ vs Xeon v2 for SAP batch workloads, and we got pretty much the same arguments from our AIX guys as Brutalizer mentioned.

    We are using real batch jobs rather than a synthetic benchmark, and we set up each system to compare core-for-core, down to running a memory defrag on the Power system to make sure memory access is as good as possible. The only thing we could not fix is that, in terms of network access, the Intel system was handicapped.

    What we are seeing is that we can tune the Intel system to get basically similar performance (<5% difference in total runtime) to the Power7+ system (P780). This was quite unexpected, but it's an illustration of how far Intel and the hardware vendors building servers/blades based on those CPUs have come.
  • Kevin G - Monday, March 17, 2014

    Looking at the Xeon E7 v2s right now is wise since they're just hitting the market and the core infrastructure is expected to last three generations. It wouldn't surprise me if you can take a base system today using memory daughter cards and eventually upgrade it to Broadwell-EX and DDR4 memory by the end of the product life cycle. This infrastructure is going to be around for a while.

    POWER7+ on the other hand is going to be replaced by the POWER8 later this year. I'd expect it to perform better than the POWER7+, though how much better will have to wait for benchmarks after it is released. There is always going to be something faster/better/cheaper coming down the road in the computing world. Occasionally waiting makes sense due to generational changes like this. Intel and IBM tend to leapfrog each other, and it is IBM's turn to jump.

    Ultimately, if you gotta sign the check next week, I'd opt for the Xeon, but if you can hold off a few months, I'd see what the POWER8 brings.
  • EmmR - Monday, March 17, 2014

    Power8 will be interesting to look at, but based on current data it will have to yield a pretty impressive performance boost over Power7+ (and Xeon v2) in order to be competitive on performance per dollar.
  • Kevin G - Monday, March 17, 2014

    IBM is claiming two to three times the throughput of the POWER7+. It isn't hard to see where most of that gain comes from: increasing the core count from 8 to 12. That change alone will put it ahead of the Xeon E7 v2s in terms of raw performance. Minor IPC and clock speed increases are expected too. The increase from 4-way to 8-way SMT will help some workloads, though it could also hurt others (IBM supports dynamic changes in SMT, so this is straightforward to tune). The rest will likely come from system-level changes like lower memory access times, thanks to the L4 cache on the serial-to-parallel memory buffer, and more bandwidth all around. What really interests me is that IBM is finally dropping the GX bus they introduced for coherency in the POWER4. What the POWER8 does is encapsulate coherency over a PCIe physical link. It'll be interesting to see how that plays out.

    As you may suspect, the cost of this performance may be rather high. We'll have to see when IBM formally launches systems.
  • amilayajr - Thursday, March 6, 2014

    I think Brutalizer is saying that this new Xeon CPU is pretty much for a targeted market. Unix has long been the backbone of the internet, and Intel wants to cover as much of the general server market as it can. Sure, it's a nice CPU, but as far as reliability goes, I would rather use a slower system that is reliable in terms of calculations. I would still give Intel the thumbs up for trying something new and updating the CPU. As for replacing Unix servers for large database enterprise work, probably not for a long time for Intel. I would say Intel should leave that to the real experts who focus on that market. Intel is just covering its turf in the smaller-scale server market.
  • Kevin G - Thursday, March 6, 2014

    The x86 servers have caught up in RAS features. High-end features like hot memory add/remove are available on select systems. (Got a bad DIMM? Replace it while the system is running.) Processor add/remove on a running system is also possible on newer systems but requires some system-level support (though I'm not immediately familiar with a system offering it). In most cases, the baseline RAS features of the Xeons are more than good enough for the job. Hardware lockstep is also an option on select systems.

    Uses for ultra-high-end features like two-bit error correction for memory, RAID5-like parity across memory channels, and hot processor add/remove are a very narrow niche. Miscellaneous features like instruction replay don't actually add much in terms of RAS (replay on Itanium is used mainly to fill unused instruction slots in its VLIW architecture, whereas lockstep would catch a similar error in all cases). Really, the main reason to go with Unix is on the software side, not the hardware side anymore.
  • djscrew - Wednesday, March 12, 2014

    "Sound like we are solving a problem with hardware instead of being innovative in software."

    that doesn't happen... ever... http://www.anandtech.com/show/7793/imaginations-po... ;)
  • mapesdhs - Sunday, February 23, 2014


    Brutalizer writes:
    "Some examples of Scale-out servers (clusters) are all servers on the Top-500 supercomputer list. Other examples are SGI Altix / UV2000 servers or the ScaleMP server, they have 10,000s of cores and 64 TB RAM or more, i.e. cluster. Sure, they run a single unified Linux kernel image - but they are still clusters. ..."

    Re the UV, that's not true at all. The UV is a shared memory system with a hardware MPI implementation. It can scale codes well beyond just a few dozen sockets. Indeed, some key work going on atm is how to scale relevant codes beyond 512 CPUs, not just 32 or 64. The Cosmos installation is one such example. Calling a UV a cluster is just plain wrong. Its shared memory architecture means it can handle very large datasets (hundreds of GB) and extremely demanding I/O workloads; no conventional 'cluster' can do that.

    Ian.
