SKUs and Pricing

Before we get to the benchmarks, let's first see what you get for your money. To reduce clutter, we have not listed all of the SKUs but have tried to include useful points of comparison. Also note that we are not comparing pricing or performance with AMD at this point, as AMD has not updated its server CPU offerings in almost two years. The Steamroller architecture was very promising and addressed many of the bottlenecks we discovered in the earlier Opteron 6200, but unfortunately it never made it into a high-end server CPU. So for now, Intel's only competition is the previous generation of Xeons, which means Intel has to convince server buyers that upgrading to the latest Xeon pays off.

Intel Xeon E5 v2 versus v3 2-socket SKU Comparison

| Xeon E5 v2 | Cores/Threads | TDP | Clock (GHz) | Price | Xeon E5 v3 | Cores/Threads | TDP | Clock (GHz) | Price |
|------------|---------------|-----|-------------|-------|------------|---------------|-----|-------------|-------|
| High Performance (20-30MB LLC) | | | | | High Performance (35-45MB LLC) | | | | |
| - | - | - | - | - | 2699 v3 | 18/36 | 145W | 2.3-3.6 | $4115 |
| - | - | - | - | - | 2698 v3 | 16/32 | 135W | 2.3-3.6 | $3226 |
| 2697 v2 | 12/24 | 130W | 2.7-3.5 | $2614 | 2697 v3 | 14/28 | 145W | 2.6-3.6 | $2702 |
| 2695 v2 | 12/24 | 115W | 2.4-3.2 | $2336 | 2695 v3 | 14/28 | 120W | 2.3-3.3 | $2424 |
| "Advanced" (20-30MB LLC) | | | | | | | | | |
| 2690 v2 | 10/20 | 130W | 3.0-3.6 | $2057 | 2690 v3 | 12/24 | 135W | 2.6-3.5 | $2090 |
| 2680 v2 | 10/20 | 115W | 2.8-3.6 | $1723 | 2680 v3 | 12/24 | 120W | 2.5-3.3 | $1745 |
| 2660 v2 | 10/20 | 95W | 2.2-3.0 | $1389 | 2660 v3 | 10/20 | 105W | 2.6-3.3 | $1445 |
| 2650 v2 | 8/16 | 95W | 2.6-3.4 | $1166 | 2650 v3 | 10/20 | 105W | 2.3-3.0 | $1167 |
| Midrange (10-20MB LLC) | | | | | Midrange (15-25MB LLC) | | | | |
| 2640 v2 | 8/16 | 95W | 2.0-2.5 | $885 | 2640 v3 | 8/16 | 90W | 2.6-3.4 | $939 |
| 2630 v2 | 6/12 | 80W | 2.6-3.1 | $612 | 2630 v3 | 8/16 | 85W | 2.4-3.2 | $667 |
| Frequency optimized (15-25MB LLC) | | | | | Frequency optimized (10-20MB LLC) | | | | |
| 2687W v2 | 8/16 | 150W | 3.4-4.0 | $2108 | 2687W v3 | 10/20 | 160W | 3.1-3.5 | $2141 |
| 2667 v2 | 8/16 | 130W | 3.3-4.0 | $2057 | 2667 v3 | 8/16 | 135W | 3.2-3.6 | $2057 |
| 2643 v2 | 6/12 | 130W | 3.5-3.8 | $1552 | 2643 v3 | 6/12 | 135W | 3.4-3.7 | $1552 |
| 2637 v2 | 4/8 | 130W | 3.5-3.8 | $996 | 2637 v3 | 4/8 | 135W | 3.5-3.7 | $996 |
| Budget (15MB LLC) | | | | | Budget (15MB LLC) | | | | |
| 2609 v2 | 4/4 | 80W | 2.5 | $294 | 2609 v3 | 6/6 | 85W | 1.9 | $306 |
| 2603 v2 | 4/4 | 80W | 1.8 | $202 | 2603 v3 | 6/6 | 85W | 1.6 | $213 |
| Power Optimized (15-25MB LLC) | | | | | Power Optimized (20-30MB LLC) | | | | |
| 2650L v2 | 10/20 | 70W | 1.7-2.1 | $1219 | 2650L v3 | 12/24 | 65W | 1.8-2.5 | $1329 |
| 2630L v2 | 6/12 | 70W | 2.4-2.8 | $612 | 2630L v3 | 8/16 | 55W | 1.8-2.9 | $612 |

At the top of the product stack is the new E5-2699 v3, and it's priced accordingly: over $4000 for the most cores Intel has ever put in a Xeon processor. TDP has also gone up compared to the previous generation's top SKU, but for six additional cores that's probably reasonable.

At first glance, the 2695 v3 looks interesting for the performance hungry, as it is the cheapest "HCC" (High Core Count) option. You get the largest die with two memory controllers, 35MB of LLC, and two rings, while the TDP is limited to 120W. Of course, the question is how well Turbo Boost will compensate for the relatively low base clock.

For those looking for a good balance between price/performance and power, the 2650L v3 offers a 100MHz higher base clock, a much higher Turbo Boost clock, two extra cores, and a slightly lower TDP than the 2650L v2 for about $100 more. This SKU looks very tempting for people who do not need the ultimate in processing power, e.g. those looking for a host for their VMs.

Lastly, there is the 2667 v3, which combines a high base clock (3.2GHz) with a still-reasonable TDP of 135W, targeting applications that need processing power but do not scale beyond a certain core count.
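
To put those trade-offs in rough numbers, here is a quick back-of-the-envelope sketch (illustrative only: the prices and clocks are the list values from the table above, and "core-GHz", i.e. cores times base clock, is a crude throughput proxy that ignores Turbo Boost, IPC, and memory behavior):

```python
# Crude SKU value comparison from list price and base clock alone.
# "core-GHz" (cores x base clock) ignores Turbo Boost, per-core IPC,
# and memory bandwidth, so treat the output as a rough guide only.
skus = {
    # model: (cores, base GHz, TDP W, list price $) -- from the table above
    "E5-2699 v3":  (18, 2.3, 145, 4115),
    "E5-2697 v3":  (14, 2.6, 145, 2702),
    "E5-2695 v3":  (14, 2.3, 120, 2424),
    "E5-2667 v3":  ( 8, 3.2, 135, 2057),
    "E5-2650L v3": (12, 1.8,  65, 1329),
}

for model, (cores, ghz, tdp, price) in skus.items():
    core_ghz = cores * ghz
    print(f"{model:12s} {core_ghz:5.1f} core-GHz  "
          f"${price / cores:4.0f}/core  ${price / core_ghz:3.0f}/core-GHz  {tdp}W")
```

By this admittedly crude metric, the flagship 2699 v3 carries a clear premium (around $99 per core-GHz), while the 2695 v3 and 2650L v3 (roughly $75 and $62 per core-GHz) look like the value picks.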

Those are the SKUs that we have included in this review, so let's see how they fare.

Comments

  • shodanshok - Tuesday, September 16, 2014 - link

    Hi,
    Please note that the RWT article you are endlessly posting is 10 (TEN!) years old.

    SGI tells the exact contrary of what you report:
    https://www.sgi.com/pdfs/4227.pdf

    Altix UV systems are shared memory systems connecting the various boards (4-8 sockets per board) via QPI and NUMAlink. They are basically a distributed version of your beloved scale-up server. After all, the maximum memory limit is 16 TB, which is the address space _a single Xeon_ can address.

    I am NOT saying that commodity x86 hardware can replace proprietary big boxes in every environment. What I am saying is that the market niche for big Unix boxes is rapidly shrinking.

    So, to recap:
    1) in an article about the Xeon E5 ($4000 max) you talk about the mighty M7 (which is NOT available), which will probably cost 10-20X (and even T4/T5 are 3-5X);

    2) you speak about SPECint2006, conveniently skipping anything other than throughput, totally ignoring latency and per-thread performance (and even in pure throughput the Xeons are very competitive at a fraction of the cost);

    3) you totally ignore the fact that QPI and NUMAlink enable multi-board systems to act as a single one, running a single kernel image within a shared memory environment (see the sketch right below for what that looks like from the OS).
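
    As a quick illustration (a minimal sketch, assuming a Linux box; node counts and distances obviously vary per machine), this prints the NUMA distance matrix that a single kernel image exposes under /sys. On a multi-board shared-memory machine the extra boards simply show up as additional NUMA nodes with larger distances; a cluster shows nothing of the sort, since every node runs its own kernel:

    ```python
    # Sketch: dump the NUMA distance matrix exposed by one Linux kernel image.
    # Many nodes with large cross-node distances = one big shared-memory box;
    # a cluster instead presents one small matrix per independent kernel.
    import glob

    nodes = glob.glob("/sys/devices/system/node/node[0-9]*")
    for node in sorted(nodes, key=lambda p: int(p.rsplit("node", 1)[-1])):
        with open(node + "/distance") as f:
            print(f"{node.rsplit('/', 1)[-1]}: {f.read().split()}")
    ```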

    Don't get me wrong: I am not an Intel fan, but I must say I'm impressed with the Xeons Intel has been releasing for the last 4 years (from Nehalem EX onward). Even the (small) Itanium niche is at risk, attacked by higher-end E7 systems.

    Maybe (and hopefully) Power8 and M7 will be earth-shattering, but they will surely cost much, much more...

    Regards.
  • Brutalizer - Friday, September 19, 2014 - link

    This is folly. The link I posted, where SGI says their "Altix" server is only for HPC clustered workloads, applies also today to the "Altix" successor: the "Altix UV". Fact is that no large Unix or Mainframe vendor has successfully scaled beyond 32/64 sockets. And now SGI, a small cluster vendor with tiny resources compared to the large Unix companies, claims to have a 256-socket server?? Has SGI succeeded where no one else has, despite the others pouring decades and billions into R&D?

    As a response you post a link where SGI talks about their "Altix UV", and you claim that link as evidence that the Altix UV server is not a cluster. Well, if you bothered to read your link, you would see that SGI has not changed their viewpoint: it is only for HPC clustered workloads. For instance, the "Altix UV" material talks about MPI. MPI is only used in clusters, mainly for number crunching. I have worked with MPI in scientific computations, so I know this (see the sketch below for what the MPI model looks like). No one would use MPI in an SMP server such as the Oracle M7. Anyone talking about MPI is also talking about clusters. For instance, enterprise software such as SAP does not use MPI.
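
    To make the distinction concrete, here is a minimal MPI sketch (assuming the mpi4py bindings and an MPI runtime are installed; run it with something like `mpirun -n 2 python demo.py`). Each rank is a separate process with its own address space, so data can only be shared as an explicit message, which is exactly the cluster programming model and the opposite of threads doing plain loads and stores against one coherent memory:

    ```python
    # Minimal MPI point-to-point example (sketch; assumes mpi4py is installed).
    # Ranks are separate processes with separate address spaces; the only way
    # to share data is an explicit message -- the hallmark of cluster computing.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        payload = {"step": 1, "msg": "partial result"}
        comm.send(payload, dest=1, tag=42)   # explicit message, not a load/store
    elif rank == 1:
        payload = comm.recv(source=0, tag=42)
        print(f"rank 1 received: {payload['msg']!r}")
    ```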

    As a coup de grace, I quote text from your link about the latest "Altix UV" server:
    "...The key enabling feature of SGI Altix UV is the NUMAlink 5 interconnect, with additional performance characteristics contributed by the on-node hub and its MPI Offload Engine (MOE)...MOE is designed to take MPI process communications off the microprocessors, thereby reducing CPU overhead and lowering memory access latency, and thus improving MPI application performance and scalability. MOE allows MPI tasks to be handled in the hub, freeing the processor for computation. This highly desirable concept is being pursued by various switch providers in the HPC cluster arena;
    ...
    But fundamentally, HPC is about what the user can achieve, and it is this holy quest that SGI has always strived to enable with its architectures..."

    Maybe this is the reason you will not find SAP benchmarks on the largest "Altix UV" server? Because it is a cluster.

    But of course, you are free to disprove me by posting SAP benchmarks on a large Linux server with 10,000s of cores (i.e. clusters). I agree that if that SGI cluster runs SAP faster than a 32-socket SMP server, then it does not matter whether SGI is a cluster or not. The point is: clusters cannot run all workloads; they suck at Enterprise workloads. If they can run Enterprise workloads, then I will change my mind. Because, in the end, it does not matter how the hardware is constructed, as long as it can run SAP fast enough. But clusters cannot.

    Post SAP benchmarks on a large Linux server. Go ahead. Prove me wrong when you say they are not clusters - in that case they would be able to handle non-clustered workloads such as SAP. :)
  • shodanshok - Friday, September 19, 2014 - link

    Brutalizer, I am NOT (NOT!!!) saying that x86 is the best in the world at scale-up performance. After all, it remains commodity hardware, and some choices clearly reflect that. For example, while Intel rates single-image systems at as many as 256 sockets, the latency induced by the switches/interconnect surely puts the practical number way lower.

    What I am saying is that the market that truly needs big Unix boxes is rapidly shrinking, so your comments about how "mediocre" this new 18-core monster is are totally out of place.

    Please note that:
    1) Altix UV systems are SHARED MEMORY systems built out of clusters, where the "secret sauce" is the added tech behind NUMAlink. Will SAP run well on these systems? I think not: the NUMAlink adds too much latency. However, this same tech can be used in a number of cases where big Unix boxes were the first choice (at least in SGI's words; I don't have a similar system (unfortunately!) so I can't tell you more);

    2) HP has just released SAP HANA benchmarks for a 16-socket Intel E7 in a scale-up configuration (read: a single system) with 12/16 TB of RAM:
    LINK1: http://h30507.www3.hp.com/t5/Converged-Infrastruct...
    LINK2: http://h30507.www3.hp.com/t5/Reality-Check-Server-...
    LINK3: http://h20195.www2.hp.com/V2/GetPDF.aspx%2F4AA5-14...

    3) Even at 8 sockets, the Intel systems are very competitive. Please read here for some benchmarks: http://www.anandtech.com/show/7757/quad-ivy-brigde...
    Long story short: an 8S Intel E7-8890 (15 cores @ 2.8 GHz) beats an 8S Oracle T5-8 (16 cores @ 3.6 GHz) by a significant margin. Now think about 18 Haswell cores...

    4) On top of that, even high-end E7 Intel x86 systems are way cheaper than Oracle/IBM boxes, while providing similar performance. The real differentiators are the extreme RAS features integrated into proprietary Unix boxes (e.g. lockstep) that would require custom, complex glue logic on x86. And yes, some Unix boxes have impressive amounts of memory ;)

    5) This article is about *Haswell-EP*. These chips are one (sometimes even two...) orders of magnitude cheaper than proprietary Unix boxes. So why on earth do you complain in every Xeon article about how mediocre this technology is?

    Regards.
  • Brutalizer - Monday, September 22, 2014 - link

    I hear you when you say that x86 does not have the best scale-up performance. I am only saying that those 256-socket x86 servers you talk of are, in practice, nothing more than clusters, because they are only used for clustered HPC workloads. They will never run Enterprise business software the way a large SMP server with 32/64 sockets does - that domain is exclusive to Unix/Mainframe servers.

    It seems that we disagree on the 256-socket x86 servers but agree on everything else (x86 is cheaper than RISC, etc). I claim they can only be used as clusters (you will only find HPC cluster benchmarks). So those large Linux servers with 10,000 cores, such as the SGI Altix UV, are actually only usable as clusters.

    Regarding the HP SAP HANA benchmarks with the 16-socket x86 server called ConvergedSystem 900: it is actually a Unix Superdome server (a RISC server) where HP swapped all the Itanium CPUs for x86 CPUs. Well, it is good that 16-socket Linux servers will soon be available on the market. But HANA is a clustered database. I would like to see the HP ConvergedSystem server running non-clustered Enterprise workloads - how well would the first 16-socket Linux server perform? We have to see. And then we can compare the fresh 16-socket Linux server to the mature 32/64-socket Unix/Mainframe servers in benchmarks and see which is fastest. A clustered 256-socket Linux server sucks on SMP benchmarks; it would be useless.
  • Brutalizer - Monday, September 22, 2014 - link

    http://www.enterprisetech.com/2014/06/02/hps-first...
    "...The first of several systems that will bring technologies from Hewlett-Packard’s Superdome Itanium-based machines to big memory ProLiant servers based on Xeon processors is making its debut this week at SAP’s annual customer shindig.

    Code-named “Project Kraken,” the system is commercialized as the ConvergedSystem 900 for SAP HANA and as such has been tuned and certified to run the latest HANA in-memory database and runtime environment. The machine, part of a series of high-end shared memory systems collectively known as “DragonHawk,” is part of a broader effort by HP to create Superdome-class machines out of Intel’s Xeon processors.
    ...

    The obvious question, with SAP allowing for HANA nodes to be clustered, is: Why bother with a big NUMA setup instead of a cluster? “If you look at HANA, it is really targeting three different workloads,” explains Miller. “You need low latency for transactions, and in fact, you can’t get that over a cluster...."
  • TiGr1982 - Tuesday, September 9, 2014 - link

    Our RISC scale-up evangelist is back!

    That's OK and very nice, nobody argues, but I guess one has to win a serious jackpot to afford one of these 32-socket Oracle SPARC M7-based machines :)

    Jokes aside, technically you are correct, but the Xeon E5 is obviously not about the very best scale-up on the planet, because Intel is aiming more at the mainstream server market. The Xeon E5 line resides in a totally different price range than your beastly 32-socket scale-up machines, so what's the point of writing about the SPARC M7 here?
  • TiGr1982 - Tuesday, September 9, 2014 - link

    Talking Intel, even the Xeon E7 is a much lower-class line in terms of total firepower (CPU and RAM capability) than your beloved 32-socket SPARC Mx, and even the Xeon E7 is much cheaper than your Mx-32, so, again, what's the point of posting this in an article about the E5?
  • Brutalizer - Wednesday, September 10, 2014 - link

    The point is, people believe that building a huge SMP server with as many as 32 sockets is easy: just add a few Xeon E5s and you are good to go. That is wrong. It is exponentially more difficult to build an SMP server than a cluster (see the sketch below for how quickly the interconnect problem grows). So no one has ever sold such a huge Linux server with 32 sockets. (The IBM P795 is a Unix server that people have tried to compile Linux for, but it is not a Linux server, it is a RISC AIX server.)
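
    As a rough illustration (a sketch under the simplifying assumption of a full point-to-point mesh; real 32-socket designs use node controllers and directory protocols precisely because a full mesh is impractical), the number of direct socket-to-socket links needed is n(n-1)/2, which grows quadratically:

    ```python
    # Direct links needed for an all-to-all (fully meshed) socket topology.
    # This quadratic growth is one reason glueless QPI stops at a handful of
    # sockets and larger machines need custom node controllers/directories.
    for n in (2, 4, 8, 16, 32):
        print(f"{n:2d} sockets -> {n * (n - 1) // 2:3d} direct links")
    # 2 -> 1, 4 -> 6, 8 -> 28, 16 -> 120, 32 -> 496
    ```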
  • TiGr1982 - Wednesday, September 10, 2014 - link

    Well, I comprehend your message, and I agree with you. Huge SMP scale-up servers are really hard to build, mostly because of the dramatically increasing complexity of implementing a REALLY fast (in terms of both bandwidth and latency) interconnect between sockets as the socket count grows considerably (say, up to 32), which is required to get a true SMP machine.

    I hope, other people get your message too.

    BTW, I remember you already posted this kind of statement in the Xeon E7 v2 article comments before :-)
  • Brutalizer - Monday, September 15, 2014 - link

    "...I hope, other people get your message too...."

    Unfortunately, they don't. See "shodanshok"'s reply above, claiming that the 256-socket Xeon servers are not clusters. And see my reply on why they are.
