Intel’s roadmap covers all the power and market segments, from ultra-low-power smartphones and tablets through notebooks, mainstream desktops and enthusiast desktops, up to enterprise. Enterprise differs from the rest of the market in requiring absolute stability, uptime and support should anything go wrong. High-end enterprise CPUs are therefore expensive, and because buyers are willing to pay top dollar for the best, Intel can push core counts, frequency and thus price much higher than in the consumer space. Today we look at two CPUs from this segment – the twelve-core Xeon E5-2697 v2 and the eight-core Xeon E5-2687W v2.

Firstly I would like to say a big thank you to GIGABYTE Server for the opportunity to test these CPUs in their motherboard, the GA-6PXSV3, which will be the focus of a review at a later date.

Intel’s Enterprise Line

High-end enthusiasts always want hardware that is faster, more powerful, and carries more cores than what is available on the market today. The problem here is two-fold: cost and volume. Were Intel to produce a consumer product at more than $1000, a large part of the market would complain that the ultra-high-end is too expensive. The other issue is volume – it can be hard to gauge just how many CPUs would be sold. For example, the consumer level i7-4930K was the preferred choice for many enthusiasts as it was several hundred dollars cheaper than the i7-4960X despite being a fraction slower at stock frequencies. The ultra-high-end enthusiast also wants all the bells and whistles, such as overclockability, a good range of DRAM speed support and top quality construction materials.

At some point, Intel has to draw the line. The enterprise line of CPUs differs from the consumer line in more ways than we might imagine. Due to the requirement of stability, overclocking is knocked on the head for all modern Intel Xeon CPUs. For clarification, the Westmere-EP line (Xeon X5690 et al., socket LGA1366) was the last line of overclockable Xeons. The Xeon line of CPUs must also support enterprise level memory – UDIMMs and RDIMMs, ECC and non-ECC. This extends up to quad-rank DRAM support, such as 32GB modules that can themselves cost more than a CPU.

Some enterprise CPUs are also designed to talk to other CPUs in multiprocessor systems. On the Intel side, this means a point-to-point QPI link between each CPU in the system. Johan and I have recently tested several multiprocessor systems [1,2,3,4]; features like these develop over time, cost R&D dollars, and are aimed purely at the enterprise sector.

Virtualization is another feature Intel limits to certain CPUs, and it appears on some consumer parts as well as some Xeons. The defining counterpart tends to be overclockability – if a consumer CPU is listed as overclockable, it does not have VT-d extensions for directed I/O. For users that want ECC memory and virtualization at a lower cost, the enterprise product stack often offers lower-core-count, lower-frequency parts at lower price points.

While not necessarily verifiable, there have been reports that Xeon processors are actually the better quality samples to come out of the fabs – CPUs with better frequency-to-voltage characteristics and a better chance of running cool. The main reason this report persists is that back when Xeons were overclockable in the Westmere days, they tended to overclock further than their consumer counterparts. It would also make sense from Intel’s point of view – the enterprise customer is paying more for their hardware, and a product that is better in terms of energy consumption or thermals would keep those customers happy.

The Xeon Product Line

Intel splits the naming of its Xeons up according to feature set and architecture. For single processor systems using the LGA1150 socket, we get the E3 line of Xeons, which at present are based on the Haswell architecture and all come under the E3-12xx v3 designation:

Intel E3 v3 SKUs
Xeon E3 v3 Cores/Threads TDP (W) IGP Base Clock (MHz) Turbo Clock (MHz) L3 Cache Price
E3-1220L v3 2/4 13 N/A 1100 1500 4 MB $193
E3-1220 v3 4/4 80 N/A 3100 3500 8 MB $193
E3-1225 v3 4/4 84 P4600 3200 3600 8 MB $213
E3-1230 v3 4/8 80 N/A 3300 3700 8 MB $240
E3-1240 v3 4/8 80 N/A 3400 3800 8 MB $262
E3-1245 v3 4/8 84 P4600 3400 3800 8 MB $276
E3-1270 v3 4/8 80 N/A 3500 3900 8 MB $328
E3-1275 v3 4/8 84 P4600 3500 3900 8 MB $339
E3-1280 v3 4/8 82 N/A 3600 4000 8 MB $612
E3-1285 v3 4/8 84 P4700 3600 4000 8 MB $662
E3-1265L v3 4/8 45 HD (Haswell) 2500 3700 8 MB $294
E3-1284L v3 4/8 47 Iris Pro 5200 1800 3200 6 MB N/A
E3-1285L v3 4/8 65 P4700 3100 3900 8 MB $774
E3-1230L v3 4/8 25 N/A 1800 2800 8 MB $250

With Intel’s enthusiast socket, LGA2011, the processors are split according to their multi-processor capability. Due to the tick-tock cadence of architecture updates at this level, the enthusiast consumer and Xeon lines are both one architecture behind the mainstream LGA1150 CPU line. This results in all the LGA2011 Xeons being based on Ivy Bridge-E.

Single processor LGA2011 Xeons are under the title of E5-16xx v2. Dual processor system capable Xeons are E5-26xx v2, and quad processor system capable Xeons are E5-46xx v2.  As Johan pointed out in his excellent dive into the improvements over the older architecture, these CPUs come from three die flavors:

The three dies are aimed at workstations/enthusiasts, servers and high performance computing respectively. I’m not going to repeat what Johan already posted, but it is a really good read if you have a chance to look through it.

The final batch of processors is in the high performance category, using the LGA2011-1 socket. These have been recently released as the E7 v2 line (again I will point to Johan’s deep dive on the specifics) under the Ivy Bridge-EX moniker. We have E7-28xx v2 for 2P, E7-48xx v2 for 4P and E7-88xx v2 for 8P systems. Core counts go all the way up to 15 due to the three banks of five cores used in the die.

Turbo Modes

As with the consumer line, the base clock speed of an enterprise CPU is usually not the be-all and end-all of performance. Intel’s Turbo Boost lets the CPU speed up when fewer cores are in use, exploiting the difference in power consumption between one-core, two-core and all-core computation. There is no hard and fast rule when it comes to the turbo modes – Intel quotes the top turbo bin in its CPU database at ark.intel.com, but in order to find out how frequency scales under multi-core (but not all-core) operation, one has to dig into the specification PDFs, such as this one.
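Each turbo bin corresponds to a 100 MHz step above the base clock, with the bin list in the specification PDFs running from one active core down to an all-core load. As a rough illustration – the `turbo_curve` helper is my own, not Intel's, and the hard-coded bin list is taken from the E5-2697 v2 row below – the effective frequency at each core loading can be sketched like this:

```python
# Each turbo bin is a 100 MHz step above the base clock.
# Index 0 of the bin list = one active core; the last entry = all cores.
def turbo_curve(base_mhz, bins):
    """Effective frequency in MHz for 1..N active cores."""
    return [base_mhz + 100 * b for b in bins]

# E5-2697 v2: 2700 MHz base, bins 8/7/6/5/4/3/3/3/3/3/3/3
curve = turbo_curve(2700, [8, 7, 6, 5, 4, 3, 3, 3, 3, 3, 3, 3])
print(curve[0], curve[-1])  # → 3500 3000 (3.5 GHz single-core, 3.0 GHz all-core)
```

The same arithmetic applies to every SKU in the tables that follow.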

With over 50 different CPUs mentioned in that document, it is hard to see which CPUs are going to offer more than others. We extracted the data:

Intel E5 SKU Comparison
Xeon E5 Cores/Threads TDP (W) Base Clock (MHz) Turbo Bins L3 Cache L3 Cache / Core (MB) Price
E5-46xx
E5-4657L v2 12/24 115 2400 5/4/3/3/3/3/3/3/3/3/3/3 30 MB 2.500 $4,394
E5-4650 v2 10/20 95 2400 5/4/3/3/3/3/3/3/3/3 25 MB 2.500 $3,616
E5-4640 v2 10/20 95 2200 5/4/3/3/3/3/3/3/3/3 20 MB 2.000 $2,725
E5-4624L v2 10/20 70 1900 6/6/5/5/4/4/3/3/2/2 25 MB 2.500 $2,405
E5-4627 v2 8/8 130 3300 3/2/2/2/2/2/2/2 16 MB 2.000 $2,180
E5-4620 v2 8/16 95 2600 4/3/2/2/2/2/2/2 20 MB 2.500 $1,611
E5-4610 v2 8/16 95 2300 4/3/2/2/2/2/2/2 16 MB 2.000 $1,219
E5-4607 v2 6/12 95 2600 0/0/0/0/0/0 15 MB 2.500 $885
E5-4603 v2 4/8 95 2200 0/0/0/0 10 MB 2.500 $551
E5-x6xx
E5-2697 v2 12/24 130 2700 8/7/6/5/4/3/3/3/3/3/3/3 30 MB 2.500 $2,614
E5-2695 v2 12/24 115 2400 8/7/6/5/4/4/4/4/4/4/4/4 30 MB 2.500 $2,336
E5-2687W v2 8/16 150 3400 6/5/4/3/2/2/2/2 25 MB 3.125 $2,108
E5-2667 v2 8/16 130 3300 7/6/5/4/3/3/3/3 25 MB 3.125 $2,057
E5-2690 v2 10/20 130 3000 6/5/4/3/3/3/3/3/3/3 25 MB 2.500 $2,057
E5-2658 v2 10/20 95 2400 6/6/5/5/4/4/3/3/2/2 25 MB 2.500 $1,750
E5-1680 v2 8/16 130 3000 9/8/7/5/4/4/4/4 25 MB 3.125 $1,723
E5-2680 v2 10/20 115 2800 8/7/6/5/4/3/3/3/3/3 25 MB 2.500 $1,723
E5-2643 v2 6/12 130 3500 3/2/1/1/1/1 25 MB 4.167 $1,552
E5-2670 v2 10/20 115 2500 8/7/6/5/4/4/4/4/4/4 25 MB 2.500 $1,552
E5-2648L v2 10/20 70 1900 6/6/5/5/4/4/3/3/2/2 25 MB 2.500 $1,479
E5-2660 v2 10/20 95 2200 8/7/6/5/4/4/4/4/4/4 25 MB 2.500 $1,389
E5-2650L v2 10/20 70 1700 4/3/2/2/2/2/2/2/2/2 25 MB 2.500 $1,219
E5-2628L v2 8/16 70 1900 5/5/4/4/3/3/2/2 20 MB 2.500 $1,216
E5-2650 v2 8/16 95 2600 8/7/6/5/5/5/5/5 20 MB 2.500 $1,166
E5-1660 v2 6/12 130 3700 3/2/1/1/1/1 15 MB 2.500 $1,080
E5-2637 v2 4/8 130 3500 3/2/1/1 15 MB 3.750 $996
E5-2640 v2 8/16 95 2000 5/4/3/3/3/3/3/3 20 MB 2.500 $885
E5-2618L v2 6/12 50 2000 0/0/0/0/0/0 15 MB 2.500 $632
E5-2630 v2 6/12 80 2600 5/4/3/3/3/3 15 MB 2.500 $612
E5-2630L v2 6/12 60 2400 4/3/2/2/2/2 15 MB 2.500 $612
E5-1650 v2 6/12 130 3500 4/2/2/2/1/1 12 MB 2.000 $583
E5-2620 v2 6/12 80 2100 5/4/3/3/3/3 15 MB 2.500 $406
E5-1620 v2 4/8 130 3700 2/0/0/0 10 MB 2.500 $294
E5-2609 v2 4/4 80 2500 0/0/0/0 10 MB 2.500 $294
E5-1607 v2 4/4 130 3000 0/0/0/0 10 MB 2.500 $244
E5-2603 v2 4/4 80 1800 0/0/0/0 10 MB 2.500 $202
E5-x4xx
E5-2470 v2 10/20 95 2400 8/7/6/5/4/4/4/4/4/4 25 MB 2.500 $1,440
E5-2448L v2 10/20 70 1800 6/6/5/5/4/4/3/3/2/2 25 MB 2.500 $1,424
E5-2450L v2 10/20 60 1700 4/3/2/2/2/2/2/2/2/2 25 MB 2.500 $1,219
E5-2450 v2 8/16 95 2500 8/7/6/5/4/4/4/4 20 MB 2.500 $1,107
E5-2428L v2 8/16 60 1800 5/5/4/4/3/3/2/2 20 MB 2.500 $1,013
E5-2440 v2 8/16 95 1900 5/4/3/3/3/3/3/3 20 MB 2.500 $832
E5-2430L v2 6/12 60 2400 4/3/2/2/2/2 15 MB 2.500 $612
E5-2418L v2 6/12 50 2000 0/0/0/0 15 MB 2.500 $607
E5-1428L v2 6/12 60 2200 5/4/3/2/2/2 15 MB 2.500 $474
E5-2420 v2 6/12 80 2200 5/4/3/3/3/3 15 MB 2.500 $406
E5-2407 v2 4/4 80 2400 0/0/0/0 10 MB 2.500 $250
E5-2403 v2 4/4 80 1800 0/0/0/0 10 MB 2.500 $192
Pentium 1405 v2 2/2 40 1400 0/0 6 MB 3.000 $156
E5-1410 v2 4/8 80 2800 4/4/3/3 10 MB 2.500 N/A
Pentium 1403 v2 2/2 80 2600 0/0 6 MB 3.000 N/A

But even this is hard to parse. Some CPUs start off at a 3.0 GHz base frequency and turbo up a further 900 MHz, whereas others move no more than 300 MHz from their base clock. A few CPUs from our analysis are worthy of attention:

The E5-2643 v2 has the most L3 cache per core of any CPU, at 4.17 MB/core. This is a 10-core die offering all 25 MB of L3 cache, but with only six cores active. One reason for such a configuration is database applications that need a large amount of L3 cache per core. Under licensing agreements that hinge on per-core pricing, a larger amount of L3 per core can save money by requiring fewer cores.
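The cache-per-core arithmetic is simple to verify. A quick sketch (the SKU tuples are copied from the table above; the selection logic is mine) confirms the E5-2643 v2 tops the list:

```python
# (SKU, active cores, L3 cache in MB) – values from the E5 table above
skus = [
    ("E5-2643 v2",  6, 25),
    ("E5-2637 v2",  4, 15),
    ("E5-2667 v2",  8, 25),
    ("E5-2697 v2", 12, 30),
]

# Pick the SKU with the highest L3 cache per active core
best = max(skus, key=lambda s: s[2] / s[1])
print(best[0], round(best[2] / best[1], 3))  # → E5-2643 v2 4.167
```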

The E5-2667 v2 is a better chip than the E5-2687W v2. The latter gets attention due to its 150W TDP, high base clock and the ‘W’ in its name – which is partly why I requested it for this review. But the E5-2667 v2 comes out ahead: it has a lower TDP (130W vs. 150W), and once all the turbo bins are taken into account, both CPUs have the same frequency vs. core loading. Both have a maximum turbo of 4.0 GHz, moving down identically to an all-core loading of 3.6 GHz. The E5-2667 v2 is also the cheaper option, and according to the specification sheets can address 768 GB of memory per CPU, compared to the E5-2687W v2 which can only manage 256 GB.
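That frequency claim is easy to check numerically. Applying the same 100 MHz-per-bin rule (the helper function is my own sketch; the base clocks and bin lists come from the table above), the two CPUs produce identical frequency curves at every core loading:

```python
def turbo_curve(base_mhz, bins):
    # Each bin is a 100 MHz step above base; index 0 = one active core.
    return [base_mhz + 100 * b for b in bins]

e5_2667_v2  = turbo_curve(3300, [7, 6, 5, 4, 3, 3, 3, 3])
e5_2687w_v2 = turbo_curve(3400, [6, 5, 4, 3, 2, 2, 2, 2])

print(e5_2667_v2)                 # → [4000, 3900, 3800, 3700, 3600, 3600, 3600, 3600]
print(e5_2667_v2 == e5_2687w_v2)  # → True: same frequency at every core loading
```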

Low power CPUs keep their turbo speeds higher for longer. The turbo bins of a mid-range low power CPU such as the E5-2628L v2 step down in pairs: 5/5/4/4/3/3/2/2. The non-low-power processors often have a high top turbo bin that decreases quickly, such as the E5-2680 v2, which goes 8/7/6/5/4/3/3/3/3/3.


70 Comments


  • vLsL2VnDmWjoTByaVLxb - Monday, March 17, 2014 - link

    > TrueCrypt is an off the shelf open source encoding tool for files and folders.

    Encoding?
  • Brutalizer - Monday, March 17, 2014 - link

    I would not say these cpus are for high end market. High end market are huge servers, with as many as 32 sockets, some monster servers even have 64 sockets! These expensive Unix RISC servers or IBM Mainframes, have extremely good RAS. For instance, some Mainframes do every calculation in three cpus, and if one fails it will automatically shut down. Some SPARC cpus can replay instructions if something went wrong. Hotswap cpus, and hotswap RAM. etc etc. These low end Xeon cpus have nothing of that.

    PS. Remember that I distinguish between a SMP server (which is a single huge server) which might have 32/64 sockets and are hugely expensive. For instance, the IBM P595 32-socket POWER6 server used for the old TPC-C record, costed $35 million. No typo. One single huge 32 socket server, costed $35 million. Other examples are IBM P795, Oracle M5-32 - both have 32 sockets. Oracle have 32TB RAM which is the largest RAM server on the market. IBM Mainframes also belong to this category.

    In contrast to this, every server larger than 32/64 sockets, is a cluster. For instance the SGI Altix or UV2000 servers, which sports up to 262.000 cores and 100s of TB. These are the characteristica of supercomputer clusters. These huge clusters are dirt cheap, and you pay essentially the hardware cost. Buy 100 nodes, and you pay 100 x $one node. Many small computers in a cluster.

    Clusters are only used for HPC number crunching. SMP servers (one single huge server, extremely expensive because it is very difficult to scale beyond 16 sockets) are used for ERP business systems. An HPC cluster can not run business systems, as SGI explains in this link:
    http://www.realworldtech.com/sgi-interview/6/
    "The success of Altix systems in the high performance computing market are a very positive sign for both Linux and Itanium. Clearly, the popularity of large processor count Altix systems dispels any notions of whether Linux is a scalable OS for scientific applications. Linux is quite popular for HPC and will continue to remain so in the future,...However, scientific applications (HPC) have very different operating characteristics from commercial applications (SMP). Typically, much of the work in scientific code is done inside loops, whereas commercial applications, such as database or ERP software are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with a SMP workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this SMP market, at this point in time."
  • Kevin G - Tuesday, March 18, 2014 - link

    @Brutalizer
    And here we go again. ( http://anandtech.com/comments/7757/quad-ivy-brigde... )

    “These low end Xeon cpus have nothing of that.”

    This is actually accurate as the E5 is Intel’s midrange Xeon series. Intel has the E7 line for those who want more RAS or scalability to 8 sockets. Features like memory hot swap or lock step mirroring can be found in select high end Xeon systems. If you want ultra high end RAS, you can find it if you need it, as well as pay the price premium for it.

    “In contrast to this, every server larger than 32/64 sockets, is a cluster. For instance the SGI Altix or UV2000 servers, which sports up to 262.000 cores and 100s of TB. These are the characteristica of supercomputer clusters. These huge clusters are dirt cheap, and you pay essentially the hardware cost. Buy 100 nodes, and you pay 100 x $one node.”

    Incorrect on several points but they’ve already been pointed out to you. The UV2000 is fully cache coherent (with up to 64 TB of memory) with a global address space that operates as one uniform, logical system that only a single OS/Hypervisor is necessary to boot and run.

    Secondly, the price of the UV2000 does not scale linearly. There are NUMALink switches that bridge the coherency domains that have to be purchased to scale to higher node counts. This is expected of how the architecture scales and is similar to other large scale systems from IBM and Oracle.

    “Clusters are only used for HPC number crunching.”

    Incorrect. Clustering is standard in what you define as SMP applications (big business ERP). It is utilized to increase RAS and prevent downtime. This is standard procedure in this market.

    “SMP servers (one single huge server, extremely expensive because it is very difficult to scale beyond 16 sockets) are used for ERP business systems. An HPC cluster can not run business systems,”

    Why? As long as underlaying architecture is the same, they can run. You may not get the same RAS or scale as high in a single logical system but they’ll work. Performance is where you’d expected it on these boxes: a dual socket HPC system will perform roughly one quarter the speed of as the same chips occupying an 8 socket system.

    “as SGI explains in this link:
    http://www.realworldtech.com/sgi-interview/6/

    As pointed out numerous times before, that link you cite is a decade old. SGI has moved into the SMP space with the Altix UV series. Continuing to use this link as relevant is plain disingenuous and deceptive.

    As for an example of a big ERP application running on such an architecture, the US Post Office run’s Oracle Data Warehousing software on a UV1000. ( https://www.fbo.gov/index?s=opportunity&mode=f... )
  • Brutalizer - Tuesday, March 18, 2014 - link

    Do you really think that UV (which is the successor to Altix) is that different? Windows is Windows, and it will not magically challenge Unix or OpenVMS, in some iterations later. Windows will not be superior to Unix after some development. You think that HPC- Altix will after some development, be superior to Oracle and IBM's huge investments of decades research in billions of USD? Do you think Oracle and IBM has stopped developing their largest servers?

    Altix is only for HPC number crunching, says SGI in my link. Today the UV line of servers, has up to 262.000 cores and 100s of TB of RAM. Whereas the largest Unix and IBM Mainframes have 64 sockets and couple of TB RAM, after decades of research.

    In a SMP server, all cpus will have to be connected to each other, for this SGI UV2000 with 32.768 cpus, you would need (n²) 540 million (half a billion) threads connecting each cpu. Do you really think that is feasible? Does it sound reasonable to you? IBM and Oracle and HP has had great problems connecting 32 sockets to each other, just look at the connections on the last picture at the bottom, do you see all connections? Now imagine half a billion of them in a server!
    http://www.theregister.co.uk/2013/08/28/oracle_spa...

    But on the other hand, if you keep the number of connection downs to islands, and then connect the islands to each other, you dont need half a billion. This solution would be feasible. And then you are not in SMP territory anymore: SGI say like this on page 4 about the UV2000 cluster:
    www.sgi.com/pdfs/4395.pdf‎
    "...SMP is based on intra-node communication using memory shared by all cores. A cluster is made up of SMP compute nodes but each node cannot communicate with each other so scaling is limited to a single compute node...."

    Dont you think that a 262.000 core server and 100s of TB of RAM sounds more like a cluster, than a single fat SMP server? And why do the UV line of servers focus on OpenMPI accerators? OpenMPI is never used in SMP workloads, only in HPC.

    Do you have any benchmarks where one 32.768 cpu SGI UV2000 demolishes 50-100 of the largest Oracle SPARC M6-32 in business systems? And why is the UV2000 much much cheaper than a 16/32 socket Unix server? Why does a single 32 socket Unix server cost $35 million, whereas a very large SGI cluster with 1000 of sockets is very very cheap?
  • Kevin G - Tuesday, March 18, 2014 - link

    Wow, I think the script you're copy/pasting from needs better revision.

    "Do you really think that UV (which is the successor to Altix) is that different?"

    Yes. SGI changed the core architecture to add cache coherent links across the entire system. Clusters tend to have an API on top of a networking software stack to abstract the independent systems so they may act as one. The UV line does not need to do this. For one processor to use memory and perform calculations on data residing on a CPU at the other end, a memory read operation is all that is needed on the UV. It is really that simple.

    "Windows is Windows, and it will not magically challenge Unix or OpenVMS, in some iterations later."

    The UV can run any OS that runs on modern x86 hardware today. Windows, Linux, Solaris (Unix) and perhaps at some point NonStop (HP's mainframe OS http://h17007.www1.hp.com/us/en/enterprise/servers... ). The x86 platform has plenty of choices to choose from.

    "You think that HPC- Altix will after some development, be superior to Oracle and IBM's huge investments of decades research in billions of USD? Do you think Oracle and IBM has stopped developing their largest servers?"

    What I see SGI offering is another tool alongside IBM and Oracle systems. Also you mention decades of research, then it is also fair to put SGI into that category as that link you love to spam IS A DECADE OLD. Clearly SGI didn't have this technology back in 2004 when that interview was written.

    "Today the UV line of servers, has up to 262.000 cores and 100s of TB of RAM. Whereas the largest Unix and IBM Mainframes have 64 sockets and couple of TB RAM, after decades of research."

    Actually this is a bit incorrect. IBM can scale to 131,072 cores on POWER7 if the coherency requirement is forgiven. Oh, and this system can run either AIX or Linux when maxed out. Source: http://www.theregister.co.uk/Print/2009/11/27/ibm_...

    "In a SMP server, all cpus will have to be connected to each other, for this SGI UV2000 with 32.768 cpus, you would need (n²) 540 million (half a billion) threads connecting each cpu.
    http://www.theregister.co.uk/2013/08/28/oracle_spa...

    Wow, do you not read your own sources? Not only is your math horribly horribly wrong but the correct methodology is found for calculating the number of links as things scale is in the link you provided. To quote that link: "The Bixby interconnect does not establish everything-to-everything links at a socket level, so as you build progressively larger machines, it can take multiple hops to get from one node to another in the system. (This is no different than the NUMAlink 6 interconnect from Silicon Graphics, which implements a shared memory space using Xeon E5 chips...)"

    The full implication here is that if the UV 2000 is not a socket machine, then neither is Oracle's soon-to-be-released 96 socket device. The topology to scale is the same in both cases per your very own source.

    "SGI say like this on page 4 about the UV2000 cluster:
    www.sgi.com/pdfs/4395.pdf‎"

    Fundamentally false. If you were to actually *read* the source material for that quote, it is not describing the UV2000. Rather it is speaking generically about the differences between a cluster and a large SMP box on page 4. If you go to page 19, it further describes the UV 2000 as a single system image unlike that of a cluster as defined on page 4.

    "Dont you think that a 262.000 core server and 100s of TB of RAM sounds more like a cluster, than a single fat SMP server? And why do the UV line of servers focus on OpenMPI accerators? OpenMPI is never used in SMP workloads, only in HPC."

    All I'd say about a 262,000 core server is that it wouldn't fit into a single box. Then again IBM, Oracle and HP are spreading their large servers across multiple chassis so this doesn't bother me at all. The important part is how all these boxes are connected. SGI uses NUMAlink6 which provides cache coherency and a global address space for a single system image. OpenMPI can be used inside of a cache coherent NUMA system as it provides a means to guarantee memory locality when data is used for execution. It is a means of increasing efficiency for applications that use it. However, OpenMPI libraries do not need to be installed for software to scale across all 256 sockets on the UV2000. It is purely an option for programmers to take advantage of.

    "And why is the UV2000 much much cheaper than a 16/32 socket Unix server? Why does a single 32 socket Unix server cost $35 million, whereas a very large SGI cluster with 1000 of sockets is very very cheap?"

    First, to maintain coherency, the UV2000 only scales to 256 sockets/64 TB of memory. Second, the cost of a decked out P795 from IBM in terms of processors (8 sockets, 256 cores) and memory (2 TB) but only basic storage to boot the system is only $6.7 million whole sale. Still expensive but far less than what you're quoting. It'll require some math and reading comprehension to get to that figure but here is the source: http://www-01.ibm.com/common/ssi/ShowDoc.wss?docUR...

    I couldn't find pricing for the UV2000 as a complete system but purchasing the Intel processors and memory separately to get to a 256 socket/64 TB system would be just under $2 million. Note that that figure is just processor + memory, no blade chassis, racks or interconnect to glue everything together. That would also be several million. So yes, the UV2000 does come out to be cheaper but not drastically. That IBM pricing document does highlight why their high end systems costs so much, mainly capacity on demand. The p795 is getting a mainframe like pricing structure where you purchase the hardware and then you have to activate it as an additional cost. Not so on the UV2000.
  • psyq321 - Tuesday, March 18, 2014 - link

    Xeon 2697 v2 is not a "low end" Xeon.

    It is part of "expandable server" platform (EP), being able to scale up to 24 cores.

    That is far from "low end", at least in 2014.
  • alpha754293 - Wednesday, March 19, 2014 - link

    "High end market are huge servers, with as many as 32 sockets, some monster servers even have 64 sockets!"

    Partially true. The entire cabinet might have that many sockets/processors, but on a per-system, per-"box" level, most max out between two and four. You get a few odd balls here and there that would have a daughter board for a true 8-socket system, but those are EXTREMELY rare in actuality. (Tyan, I think had one for the AMD Opterons, and they said that less than 5% of the orders were for the full fledge 8-socket systems).

    "PS. Remember that I distinguish between a SMP server (which is a single huge server) which might have 32/64 sockets and are hugely expensive. For instance, the IBM P595 32-socket POWER6 server used for the old TPC-C record, costed $35 million. No typo. One single huge 32 socket server, costed $35 million. Other examples are IBM P795, Oracle M5-32 - both have 32 sockets. Oracle have 32TB RAM which is the largest RAM server on the market. IBM Mainframes also belong to this category."
    Again, only partially true. The costs and stuff is correct, but the assumptions that you're writing about is incorrect. SMP is symmetric multiprocessing. BY DEFINITION, that means that "involves a multiprocessor computer hardware and software architecture where two or more identical processors connect to a single, shared main memory, have full access to all I/O devices, and are controlled by a single OS instance that treats all processors equally, reserving none for special purposes." (source: wiki) That means that it is a monolithic system, again, of which, few are TRULY such systems. If you've ever ACTUALLY witnessed the startup/bootup sequence of an ACTUAL IBM mainframe, the rest of the "nodes" are actually booted up typically by PXE or something very similiar to that, and then the "node" is ennumerated into the resource pool. But, for all other intents and purposes, they are semi-independent, standalone systems, because SMP systems do NOT have the capability to pass messages and/or memory calls (reads/writes/requests) without some kind of a transport layer (for example MPI).

    Furthermore, the old TPC-C that you mention, they do NOT process as one monolithic sequential series of events in parallel (so think of like how PATA works...), but rather more like a JBOD SATA (i.e. the processing of the next transaction does NOT depend on ALL of the current block of transactions to be completed, UNLESS there is an inherent dependency issue, which I don't think would be very common in TPC-C). Like bank accounts, they're all treated as discrete and separate, independent entities, which means you can send all 150,000 accounts against the 32-socket or 64-socket system and it'll just pick up the next account when the current one is done, regardless.

    The other failure in your statement or assumption is that's why there's something called HA - high avialability. Which means that they can dynamically hotswap an entire node if there's a CPU failure, so that the node can be downed and yanked out for service/repair while another one is hotswapped in. So it will failover to a spare hotswap node, work on it, and then either fall over back to the newly replaced node or it would rotate the new one into the hotswap failover pool. (There are MANY different ways of doing that and MANY different topologies).

    The statement you made about having 32TB of RAM is again, partially true. But NONE of the single OS instances EVER have full control of all 32TB at once, which again, by DEFINITION, means that it is NOT truly SMP. (Course, if you ever get a screenshot which shows that, I'd LOVE to see it. I'd LOVE to get corrected on that.)

    "In contrast to this, every server larger than 32/64 sockets, is a cluster."
    Again, not entirely true. You can actually get 4 socket systems that are comprised of two dual-socket nodes and THAT is enough to meet the requirements of a cluster. Heck, if you pair up two single-socket consumer-grade systems, that TOO is a cluster. That's kinda how Beowulf clusters got started - cuz it was an inexpensive way (compare to the aforementioned RISC UNIX based systems) to gain computing power without having to spend a lot of money.

    'These huge clusters are dirt cheap"
    Sure...if you consider IBM's $100 million contract award "cheap".

    "Clusters are only used for HPC number crunching. SMP servers (one single huge server, extremely expensive because it is very difficult to scale beyond 16 sockets) are used for ERP business systems. An HPC cluster can not run business systems, as SGI explains in this link:"
    So there's the two problems with this - 1) it's SGI - so of course they're going to promote what they ARE capable of vs. what they don't WANT to be capable of. 2) Given the SGI-biased statements, this, again, isn't EXACTLY ENTIRELY true either.

    HPCs CAN run ERP systems.

    "HPC vendors are increasingly targeting commercial markets, whereas commercial vendors, such as Oracle, SAP and SAS, are seeing HPC requirements." (Source: http://www.information-age.com/it-management/strat...

    But that also depends on the specific implementation of the ERP system given that SAP is NOT the ONLY ERP system that's available out there, but it's probably one of the most popular one, if not THE most popular one. (There's a whole thing about distributed relational databases so that the database can reside in smaller chunks across multiple nodes, in-memory, which are then accessed via a high speed interconnect like Myrinet or Infiniband or something along those lines.)

    Furthermore, the fact that ERP runs across large mainframes (it grows as the needs grows), is an indications of HPC's place in ERP. Alternatively, perhaps rather than using it for the backend, HPC can be used on the front end by supporting many, many, many virtualized front-end clients.

    Like I said, most of the numbers that you wrote are true, but the assumptions behind them isn't exactly all entirely true.

    See also: http://csserver.evansville.edu/~mr56/Publications/...
  • Kevin G - Wednesday, March 19, 2014 - link

    "That means that it is a monolithic system, again, of which, few are TRULY such systems. If you've ever ACTUALLY witnessed the startup/bootup sequence of an ACTUAL IBM mainframe, the rest of the "nodes" are actually booted up typically by PXE or something very similiar to that, and then the "node" is ennumerated into the resource pool. But, for all other intents and purposes, they are semi-independent, standalone systems, because SMP systems do NOT have the capability to pass messages and/or memory calls (reads/writes/requests) without some kind of a transport layer (for example MPI)."

    Not exactly. IBM's recent boxes don't boot themselves. Each box has a service processor that initializes the main CPU's and determines if there are any additional boxes connected via external GX links. If it finds external boxes, some negotiation is done to join them into one large coherent system before an attempt to load an OS is made. This is all done in hardware/firmware. Adding/removing these boxes can be done but there are rules to follow to prevent data loss.

    It'll be interesting to see what IBM does with their next generation of hardware as the GX bux is disappearing.

    "The statement you made about having 32TB of RAM is again, partially true. But NONE of the single OS instances EVER have full control of all 32TB at once, which again, by DEFINITION, means that it is NOT truly SMP. (Course, if you ever get a screenshot which shows that, I'd LOVE to see it. I'd LOVE to get corrected on that.)"

    Actually on some of these larger systems, a single OS can see the entire memory pool and span across all sockets. The SGI UV2000 and SPARC M6 are fully cache coherent across a global memory address space.

    As for a screenshot, I didn't find one. I did find a video going over some of the UV 2000 features displaying all of this though. It is only a 64 socket, 512 core, 1024 thread, 2 TB of RAM configuration running a single instance of Linux. :)
    https://www.youtube.com/watch?v=YUmBu6A2ykY

    IBM's topology is weird in that while a global memory address space is shared across nodes, it is not cache coherent. IBM's POWER7 and their recent BlueGene systems can be configured like this. I wouldn't call these setups clusters as there is no software overhead to read/write to remote memory addresses but it isn't fully SMP either due to multiple coherency domains.
  • silverblue - Monday, March 17, 2014 - link

    The A10-7850K is a 2M/4T CPU.
  • Ian Cutress - Monday, March 17, 2014 - link

    Thanks for the correction, small brain fart on my part when generating the graphs.
