Intel’s roadmap covers every power and market segment, from ultra-low-power parts, smartphones and tablets through notebooks, mainstream desktops and enthusiast desktops to enterprise. Enterprise differs from the rest of the market in requiring absolute stability, uptime, and support should anything go wrong. High-end enterprise CPUs are therefore expensive, and because buyers are willing to pay top dollar for the best, Intel can push core counts, frequency and thus price much higher than in the consumer space. Today we look at two CPUs from this segment – the twelve-core Xeon E5-2697 v2 and the eight-core Xeon E5-2687W v2.

Firstly, I would like to say a big thank you to GIGABYTE Server for the opportunity to test these CPUs in their motherboard, the GA-6PXSV3. This motherboard will be the focus of a review at a later date.

Intel’s Enterprise Line

High-end enthusiasts always want hardware to be faster, more powerful, and contain more cores than what is available on the market today. The problem here is two-fold: cost and volume. Were Intel to produce a product for the consumer market at more than $1000, a large part of the market would complain that the ultra-high-end is too expensive. The other issue is volume – it can be hard to gauge just how many CPUs would be sold. For example, the consumer level i7-4930K was the preferred choice for many enthusiasts as it was several hundred dollars cheaper than the i7-4960X despite being a fraction slower at stock frequencies. The ultra-high-end enthusiast also wants all the bells and whistles, such as overclockability, a good range of DRAM speed support and top quality construction materials.

At some point, Intel has to draw the line. The enterprise line of CPUs differs from the consumer line in more ways than we might imagine. Due to the requirements of stability, overclocking is knocked on the head for all modern Intel Xeon CPUs. For clarification, the Westmere-EP CPU line (Xeon X5670 et al., socket 1366) was the last line of overclockable Xeons. The Xeon line of CPUs must also support enterprise level memory – UDIMMs and RDIMMs, ECC and non-ECC. This extends up to quad-rank DRAM support, such as 32GB modules that themselves can cost more than a CPU.

Some enterprise CPUs are also designed to speak to other CPUs in multiprocessor systems. On the Intel side, this means a point-to-point QPI link between each CPU in the system. Johan and I have recently tested several multiprocessor systems [1,2,3,4]. Features such as these develop over time, cost R&D, and are focused purely on the enterprise sector.

Virtualization is another feature Intel limits to certain CPUs, although some consumer CPUs and some Xeons both have it. The deciding factor tends to be overclockability – if a consumer CPU is listed as overclockable, it does not have VT-d extensions for directed I/O. For users that want ECC memory and virtualization at a lower cost, the enterprise product stack often offers lower core/lower frequency parts at lower price points.

While not necessarily verifiable, there have been reports that Xeon processors are actually the better quality samples that come from the fabs. These are CPUs that have better frequency-to-voltage characteristics and have a better chance of running cooler. The main reason this report exists is that when Xeons were overclockable back in Westmere, they were more likely to overclock further than the consumer versions. It would also make sense from Intel’s point of view – the enterprise customer is paying more for their hardware, and as such a better product in terms of energy consumption or thermals would keep those customers happy.

The Xeon Product Line

Intel splits the naming of its Xeons up according to feature set and architecture. For single processor systems using the LGA1150 socket, we get the E3 line of Xeons, which at present are based on the Haswell architecture and all come under the E3-12xx v3 line:

Intel E3 v3 SKUs
Xeon E3 v3 Cores/Threads TDP (W) IGP Base Clock (MHz) Turbo Clock (MHz) L3 Cache Price
E3-1220L v3 2/4 13 N/A 1100 1500 4 MB $193
E3-1220 v3 4/4 80 N/A 3100 3500 8 MB $193
E3-1225 v3 4/4 84 P4600 3200 3600 8 MB $213
E3-1230 v3 4/8 80 N/A 3300 3700 8 MB $240
E3-1240 v3 4/8 80 N/A 3400 3800 8 MB $262
E3-1245 v3 4/8 84 P4600 3400 3800 8 MB $276
E3-1270 v3 4/8 80 N/A 3500 3900 8 MB $328
E3-1275 v3 4/8 84 P4600 3500 3900 8 MB $339
E3-1280 v3 4/8 82 N/A 3600 4000 8 MB $612
E3-1285 v3 4/8 84 P4700 3600 4000 8 MB $662
E3-1265L v3 4/8 45 HD (Haswell) 2500 3700 8 MB $294
E3-1284L v3 4/8 47 Iris Pro 5200 1800 3200 6 MB N/A
E3-1285L v3 4/8 65 P4700 3100 3900 8 MB $774
E3-1230L v3 4/8 25 N/A 1800 2800 8 MB $250

With Intel’s enthusiast socket, LGA2011, the processors are now split according to their multi-processor capability. Due to the skip-tock cadence of architecture improvements at this level, the enthusiast consumer and Xeon lines are both one architecture behind the mainstream LGA1150 CPU line. This results in all the LGA2011 Xeons being based on Ivy Bridge-E.

Single processor LGA2011 Xeons are under the title of E5-16xx v2. Dual processor system capable Xeons are E5-26xx v2, and quad processor system capable Xeons are E5-46xx v2. As Johan pointed out in his excellent dive into the improvements over the older architecture, these CPUs come from three die flavors.

The three dies are aimed at workstations/enthusiasts, servers and high performance computing respectively. I’m not going to repeat what Johan already posted, but it is a really good read if you have a chance to look through it.

The final batch of processors are in the high performance category, using the LGA2011-1 socket. These have recently been released as the E7 v2 line (again, I will point to Johan’s deep dive for the specifics) under the Ivy Bridge-EX moniker. We have E7-28xx v2 for 2P, E7-48xx v2 for 4P and E7-88xx v2 for 8P systems. Core counts for these CPUs go all the way up to 15 due to the three banks of five used in the die.
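The naming scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `max_sockets` and the regular expression are hypothetical, and the rule that the first digit of the model number gives the maximum socket count is inferred from the E3-12xx, E5-16xx/26xx/46xx and E7-28xx/48xx/88xx groupings listed in the text.

```python
import re

# Assumed name format, based on the SKUs quoted in this article,
# e.g. "E5-2697 v2", "E5-2687W v2", "E3-1265L v3".
SKU_PATTERN = re.compile(
    r"^E(?P<family>[357])-(?P<number>\d{4})(?P<suffix>[A-Z]*)(?: v(?P<version>\d))?$"
)


def max_sockets(sku: str) -> int:
    """Return the socket count encoded in the first digit of the model number."""
    match = SKU_PATTERN.match(sku.strip())
    if match is None:
        raise ValueError(f"unrecognised SKU name: {sku!r}")
    return int(match.group("number")[0])


# SKUs taken from the tables in this article.
examples = {
    name: max_sockets(name)
    for name in ("E3-1230 v3", "E5-1650 v2", "E5-2697 v2", "E5-4657L v2")
}
print(examples)
# {'E3-1230 v3': 1, 'E5-1650 v2': 1, 'E5-2697 v2': 2, 'E5-4657L v2': 4}
```

The sketch deliberately ignores the second digit (which separates, for example, the E5-26xx and E5-24xx parts) and the Pentium-branded entries, since the text does not spell out a rule for them.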

Turbo Modes

As with the consumer line, the base clock speed of an enterprise CPU is usually not the be-all and end-all of performance. Intel’s Turbo Boost lets the CPU speed up when fewer cores are in use, exploiting the difference in power consumption between one-core, two-core and all-core computation. There is no hard and fast rule when it comes to the turbo modes – Intel will quote the top turbo bin in its CPU database, ark.intel.com, but in order to find out the scale of multi-core (but not all-core) operation, one has to look into the specification PDFs, such as this one.

With over 50 different CPUs mentioned in that document, it is hard to see which CPUs are going to offer more than others. We extracted the data:

Intel E5 SKU Comparison
Xeon E5 Cores/Threads TDP (W) Base Clock (MHz) Turbo Bins (x100 MHz) L3 Cache L3 Cache / Core (MB) Price
E5-46xx
E5-4657L v2 12/24 115 2400 5/4/3/3/3/3/3/3/3/3/3/3 30 MB 2.500 $4,394
E5-4650 v2 10/20 95 2400 5/4/3/3/3/3/3/3/3/3 25 MB 2.500 $3,616
E5-4640 v2 10/20 95 2200 5/4/3/3/3/3/3/3/3/3 20 MB 2.000 $2,725
E5-4624L v2 10/20 70 1900 6/6/5/5/4/4/3/3/2/2 25 MB 2.500 $2,405
E5-4627 v2 8/8 130 3300 3/2/2/2/2/2/2/2 16 MB 2.000 $2,180
E5-4620 v2 8/16 95 2600 4/3/2/2/2/2/2/2 20 MB 2.500 $1,611
E5-4610 v2 8/16 95 2300 4/3/2/2/2/2/2/2 16 MB 2.000 $1,219
E5-4607 v2 6/12 95 2600 0/0/0/0/0/0 15 MB 2.500 $885
E5-4603 v2 4/8 95 2200 0/0/0/0 10 MB 2.500 $551
E5-x6xx
E5-2697 v2 12/24 130 2700 8/7/6/5/4/3/3/3/3/3/3/3 30 MB 2.500 $2,614
E5-2695 v2 12/24 115 2400 8/7/6/5/4/4/4/4/4/4/4/4 30 MB 2.500 $2,336
E5-2687W v2 8/16 150 3400 6/5/4/3/2/2/2/2 25 MB 3.125 $2,108
E5-2667 v2 8/16 130 3300 7/6/5/4/3/3/3/3 25 MB 3.125 $2,057
E5-2690 v2 10/20 130 3000 6/5/4/3/3/3/3/3/3/3 25 MB 2.500 $2,057
E5-2658 v2 10/20 95 2400 6/6/5/5/4/4/3/3/2/2 25 MB 2.500 $1,750
E5-1680 v2 8/16 130 3000 9/8/7/5/4/4/4/4 25 MB 3.125 $1,723
E5-2680 v2 10/20 115 2800 8/7/6/5/4/3/3/3/3/3 25 MB 2.500 $1,723
E5-2643 v2 6/12 130 3500 3/2/1/1/1/1 25 MB 4.167 $1,552
E5-2670 v2 10/20 115 2500 8/7/6/5/4/4/4/4/4/4 25 MB 2.500 $1,552
E5-2648L v2 10/20 70 1900 6/6/5/5/4/4/3/3/2/2 25 MB 2.500 $1,479
E5-2660 v2 10/20 95 2200 8/7/6/5/4/4/4/4/4/4 25 MB 2.500 $1,389
E5-2650L v2 10/20 70 1700 4/3/2/2/2/2/2/2/2/2 25 MB 2.500 $1,219
E5-2628L v2 8/16 70 1900 5/5/4/4/3/3/2/2 20 MB 2.500 $1,216
E5-2650 v2 8/16 95 2600 8/7/6/5/5/5/5/5 20 MB 2.500 $1,166
E5-1660 v2 6/12 130 3700 3/2/1/1/1/1 15 MB 2.500 $1,080
E5-2637 v2 4/8 130 3500 3/2/1/1 15 MB 3.750 $996
E5-2640 v2 8/16 95 2000 5/4/3/3/3/3/3/3 20 MB 2.500 $885
E5-2618L v2 6/12 50 2000 0/0/0/0/0/0 15 MB 2.500 $632
E5-2630 v2 6/12 80 2600 5/4/3/3/3/3 15 MB 2.500 $612
E5-2630L v2 6/12 60 2400 4/3/2/2/2/2 15 MB 2.500 $612
E5-1650 v2 6/12 130 3500 4/2/2/2/1/1 12 MB 2.000 $583
E5-2620 v2 6/12 80 2100 5/4/3/3/3/3 15 MB 2.500 $406
E5-1620 v2 4/8 130 3700 2/0/0/0 10 MB 2.500 $294
E5-2609 v2 4/4 80 2500 0/0/0/0 10 MB 2.500 $294
E5-1607 v2 4/4 130 3000 0/0/0/0 10 MB 2.500 $244
E5-2603 v2 4/4 80 1800 0/0/0/0 10 MB 2.500 $202
E5-x4xx
E5-2470 v2 10/20 95 2400 8/7/6/5/4/4/4/4/4/4 25 MB 2.500 $1,440
E5-2448L v2 10/20 70 1800 6/6/5/5/4/4/3/3/2/2 25 MB 2.500 $1,424
E5-2450L v2 10/20 60 1700 4/3/2/2/2/2/2/2/2/2 25 MB 2.500 $1,219
E5-2450 v2 8/16 95 2500 8/7/6/5/4/4/4/4 20 MB 2.500 $1,107
E5-2428L v2 8/16 60 1800 5/5/4/4/3/3/2/2 20 MB 2.500 $1,013
E5-2440 v2 8/16 95 1900 5/4/3/3/3/3/3/3 20 MB 2.500 $832
E5-2430L v2 6/12 60 2400 4/3/2/2/2/2 15 MB 2.500 $612
E5-2418L v2 6/12 50 2000 0/0/0/0 15 MB 2.500 $607
E5-1428L v2 6/12 60 2200 5/4/3/2/2/2 15 MB 2.500 $474
E5-2420 v2 6/12 80 2200 5/4/3/3/3/3 15 MB 2.500 $406
E5-2407 v2 4/4 80 2400 0/0/0/0 10 MB 2.500 $250
E5-2403 v2 4/4 80 1800 0/0/0/0 10 MB 2.500 $192
Pentium 1405 v2 2/2 40 1400 0/0 6 MB 3.000 $156
E5-1410 v2 4/8 80 2800 4/4/3/3 10 MB 2.500 N/A
Pentium 1403 v2 2/2 80 2600 0/0 6 MB 3.000 N/A

But even this is hard to parse. Some CPUs start off at 3.0 GHz base frequency and have a 900 MHz turbo bin, whereas others move no more than 300 MHz from their base clock. A few CPUs are worthy of attention from our analysis:

The E5-2643 v2 has the most L3 cache per core of any CPU, at 4.17 MB/core. This is a 10c die offering all 25 MB of L3 cache, but only six cores are active. Reasons for this include database applications that need a large amount of L3 cache per core. For licensing agreements that hinge on per-core pricing, having a larger amount of L3 per core could help save some money by needing fewer cores.
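The "L3 Cache / Core" column is simply the L3 size divided by the number of physical cores. As a minimal sketch (the helper name `l3_per_core` and the selection of four SKUs are illustrative choices, with the figures copied from the table above), the ranking can be reproduced like this:

```python
def l3_per_core(l3_mb: float, cores: int) -> float:
    """L3 cache available per physical core, in MB."""
    return l3_mb / cores


# (L3 cache in MB, physical cores), values taken from the E5 table.
skus = {
    "E5-2697 v2": (30, 12),
    "E5-2687W v2": (25, 8),
    "E5-2643 v2": (25, 6),
    "E5-2637 v2": (15, 4),
}

# Sort from most to least cache per core.
ranked = sorted(skus, key=lambda name: l3_per_core(*skus[name]), reverse=True)
for name in ranked:
    print(f"{name}: {l3_per_core(*skus[name]):.3f} MB/core")
```

For per-core licensed software, the same ratio could be combined with the list price to weigh cache per license against cost, although that depends on the licensing terms in question.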

The E5-2667 v2 is a better chip than the E5-2687W v2. The latter gets attention due to its 150W TDP, high base clock and having a ‘W’ in the name. This is partly why I requested it for this review. But the E5-2667 v2 sounds better – it has a lower TDP (130W vs. 150W), and once the turbo bins are applied, both CPUs have the same frequency vs. core loading. Both CPUs have a maximum turbo frequency of 4.0 GHz, moving down identically to an all-core loading of 3.6 GHz. The E5-2667 v2 is also a cheaper option, and according to the specification sheets can use 768 GB of memory per CPU, compared to the E5-2687W v2 which can only manage 256 GB.
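The comparison above can be checked directly from the table. The sketch below assumes that each turbo bin is worth 100 MHz and that the bins are listed from one active core up to all cores active, which is what the 4.0 GHz and 3.6 GHz figures in the text imply; the function name `turbo_frequencies` is hypothetical.

```python
BIN_MHZ = 100  # assumed size of one turbo bin


def turbo_frequencies(base_mhz: int, bins: str) -> list[int]:
    """Frequency in MHz for 1, 2, ..., N active cores.

    `bins` uses the table notation, e.g. "6/5/4/3/2/2/2/2".
    """
    return [base_mhz + BIN_MHZ * int(step) for step in bins.split("/")]


# Base clocks and turbo bins copied from the E5 table.
e5_2687w_v2 = turbo_frequencies(3400, "6/5/4/3/2/2/2/2")
e5_2667_v2 = turbo_frequencies(3300, "7/6/5/4/3/3/3/3")

print(e5_2687w_v2)
# [4000, 3900, 3800, 3700, 3600, 3600, 3600, 3600]
print(e5_2687w_v2 == e5_2667_v2)
# True
```

The lower base clock of the E5-2667 v2 is exactly offset by one extra bin at every load level, which is why the two frequency curves coincide whenever turbo is active.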

Low power CPU additions keep their turbo speeds higher for longer. If we look at the turbo bins for a mid-range low power CPU, such as the E5-2628L v2, they go in pairs: 5/5/4/4/3/3/2/2. The non-low-power processors often end up having a high turbo bin which decreases quickly, such as the E5-2680 v2, which goes 8/7/6/5/4/3/3/3/3/3.
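One rough way to put a number on "higher for longer" is to ask what fraction of the peak turbo bin is still available when half the cores are loaded. The metric and the name `half_load_retention` are assumptions made for this sketch rather than anything Intel publishes; the bins are the table values for the two CPUs mentioned above.

```python
def half_load_retention(bins: str) -> float:
    """Fraction of the single-core turbo bin still available at half load."""
    steps = [int(step) for step in bins.split("/")]
    if steps[0] == 0:
        raise ValueError("this SKU has no turbo bins to compare")
    half = len(steps) // 2  # number of active cores at half load
    return steps[half - 1] / steps[0]


low_power = half_load_retention("5/5/4/4/3/3/2/2")      # E5-2628L v2
standard = half_load_retention("8/7/6/5/4/3/3/3/3/3")   # E5-2680 v2

print(f"E5-2628L v2 keeps {low_power:.0%} of its peak bin at half load")
print(f"E5-2680 v2 keeps {standard:.0%} of its peak bin at half load")
```

By this measure the low power part keeps 80% of its peak bin with four of eight cores active, while the E5-2680 v2 keeps 50% with five of ten active. The absolute uplift of the E5-2680 v2 is still equal or larger at that point, so the metric describes the shape of the curve rather than outright speed.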

71 Comments

  • XZerg - Monday, March 17, 2014 - link

    this bench also shows that the haswell had almost no CPU related performance benefits over IVB (if not slowed down performance) looking at 3770k vs 4770k and that haswell ups the gpu performance only.

    i really question intel's skuing of haswell...
  • Nintendo Maniac 64 - Monday, March 17, 2014 - link

    Emulation?
  • BMNify - Monday, March 17, 2014 - link

    it's a shame they didn't do a UHD x264 encode here as that would have shown a haswell AVX2 improvement (something like 90% over AVX), and why people will have to wait for the xeons to catch up to at least AVX2 if not AVX3.1
  • psyq321 - Wednesday, March 19, 2014 - link

    There is no "90% speedup over AVX" between HSW and IVB architectures.

    AVX (v1) is floating point only and thus was useless for x264. For floating point workloads you would be very lucky to get 10% improvement by jumping to AVX2. The only difference between AVX and AVX2 for floating point is the FMA instruction and gather, but gather is done in microcode for Haswell, so it is not actually much faster than manually gathering data.

    Now, x264 AVX2 is a big improvement because it is an integer workload, and with AVX (v1) you could not do that. So x264 is jumping from SSE4.x to AVX2, which is a huge jump and it allows much more efficient processing.

    For integer workloads that can be optimized so that you load and process eight 32-bit values at once, AVX2 Xeon EPs/EXs will be a big thing. Unfortunately, this is not so easy to do for general-purpose algorithms. The x264 team did a great job, but I doubt you will be using a 14 core single Haswell EP (or 28 core dual CPU) for H.264 transcoding. This job can probably be done much more efficiently with dedicated accelerators.

    As for the scientific applications, they already benefit from AVX v1 for floating point workloads. AVX2 in Haswell is just a stop-gap as the gather is microcoded, but getting code ready for hardware gather in the future uArch is definitely a good way to go.

    Finally, when Skylake arrives with AVX 3.1, this will be the next big jump after AVX (v1) for scientific / floating point use cases.
  • Kevin G - Monday, March 17, 2014 - link

    Shouldn't the Xeon E5-2687W v2 support 384 GB of memory? 4 channels * 3 slots per channel * 32 GB DIMM per slot? (Presumably it could be twice that using eight rank 64 GB DIMMs but I'm not sure if Intel has validated them on the 6 and 10 core dies.) Registered memory has to be used for the E5-2687W v2 to get to 256 GB, so is the chip just not capable of running a third slot per channel? Seems like a weird handicap. I can only imagine this being more of a design guideline rule than anything explicit. The 150W CPUs are workstation focused, and workstations tend to have 8 slots maximum.

    Also a bit weird is the inclusion of the E5-2400 series on the first page's table. While they use the same die, they use a different socket (LGA 1356) with triple channel memory support and only 24 PCI-e lanes. With the smaller physical area and generally lower TDPs, they're aimed squarely at the blade server market. Socket LGA 2011 is far more popular in workstations and 1U and up servers.
  • jchernia - Monday, March 17, 2014 - link

    A 12 core chip is a server chip - the workstation/PC benchmarks are interesting, but the really interesting benchmarks would be on the server side.
  • Ian Cutress - Monday, March 17, 2014 - link

    Johan covered the server side in his article - I link to it many times in the review:
    http://www.anandtech.com/show/7285/intel-xeon-e5-2...
  • BMNify - Monday, March 17, 2014 - link

    a mass of others might argue a 12 core/24 thread chip or better is a potential "real-time" UHD x264 encoding machine, it's just out of most encoders' budgets, so NO SALE....
  • Nintendo Maniac 64 - Monday, March 17, 2014 - link

    Uh, where's the test set up for the 7850K?
  • Nintendo Maniac 64 - Monday, March 17, 2014 - link

    Also I believe I found a typo:

    "Haswell provided a significant post to emulator performance"

    Shouldn't this say 'boost' rather than 'post'?
