During September we managed to get hold of some Haswell-EP samples for a quick run through our testing suite. The Xeon E5 v3 range extends beyond that of the E5 v2 with the new architecture, support for DDR4 and more SKUs with more cores. These are generally split into several markets including workstation, server, low power and high performance, with a few SKUs dedicated for communications or off-map SKUs with different levels of support. Today we are testing two 10 core models, the Xeon E5-2687W v3 and the Xeon E5-2650 v3.

Intel Xeon E5 v3: The Information

Our initial Haswell-EP coverage from Johan was super extensive and well worth a read for anyone interested in the Xeon platform. My focus here will be light in comparison, mentioning key points that as an ex-workstation user I find interesting. This will be the first of several reviews on the Xeon processors, which we have split up to focus more on each area.

The core layouts for each of the different levels of processor are from three designs, emulating the single and dual ring bus type arrangements depending on the number of cores in each SKU. As with the Xeon E5 v2 processors, the big block of cache is in the middle of the cores and data is transferred via the ring bus. From the core designs, pairs of cores can be disabled to make lower core count CPUs, and much like the previous generation, some low core / high cache models might be possible.

In the 10-12 core image above we essentially get two classes of cores: one in the big stack to the left and another to the right. The processor is designed to treat all cores equally, although the Cluster on Die snoop mode new to E5 v3 will organize the cache data into what acts like two big sections in a NUMA-style arrangement. This keeps data close to the cores that need it and should reduce read/write latencies, but it is all transparent to the user. Johan goes into more detail on this front in his review.

This column arrangement is also why we do not see the progressive jump in cores we would expect. In the consumer space, we have had 1, 2, 4, 6 and 8 cores, and one might expect 12 and 16 on the horizon, but 10, 14 and 18 seem a little off-kilter, along with the 15-core design from Ivy Bridge-EP. Using this column design, Intel has to balance the number of cores per ring and the number of cores per column. In the large 18 core design there are 10 cores in the secondary ring and six in a single column. Ideally fewer columns would be preferable, however more rings allow data to transfer more frequently. It becomes a bit of a balance in terms of design, efficiency, performance and yield at the end of the day, especially when dealing with up to 5.69B transistors in 662 mm2.

CPU Specification Comparison
CPU | Node | Cores | GPU | Transistor Count (Schematic) | Die Size
Server CPUs
Intel Haswell-EP 14-18C | 22nm | 14-18 | N/A | 5.69B | 662mm2
Intel Haswell-EP 10C-12C | 22nm | 6-12 | N/A | 3.84B | 492mm2
Intel Haswell-EP 6C-8C | 22nm | 4-8 | N/A | 2.6B | 354mm2
Intel Ivy Bridge-EP 12C-15C | 22nm | 10-15 | N/A | 4.31B | 541mm2
Intel Ivy Bridge-EP 10C | 22nm | 6-10 | N/A | 2.89B | 341mm2
Consumer CPUs
Intel Haswell-E 8C | 22nm | 8 | N/A | 2.6B | 356mm2
Intel Haswell GT2 4C | 22nm | 4 | GT2 | 1.4B | 177mm2
Intel Haswell ULT GT3 2C | 22nm | 2 | GT3 | 1.3B | 181mm2
Intel Ivy Bridge-E 6C | 22nm | 6 | N/A | 1.86B | 257mm2
Intel Ivy Bridge 4C | 22nm | 4 | GT2 | 1.2B | 160mm2
Intel Sandy Bridge-E 6C | 32nm | 6 | N/A | 2.27B | 435mm2
Intel Sandy Bridge 4C | 32nm | 4 | GT2 | 995M | 216mm2
Intel Lynnfield 4C | 45nm | 4 | N/A | 774M | 296mm2
AMD Trinity 4C | 32nm | 4 | 7660D | 1.303B | 246mm2
AMD Vishera 8C | 32nm | 8 | N/A | 1.2B | 315mm2

Intel should be offering certain configurations with more L3 cache, given that the die labelled '10C-12C' in its press materials will also be offered cut down to six cores at release. These CPUs, whichever way you slice them, are still massive.
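To put "massive" into numbers, the transistor counts and die sizes in the table above can be turned into a rough density figure. The snippet below is a minimal sketch: the data is copied from the table, while the function and variable names are my own, and the result ignores how much of each die is cache versus logic.

```python
# Transistor counts (in billions) and die areas (in mm^2), copied from the
# CPU Specification Comparison table. Names below are illustrative only.
DIES = {
    "Haswell-EP 14-18C": (5.69, 662),
    "Haswell-EP 10-12C": (3.84, 492),
    "Haswell-EP 6-8C": (2.60, 354),
    "Ivy Bridge-EP 12-15C": (4.31, 541),
    "Ivy Bridge-EP 10C": (2.89, 341),
    "Haswell GT2 4C": (1.40, 177),
}


def density_mtr_per_mm2(transistors_billion, area_mm2):
    """Return millions of transistors per square millimetre."""
    return transistors_billion * 1000.0 / area_mm2


def density_table(dies=DIES):
    """Map each die name to its density, rounded to two decimals."""
    return {
        name: round(density_mtr_per_mm2(count, area), 2)
        for name, (count, area) in dies.items()
    }


if __name__ == "__main__":
    for name, density in density_table().items():
        print(f"{name:22s} {density:5.2f} MTr/mm^2")
```

On these figures the three Haswell-EP dies land between roughly 7.3 and 8.6 million transistors per mm2, with the largest die being the densest.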

Today our review revolves around two of the 10 core options from Intel.

Intel Xeon E5 v3 SKU Comparison
Xeon E5 | Cores/Threads | TDP | Clock Speed (GHz) | Price
High Performance (35-45MB LLC)
2699 v3 | 18/36 | 145W | 2.3-3.6 | $4115
2698 v3 | 16/32 | 135W | 2.3-3.6 | $3226
2697 v3 | 14/28 | 145W | 2.6-3.6 | $2702
2695 v3 | 14/28 | 120W | 2.3-3.3 | $2424
"Advanced" (20-30MB LLC)
2690 v3 | 12/24 | 135W | 2.6-3.5 | $2090
2680 v3 | 12/24 | 120W | 2.5-3.3 | $1745
2660 v3 | 10/20 | 105W | 2.6-3.3 | $1445
2650 v3 | 10/20 | 105W | 2.3-3.0 | $1167
Midrange (15-25MB LLC)
2640 v3 | 8/16 | 90W | 2.6-3.4 | $939
2630 v3 | 8/16 | 85W | 2.4-3.2 | $667
2620 v3 | 6/12 | 85W | 2.4-3.2 | $422
Frequency optimized (10-20MB LLC)
2687W v3 | 10/20 | 160W | 3.1-3.5 | $2141
2667 v3 | 8/16 | 135W | 3.2-3.6 | $2057
2643 v3 | 6/12 | 135W | 3.4-3.7 | $1552
2637 v3 | 4/8 | 135W | 3.5-3.7 | $996
Budget (15MB LLC)
2609 v3 | 6/6 | 85W | 1.9 | $306
2603 v3 | 6/6 | 85W | 1.6 | $213
Power Optimized (20-30MB LLC)
2650L v3 | 12/24 | 65W | 1.8-2.5 | $1329
2630L v3 | 8/16 | 55W | 1.8-2.9 | $612

The E5-2687W v3 is an interesting model of the bunch, particularly due to the importance of the E5-2687W v2 from the previous generation. The v2 version was lauded due to the difference in peak frequencies compared to the higher core count models, but this changes with Haswell-EP.

For Ivy Bridge-EP:

- The 8-core E5-2687W v2 gave 3.6 GHz in full-load, TDP of 150W for $2108,
- The 12 core E5-2697 v2 gave 3.0 GHz in full-load, TDP of 130W for $2614

With Haswell-EP:

- The 10-core E5-2687W v3 gives 3.2 GHz for 160W at $2141,
- The 14-core E5-2697 v3 gives 3.1 GHz for 145W at $2702 or
- The 18-core E5-2699 v3 gives 2.8 GHz for 145W at $4115

If we compare the difference between the E5-2687W and E5-2697, first with v2 and then v3, it makes the new Haswell ‘W for Workstation’ CPU a little less enticing. Previously it was a trade-off between cores and frequency, and depending on the software, the much higher full-load frequency of the v2 W part could win out. With v3 that frequency gap has almost closed.
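The comparison is easier to see with the arithmetic written out. The sketch below uses the full-load frequencies, TDPs and core counts quoted in the two lists above, with list prices taken from the SKU table. The `Sku` class, the cores x GHz metric and the helper names are my own simplification, not an Intel figure of merit, and real throughput will not scale this linearly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    name: str
    cores: int
    all_core_ghz: float  # full-load frequency quoted in the text
    tdp_w: int
    price_usd: int       # list price

    @property
    def core_ghz(self) -> float:
        """Crude aggregate: cores multiplied by full-load frequency."""
        return self.cores * self.all_core_ghz

    @property
    def usd_per_core_ghz(self) -> float:
        return self.price_usd / self.core_ghz


SKUS = [
    Sku("E5-2687W v2", 8, 3.6, 150, 2108),
    Sku("E5-2697 v2", 12, 3.0, 130, 2614),
    Sku("E5-2687W v3", 10, 3.2, 160, 2141),  # price from the SKU table
    Sku("E5-2697 v3", 14, 3.1, 145, 2702),
    Sku("E5-2699 v3", 18, 2.8, 145, 4115),
]
BY_NAME = {sku.name: sku for sku in SKUS}


def frequency_gap(fast: str, wide: str) -> float:
    """Full-load GHz advantage of the `fast` SKU over the `wide` one."""
    return BY_NAME[fast].all_core_ghz - BY_NAME[wide].all_core_ghz


if __name__ == "__main__":
    for sku in SKUS:
        print(f"{sku.name:12s} {sku.core_ghz:5.1f} core-GHz "
              f"${sku.usd_per_core_ghz:6.2f} per core-GHz")
    print("v2 gap:", round(frequency_gap("E5-2687W v2", "E5-2697 v2"), 2), "GHz")
    print("v3 gap:", round(frequency_gap("E5-2687W v3", "E5-2697 v3"), 2), "GHz")
```

On these numbers the W part's full-load frequency lead over the E5-2697 shrinks from 0.6 GHz in the v2 generation to 0.1 GHz in v3, while the E5-2697 v3 carries four more cores.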

To make matters worse for the E5-2687W v3, if we compare single thread speeds, the E5-2697 v3 reaches 3.6 GHz compared to the E5-2687W v3 at 3.5 GHz, which puts the W processor at a disadvantage.

It is worth noting that Intel puts these two processors in different parts of the product stack, so technically they should not be 'competing' against each other:

The E5-2687W v3 is firmly for Workstations only, rather than servers, whereas the E5-2697 v3 should end up in 2U servers. 

The other processor in this review, the E5-2650 v3, sits in the ‘Advanced’ section of the SKU stack, giving 2.6 GHz at load or 3.0 GHz for single threaded speed, but lists at only 105W for $1166 tray price.

Using this information and a few SKUs that are off-roadmap, the turbo modes of the 10 core processors are:

All the 10 core processors reach their full-core turbo when five cores are in use, and are on the top turbo frequency when one or two cores are active.

The Chipset

When we reviewed a pair of the E5 v2 processors back in March, the main server based chipsets at the time revolved around the C600 series, codename ‘Patsburg’. For the v3 processors, this moves to the C610 series, also known as Wellsburg. The C612 chipset is the primary server component at this point, offering many of the features we have already seen in our X99 reviews:

- Up to 10 SATA 6 Gbps,
- 6 ports of USB 3.0,
- 8 ports of USB 2.0
- Up to 8 PCIe 2.0, with x1/x2/x4 supported

New features for C610 series include:

 - Reduced TDP, Average Power and Package (now 7W, 25mm x 25mm)
 - Intel SVT
 - USB 3.0 XHCI Debug
 - Support for MCTP Protocol and End Points
 - Support for Management Traffic over DMI
 - SPI Enhancements

Intel vPro, SPS 3.0, RSTe and CAS are also supported.

For the SATA/USB3/PCIe bandwidth combinations, Intel has implemented an extended form of Flex IO. It looks much the same as on Z87 and Z97, but offers 22 rather than 18 differential signal pairs. A certain number of these pairs are fixed to USB3 / PCIe / SATA but two pairs are muxed:

This slide shows 18 signal pairs, although I mentioned 22. This is because the last four are from a secondary AHCI controller giving four more SATA 6 Gbps ports. Like X99, the downside of these secondary SATA ports is that they are not RAID capable due to limitations within the silicon.

MCTP over PCIe is also an interesting new addition to Wellsburg, allowing cross CPU communication from controllers attached to the other side of the system:

The DRAM

We still have a consumer class DDR4 review in the works, but the upgrade from DDR3 to DDR4 for Haswell-EP is more significant. The decrease in power consumption is often listed as the easiest-to-explain benefit, giving an approximate 2W saving at-the-wall per memory module:
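As a rough illustration of what that per-module figure adds up to, the sketch below scales the approximate 2W saving across a fully populated system. Haswell-EP has four memory channels per socket; the two-socket, two-DIMMs-per-channel configuration and the assumption of 24/7 operation are hypothetical choices of mine, and the function names are illustrative.

```python
SAVING_PER_DIMM_W = 2.0   # approximate at-the-wall saving per module (from the text)
HOURS_PER_YEAR = 24 * 365  # assumes the machine runs around the clock


def dimm_count(sockets, channels_per_socket=4, dimms_per_channel=2):
    """Number of installed modules for a given population."""
    return sockets * channels_per_socket * dimms_per_channel


def wall_savings_w(dimms, saving_per_dimm_w=SAVING_PER_DIMM_W):
    """Estimated at-the-wall saving in watts for `dimms` modules."""
    return dimms * saving_per_dimm_w


def annual_kwh(watts, hours=HOURS_PER_YEAR):
    """Convert a constant power saving into kWh per year."""
    return watts * hours / 1000.0


if __name__ == "__main__":
    dimms = dimm_count(sockets=2)   # hypothetical dual-socket box at 2DPC
    watts = wall_savings_w(dimms)
    print(f"{dimms} DIMMs: ~{watts:.0f} W saved, ~{annual_kwh(watts):.0f} kWh per year")
```

The per-system figure is small, so the saving matters mainly when multiplied across racks of machines.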

One important aspect of DDR4 will be the higher memory frequency, especially when more DIMMs per channel are installed. It might also come to pass that some server motherboard manufacturers will end up supporting the DDR4-2133 at 3DPC, similar to some efforts made with Patsburg.

In a lot of the Intel materials we received, non-ECC UDIMM support is not often listed with the new Haswell-EP CPUs, but we can confirm that in our testing, all of our CPUs worked with standard consumer grade UDIMMs.


27 Comments


  • personne - Monday, October 13, 2014

    This review should be called Intel Xeon E5-2687W v3 and E5-2650 v3 on Windows Review. I'd think a large number of these servers would be used for other operating systems.
  • Ian Cutress - Monday, October 13, 2014

    I have some Linux benchmarks in the pipeline that I'm testing but aren't ready for prime time yet.
    I'll need to get some CPUs back in my office to test with that though, these Xeons are usually only loaner samples and it gets difficult to retest them.
  • personne - Monday, October 13, 2014

    Thanks. I admit it really aggravates me, in 2014, to see screenshots of applications as some sort of qualifier. So I hope you can generate some really useful discrete data for a critical audience.
  • Marthisdil - Monday, October 13, 2014

    I think a large number of these servers will be used in ESX (or other hypervisor) hosts, so these benchmarks don't really mean a ton.
  • Flunk - Tuesday, October 14, 2014

    This review is all workstation loads, so it's not that helpful even if you are using Windows. I think most of the Windows Systems these very pricey Xeons end up in will be servers. IIS, database and active directory performance testing would be more appropriate.
  • elerick - Monday, October 13, 2014

    I do find some value in the benchmarks proved by this review. For a review to include a high end workstation with DDR4 to have gaming benchmarks it proves that game engines do not take advantage of the extra bandwidth. The only factor is CPU architecture @ frequency + graphics cards.

    I would have liked to see how this CPU handles server applications and storage such as ZFS. More and more converged infrastructure is becoming hardware vendor agnostic ESXi 6 has some pretty cool features that make sense with Super Micro hardware taking advantage of the latest CPU
  • iwod - Monday, October 13, 2014

    I think next year Xeon will be much more interesting with 14nm. I am hoping to see an increase from 12 to 16, and 18 to 24/32 Core. Along with much cheaper DDR4.
  • Jon Tseng - Monday, October 13, 2014

    Hey Ian any more thoughts on power consumption vs. Ivy Bridge in day-to-day use, not just load.

    To me the obvious advantage of Grantley on paper is bringing all that Haswell power-gating/idle goodness to the server environment. The technology which lets Haswell spin out battery life in a laptop should also deliver energy and cost savings in a DC - which matters given power consumption (this is assuming your DC has decent periods of under-utilization - i.e. not an HPC plant!).

    Curious if any thoughts/data on this... J
  • isa - Monday, October 13, 2014

    I feel personally threatened by the "idea-limited" constraint. I resemble that remark. But I compensate with kool LEDs on my PC.
  • Carl Bicknell - Monday, October 13, 2014

    One thing that really needs spelling out is the clock speed under full load on all cores. That's much more informative than giving the default or the range.

    For the 2687W it's 3.2GHz default, and 3.4Ghz with turbo on all cores. That's pretty disappointing Intel.
