Intel Xeon E5-2687W v3 and E5-2650 v3 Review: Haswell-EP with 10 Cores
by Ian Cutress on October 13, 2014 10:00 AM EST
CPU Benchmarks
The dynamics of CPU Turbo modes, both Intel and AMD, can cause concern in environments with a variably threaded workload. There is also the added issue of motherboard consistency, depending on how the motherboard manufacturer wants to add in its own boosting technologies over the ones that Intel would prefer they used. In order to remain consistent, we implement a unique OS-level high performance mode on all the CPUs we test, which should override any motherboard manufacturer performance mode.
HandBrake v0.9.9
For HandBrake, we take two videos (a 2h20 640x266 DVD rip and a 10min double UHD 3840x4320 animation short) and convert them to H.264, using the x264 encoder, in an MP4 container. Results are given in terms of the frames per second processed, and HandBrake uses as many threads as possible.
Low quality conversion loves faster individual cores, hence the W processor wins out due to its higher full-load frequency. Nonetheless, the fast consumer grade processors win here by a large margin.
In full double-4K mode, the balance of cores, frequency and architecture upgrade puts the E5-2687W v3 above the 12-core E5-2697 v2.
Agisoft Photoscan – 2D to 3D Image Manipulation
Agisoft Photoscan creates 3D models from 2D images, a process which is very computationally expensive. The algorithm is split into four distinct phases, and different phases of the model reconstruction require either fast memory, fast IPC, more cores, or even OpenCL compute devices to hand. Agisoft supplied us with a special version of the software to script the process, where we take 50 images of a stately home and convert them into a medium quality model. This benchmark typically takes around 15-20 minutes on a high end PC on the CPU alone, with GPUs reducing the time.
Dolphin Benchmark
Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that raytraces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes, where the Wii itself scores 17.53 minutes.
A single emulation instance benefits from a fast single core.
WinRAR 5.0.1
WinRAR seems to enjoy Haswell-EP over Ivy-EP, although it still needs a high frequency to achieve top speeds.
PCMark8 v2 OpenCL
A new addition to our CPU testing suite is PCMark8 v2, where we test the Work 2.0 suite in OpenCL mode.
Hybrid x265
Hybrid is a new benchmark, where we take a 4K 1500 frame video and encode it with x265, without audio. Results are given in frames per second.
Hybrid also takes advantage of the new architecture, giving a 5% advantage to the E5-2687W v3 despite two fewer cores.
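As a rough illustration of how a frames-per-second score and the "fewer cores, higher score" comparison fit together, here is a minimal sketch. The function names and the timings are hypothetical, chosen only to reproduce a 5% gap between a 10-core and a 12-core part; they are not measured results from this review.

```python
def frames_per_second(frame_count: int, elapsed_seconds: float) -> float:
    """Throughput score: frames encoded divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return frame_count / elapsed_seconds


def per_core_ratio(fps_a: float, cores_a: int, fps_b: float, cores_b: int) -> float:
    """Per-core throughput of chip A relative to chip B."""
    return (fps_a / cores_a) / (fps_b / cores_b)


FRAMES = 1500  # frame count of the test clip, as stated in the text

# Hypothetical encode times (seconds), picked to give a 5% overall gap.
fps_10_core = frames_per_second(FRAMES, 1000.0)
fps_12_core = frames_per_second(FRAMES, 1050.0)

overall_ratio = fps_10_core / fps_12_core
core_ratio = per_core_ratio(fps_10_core, 10, fps_12_core, 12)
print(f"overall: {overall_ratio:.2f}x, per core: {core_ratio:.2f}x")
```

With these assumed numbers, a 5% overall lead with two fewer cores works out to about 26% more throughput per core, which is one way to read the combined effect of frequency and architecture.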
Cinebench R15
3D Particle Movement
3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC win in the single thread version, whereas the multithread version has to handle the threads and loves more cores.
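To give a feel for the kind of work described here, the following is a minimal sketch of a 3D random-walk particle mover. It is not the 3DPM source: the function names, the unit step length and the direction-sampling method are all assumptions made for illustration, and Python will not show the floating point throughput a compiled benchmark would.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def move_particles(n_particles: int, n_steps: int, seed: int) -> float:
    """Random-walk each particle in 3D and return the mean final distance from the origin."""
    rng = random.Random(seed)
    total_distance = 0.0
    for _ in range(n_particles):
        x = y = z = 0.0
        for _ in range(n_steps):
            # Pick a uniformly distributed direction on the unit sphere.
            cos_theta = rng.uniform(-1.0, 1.0)
            sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
            phi = rng.uniform(0.0, 2.0 * math.pi)
            x += sin_theta * math.cos(phi)
            y += sin_theta * math.sin(phi)
            z += cos_theta
        total_distance += math.sqrt(x * x + y * y + z * z)
    return total_distance / n_particles


def run_multi(n_workers: int, particles_per_worker: int, n_steps: int) -> list:
    """Multi-process variant: each worker moves its own independent batch of particles."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(move_particles, particles_per_worker, n_steps, seed)
            for seed in range(n_workers)
        ]
        return [future.result() for future in futures]


# Single-process run with a small, assumed problem size.
single_result = move_particles(n_particles=200, n_steps=100, seed=0)
```

The single-process call is bound by the speed of one core, while `run_multi` splits independent batches across workers and so scales with core count until the overhead of managing the workers dominates. When running it as a script, call `run_multi` from inside an `if __name__ == "__main__":` guard.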
FastStone Image Viewer 4.9
FastStone is the program I use to perform quick or bulk actions on images, such as resizing, adjusting for color and cropping. In our test we take a series of 170 images in various sizes and formats and convert them all into 640x480 .gif files, maintaining the aspect ratio. FastStone does not use multithreading for this test, and results are given in seconds.
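The resize step can be sketched as fitting each image inside a 640x480 box while keeping its aspect ratio. This is an assumption about what "maintaining the aspect ratio" means in practice, not FastStone's actual code, and `fit_within` is a hypothetical helper name. Note that this sketch would also scale up images smaller than the box, which the real tool may or may not do.

```python
def fit_within(width: int, height: int, max_width: int = 640, max_height: int = 480) -> tuple:
    """Return the largest (width, height) that fits the box without changing the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The tighter of the two constraints decides the scale factor.
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


landscape = fit_within(1920, 1080)  # width-limited
portrait = fit_within(3000, 4000)   # height-limited
print(landscape, portrait)
```

Because each image is handled one after another on a single thread, the total time for the 170 images is essentially the sum of the per-image decode, resize and encode times.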
Web Benchmarks
General usability is a big factor in the user experience, especially as we move into the HTML5 era of web browsing. For our web benchmarks, we take four well known tests with Chrome 35 as a consistent browser.
Sunspider 1.0.2
Mozilla Kraken 1.1
WebXPRT
Google Octane v2
27 Comments
JarredWalton - Monday, October 13, 2014
For ten cores I wouldn't expect a huge bump over the "minimum guaranteed" speed. It's one thing to boost a few cores by a large amount, but the whole problem with multi-core designs is that if you load up all the cores then either you have massive power consumption or you need to curtail the clocks. Honestly, running ten cores at 100% and still hitting 3.1GHz is impressive in my book -- and it still consumes up to 160W.
Carl Bicknell - Monday, October 13, 2014
I got my numbers a bit wrong: the 2687W is 3.1 GHz default and 3.2 GHz all cores on turbo, according to Wikipedia.
That's disappointing.
Apart from anything else, they've managed to get their best 12 (yes twelve!) core CPU (E5-2690 v3) to operate at 3.1 GHz turbo all cores in a 135 W design.
With two fewer cores and an extra 25 watts I'd hope for more than a mere 100 MHz performance.
NovoRei - Monday, October 13, 2014
Ian, could you comment on performance with pure AVX2 and mixed AVX instructions and where the W version stands?
Thanks.
Laststop311 - Monday, October 13, 2014
4100 for an 18 core ill take 2
ruthan - Tuesday, October 14, 2014
I would like to see benchmarks of some of those low power 6/12 or 12/24 models at 55W and 65W.
pokazene_maslo - Tuesday, October 14, 2014
Is it possible to override turbo boost to force all cores to run at maximum turbo frequency? (E5-2687W-v3 running all cores at 3.5GHz)
alpha754293 - Tuesday, October 14, 2014
Well, the thing with these "big" multicore systems is that testing them is no different from testing a large SMP system. You have to use programs or applications where it makes sense to use them. For engineering analyses and simulations, even HOW a problem is divided up (from a single, much larger problem) can have an impact on not only the speed of the analysis/simulation, but also the accuracy of the simulation, and you have to have a pretty sound understanding of the math and physics involved in order to make the best determination.
And for some applications, there is such a thing as having TOO many cores (where you've divided up a problem so much that each piece is now so small that it can't fully load a core anymore, and the process of dividing and re-assembling the results takes an extremely large amount of time). You can run into that with some FEA analyses.
I was working with Johan, using LS-DYNA to study a whole slew of parameters: how the various ways of decomposing a problem can have an impact on the crash test simulation results, and how swap performance means EVERYTHING when it comes to mechanical engineering simulations.
mapesdhs - Thursday, October 16, 2014
Oddly enough this can be the case with animation rendering as well. I know a movie studio
which uses a system that can exclude cores from a render pipeline so there is more RAM
and cache bandwidth available with a fewer number of cores. This can matter because
sometimes complex film renders can use huge amounts of data. Someone at SPI told me
one frame of a big movie can involve 500GB of data.
Interesting how the same issue can crop up in such widely different fields.
Ian.
RAMdiskSeeker - Tuesday, October 14, 2014
Could you please test these motherboards for supporting ECC unbuffered DIMMs, reporting that ECC is active, and overclocking potential with ECC DIMMs? It would be good to know whether Xeon chips on non-server motherboards can use ECC.
nutral - Tuesday, October 14, 2014
What is still strange to me is that there is no workstation CPU aimed at workstations running single threaded software. Wouldn't an i7 CPU still be much faster than this workstation CPU?