TSX

TSX, or Transactional Synchronization Extensions, is Intel's cache-based transactional memory system. Intel launched TSX with Haswell, but an erratum threw a spanner in the works and the feature ended up being disabled via microcode. Broadwell in turn got it right. The chicken is finally there; now it's time to enjoy the eggs.
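
To give an idea of what that buys you in practice, below is a minimal lock-elision style sketch using the RTM intrinsics that GCC and Clang expose (compile with -mrtm); the shared counter and the fallback_lock are purely illustrative, not code from Intel or anyone else:

    #include <immintrin.h>   /* _xbegin, _xend, _xabort: compile with -mrtm */
    #include <stdio.h>

    static volatile int fallback_lock = 0;   /* illustrative fallback spinlock */
    static long counter = 0;                 /* illustrative shared data */

    static void locked_increment(void)
    {
        unsigned status = _xbegin();               /* start a hardware transaction */
        if (status == _XBEGIN_STARTED) {
            if (fallback_lock)                     /* the lock is now in our read set... */
                _xabort(0xff);                     /* ...abort if someone holds it */
            counter++;                             /* transactional update */
            _xend();                               /* commit */
        } else {
            /* Aborted (conflict, capacity, interrupt...): take the classic lock. */
            while (__sync_lock_test_and_set(&fallback_lock, 1))
                ;
            counter++;
            __sync_lock_release(&fallback_lock);
        }
    }

    int main(void)
    {
        locked_increment();
        printf("counter = %ld\n", counter);
        return 0;
    }

The fallback path is mandatory: RTM transactions are never guaranteed to commit, so the code always needs a non-transactional way to make progress. When transactions do commit, threads that merely read the lock word never serialize on it, which is where the scaling win comes from.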

Faster Virtualization

Virtualization overhead is (for most people) a thing of the past. The performance overhead with bare metal hypervisors (ESXi, Hyper-V, Xen, KVM, etc.) is less than a few percent. There is one exception, however: applications where I/O dominates, and packet switching telco applications are of course the prime examples. Intel, VMware and the server vendors really want to convert the telcos from their Firewall/Router/VPN "black boxes" to virtual ones running on a Software Defined Networking (SDN) infrastructure. To that end, Intel has continued to work on reducing the virtualization performance overhead. That overhead can be described as the number of VM exits (the VM stops and the hypervisor takes over) times the VM exit latency. In I/O intensive applications, VM exits happen frequently, which in turn leads to high and hard-to-predict I/O latency, exactly what the telco people hate.

Intel wants to conquer the telco datacenter by turning it into an SDN

So Intel worked on both factors. With Broadwell-EP, the VM exit latency is once again reduced, from about 500 cycles to about 400.
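
A quick back-of-the-envelope calculation shows what that means; the exit rate below is an assumed figure for illustration, not an Intel number:

    assumed exit rate:            100,000 VM exits per second
    at ~500 cycles per exit:      ~50 million cycles/s, roughly 2.3% of a 2.2 GHz core
    at ~400 cycles per exit:      ~40 million cycles/s, roughly 1.8% of a 2.2 GHz core

The absolute saving looks modest, but the direct exit cost is only part of the story: what really hurts the telco workloads is the variability that frequent exits add to I/O latency, which is why Intel also tries to avoid exits altogether, as described below.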

It seems that the "ticks" also get a reduction in VM exits. This slide from the Ivy Bridge-EP presentation gives a very good overview of the VM exits in a network intensive application, in this case a network bandwidth benchmark.

I quote from our Ivy Bridge-EP review:

The Ivy Bridge core is now capable of eliminating the VMexits due to "internal" interrupts, interrupts that originate from within the guest OS (for example inter-vCPU interrupts and timers). To handle such an interrupt, the virtual processor needs to access the APIC registers, which used to require a VMexit. Apparently, the current Virtual Machine Monitors do not handle this very well, as they need somewhere between 2,000 and 7,000 cycles per exit, which is high compared to other exits.

The solution is Advanced Programmable Interrupt Controller virtualization (APICv). The new Xeon keeps a virtualized copy of the APIC registers that the guest OS can read without any VMexit, though writing still causes an exit. Some tests inside the Intel labs show up to 10% better performance.
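
Whether a hypervisor actually makes use of APICv is easy to verify under Linux/KVM; here is a minimal sketch that simply reads the kvm_intel module's enable_apicv parameter from sysfs (the path and the Y/N output are what current kernels expose, but treat this as an assumption for your particular kernel):

    #include <stdio.h>

    int main(void)
    {
        /* kvm_intel exposes whether APIC virtualization is in use as a module parameter */
        FILE *f = fopen("/sys/module/kvm_intel/parameters/enable_apicv", "r");
        if (!f) {
            perror("kvm_intel not loaded or parameter not exposed");
            return 1;
        }
        char buf[8] = {0};
        if (fgets(buf, sizeof buf, f) != NULL)
            printf("APICv enabled: %s", buf);   /* typically prints Y or N */
        fclose(f);
        return 0;
    }

When the parameter reads N, or the CPU lacks APICv, the hypervisor falls back to trapping every APIC access, which brings back the exit-per-APIC-access behaviour described above.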

In summary, Intel eliminated the green and dark blue components of the VM exit overhead with APICv. Broadwell now takes on the VM exits due to external interrupts.

The technology in Broadwell-EP that does this is called posted interrupts. Essentially, posted interrupts enable direct interrupt delivery to the virtual machine without incurring a VM exit, courtesy of an interrupt remapping table. It is very similar to VT-d, which enabled DMA remapping thanks to a physical-to-virtual memory mapping table. Telco applications, among others, are very latency sensitive. Intel's Edwin Verplancke gave us one such example: before posted interrupts, a telco application had a latency varying from 4 to 47 (!) µs, depending on the load. Posted interrupts made this a lot less variable, with latency varying from 2.4 to 5.2 µs.
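
The mechanism revolves around a small 64-byte descriptor shared between the IOMMU and the CPU; the following is a simplified C sketch, loosely modeled on the descriptor layout KVM uses internally (field names and widths here are illustrative, not the exact VT-d layout):

    #include <stdint.h>

    /* The IOMMU sets one bit per pending interrupt vector and, if the vCPU is
     * running, sends a single notification to the physical CPU hosting it; the
     * hardware then merges the pending bits into the virtual APIC without a
     * VM exit. */
    struct posted_interrupt_descriptor {
        uint32_t pending_vectors[8];    /* 256-bit bitmap, one bit per vector */
        uint32_t outstanding : 1;       /* a notification is already in flight */
        uint32_t suppress    : 1;       /* vCPU not running: skip the notification */
        uint32_t reserved    : 30;
        uint32_t notification_vector;   /* physical vector used to poke the host CPU */
        uint32_t notification_dest;     /* physical APIC ID of the CPU running the vCPU */
        uint32_t reserved2[5];          /* pad to 64 bytes */
    } __attribute__((aligned(64)));

    _Static_assert(sizeof(struct posted_interrupt_descriptor) == 64,
                   "descriptor must be exactly 64 bytes");

If the vCPU happens to be scheduled out, the suppress bit tells the IOMMU not to bother notifying anyone; the hypervisor simply picks up the pending bits the next time it resumes the vCPU. Either way, delivering the interrupt no longer costs a VM exit.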

As far as we are aware, KVM and Xen have already implemented support for posted interrupts.

Comments

  • jhh - Thursday, March 31, 2016 - link

    The article says TSX-NI is supported on the E5, but if one looks at Intel ARK, it says it's not. Do the processors say they support TSX-NI? Or is this another one of the things which will be left for the E7?
  • JohanAnandtech - Friday, April 1, 2016 - link

    Intel's official slides say: "supports TSX". All SKUs, no exceptions.
  • Oxford Guy - Thursday, March 31, 2016 - link

    Bigger, badder, still obsolete cores.
  • patrickjp93 - Friday, April 1, 2016 - link

    Obsolete? Troll.
  • Oxford Guy - Tuesday, April 5, 2016 - link

    Unlike you, propagandist, I know what Skylake is.
  • benzosaurus - Thursday, March 31, 2016 - link

    "You can replace a dual Xeon 5680 with one Xeon E5-2699 v4 and almost double your performance while halving the CPU power consumption."

    I mean you can, but you can buy 4 X5680s for a quarter the price of a single E5-2699v4. It takes a lot of power savings to make that worthwhile. The pricing in the server market's always seemed weirdly non-linear to me.
  • warreo - Friday, April 1, 2016 - link

    Presumably, it's not just about TCO. Space is at a premium in a datacenter, and so being able to fit more performance per sq ft also warrants a higher price, just like how notebook parts have historically been more expensive than their desktop equivalents.
  • ShieTar - Friday, April 1, 2016 - link

    But you don't get four LGA 1366 systems for the price of one LGA 2011-3 system. Depending on your memory, storage and interconnect needs, even two full systems based on the Xeon 5680 may cost you more than one system based on the E5-2699 v4. One less InfiniBand adapter can easily save you $500 in hardware.

    And you are not only halving the CPU power consumption, but also the power consumption of the rest of the system that you no longer use, so instead of 140W you are probably saving at least 200W per system, which can already add up to more than $1,000 in electricity and cooling bills for a 24/7 machine running for 3 years.

    And last, but by no means least, fewer parts mean less space, less chance of failure, and less maintenance effort. If you happily waste a few hours here or there to maintain your own workstation, you don't do the math, but if you have to pay somebody to do it, salaries matter quickly. With the MTBF for an entire server rarely being much higher than 40,000 hours, and recovery/repair easily taking a person-day of work, each system generates about 1.7 hours of work per year. The cost of that work (it's more than salaries, of course) probably comes to $100 per hour for a skilled technical administrator, thus producing another $500 over 3 years of added operational cost.

    And of course, space matters as well. If your data center is filled, it can be more cost effective to replace the old CPUs with new expensive ones, rather than build a new facility to fill with more old systems.

    If you add it all up, I doubt you can get a system with a Xeon 5680 and operate it over 3 years for anything below $20,000. So going from two $20,000 systems to a single $24,000 system (because of an extra $4,000 for the big CPU) should save you a lot of money in the long run.
  • JohanAnandtech - Friday, April 1, 2016 - link

    Where do you get your pricing info from? I cannot imagine that server vendors still sell X5680s.
  • extide - Friday, April 1, 2016 - link

    Yeah, if you go used. No enterprise sysadmin worth his salt is ever going to put used gear that is out of warranty and without support into production.
