The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads

Name: The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads
Item: The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads
Author: Johan De Gelas

by Johan De Gelas on March 31, 2016 12:30 PM EST

112 Comments | Add A Comment

112 Comments

TSX

TSX or Transactional Synchronization Extensions is Intel's cache-based transactional memory system. Intel launched TSX with Haswell, but a bug threw a spanner in the works. Broadwell in turn got it right. The chicken is finally there, now it's time to enjoy the eggs.

Faster Virtualization

Virtualization overhead is (for most people) a thing of the past. The performance overhead with bare metal hypervisors (ESXi, Hyper-V, Xen, KVM..) is less than a few percent. There is one exception however: applications where I/O dominates. And of course, the packet switching telco applications are the prime examples. Intel, VMware and the server vendors really want to convert the telcos from their Firewall/Router/VPN "black boxes" to virtual ones using Software Defined Networking (SDN) infrastructure. To that end, Intel has continued to work on reducing the virtualization performance overhead. Virtualization overhead can be described as the number of VM exits (VM stops and hypervisor takes over) times the VM exit latency. In IO intensive application, VM exits happen frequently, which in turn leads to hard to predict and high IO latency, exactly what the telco people hate.

Intel wants to conquer the telco's datacenter by turning it into a SDN

So Intel worked on both factors. So Broadwell-DP VM exit latency is once again reduced from 500 cycles to 400.

It seems that the "ticks" also get a VM exit reduction. This slide of the Ivy Bride EP presentation gives you a very good overview of the VM exits in a network intensive application; in this case a networkd bandwidth benchmark application.

I quote from our Ivy Bridge-EP review:

The Ivy Bridge core is now capable of eliminating the VMexits due to "internal" interrupts, interrupts that originate from within the guest OS (for example inter-vCPU interrupts and timers). The virtual processor will then need to access the APIC registers, which will require a VMexit. Apparently, the current Virtual Machine Monitors do not handle this very well, as they need somewhere between 2000 to 7000 cycles per exit, which is high compared to other exits.

The solution is the Advanced Programmable Interrupt Controller virtualization (APICv). The new Xeon has microcode that can be read by the Guest OS without any VMexit, though writing still causes an exit. Some tests inside the Intel labs show up to 10% better performance.

In summary, Intel eliminated the green and dark blue components of the VM exit overhead with APICv. Broadwell now takes on the VM exits due to the external interrupts.

The technology on Broadwell-EP to do this is called posted interrupt. Essentially, posted interrupts enables direct interrupt delivery to the virtual machine without incurring a VM exit, courtesy of an interrupt remapping table. It is very similar to VT-D, which allowed DMA remapping thanks to the physical to virtual memory mapping table. Telco applications - among others - are very latency sensitive. Intel's Edwin Verplancke gave us one such example: before posted interrupts, a telco application had a latency varying from 4 to 47 (!) µsec, depending on the load. Posted interrupts made this a lot less variable, and latency varied from 2.4 to 5.2 µsecs.

As far as we are aware, KVM and Xen seem to have already implemented support for posted interrupts.

Sharing Cache and Memory Resources Xeon E5 v4 SKUs and Pricing

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

112 Comments

View All Comments

Kevin G - Thursday, March 31, 2016 - link
Much like how Apple skipped Haswell-EP, they also skipped a generation of cards from AMD and nVidia. So even if Apple doesn't wait for new GPUs, their is certainly an update on the GPU side.

The more interesting possibility would be if Apple were to go with Xeon D in the Mac Pro instead of Broadwell-EP. Apple would need a big PLX chip considering the number of lanes they's want to use but it is possible.
bill.rookard - Thursday, March 31, 2016 - link
Another issue is that they're not under any pressure from any competition to really innovate. I don't even remember the last time I read anything about Opteron servers... let alone something about any NEW Opterons.
ComputerGuy2006 - Thursday, March 31, 2016 - link
A sign of things to come for Broadwell-e?

Seems like a tricky situation. Because skylake-e will come with a new platform in 2017, while broadwell-e isn't the fastest IPC and there are crazy rumors it will might cost $1500 (lol Intel). We also have Zen later this year that might give good performance with good cost/perf ratio.
extide - Thursday, March 31, 2016 - link
Yeah so Intel only gives us the LCC part for the -E platform, so we will see the 10-core SKU as the top, It will either be $1000, or $1500 ... so yeah not sure how that will end up. Although there will be 8 and 6 core options that should be pretty affordable.

Hopefully they do an 8 core part with 28 lanes for under $500, as THAT would be a great deal!
dragonsqrrl - Sunday, April 3, 2016 - link
I'm hoping the 8 core SKU is around $600, the position the x930K traditionally occupies. What makes me a little worried is that there will be 4 SKUs instead of 3 this time (one 10 core, one 8 core, and two 6 core), and I'm not sure there's enough room under the $600 price point for two 6 core processors.
jasonelmore - Thursday, March 31, 2016 - link
Can it run Star Citizen?
theduckofdeath - Thursday, March 31, 2016 - link
A question we'll never get an answer to? :D
JohanAnandtech - Friday, April 1, 2016 - link
It probably runs mostly on Xeons. Well, the back end that is :-)
extide - Thursday, March 31, 2016 - link
BOOM, 454mm^2 on the worlds best process. The "other" 14/16nm processes use bigger geometry than Intel's 14nm process.

Now we just need those other guys to catch up so we can see 450+mm GPU's!
Kevin G - Thursday, March 31, 2016 - link
Intel still has plenty of room to increase die size. The largest chip they've produced was the Tukwila Itanium 2 at 699 mm^2. Granted that was a 65 nm design but Haswell-EX is a juggarnaught at 662 mm^2 on Intel's more recent 22 nm process. Seems reasonable that SkyLake-EX could go to 32 cores as Intel has >200 mm^2 of rectal limit left.

As for GPU's, they're also huge. nVidia's GM200 is 601 mm^2 and AMD's Fiji is 'only' 596 mm^2 both on 28 nm process. TSMC's 20 nm process was skipped so even using the looser 16 nm FinFET, GPU's will see a significant shrink compared to the those high end chips.

The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads

TSX

Faster Virtualization

Post Your Comment

112 Comments

View All Comments

Kevin G - Thursday, March 31, 2016 - link

bill.rookard - Thursday, March 31, 2016 - link

ComputerGuy2006 - Thursday, March 31, 2016 - link

extide - Thursday, March 31, 2016 - link

dragonsqrrl - Sunday, April 3, 2016 - link

jasonelmore - Thursday, March 31, 2016 - link

theduckofdeath - Thursday, March 31, 2016 - link

JohanAnandtech - Friday, April 1, 2016 - link

extide - Thursday, March 31, 2016 - link

Kevin G - Thursday, March 31, 2016 - link

Log in

Don't have an account? Sign up now