A Modest Tick

Broadwell is a tick (a die shrink of an existing architecture rather than a new architecture), so you should expect only modest IPC improvements. Most Xeon E5 v4 SKUs have slightly lower clockspeeds than their Haswell (v3) brethren, so overall single-threaded performance has hardly improved. Clock for clock, Intel tells us that its simulation tools show Broadwell delivering about 5% higher performance per clock in non-AVX2 traces.


Bars (first Y-axis): simulated single-threaded performance improvement. Blue line (second Y-axis): cumulative improvement.
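As a rough rule, single-threaded performance scales with the per-clock gain multiplied by the clockspeed ratio. A minimal sketch, assuming Intel's ~5% per-clock figure and two illustrative clockspeeds (2.3 GHz for the older part, 2.2 GHz for the newer one). These are placeholders, not specific SKUs, and the model assumes performance scales linearly with clock:

```python
def relative_single_thread_perf(ipc_gain: float, new_clock_ghz: float, old_clock_ghz: float) -> float:
    """Estimate new/old single-threaded performance as (1 + per-clock gain) x clock ratio.

    Assumes performance scales linearly with clockspeed, which ignores
    memory-bound code where extra clock does not help as much.
    """
    return (1.0 + ipc_gain) * (new_clock_ghz / old_clock_ghz)


# Illustrative numbers only: ~5% better performance per clock,
# with the newer part clocked 100 MHz lower.
ratio = relative_single_thread_perf(ipc_gain=0.05, new_clock_ghz=2.2, old_clock_ghz=2.3)
print(f"Broadwell vs Haswell: {ratio:.3f}x")  # 1.004x, i.e. well under 1% faster
```

A 100 MHz clock deficit almost cancels the per-clock gain, which is why single-threaded performance has hardly moved.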

In that sense, Broadwell is basically a Haswell made on Intel's 14nm, second-generation tri-gate transistor process. Intel did make a few subtle improvements to the microarchitecture:

  • Faster divider: lower latency and higher throughput
  • AVX multiply latency has decreased from 5 to 3 cycles
  • Bigger TLB (1.5k vs 1k entries)
  • Slightly improved branch prediction (as always)
  • Larger scheduler (64 vs 60 entries)
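Two of these changes can be put into rough numbers. The sketch below assumes every TLB entry maps a 4 KiB page, and that a loop is a pure chain of dependent AVX multiplies (each one waits for the previous result), so its runtime is set by latency alone. Real code mixes page sizes and overlaps independent work, so actual gains will be smaller than these idealized figures:

```python
KIB = 1024
MIB = KIB * KIB


def tlb_reach_bytes(entries: int, page_size_bytes: int = 4 * KIB) -> int:
    """Memory the TLB can map without a page walk: entries x page size."""
    return entries * page_size_bytes


def dependent_chain_cycles(n_ops: int, latency_cycles: int) -> int:
    """Cycles for n_ops multiplies where each needs the previous result.

    Idealized: no overlap between operations, so total time = n_ops x latency.
    """
    return n_ops * latency_cycles


# TLB reach with 4 KiB pages: 1k entries (Haswell) vs 1.5k entries (Broadwell)
haswell_reach = tlb_reach_bytes(1024)
broadwell_reach = tlb_reach_bytes(1536)
print(haswell_reach // MIB, "MiB vs", broadwell_reach // MIB, "MiB")  # 4 MiB vs 6 MiB

# 1000 dependent AVX multiplies: latency 5 (Haswell) vs 3 (Broadwell)
print(dependent_chain_cycles(1000, 5), "vs", dependent_chain_cycles(1000, 3), "cycles")  # 5000 vs 3000 cycles
```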

None of these changes will yield large performance improvements; the bigger gains must come from other features.

New Features

Compared to Haswell-EP, Broadwell-EP also adds some new features. The first is an improved power control unit.

On Haswell, a single AVX instruction on one core forced all cores on the same socket to drop their clockspeed by around 2 to 4 speed bins (200 to 400 MHz) for at least 1 ms, because AVX's higher power requirement reduces how far the CPU can turbo. On Broadwell, only the cores that actually run AVX code reduce their clockspeed, so the other cores can keep running at higher speeds.
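The difference between the two policies can be modelled in a few lines. This is a simplified sketch, not Intel's actual power-control algorithm: it assumes a hypothetical 3.0 GHz turbo clock, 100 MHz speed bins and a fixed 3-bin AVX penalty, and it ignores the 1 ms hold time:

```python
from typing import List, Set

BIN_MHZ = 100  # assumed size of one speed bin


def core_clocks(avx_cores: Set[int], n_cores: int, turbo_mhz: int,
                avx_bins: int, per_core: bool) -> List[int]:
    """Return each core's clock in MHz under a simplified AVX downclocking model.

    per_core=False models Haswell: any core running AVX slows down the whole socket.
    per_core=True models Broadwell: only the cores running AVX slow down.
    """
    avx_mhz = turbo_mhz - avx_bins * BIN_MHZ
    if not per_core and avx_cores:
        return [avx_mhz] * n_cores
    return [avx_mhz if core in avx_cores else turbo_mhz for core in range(n_cores)]


# Hypothetical 4-core socket where only core 0 runs AVX code
print("Haswell-style:  ", core_clocks({0}, 4, 3000, 3, per_core=False))  # [2700, 2700, 2700, 2700]
print("Broadwell-style:", core_clocks({0}, 4, 3000, 3, per_core=True))   # [2700, 3000, 3000, 3000]
```

With per-core control, the three cores that never touch AVX keep their full turbo clock.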

The other performance feature is the vastly improved PCLMULQDQ (carry-less multiplication) instruction: throughput has been doubled, and latency reduced from 7 cycles to 5.
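Carry-less multiplication is ordinary long multiplication with XOR in place of addition, so no carries propagate between bit positions; equivalently, it multiplies two polynomials over GF(2). PCLMULQDQ does this in hardware for two 64-bit operands, producing a 128-bit result. A minimal bit-by-bit sketch of the same operation (the name `clmul64` is ours, for illustration only):

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values (the result fits in 128 bits)."""
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add: nothing carries into higher bits
        a <<= 1
        b >>= 1
    return result


print(clmul64(0b11, 0b11))  # 5 (0b101): (x + 1)^2 = x^2 + 1 over GF(2)
print(3 * 3)                # 9 with ordinary multiplication, for comparison
```

The software loop needs up to 64 iterations per product; doing it in a single instruction is what makes the hardware version attractive for CRC and encryption code.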

This increases AES (symmetric) encryption performance by 20-25%, and CRCs (Cyclic Redundancy Checks) are up to 90% faster. Broadwell also adds the new ADCX/ADOX instructions to speed up asymmetric encryption algorithms such as the popular RSA. These improvements are implemented in OpenSSL 1.0.2-beta3.

But don't expect too much from them. The compute-intensive asymmetric encryption is mostly used to set up a secure connection. Most modern web applications keep their sessions "alive", and as a result, events that require asymmetric encryption happen a lot less frequently. Symmetric encryption (like AES), which is used to send the encrypted data, is a lot lighter, so even on a fully encrypted website with long encrypted data streams, encryption is only a small percentage (<5%) of the total computing load.
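Amdahl's law makes the last point concrete: if a fraction f of the total CPU time goes to encryption and that part becomes s times faster, the overall speedup is 1 / ((1 - f) + f / s). Plugging in the figures above (encryption at most 5% of the load, AES 25% faster in the best case) gives a back-of-the-envelope estimate, not a measurement:

```python
def amdahl_speedup(fraction: float, part_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime becomes `part_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


# Encryption = 5% of the load, made 25% faster (best case from the text)
overall = amdahl_speedup(0.05, 1.25)
print(f"Overall speedup: {overall:.4f}x")  # 1.0101x, i.e. roughly 1%
```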


112 Comments


  • patrickjp93 - Friday, April 1, 2016 - link

    Knights Landing: 730 mm^2, also on the 14nm platform
  • extide - Friday, April 1, 2016 - link

    Is it really that big..? Wow, I knew it was big, but didn't know it was that big. Got a source on that?
  • Kevin G - Friday, April 8, 2016 - link

    I'll second a link for a source. I knew it'd be big but that big?
  • extide - Friday, April 1, 2016 - link

    I know you meant Reticle, but that was a pretty funny typo, heh.
  • Kevin G - Friday, April 8, 2016 - link

    Autocorrect has gotten the best of me yet again.
  • extide - Friday, April 1, 2016 - link

    And, I know how big GM200 and Fiji are, but I am talking about big GPUs on 14/16nm. All signs are currently pointing to <300mm^2 for the first round of 14/16nm GPUs.
  • lorribot - Thursday, March 31, 2016 - link

    Given the way Microsoft and others are now licensing by the core and in large non-splittable packages (Windows 2016 Datacenter is in blocks of 16 cores, so a dual socket server with 44 cores would need 48 core licences), the increasing core count has limited appeal over small numbers of faster cores when looking at virtualised environments.
    Those still in the physical world will still have to pay per core but may have to buy 4 std Windows licenses.
    When it comes to doing your testing, it should reflect these costs and compare total bang per buck when dealing with performance.
    Red Hat still licences per socket but don't be surprised if they go per core too.
  • JohanAnandtech - Friday, April 1, 2016 - link

    Back in 2008, I had a sales person explaining the license models of Microsoft to me in our lab. From that point on, we have invested most of our time and resources in linux server software. :-D
  • extide - Friday, April 1, 2016 - link

    Enterprise linux isn't free, either ya know
  • rahvin - Friday, April 1, 2016 - link

    Support isn't free on the FOSS side but the software is. Redhat is never going to charge more per "cores" for support, that's ridiculous and would result in rivals stealing their support contracts. If licensing costs are that bad that you are dumping hardware you really should be looking at moving services to Linux and virtualizing the Windows servers so you can limit the core count and provide more horsepower.

    Anyone putting Microsoft on bare hardware these days is nuts. Although the consolation is that they get to pay MS's exorbitant tax on software. Linux should be the core component of any IT services and virtualized servers where you need proprietary server software.
