Haswell Architecture Improvements

We have discussed the advantages of the Haswell core in more detail elsewhere. In a nutshell:

  • The core can sustain about 10% more integer instructions per clock cycle than its predecessor, Ivy Bridge. 
  • Virtualized applications should perform slightly better thanks to the lower VM exit/entry latency.
  • HPC applications should benefit much more if they are recompiled to make use of the improved AVX2 and Fused Multiply-Add (FMA) support.
  • Database transactional applications should benefit more thanks to the lower synchronization latency.
  • In-memory databases should benefit if they are adapted to make use of the AVX2 256-bit integer vector operations.
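To make the FMA bullet a little more concrete: a fused multiply-add computes a*b + c with a single rounding step instead of two. The snippet below is only a software emulation of that arithmetic in Python, using exact fractions. The helper name `fma_emulated` and the input values are made up for illustration, and it says nothing about how fast the hardware instruction is.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step


# Hypothetical inputs, chosen so that the exact product a*b (1 - 2**-60)
# cannot be represented as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: the product is rounded to 1.0 before c is added.
unfused = a * b + c

# Fused: the tiny term survives because rounding happens only at the end.
fused = fma_emulated(a, b, c)

print("unfused:", unfused)
print("fused:  ", fused)
```

Besides the extra precision, the fused form needs one instruction where the unfused form needs two, which is where the potential throughput gain for recompiled HPC code comes from.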

Again, the same is true about the Xeon E5-2600v3. So what makes the E7 special? 

Transactional Synchronization Extensions: I'll be back 

There is one "new" - or rather "now working" - feature: TSX, the famous Transactional Synchronization eXtensions. These extensions are all about making locking more "optimistic": you let the CPU handle the bookkeeping needed to maintain consistency. TSX is quite powerful, but it can also be a liability in the wrong use case. Developers will need a deep understanding of locking and parallel programming to make good use of TSX, as

  1. ... you still have to rewrite your code (inserting hints)
  2. TSX may reduce performance in some situations: if indeed a pessimistic lock was necessary, the transaction has to be re-executed with a "traditional" conservative way of locking. You could call it a "lock misprediction".  
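The two points above can be illustrated with a toy model. TSX itself is a hardware feature used from native code, so the Python sketch below is only a software analogy of the control flow: do the work optimistically without holding the lock, and redo it the conservative way if a conflicting write is detected. The class `OptimisticCell`, its version counter, and the `interfere` hook are hypothetical names invented for this sketch. Real TSX tracks conflicts in the cache hardware rather than with a version number.

```python
import threading


class OptimisticCell:
    """Toy model of optimistic locking with a pessimistic fallback."""

    def __init__(self, value=0):
        self._lock = threading.Lock()  # the "traditional" conservative lock
        self._version = 0              # bumped on every committed write
        self._value = value
        self.stats = {"optimistic": 0, "fallback": 0}

    def update(self, fn, interfere=None):
        # Optimistic attempt: snapshot the state and compute without the lock.
        seen_version, seen_value = self._version, self._value
        candidate = fn(seen_value)

        if interfere is not None:
            interfere()  # hook that simulates another thread writing mid-way

        with self._lock:
            if self._version == seen_version:
                # Nobody wrote in the meantime: commit the speculative result.
                self._value = candidate
                self._version += 1
                self.stats["optimistic"] += 1
                return candidate
            # "Lock misprediction": discard the speculative result and
            # re-execute the update while holding the lock.
            self._value = fn(self._value)
            self._version += 1
            self.stats["fallback"] += 1
            return self._value


cell = OptimisticCell(10)
first = cell.update(lambda v: v + 1)  # no conflict, commits optimistically
second = cell.update(
    lambda v: v * 2,
    interfere=lambda: cell.update(lambda v: v + 100),  # conflicting write
)
print(first, second, cell.stats)
```

In the second call the update function runs twice, once speculatively and once under the lock. That duplicated work is the cost the second point describes, and it is why heavily contended locks can get slower rather than faster.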

Introducing TSX in software requires assessing the different locks in the application, using different libraries, and quite a bit of tuning. SAP and Intel did this for SAP HANA, the expensive in-memory data mining software.

 

The upgrade from "Ivy Bridge EX" to "Haswell-EX" yielded a 50% performance gain, while introducing TSX roughly doubled performance on top of that. So in TSX-enabled data mining software, Haswell-EX has the potential to reduce the waiting time by a factor of 3 or more.
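The factor of 3 follows from multiplying the two gains. A quick check, assuming the 50% generational gain and the TSX doubling simply compound (the variable names are ours, not Intel's or SAP's):

```python
# Assumption: the two speedups are independent and multiply.
generation_speedup = 1.5  # Ivy Bridge EX -> Haswell-EX, "50%" more performance
tsx_speedup = 2.0         # TSX "roughly doubled" performance

combined_speedup = generation_speedup * tsx_speedup
waiting_time_fraction = 1.0 / combined_speedup

print(f"combined speedup: {combined_speedup:.1f}x")
print(f"waiting time relative to Ivy Bridge EX: {waiting_time_fraction:.0%}")
```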

146 Comments

  • DanNeely - Friday, May 8, 2015 - link

    Intel's 94% market share is still only ~184k systems. That's tiny compared to the mainstream x86 market, and doesn't give a lot of (budgetary) room to make radical changes to the CPU versus just scaling shared designs to a larger layout.
  • theeldest - Friday, May 8, 2015 - link

    184k for 4S systems. The number of 2S systems *greatly* outnumbers the 184k.
  • Samus - Sunday, May 10, 2015 - link

    by 100 orders of magnitude, easily.

    2S systems are everywhere these days, I picked up a Lenovo 2S Xeon system for $600 NEW (driveless, 4GB RAM) from CDW.

    4S, on the other hand, is considerably more rare and starts at many thousands, even with 1 CPU included.
  • erple2 - Sunday, May 10, 2015 - link

    Well, maybe 2 orders of magnitude. 100 orders of magnitude would imply, based on the 184k 4S systems, more 2S systems than atoms in the universe. Ok, I made that up, I don't know how many atoms are in the universe, but 10^100 is a really big number. Well, 10^105, if we assume 184k 4S systems.

    I think you meant 2 orders of magnitude.
  • mapesdhs - Sunday, May 10, 2015 - link

    Yeah, that made me smile too, but we know what he meant. ;)
  • evolucion8 - Monday, May 11, 2015 - link

    That would be right if Intel's cores were wide enough, which they aren't compared to IBM's. For example, according to this review, enabling two-way SMT boosted performance by 45%, and adding two more threads added 30% more performance. On the other hand, enabling two-way SMT on the latest i7 architecture can only gain up to 30% in the best-case scenario.
  • chris471 - Friday, May 8, 2015 - link

    Great article, and I'm looking forward to seeing more Power systems.

    I would have loved to see additional benchmarks with gcc flags -march=native -Ofast. Should not change stream triad results, but I think 7zip might profit more on Power than on Xeon. Most software is not affected by the implied -ffast-math.
  • close - Friday, May 8, 2015 - link

    It reminds me of the time when Apple gave up on PowerPC in mobiles because the new G5s were absolute power guzzlers and made space heaters jealous. And then gave up completely and switched to Intel because the 2 dual core PowerPC 970MP CPUs at 2.5GHz managed to pull 250W of power and needed liquid cooling to be manageable.

    IBM is learning nothing from past mistakes. They couldn't adapt to what the market wanted and the more nimble competition was delivering 25-30 years ago when fighting Microsoft, it already lost business to Intel (which is actually only nimble by comparison), and it's still doing business and building hardware like we're back in the '70s mainframe age.
  • name99 - Friday, May 8, 2015 - link

    You are assuming that the markets IBM sells into care about the things you appear to care about (in particular CPU performance per watt). This is a VERY dubious assumption.
    The HPC users MAY care (but I'd need to see evidence of that). For the business users, the cost of the software running on these systems dwarfs the lifetime cost of their electricity.
  • SuperVeloce - Saturday, May 9, 2015 - link

    They surely care. Why wouldn't they? A whole server rack, or many of them in fact, does use quite a bit of power. And cooling the server room is very expensive.
