Haswell Architecture Improvements

We have discussed the advantages that the Haswell core brings in more detail elsewhere. In a nutshell:

  • The core can sustain about 10% more integer instructions per clock cycle than its predecessor, Ivy Bridge. 
  • Virtualized applications should perform slightly better thanks to the lower VM exit/entry latency.
  • HPC applications could benefit much more if they are recompiled to make use of the improved AVX2 and Fused Multiply Add (FMA) support.
  • Database transactional applications should benefit more thanks to the lower synchronization latency.
  • In-memory databases should benefit if they are adapted to make use of the AVX2 256-bit integer vector operations.

Again, the same is true about the Xeon E5-2600v3. So what makes the E7 special? 

Transactional Synchronization Extensions: I'll be back 

There is one "new" - or rather "now working" - feature: TSX, the famous Transactional Synchronization eXtensions. These extensions are all about making locking more "optimistic": you let the CPU handle the bookkeeping to maintain consistency. TSX is quite powerful, but it can also be a liability in the wrong use case. Developers will need a deep understanding of locking and parallel programming to be able to make good use of TSX, as:

  1. ... you still have to rewrite your code (inserting hints)
  2. TSX may reduce performance in some situations: if indeed a pessimistic lock was necessary, the transaction has to be re-executed with a "traditional" conservative way of locking. You could call it a "lock misprediction".  
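
The trade-off in the two points above can be illustrated with a small software analogy. This is only a sketch in Python, not real TSX: the class ElidedCounter and the helpers apply and hammer are hypothetical names invented for this example, and a version counter plus an ordinary lock stand in for the conflict tracking that the CPU performs in hardware. The work is first done speculatively without holding the lock. If another thread committed in the meantime, the attempt is thrown away and the work is redone under the lock, which is the "lock misprediction" described above.

```python
import threading


class ElidedCounter:
    """Software analogy of an elided lock (hypothetical, not real TSX)."""

    def __init__(self):
        self.value = 0
        self.version = 0      # bumped on every successful commit
        self.aborts = 0       # number of "lock mispredictions"
        self._lock = threading.Lock()

    def _try_optimistic(self, update):
        seen = self.version               # start of the "transaction"
        candidate = update(self.value)    # speculative work, no lock held
        with self._lock:                  # stand-in for hardware conflict detection
            if self.version != seen:      # someone else committed in between
                self.aborts += 1
                return False
            self.value = candidate
            self.version += 1
            return True

    def apply(self, update, retries=1):
        """Try the optimistic path first, then fall back to plain locking."""
        for _ in range(retries):
            if self._try_optimistic(update):
                return "optimistic"
        with self._lock:                  # pessimistic fallback: redo the work
            self.value = update(self.value)
            self.version += 1
        return "fallback"


def hammer(counter, n_threads=4, n_increments=1000):
    """Increment the counter from several threads and return the final value."""
    def worker():
        for _ in range(n_increments):
            counter.apply(lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Calling hammer(ElidedCounter()) always ends with the full count, but every abort means the update was computed twice. When conflicts are rare, the optimistic path avoids most of the waiting. When they are frequent, the wasted speculative work is the performance penalty of point 2.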

Introducing TSX in software requires assessing the different locks in the application, using different libraries and quite a bit of tuning. SAP and Intel did this for the expensive in-memory data mining software SAP HANA.

 

The upgrade from "Ivy Bridge EX" to "Haswell-EX" yielded 50% more performance, while introducing TSX roughly doubled performance on top of that. So in TSX-enabled data mining software, Haswell-EX has the potential to reduce the waiting time by a factor of 3 or more.
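
The factor of 3 follows from multiplying the two gains quoted above, assuming they are independent and that "roughly doubled" is taken as exactly 2. A minimal sketch of the arithmetic (the function and variable names are made up for illustration):

```python
def combined_speedup(*factors):
    """Multiply independent speedup factors into one overall factor."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


hardware_gain = 1.5   # 50% more performance from the new core (figure from the text)
tsx_gain = 2.0        # "roughly doubled" with TSX (approximate, from the text)

speedup = combined_speedup(hardware_gain, tsx_gain)
waiting_time_fraction = 1 / speedup   # share of the original waiting time that remains
```

With these inputs the combined speedup is 3, so a query would take about a third of its original time.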


146 Comments


  • PowerTrumps - Saturday, May 9, 2015 - link

    Ok, yes a data center like Verizon or ATT might not "qualify" but the point is accurate. I work with IBM's Power servers and have absolutely consolidated 5 racks of x86 into a single Power server - it was 54 Intel 2S & 4S servers into a single 64c Power7. Part of this is due to the "performance" of Power but most of the credit goes to the efficiency of the Power Hypervisor. PHYP can provide a QoS to each workload while weaving a greater amount of workloads onto fewer Power servers/cores than what the benchmarks imply.
  • newtrekemotion - Friday, May 8, 2015 - link

    I wouldn't discount Oracle so quickly. The T5 was a pretty big step forward from the T4 and the new M7 chip sounds like it could be quite the competitor with 2 TB of memory per socket and 32 cores, especially for highly threaded loads since an octo-socket system would have 2048 threads and support 16 TB of memory. Hopefully this can bring some more competition to the market, though with only Oracle and Fujitsu (maybe?) selling systems it won't have quite the impact that multiple POWER8 vendors could bring. Love them, hate them, or anywhere in between it seems Oracle is not ready to give up in this arena and it looks like they are putting more effort in than Sun was (or are at least executing on effort more than Sun did).

    Something else to note here is the process advantage that Intel has over everyone else. I might have missed it in the article, but especially for performance/watt this is important.

    In all I think the statement at the beginning of the article that this area is getting more exciting is very true. Just seems like it might be a 3 way race instead of a 2. The recent AMD announcement that they wanted to focus on HPC is interesting too though of the 4 (Intel, IBM, Oracle and AMD) they have the furthest to go and the fewest resources to do it with. The next few years are going to be very interesting and hopefully someone, or a combination can push Intel and drive the whole market forward.
  • JohanAnandtech - Friday, May 8, 2015 - link

    I was writing from a "who will be able to convert Intel Xeon people" point of view. As I wrote in the Xeon E7v2 article, Oracle's T processors have indeed vastly improved. That is all nice and well but there is no reason why someone considering a Xeon E7 would switch. Oracle's sales seem to be mostly about people who are long time Oracle users. As far as I can see, OpenPOWER servers are the only real threat to Intel's server hegemony.
  • Kevin G - Saturday, May 9, 2015 - link

    Oracle does offer one reason to switch to SPARC: massive licensing discounts on Oracle software.

    If you're not using Oracle's software, then yeah, the SPARC platform is a very tough sell over x86 or POWER.
  • JohanAnandtech - Saturday, May 9, 2015 - link

    exactly. Good point.
  • PowerTrumps - Saturday, May 9, 2015 - link

    If you are running Oracle software you should know that IBM and Power are the largest platform which Oracle software runs on. Secondly, if running Oracle products licensed by the core, the only platform to control Oracle licensing is Power (not including Mainframe in this assertion). I have reduced Oracle licensing for customers anywhere from 4X to 10X. Do the math on that to appreciate those savings. Lastly, when I upgrade customers from one generation to another we talk about how much Oracle they can reduce. You don't hear that when upgrading from Sandy Bridge to Ivy Bridge to Haswell.
  • kgardas - Friday, May 8, 2015 - link

    I'm not sure about T5, but certainly Fujitsu's latest SPARC64-X+ is able to outrun POWER8 and, by a wide margin, also older Xeons. Just look at the SPEC rate results. It also won some SAP S&D 2-tier benchmark on absolute performance so I'm glad that SPARC is still competitive too...
  • Kevin G - Saturday, May 9, 2015 - link

    The top SPARC benchmarks I've seen are using far more sockets, cores, threads and memory to get to that top spot. It is nice that the system can scale to such high socket counts (40) but only if you can actually fund a project that needs that absolute performance. Drop down to 16 sockets, where you can get twice the performance from POWER than SPARC with the same licensing cost, and what advantage does SPARC have to make people switch?

    Even then, a system like SGI's UV2000 would fall into the same niche due to its ability to scale to insane socket counts, software licensing fees be damned.
  • kgardas - Tuesday, May 12, 2015 - link

    Kevin G, actually you are right and I made a mistake. It was not intentional, I was misled by the spec site claiming "24 cores, 4 chips, 6 cores/chip, 8 threads/core" for "IBM Power S824 (3.5 GHz, 24 core, RHEL)" so I thought this was a 4 socket setup and I compared it with the Fujitsu M10-4, which won. Now, I've just found the IBM is two socket, which means it wins on a socket/spec rate basis of course. Price-wise IBM is also much cheaper than SPARC (if you don't run Oracle DB of course) so I keep my fingers crossed for OpenPOWER.
    Honestly, although this is really nice to see, I still have a feeling that this is the IBM hardware division's swan song. I would really like to be wrong here. Anyway, I still think that ARMv8 has a higher chance of getting into Intel's business and being a real pain for Intel. On the other hand, if OpenPOWER is successful in the Chinese market, that would be good, and it gives us some chance to see lower-cost POWER machines too...
  • PowerTrumps - Saturday, May 9, 2015 - link

    yes, take a look at those benchmark results and you see the Fuji M10-4S requires 640 & 512 cores. Even the Oracle M6-32 uses 384 cores. The Fuji 512c example had 33% higher SAPS with 2X the cores. The M6-32 has 50% more cores to get 21% higher SAPS. Further, looking at the SAP benchmark as an indicator of core, chip & server performance shows that SPARC & Intel are roughly 1600 - 2200 SAPS per core compared to Power8 which is 5451 SAPS per core for the 80 core E870. To put this into context, the 80 core Power8 has slightly less than 1/2 the SAPS of the 640 core Fujitsu M10-4S. Think of ALL the costs associated with 640 cores vs 80...ok, 160 if we want to get the SAPS roughly equal. 4X more cores to get less than 2X the results.
