The Core

As Ian already discussed, the new Xeon E7 v2 is a 6, 8, 10, 12 or 15-core Ivy Bridge Xeon, similar to the Xeon E5-2600 v2. The big difference of course is that this new Xeon E7 v2 can be plugged into a quad- or native octal-socket server. These processors have three QuickPath Interconnects to be able to communicate over one hop. More sockets are possible with third party "glue logic".

Compared to the old Xeon E7 based on the "Westmere" core, the new Xeon E7 v2 "Ivy Bridge EX" features a vast amount of improvements. We will not list all of them, but just to give you an idea of how much progress has been made since the Westmere core:

  • µop cache (less decoding)
  • Improved branch prediction
  • Deeper and larger OoO buffers
  • Turbo Boost 2.0
  • AVX instructions
  • Divider is twice as fast
  • MOVs take no execution slots
  • Improved prefetchers
  • Improved shift/rotate and split/load
  • Better balance between Hyper-Threading and single-threaded performance; buffers are dynamically allocated to threads
  • Faster memory controller

Most of the improvement were fine tuning but the combined effect of them should result in a tangible performance boost in integer performance. For software that uses AVX, the performance boost could be very substantial. Even in software that uses older SSE(2) code, we found that the Sandy Bridge/Ivy Bridge generations were 20% faster, clock for clock, and we should see similar results here.

The Uncore

Just like the Xeon E5-2600 v2, the Ivy Bridge EX cores and 2.5MB L3 cache slices are stacked in columns connected with three fast rings, which connect all cores and all other the units (called agents) on the SoC. These rings also make sure that the L3 slices can act as one unified 37.5MB L3 cache with 450GB/s of bandwidth. The latency to the L3 cache is very low: 15.5ns (at 2.8GHz) versus 20ns for Westmere-EX (Xeon E7-4780 at 2.4GHz). PCIe I/O now happens on the die as well, and each CPU can support 32 PCIe lanes.

Finally, some coherency improvements are also implemented. Modified cache lines are send straight to the requester, without any write back to the memory agent. Overall, the collective sum of the improvement should prove quite capable.

Intel Aiming High Now with High Bandwidth Memory
Comments Locked

125 Comments

View All Comments

  • Kevin G - Saturday, February 22, 2014 - link

    Not 100% sure since I'm not an IEEE member to view it, but this paper maybe the source for the POWER7+ figures:
    http://ieeexplore.ieee.org/xpl/articleDetails.jsp?...
  • Phil_Oracle - Monday, February 24, 2014 - link

    TDP is great for comparing chip to chip, but what really matters is system performance/watt. And although Intel's latest Xeon E7 v2 may have better TDP specs than either Power7+ or SPARC T5, when you look at the total system performance/watt, SPARC T5 actually leads today due to its higher throughput, core count, 4 x more threads, built-in encryption engines and higher optimization with the Oracle SW stack.
  • Flunk - Friday, February 21, 2014 - link

    8 core consumer chips now please. If you have to take the GPU off go for it.
  • DanNeely - Friday, February 21, 2014 - link

    Assuming you mean 8 identical cores, until mainstream consumer apps appear that can use more CPU resources than the 4HT cores in Intel's high end consumer chips but which can't benefit from GPU acceleration become common it's not going to happen.

    I suppose Intel could do a big.little type implementation with either core and atom or atom and the super low power 486ish architecture they announced a few months ago in the future. But in addition to thinking it was worthwhile for the power savings, they'd also need to license/work around arm's patents. I suppose a mobile version might happen someday; but don't really see a plausible benefit for laptop/desktop systems that don't need continuous connected standby like phones do.
  • Kevin G - Friday, February 21, 2014 - link

    Intel hasn't announced any distinct plans to go this route, they're at least exploring the idea at some level. The SkyLake and Knights Landing are to support the same ISA extensions and in principle a program could migrate between the two types of cores.
  • StevoLincolnite - Saturday, February 22, 2014 - link

    Er. You don't need apps to use more than 4 threads to make use of an 8 core processor.
    Whatever happened to running several demanding applications at once? Surely I am not the only one who does this...
    My Sandy-Bridge-E processor being a few years old is starting to show it's age in such instances, I would cry tears of blood for an 8-Core Haswell based processor to replace my current 6-core chip.
  • psyq321 - Monday, March 10, 2014 - link

    Well, you can buy bigger Ivy Bridge EP Xeon CPU and fit it in your LGA2011 system.

    This way you can go up to 12 cores and not have to wait for 8-core Haswell E.
  • SirKnobsworth - Friday, February 21, 2014 - link

    8 core Haswell-E chips are due out later this year. You can already buy 6 core Ivy Bridge-E chips with no integrated graphics.
  • TiGr1982 - Friday, February 21, 2014 - link

    Did you know:
    Haswell-E is supposed to be released in Q3 this year, to have up to 8 Haswell cores with HT, fit in the new revision of Socket LGA2011 (incompatible with the current desktop LGA2011), and work with DDR4 and X99 chipset. No GPU there, since it's a byproduct of server Haswell-EP.
  • Harry Lloyd - Friday, February 21, 2014 - link

    That will not help much, unless they release a 6-core chip for around 300 $, replacing the lowest LGA2011 4-core chips. It is about time.

Log in

Don't have an account? Sign up now