Taking place this week in Frankfurt, Germany is the 2015 International Supercomputing Conference. One of the two major supercomputing conferences of the year, ISC tends to be the venue of choice for major high performance computing announcements for the second half of the year and is where the summer Top 500 supercomputer list is unveiled.

In any case, Intel sends word that they are at ISC 2015 showing off the “Knights Landing” Xeon Phi, which is ramping up for commercial deployment later this year. Intel unveiled a number of details about Knights Landing at last year’s ISC, where it was announced that the second-generation Xeon Phi would be based on Intel’s Silvermont cores (replacing the P54C-derived cores in Knights Corner) and built on Intel’s 14nm process. Furthermore, Knights Landing would also include up to 16GB of on-package Multi-Channel DRAM (MCDRAM), an ultra-wide stacked memory technology based around Hybrid Memory Cube.

Having already revealed the major architectural details last year, at this year’s show Intel is confirming that Knights Landing remains on schedule for its commercial launch later this year. Interestingly enough, this will make Knights Landing the second processor to ship this year with an ultra-wide stacked memory technology, after AMD’s Fiji GPU, indicating how quickly the technology is being adopted by processor manufacturers. More importantly for Intel, of course, this will be the first such product targeted specifically at HPC applications.

Meanwhile, after having previously announced that the design would include up to 72 cores - but not committing at the time to shipping a full 72 core part due to potential yield issues - Intel is now confirming that one or more 72 core SKUs will be available. This indicates that Knights Landing is yielding well enough to ship fully enabled, something the current Knights Corner never achieved (only shipping with up to 61 of its 62 cores enabled). Notably, this also narrows down the expected clockspeeds for the top Knights Landing SKU; with 72 cores each capable of processing 32 FP64 FLOPs per cycle (thanks to 2 AVX-512 vector units per core), Intel needs to hit roughly 1.3GHz to reach their 3 TFLOPS projection.
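
For those curious where that 1.3GHz figure comes from, here is a minimal back-of-the-envelope sketch in Python. It uses only the core count and per-core throughput quoted above; the numbers are the article’s, not an official Intel breakdown.

```python
# Back-of-the-envelope check of the clockspeed needed for 3 TFLOPS FP64.
# Figures are the ones quoted in the article, not official Intel numbers.
cores = 72              # top Knights Landing SKU
flops_per_cycle = 32    # per core: 2 AVX-512 units x 8 FP64 lanes x 2 ops (FMA)
target_flops = 3.0e12   # Intel's 3 TFLOPS FP64 projection

required_hz = target_flops / (cores * flops_per_cycle)
print(f"Required clock: ~{required_hz / 1e9:.2f} GHz")  # ~1.30 GHz
```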

Moving on, Knights Landing’s partner interconnect technology, Omni-Path, is also ramping up for commercial deployment. After going through a few naming variants, Intel has settled on the Omni-Path Fabric 100 series, to distinguish it from planned future iterations of the technology. We won’t spend too much time on this, but it goes without saying that Intel is looking to move to a vertically integrated ecosystem and capture the slice of HPC revenue currently spent on networking with InfiniBand and other solutions.

Finally, in order to develop that vertically integrated ecosystem, Intel is announcing that they have teamed up with HP to build servers around Intel’s suite of HPC technologies (or as Intel calls it, their Scalable System Framework). HP will be releasing a series of systems under the company’s Apollo brand of HPC servers that will integrate Knights Landing, Omni-Path 100, and Intel’s software stack. For Intel the Apollo HPC systems serve two purposes: to demonstrate the capabilities of their ecosystem and the value of their first-generation networking fabric, and of course to get complete systems on the market and into the hands of HPC customers.

Source: Intel

53 Comments

  • patrickjp93 - Tuesday, July 14, 2015 - link

    Really, because the current Knights Corner chips are spanking the Nvidia Tesla out of contracts left and right.
  • marraco - Monday, July 13, 2015 - link

    How pathetic is the number of cores sold to the desktop consumer.
  • SaberKOG91 - Monday, July 13, 2015 - link

    Most consumer workloads benefit from fewer faster threads. These cards run at lower frequencies, but benefit from data parallel execution units. That's useful when you want a good mix of integer and floating point HPC performance, but not at all helpful for highly sequential client software.
  • SirNuke - Monday, July 13, 2015 - link

    Different horses for different courses. Desktop workloads usually only need a few simultaneous threads and only infrequently take advantage of parallelism.
  • SaberKOG91 - Monday, July 13, 2015 - link

    That said, we could easily see efficiency improvements from 4-way SMT with how wide the cores are. Disappointed that we still only see 2-way SMT over a decade later.
  • testbug00 - Monday, July 13, 2015 - link

    And, who would be writing that scheduler? SMT2 isn't that hard relative to no SMT. Beyond SMT2, things get much trickier, from my understanding.
  • SaberKOG91 - Tuesday, July 14, 2015 - link

    The SMT scheduling is part of the out-of-order execution engine, not the kernel, so it's a lot easier to scale that in hardware. Xeon Phi cores are 4-way, making it easier to port that scheduling to another microarchitecture. Power7 was 4-way, and Power8 is 8-way.

    This paper is a solid read if you want to know how difficult it is: http://lazowska.cs.washington.edu/SMT.pdf

    Frankly, the biggest limitation is cache bandwidth. You need a much wider cache to keep the threads happy. However, the efficiency gains are tremendous. P4 with HT saw up to a 28% improvement in performance for as little as a 5-6% area bump. And that was considered a poor implementation.
  • tipoo - Monday, July 13, 2015 - link

    Look at the die size of the simpler cores in the Phi, vs Haswell or Broadwell cores.
  • tipoo - Monday, July 13, 2015 - link

    I almost wonder what an Intel dedicated gaming GPU would be like, they have a lot of the right pieces in play. Their GPU architecture is decent enough, and with their on package memory, I wonder what that could be if they scaled up the EUs and bandwidth accordingly.
  • DigitalFreak - Monday, July 13, 2015 - link

    Phi was originally supposed to be available in a video card configuration as well, but that idea was dropped prior to launch.
