Continuing our ISC 2014 news announcements for the week, next up is Intel. Intel has taken to ISC to announce further details about the company’s forthcoming Knights Landing processor, the next generation of Intel’s Xeon Phi processors.

While Knights Landing in and of itself is not a new announcement – Intel has initially announced it last year – Intel has offered very few details on the makeup of the processor until now. However with Knights Landing scheduled to ship in roughly a year from now, Intel is ready to offer up further details about the processor and the capabilities.

As previously announced, as the successor to Intel’s existing Knights Corner (1st generation Xeon Phi), Knights Landing makes the jump from using Intel’s enhanced Pentium 1 (P54C) x86 cores to using the company’s modern Silvermont x86 cores, which currently lie at the heart of the Intel’s Atom processors. These Silvermont cores are far more capable than the older P54C cores and should significantly improve Intel’s single threaded performance. All the while these cores are further modified to incorporate AVX units, allowing AVX-512F operations that provide the bulk Knights Landing’s computing power and are a similarly potent upgrade over Knights Corner’s more basic 512-bit SIMD units.

All told, Intel is planning on offering Knights Landing processors containing up to 72 of these cores, with double precision floating point (FP64) performance expected to exceed 3 TFLOPs. This will of course depend in part on Intel’s yields and clockspeeds – Knights Landing will be a 14nm part, a node whose first products won’t reach end-user hands until late this year – so while Knights Landing’s precise performance is up in the air, Intel is making it extremely clear that they are aiming very high.

Which brings us to this week and Intel’s latest batch of details. With last year focusing on the heart of the beast, Intel is spending ISC 2014 explaining how they intend to feed the beast. A processor that can move that much data is not going to be easy to feed, so Intel is going to be turning to some very cutting edge technologies to do it.

First and foremost, when it comes to memory Intel has found themselves up against a wall. With Knights Corner already using a very wide (512-bit) GDDR5 memory bus, Intel is in need of an even faster memory technology to replace GDDR5 for Knights Landing. To accomplish this, Intel and Micron have teamed up to bring a variant of Hybrid Memory Cube (HMC) technology to Knights Landing.


Hybrid Memory Cube (HMC)

Through the HMC Consortium, both Intel and Micron have been working on developing HMC as a next-generation memory technology. By stacking multiple DRAM dies on top of each other, connecting those dies to a controller at the bottom of the stack using Through Silicon Vias (TSVs), and then placing those stacks on-package with a processor, HMC is intended to greatly increase the amount of memory bandwidth that can be used to feed a processor. This is accomplished by putting said memory as close to the processor as possible to allow what’s essentially an extremely wide memory interface, through which an enormous amount of memory bandwidth can be created.


Image Courtesy InsideHPC.com

For Knights Landing, Intel and Micron will be using a variant of HMC designed just for Intel’s processor. Called Multi-Channel DRAM (MCDRAM), Intel and Micron have taken HMC and replaced the standard memory interface with a custom interface better suited for Knights Landing. The end result is a memory technology that can scale up to 16GB of RAM while offering up to 500GB/sec of memory bandwidth (nearly 50% more than Knights Corner’s GDDR5), with Micron providing the MCDRAM modules. Given all of Intel’s options for the 2015 time frame, the use of a stacked DRAM technology is among the most logical and certainly most expected (we've already seen NVIDIA plan to follow the same route with Pascal); however the use of a proprietary technology instead of HMC for Knights Landing comes as a surprise.

Moving on, while Micron’s MCDRAM solves the immediate problem of feeding Knights Landing, RAM is only half of the challenge Intel faces. The other half of the challenge for Intel is in HPC environments where multiple Knights Landing processors will be working together on a single task, in which case the bottleneck shifts to getting work to these systems. Intel already has a number of fabrics at hand to connect Xeon Phi systems, including their own True Scale Fabric technology, but like the memory situation Intel needs a much better solution than what they are using today.

For Knights Landing Intel will be using a two part solution. First and foremost, Intel will be integrating their fabric controller on to the Knights Landing processor itself, doing away with the external fabric controller, the space it occupies, and the potential bottlenecks that come from using a discrete fabric controller. The second part of Intel’s solution comes from developing a successor to True Scale Fabric – dubbed Omni Scale Fabric – to offer even better performance than Intel’s existing fabric solution. At this point Intel is being very tight lipped about the Omni Scale Fabric specifications and just how much of an improvement in inter-system communications Intel is aiming for, but we do know that it is part of a longer term plan. Eventually Intel intends to integrate Omni Scale Fabric controllers not just in to Knights Landing processors but traditional Xeon CPUs too, further coupling the two processors by allowing them to communicate directly through the fabric.

Last but not least however, thanks in large part to the consolidation offered by using MCDRAM, Intel is also going to be offering Knights Landing in a new form factor. Along with the traditional PCIe card form factor that Knights Corner is available in today, Knights Landing will also be available in a socketed form factor, allowing it to be installed alongside Xeon processors in appropriate motherboards. Again looking to remove any potential bottlenecks, by socketing Knights Landing Intel can directly connect it to other processors via Quick Path Interconnect as opposed to the slower PCI-Express interface. Furthermore by being socketed Knights Landing would inherit the Xeon processor’s current NUMA capabilities, sharing memory and memory spaces with Xeon processors and allowing them to work together on a workload heterogeneously, as opposed to Knights Landing operating as a semi-isolated device at the other end of a PCIe connection. Ultimately Intel is envisioning programs being written once and run across both types of processors, and with Knights Landing being binary compatible with Haswell, socketing Knights Landing is the other part of the equation that is needed to make Intel’s goals come to fruition.

Wrapping things up, with this week’s announcements Intel is also announcing a launch window for Knights Landing. Intel is expecting to ship Knights Landing in H2’15 – roughly 12 to 18 months from now. In the meantime the company has already lined up its first Knights Landing supercomputer deal with the National Energy Research Scientific Computing Center, who will be powering their forthcoming Cori supercomputer with 9300 Knights Landing nodes. Intel currently enjoys being the CPU supplier for the bulk of the Top500 ranked supercomputers, and with co-processors becoming increasingly critical to these supercomputers Intel is shooting to become the co-processor vendor of choice too.

Source: Intel

POST A COMMENT

39 Comments

View All Comments

  • puplan - Friday, June 27, 2014 - link

    "nearly 50% more than Knights Corner’s GDDR5" - the slide shows "5x bandwidth vs. GDDR5". Reply
  • iceman-sven - Friday, June 27, 2014 - link

    5x bandwidth vs. DDR4 not GDDR5 Reply
  • Ryan Smith - Friday, June 27, 2014 - link

    The 5x comment is on a per chip/module basis. Out of total bandwidth, KC was ~350GB a sec versus KL targeting over 500GB/sec Reply
  • celestialgrave - Friday, June 27, 2014 - link

    All the talk about the importance of co-processors, anyone else remember how big a deal it was when integrating math co-processors way back when?
    History sure likes to repeat itself (in a good way this time at least)
    Reply
  • frozentundra123456 - Friday, June 27, 2014 - link

    Yea, I remember back in the late 80's using chromatography control and analytical software that had a math co-processor. I think the computer's main cpu was something like 50 mhz, all the programs and data were stored on floppy disks. Installing the board for the coprocessor and getting the software to work properly was a real bear. Reply
  • Laststop311 - Wednesday, July 02, 2014 - link

    coprocessors, i remember having one with a 486 sx or dx can't remember so long ago im old :(

    could of even been with a 386
    Reply
  • Tikcus9666 - Saturday, July 05, 2014 - link

    if i remember correctly the 486 dx had a co processor within the CPU, the sx variant didn't and had to be added, although I'm old and that was 20 + years ago (and cba to check) Reply
  • Wolfpup - Tuesday, July 01, 2014 - link

    I still want to run Folding @ Home on one of these :-D Reply
  • jamescox - Thursday, July 31, 2014 - link

    I don't see these competing that well against gpus. Binary compatibility isn't usually that much of an issue in HPC. Using something like OpenCL allows you to run across a wide variety of hardware architectures. The run-time compilation should not be much of a factor since it is very fast compared to the rest of the processing. Also, adding more extensions to the ISA seems unnecessary and possibly counter productive. It gets you "closer to the metal", but it also locks the implementation in place. Using an intermediate layer allows greater changes to the underlying implementation in the future. Reply

Log in

Don't have an account? Sign up now