With conference season in full swing, this week’s big technical conference is the 2012 International Supercomputing Conference (ISC), taking place in Hamburg, Germany. ISC is one of the traditional venues for major supercomputing and high performance computing (HPC) announcements, and this year is no exception. Several companies will be showing off their wares, but perhaps the biggest announcement of the week comes from Intel. After working on the project in one form or another for over half a decade, the company is finally ready to take a stab at the parallel computing market with its first Many Integrated Core (MIC) product. Knights Corner, the codename for that product, will launch a brand new family of Intel co-processors, which the company is introducing as the Xeon Phi family.

As a bit of background, and as many of our regular readers are aware, Intel has been working for a while now on various highly parallel, high performance CPU and GPU designs based on its x86 architecture. These designs were initially intended to fill a gap in the HPC space, where users have workloads that are highly parallel (as opposed to highly serial). The idea is to tear through such workloads with a large collection of small, simple x86 cores, which are far better suited to the task than the large, complex x86 cores a modern CPU requires.

The first and still most famous of these projects was Larrabee. Unveiled in 2008, Larrabee was Intel’s first attempt at building such an HPC processor, in the form of a graphics-capable CPU. It was to be Intel’s answer to practically all of NVIDIA’s desktop GPU lineup: Larrabee was intended to confront GeForce on the graphics side and the then-fledgling Tesla on the HPC side, with both served by a single processor, similar to how NVIDIA uses the same GPUs in both Tesla and GeForce products. Larrabee of course never came to fruition, and in 2010 Intel canceled it while continuing its research into parallel processing.

Larrabee’s successor was named shortly thereafter: a new architecture called Many Integrated Core (MIC), which in many ways picked up directly where Larrabee left off. MIC kept the concept of many simple x86 cores, but threw away any pretense of graphics in favor of focusing solely on HPC. Even more than two years out from launch, Intel already had a plan for MIC, announcing the codename of the first processor, Knights Corner, which would have 50+ cores and be manufactured on Intel’s 22nm process.

This brings us to the present and Intel’s latest announcement. With its 22nm process in full production, Intel is adhering to its previously announced plans and is getting ready to bring MIC to market. With ISC 2012 as the logical backdrop for such a product, Intel is announcing that Knights Corner will be launching into retail as the Xeon Phi family of co-processors.

We don’t yet have the full technical details of the Xeon Phi family, as Intel is still holding its cards close to its chest, but with this announcement we do finally have some additional details on the hardware and how Intel intends to market it. The first generation of Xeon Phi will consist of an as yet unannounced number of products, all in the form of PCIe cards. Intel hasn’t nailed down the specific number of cores, keeping it at a nebulous 50+, but we do know that Intel is sticking to its goal of offering 1 TFLOPS of real world double-precision (FP64) performance; for comparison, Tesla M2090 and Radeon HD 7970 have theoretical FP64 throughputs of 665 GFLOPS and 947 GFLOPS respectively. As for memory, Xeon Phi boards will come with at least 8GB of GDDR5, which marks the first time Intel has paired a CPU with what is otherwise graphics memory. Meanwhile, the fact that it’s 8GB means we’re looking at either a 256-bit or 512-bit memory bus.
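To put those throughput figures side by side, here is a minimal Python sketch that computes how far Intel’s 1 TFLOPS goal sits above the two quoted competitor numbers. The dictionary and function names are ours, purely for illustration. Keep in mind that Intel’s figure is pitched as real world performance while the other two are theoretical peaks, so this is not a like-for-like comparison.

```python
# Theoretical FP64 figures quoted in the article, in GFLOPS.
QUOTED_FP64_GFLOPS = {
    "Tesla M2090": 665.0,
    "Radeon HD 7970": 947.0,
}

# Intel's stated goal for Xeon Phi: 1 TFLOPS of FP64, expressed in GFLOPS.
XEON_PHI_TARGET_GFLOPS = 1000.0


def target_ratio(competitor_gflops, target_gflops=XEON_PHI_TARGET_GFLOPS):
    """Return how many times larger the target figure is than a competitor's."""
    return target_gflops / competitor_gflops


if __name__ == "__main__":
    for name, gflops in QUOTED_FP64_GFLOPS.items():
        print(f"Xeon Phi target vs {name}: {target_ratio(gflops):.2f}x")
```

On paper that works out to roughly 1.5x the M2090 figure and about 1.06x the 7970 figure.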

Intel isn’t using the Xeon Phi announcement to draw much attention to the underlying architecture, but all indications are that it’s closely related to what we first saw with Larrabee. Intel has confirmed that it is indeed using an enhanced Pentium 1 (P54C) core with the addition of vector and FP64 hardware. Intel has also confirmed that Xeon Phi will offer 512-bit SIMD operations, which means we’re almost certainly looking at a 16-wide vector ALU in each core (16 lanes of 32 bits each), the same kind of vector unit that Larrabee was detailed to have.
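The arithmetic behind the 16-wide figure is simple enough to sketch in Python. The snippet below derives the lane count from the vector width, then estimates the clock speed a chip would need to reach 1 TFLOPS of peak FP64. Several inputs are our assumptions rather than anything Intel has confirmed: exactly 50 cores (Intel only says 50+), and a fused multiply-add counted as 2 floating point operations per lane per cycle. The result is a theoretical peak estimate, not the real world number Intel is targeting.

```python
def simd_lanes(vector_bits, element_bits):
    """Number of elements of a given width that fit in one SIMD vector."""
    return vector_bits // element_bits


def required_clock_ghz(target_gflops, cores, lanes, flops_per_lane_per_cycle):
    """Clock (GHz) needed so cores * lanes * flops/cycle reaches the target.

    GFLOPS = cores * lanes * flops_per_lane_per_cycle * clock_in_GHz
    """
    return target_gflops / (cores * lanes * flops_per_lane_per_cycle)


if __name__ == "__main__":
    fp32_lanes = simd_lanes(512, 32)  # single-precision lanes per vector
    fp64_lanes = simd_lanes(512, 64)  # double-precision lanes per vector
    print(f"FP32 lanes: {fp32_lanes}, FP64 lanes: {fp64_lanes}")

    # ASSUMPTIONS (not confirmed by Intel): 50 cores, FMA = 2 FLOPs per lane.
    clock = required_clock_ghz(
        target_gflops=1000.0,
        cores=50,
        lanes=fp64_lanes,
        flops_per_lane_per_cycle=2,
    )
    print(f"Clock needed for 1 TFLOPS peak FP64: {clock:.2f} GHz")
```

Under those assumptions a 512-bit vector holds 16 FP32 or 8 FP64 values, and the required clock comes out to 1.25GHz. More cores would lower that figure proportionally.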


High Level Overview Of Larrabee's Vector ALU

We also don’t have any deep details about its fabrication, though all indications are that Knights Corner is going to be large for an Intel processor, and Intel has reiterated that it’s being built on the company’s 22nm process. Traditionally Intel has reserved its leading edge process for higher-margin mainstream products such as Core and Xeon processors, with Atom, Itanium, and other low-margin/niche products being a node (or more) behind. Xeon Phi will be the first niche product to be built on Intel’s 22nm process, with Atom following in the future.

Meanwhile on the software side, in an interesting move Intel is going to equip Xeon Phi co-processors with their own OS, in effect making them stand-alone computers (despite the co-processor designation) and deviating significantly from what we’ve seen on similar products such as Tesla. Each Xeon Phi will independently run an embedded form of Linux, which Intel has said will be of particular benefit for cluster users. Drivers will of course still be necessary for a host device to interface with the co-processor, with the implication being that these drivers will be fairly thin and simple, since the co-processor itself is already running a full OS.

All of this is of course designed to further build upon x86. The fundamental purpose of the Xeon Phi family is to bring highly threaded processing to x86, allowing x86 developers to quickly integrate the co-processor into their existing workloads and code, as opposed to having to target another ISA and any idiosyncrasies it may bring. With that said, it’s interesting to note that while a Xeon Phi can be used either as a proper co-processor alongside a traditional Xeon processor or as a standalone device, Intel’s marketing group is focusing on the latter to differentiate Xeon Phi from NVIDIA’s Tesla products. So while it’s possible to use both Xeon and Xeon Phi processors together on a single project, it’s not clear just how common that’s going to be. Intel looks to be exploiting x86 largely for the familiarity of the ISA, as opposed to the ability for code to run on either kind of Xeon.

Last but not least, Intel hasn’t put a hard date on availability, but the company has said it expects Xeon Phi co-processors to go into full production later this year. In the meantime Intel has already produced enough co-processors to build a MIC-based supercomputer that’s ranked #150 on the new TOP500 list. Given the typical gap between volume production and when a product is available for purchase, it’s likely that Xeon Phi co-processors won’t be available until the end of the year, if not next year. Regardless, the timing is such that Intel will be going up against NVIDIA’s GK110-based Tesla K20, which is similarly expected by the end of the year. Meanwhile, given AMD’s HPC ambitions with GCN, we’re not ready to rule AMD out either, so all three parties may have major compute products out by the start of 2013.

Wrapping things up, as always we’ll be keeping on top of the Xeon Phi family, and we should have more details later this year once Intel nails down final specifications and pricing. Until then, stay tuned.

54 Comments

  • wumpus - Tuesday, June 19, 2012

    If Intel is beating their chest over "theoretical MFLOPS", I would simply assume that is because they can only claim "machoflops" (haven't heard that term in forever) instead of actually pushing the doubles through.

    One other issue is that if they are gunning for the top500 list, LuxMark and SiSoft don't matter; they need to use LINPACK. There may be internal issues between using numbers that aren't based on LINPACK and aren't as high as the competition (using something else).

    Also, pointing out NVIDIA’s GK110-based Tesla K20 is pretty much a joke considering it runs 20% the fp power of the AMD and Intel systems mentioned (for single point DSP work it should be unstoppable, but don't expect it to be useful for much HPC work).

    Finally, I wonder what it must be like to work at AMD or Nvidia and watch Intel casually launch a swing-for-the-fences product that challenges your bread and butter. They might have a 15 year history of complete fail (on these high end coprocessors), but it looks like the engineers/groups on the project change and you have to worry each time they try.
  • Braincruser - Tuesday, June 19, 2012

    Considering the rumors I have heard so far, Tesla K20 will be focused on double point calculation and carrying around 50-80% of the single floating point performance in double. Around 1.5 TFlops. But these are only rumors and as such I would avoid them till we see it in action.
  • mczak - Tuesday, June 19, 2012

    The chip details about gk110 aren't rumors any more - 15 SMX (with 192SP ALUs and 64DP ALUs) are confirmed. So 1/3 DP rate. The exact flops rate though isn't known since neither clock speed nor the actual active unit count (there's a good chance at least one SMX is always disabled) is known. But it should end up in the neighbourhood of 4 GFlops single / 1.3 GFlops double.
  • dragonsqrrl - Tuesday, June 19, 2012

    ... you mean TFLOPs right?
  • mczak - Wednesday, June 20, 2012

    oops yes. 20 years ago it would have been GFlops :-)
  • Khato - Tuesday, June 19, 2012

    Technically no, because Intel's only numbers are per-node which leaves a question of whether they're allowing the CPUs to contribute or not. If they are including the CPUs, then a single Xeon Phi gets around 700 GFlops in LINPACK (Rmax).

    And yes, I consider a presentation by Intel using the industry standard benchmark to be a 'real benchmark'. Far more real than what the GPU companies typically throw around in their PR materials.
  • Ryan Smith - Tuesday, June 19, 2012

    Intel has already hit 1TFLOPs on LINPACK, though it's not clear whether this is being shown in a live demo or not.

    http://www.hpcwire.com/hpcwire/2012-06-18/intel_wi...
  • mczak - Tuesday, June 19, 2012

    Disregarding the issues of "real" vs. "theoretical" flops (which we don't really know enough about, if for instance intel has a 512bit memory interface that could indeed also give an advantage), this is only for DP flops. But I think SP flops shouldn't be completely neglected, and the 7970 (as well as the nvidia K10 though of course this one stinks with DP) very easily beats Knights Corner there.
    There's a lot more than raw DP flops though which counts so it may still be quite ok. It doesn't have any of the graphics "baggage" and the "many-core" approach is certainly a bit different.
  • Haserath - Tuesday, June 19, 2012

    If Intel's main selling point is "easier to code for," they probably don't have much of an advantage otherwise.
  • Assimilator87 - Tuesday, June 19, 2012

    But can it Fold?
