With conference season in full swing, this week’s big technical conference is the 2012 International Supercomputing Conference (ISC), taking place in Hamburg, Germany. ISC is one of the traditional venues for major supercomputing and high performance computing (HPC) announcements, and this year is no exception. Several companies will be showing off their wares, but perhaps the biggest announcement of the week is from Intel. After working on the project in some form or another for over half a decade, they’re finally ready to take a stab at the parallel computing market by bringing their first Many Integrated Core (MIC) product to market. Knights Corner, the codename for the first such product, will be the launch product for a brand new family of Intel co-processors, which the company is introducing as the Xeon Phi family.

As a bit of background, and as many of our regular readers are aware, Intel has been working for a while now on various highly parallel, high performance CPU and GPU designs based on their x86 architecture. These designs were initially intended to fill a gap in the HPC space, where users have workloads that are highly parallel (as opposed to highly serial). By using a large collection of small, simple x86 cores, they would be able to tear through such workloads far more quickly than the large, complex x86 cores that are necessary for a modern CPU.

The first and still most famous of these projects was Larrabee. First unveiled in 2008, it was Intel’s first attempt at building such an HPC processor, in the form of a graphics capable CPU. Larrabee was to be Intel’s answer to practically all of NVIDIA’s desktop GPU lineup: it was intended to confront GeForce on the graphics side and the then-fledgling Tesla on the HPC side, with both served by a single processor, similar to how NVIDIA uses the same GPUs in both Tesla and GeForce products. Larrabee of course never came to fruition, and in 2010 Intel canceled it while continuing their research into parallel processing.

Larrabee’s successor was named shortly thereafter: a new architecture called Many Integrated Core (MIC), which in many ways was a direct continuation from where Larrabee left off. MIC kept the concept of multiple simple x86 cores, but threw away any pretense of graphics in favor of focusing solely on HPC. Even at more than 2 years out from launch Intel already had a plan for MIC, announcing the codename of the first processor, Knights Corner, which would have 50+ cores and be manufactured on Intel’s 22nm process.

This brings us to the present and Intel’s latest announcement. With Intel’s 22nm process in full production, Intel is adhering to their previously announced plans and is getting ready to bring MIC to the market. So with ISC 2012 as the logical backdrop for such a product, Intel is announcing that Knights Corner will be launching into retail as the Xeon Phi family of co-processors.

At this point we don’t have the full technical details of the Xeon Phi family (Intel is still holding their cards close to their chest), but with this announcement we do finally have some additional details on the hardware and how Intel intends to market it. The first generation of Xeon Phi will consist of an as-yet unknown number of products, all in the form of PCIe cards. Intel hasn’t nailed down the specific number of cores, keeping it at a nebulous 50+, but we do know that Intel is sticking to the goal of offering 1TFLOP of real world double-precision (FP64) performance; for comparison, Tesla M2090 and Radeon HD 7970 have a theoretical FP64 throughput of 665GFLOPs and 947GFLOPs respectively. As for memory, Xeon Phi boards will come with at least 8GB of GDDR5, which marks the first time Intel has ever paired up a CPU with what’s otherwise graphics memory. Meanwhile the fact that it’s 8GB means we’re looking at either a 256-bit or 512-bit memory bus.
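
For a rough sense of where the two GPU comparison figures come from, theoretical peak throughput is just ALU count × clock × 2 (a fused multiply-add counts as two floating-point operations) × the FP64-to-FP32 rate. The sketch below reproduces both numbers. The core counts, clocks, and FP64 ratios are our inputs from the vendors' public spec sheets rather than anything in Intel's announcement, and the function name is purely illustrative.

```python
def peak_fp64_gflops(alus: int, clock_mhz: float, fp64_ratio: float) -> float:
    """Theoretical peak FP64 throughput in GFLOPs.

    Assumes every ALU retires one fused multiply-add (2 FLOPs) per clock
    at FP32, and that FP64 runs at `fp64_ratio` of that rate.
    """
    fp32_gflops = alus * clock_mhz * 2 / 1000.0
    return fp32_gflops * fp64_ratio


# Spec-sheet inputs (assumptions, not part of Intel's announcement):
#   Tesla M2090:    512 CUDA cores, 1300MHz shader clock, 1/2-rate FP64
#   Radeon HD 7970: 2048 stream processors, 925MHz, 1/4-rate FP64
m2090 = peak_fp64_gflops(512, 1300, 1 / 2)
hd7970 = peak_fp64_gflops(2048, 925, 1 / 4)

print(f"Tesla M2090:    {m2090:.1f} GFLOPs")
print(f"Radeon HD 7970: {hd7970:.1f} GFLOPs")
```

Keep in mind that these are theoretical peaks, whereas Intel's 1TFLOP goal is stated as real world performance, so the numbers are not directly comparable.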

Intel isn’t using the Xeon Phi announcement to bring a great deal of attention to the underlying architecture, but all indications are that it’s closely related to what we first saw with Larrabee, with Intel confirming that it is indeed using an enhanced Pentium 1 (P54C) core with the addition of vector and FP64 hardware. Intel has also confirmed that Xeon Phi will offer 512-bit SIMD operations, which means we’re almost certainly looking at a 16-wide vector ALU (sixteen 32-bit lanes) in each core, the same kind of vector unit that Larrabee was detailed to have.
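
The lane arithmetic behind that inference is simple enough to sketch, along with a back-of-the-envelope look at what it implies for clockspeeds. The only confirmed input here is the 512-bit SIMD width. The 50-core count (Intel has only said 50+) and the one-fused-multiply-add-per-lane-per-clock rate are assumptions on our part, and the function names are illustrative.

```python
VECTOR_BITS = 512  # SIMD width confirmed by Intel


def lanes(element_bits: int) -> int:
    """Number of elements of the given size that fit in one 512-bit vector."""
    return VECTOR_BITS // element_bits


def required_clock_ghz(target_tflops: float, cores: int,
                       flops_per_lane: int = 2) -> float:
    """Clock needed for `cores` cores to reach a peak FP64 target.

    flops_per_lane=2 assumes one fused multiply-add per FP64 lane per
    clock; that rate is an assumption, not something Intel has confirmed.
    """
    flops_per_cycle = cores * lanes(64) * flops_per_lane
    return target_tflops * 1e12 / flops_per_cycle / 1e9


fp32_lanes = lanes(32)  # 16 lanes: the "16-wide" vector ALU
fp64_lanes = lanes(64)  # 8 lanes when working in double precision
clock = required_clock_ghz(1.0, 50)  # hypothetical 50-core part

print(f"{fp32_lanes} FP32 lanes, {fp64_lanes} FP64 lanes per core")
print(f"Clock for a 1 TFLOP FP64 peak with 50 cores: {clock:.2f}GHz")
```

Since Intel is quoting real world rather than peak performance, a shipping part would need some combination of more cores or higher clocks than this lower bound suggests.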

Figure: High-level overview of Larrabee's vector ALU

We also don’t have any deep details about its fabrication (all indications are that Knights Corner is going to be large for an Intel processor), but Intel has reiterated that it’s being built on their 22nm process. Traditionally Intel has reserved their leading edge process for their higher margin mainstream products such as Core and Xeon processors, with Atom, Itanium, and other low-margin/niche products being a node (or more) behind. Xeon Phi will be the first niche product to be built on Intel’s 22nm process, with Atom following it up in the future.

Meanwhile on the software side of things, in an interesting move Intel is going to be equipping Xeon Phi co-processors with their own OS, in effect making them stand-alone computers (despite the co-processor designation) and significantly deviating from what we’ve seen on similar products (e.g. Tesla). Each Xeon Phi will independently run an embedded form of Linux, which Intel has said will be of particular benefit for cluster users. Drivers of course will still be necessary for a host device to interface with the co-processor, with the implication being that these drivers will be fairly thin and simple since the co-processor itself is already running a full OS.

All of this of course is designed to further build upon x86. The fundamental purpose of the Xeon Phi family is to bring highly threaded processing to x86, allowing x86 developers to quickly integrate the co-processor into their existing workloads and code as opposed to having to target another ISA and any idiosyncrasies it may bring. With that said, it’s interesting to note that while Xeon Phi co-processors can be used either as a proper co-processor alongside a traditional Xeon processor or as a standalone device, Intel’s marketing group is focusing on the latter to differentiate themselves from NVIDIA’s Tesla products. So while it’s possible to use both Xeon and Xeon Phi processors together on a single project, it’s not clear just how common that’s going to be. Intel looks to be largely exploiting x86 for the familiarity of the ISA, as opposed to the ability for code to run on either kind of Xeon.

Last but not least, Intel hasn’t put any hard date on availability, but they have said they expect Xeon Phi co-processors to go into full production later this year, and in the meantime Intel has already produced enough co-processors to build a MIC based supercomputer that’s ranked #150 on the new TOP500 list. Given the typical gap between volume production and when a product is available for purchase, it’s likely that Xeon Phi co-processors won’t be available until the end of the year, if not next year. Regardless, the timing is such that Intel will be going up against NVIDIA’s GK110-based Tesla K20, which is similarly expected by the end of the year. Meanwhile given AMD’s HPC ambitions with GCN we’re also not ready to rule them out, so all 3 parties may have major compute products out by the start of 2013.

Wrapping things up, as always we’ll be keeping on top of the Xeon Phi family and should have more details later this year once Intel nails down final specifications and pricing. So until then stay tuned.

  • maximumGPU - Tuesday, June 19, 2012

    How much easier will it be? With all the advancement in GPU programming, and with Microsoft integrating C++ AMP (Accelerated Massive Parallelism) into VS2012, Intel would have trouble selling these if that's their strongest argument.
  • Jaybus - Tuesday, June 19, 2012

    It is algorithm design that is easier, not tool usage or language features.

    It depends on the problem. GPGPU is only good at data-parallel algorithms. If you don't have a lot of data that can be broken into many chunks that can each be processed independently, then it won't work well. Developing for GPUs is an ongoing attempt to eliminate branching, because branching can very easily stall the pipeline. In other words, it is often better to pre-calculate all possibilities in parallel, then choose the correct one in the end. It can quickly get complicated trying to remove if / then logic.

    MIC, though, uses general purpose CPU cores that don't have the same issues with branching, yet has a 16-wide vector unit. While not nearly as wide as a GPU, it is still sort of the best of both worlds. The flexibility makes it easier to program. And, for some problems that are not so data-parallel, it makes it much easier.
  • dragonsqrrl - Tuesday, June 19, 2012

    How is Intel kicking Nvidia while they're down? You're speaking as though Xeon Phi is already available, while the latest road maps indicate that Nvidia's Tesla K20 will be launching first. And I'm not sure if you've realized this, but the theoretical FP64 performance of a fully enabled GK110 should be quite a bit higher than 1 TFLOP, assuming reasonable clocks. GK110 can run DP at 1/3 of its FP32 rate, and GK104 is already capable of pushing 3 TFLOPs FP32 with 1536 cores. So even assuming the GK110 in K20 will be clocked significantly lower (which is pretty much a certainty), Nvidia should have absolutely no problem exceeding 1TFLOP theoretical FP64 performance. Real world performance is another story though. For that we'll just have to wait for benchmarks.

    As for the HD7970, I'm not even sure how it's relevant. Pros in the market for a Tesla or Xeon Phi won't even consider an HD7970 as an option. It has neither the industry nor the driver support to be a viable option in this area. However like Ryan said, given AMD's shift in focus with Southern Islands we may very well see a viable option based on GCN before the year is out.
  • HighTech4US - Wednesday, October 31, 2012

    Intel seems to be kicking their own backside if all they can obtain is 1 TF DP from their 22nm process.

    Nvidia's K20 (GK110) is getting 1.3 TF DP on TSMC's 28nm process.

    We're basing our numbers off of the figures published by HPCWire.

    For a given clockspeed of 732MHz and DP performance of 1.3TFLOPs, it has to be 14 SMXes. The math doesn't work for anything else.
  • Casper42 - Tuesday, June 19, 2012

    It really has nothing to do with the common Xeon platform known today.
    Granted the Xeon started way back around the Pentium II era and the MIC uses modified Pentium cores, but I find it a little sad that with their marketing budget they couldn't come up with a better name.

    Atom, Core, Xeon
    Seems like all they needed was a good 4 letter name that vaguely resembles something from a Science Textbook.
  • A5 - Tuesday, June 19, 2012

    Xeon = Workstation. Makes sense to me?
  • Casper42 - Tuesday, June 19, 2012

    Except that Xeon primarily = Server, not workstation.
  • Daeros - Tuesday, June 19, 2012

    Is anyone else wondering why Intel quit with the GPU project? Was it fear of more anti-trust litigation? From the comparisons I have seen, Intel is able to more than compete in the iGPU arena in terms of performance/die area, and I find it hard to believe Intel would experience fabrication woes on the order of what GloFo has gone through. Just wondering...
  • fluxtatic - Tuesday, June 19, 2012

    Legend has it a wild GPU driver-writing accident killed Intel's father. To this day, Intel can't bring themselves to write a proper graphics driver. The horror is just too much.
  • Spunjji - Tuesday, June 19, 2012

    +1 :D
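
Jaybus's point in the comments about GPU code pre-calculating all possibilities and then choosing the correct one can be illustrated with a small sketch. The first function branches per element, as a general purpose core would. The second evaluates both outcomes for every element and picks one with a 0/1 mask, which is roughly what wide SIMD hardware does when lanes disagree on a branch. The operation itself (square root for non-negative inputs, negation otherwise) and the function names are made up purely for illustration, and plain Python lists stand in for vector registers.

```python
import math


def branchy(xs):
    """CPU-style: each element takes its own path through the if/else."""
    out = []
    for x in xs:
        if x >= 0:
            out.append(math.sqrt(x))
        else:
            out.append(-x)
    return out


def branch_free(xs):
    """SIMD-style: compute both outcomes everywhere, then blend by mask."""
    mask = [int(x >= 0) for x in xs]             # 1 where the "then" side applies
    then_vals = [math.sqrt(abs(x)) for x in xs]  # abs() keeps unused lanes valid
    else_vals = [-x for x in xs]
    # Arithmetic blend: exactly one of the two terms survives per lane.
    return [m * t + (1 - m) * e
            for m, t, e in zip(mask, then_vals, else_vals)]


data = [4.0, -9.0, 0.25, -1.5]
print(branchy(data))
print(branch_free(data))
```

Both functions return the same values, but the branch-free version does twice the arithmetic and needs the extra `abs()` guard so that the discarded lanes never hit an invalid input. That overhead, multiplied across nested conditions, is the complexity the comment is describing.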
