Today we have an announcement out of left field. Intel has formally revealed it has been working on a new series of processors that combine its high-performance x86 cores with AMD Radeon Graphics into the same processor package using Intel’s own EMIB multi-die technology. If that wasn’t enough, Intel also announced that it is bundling the design with the latest high-bandwidth memory, HBM2.

Intel announced its EMIB technology over the last twelve months, with the core theme being the ability to put multiple and different silicon dies onto the same package at a much higher bandwidth than a standard multi-chip package but at a much lower cost than using a silicon interposer. At Intel’s Manufacturing Day earlier this year, they even produced a slide (above) showcasing what might be possible: a processor package with the x86 cores made on one technology, the graphics made in another, perhaps different IO and memory or wireless technologies too. With EMIB, processor design can become a large game of Lego.

EMIB came to market with the latest Intel Altera FPGAs. By embedding the EMIB required silicon design into the main FPGA and each of the chiplets, the goal was to add multiple memory blocks as well as data transfer blocks in a mix and match scenario, allowing large customers to have the design tailored to what they require. The benefits of EMIB were clear, without the drawbacks of standard MCP design or the cost of interposers: it would also allow a design to go beyond the monolithic reticle limit of standard lithography processes. It was always expected that EMIB would have to find its way into the general processor market, as we start to see high-end server offerings approaching 900 mm² over multiple silicon dies in a single package.

Since the EMIB announcements, Intel’s Manufacturing Day, and Hot Chips, word has been circulating about how Intel is going to approach this from a consumer standpoint. As part of the requirements of Intel’s own integrated graphics solutions, a 2011 cross-licensing deal with NVIDIA was in place – this deal was set to expire on April 1st 2017, and no mention of extending that deal was ever made public. A couple of rumors floated around that Intel were set to make a deal with AMD instead, as despite their x86 rivalry they were a preferred partner in these matters. Numerous outlets with connections in both AMD and Intel had difficulty prizing any information out. Historically Intel refuses to comment on such matters in advance. Other potential leaks include published benchmarks over at SiSoft, although nothing had been made concrete until today.

Intel’s official statements on the announcement offer a few details worth diving into.

The new product, which will be part of our 8th Gen Intel Core family, brings together our high-performing Intel Core H-series processor, second generation High Bandwidth Memory (HBM2) and a custom-to-Intel third-party discrete graphics chip from AMD’s Radeon Technologies Group* – all in a single processor package.

Intel interestingly uses a singular word for ‘product’, although this does not indicate if it is a family or literally a single SKU in the works. Intel’s Core-H series processors are currently Kaby Lake based running at 45W, with Intel’s integrated GT2 graphics. It would be interesting to see if the graphics of the Core-H are then stripped out as a new silicon design, or if they are re-spinning the full Core-H silicon as a result and just disabling the integrated cores, or are able to run both graphics segments independently (it is likely a new spin of silicon, if I were a betting man). The use of HBM2 is not surprising – Intel has successfully integrated HBM2 into its Altera EMIB-based products so we would suspect that this is not going to be overly difficult.

The next bit is the interesting one: ‘custom-to-Intel … discrete graphics chip’ from AMD RTG. This means that none of AMD’s current product stack has silicon dedicated to EMIB, but AMD is going to leverage its semi-custom design to provide graphics chiplets for Intel to add to its silicon.

‘In close collaboration, we designed a new semi-custom graphics chip, which means this is also a great example of how we can compete and work together, ultimately delivering innovation that is good for consumers… Similarly, the power sharing framework is a new connection tailor-made by Intel among the processor, discrete graphics chip and dedicated graphics memory. We’ve added unique software drivers and interfaces to this semi-custom discrete GPU that coordinate information among all three elements of the platform.’

One of the questions about running multiple chips in a single package is how to manage all the bandwidth and power. AMD has recently solved that issue in its server processors and inside its APUs by using their Infinity Fabric, which if I were to guess would not be under the purview of this collaboration. Intel states that as part of the collaboration the chip shares a power framework, which will be an interesting deep dive when we get information as to whether Intel is offering separate power rails for the CPU and GPU segments, using an integrated voltage regulator (like Broadwell did), or doing something similar to AMD by using a unified power rail sharing mechanism with digital LDOs as was announced with Ryzen Mobile only a couple of weeks ago.

‘Look for more to come in the first quarter of 2018, including systems from major OEMs based on this exciting new technology.’

It looks like Intel is ready to make some announcements over the next few months on this project, and CES is just around the corner in January.

Taking a step back, though, we have to consider what this means and what market Intel is aiming for. AMD recently launched (with products coming soon) their Ryzen Mobile platform, designed with quad-core Zen and up to 10 CUs of Vega graphics. The announcements from Intel and AMD do not state what graphics core they are using (they could be one generation behind for competitive reasons?); however, Intel does state that they are using Core-H series processors, which are typically in the 45W range. AMD currently hasn’t announced anything in that segment, having decided to focus Ryzen Mobile on the thin and ultralight notebook categories first. If AMD does bring Ryzen Mobile up to more powerful devices, then this new product will be in direct competition.

Looking at the image provided by Intel on the new product arrangement actually adds a new question or two to the list. Here we have an Intel chip on the right, the AMD custom graphics in the middle, and the HBM2 chip next to it. The Intel chip is a long way away from the AMD chip, which would suggest that these two are not connected via EMIB if the mockup is accurate. The close proximity of the big chip in the middle to what looks like a HBM2 stack does suggest that it is connected via EMIB, as given by how close the chips in the Altera products are:

EMIB is being used, but it does not look like it is being used for all the chips together. It’s worth noting that neither Intel nor AMD offered pre-briefings on this announcement, so there are a lot of unanswered questions hanging around as a result.

A final thought. Apple uses a lot of Intel's 45W processors for iMacs; offering AMD graphics (Apple's preferred pro-graphics partner) in the segment previously served by Intel's Crystalwell/eDRAM based products might be the next step in that product cycle evolution.

Source: Intel

Source: AMD

More Commentary

After an hour or two to digest, we have some new thoughts.

Firstly, judging by the wording and Intel's launch video, it can basically be confirmed that EMIB is only being used between the GPU and the HBM2. The distance between the CPU and GPU is too far for EMIB, so is likely just PCIe through the package which is a mature implementation. This configuration might also help with power dissipation if the chips are further apart.

The agreement between AMD and Intel is that Intel is buying chips from AMD, and AMD is providing a driver support package like they do with consoles. There is no cross-licensing of IP going on: Intel likely provided AMD with the IP to make the EMIB chipset connections for the graphics but that IP is only valid in the designs that AMD is selling to Intel (it's a semi-custom foundry business, these agreements are part of the job).

With Intel buying chips from AMD, it stands to reason they could be buying more than one configuration, depending on how Intel wanted to arrange the product stack. Intel could pair a smaller 10 CU design with a dual core, and a bigger 20+ CU design with a quad-core mobile processor. A couple of benchmark sources seem to believe that there are at least two Polaris-like configurations, with up to 24 CUs in the high-end model. We will obviously wait before confirming this, as Polaris was not originally built for HBM2 memory. Normally HBM2 requires a GPU that is designed to be fed by HBM - data management is a key operation. However, if it works 'naturally', then it should be a case of attaching the HBM2 controller IP to the GPU and away you go.

In an ideal world, it would make sense for AMD to sell Intel their Polaris designs, and for their own products to stay at least one generation ahead. With AMD's financial success of late, they could be in a position to do this, or Intel might be offering top dollar for the latest design. Neither company has commented on the arrangement yet beyond their press releases.

In discussions with Peter Bright from Ars Technica, we have concluded that it is likely for the Intel CPU to still keep its own integrated graphics, and the system could act in a switching graphics arrangement. This would be easy if the CPU and GPU are connected via PCIe, as all the mechanisms are in place. With the Intel integrated GPU already there, video playback would be accelerated and kept on die then sent to the display controller - it would allow the AMD GPU and the HBM2 to power down, saving energy. If the GPU and HBM2 were kept powered up, then we would see reductions in battery life for future devices.

It has been discussed if this is a play just for Apple, given that Apple was behind Intel implementing eDRAM on its Crystalwell processors, and the latest generation of Crystalwell parts seem to be in Apple iMacs almost exclusively. That being said, Intel has stated that they have multiple partners interested in the design, and we should expect more information with devices in Q1. With Intel saying 'devices', it stands to reason that there are various OEMs waiting to work with the hardware.

As for the types of devices that we will be seeing, this one is a little confusing. Intel quoted Core-H series CPUs, which are 35W/45W parts. This also gels with comments saying that these new parts and Ryzen Mobile would not be in direct competition. However, in the demo video provided, it is clear that the potential for this design to go into thin and light notebooks like 2-in-1s and ultra-portables is on Intel's mind. Does that mean Intel is targeting 15W? Well, if Intel is buying multiple configurations of chips from AMD, then strapping a dual-core i5 to a 10 CU graphics part is more than plausible. If AMD is selling Intel the older Polaris design, then AMD has that advantage at least.

252 Comments

  • smilingcrow - Monday, November 6, 2017 - link

    Every chip is more efficient if you reduce the clocks and especially the voltage.
    The issue is that Vega is aimed at the power hungry high end gaming GPU market where performance is king and efficiency much less so.
    In other words it's a failure compared with the competition on both counts no matter how people try and spin the data.
  • sirmo - Monday, November 6, 2017 - link

    14nm GloFo process doesn't scale well with clocks.. It is much more efficient at moderate clocks than Vega 64 would make it seem.

    Also Vega architecture (and AMD's GCN architecture as a whole), scales much better with less CUs. You can see this by just comparing Vega 56 and 64.. the extra CUs don't really improve the performance that much. GCN and Vega suffer from lower stream processor utilization the higher in SPs you go. This means that sub 32CU configurations should perform much more efficiently especially if you ensure they are also clocked at the 14nm sweet spot clocks. I do not think Nvidia has an efficiency edge in this case.
  • tuxRoller - Monday, November 6, 2017 - link

    So you think the issue is their command processor? How did you come to this conclusion?
  • neblogai - Tuesday, November 7, 2017 - link

    The issue is likely the drivers, with the driver team being too busy to ready drivers for new Vega features for gaming. Gaming Vega should use primitive shading for faster culling, so that geometry engines properly feed the CUs. But such important Vega features (primitive shaders and draw stream binning rasterizer) are not working in games currently. Vega64 design comes with 16 CUs per compute engine, so it suffers the worst - Raven Ridge comes with up to 11CUs, these new semi-custom for Intel look to be 12CU per compute engine (24CU /2). You can check this guy's channel for some interesting speculation: https://www.youtube.com/watch?v=m5EFbIhslKU
  • vladx - Tuesday, November 7, 2017 - link

    You'd be wrong there, Nvidia also fabs certain Pascal models on GloFo's 14nm process and those are indeed more power efficient than AMD's.
  • JasonMZW20 - Tuesday, November 7, 2017 - link

    Stream processor usage is fine when they're used correctly, which is up to developers and various low-level API tech (Vulkan, DX12), as well as the driver team at AMD. Nvidia prefers an ROP-heavy design for consumer graphics, while AMD likes to balance their designs out (64CU/64ROP). ROPs are always bandwidth limited, so unless you've drastically reduced bandwidth requirements across the architecture, it's generally not wise to unbalance the design and favor ROPs. Nvidia does it because they've worked diligently on DCC and various other compression techniques, as well as triangle culling without overdraw along with tile-based deferred rendering.

    Vega has something called NGG, which is their Next-Generation Geometry engine (a more programmable geometry engine that isn't limited to fixed function ops), and is somewhat confusingly called Primitive Shaders. Fixed-function geometry is also included in hardware, and functions at 4 primitives/clock via 4 geometry units (hence why Vega performs like an upclocked Fiji in many cases), which is what most games not designed for NGG will use. NGG is capable of more than 17 primitives/clock and can be combined with DSBR. I don't think NGG is currently enabled, but even if it was, it'd need developer support to be used as Primitive Shaders are basically programmable geometry shaders that don't go through the fixed-function geometry engines (NGG has a new pathway that probably leverages the compute power of Vega).

    I've noticed in Wolfenstein 2 that power consumption is quite a bit lower at 1600MHz/1.05v vs DX12 benchmarks running at 1575MHz/1.031-1.043v. We're talking about 50W lower (185-193w in Wolf 2 at max settings vs 243w in Superposition/1080p Extreme at 1575MHz/1.025-1.031v). So, clearly, Wolf 2 on Vulkan is using some of Vega's new features like Rapid Packed Math and triangle culling (DSBR), which help its efficiency and throughput. GPU usage is always 99-100% in both cases, so I don't think there's an issue with stream processor utilization. Games like GTA V, though, clearly have an issue with Vega and it'll be up to AMD to solve that.
  • artk2219 - Monday, November 6, 2017 - link

    I wonder the manner in which AMD will be screwed over by Intel this time.
  • IGTrading - Monday, November 6, 2017 - link

    Best question we've heard today.

    Insightful!

    #devilsinthedetails and Intel is his twin. :)
  • haknukman - Tuesday, November 7, 2017 - link

    the only way intel screw Amd is not paying for the chip!
  • TEAMSWITCHER - Monday, November 6, 2017 - link

    "Teaser trailers" for movies have more information.