The Next Generation Gen11 Graphics: Playable Games and Adaptive Sync!

Some of the first words out of Raja Koduri's mouth on the subject of graphics were that Intel has a duty to its one billion customers with integrated graphics to give them something useful, and that it is time for Intel to provide graphics that people can actually play games on. Given his expertise on the matter, this shouldn't sound too far-fetched: more people play games than ever before, and these users want to play no matter what hardware they own. To that end, Raja stated that Gen11 graphics is the first step in a new graphics policy to provide the performance and features that let gamers play the most popular games, no matter the implementation.

Gen11: Intel's First TFLOPS-Class GT2 Graphics

In 2015, Intel launched the Skylake processor with Gen9 integrated graphics. Rather than moving straight to Gen10 the next time around, we were given Gen9.5 in both Kaby Lake and Coffee Lake, which supposedly drew features from what would have been Gen10. In truth, the graphics for Intel's failed 10nm Cannon Lake chip were meant to be called Gen10; however, Intel never released a Cannon Lake processor with working integrated graphics, and because Gen11 goes above and beyond what Gen10 would have been, we've gone straight to Gen11. Make sense? Well, Intel didn't even bother to acknowledge Gen10 in its history graph:

According to the roadmaps, we will see Gen11 graphics paired with Sunny Cove cores on 10nm sometime in 2019. However, rather than a detailed architectural layout of the new product, we were instead given a rather high-level diagram.

From here we can deduce a few things. We were told that this configuration is the GT2 config, which will have 64 execution units, up from 24 in Gen9.5. These 64 EUs are split into four slices, with each slice made of two sub-slices of 8 EUs apiece; the arithmetic is tallied in the sketch below. Each sub-slice will have an instruction cache and a 3D sampler, while the bigger slice gets two media samplers, a PixelFE, and additional load/store hardware. Intel lists Gen11 as targeting efficiency, performance, advanced 3D and media capabilities, and a better gaming experience.
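
For readers who want the layout in one place, here is a minimal sketch in Python (ours, not Intel's) that tallies the slice hierarchy described above:

```python
# Gen11 GT2 slice hierarchy as presented: 4 slices x 2 sub-slices x 8 EUs.
SLICES = 4
SUBSLICES_PER_SLICE = 2
EUS_PER_SUBSLICE = 8

total_eus = SLICES * SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE
print(f"Gen11 GT2 execution units: {total_eus}")                 # 64
print(f"Scaling vs Gen9.5 GT2 (24 EUs): {total_eus / 24:.2f}x")  # ~2.67x
```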

Intel didn't go into too much detail regarding how the EUs reach higher performance; however, the company did say that the FPU interfaces inside the EU have been redesigned, and that it retains support for fast (2x) FP16 math as seen in Gen9.5. Each EU will support seven threads as before, and with each EU carrying two 4-wide FPUs, the entire GT2 design essentially has 512 concurrent pipelines (64 EUs at 8 ALU lanes each). In order to help feed these pipes, Intel states that it has redesigned the memory interface, as well as increasing the GPU's L3 cache to 3 MB, a 4x increase over Gen9.5; it is now a separate block in the unslice section of the GPU.
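
Those per-EU figures also explain where the "TFLOPS" in the section title comes from. A back-of-the-envelope estimate, assuming a nominal 1.0 GHz GPU clock (our placeholder, not an announced figure), looks like this:

```python
# Peak throughput estimate for Gen11 GT2. The EU layout (two SIMD-4 FPUs,
# FMA counted as two ops) follows the usual Gen EU design; the clock speed
# below is an assumption for illustration, not an Intel-announced figure.
EUS = 64
FPUS_PER_EU = 2
SIMD_WIDTH = 4
OPS_PER_FMA = 2                 # multiply + add
CLOCK_HZ = 1.0e9                # assumed nominal clock

lanes = EUS * FPUS_PER_EU * SIMD_WIDTH     # 512 concurrent pipelines
fp32_tflops = lanes * OPS_PER_FMA * CLOCK_HZ / 1e12
print(f"ALU lanes: {lanes}")                       # 512
print(f"Peak FP32: {fp32_tflops:.2f} TFLOPS")      # ~1.02 TFLOPS at 1 GHz
print(f"Peak FP16: {2 * fp32_tflops:.2f} TFLOPS")  # 2x-rate FP16
```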

Other features include tile-based rendering, which Intel stated the graphics hardware will be able to enable or disable on a per-render-pass basis. This makes Intel the final member of the PC GPU vendor community to implement the technique, following NVIDIA in 2014 and AMD in 2017. While not a panacea for all performance woes, a good tile rendering setup plays well to the bandwidth limitations of an integrated GPU. Meanwhile, Intel's lossless memory compression has also improved, with Intel listing a best-case performance boost of 10% and a geometric mean boost of 4%. The GTI interface now supports 64 bytes per clock for both reads and writes to increase throughput, which works in tandem with the improved memory interface.
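
To put the GTI change in concrete terms, here is a rough calculation, again assuming a placeholder 1.0 GHz clock; the compression figures are the performance boosts Intel quoted, loosely interpreted here as effective bandwidth:

```python
# Rough GTI throughput at an assumed 1.0 GHz clock (placeholder value).
CLOCK_HZ = 1.0e9
BYTES_PER_CLOCK = 64            # per direction, per Intel's disclosure

gti_gbps = BYTES_PER_CLOCK * CLOCK_HZ / 1e9
print(f"GTI: {gti_gbps:.0f} GB/s read + {gti_gbps:.0f} GB/s write")

# Intel quoted a 4% geomean (10% best-case) performance boost from improved
# lossless compression; read loosely as bandwidth stretched by that ratio:
print(f"Effective geomean: ~{gti_gbps * 1.04:.1f} GB/s per direction")
```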

Coarse Pixel Shading (CPS), Intel's implementation of multi-rate shading and similar in scope to NVIDIA's Variable Rate Shading, is also supported. This allows the GPU to reduce the total amount of shading work required by shading some pixels on a less than 1:1 basis. Intel showed two demos for CPS, where pixel shading was reduced either as a function of object distance from the camera (so less work is done when things are further away), or as a function of how far the object is from the center of the screen, designed to help features like foveated rendering for VR. With a 2x2 pixel stencil applied, meaning only one pixel shading operation was done per block of 4 pixels, Intel stated a ~30% increase in frame rates in supported games; a quick illustration of the math follows below. Unfortunately this needs to be applied on a game-by-game basis in order to prevent significant image quality losses, so the performance gains won't be immediate or universal.
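
As a quick illustration of why CPS helps (our toy numbers, not Intel's demo data), suppose half of a 1080p frame qualifies for the 2x2 coarse rate:

```python
# Toy model of coarse pixel shading savings. With a 2x2 stencil, one shader
# invocation covers a 4-pixel block, so coarse regions need 1/4 the work.
WIDTH, HEIGHT = 1920, 1080
full_rate = WIDTH * HEIGHT                 # ~2.07M invocations at 1:1

coarse_fraction = 0.5                      # assumed: half the frame is coarse
invocations = (full_rate * (1 - coarse_fraction)
               + full_rate * coarse_fraction / 4)
saving = 1 - invocations / full_rate
print(f"{invocations / 1e6:.2f}M vs {full_rate / 1e6:.2f}M invocations "
      f"({saving:.0%} less pixel shading work)")  # 38% less in this toy case
```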

For the media block, Intel says that the Gen11 design includes a ground-up HEVC encoder design, with high-quality encode and decode support. Intel cited the fact that its media fixed-function units are already used in the datacenter for video processing, and home users can take advantage of the same hardware. Intel also stated that by using parallel decoders it can either support concurrent video streams or combine them to handle a single large stream, and this scalable design will allow future hardware to push peak resolutions up to 8K and beyond.

The highlight of the display engine is support for Adaptive Sync technologies. We were told that this was announced back at the launch of Skylake, but it is now finally ready to go into Intel's integrated graphics. This goes hand in hand with HDR support, due to the display engine's high-precision data path.

One thing that Intel didn't mention directly in this presentation is that Gen11 graphics appears to have Type-C video output support, potentially indicating that Intel has integrated the necessary mux into the chipset itself, removing another IC from the motherboard design.


148 Comments


  • Raqia - Thursday, December 13, 2018 - link

    Your point is taken, and Keller did say it was in its infancy, but I am interested in whether what we're seeing here will be a competitive product or will remain an interesting science experiment. There are theoretical benefits to stacking high-performance dies on low-leakage ones like this, but also substantial challenges and deficiencies that the current iteration hasn't shown it has overcome. The benefits we might see in terms of better overall area, lower package-level fab rejection rates, and better net power characteristics could be offset by a worse concentration of heat, and hence more throttling when both elements are running, or by more expensive packaging. Perhaps in the end a monolithic die is a better compromise for mobile, despite losing out on some metrics.
  • nico_mach - Wednesday, December 12, 2018 - link

    So the GPU is going to be called ... Ten to the Eeth power? Is that right?

    I reject all these Xes used in unpredictable ways. The iPhones are pronounced exar and excess. This is ecksee, and I still use oh ess ecks on my emm bee eh at home.
  • Jon Tseng - Wednesday, December 12, 2018 - link

    >Intel actually says that the reason why this product came about is because a customer
    >asked for a product of about this performance but with a 2 mW standby power state.

    Huh wonder who the customer for that Core/Atom hybrid is. Seems a bit overpowered for a tablet. A bit underpowered for a MacBook (or for a car). Chromebooks maybe but most are too low volume to demand a custom part (maybe the education market is taking off?). PC OEMs don't normally take such custom parts for their laptops. But the graphics loadout implies some kind of PC-type application?

    Any ideas??
  • HStewart - Wednesday, December 12, 2018 - link

    From the diagram, it appears that the hybrid CPU has a single big (Core) CPU with 4 small (Atom) CPUs - similar technology is used in Samsung processors - which would mean lower power while still keeping the single-threaded speed of the primary core.

    Most interesting would be how the smaller cores are used by the scheduling system. It most likely requires an enhancement in the OS for proper usage.
  • A5 - Wednesday, December 12, 2018 - link

    There aren't a ton of companies big enough to make Intel create a new product line just for them.

    The whole list is probably Apple/HP/Dell. Maybe Microsoft.
  • The_Assimilator - Wednesday, December 12, 2018 - link

    Microsoft Surface, obviously. It's become a very profitable line for MS, but the current models are either too battery-hungry (Core CPUs) or too slow (Atom CPUs). Foveros will give the best of both these worlds while also being x86... priced right, a Foveros-based Surface will essentially end any argument for iPads in a business environment, especially considering most software remains firmly single-threaded. But it remains to be seen whether (a) Intel can get the power down even further (7W is still double most smartphones) and (b) whether their big.LITTLE implementation is good enough.
  • Raqia - Wednesday, December 12, 2018 - link

    Windows on ARM will do just fine now that Visual Studio emits ARM native code. Once Chrome gets ported (and that will be soon: https://www.neowin.net/news/both-chromium-and-fire... ), the platform should address 95% of typical daily use cases and provide substantial compatibility with legacy software / file formats. This is better value than iPads, and upcoming dedicated SoCs like the 8cx should offer better performance and battery/heat characteristics than what Intel has planned for next year in the same power envelope.
  • The_Assimilator - Thursday, December 13, 2018 - link

    I think you missed the part where Windows on ARM is horribly slow and therefore shitty. As a result, Microsoft has no plans to port anything useful (e.g. Office) to ARM, which means Windows on ARM is stuck being the lowest of the low-end. And that's not a space that Surface is intended to play in; Surface is an iPad competitor, and an iPad competitor can't be slow and shitty. Business devices can't be slow and shitty, and they absolutely need to be able to run Office.

    I expect that either Windows on ARM will be allowed to wither and die once Foveros ships, or it will languish in a dead zone whereby only the cheapest of the cheap devices from no-name OEMs (think $100 Lenovo tablets) use ARM chips and hence need it.

    So unless Qualcomm's 8cx is a game-changer in terms of performance, Foveros should be the end of ARM on desktop, and thank fucking God for that.
  • Spunjji - Thursday, December 13, 2018 - link

    Microsoft already have an Office code base on ARM, so I'm not sure what you're talking about there.

    What would worry me about an Intel big.LITTLE-style design is that if Windows doesn't assign your performance-critical application to the correct (big) core, performance will mostly suck just as hard as if all your cores were Atom.

    As such, I'd be cautious on calling a winner just yet.
  • gamerk2 - Thursday, December 13, 2018 - link

    Agreed with this; Microsoft has been let down by Intel not having a good mobile platform. If it were up to them, they wouldn't bother with ARM, but they have to due to battery/power/heat requirements.
