Ever since Raja Koduri left AMD and joined Intel, I have been continuously asking for a 1-on-1 interview, as I’m sure a number of my peers in the industry have also. For any event I went to that intersected with Raja, it became almost an amusing meme that I’d ask for time with him. His (and his team’s) response was pretty understandable, as he has had ‘nothing more to add, officially’ for a while, given how deep his remit has been inside the company and how long these projects take to emerge.

This week Raja gave the keynote at Intel’s HPC DevCon event, a precursor to Supercomputing, and I did my usual thing of asking for the interview, fully expecting the same ‘not quite yet’ response. To my surprise, Intel agreed, and we spent the best part of an hour discussing his role at Intel, his work, and some of the finer details of the recent Xe-HPC, Ponte Vecchio, and Aurora announcements.

Raja Koduri is a well-known figure in the semiconductor space, having previously held high positions at Apple, driving new architectures at AMD, and now at Intel in charge of everything to do with Architecture. His particular focus has been on Software and the new GPU initiative, and he is ultimately one prong of Intel’s charge into the wider compute arena, covering everything from integrated graphics to discrete graphics and then onto compute graphics. This means hardware, but also Raja is spending a lot of time with software, and digging deep with Intel’s new oneAPI initiative, to develop an all-encompassing SDK that developers can use to write code across any number of elements in Intel’s hardware stack. Raja is just coming shy of two years since joining Intel, over which time he has given a number of presentations and been part of a number of announcements, but truth be told the specifics of his role, beyond the few elements in this paragraph, are still unknown to the wider community.

This week at Intel’s HPC Developer Conference (HPC DevCon, or just DevCon), the company lifted the lid on Intel’s first high-performance graphics implementation, called Ponte Vecchio (PVC for short). In the scale of Intel’s new Xe architecture for graphics, the company announced that it will have two microarchitectures for the wide graphics market (Xe-LP for low power/integrated solutions, and Xe-HP for high power/discrete solutions), and a single microarchitecture for the high performance compute market (Xe-HPC, which goes from discrete through to more complex packaging). Intel also showed diagrams of the chiplet design behind Ponte Vecchio, with EMIB, Foveros, a new ‘Rambo’ cache, HBM, and a new Xe memory fabric (Xe-MF) that drives efficient scaling of all these elements. There are still plenty of points to pick from the slides of the presentation, which we are currently working on. Another element to the talk was Intel’s new oneAPI industry initiative, with the announcement that oneAPI is available in beta form today and is part of Intel’s DevCloud infrastructure too.

We should point out that there were a few topics that Intel weren’t going to talk about, such as exact details about the new Ponte Vecchio design (finer details will be disclosed when Intel determines it is the right time, I was told) and we weren’t able to talk about Intel’s process technology. Intel’s recent disclosures or supply chain demands are not within Raja’s wheelhouse, so we’re not covering them here. I’d rather speak to the appropriate people directly about those topics.

With that being said, we’d like to thank Raja and his team for this opportunity.

Raja Koduri
SVP, Chief Architect at Intel
GM, Architecture, Graphics and Software
Dr. Ian Cutress
AnandTech, Senior Editor

 

The ‘Koduri’ Factor

Ian Cutress: You’ve now been at Intel for coming up to two years, with a large remit of covering the company’s ‘Architecture’. How would you characterize your role at Intel compared to what you’ve done at other companies? What makes Intel unique and special for you to be at the company at this phase of history?

Raja Koduri: I think the biggest thing is to put my journey, to Apple, to AMD, and to Intel, into context. It’s a cyclical thing that goes between wanting to make a difference at a scale, and wanting to be disruptive. At a ‘scale’ company, like an Apple, you have the ability to reach hundreds of millions of users, so everything you do gets amplified. There’s a certain amount of satisfaction in that, in having an impact, and ultimately at a scale company you learn a lot. With enough time at such a company, you develop a fundamental set of disruptive ideas. It’s hard to be disruptive at a ‘scale’ company, and easier at a small company where they can be more agile and more open to taking risks. When I joined AMD 7 years ago, it was at an inflection point –  they were hungry and ready to take risks and they provided me with a great opportunity to build new software and hardware architectures. They also provided me opportunities to learn about business and transforming company culture.

The next cycle for me was basically where I saw the technology going, and that fundamental technologies are needed. As both Jim and I have said on stage at various times, we want to drive more data closer to where it is needed – this is in itself a very interesting challenge and goes beyond just GPU architecture or CPU architecture. This problem touches every layer of technology in the industry, and Intel is one of the few companies that have the scale of technology investments to pursue this dream. The breadth and depth of technology scale is HUGE at Intel and I saw Intel as an incredible learning opportunity – enough to keep me busy and excited for the rest of my life.

It’s great having all these smart people in the room, with no barriers, talking through how we’re going to drive 10x, 100x, or more on this stuff. There are very few companies in which you can go across the entire stack, from transistor to software.

IC: So far in your time at Intel, you and Jim Keller are often paired together as a team when it comes to disclosures and presenting to the media. In that context, Jim is often presented as the hardware person, and you’re presented as the software person. Are your roles really that straight cut?

RK: [chuckles] No, not really. You will have noticed that I’m increasingly focusing on the software, but ultimately my role in Architecture at Intel covers both. Architecture is often a misused word today – people usually use it in the context of microarchitecture, when in reality architecture defines a contract between the system and the developer - how something is built is not architecture, but how something interfaces with the outside world and how something functions is architecture.

One of the most important aspects of architecture is the contract between hardware and software. I live very much in this hardware/software contract, the silicon platform contract, or in other words, the ecosystem – that is architecture. So my focus is on the ‘what & why’, and Jim focuses on ‘how & when’.  Both of these points are connected with each other, so we work very closely to drive our collective teams forward.

IC: Intel is well known for its process technology and manufacturing, and Dr. Murthy (Renduchintala, Raja’s boss) announced a couple of years ago that Intel was disaggregating its product portfolio from its manufacturing process. When we interviewed Jim, he said (and has said repeatedly since) that he’s not worried about the process, and in the past 30 years whenever people have had process problems they are always solved. Intel’s recent manufacturing issues are a known factor, but I’ll ask the same question I posed to Jim – how much do you get involved at the process and manufacturing level?

RK: Very much. It’s also a really interesting topic. Like I’ve said before, at Intel, we have such a strong connection between how we build our chips and how we manufacture them. There’s a historic way of how we set up the methodologies and then the processes and tools around that. At most companies you have to go with what the manufacturer tells you they can do – at Intel, we get to know everything. We’re in a position where we can get involved in the manufacturing, and that’s very different from the outside ecosystem.

This method does also have its pros and cons. One of the big things that Jim and I are working on is how to amplify those positives in that model, as well as diminish the negatives. There are products and IPs, like particularly CPU for example, that get a lot of benefit from collaboration. By contrast, at a fundamental level, there are graphics and other things that due to the design don’t need that intense level of customization – customizing at that level can actually get in the way of us executing fast relative to the outside industry.

 

Architecture and Microarchitecture

IC: One of the big questions around Xe being an architecture and multiple ‘microarchitectures’ being spun from it is how many microarchitectures are being developed inside Intel. Before today we knew that Intel would have two, and today you put names to those, Xe-LP and Xe-HP, but also introduced Xe-HPC for the compute market. Is there a fourth?

RK: It was good to mention the three on stage – it is three, and truth be told I don’t have a fourth one. Maybe I’ll have a fourth one if the need arises, but that’s it – three covers the entire roadmap.

IC: With the three variants of Xe you’ve mentioned today, Xe-LP, Xe-HP, and Xe-HPC, are the products built on these microarchitectures fundamentally tied to specific Intel process nodes?

RK: No. The IP can be ported to any process technology.

IC: Of the three microarchitectures, I imagine you being knee deep in all three, defining the designs, managing the teams, and executing. Can you discuss a little bit actually how deep you personally go into this?

RK: At Intel I say that from 7am to 7pm there is not a single moment where I can turn my neurons off. We have to get the architecture and the microarchitecture details right to have amazing products. Nobody inside or outside gave us a chance two years ago that we will get new things like GPUs done. They said we will take 5 years and we will lose interest two years in! Well, we’re two years in now and we have our first discrete GPUs powered on. It’s not possible to drive these things without being hands-on.

IC: One of the things about Xe and the graphics team is that Intel rehired a number of engineers that worked on the Larrabee product, the attempt at an x86 graphics architecture. Even though we got Xeon Phi from that project, the fact that Intel has rehired these engineers means that there are obviously things that Intel can relearn from them and that project. How does that filter into Xe?

RK: Great question. You know, Larrabee, as well as Xeon Phi, taught us great learnings for lots of verticals. That experience helped us particularly with how Intel managed HPC at the time, and how we have built Xe-HPC and Ponte Vecchio today. There are a lot of problems that Xeon Phi solved particularly in relation to memory, coherency, virtual memory, reliability, and all that. Having access to that knowledge is helping us build a product like Ponte Vecchio very quickly. The Ponte Vecchio DNA includes Gen, Xeon, Xeon Phi and even Itanium learnings.

 

The Goals of oneAPI

IC: Intel is starting to open up about Xe, and it clearly wants to cover a large range of the market, all with the oneAPI stack. Can Xe and oneAPI be everything to everybody at the same time?

RK: That’s a great question. First off, when we look at the scale and reach of graphics, whether it’s integrated graphics where we have hundreds of millions of users, or discrete graphics going into the cloud, one of the central elements is software. What is the software that runs all of this stuff? We have sets of APIs like DirectX, OpenGL, OpenCL, and other languages – and we also have middleware, like game engines that sit on top of the stack. What we want is to be everywhere where there is a software presence.

I will summarize our strategy simply as ‘Leverage, Optimize, Scale’. We are leveraging our existing CPU and integrated software stack and integrated graphics IP. We have invested heavily to optimize our existing IP. The next step was to scale - for the high-end GPUs, we needed to scale over 1000x. As you saw from the Xe-HPC disclosures, that’s our vision of scale.

I think we have a very good strategy here, and that is not something by accident – I’m taking my 20 years of industry experience, plus the two from being here at Intel, and applying it here. The way we approached the Xe design is to take measured steps – we’ve already proved that by having silicon in hand.

IC: One of the key things with high performance computing is to know your hardware. In order to extract every iota of performance, you have to know how big your caches are, where the latencies are, memory bandwidth, exact ALU structures, and ultimately build software that is rarely portable in order for it to be so performant. One of the features of oneAPI is to move away from this specificity by enabling software that can potentially work anywhere. How do you reconcile this desire for very specific optimizations and yet have a software package/SDK that is designed to help everybody?

RK: Again, a great question. Our key goal with oneAPI was that no developer should be left behind. Within that, we have worked very hard on the interfaces for what we call ‘ninja programmers’ – the low-level software developers that build the high-performance libraries that everyone else uses. We noticed that these ninja programmers have a strong non-linear impact on the ecosystem, so our system programming layer inside oneAPI, and some of the abstractions available through oneAPI, will give these ninja programmers control of hardware resources at finer granularity.

IC: Obviously the key operating systems and markets are going to be Windows, Linux, and to a certain extent, iOS. We’ve seen software packages for HPC attack these operating systems very differently, so what is the oneAPI strategy here?

RK: Great question. One of the things we agonised quite a bit over is how we make oneAPI support be very good on Windows. We recognize that the developer footprint with our PCs is a key strength, and we want to enable developers’ access to this stuff easily, whatever PC they pick up. So we put a lot of work in and you’ll see us supporting Windows and Linux in there. For operating systems beyond those, such as iOS, Android, and Chrome, it’s more whether you have on-device support or access to oneAPI service through cloud. This is where our DevCloud strategy, where developers can use oneAPI in the cloud, will come in.

The other thing to say is that with oneAPI, the version we shipped today is the beginning of a long journey. Solving this problem and building the stacks, building the services, will take time. Many innovations are in the pipeline, and this is why God invented something called version 2 and 3!

IC: Part of oneAPI, as you mentioned today on stage, is the ability to translate CUDA code to the oneAPI infrastructure. You’ve been at one company in the past that previously attempted to provide translation tools for its hardware, with varying degrees of success. What can Intel do here differently to make it succeed at scale?

RK: Great question. Portability of code between different parallel architectures has never been easy. There are often key fundamental differences between them, and a particular one is vector width. You can’t take a program that is optimized for a smaller vector width and make it efficient on a machine that has a larger vector width without refactoring the code and all that.

The Xe architecture is actually a narrower width machine - the variable vector width that we have and the ability to switch between SIMT mode and SIMD mode and combine them gives the software guys lots of tools to do more. Now having said that, the tools will take some time to mature. What we are seeing today is that we’re being more productive than prior attempts in the industry. We are also putting the software out ahead of the hardware for productive performance enablement.

 

Gaming on Xe

IC: Turning to gaming solutions, because there is a lot of interest in how Intel is going to attack the gaming space: what we’ve seen today is a compute GPU based on chiplets. Moving from a monolithic graphics chip to a chiplet design is a tough paradigm to solve, so does working on chiplets help solve the ‘multi-GPU’ issue on graphics? Is the future of graphics still consigned to single GPU, or should we expect multiple GPU scaling easier to manage?

RK: That’s a great question. As you know, solving the multi-GPU problem is tough – it has been part of my pursuits for almost 15 years. I’m excited, especially now, because multiple things are happening. As you know, the software aspect of multi-GPU was the biggest problem, and getting compatibility across applications was tough. So things like chiplets, and the amount of bandwidth now going on between GPUs, and other things make it a more exciting task for the industry to take a second attempt. I think due to these continual advances, as well as new paradigms, we are getting closer to solving this problem. Chiplets and advancement of interconnect will be a great boost on the hardware side. The other big problem is software architecture. With many interesting cloud-based GPU efforts, I am optimistic that we will solve the software problems as well.

IC: Narrowing the scope down to discrete gaming GPUs, how is Intel going to approach those driver stacks with Xe?

RK: The system programming layer is a key difference between operating systems. The rest of the layers are largely OS independent. So we have a good development strategy here.

IC: With the GPU team, particularly the GPU marketing team, we’ve seen Intel pull in industry talent from a wide variety of sources, such as competitors, analysts, and even some of my former media peers. We’re seeing a strong commitment from Intel for this community, and building excitement for future Intel graphics solutions. To what degree, being in charge of graphics at Intel, are you pushing them ahead with that excitement, or are you telling them to reel it in?

RK: We have incubated a discrete GPU business unit at Intel, run by Ari Rauch. There was a lot of excitement when we announced our graphics ambition, and that attracted a lot of GPU talent from the industry, including for our marketing efforts. They have been doing a good job building up connections with the gaming community and leveraging their feedback. My guidance to them always is to ‘reel it in’ until we have products! But we will be geared to enable developers and the wider community with our marketing outreach.

IC: Have you discussed how the eventual discrete graphics launch is going to happen?

RK: Not really. We are so much focused on execution right now. But I will tell you a funny story about the ‘Ponte Vecchio’ name. At Intel we have a policy of tying engineering code names to places or things you can find on a map. We have had too many ‘lakes’ and I wanted to do bridges. I wanted to pick a place that I don’t mind going to for a launch! Florence in Italy has some of the best gelato in the world. And I love Florence and the art and architecture there as well.

 

The Future of Gen Graphics

IC: Is Xe anything like Gen at a fundamental level?

RK: At the heart of Xe, you will find many Gen features. A big part of our decision making as we move forward is that the industry underestimates how long it takes to write a compiler for a new architecture. The Gen compiler has been with us, and has been continually improved, for years and years, so there is plenty of knowledge in there.  It is impressive how much performance there is in Gen, especially in performance density. So we preserved lots of the good elements of Gen, but we had to get an order of magnitude increase in performance. The key for us is to leverage decades of our software investment – compilers, drivers, libraries etc. So, we maintained Gen features that help on software.

IC: As Xe pushes on and products come out, will Intel continue to develop Gen as a separate architecture line?

RK: All of our GPU teams are working on variants of the Xe architecture at the moment. We don’t see a reason for Gen anymore – Xe-LP, our low powered variant, covers the market that Gen covered.

 

The Future and The Vision of Xe

IC: You have been in the GPU space a long time. Is there anything definitive that you can say that Xe will bring to the table that hasn’t been seen before?

RK: In a word, vision. The ‘exascale for everyone’ vision. Solving this requires fundamental disruptions in all layers of the technology stack. And I think we have taken a big step towards that with Ponte Vecchio. When I look at our path ahead, I think about how we make that happen. That for me is the foundation of Xe, relative to how the rest of the industry is thinking about things, and all the problems we’ve discussed today: distributed memory problems, distributed computing problems, and computing at scale problems are all essential things in our vision.

IC: Is there anything that the industry should know about Xe that it doesn’t spend enough time thinking about?

RK: It’s a question of scale. I think the impact of 200 million PCs with integrated graphics, moving to Xe, with more performance and better efficiency, is something I don’t see much of the industry appreciating. Intel’s reach and leverage means that a small change can make a big difference – here we are making a big change, and it’s going to have a knock-on effect. It’s a big deal.

 

Many thanks to Raja and his team for their time.


  • wishgranter - Wednesday, November 20, 2019 - link

    Woow, waited on something like this long explanation all the time... THANX for summarizing the problematic for us.
  • Arsenica - Wednesday, November 20, 2019 - link

    Patents expire, by 2021 any patent filed before 2001 will expire (so basic patents such as US7,548,238 will be public domain).

    And besides that, Intel has hardly been sitting idle regarding graphics IP. For example, out of all post 2001 granted US patents on the classification G06T15/80 9.2% belong to Nvidia and 8.9% belong to Intel. So effectively both companies have comparable IP portfolios, which kind-of invalidates your point isn't it?
  • mdriftmeyer - Saturday, November 23, 2019 - link

    Patents get renewed or did you think for some absurd reason they never seem to disappear but get amended?
  • Irata - Thursday, November 21, 2019 - link

    Nice and very long post. Thanks !
    I think the "Let's go for it and drag it out in court if necessary" approach is probably what Intel will go for. It is no coincidence they hired employees with technical knowledge away from their competition.

    And sadly, I also agree on the AMD being naive part (generally speaking).
  • JayNor - Thursday, November 21, 2019 - link

    Raja is apparently claiming a big increase in performance from Xe, although it isn't clear if that claim is wrt hpc ai and data analysis.

    Tiger Lake integrated graphics is the first Xe architecture GPU, so we won't have to wait for the HPC Xe to get more info on the Xe architecture.

    "So we preserved lots of the good elements of Gen, but we had to get an order of magnitude increase in performance."
  • peevee - Friday, November 22, 2019 - link

    "The market's graphics are already patented from top to bottom for every fundamental efficient principle of drawing/rendering graphics since ages"

    Exactly, ages. You know that patents expire, right?
  • mdriftmeyer - Saturday, November 23, 2019 - link

    Do some research, patents get augmented/modified and tied to new revisions. They continue indefinitely when possible. Graphics IP/CPU IP they don't just expire.
  • mdriftmeyer - Saturday, November 23, 2019 - link

    An actual comment worthy of article status which drastically overshadows the fluff pieces Ian seems to have for Raja--the master bs artist extraordinaire.
  • JayNor - Saturday, December 07, 2019 - link

    I thought AMD manufactured the gpu chiplets for the Kaby Lake G, and Intel just connected them up via a local pcie interconnect. How does Intel swipe any IP from that relationship?

    https://spectrum.ieee.org/tech-talk/semiconductors...
  • Smartcom5 - Wednesday, November 20, 2019 - link

    Intel hasn't fabrication issues, but first and foremost design-talent issues. Get that straight into your head! All the flaws since ages and well over a decade are resulting from talent-issues being only majorly influenced by rogue-flavoured carelessness.

    The claim that the reason for Intel's major flaws are solely rooted by its products' very market-share and prevalence rate, is just what it always was: Some flimsy excuse which aims to negligently downplay the next security-flaw's vulnerability.

    For instance, their Hyper-Threading was found out to be effectively pretty close to broken and rather inefficient from the very beginning of its implementation even back then. It was also pretty clear and well-known (not only) to Intel. They just didn't cared about it, as cash swept in.

    Also, and as pointed out countless times, Intel was a) very well aware of the issues and flaws their implementations might bring in anytime in the future and b) independent and third-party security-researchers fairly shortly after their implementation at Intel warned them about it. Intel ignored them deliberately! They gave NIL fucks.

    𝑱𝒖𝒔𝒕 𝒇𝒐𝒓 𝒖𝒏𝒅𝒆𝒓𝒔𝒕𝒂𝒏𝒅𝒊𝒏𝒈 …
    E.g. the explicit security gap or -flaw Meltdown is not new, not even a tad. Anyone who claims the contrary – in contempt of glaring sources stating and proofing the exact opposite – either (hopefully) doesn't know it any better or deliberately and wilfully suppresses these facts.

    The fact that everyone got surprised by the danger of such risks all of a sudden and was hit completely unprepared doesn't even correspond to the facts one bit, not even slightly. The whole topic, respective theoretical rudiments and so forth are and were some hotly debated topic since years within the security industry or among processor experts respectively.

    Heck, the very basics for timed- and thus side channel attacks were developed back in 1992 and have been repeatedly explained/elucidated by security experts ever since. Just because such methods and attack vectors – while being known since many years – were only used 'publicly' in '17, doesn't mean they weren't used under the radar for many years prior to that date.

    … and yes, especially the style of handling the caches the way they were used explicitly by Intel was not only known but also a frequently discussed crux and central subject-matter of security researches. This means that, as a collective within the industry (of chip-engineering) you were very well aware of given respective - at least theoretically - highly safety-critical exploits – and this was already brought up towards Intel some time ago, more than once.

    𝐾𝑒𝑦𝑤𝑜𝑟𝑑 ‚𝑹𝒊𝒔𝒌 𝒎𝒂𝒏𝒂𝒈𝒆𝒎𝒆𝒏𝒕‘
    ... and yes, Intel always considered these attack-scenarios to be too insignificant and such resulting speed advantages as too severe in order to drop them – in favour of thereby increased security. If I recall correctly, the topic is almost as old as the given Intel'ian implementation in those same processors. If I remember correctly, at least since '06 it has been considered se-ri-ous-ly critical how Intel addresses or manages their caches. Intel knew that and ignored it.

    𝑩𝒍𝒂𝒄𝒌 𝑯𝒂𝒕 𝑩𝒓𝒊𝒆𝒇𝒊𝒏𝒈𝒔
    At the very latest '16 such issues resulting eventually in Meltdown (or at least parts of it) were actually brought up again being made public while being a major agenda item and got openly discussed in great detail at the well-known Blackhat '16 on 3rd and 4th of August that year – while the very same subject was at least broached at the same security conference in '14. Wasn't it already known even before that?

    Reading:
    Joseph Sharkey, Ph.D. – Siege Technologies: „Breaking Hardware-Enforced Security with Hypervisors“
    Yeongjin Jang et al. – „Breaking Kernel Address Space Layout Randomization with Intel TSX“
    John Harrison - Formal Verification at Intel – Katholieke Universiteit Nijmegen, 21 June 2002
    John Harrison - Formal Methods at Intel: An Overview – Second NASA Formal Methods Symposium, Washington DC, 14 April 2010

    𝒕𝒍;𝒅𝒓: Intel (and some prime employees) knew, at least from 2002 onwards, about the potential risk.
    They gave no fucks.
