Ever since Raja Koduri left AMD and joined Intel, I have been continuously asking for a 1-on-1 interview, as I’m sure a number of my peers in the industry have also. For any event I went to that intersected with Raja, it became almost an amusing meme that I’d ask for time with him. His (and his team’s) response was pretty understandable, as he has had ‘nothing more to add, officially’ for a while, given how deep his remit has been inside the company and how long these projects take to emerge.

This week Raja gave the keynote at Intel’s HPC DevCon event, a precursor to Supercomputing, and I did my usual thing of asking for the interview, fully expecting the same ‘not quite yet’ response. To my surprise, Intel agreed, and we spent the best part of an hour discussing his role at Intel, his work, and some of the finer details of the recent Xe-HPC, Ponte Vecchio, and Aurora announcements.

Raja Koduri is a well-known figure in the semiconductor space, having previously held high positions at Apple, driving new architectures at AMD, and now at Intel in charge of everything to do with Architecture. His particular focus has been on Software and the new GPU initiative, and he is ultimately one prong of Intel’s charge into the wider compute arena, covering everything from integrated graphics to discrete graphics and then onto compute graphics. This means hardware, but also Raja is spending a lot of time with software, and digging deep with Intel’s new oneAPI initiative, to develop an all-encompassing SDK that developers can use to write code across any number of elements in Intel’s hardware stack. Raja is just coming shy of two years since joining Intel, over which time he has given a number of presentations and been part of a number of announcements, but truth be told the specifics of his role, beyond the few elements in this paragraph, are still unknown to the wider community.

This week at Intel’s HPC Developer Conference (HPC DevCon, or just DevCon), the company lifted the lid on Intel’s first high-performance graphics implementation, called Ponte Vecchio (PVC for short). In the scale of Intel’s new Xe architecture for graphics, the company announced that it will have two microarchitectures for the wide graphics market (Xe--LP for low power/integrated solutions, and Xe-HP for high power/discrete solutions), and a single microarchitecture for the high performance compute market (Xe-HPC, which goes from discrete through to more complex packaging). Intel also showed diagrams of the chiplet design behind Ponte Vecchio, with EMIB, Foveros, a new ‘Rambo’ cache, HBM, and a new Xe memory fabric (Xe-MF) that drives efficient scaling of all these elements. There are still plenty of points to pick from the slides of the presentation, which we are currently working on. Another element to the talk was Intel’s new oneAPI industry initiative, with the announcement that oneAPI is available in beta form today and is part of Intel’s DevCloud infrastructure too.

We should point out that there were a few topics that Intel weren’t going to talk about, such as exact details about the new Ponte Vecchio design (finer details will be disclosed when Intel determines it is the right time, I was told) and we weren’t able to talk about Intel’s process technology. Intel’s recent disclosures or supply chain demands are not within Raja’s wheelhouse, so we’re not covering them here. I’d rather speak to the appropriate people directly about those topics.

With that being said, we’d like to thank Raja and his team for this opportunity.

Raja Koduri
SVP, Chief Architect at Intel
GM, Architecture, Graphics and Software
Dr. Ian Cutress
AnandTech, Senior Editor

 

The ‘Koduri’ Factor

Ian Cutress: You’ve now been at Intel for coming up to two years, with a large remit of covering the company’s ‘Architecture’. How would you characterize your role at Intel compared to what you’ve done at other companies? What makes Intel unique and special for you to be at the company at this phase of history?

Raja Koduri: I think the biggest thing is to put my journey, to Apple, to AMD, and to Intel, into context. It’s a cyclical thing that goes between wanting to make a difference at a scale, and wanting to be disruptive. At a ‘scale’ company, like an Apple, you have the ability to reach hundreds of millions of users, so everything you do gets amplified. There’s a certain amount of satisfaction in that, in having an impact, and ultimately at a scale company you learn a lot. With enough time at such a company, you develop a fundamental set of disruptive ideas. It’s hard to be disruptive at a ‘scale’ company, and easier at a small company where they can be more agile and more open to taking risks. When I joined AMD 7 years ago, it was at an inflection point –  they were hungry and ready to take risks and they provided me with a great opportunity to build new software and hardware architectures. They also provided me opportunities to learn about business and transforming company culture.

The next cycle for me was basically where I saw the technology going, and that fundamental technologies are needed. As both Jim and I have said on stage at various times, we want to drive more data closer to where it is needed – this is in itself a very interesting challenge and goes beyond just GPU architecture or CPU architecture. This problem touches every layer of technology in the industry, and Intel is one of the few companies that have the scale of technology of investments to pursue this dream. The breadth and depth of technology scale is HUGE at Intel and I saw Intel as an incredible learning opportunity – enough to keep me busy and excited for the rest of my life.

It’s great having all these smart people in the room, with no barriers, talking through how we’re going to drive 10x, 100x, or more on this stuff. There are very few companies in which you can go across the entire stack, from transistor to software.

IC: So far in your time at Intel, you and Jim Keller are often paired together as a team when it comes to disclosures and presenting to the media. In that context, Jim is often presented as the hardware person, and you’re presented as the software person. Are your roles really that straight cut?

RK: [chuckles] No, not really. You will have noticed that I’m increasingly focusing on the software, but ultimately my role in Architecture at Intel covers both. Architecture is often a misused word today – people usually use it in the context of microarchitecture, when in reality architecture defines a contract between the system and the developer - how something is built is not architecture, but how something interfaces with outside world and how something functions is architecture.

One of the most important aspects of architecture is the contract between hardware and software. I live very much in this hardware/software contract, the silicon platform contract, or in other words, the ecosystem – that is architecture. So my focus is on the ‘what & why’, and Jim focuses on ‘how & when’.  Both of these points are connected with each other, so we work very closely to drive our collective teams forward.

IC: Intel is well known for its process technology and manufacturing, and Dr. Murthy (Renduchintala, Raja’s boss) announced a couple of years ago that Intel was disaggregating its product portfolio with its manufacturing process. When we interviewed Jim, he said (and has said repeatedly since) that he’s not worried about the process, and in the past 30 years whenever people have had process problems they are always solved. Intel’s recent manufacturing issues are a known factor, but I’ll ask the same question I posted to Jim – how much to you get involved at the process and manufacturing level?

RK: Very much. It’s also a really interesting topic. Like I’ve said before, at Intel, we have such a strong connection between how we build our chips and how we manufacture them. There’s a historic way of how we set up the methodologies and then the processes and tools around that. At most companies you have to go with what the manufacturer tells you they can do – at Intel, we get to know everything. We’re in a position where we can get involved in the manufacturing, and that’s very different from the outside ecosystem.

This method does also have its pros and cons. One of the big things that Jim and I are working on is how to amplify those positives in that model, as well as diminish the negatives. There are products and IPs, like particularly CPU for example, that get a lot of benefit from collaboration. By contrast, at a fundamental level, there are graphics and other things that due to the design don’t need that intense level of customization – customizing at that level can actually get in the way of us executing fast relative to the outside industry.

 

Architecture and Microarchitecture

IC: One of the big questions around Xe being an architecture and multiple ‘micro-architectures’ being spun from it is to how many microarchitectures are being developed inside Intel? Before today we knew that Intel would have two, and today you put names to those, Xe-LP and Xe-HP, but also introduced Xe-HPC for the compute market. Is there a fourth?

RK: It was good to mention the three on stage – it is three, and truth be told I don’t have a fourth one. Maybe I’ll have a fourth one if the need arises, but that’s it – three covers the entire roadmap.

IC: With the three variants of Xe you’ve mentioned today, Xe--L, Xe--HP, and Xe-HPC, are the products built on these microarchitectures fundamentally tied to specific Intel process nodes?

RK: No. The IP can be ported to any process technology.

IC: Of the three microarchitectures, I imagine you being knee deep in all three, defining the designs, managing the teams, and executing. Can you discuss a little bit actually how deep you personally go into this?

RK: At Intel I say that from 7am to 7pm there is not a single moment where I can turn my neurons off. We have to get the architecture and the microarchitecture details right to have amazing products. Nobody inside or outside gave us a chance two years ago that we will get new things like GPUs done. They said we will take 5 years and we will lose interest two years in! Well, we’re two years in now and we have our first discrete GPUs powered on. It’s not possible to drive these things without being hands-on.

IC: One of the things about Xe and the graphics team is that Intel rehired a number of engineers that worked on the Larrabee product, the attempt at an x86 graphics architecture. Even though we got Xeon Phi from that project, the fact that Intel has rehired these engineers means that there are obviously things that Intel can relearn from them and that project. How does that filter into Xe?

RK: Great question. You know, Larrabee, as well as Xeon Phi, taught us great learnings for lots of verticals. That experience helped us particularly with how Intel managed HPC at the time, and how we have built Xe-HPC and Ponte Vecchio today. There are a lot of problems that Xeon Phi solved particularly in relation to memory, coherency, virtual memory, reliability, and all that. Having access to that knowledge is helping us build a product like Ponte Vecchio very quickly. The Ponte Vecchio DNA incudes Gen, Xeon, Xeon Phi and even Itanium learnings.

 

The Goals of oneAPI

IC: Intel is starting to open up about Xe, and it clearly wants to cover a large range of the market, all with the oneAPI stack. Can Xe and oneAPI be everything to everybody at the same time?

RK: That’s a great question. First off, when we look at the scale and reach of graphics, whether it’s integrated graphics where we have hundreds of millions of users, or discrete graphics going into the cloud, one of the central elements is software. What is the software that runs all of this stuff? We have sets of APIs like DirectX, OpenGL, OpenCL, and other languages – and we also have middleware, like game engines that sit on top of the stack. What we want is to be everywhere where there is a software presence.

I will summarize our strategy simply as ‘Leverage, Optimize, Scale’. We are leveraging our existing CPU and integrated software stack and integrated graphics IP. We have invested heavily to optimize our existing IP. The next step was to scale - for the high-end GPUs, we needed to scale over 1000x. As you saw from the Xe-HPC disclosures, that’s our vision of scale.

I think we have a very good strategy here, and that is not something by accident – I’m taking my 20 years of industry experience, plus the two from being here at Intel, and applying it here. The way we approached the Xe design is to take measured steps – we’ve already proved with having silicon in hand.

IC: One of the key things with high performance computing is to know your hardware. In order to extract every iota of performance, you have to know how big your caches are, where the latencies are, memory bandwidth, exact ALU structures, and ultimately build software that is rarely portable in order for it to be so performant. One of the features of oneAPI is to move away from this specifity by enabling software that can potentially work anywhere. How do you reconcile this desire for very specific optimizations and yet have a software package/SDK that is designed to help everybody?

RK: Again, a great question. Our key goal with oneAPI was that no developer should be left behind. Within that, we have worked very hard on the interfaces for what we call ‘ninja programmers’ – the low-level software developers that build the high-performance libraries that everyone else uses. We noticed that these ninja programmers have a strong non-linear impact on the ecosystem, so with our system programming layer inside oneAPI, and some of the abstractions available through oneAPI, will give these ninja programmers control of hardware resources at finer granularity.

IC: Obviously the key operating systems and markets are going to be Windows, Linux, and to a certain extent, iOS. We’ve seen software packages for HPC attack these operating systems very differently, so what is the oneAPI strategy here?

RK: Great question. One of the things we agonised quite a bit over is how we make oneAPI support be very good on Windows. We recognize that the developer footprint with our PCs is a key strength, and we want to enable developers’ access to this stuff easily, whatever PC they pick up. So we put a lot of work in and you’ll see us supporting Windows and Linux in there. For operating systems beyond those, such as iOS, Android, and Chrome, it’s more whether you have on-device support or access to oneAPI service through cloud. This is where our DevCloud strategy, where developers can use oneAPI in the cloud, will come in.

The other thing to say is that with oneAPI, the version we shipped today is the beginning of a long journey. Solving this problem and building the stacks, building the services, will take time. Many innovations are in the pipeline, and this is why God invented something called version 2 and 3!

IC: Part of oneAPI, as you mentioned today on stage, is the ability to translate CUDA code to the oneAPI infrastructure. You’ve been at one company in the past that previously attempted to provide translation tools for its hardware, with varying degrees of success. What can Intel do here differently to make it succeed at scale?

RK: Great question. Portability of code between different parallel architectures has never been easy. There are often key fundamental differences between them, and a particular one is vector width. You can’t take a program that is optimized for a smaller vector width and make it efficient on a machine that has a larger vector width without refactoring the code and all that.

The Xe architecture is actually a narrower width machine - the variable vector width that we have and the ability to switch between SIMT mode and SIMD mode and combine them gives the software guys lots of tools to do more. Now having said that, the tools will take some time to mature. What we are seeing today is that we’re being more productive than prior attempts in the industry. We are also putting the software out ahead of the hardware for productive performance enablement.

 

Gaming on Xe

IC: Turning to gaming solutions, because there is a lot of interest in how Intel is going to attack the gaming space: what we’ve seen today is a compute GPU based on chiplets. Moving from a monolithic graphics chip to a chiplet design is a tough paradigm to solve, so does working on chiplets help solve the ‘multi-GPU’ issue on graphics? Is the future of graphics still consigned to single GPU, or should we expect multiple GPU scaling easier to manage?

RK: That’s a great question. As you know, solving the multi-GPU problem is tough – it has been part of my pursuits for almost 15 years. I’m excited, especially now, because multiple things are happening. As you know, the software aspect of multi-GPU was the biggest problem, and getting compatibility across applications was tough. So things like chiplets, and the amount of bandwidth now going on between GPUs, and other things makes it a more exciting task for the industry to take a second attempt.  I think due to these continual advances, as well as new paradigms, we are getting closer to solving this problem. Chiplets and advancement of interconnect will be a great boost on the hardware side. The other big problem is software architecture. With many interesting cloud-based GPU efforts, I am optimistic that we will solve the software problems as well.

IC: Narrowing the scope down to discrete gaming GPUs, how is Intel going to approach those driver stacks with Xe?

RK: The system programming layer is a key difference between operating systems. The rest of the layers are largely OS independent. So we have a good development strategy here.

IC: With the GPU team, particularly the GPU marketing team, we’ve seen Intel pull in industry talent from a wide variety of sources, such as competitors, analysts, and even some of my former media peers. We’re seeing a strong commitment from Intel for this community, and building excitement for future Intel graphics solutions. To what degree, being in charge of graphics at Intel, are you pushing them ahead with that excitement, or are you telling them to reel it in?

RK: We have incubated a discrete GPU business unit at Intel, run by Ari Rauch. There was a lot of excitement when we announced our graphics ambition, and that attracted a lot of GPU talent from the industry, including for our marketing efforts. They have been doing a good job building up connections with gaming community and leveraging their feedback. My guidance to them always is to ’reel it in’ until we have products! But we will be geared to enable developers and the wider community with our marketing outreach.

IC: Have you discussed how the eventual discrete graphics launch is going to happen?

RK: Not really. We are so much focused on execution right now. But I will tell you a funny story about ’Ponte Vecchio’ name. At Intel we have a policy for engineering code names to places or things you can find on a map. We have had too many ’lakes’ and I wanted to do bridges. Wanted to pick a place that I don’t mind going to for a launch! Florence in Italy has some of best Gelato in the world. And I love Florence and the art and architecture there as well.

 

The Future of Gen Graphics

IC: Is Xe anything like Gen at a fundamental level?

RK: At the heart of Xe, you will find many Gen features. A big part of our decision making as we move forward is that the industry underestimates how long it takes to write a compiler for a new architecture. The Gen compiler has been with us, and has been continually improved, for years and years, so there is plenty of knowledge in there.  It is impressive how much performance there is in Gen, especially in performance density. So we preserved lots of the good elements of Gen, but we had to get an order of magnitude increase in performance. The key for us is to leverage decades of our software investment – compilers, drivers, libraries etc. So, we maintained Gen features that help on software.

IC: As Xe pushes on and products come out, will Intel continue to develop Gen as a separate architecture line?

RK: All of our GPU teams are working on variants of the Xe architecture at the moment. We don’t see a reason for Gen anymore – Xe-LP, our low powered variant, covers the market that Gen covered.

 

The Future and The Vision of Xe

IC: You have been in the GPU space a long time. Is there anything definitive that you can say that Xe will bring to the table that hasn’t been seen before?

RK: In a word, vision. The ‘exascale for everyone’ vision. Solving this requires fundamental disruptions in all layers of technology stack. And I think we have taken a big step towards that with Ponte Vecchio. When I look at our path ahead, I think about how we make that happen. That for me is the foundation of Xe, relative to how the rest of the industry is thinking about things, and all the problems we’ve discussed today: distributed memory problems, distributed computing problems, and computing at scale problems are all essential things in our vision.

IC: Is there anything that the industry should know about Xe that it doesn’t spend enough time thinking about?

RK: It’s a question of scale. I think the impact of 200 million PCs with integrated graphics, moving to Xe, with more performance and better efficiency, is something I don’t see much of the industry appreciating. Intel’s reach and leverage means that a small change can make a big difference – here we are making a big change, and it’s going to have a knock-on effect. It’s a big deal.

 

Many thanks to Raja and his team for their time.

POST A COMMENT

71 Comments

View All Comments

  • Rudde - Thursday, November 21, 2019 - link

    I read the 2002/2003 source, but I didn't find anything about sidechannel, hyperthreading nor cache. What did I miss? What were you trying to prove? Reply
  • DigitalFreak - Thursday, November 21, 2019 - link

    Somebody's figured out how to do italics. Reply
  • Rudde - Thursday, November 21, 2019 - link

    The italic letters show up as boxes for me (on mobile). Reply
  • GreenReaper - Friday, November 22, 2019 - link

    There's lots of Unicode options out there. For example, here's Mathematical Italic Small T: 𝑡
    Of course you may need to have a font that implements it on your device as well.
    Just because a codepoint existed since March 2001 doesn't mean it's been used.
    Reply
  • mdriftmeyer - Saturday, November 23, 2019 - link

    Colleagues of mine would beg to differ with your comments on fab issues. They know they have issues. Reply
  • WaltC - Wednesday, November 20, 2019 - link

    It takes thousands of engineers and other employees to bring new mass-market computing products like GPUs to fruition, and I've always found it amusing that when Intel grabs a couple of AMD's people that the journalist tech-world immediately sees something huge in the deal (or when AMD grabs high-profile employees from other companies...;)) It got ridiculous of late when Intel hired some very nice employee from AMD's PR department...;) Indeed, it got the "Intel grabs another one from AMD" treatment. It was pretty funny, I thought. In this case, however, my thoughts about Raja's departure have been along the lines that he simply had no more to give to AMD--that he'd reached his zenith and knew that the upcoming expectations were likely to be a bit beyond his reach. More or less, I just think he'd "shot his load" so to speak in advanced GPU tech @ AMD and preferred the notion of getting in with Intel on a basic, new product line not nearly so challenging as where AMD was headed. Intel's earlier foray into 3d-compatible discrete GPUs--the horrid i7xx series that looked to AGP texturing for performance that was never there--while both nVidia and 3dfx at the time were using the almost indescribably faster local bus ram from which to texture (it's still being used today, of course!) So Intel got into that market by buying out "Real 3d" (I think it was) and then Intel very quickly abandoned that market as it was obvious their products were not competitive--not even a little bit. I believe I bought (and returned) at least three of them, myself. Talking about hatred and so on--my thoughts about Intel generating bad vibes had to do mostly with how it behaved itself as a company 20-25 years ago--Intel worked very hard to monopolize the markets and to crush every would-be competitor with highly unethical market behavior. Indeed, the *only* CPU company to actually succeed in dodging the Intel corporate axe--was AMD--and the company AMD merged with early on--can't recall the name. But anyway--that explains Intel's bad reputation in certain quarters, I think. And last here--yeeeeman--try and come up with something different from a comparison with AMD's Zen security holes that *haven't yet been found* and the plethora of Intel's vulnerabilities that *have been found*--indeed it seems like more are found every week! It's just not a convincing argument...;) Reply
  • Irata - Thursday, November 21, 2019 - link

    I almost got a PC with an i7xx GPU in it back when they were released. Ads had benchmarks results that showed them far ahead of competing products in the same price range. Thankfully, I did some more checking with friends and on the web (not as easy as it is now).

    I honestly cannot say what I ended up with, but thankfully it wasn't i7xx.
    Reply
  • WaltC - Sunday, November 24, 2019 - link

    Yes! In those days my thing was buying through a local Best Buy because of its 14-day return policy on hardware! That's why I kept buying them--but wound up keeping none of them. I think I must have come close to buying at least one of every 3d accelerator made in those days--at least that looked interesting enough to try and which I could find on a BB shelf.....;) None of the i7xx GPUs were competitive with what 3dfx and nVidia were selling in those days--especially 3dfx. IIRC, the most local ram any of them had was 8 MBs...;) As soon as the GPU needed to reach out through the AGP bus to access the dog-slow system ram for texturing--bam--performance just died--slide-show city, instantly. And that was all Intel wrote on discreet 3d accelerators, and has remained in the iGPU markets ever since. With the ~25-year lead both nVidia and AMD have on Intel in the discrete 3d API GPU product space, I'd say it will be many years from now before Intel can bring something competitive to market--maybe, if they don't drop out of that market again, a la the i7xx debacle. CPUs are enormously different from discrete GPUs, the markets, the manufacturing, and all connected, of course, and I'm not sure that Intel has the drive of ATi/AMD or nVidia in their discrete GPU market. I only know that if Intel once again tries to tie its discreet GPUs to its chipsets and its DDR/5 system--thinking that PCIe5 will be "fast enough" then they'll fail with it, just like they when they made their i7xx GPUs *dependent* on system ram and system-wide buses. Reply
  • Smell This - Sunday, November 24, 2019 - link


    If I recall, the 'guts' of the discreet i740 became the basis of the original integrated graphics chipset __ the Intel i810 (which quickly became the i815 chipset - for Socket A and 'old' Slot-1 processors) ...
    Reply
  • willis936 - Thursday, November 21, 2019 - link

    No one is upset about the discovery of security issues. They are upset that even when given overly generous timeframes to patch the security flaws they still ask for egregious extensions. The universities should adopt the project zero model and give every exploit 90 days full stop no exceptions. Reply

Log in

Don't have an account? Sign up now