Xilinx Announces Project Everest: The 7nm FPGA SoC Hybrid

Author: Dr. Ian Cutress

by Ian Cutress on March 19, 2018 7:40 AM EST

Interview with Victor Peng, CEO of Xilinx

Despite only being in the job for a little over a month, Mr. Peng is well positioned at the helm of Xilinx’s future. He has been at the company for over ten years, having previously been COO (managing Global Sales, Product and Vertical Marketing, Product Development and Global Operations) and EVP/GM of Products. His background is in engineering, holding four US patents and a Master's from Cornell, completing the trifecta of research, sales, and marketing that a CEO typically requires in their toolset. Prior to Xilinx, Mr. Peng was Corporate VP of the Graphics Products Group for silicon engineering at AMD, as well as having spent time at MIPS and SGI.

Ian Cutress: Congratulations on the CEO position! What has it been like for the first month? Have you had to hit the ground running?

Victor Peng: Oh thank you, I appreciate it! It has been pretty busy, but it is as expected I suppose. It is very exciting! It is busy for a few reasons, as this is our 4th quarter, so we are not only getting ready to finish the fiscal year, but also we are planning all the budgeting (laughs). It is just a busy quarter in general. We have a new Sales VP who just started yesterday, so I've actually been running sales and I've been wearing two hats so it's been pretty busy. Starting to talk about the new vision and strategy is also part of the journey, but it is all good.

IC: Is the new strategy an opportunity to put your stamp on the role?

VP: It is, but I don't think of it that way. I have this steward model of leadership – Moshe [Gavrielov, former CEO] took the company to a great place and we have this great foundation that we've built up and all these strengths that we have. You know, obviously I was a part of that too when I was on the product side, but I just think that this is the right time, both from the kind of things that we've built up internally in terms of products and technology, and also from the industry perspective like we discussed: the nature of the workloads that are going to be driving computing from end to end. It is fundamentally different than what it was a few years ago. For example, somebody asked the question: could you have done an ACAP prior to 7nm? Well we could, but it would be in certain respects maybe a little less powerful, more costly perhaps, and the coming together of all those things at 7nm makes this just the right time for the company to take this more quantum leap. It's not about legacy, but of course it also remains to be seen how long I will be CEO!

IC: The march into the data center is a story that a number of companies have been telling over the past few years. So the topic of the day is obviously the new hardware that is still a year away - what makes now the right time to discuss the ACAP and the approach into the data center?

VP: From a compute perspective we are already in the data center - Amazon today deploys our 16nm products, and some others even deploy 20nm, but most of it for now is 16nm and I think 16nm will certainly continue for the next couple of years as we bring in our 7nm product. Last year was the first year we really made very significant progress in the data center, not only because that was the year that FaaS [FPGA as a Service] got announced and deployed, but because of other engagements that we have in storage as well as in networking. We are engaged with a number of businesses: all the hyperscalers as well as some OEMs. The models that people are using are not only FPGA servers, but also just internal private acceleration.

I think, you know, we started before 7nm and I think that's a good thing, because if we were starting just at 7nm and people didn't even have the familiarity with us, then why would they use our platform to begin with? We would have an even bigger hurdle, but I think that now that 16nm will get the mindset out there, it will hopefully give people a little bit of understanding as to what we offer and why it has value.

7nm ups the level of our platform significantly, and I think during the 7nm time frame we will certainly have growth that reflects our product portfolio.

IC: Is there no desire to use half nodes like 10nm?

VP: We didn't do that for a variety of reasons. For one, if I just looked at our product cadence and when 10nm would come out, it was not a good lineup. But the other thing is that the delta between 10nm and 16nm, compared to 7nm and 16nm, is much less significant. Let’s face it, as you follow a lot of tech, it is really the handsets that need this almost annual thing. Because Moore's Law is no longer on track, everybody is coming out with variants to their process: plus, plus-plus, and this-that-and-the-other-thing. This is largely because of the handset guys: every year they have to come out with something, because every year there is a Christmas holiday and a consumer business.

So not only for us, but for other kinds of more high-end computing applications, they don't refresh every year, so we've just picked the right technology.

IC: For the ACAP, it feels as if Xilinx is moving some way between an FPGA and an ASIC, having hardened controllers on silicon and implementing more DSPs and application processors. Is that a challenge to your customers that are currently purely FPGA based?

VP: Well I first want to challenge that a little bit. I think in a sense you are looking at an implementation perspective, and if you are saying that there are blocks that are implemented with ASIC flow, that is true. But even the things that we are implementing with ASIC flow, we are still maintaining some level of hardware programmability. At the end of the day, we are going to play to our strengths, right? Our strength is our understanding in how to enable people to use the power of hardware programmability. We crossed the chasm of being only hardware programmable to being software and hardware programmable.

It's a little bit hard to talk about the new product without pre-announcing some of the features, but I talked about the hardware/software programmable engine, which won't be an engine that is customizable down to a single bit, but it will have notions of some granular data paths and memory and things like that, and it has some overlay of an instruction set. While most people won't program it at that level, it still has hardware programmability. We’re not just somebody else coming out with a VLIW multi-core architecture, because then what is the difference between us and someone else, right? There's always going to be that secret sauce.

Also, there's no value for us in creating a custom CPU, like an Arm SoC - not for the applications we want. If we were doing a server class processor you would need to take an Arm architecture license and create your own micro-architecture, but there's no value in us doing that, though we do have embedded application cores in our ACAP. So, some of the things are implemented from an ASIC flow and are just like any other hard ASIC. But also I'm not going to create a GPU architecture either - some of our products have a GPU in them, a very low-end GPU, more for 2D kind of interfaces and stuff like that.

But when there is heavy lifting, like acceleration, there I will always try to find a way where we could add value in terms of the programmability. For the challenge of the programmability to the customer, we have already crossed that chasm once with Zynq, so you don’t need to be a hardware expert anymore to use these. There is a software team working with that, and now with SDSoC and some of these design environments you don't absolutely need the hardware design anymore, because there are just many more systems and software developers than there are FPGA designers, right?

IC: One of the things that came out today was your discussion on implementing libraries and APIs to essentially have the equivalent of a CUDA stack for the Xilinx product line. Training people on that sort of level of API is something that NVIDIA has done for the last 8 years. Is that the sort of training program that is appealing to help improve the exposure of these types of products?

VP: I'm not the one to get into the details, but I actually have people on my team that feel like we're even easier to use than CUDA! But what I would say is in general we are going to try to enable more innovators, and if the innovators are used to things like the TensorFlow framework, we'll do that. You know we have even enabled Python, right? Because for younger programmers it is probably more relevant to do Python than have to do something like C or C++ or something like that. But in other areas people still develop in those other languages, so you know it is really all about us trying to enable more innovators with the framework and development experience they are used to – we are going to try to match those as much as possible.

At some level there is going to be a compile step, which isn't going to be like software. Because it is physics, we are actually going to have to do something, but it doesn't mean that they always have to do it. As you can imagine, when you are in a development cycle you could do very quick compiles and then work out what you're doing. Here's the production thing though - you could take longer with a new platform, but we're trying to minimize that. The general mantra is to try and make the experience like any other software target or platform, but with a custom hardware engine underneath the hood that they don't have to really muck with.

IC: I mean with something like the ACAP it is clear that cloud providers would offer this as a service, as they want a multi-configurable system for their users. But beyond the cloud providers, are people actually going to end up deploying it in their own custom solutions - with this amount of re-configurability, do you actually see them using it in a ‘reconfigurable way’ rather than just something that's fixed and deployed?

VP: That's a good point - I think that it wouldn't be necessarily that everybody dynamically configures it when it is deployed, but we do see that and I'll give you an example as far away from the cloud as you could imagine. There are a lot of people in testing instruments. Now some test instruments are kind of like a reconfigurable panel, with people moving the panels that have hard knobs, so if you can virtualize the interface when they select things, they can just reconfigure it to do a different thing with some of the guts of the electronics - for example, a situation where eight analogue testing components are being rotated. Like anything else, they try to move to digital as soon as possible, so the ability to completely reconfigure something is vital.

IC: Would you consider that a mass scale deployment, rather than prototyping?

VP: No it's not prototyping – it is a tester, but somebody using it on the bench. If they want the scope to do this or that, they can change the functionality on the fly. It is deployed, right, up and down in many engineering workshops.

That's a good example, especially if we look at telecommunications. 5G is not really a unified global standard – it is multiple standards that vary by geography, by region, and because it's going to be around connectivity there are new band radios, as well as different form factors and different regions and even down to different large customers that demand certain things. So our traditional customers in communications still need some degree of customization. One of them said to me that last year they had to do on average 2+ radios a week, because they always need to customize something about the frequency bands or the wavelengths or something about the radio. So even if it is stuff they have already built, they always still have to have some degree of customization now. Within that deployment, there will be a customer that has to change things. I could tell you that in aerospace and defense, there are applications where, for security reasons, people actually want to reconfigure dynamically when it's deployed.

So it's a range - some will reconfigure when it's deployed, some will do it when they are trying to scale their product line, but in the cloud whether it's public or private I think, clearly there will be different workloads right.

IC: The ACAP schedule is for tape out this year, with sales and revenue next year. Aiming at 7nm means you're clearly going for the high end, and previously you said 16nm would have had deficits compared to using 7nm. In time do you see this sort of type of ACAP moving down into the low end?

VP: I'll put it this way - our selection of technology is primarily based on what is the right match for the end solutions. Last year we did some new 28nm tape outs, despite the fact that we had finished our original plans at 28nm a long time ago. But for the more cost and power sensitive segments, and we're talking about $10 and below kind of segments, you don't need 20nm or 16nm, and it might be kind of hard to reach that price point too in a lot of cases. So, we will still do some of those older technologies for other products, but there are some new IP blocks that are brand new only in 7nm. It is quite costly to implement a new IP block and the technology after all. But if there is a big enough opportunity and we have important enough customers or a growth segment, we would do it.

IC: Any desire to make products like these act as hosts, rather than just attached devices or attached accelerators?

VP: Well yeah, we have the Zynq product line in the data center today, and pretty much people are using pure FPGAs in the compute side and in some other areas people are finding use for a remote processor that doesn't have to do as much heavy lifting - in fact in most cases it's doing some sort of management function. So, I’d say yes, today because it's mainly a pure accelerator and it needs a local host, but I think in the future we will see that change.

I mean basically you don't want to have to go back to that host – it is costly on multiple levels and if you have a host that can do it locally, such as the embedded Arm core, you don't want to use that expensive CPU cycle that's remote, as overall you lose performance.

IC: Back at Supercomputing we saw a research development project about just network attached accelerators, so not even with a host. You just attach it in and...

VP: Exactly, and actually that's how Microsoft has eliminated this issue – it is connected to the network and so they can talk peer to peer to others. They don't necessarily even have to go to the CPU at all, but if they do, generally it's just like a CPU process - it's more like a batch, do a big job, come back, so you don't have to have lots of communication.

IC: Obviously the biggest competitor you have in this space (Intel/Altera) has different technologies for putting multiple chips together (EMIB). Is Xilinx happy with interposers? Is there research into embedded package interconnects?

VP: I think it's all about what problem you're trying to solve. If you need massive amounts of bandwidth, the interposer is the way to go, based on TSMC's CoWoS technology, which we call SSIT. You know InFO (TSMC’s Integrated Fan Out) is much more limited, so if you don't need huge amounts of bandwidth and you don't need to connect many chips, InFO could be interesting.

It's all about technology. I mean we're not like religious about it, and if you could do what you want with InFO or a traditional MCM then OK, but it is really all about what you're trying to do. But yeah, we are quite comfortable because generally for the problems we are trying to solve, there is a need for a lot of bandwidth and fairly low latency. We've been doing this since 28nm, so you know 16nm is like our 3rd generation and 7nm will have both monolithic chips and we'll still use interposer technology. I mean that's why we could do this chip right [holds up Virtex UltraScale+ with HBM].

IC: There's a bit of a kerfuffle with your competitors about who was the first to have HBM2, comparing announcing versus deploying.

VP: I was just going to say! Even before they were part of Intel, Altera was pretty good about announcing and showing technology demonstrations, but when you look at when you can actually deploy something into production, that is another story. You know more recently we've actually both announced earlier and deployed earlier, but in this particular case, okay, they claim that they're getting out there sooner, but we'll see who gets out there in production. Because remember, EMIB has never seen production in anything yet, so you know, this is like I said - our 3rd generation of production, and high volume is very reliable for our kind of customers.

IC: With Zynq already using Arm processors, will the ACAP application processors also be Arm, or could they have RISC-V cores, or others?

VP: We're still going with the Arm architecture. I think it is the broadest architecture for embedded applications and, you know, in the case of the heavy-duty kind of acceleration where the host is off board, it could be anything. For the off board host, we interface with OpenCAPI, and obviously we're driving CCIX, and we are working with PCIe, and at SC17 we had a demo with AMD's EPYC server together with our processor.

IC: For Project Everest, can you comment on security and virtualization?

VP: We've been working on security quite a bit. We traditionally leverage a lot of things into the more pure FPGA but with Everest, as they all have SoC capabilities, everything will be leveraged together. I think everybody knows that with the world becoming very smart and connected it means the attack surface is pretty much the entire world. Since we are in applications like automotive, they care a lot about that. The hyper-scalers care a lot about security as well, except they don't have to worry about physical tampering as much, but in the automotive we actually have to care about physical tampering as well as software attacks. So you know we do TrustZone, we have all kinds of internal boots so that people won't see memory traffic over interfaces. We have stuff for defeating DPA, we have all kinds of things. In fact, if we detect anomalies, it auto shuts down. Because we do get a lot of very secure customers, such as the aerospace and defense industries, we work with the agencies that are very sensitive to this thing.

We also get involved in safety, and it is interesting because they are not exactly the same thing - safety is about ensuring the thing doesn't harm people, and security is about people not harming the thing. But we're finding more and more applications that care about both. An example of that might be robotics, drones, etc.

One other thing about security that I think is really good about our technology - the security people really like it if you can do redundancy, but implement it in different ways. Because of the diversity, if you attack one thing, it doesn't mean you can attack the other. The fact we can run things in the applications processor, the real time R5s, and in the fabric, means the level of diversity you can get, and then poll to see if you get the same results, is quite a bit richer than, you know, what a fixed-function chip could do.
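As a rough illustration of the diverse-redundancy idea described above, the sketch below computes the same 16-bit checksum in three deliberately different ways and then polls the results with a majority vote. It is only a minimal software model under stated assumptions: the three functions are hypothetical stand-ins for an application core, a real-time core and a fabric-style adder tree, and none of the names correspond to a Xilinx API.

```python
from collections import Counter

# Hypothetical stand-ins for three diverse implementations of one function.
# All names here are illustrative assumptions, not part of any Xilinx tool.

def checksum_app_core(data):
    """Straightforward version, as might run on an application processor."""
    return sum(data) % 65536

def checksum_realtime_core(data):
    """Running accumulator with masking, as might run on a real-time core."""
    acc = 0
    for value in data:
        acc = (acc + value) & 0xFFFF
    return acc

def checksum_fabric_style(data):
    """Pairwise tree reduction, loosely modelled on a hardware adder tree."""
    values = list(data)
    while len(values) > 1:
        if len(values) % 2:
            values.append(0)  # pad so every element has a partner
        values = [(values[i] + values[i + 1]) & 0xFFFF
                  for i in range(0, len(values), 2)]
    return values[0] if values else 0

def vote(implementations, data):
    """Run every implementation and return (majority result, dissenting indexes)."""
    results = [impl(data) for impl in implementations]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority among redundant implementations")
    dissenters = [i for i, r in enumerate(results) if r != value]
    return value, dissenters
```

Calling `vote([checksum_app_core, checksum_realtime_core, checksum_fabric_style], data)` returns the agreed value together with the indexes of any implementations that disagreed, which a supervising system could treat as a sign of a fault or of tampering.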

IC: With Everest, with things like the on-chip interconnect, is that Xilinx's IP?

VP: The NOC that is connected to all the blocks, that is our design. Internally in our SoC, we actually licensed a 3rd party NOC. But the reason we chose to make the main NOC proprietary is because the requirements we needed for this adaptability are a little different. If it looked like everything was an ASIC block, we could in theory use a licensed IP, but there are some unique things to do when you have these very flexible things that are being implemented in lots of different ways.

IC: So what sort of different things can you do?

VP: This is probably where I should get one of my architects! It's probably best for me not to get into comparing some of these architectures from other companies - we looked at them, and as I mentioned we actually use one of them within the SoC block. But we found that we couldn't use it for our own, and they didn’t have the tools and things that you have to give people in order to interface to that.

IC: Most of our readers are familiar with the ARM interconnects because of mobile chips, so I could assume that you'll have one of those inside the SoC block.

VP: We do, and in fact for most of our IP, even in our soft IP blocks, most of the interfaces are very AXI-like. We helped drive some of the design - they had heavyweight AXI for some blocks, and you really don't need that heavyweight in a sense, so we worked with them on the streaming, using a more slim AXI. That is a good point, though: what I should really do when we announce the details of the ACAP is compare and contrast the differences, which helps people because they have a reference point.

Many thanks to Victor Peng and his team for their time.
Also thanks to Gavin Bonshor for transcription.

16 Comments

davegraham - Monday, March 19, 2018 - link
actually, the adaptive nature of hardware is becoming more and more interesting. they blew right by it in the article but with the introduction of CCIX, you will start to see the ability to have coherency within a system (similar but slightly different than Torrenza from a while back) for these plug-in accelerators. establishing this level of "fairness" and coherency amongst accelerators and giving them precedence (esp. on AMD drive x86 compute ;) ), will allow the development of much more agile hardware. you could also think of driving coherency thru, let's say, CCIX tunneled thru Gen-Z. ;)
iwod - Monday, March 19, 2018 - link
1. Get it on AWS
2. Get Netflix to contribute on codec encoding.
3. Get Limelight to try figure this out for CDN.
4. Partner with AMD EPYC
ZolaIII - Monday, March 19, 2018 - link
Well, an FPGA is like a clean sheet of paper on which you can write whatever you want & then delete it & write something else, so they are in the real meaning of the word the universal accelerators. When paired with enough storage RAM (which current HBMs still cannot provide) they become suitable for large data sets such as scientific ones, but they do need direct low-latency RAM to achieve real usability. So this is just another small step in the right direction, as far as I understand this still won't be 100% autonomous, stand-alone operational. For that a sizable number (four is perfectly enough) of general purpose cores is required, & not weak ones but also not high-end HPC ones; let's say custom server ARM ones could fit in perfectly (therefore I don't understand the statement that they don't need them). It will also require a powerful enough GPU (mobile licensable ones are perfectly fine) that could meet the need for detail-accurate 3D model representation, not a huge crunchy desktop one, which would be overkill in efficiency and many other things. As I understand it they didn't put anything like that in. A basic 2D one simply won't fit anything more than a base interface and results-by-numbers representation. So this is another step in the right direction but we still aren't going to be there with this. The main advantage of the FPGA is that it can be utilised to simultaneously execute a medium number of varied tasks (by simply applying a couple of designs on partitions of the programmable area [an ASIC for this & an ASIC for that, or a real neural network, multiplier... basically anything as long as it can fit in]) & as soon as they are done, apply the most suitable ones for new tasks and reprogram it (again part by part), & best of all it's never outdated as you can always program newer & more perfected algorithms.
Intel for now took a bit different approach by adding a limited-area FPGA to the HPC many-core Xeon, so that the FPGA remains only a second league player, big enough to host a couple of smaller ASIC designs & suitable only for fast-switching uses such as networking. Still, maybe that changes, & their development of an in-house GPU brings them a step closer to being able to make autonomous stand-alone ones; if nothing else it will give them a simpler way to interconnect the GPU better. Interestingly enough there still isn't any player in the industry that can put it all together by itself. Intel has CPU design and FPGA but it lacks a GPU, QC has a CPU & GPU but it doesn't have an FPGA, Xilinx has only an FPGA, & although IP licensing of, let's say, PowerVR graphics would fit the GPU need, they still can't license powerful enough CPU cores as reference ARM designs aren't there yet (that's why vendors are making custom designs in the first place, especially server-suitable ones), but who knows, maybe this changes in the near future.

In the end, even when a suitable autonomous platform appears as a SoC, it will in the first place only be a developer platform for both the scientific and commercial community & it will take some time before it becomes useful to secondary developers (aka programmers) & only after that to the general (consumer) public. Nevertheless this is the way things will have to go, as we simply can't add more & more dead black silicon no matter how much someone is lying to present it as most optimized & best suitable.
Threska - Monday, March 19, 2018 - link
I see what they're trying and wish them well. However, one of the biggest issues is FPGAs keeping up with everything else. The other is having enough people with the needed skillset. Programming computers is different than programming FPGAs.
ZolaIII - Monday, March 19, 2018 - link
That's why adoption, after they actually produce a complete SoC with a dominant FPGA, will have 3 stages: an engineering/scientific one, a second one adapted for high-level symbolic programming, & a consumer one as the last & final one, although the first stage will be a never-ending one.
modport0 - Tuesday, March 20, 2018 - link
I wonder what the power consumption range of these is. It seems that Xilinx is going for the high-end. Outside of data centers (which are also concerned about power consumption), FPGAs are typically used for prototyping or other low volume applications.

From what I hear from murmurs during conferences/conventions, despite all the PR, MS (uses Intel FPGAs) and others are struggling to justify continued use of FPGAs in data centers.
