But 16FF+ Silicon Exists

One of the salient points of our talk with Soft Machines was the fact that silicon talks louder than simulations, and their CTO was honest enough to say so before I even had the chance to raise it. The 28nm design was shown in 2014 along with data, but no 16FF+ design had been made public since. Soft Machines were happy to share with us that they do have the core design for 16nm at HQ being examined:


16nm Silicon of a Shasta design

This is literally a test chip of cores rather than a full SoC, and they are currently correlating the simulation data against the silicon. We were told that the design errors in the 28nm silicon, such as the cache not flushing properly, have been fixed. The new silicon also includes power plane management, although customers are welcome to apply their own power plane adjustments.

The goal, according to Soft Machines' numbers, is to provide a Shasta core on an optimized 16nm FF+ process at 2GHz at around 2W. They also intend to scale the design from SoC to server, covering a range of 0.5W up to 5W per core. Because there is only one 16FF+ part-SoC early run currently at their headquarters, it remains to be seen whether that is possible, and it will require a partner or investor willing to get their hands dirty with the technology first.

Before someone jumps up and says "is platform XYZ going to use VISC?", it should be fairly obvious from most public roadmaps covering the next 1-2 years that major platforms will not be using VISC. What we see on public roadmaps is a mix of ARM and x86, and the fact that VISC is a different ISA under the hood (which can run native VISC code without translation) means that there has to be an ecosystem change. Soft Machines, with their announcement last week, is at this time principally fishing for clients, investors, and potentially something more.

The main reason this design has attracted so much attention in the media and among analysts is its potential. Being able to have many lightweight cores that can share resources between threads would be a major milestone in semiconductor design and the next point in the CISC/RISC lineage. It epitomizes the idea of having all of the hardware working on a task no matter what it is: many slower, power-efficient cores working together on a single thread, or one fast but inefficient high-power core, as the workload demands. If you can spare the die area and have a good ISA translation layer, this opens up some of the power budget in a power-limited device. Discussion around laptops and smartphones is usually all about power, although Soft Machines believes this can impact servers just as easily.

Arguably, one could state that future processors will have to do something like VISC in order to get better IPC: when a thread needs a large, wide core, a VISC design can become one on demand. Technically we already have semiconductor designs that work very well on prepared data, with vector calculations and graphics handled by lots of small, simple cores in their thousands. But these only work with consistent data, where the same calculation is applied to every data point; with a VISC design, the code can be complex with dependencies, and the virtual cores will shrink or expand as needed. A lot of questions surrounding the translation layer are to be expected: whether it can be as watertight as possible when other ISAs are passed through (ARM to VISC, x86 to VISC), and whether it can also take advantage of the compiler benefits that SMI claims.
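As a purely illustrative sketch (not Soft Machines' code, and the function names are our own), the two C loops below show the distinction. The first is the uniform, independent work that GPUs and SIMD units already handle across thousands of simple lanes; the second has a loop-carried dependency that defeats that model, and is the sort of code where a wider virtual core would instead try to extract whatever instruction-level parallelism exists around the serial chain.

#include <stddef.h>

/* Data-parallel: every element is independent, so thousands of simple
   GPU/SIMD lanes can each take one element. */
void scale_array(float *out, const float *in, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Loop-carried dependency: each iteration needs the result of the
   previous one, so the work cannot simply be split across identical
   lanes. A VISC-style design would instead widen the window around
   each iteration, issuing the independent address math, loads and
   compares across its pooled execution resources. */
float chase_and_sum(const float *values, const int *next, int start, int steps)
{
    float sum = 0.0f;
    int idx = start;
    for (int s = 0; s < steps && idx >= 0; s++) {
        sum += values[idx];   /* depends on idx from the previous step */
        idx = next[idx];      /* pointer-chase: a serial chain */
    }
    return sum;
}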

As it stands the design promises a lot, but until we see a proper silicon implementation, and until a company in the technology ecosystem decides to take that step, it is hard to gauge. It would be an interesting differentiation point for sure, but it requires investment to reach utility in mass production. That makes a number of analysts wary and conservative with good reason, especially given the assumptions behind that data graph.

Soft Machines has invited us to their offices the next time I'm in the Bay Area, an offer I will probably take them up on.

Sources:

Soft Machines
Microprocessor Report
2014 Linley Conference Video
2015 Linley Conference Video

97 Comments

  • kgardas - Saturday, February 13, 2016 - link

    The market is not dominated by excellent technology, but by the average or mediocre in fact. "Good enough" is the enemy of any excellence.
    Also, in comparison with AMD64, which is just a pile of hacks to prolong the life of the x86 architecture, IA-64 was a clean design on a green field, and it's really a pity Intel can't push that further -- also due to AMD64's existence in the market.
  • Alexvrb - Monday, February 15, 2016 - link

    That kind of thinking is what led to the Itanic, the unsinkable chip. Too bad they ran afoul of a giant costberg.

    But no, you're right, everyone should switch to a new ISA because Intel says it's better, even though it means switching to an Intel-exclusive ISA that will cost you dearly now, and even more dearly later when you become dependent on a product that only Intel can produce.

    If Intel's only desire was a better design for everyone, they would have worked with AMD and freely extended IA-64 licensing agreements to them so they could both produce IA-64 chips. The outcome could have very well been different in that scenario. But that is not what they did, and they paid for it. Of course, Intel is such a giant that they can afford to take such failures in stride. AMD cannot afford another flop - Zen, Polaris, and eventually a ZenPolaris APU have to achieve at least a significant degree of success.
  • FunBunny2 - Monday, February 15, 2016 - link

    -- Of course, Intel is such a giant that they can afford to take such failures in stride.

    The irony, of course, is that Intel got where it is just because, in ~1980, Intel had one foot in Chapter 7 and one foot on a banana peel. IOW, an easily controlled peon for IBM to abuse, and thus the 8088 came to be. If IBM had secured the BIOS, life for both would have turned out rather differently, I suspect.
  • Alexvrb - Monday, February 15, 2016 - link

    Yeah that is ironic. Intel was trying to avoid making a similar mistake, and in doing so they screwed up - but the SIZE of the failure was tiny in comparison. One thing about Intel is that they have better foresight and planning than IBM. IBM always was caught up in their own world. Intel probably was working on backup plans for Itanium failure before it even launched, regardless of how high they thought its chances were.
  • diediealldie - Friday, February 12, 2016 - link

    Thanks for the great article.

    Anyway, I'd wait for actual working silicon at a high frequency (not 0.5GHz) to figure out whether it's real or not.

    Since they're building an abstraction layer in real silicon, demonstrating it on a slow chip will not be enough to convince industry experts (the hardware will be complicated, so making it work at a very high frequency is also a big challenge).
    High frequency chips require more pipeline stages, so latencies and cache efficiency get worse, hardware blocks stop working, etc., so a high frequency chip is what Shasta really has to demonstrate.
  • dcbronco - Friday, February 12, 2016 - link

    Interesting that AMD are switching to SMT with Zen and are one of the big financiers of Soft Machines, and VISC works well with SMT. I also wonder if an OS written for VISC would give a boost to APUs, or would the bottlenecks kill any advantage.
  • zodiacfml - Saturday, February 13, 2016 - link

    Interesting, as the numerous patents created can be beneficial to other CPU makers, but for them, creating a compelling chip that could sell would be miraculous.
  • haplo602 - Saturday, February 13, 2016 - link

    Lots of marketing and even more estimates and projections. Looks like a very long road ahead to an actual working chip with peripherals, and a lot of obstacles still to clear.

    While the idea is interesting, it is highly impractical. The higher the frequency they go, the more the final compositing overhead will bite them. There's a cost to traversing the layers of the design from thread to CPU and back with results. I do not see that explained anywhere.

    Another point is, I think, that this is not a new idea. It is a fairly obvious extension, so I expect Intel already went that route and met a dead end.
  • vladx - Sunday, February 14, 2016 - link

    Intel are too conservative to come up with such ideas; they'd rather milk the cow as long as possible.
  • hMunster - Sunday, February 14, 2016 - link

    There's one important question I don't see addressed: how do you run at higher IPC when you have a conditional branch every few instructions? A 16-wide virtual CPU using 4 physical cores is all nice and dandy, but 16 instructions will, in normal x86 code, contain at least 2 branches. I can't see them doing a lot of speculative execution because that drives up power consumption. So how do they solve this?
