Dealing with Guest ISAs and a Translation Layer

Going back to this architecture diagram, everything up to the global front end is another interesting story as well.

Part of Soft Machines' product package is a low level virtual software layer that will translate a guest instruction set and convert it into the VISC ISA. This is to allow VISC to be used with existing software, and to more easily integrate into current environments rather than trying to establish an ecosystem for a new architecture in 2016. Soft Machines tells us that two instruction sets are supported, one of which will be ARMv8. It was implied that x86 would be the other, although they were reluctant to outright confirm it (ed: x86 translation is likely not to be looked upon fondly by Intel). Meanwhile we were told that writing additional translation layers, while not trivial, can be done and that they plan to support other guest ISAs in future.

So for all intents and purposes, this is a translation layer converting from ARMv8 to VISC. Many companies over the past couple of decades have tried with translation layers – Intel with Itanium, Transmeta to x86, and one of the latest was NVIDIA with Denver, which translated ARM to a custom ISA. Mentioning Itanium, Transmeta and Denver, for those who have followed the industry, might bring a chill down the spine given the very limited success each of these platforms have had. Soft Machines’ CEO was keen to point out that the purpose of the translation layer for VISC is very different to these previous attempts.

The VISC translation layer is designed to be a thin and lean implementation whose main role is to maintain compatibility to the VISC ISA, not to extract performance. Taking Denver as the most recent example, the translation layer there is designed to adjust the ARM instructions into Denver’s ISA and extract instruction level parallelism into the 7-wide design. For VISC, we are told, there is no need to go after performance at this level. The main point at which the VISC design increases performance is at threadlet generation, not in translation and making instruction sequences better fit the VISC hardware. This allows the ARM translation layer to have a less than 5% overhead, according to Soft Machines, and releases a point of contention with previous translation layer designs. As long as the translation layer is 100% compatible, the performance can in principle be extracted at the threadlet level.

This also means, again according to Soft Machines, that any specific compiler enhancement offered by others can also be used when translated. We put it to them that in the case of x86 certain codes are accelerated better on Intel’s compiler than say GCC (a question that arose out of the results we’ll go into later), and we were told that those instruction enhancements by ICC should translate well into the VISC ISA after going through the translation layer.

We asked about the VISC ISA, but were told that more information about this and the core design would be released at a later date as designs progress. We were told that it is a relatively small ISA (as to us sounds like a RISC, which is easier to extract ILP at lower power) with smaller instructions in comparison to ARM and x86. I would assume that this means they are fixed length, but this was not confirmed.

The VISC ISA and Core Pipeline Soft Machines, VISC and Roadmaps
Comments Locked


View All Comments

  • Bleakwise - Tuesday, March 14, 2017 - link

    I mean IBM does this with the POWER8 very successfully.
  • Bleakwise - Tuesday, March 14, 2017 - link

    If you would like to know how an Superscaler CPU can beat an in-order CPU....

    So a Processor with 6 pipelines can do
    1*2*3*4*5*6 in one instructoin
    a processor with 12 piplines can do
    in one clock cycle

    This is the opposite of hyper threading, which allows my 4770k with 5 pipelines to do
    1*2*3 and 4*5
    1*2 and 3*4*5
    all in one clock cycle.
  • jjj - Friday, February 12, 2016 - link

    What they do with A72 in their slides is a huge red flag. They clock it above 3GHz on 16ff to make it look bad. When you don't need to distort the truth why do it? Was excited about them but they lost all credibility with this.
    vs ARM it will be hard for them ,assuming ARM will have yearly updates and a broader range of cores. Area will also matter a lot Ofc vs ARM the proper math when it comes to perf, power, thermal and area would be to include dark silicon. ARM is at 8-10 cores in 2-3 clusters but we might see even more than that (i would add a gaming cluster, as GPU perf is a rather complicated problem right now).

    Hope we do get to see them in commercial products and i wonder about their longer term plans. Would be interesting if they would aim for a lot more cores at very low power and even cooler if they would aim to use different types of cores - as undoable as all that might be lol. For glasses we need a huge step forward that process and packaging might fail to enable soon enough and even server might find such a path preferable. Would love to see 1T 32PC at 50-100mW on 5nm. Or ,to just go crazy, would be great if they could reach low enough power (thermal) to stack logic and go monolithic 3D since folks are not quite able to do that , for now.
    Guess , it would be great if you could ask them how far they think they can push with the number of cores in a thread.
  • gamerk2 - Friday, February 12, 2016 - link

    Odds are, Soft Machines gets acquired by Intel (who want a low-power core for mobile. And hey, ARM support to eliminate the lack of mobile X86 software to boot) or NVIDIA (who want a CPU core, and hey, already have ARM based tablets. X86 support is a bonus an could allow full NVIDIA branded PCs).
  • jjj - Friday, February 12, 2016 - link

    It would be easier for Intel or ARM to just copy. Additionally, a sale to Intel would be difficult with Samsung and AMD as investors in SM.
  • fiodhkf - Friday, February 12, 2016 - link

    I don't understand these results. How are skylake specint and spefp scores so low? On the weakest skylake part I could quickly find is Celeron G3900 at 2.8 GHz and 2MB L3 (and huge power consumption, but let's ignore that for now). It has CINT2006 of ~45 and CFP2006 of ~61. Can i5-6200U be that much slower?
  • extide - Friday, February 12, 2016 - link

    Because those are NOT the results of a skylake chip, those are their adjusted results of a chip that is equivalent to skylake, but with 1MB L2, no L3, and made on TSMC's 16nmFF+, which is a chip that will NEVER exist in the wild and is POINTLESS to compare to as these guys will never be competing against a made up chip, only the actual stuff released by Intel, and other people.
  • fiodhkf - Friday, February 12, 2016 - link

    In the second Performance/Watt comparisonfigure the blue curve is supposed to(?) show the true unscaled-for-cache skylake (power is probably scaled to TSMC 16nmFF+, but surely they're not scaling the performance as well). Even there the skylake spec scores are only about half of what they should be according to results on
  • Exophase - Friday, February 12, 2016 - link

    The scores are using ICC, which has optimizations that game a few SPEC2006 subtests like crazy. They also apply auto-par and pointer compression optimizations that aren't applied in GCC. There's also some extra optimizations for peak if you're looking at that but it doesn't make a huge difference in the overall score.

    All of this adds up to big differences in SPEC score.
  • fiodhkf - Friday, February 12, 2016 - link

    Thanks, that was pretty much what I guessed would be one explanation for the difference. Still, I'm a bit surprised with the low skylake scores even when compared to some (old) AMD processors where scores used open64. But I don't care quite enough to try myself.

Log in

Don't have an account? Sign up now