Dealing with Guest ISAs and a Translation Layer

Going back to this architecture diagram, everything up to the global front end is another interesting story as well.

Part of Soft Machines' product package is a low level virtual software layer that will translate a guest instruction set and convert it into the VISC ISA. This is to allow VISC to be used with existing software, and to more easily integrate into current environments rather than trying to establish an ecosystem for a new architecture in 2016. Soft Machines tells us that two instruction sets are supported, one of which will be ARMv8. It was implied that x86 would be the other, although they were reluctant to outright confirm it (ed: x86 translation is likely not to be looked upon fondly by Intel). Meanwhile we were told that writing additional translation layers, while not trivial, can be done and that they plan to support other guest ISAs in future.

So for all intents and purposes, this is a translation layer converting from ARMv8 to VISC. Many companies over the past couple of decades have tried with translation layers – Intel with Itanium, Transmeta to x86, and one of the latest was NVIDIA with Denver, which translated ARM to a custom ISA. Mentioning Itanium, Transmeta and Denver, for those who have followed the industry, might bring a chill down the spine given the very limited success each of these platforms have had. Soft Machines’ CEO was keen to point out that the purpose of the translation layer for VISC is very different to these previous attempts.

The VISC translation layer is designed to be a thin and lean implementation whose main role is to maintain compatibility to the VISC ISA, not to extract performance. Taking Denver as the most recent example, the translation layer there is designed to adjust the ARM instructions into Denver’s ISA and extract instruction level parallelism into the 7-wide design. For VISC, we are told, there is no need to go after performance at this level. The main point at which the VISC design increases performance is at threadlet generation, not in translation and making instruction sequences better fit the VISC hardware. This allows the ARM translation layer to have a less than 5% overhead, according to Soft Machines, and releases a point of contention with previous translation layer designs. As long as the translation layer is 100% compatible, the performance can in principle be extracted at the threadlet level.

This also means, again according to Soft Machines, that any specific compiler enhancement offered by others can also be used when translated. We put it to them that in the case of x86 certain codes are accelerated better on Intel’s compiler than say GCC (a question that arose out of the results we’ll go into later), and we were told that those instruction enhancements by ICC should translate well into the VISC ISA after going through the translation layer.

We asked about the VISC ISA, but were told that more information about this and the core design would be released at a later date as designs progress. We were told that it is a relatively small ISA (as to us sounds like a RISC, which is easier to extract ILP at lower power) with smaller instructions in comparison to ARM and x86. I would assume that this means they are fixed length, but this was not confirmed.

The VISC ISA and Core Pipeline Soft Machines, VISC and Roadmaps
Comments Locked

97 Comments

View All Comments

  • ddriver - Saturday, February 13, 2016 - link

    All abbreviations are capitalized, not just acronyms, idiot. Whether it is an acronym or not depends on how it is pronounced.
  • erple2 - Saturday, March 12, 2016 - link

    Enough people confuse initialisms and acronym that it probably doesn't matter anymore.
  • FunBunny2 - Friday, February 12, 2016 - link

    If SMI wants this to be believed, then just publish a paper (in a peer reviewed journal) showing how VISC invalidates Amdahl's Law. This is, after all, what they're really claiming.
  • willis936 - Friday, February 12, 2016 - link

    Could you explain how they're claiming to invalidate Amdahl's law ?
  • FunBunny2 - Friday, February 12, 2016 - link

    as I read the piece, SMI is implying/claiming performance improvement in running serial code in a parallel fashion, and Amdahl says you can't do that. if, OTOH, the claim is that VISC is able to suss out parallel execution in superficially serial code, then that process has to be proven to exist algorithmically. as the piece goes to some length to describe, based on what's been provided by SMI, it much like smoke and mirrors.
  • Arnulf - Friday, February 12, 2016 - link

    I read their claims as an expansion of superscalar design. Nothing new here and certainly nothing breaking any "laws". It still cannot magically make non-parallelizable code run faster than it normally would.
  • Samus - Monday, February 15, 2016 - link

    If their decoder can break up serial code and run it through different cores optimized to do different things better, this would theoretically complete the code faster because their will be no pipeline penalty.

    Personally. I think we have better odds of seeing a quantum processor before this type of thing taking off, though. That is to say, no time soon.
  • gamerk2 - Friday, February 12, 2016 - link

    Kinda. There will still be a limited performance simply because some operations can not be made parallel under any circumstances, but Soft Machines is really taking ILP to the extreme here.
  • xthetenth - Friday, February 12, 2016 - link

    No it really isn't and you're profoundly misunderstanding Amdahl's law. All that says is how much an improvement to a portion of a workload's execution speed will affect the workload as a whole's execution speed. Meanwhile what they're doing is trying to extract parallelism from single threads, which means that they're speeding up a greater fraction of the code. Funnily enough, you can use Amdahl's law to predict when this method (shrinking the non-improved section to allow higher maximum speed) is more effective than things like clocking higher.

    I suspect what you're doing is confusing the law with an explanation/common use of the law because it is very popular to use it to show there's a limit on the gains that can be made by parallel processors.
  • Drumsticks - Friday, February 12, 2016 - link

    It doesn't really invalidate Amdahl's Law. Serial code still can't be run on multiple cores. As I understand it, it only allows extracting more ILP using an ultra ultra wide design when possible.

Log in

Don't have an account? Sign up now