Agner Fog, a Danish expert in software optimization, is making a plea for an open and standardized procedure for x86 instruction set extensions. At first sight, this may seem a discussion that does not concern most of us. After all, the poor souls who have to program the insanely complex x86 compilers will take care of the complete chaos called "the x86 ISA", right? Why should the average developer, system administrator or hardware enthusiast care?

Agner goes into great detail about why the incompatible SSE-x.x additions and other ISA extensions were and are a pretty bad idea, but let me summarize it in a few quotes:
  • "The total number of x86 instructions is well above one thousand" (!!)
  • "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."
  • "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"
  • The costs of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.
Summarized: Intel and AMD's proprietary x86 additions cost us all money. How much is hard to calculate, but our CPUs are consuming extra energy and underperforming because decoders and execution units are unnecessarily complicated. The software industry is wasting quite a bit of time and effort supporting different extensions.
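To make the "CPU dispatching" quote concrete, here is a minimal sketch of the idea in Python. It is only an illustration under stated assumptions: the feature-flag strings and the function names (`dot_generic`, `dot_sse42`, `dot_sse4a`, `select_dot`) are hypothetical, and the "optimized" variants are stand-ins that simply reuse the generic code. Real dispatchers query CPUID and jump to separately compiled routines.

```python
from typing import Callable, List, Sequence, Set, Tuple

DotFn = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    """Baseline path that needs no ISA extension at all."""
    return sum(x * y for x, y in zip(a, b))


def dot_sse42(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for a hand-tuned SSE4.2 routine (hypothetical)."""
    return dot_generic(a, b)


def dot_sse4a(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for a hand-tuned SSE4a routine (hypothetical)."""
    return dot_generic(a, b)


# One entry per vendor-specific extension. Every new extension adds another
# code path here that has to be written, tested and maintained.
VARIANTS: List[Tuple[str, DotFn]] = [
    ("sse4_2", dot_sse42),
    ("sse4a", dot_sse4a),
]


def select_dot(cpu_features: Set[str]) -> DotFn:
    """Pick the first variant whose required feature flag is present."""
    for flag, impl in VARIANTS:
        if flag in cpu_features:
            return impl
    return dot_generic


# The caller passes in the detected feature flags (assumed to come from
# some CPUID-based probe that is not shown here).
dot = select_dot({"sse2", "sse4_2"})
result = dot([1.0, 2.0], [3.0, 4.0])
```

Even in this toy form, the maintenance cost is visible: three implementations of the same function, plus the selection logic, all of which must stay in sync.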
 
Not convinced, still thinking that this only concerns the HPC crowd? The virtualization platforms contain up to 8% more code just to support the incompatible virtualization instructions, which offer almost exactly the same features. Each VMM is 4% bigger because of this. So whether you are running Hyper-V, VMware ESX or Xen, you are wasting valuable RAM space. It is not dramatic of course, but it is unnecessary waste. Much worse is that this unstandardized x86 extension mess has made it a lot harder for datacenters to make the step towards a really dynamic environment where you can load balance VMs and thus move applications from one server to another on the fly. It is impossible to move (VMotion, live migrate) a VM from Intel to AMD servers or from newer to (some) older ones, and you need to fiddle with CPU masks in some situations just to make it work (and read complex tech documents). Should 99% of the market lose money and flexibility because 1% of the market might get a performance boost?
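The "CPU mask" fiddling mentioned above boils down to exposing only the features that every host in a migration pool shares. The sketch below shows just that set logic. The host names, the flag strings and the helper `common_feature_mask` are assumptions made for illustration, not any hypervisor's actual API.

```python
from typing import Dict, FrozenSet, Iterable


def common_feature_mask(hosts: Dict[str, Iterable[str]]) -> FrozenSet[str]:
    """Return the feature flags that every host in the pool supports."""
    feature_sets = [set(flags) for flags in hosts.values()]
    if not feature_sets:
        return frozenset()
    return frozenset(set.intersection(*feature_sets))


# Hypothetical pool: one newer and one older server of the same vendor.
pool = {
    "newer_host": {"sse2", "sse3", "ssse3", "sse4_1", "sse4_2"},
    "older_host": {"sse2", "sse3", "ssse3"},
}

# Guests may only see what the weakest host offers.
mask = common_feature_mask(pool)
hidden_on_newer = set(pool["newer_host"]) - mask
```

In this example the guests lose SSE4.1 and SSE4.2 on the newer machine, which is exactly the trade-off: flexibility is bought by giving up the newest extensions.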

The reason why Intel and AMD still continue with this is that some people inside feel that they can create a "competitive edge". I believe this "competitive edge" is negligible: how many people have bought an Intel "Nehalem" CPU because it has the new SSE 4.2 instructions? How much software is supporting yet another x86 instruction addition?
 
So I fully support Agner Fog in his quest for a (slightly) less chaotic and more standardized x86 instruction set.
108 Comments

  • davepermen - Monday, December 7, 2009

    Doubling the amount of instructions means the dispatcher has one step more to do, at most, as it's mostly logarithmic (if even, it can actually be reduced further in parts). and finding out what to do, and then actually doing it, are not equal in work. the dispatcher is a tiny part of the logic. in general, it does not cost much at all.

    still, cleaning up the instruction set would be great. i hoped for x64 to be more cleaned up and streamlined than amd designed it, back then. would have been a great first step.
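The "mostly logarithmic" claim in the comment above can be checked with a few lines of arithmetic. This is an idealized model only: it assumes the decoder behaves like a balanced binary decision tree over distinct opcodes, which ignores prefixes and variable-length encodings. The function name `decode_tree_depth` is made up for this sketch.

```python
def decode_tree_depth(num_instructions: int) -> int:
    """Levels needed by a balanced binary tree to tell n opcodes apart.

    Equivalent to ceil(log2(n)), computed with integers to avoid rounding.
    """
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    return (num_instructions - 1).bit_length()


depth_1000 = decode_tree_depth(1000)  # roughly the count quoted in the article
depth_2000 = decode_tree_depth(2000)  # doubling the instruction count
```

Under this model, going from 1000 to 2000 instructions adds a single level (10 to 11), which is the commenter's point. Whether real x86 decoders come close to that ideal is a separate question.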
  • Springfield45 - Sunday, December 6, 2009

    This is why I wish people would drop x86 entirely. Backwards compatibility, while desirable at times, need not be native. Look at the performance increases in the last two generations of CPUs. Modern CPUs could EMULATE the computing environment from two generations ago and STILL be faster than high end processors of the time.
  • jensend - Sunday, December 6, 2009

    I don't see people dropping x86 in the near future, and the cost of emulating all instructions in software is just too great for anybody to go that route. "Hardware accelerated emulation" with a different ISA ala Loongson 3 might prove to be interesting, but I don't think you'll see mainstream processors go that route soon either. But deprecating a vast number of those instructions now and moving them out of hardware later makes a lot of sense, and the idea to take measures to keep the ISA from getting further unnecessarily bloated in the future is a no-brainer.
  • GourdFreeMan - Sunday, December 6, 2009

    If you count all of the x86 instructions from different vendors, and treat uses of different types of source registers as different instructions, there are ~3000 of them. See http://www.nasm.us/doc/nasmdocb.html

    In actual fact, though, even when only counting unique opcodes a large number of instructions are the same -- just treating the data in the source registers as being different sized, breaking up the vector registers differently, or doing the same integer operations in signed and unsigned modes.

    Decoding and dispatching is not as hellacious as these numbers might suggest, as most instructions are encoded so their bits actually have meaning as to what functional subset they belong.

    I will concede there are many legacy instructions that clutter the instruction space (BCD anyone?). Backwards compatibility (with forward performance improvements) is generally the reason x86 won the processor wars, however...

    Frankly, I am surprised Intel and its rivals have cooperated so well in respecting each other’s machine code to date. I could very well see Intel treating the x86 instruction space as its own and charging competitors to add their own proprietary extensions. Those who didn't play ball would have their old processors become incompatible with future generations of the x86 architecture... perhaps there are segments of their cross-licensing agreement with AMD (redacted in the public document) that forbid this?
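As an illustration of the earlier point in this comment that instruction bits "actually have meaning", the x86 ModR/M byte packs three fields into fixed bit positions: `mod` in bits 7-6, `reg` in bits 5-3 and `rm` in bits 2-0. The sketch below splits such a byte. The helper name `split_modrm` is made up here, and the register table covers only the 32-bit general-purpose registers without any prefixes.

```python
from typing import Tuple

# 32-bit general-purpose registers in their 3-bit encoding order.
REG32 = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def split_modrm(byte: int) -> Tuple[int, int, int]:
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M is a single byte")
    mod = (byte >> 6) & 0b11   # addressing mode; 0b11 means register-direct
    reg = (byte >> 3) & 0b111  # register operand (or opcode extension)
    rm = byte & 0b111          # register or memory operand
    return mod, reg, rm


# The two-byte sequence 01 C3 encodes "add ebx, eax":
# opcode 0x01 is ADD r/m32, r32, and the ModR/M byte 0xC3 names the operands.
mod, reg, rm = split_modrm(0xC3)
source, destination = REG32[reg], REG32[rm]
```

A decoder can route on these fields with simple shifts and masks, which is why the raw instruction count overstates the decoding effort.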
  • dgingeri - Tuesday, December 8, 2009

    The fact of the matter is that AMD and Intel worked together to make the 16-bit and 32-bit x86 instruction set when the 286 and 386 came out. Both have ownership in that instruction set. Others were allowed to make x86 compatible processors because AMD insisted on an open type license. If Intel alone owned it, you could bet they'd close it to everyone.

    This has also led to lawsuit after lawsuit between AMD and Intel over who could build what processors. When the K6 first came out, Intel tried to sue them to keep them from producing it because it used the same socket as the Pentium. A court stated that they could use it because it was so close to the original 386 interface, of which AMD did have part ownership.

    When the Pentium II came out, it used a totally different interface, so AMD couldn't do the same thing again. That's why the Athlon came out with the Alpha based Slot A interface, which was far superior to the PII interface. After that, AMD just kept using far superior interfaces, with Intel playing catch-up. Intel may have a faster base chip, but their interfaces and instruction sets have been behind AMD for years.
  • GourdFreeMan - Tuesday, December 8, 2009

    Do you have a source I can read about AMD's input into the 16-bit and 32-bit extensions of x86? My memory is a bit fuzzy going that far back, and all I can recall from that era is IBM requiring a second source for x86 chips and Intel licensing AMD to produce clones.
  • tygrus - Monday, December 7, 2009

    It would be interesting to see a comparison of the # of instructions in each instruction set (RISC vs x86).

    Having memory addresses (or indirect) as sources makes a messy ISA and implementation. Explicitly load into register then calculations use up to three registers is much better (RISC). Only loads, stores and jumps use memory addresses. Old x87 FP stack was horrible.

    Could Intel or AMD resurrect Alpha or design new RISC with sensible vector extension.

    Could AMD create a CPU that starts executing both branches without committing, and then discards the result of the wrong branch? Like Hyperthreading, but the threads become a clone.
  • GourdFreeMan - Tuesday, December 8, 2009

    "Having memory addresses (or indirect) as sources makes a messy ISA and implementation. Explicitly load into register then calculations use up to three registers is much better (RISC). Only loads, stores and jumps use memory addresses. Old x87 FP stack was horrible."

    Everything is a trade-off. Consider the cache footprint and the absolute instruction length of the machine code for both architectures. There are no absolute wins, except in the rose-colored world of academia. I will concede the x87 FP stack was simply a dinosaur from a previous age of microprocessors, however.

    "Could Intel or AMD resurrect Alpha or design new RISC with sensible vector extension."

    You know, with a clean-slate design you can do anything... except be the dominate microarchitecture in the computing world. More seriously, the only market for new architectures is the HPC domain of supercomputers... which is probably much better served by research into heterogeneous computing systems with a small number of complex cores that handle branchy code augmented by a larger number of simpler cores that do fast vector processing (e.g. Cell, GPGPU, etc.).
  • GourdFreeMan - Tuesday, December 8, 2009

    Whoops... spelling error. Change "dominate" to "dominant" in my previous post.
  • titan7 - Monday, December 7, 2009

    When Apple moved from CISC 680x0 to RISC PowerPC they actually had MORE instructions than before. So don't get too hung up on the absolute instruction count.

    Branch Prediction is on average no worse than 50% correct (if it guessed randomly), but often well above 90%. Doing both branches would mean 2x the power use for as little as 5% more speed overall.
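The trade-off in the comment above can be put into a back-of-the-envelope model. Everything numeric below is an assumption chosen for illustration (base CPI, share of branch instructions, misprediction rate, penalty in cycles), and the function names are hypothetical. The model also assumes that executing both paths removes the misprediction penalty entirely, which is the most optimistic case.

```python
def cpi_with_prediction(base_cpi: float, branch_fraction: float,
                        mispredict_rate: float, penalty_cycles: float) -> float:
    """Average cycles per instruction when mispredictions stall the pipeline."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def ideal_eager_speedup(base_cpi: float, branch_fraction: float,
                        mispredict_rate: float, penalty_cycles: float) -> float:
    """Best-case speedup if running both paths hid every misprediction."""
    predicted = cpi_with_prediction(base_cpi, branch_fraction,
                                    mispredict_rate, penalty_cycles)
    return predicted / base_cpi


# Assumed workload: 20% branches, 15-cycle penalty, base CPI of 1.0.
good_predictor = ideal_eager_speedup(1.0, 0.2, 0.05, 15)  # 95% accurate
coin_flip = ideal_eager_speedup(1.0, 0.2, 0.5, 15)        # random guessing
```

With these assumed numbers, a 95%-accurate predictor leaves only about 15% on the table for eager execution to recover, while a coin-flip predictor would leave 150%. The better the predictor, the harder it is to justify doubling the execution work.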
