Agner Fog, a Danish expert in software optimization, is making a plea for an open and standardized procedure for x86 instruction set extensions. At first sight, this may seem like a discussion that does not concern most of us. After all, the poor souls who have to write the insanely complex x86 compilers will take care of the complete chaos called "the x86 ISA", right? Why should the average developer, system administrator or hardware enthusiast care?

Agner goes into great detail about why the incompatible SSE-x.x additions and other ISA extensions were and are a pretty bad idea, but let me summarize it in a few quotes:
  • "The total number of x86 instructions is well above one thousand" (!!)
  • "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."
  • "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"
  • The cost of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.
Summarized: Intel and AMD's proprietary x86 additions cost us all money. How much is hard to calculate, but our CPUs are consuming extra energy and underperforming because decoders and execution units are unnecessarily complicated. The software industry is wasting quite a bit of time and effort supporting different extensions.
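Agner's CPU-dispatching point is easy to illustrate. Below is a minimal sketch in C of what such dispatch code looks like: pick an implementation at run time based on what the CPU reports. The function names are invented for the example; `__builtin_cpu_supports` is a real GCC/Clang builtin. Multiply this pattern by every extension and every CPU brand, and the maintenance cost he describes becomes clear.

```c
// Minimal CPU-dispatch sketch: choose a POPCNT-accelerated path or a
// portable scalar fallback at run time. Function names are illustrative.
static int popcount_scalar(unsigned x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }  // clear lowest set bit, count iterations
    return n;
}

#if defined(__GNUC__)
__attribute__((target("popcnt")))
static int popcount_hw(unsigned x) {
    return __builtin_popcount(x);   // compiled to the POPCNT instruction here
}
#endif

int popcount_dispatch(unsigned x) {
#if defined(__GNUC__)
    if (__builtin_cpu_supports("popcnt"))  // CPUID-based feature check
        return popcount_hw(x);
#endif
    return popcount_scalar(x);             // fallback for older CPUs
}
```

Note that both paths must be written, tested and maintained, which is exactly the duplicated effort the quote complains about.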
 
Not convinced, still thinking that this only concerns the HPC crowd? Virtualization platforms contain up to 8% more code just to support the incompatible virtualization instructions, which offer almost exactly the same features. Each VMM is 4% bigger because of this. So whether you are running Hyper-V, VMware ESX or Xen, you are wasting valuable RAM space. It is not dramatic, of course, but it is unnecessary waste. Much worse is that this unstandardized x86 extension mess has made it a lot harder for datacenters to take the step towards a really dynamic environment where you can load balance VMs and thus move applications from one server to another on the fly. It is impossible to move (vMotion, live migrate) a VM from Intel to AMD servers, or from newer to (some) older ones, and in some situations you need to fiddle with CPU masks (and read complex tech documents) just to make it work. Should 99% of the market lose money and flexibility because 1% of the market might get a performance boost?

The reason why Intel and AMD still continue with this is that some people inside these companies feel it creates a "competitive edge". I believe this "competitive edge" is negligible: how many people have bought an Intel "Nehalem" CPU because it has the new SSE 4.2 instructions? How much software supports yet another x86 instruction addition?
 
So I fully support Agner Fog in his quest for a (slightly) less chaotic and more standardized x86 instruction set.

108 Comments


  • Penti - Sunday, December 13, 2009 - link

    PowerPC is Power, but more to the point: Motorola sold off its CPU tech and fabs, and it was IBM, not Freescale, who continued to develop PPC further; it's not Motorola who's behind it anymore. The PowerPC 970MP was used for several years after the Mac switched, in low-end System p systems and Power blade servers. Fixstars, who bought Terra Soft/YDL, even still sells a PowerPC 970MP workstation. But there's no need to develop it anymore; POWER6 got the VMX/Altivec unit. G5s weren't actually bad, but the move was wise anyway: native Windows, the same notebook processors. It pretty much ended the rivalry, and it wasn't needed for things like the Classic environment anymore.
  • Scali - Monday, December 14, 2009 - link

    PowerPC is a subset of POWER (the IBM server ISA on which PowerPC was based).
  • The0ne - Monday, December 7, 2009 - link

    After too many years I actually don't care what's going on anymore, to be quite honest. None of us is going to change anything by discussing it here. I think no one can do anything to "solve" this crud until someone actually steps up to take the initiative, with a lot of money. Either that, or a genius 10-year-old kid creates something revolutionary for all to use, free and open-sourced :)

    x86 has gotten TOO ingrained and TOO big. I don't think it can be killed, like MS Windows. You would think it makes sense to somehow start from scratch but that's like passing algebra while chewing bubblegum.
  • Zoomer - Wednesday, December 9, 2009 - link

    I think the Power architecture still lives on as an embedded processor ISA. See Freescale's Power series.
  • Scali - Monday, December 7, 2009 - link

    I think we've been at this point for many years... Thing is, every time such a processor is launched onto the market, it is killed with brute force.
    One example is PowerPC... when it was first being used in the early 90s, PowerPC was a good alternative to x86, generally delivering better performance.
    However, since Apple/Motorola/IBM didn't have such a large market as Intel/AMD had, they didn't have the same amount of resources to keep improving the CPU at the same rate as x86 did.
    A few years ago, Motorola stopped development of the PowerPC altogether... Apple turned to IBM for PowerPCs for a while, but eventually moved to x86.

    I think that if PowerPC development had the same resources as x86, it would probably still be ahead of x86 today.
  • Scali - Monday, December 7, 2009 - link

    I suppose Intel and AMD need to get together and decide what parts of the instruction set can be abandoned.
    I think an obvious example is 3DNow!
    There's very little software that uses it; developers abandoned it in favour of SSE years ago. Intel never supported 3DNow! anyway, so any code with 3DNow! has a workaround.
    MMX is also pretty useless since SSE2. With SSE2 you can do the same operations on the SSE registers, without messing up the FPU stack.

    16-bit mode can be abandoned as well... 64-bit OSes don't support 16-bit binaries anymore anyway, so you might as well just use software emulation such as DOSBox.

    Software emulation should be good enough for large parts of the instruction set. Other CPU developers such as Motorola and IBM have been doing it for years.

    A nicely 'cleaned up' x86 which only does 64-bit natively, and only the instructions that 'make sense' to support natively, such as SSE (perhaps not even x87, or most of it, as SSE2 replaces that as well, and is preferred anyway in most 64-bit OSes)... that would probably make the CPUs cheaper, smaller, and more efficient.
    Some applications may suffer in terms of performance, but that should be easy to fix with a recompile. Without a 'lightweight' CPU there's just nothing forcing a recompile so it never happens.
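    The MMX point above can be sketched with intrinsics (a minimal example, not taken from the comment itself): SSE2's `_mm_add_epi16` does the same packed 16-bit add that MMX's `_mm_add_pi16` did, but on the 128-bit XMM registers, twice as wide, and without needing `EMMS` to restore the x87 register stack afterwards.

```c
// SSE2 replacement for an MMX-era packed add: eight 16-bit adds at once.
// The MMX version (_mm_add_pi16 on 64-bit MM registers) did four, and
// required EMMS before any following x87 floating-point code.
#include <emmintrin.h>  // SSE2 intrinsics
#include <stdint.h>

void add_words_sse2(const int16_t *a, const int16_t *b, int16_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);  // load 8 x int16
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));  // packed add
}
```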
  • wetwareinterface - Wednesday, December 9, 2009 - link

    "16-bit mode can be abandoned as well... 64-bit OSes don't support 16-bit binaries anymore anyway, so you might as well just use software emulation such as DOSBox."

    You are confusing 16-bit ISA instructions with 16-bit compiled binaries that rely on a 16-bit operating system. The former are instructions that handle data no greater than 16 bits in length; the latter are programs that rely on specific API modules from the operating system that simply aren't there anymore.

    There are still several 16-bit ISA instructions in use even under Windows 7 64-bit. Why should a number value in your program that will never exceed 10, for instance, take up a memory footprint four times larger?
  • Scali - Thursday, December 10, 2009 - link

    I'm not confusing anything. I'm saying that the 16-bit mode can be abandoned. I didn't say anything about 16-bit operands, so I don't know where your confusion comes from.
    What I'm saying is this:
    16-bit mode is only used during BIOS and the first part of the OS loader.
    Since 64-bit OSes don't have a virtual 16-bit mode anymore, you won't actually be using the 16-bit mode anywhere other than during BIOS. With EFI or something like that, you won't need 16-bit mode at all anymore, since you can start in 32-bit mode right away.
    Then after a while, 32-bit mode can be dropped as well, and only 64-bit mode remains. No more need for mode-switching logic, and with only one mode, instruction decoding becomes simpler as well, since you don't need to take the context of the current mode into account (instructions are encoded slightly differently in the different modes, certain instructions/operands are valid in one mode, but illegal in another, etc).

    So I'm talking about something completely different from you. I'm surprised you don't seem to know what 16-bit mode is (the legacy 8086 mode).
  • Scali - Thursday, December 10, 2009 - link

    Aside from that, I think YOU are confusing something.
    The small immediate operand encoding is not because of 16-bit instructions, but rather because of sign-extension modes.
    Therefore I can encode the '-1' in an instruction like push -1 with just 1 byte, even though it pushes 8 bytes on the stack in 64-bit mode.
    If you want to use 16-bit instructions in 32-bit or 64-bit mode, you will get a prefix byte in front of your code, which will switch the CPU's instruction decoder to 16-bit mode for the next instruction.
    (and the opposite can be done in 16-bit mode... using a prefix to execute 32-bit wide instructions). So 16-bit instructions in a 32-bit/64-bit binary are actually LARGER, not smaller.
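    Both points can be illustrated with the actual encodings (NASM syntax, 64-bit mode; byte values as listed in the Intel Software Developer's Manual):

```nasm
; sign-extended immediates: small values encode small, regardless of mode
push -1           ; 6A FF            2 bytes: imm8, sign-extended to 64 bits
push 300          ; 68 2C 01 00 00   5 bytes: imm32, sign-extended to 64 bits

; operand-size prefix: 16-bit operations are LARGER in 64-bit mode
mov eax, ebx      ; 89 D8            2 bytes: default 32-bit operand size
mov ax, bx        ; 66 89 D8         3 bytes: 66h prefix selects 16-bit size
```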
  • ET - Monday, December 7, 2009 - link

    Yes, I think it could help, but honestly, like the article says, it's a matter of a few percent. Slightly bigger VMMs, a few more transistors, and a few specific developers (compiler makers) having to work harder.

    ARM is going to take over the market anyway. :)
