Agner Fog, a Danish expert in software optimization, is making a plea for an open and standardized procedure for x86 instruction set extensions. At first sight, this may seem like a discussion that does not concern most of us. After all, the poor souls who have to program the insanely complex x86 compilers will take care of the complete chaos called "the x86 ISA", right? Why should the average developer, system administrator or hardware enthusiast care?

Agner goes into great detail about why the incompatible SSE-x.x additions and other ISA extensions were and are a pretty bad idea, but let me summarize it in a few quotes:
  • "The total number of x86 instructions is well above one thousand" (!!)
  • "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."
  • "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"
  • The costs of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.
Summarized: Intel and AMD's proprietary x86 additions cost us all money. How much is hard to calculate, but our CPUs consume extra energy and underperform because decoders and execution units are unnecessarily complicated. The software industry is wasting quite a bit of time and effort supporting different extensions.
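To make the CPU dispatching quote above a bit more concrete, here is a minimal sketch of the pattern in Python. The function names, the feature strings and the variant table are all hypothetical, and the "optimized" variants are stand-ins that compute the same result as the baseline; real dispatchers do this in C or assembly after querying CPUID.

```python
from typing import Callable, FrozenSet, List, Tuple

def sum_squares_baseline(values: List[int]) -> int:
    """Portable path that any x86 CPU could run."""
    return sum(v * v for v in values)

def sum_squares_sse2(values: List[int]) -> int:
    """Stand-in for a hand-tuned SSE2 path (hypothetical)."""
    return sum(v * v for v in values)

def sum_squares_sse42(values: List[int]) -> int:
    """Stand-in for a hand-tuned SSE4.2 path (hypothetical)."""
    return sum(v * v for v in values)

# Ordered from most to least specialized. Every new extension that a
# vendor ships means one more row here and one more code path to test.
VARIANTS: List[Tuple[FrozenSet[str], Callable[[List[int]], int]]] = [
    (frozenset({"sse2", "sse4.2"}), sum_squares_sse42),
    (frozenset({"sse2"}), sum_squares_sse2),
    (frozenset(), sum_squares_baseline),
]

def select_variant(cpu_features) -> Callable[[List[int]], int]:
    """Return the first variant whose required features are all present."""
    for required, implementation in VARIANTS:
        if required <= set(cpu_features):
            return implementation
    # Not reachable: the baseline row requires no features at all.
    raise RuntimeError("no usable variant")

if __name__ == "__main__":
    chosen = select_variant({"sse2"})
    print(chosen.__name__, chosen([1, 2, 3]))
```

Even in this toy form, the cost is visible: three implementations of one routine have to be written, kept in sync and tested, and the table grows with every new extension from either vendor.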
 
Not convinced, still thinking that this only concerns the HPC crowd? The virtualization platforms contain up to 8% more code just to support the incompatible virtualization instructions, which offer almost exactly the same features. Each VMM is 4% bigger because of this. So whether you are running Hyper-V, VMware ESX or Xen, you are wasting valuable RAM space. It is not dramatic of course, but it is unnecessary waste. Much worse is that this unstandardized x86 extension mess has made it a lot harder for datacenters to make the step towards a really dynamic environment where you can load balance VMs and thus move applications from one server to another on the fly. It is impossible to move (vmotion, live migrate) a VM from Intel to AMD servers or from newer to (some) older ones, and in some situations you need to fiddle with CPU masks (and read complex tech documents) just to make it work. Should 99% of the market lose money and flexibility because 1% of the market might get a performance boost?
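The "CPU masks" mentioned above boil down to hiding every feature that not all hosts share. The sketch below illustrates the idea for two hypothetical hosts of the same vendor; the bit positions are those of CPUID leaf 1, register ECX, but the host definitions and helper names are assumptions made for illustration, not the interface of any particular hypervisor.

```python
from functools import reduce
from typing import Dict, Iterable, List

# Feature bits as reported in CPUID leaf 1, register ECX.
SSE3 = 1 << 0
SSSE3 = 1 << 9
SSE4_1 = 1 << 19
SSE4_2 = 1 << 20
POPCNT = 1 << 23

FEATURE_NAMES: Dict[int, str] = {
    SSE3: "SSE3",
    SSSE3: "SSSE3",
    SSE4_1: "SSE4.1",
    SSE4_2: "SSE4.2",
    POPCNT: "POPCNT",
}

# Two hypothetical hosts in one cluster: an older and a newer generation.
OLDER_HOST = SSE3 | SSSE3
NEWER_HOST = SSE3 | SSSE3 | SSE4_1 | SSE4_2 | POPCNT

def common_mask(host_masks: Iterable[int]) -> int:
    """Features present on every host: the bitwise AND of all masks."""
    masks = list(host_masks)
    if not masks:
        raise ValueError("need at least one host")
    return reduce(lambda a, b: a & b, masks)

def feature_names(mask: int) -> List[str]:
    """Human-readable names of the feature bits set in mask."""
    return sorted(name for bit, name in FEATURE_NAMES.items() if mask & bit)

def can_migrate(vm_mask: int, host_mask: int) -> bool:
    """A VM can move only if the target offers every feature the VM sees."""
    return (vm_mask & ~host_mask) == 0

if __name__ == "__main__":
    cluster = common_mask([OLDER_HOST, NEWER_HOST])
    print(feature_names(cluster))
    print(can_migrate(NEWER_HOST, OLDER_HOST))
    print(can_migrate(cluster, OLDER_HOST))
```

A VM that was started with the full feature set of the newer host cannot move to the older one, while a VM restricted to the common mask can. The price is that the newer host's extra instructions stay hidden from the guest.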

The reason why Intel and AMD still continue with this is that some people inside feel that they can create a "competitive edge". I believe this "competitive edge" is negligible: how many people have bought an Intel "Nehalem" CPU because it has the new SSE 4.2 instructions? How much software is supporting yet another x86 instruction addition?
 
So I fully support Agner Fog in his quest for a (slightly) less chaotic and more standardized x86 instruction set.
108 Comments

  • ThaHeretic - Tuesday, December 8, 2009 - link

    Eh, I mistyped. I mean PAE expanded the physical address ENTRY (not register) in the pagetable from 32 to 64 bits. Because of this doubling of width, pagetables with the same number of entries occupy twice the space, and thus TLBs can only cache half as many entries.
  • JHBoricua - Monday, December 7, 2009 - link

    Umm, PAE is essentially a hack that comes with a performance penalty. Even though MS enabled its use in their Server OS line, only a very small number of applications can take advantage of it (SQL 200x comes to mind).

    The poster is right in that the 64-bit extensions AMD introduced to the x86 architecture paved the way for both Server and Desktop Operating Systems to be able to natively address >4GB of RAM. Not to mention that it paved the way for a greater number of applications to be developed to run natively on 64-bit x86.
  • iwodo - Sunday, December 6, 2009 - link

    We need a MUCH cleaned-up x86.
    I am sure Apple will be one of those that is interested.
  • MrPoletski - Wednesday, December 9, 2009 - link

    IMHO,

    The best way to go about it is to start phasing out old instructions, but doing it in a way that any program that now crashes because they are gone can be run up in a VM environment that emulates them.

    With over a thousand instructions there will be serious overlap too, so start amalgamating similar instructions into one, again maintaining the VM environment that can emulate them.

    I.e., move the decoding of old-hat instructions into software and re-organise the instruction set.

    Do it over the next 3 gens of processor.

    Should work out ok.
  • Lucky Stripes 99 - Wednesday, December 16, 2009 - link

    Motorola did this back in the days of the 68xxx series. Anyone with an Amiga remember the 68040.library?

    Whenever a program attempted to issue an instruction that was legal on a 68020 or 68030 but illegal on the 68040, it generated a trap. The 68040.library contained various handlers that would then emulate the retired instruction in software.

    You could do this today using a TSR under DOS or a kernel module under Windows, BSD or Linux.
  • Scali - Sunday, December 20, 2009 - link

    Yup. And Motorola isn't the only one.
    IBM also moved part of their POWER instruction set from hardware to software in much the same way.
    It actually worked reasonably well on Amiga. I recall that even certain variations of mul were software-emulated.
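The trap-and-emulate approach described in the comments above can be sketched with a toy model. Nothing here reflects how 68040.library or any real kernel module is written: the opcode names, the `IllegalInstruction` exception and the handler table are hypothetical. The "hardware" knows only `add` and `sub`, and a retired `mul` is caught by a trap handler and emulated in software.

```python
from typing import Callable, Dict

class IllegalInstruction(Exception):
    """Raised by the toy CPU for opcodes it no longer implements."""
    def __init__(self, opcode: str) -> None:
        super().__init__(f"illegal instruction: {opcode}")
        self.opcode = opcode

def hardware_execute(opcode: str, a: int, b: int) -> int:
    """Toy CPU: only 'add' and 'sub' are still implemented in hardware."""
    if opcode == "add":
        return a + b
    if opcode == "sub":
        return a - b
    raise IllegalInstruction(opcode)

def emulate_mul(a: int, b: int) -> int:
    """Software replacement for the retired 'mul': shift-and-add."""
    negative = (a < 0) != (b < 0)
    a, b = abs(a), abs(b)
    result = 0
    while b:
        if b & 1:
            result = hardware_execute("add", result, a)
        a = hardware_execute("add", a, a)  # double a using a native add
        b >>= 1
    return hardware_execute("sub", 0, result) if negative else result

# Trap handlers for retired opcodes, installed by the "operating system".
TRAP_HANDLERS: Dict[str, Callable[[int, int], int]] = {"mul": emulate_mul}

def run(opcode: str, a: int, b: int) -> int:
    """Try the hardware first; on a trap, fall back to a software handler."""
    try:
        return hardware_execute(opcode, a, b)
    except IllegalInstruction as trap:
        handler = TRAP_HANDLERS.get(trap.opcode)
        if handler is None:
            raise  # no emulation available: the program really crashes
        return handler(a, b)

if __name__ == "__main__":
    print(run("add", 2, 3))
    print(run("mul", 6, 7))
```

Old programs keep working, but every emulated instruction pays for the trap plus the software routine, so this only makes sense for instructions that are rarely executed.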
