Agner Fog, a Danish expert in software optimization, is making a plea for an open and standardized procedure for x86 instruction set extensions. At first sight, this may seem like a discussion that does not concern most of us. After all, the poor souls who have to write the insanely complex x86 compilers will take care of the complete chaos called "the x86 ISA", right? Why should the average developer, system administrator or hardware enthusiast care?

Agner goes into great detail about why the incompatible SSE-x.x additions and other ISA extensions were and are a pretty bad idea, but let me summarize it in a few quotes:
  • "The total number of x86 instructions is well above one thousand" (!!)
  • "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."
  • "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"
  • The cost of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.
Summarized: Intel's and AMD's proprietary x86 additions cost us all money. How much is hard to calculate, but our CPUs consume extra energy and underperform because their decoders and execution units are unnecessarily complicated. The software industry is wasting quite a bit of time and effort supporting the different extensions.
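
To make the CPU dispatching point concrete, here is a minimal sketch of the kind of boilerplate x86 libraries end up carrying (GCC/Clang-specific, using <cpuid.h>; the popcount_* routines are hypothetical stand-ins for whatever code path a library actually optimizes per CPU):

```c
#include <cpuid.h>
#include <stdio.h>

/* Generic fallback: plain C bit counting, works on any x86 CPU. */
static int popcount_generic(unsigned x) {
    int n = 0;
    while (x) { n += (int)(x & 1u); x >>= 1; }
    return n;
}

/* Stand-in for an SSE4.2-era code path; the builtin is just a placeholder
 * for whatever instruction-set-specific routine a real library would ship. */
static int popcount_sse42(unsigned x) {
    return __builtin_popcount(x);
}

/* Pick an implementation once, based on CPUID leaf 1, ECX bit 20 (SSE4.2). */
static int (*select_popcount(void))(unsigned) {
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 20)))
        return popcount_sse42;
    return popcount_generic;
}

int main(void) {
    int (*popcount)(unsigned) = select_popcount();
    printf("popcount(0xF0F0) = %d\n", popcount(0xF0F0u));
    return 0;
}
```

Multiply that pattern by every extension, every CPU brand and every hot loop, and Agner's maintenance-cost argument becomes obvious.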
 
Not convinced, still thinking that this only concerns the HPC crowd? Virtualization platforms contain up to 8% more code just to support the incompatible virtualization instructions, which offer almost exactly the same features; each VMM is 4% bigger because of this. So whether you are running Hyper-V, VMware ESX or Xen, you are wasting valuable RAM space. It is not dramatic, of course, but it is unnecessary waste. Much worse is that this unstandardized x86 extension mess has made it a lot harder for datacenters to make the step towards a really dynamic environment where you can load balance VMs and thus move applications from one server to another on the fly. It is impossible to move (VMotion, live migrate) a VM from Intel to AMD servers, or from newer to (some) older ones, and in some situations you need to fiddle with CPU masks (and read complex tech documents) just to make it work. Should 99% of the market lose money and flexibility because 1% of the market might get a performance boost?
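
To give an idea of what those CPU masks boil down to: a cluster that wants live migration to work typically exposes to guests only the intersection of all hosts' CPU feature bits. A simplified sketch, with made-up feature words standing in for real CPUID leaves (real hypervisors track many leaves plus model-specific quirks):

```c
/* Simplified sketch of computing a migration-safe CPU feature baseline:
 * expose to guests only the feature bits every host in the cluster has.
 * The hexadecimal "feature words" below are hypothetical examples. */
#include <stdio.h>

int main(void) {
    unsigned host_features[] = {
        0x00BFEFFBu,   /* older Intel box                   */
        0x80BFEFFBu,   /* newer Intel box (one extra bit)   */
        0x00BFE7FBu,   /* AMD box (slightly different set)  */
    };
    unsigned baseline = ~0u;
    for (size_t i = 0; i < sizeof host_features / sizeof host_features[0]; i++)
        baseline &= host_features[i];   /* intersection of all hosts */

    printf("guest-visible feature mask: 0x%08X\n", baseline);
    return 0;
}
```

Every incompatible extension shrinks that common baseline, which is exactly the flexibility cost described above.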

The reason why Intel and AMD still continue down this path is that some people inside those companies feel it creates a "competitive edge". I believe this "competitive edge" is negligible: how many people have bought an Intel "Nehalem" CPU because it has the new SSE 4.2 instructions? How much software supports yet another x86 instruction addition?
 
So I fully support Agner Fog in his quest for a (slightly) less chaotic and more standardized x86 instruction set.
Comments

  • MonkeyPaw - Tuesday, December 8, 2009 - link

    Possibly like Fusion and OpenCL? Once GPUs come onto the CPU die and become standard, maybe we can see some processes move to a much cleaner ISA?
  • phaxmohdem - Wednesday, December 9, 2009 - link

    I don't know much about what exactly goes on at the instruction set level, but combining the functions of GPU and CPU seems to me like it would add further complexity to the ISA. The processor would need some way to discern which instructions are meant to be dispatched to the GPU shader cores, and which need to be sent to the regular CPU cores... Then it needs some rules for what to do with the data after it comes out of either the GPU or CPU pipeline.

    Bottom line is that while this utopian vision of a single ISA, unable to be modified by individual companies like Intel or AMD without consent, would perhaps improve things a little in the short run, the lack of competition and of the incentive to add something useful to your processor to set it apart would be detrimental to progress in the industry in the long run.
  • wolfman3k5 - Monday, December 7, 2009 - link

    If you're talking about x86-64 abandoning 32-bit support, then you're clueless. x86-64 is an extension of the 32-bit instruction set. What you're saying here was already considered by AMD when they designed x86-64.

    As for the addition of proprietary x86 instructions, wasn't AMD the company that added 64-bit instructions to the x86 instruction set? Weren't they the ones who created AMD64, or x86-64 as it's referred to? That little proprietary instruction set is what allows every Joe Sixpack to use more than 4GB of RAM on their desktop.
  • darthscsi - Monday, December 7, 2009 - link

    No, AMD64 did not allow more than 4GB of RAM; Physical Address Extension (PAE) did. PAE was introduced in the Pentium Pro and allows more than 4GB of physical memory in 32-bit mode. There is no technical requirement that the virtually addressable memory is the same size as the physically addressable memory. In PAE, the page tables map 32-bit virtual addresses to 36-bit physical addresses. Microsoft chooses not to enable this in consumer OSes (but does enable it in server OS builds). 32-bit x86 has supported 64 GB of RAM with a 4 GB per-process virtual address space for a long time.

    http://en.wikipedia.org/wiki/Physical_Address_Extension
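
For the curious, the mechanical trick behind PAE is that page-table entries grow from 32 to 64 bits, leaving room for physical addresses wider than 32 bits. A rough sketch of the arithmetic, with a hypothetical entry and simplified flag handling (real entries carry present/dirty/NX and other bits):

```c
/* Rough sketch of PAE-style translation arithmetic: a 64-bit page-table
 * entry holds a physical frame above the low 12 flag/offset bits, which is
 * how a 32-bit virtual address can map to a 36-bit physical one. */
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT 0x1ull

static uint64_t pte_to_phys(uint64_t pte, uint32_t vaddr) {
    if (!(pte & PTE_PRESENT))
        return UINT64_MAX;                        /* not mapped */
    uint64_t frame = pte & 0x0000000FFFFFF000ull; /* bits 12..35: 36-bit physical */
    return frame | (vaddr & 0xFFFu);              /* keep the page offset */
}

int main(void) {
    uint64_t pte = 0x0000000940001003ull;  /* hypothetical entry: frame above 4 GB, present */
    uint32_t va  = 0x00401abcu;            /* hypothetical 32-bit virtual address */
    printf("virt 0x%08x -> phys 0x%09llx\n",
           (unsigned)va, (unsigned long long)pte_to_phys(pte, va));
    return 0;
}
```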
  • GIBson3 - Monday, December 7, 2009 - link

    You aren't 100% correct in your point about moving to a 36-bit addressable space. Any IA-32 (aka x86) program is able to address a maximum of 32 bits' worth of memory; that's still a 4 gigabyte limit. Under PAE, the operating system has the ability to "page" programs into different 4 GB blocks of physical memory. The x86_64 extension set (pioneered by AMD, and later duplicated by Intel) currently enables a maximum of 48 addressable bits of virtual address space (that's 256 tebibytes) and 40 bits of physical address space (that's 1 tebibyte), both of which can be pushed to 64 and 52 bits respectively. While PAE was the foundation on which x86_64's memory addressing system was based, when AMD was making the push for x86_64, Intel was taking the line that 64-bit wasn't necessary in the home user space.

    The standardization of the x86 instruction set makes a lot of sense; it would allow things like AMD64/EM64T to happen faster and more "evenly" instead of this 5-year battle between the two major x86 producers. While Intel may have created x86, they have certainly done their fair share of messing with it. From a design standpoint the x86 instruction set is muddy compared to others, and really shouldn't have come out on top against the likes of SPARC, Alpha, etc.

    As for your point about Microsoft choosing not to enable it in consumer OSes, look also at the limitations placed on the commercial OSes: not all of them support "36-bit PAE" (all NT-based OSes support PAE; it's a question of whether they use the full 36 bits instead of just 32). It's an artificial selling point on Microsoft's part; there is no reason to limit consumer 32-bit OSes to 4 GB of RAM except to place a "premium" on the ability to use more than that.
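
To put the address widths discussed above into numbers, a quick back-of-the-envelope sketch (nothing here beyond the bit counts already mentioned in the thread):

```c
/* Back-of-the-envelope arithmetic for the address widths discussed above:
 * 32-bit flat, 36-bit PAE physical, 40-bit x86-64 physical, 48-bit virtual. */
#include <stdint.h>
#include <stdio.h>

static void show(const char *label, unsigned bits) {
    uint64_t bytes = 1ull << bits;       /* 2^bits bytes */
    if (bits >= 40)
        printf("%-28s %2u bits -> %llu TiB\n", label, bits,
               (unsigned long long)(bytes >> 40));
    else
        printf("%-28s %2u bits -> %llu GiB\n", label, bits,
               (unsigned long long)(bytes >> 30));
}

int main(void) {
    show("32-bit flat address space", 32);   /*   4 GiB */
    show("PAE physical",              36);   /*  64 GiB */
    show("x86-64 physical (common)",  40);   /*   1 TiB */
    show("x86-64 virtual (current)",  48);   /* 256 TiB */
    return 0;
}
```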
  • lemonadesoda - Sunday, March 21, 2010 - link

    There is so much rubbish in this thread it is embarrassing. Do you guys think that an 8-bit processor could only handle 256 bytes of memory? Of course not.

    The width of the execution registers and the bit width of the memory, execution, and stack pointers have nothing to do with each other. There is no reason that a 16- or 32-bit processor can't have 64-bit memory registers/pointers if it was designed that way.

    The problem is that a processor DESIGNED with only 16-bit, 24-bit, 32-bit or 40-bit memory pointers cannot just be swapped out for a 64-bit memory pointer edition and still be compatible: all the machine code fails. THAT is why page addressing and x86 extensions have been used.

    Intel and Microsoft have played with other microprocessor architectures... but they have never really caught on. And for the consumer, no matter how ugly x86 is, it works, and the whole hardware and software industry is built around it. Changing that architecture is going to require a lot of bravery.

  • Calin - Saturday, December 12, 2009 - link

    The official reason for the PAE limit of 4 GB on desktop operating systems is the drivers: making drivers that work correctly with PAE on more than 32 bits is a bit more difficult.
  • misium - Friday, January 8, 2010 - link

    The official reason for the PAE limit is not the drivers, but the fact that PAE is still a 32-bit architecture and thus uses the same 32-bit compilers with the same 32-bit pointers. 32-bit pointers mean 4GB of address space per process, and that's it.
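
misium's point is easy to demonstrate: on a 32-bit build a pointer is 4 bytes, so a single process can name at most 4 GB of addresses no matter how much physical memory PAE exposes. A tiny sketch, assuming a compiler that supports both -m32 and -m64 builds for comparison:

```c
/* Why PAE doesn't help a single 32-bit process: the pointer itself is still
 * only 32 bits wide. Compile with -m32 and -m64 (where supported) to compare. */
#include <stdio.h>

int main(void) {
    unsigned bits = 8u * (unsigned)sizeof(void *);
    /* 2^(bits-30) GiB of addressable space; use a double so a 64-bit
     * pointer width doesn't overflow an integer shift. */
    double gib = 1.0;
    for (unsigned i = 30; i < bits; i++)
        gib *= 2.0;
    printf("pointer width: %u bits -> %.0f GiB addressable per process\n",
           bits, gib);
    return 0;
}
```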
  • ThaHeretic - Tuesday, December 8, 2009 - link

    @GIBson3: Yeah, mostly right; good post. PAE allowed the OS to address 36 bits of physical memory by expanding the page-table entries from 32 to 64 bits, but each process is still limited to 4GB unless it sets up some sort of memory file or block-like access.

    There used to be a more significant performance hit for having PAE enabled, particularly with TLB efficiency (with PAE, half as many page-table entries fit inside your TLB), but that hit has become practically negligible with modern TLBs, especially when using larger page sizes (i.e. hugepages).

    The OS code to set up non-PAE memory is simpler and page-table entries are smaller, but meh. If you're running in 64-bit mode (x64, whatever), you're using PAE-style paging to set up your page tables, so they've gone to great lengths to negate the 64-bit hit.

    Also, IA32 was rather advanced for virtualization. It supported 4 rings of execution back in the 1980s, and supported the ISA-level features needed for virtualization very, very early in the academic study of virtualization. AMD64 restricted long mode (64-bit) execution to 2 rings, which meant it did not provide the necessary and sufficient conditions for virtualization, though in AMD's case they had an IOMMU with fencing that helped work around this. Anyway, that's one of the key facets of both Intel's and AMD's virtualization extensions: they add another ring of execution (-1, if you will) for hypervisor execution.
  • Lucky Stripes 99 - Thursday, December 17, 2009 - link

    The four protection rings found in IA32 have nothing to do with virtualization (in the traditional sense). They are a form of security domain, not unlike the access control methods for pages or segments on processors with a full memory management unit.

    Furthermore, the IA32 instruction set has numerous difficulties with regard to virtualization. It traditionally fails to meet the Popek and Goldberg requirements for virtualization due to a number of unprivileged instructions that can modify sensitive status registers, interrupt registers and the stack.

    AMD-V and Intel VT are supposed to restrict those instructions, in addition to providing the ability to run a hypervisor at ring privilege level -1.
