Agner Fog, a Danish expert in software optimization, is making a plea for an open and standardized procedure for x86 instruction set extensions. At first sight, this may seem like a discussion that does not concern most of us. After all, the poor souls who have to program the insanely complex x86 compilers will take care of the complete chaos called "the x86 ISA", right? Why should the average developer, system administrator, or hardware enthusiast care?

Agner explains in great detail why the incompatible SSE x.x additions and other ISA extensions were, and still are, a pretty bad idea, but let me summarize it in a few quotes:
  • "The total number of x86 instructions is well above one thousand" (!!)
  • "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."
  • "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"
  • The cost of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.
Summarized: Intel's and AMD's proprietary x86 additions cost us all money. How much is hard to calculate, but our CPUs consume extra energy and underperform because the decoders and execution units are unnecessarily complicated. The software industry is wasting quite a bit of time and effort supporting the different extensions.
 
Not convinced, still thinking that this only concerns the HPC crowd? Virtualization platforms contain up to 8% more code just to support the incompatible virtualization instructions, which offer almost exactly the same features; each VMM is 4% bigger because of this. So whether you are running Hyper-V, VMware ESX, or Xen, you are wasting valuable RAM. It is not dramatic, of course, but it is unnecessary waste. Much worse is that this unstandardized x86 extension mess has made it a lot harder for datacenters to move toward a truly dynamic environment where you can load-balance VMs and thus move applications from one server to another on the fly. It is impossible to move (VMotion, live migrate) a VM from Intel to AMD servers, or from newer to (some) older ones, and in some situations you need to fiddle with CPU masks (and read complex tech documents) just to make it work. Should 99% of the market lose money and flexibility because 1% of the market might get a performance boost?
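The CPU-mask fiddling mentioned above looks roughly like this in practice. As an illustrative sketch (the model and feature names are examples, not a recommendation), a libvirt guest definition can present a lowest-common-denominator CPU and hide an extension so the VM remains migratable to hosts that lack it:

```xml
<!-- Illustrative libvirt guest CPU definition: expose a baseline model
     and hide SSE4.2 so the VM can migrate to hosts without it. -->
<cpu match='exact'>
  <model>core2duo</model>
  <feature policy='disable' name='sse4.2'/>
</cpu>
```

The price, of course, is that the guest can never use the masked-off instructions even on hosts that have them, which is exactly the flexibility-versus-performance trade-off the article complains about.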

The reason why Intel and AMD continue down this path is that some people inside those companies feel it creates a "competitive edge". I believe this "competitive edge" is negligible: how many people have bought an Intel "Nehalem" CPU because it has the new SSE 4.2 instructions? How much software supports yet another x86 instruction addition?
 
So I fully support Agner Fog in his quest for a (slightly) less chaotic and more standardized x86 instruction set.

109 Comments


  • wavebossa - Friday, January 22, 2010 - link

    I realize that you guys are talking about cleaning up x86 and fully going 64-bit, but let's not get carried away and actually focus on the article at hand..

    So let me get this straight: at this point... we are only wasting 4-8% of RAM...?

    Why not just, oh I don't know, buy more RAM for now? I mean come on, we are not ready to go full 64-bit, people still use Windows 98, lol.

    Am I missing something? Please tell me if I am.
    Reply
  • bcronce - Sunday, January 10, 2010 - link

    "As for your point about Microsoft choosing not to enable it in consumer based OS's, look also at the limitations in place on the commercial OS's not all of those support "36 bit PAE" (all NT based os's support PAE, it's a question of whether or not they use the 36 bits instead of just 32). it's an artificial selling point on the part of Microsoft, there is no reason to limit consumer 32 bit OS's to 4 gigs of ram except to place a "premium" on the ability to use more than that."

    If you ever read MSDN, most consumer-level drivers didn't correctly support PAE, and drivers that don't handle PAE don't play well with 4GB+ memory. Random errors that were hard to track down or reproduce. Essentially your drivers didn't recognize the extra 4 bits, and when Windows said "Hey, I got free memory at 123456 on page 2" the driver went "YAY!! free memory at 123456" and would overwrite data on the wrong page. Drivers being kernel level, the OS couldn't say "no", and you'd get random corruption/blue screens/weirdness.

    "Than my question would be why branch prediction,speculation on today AMD adn Intel cpu-s takes so much space from the core die."

    Branch prediction takes about 1/15th of the die space of each core on an i7 and is ~94% (±1%) accurate. A quick Google search brought back a few science journals saying it gives about a 10-12% increase in average speed. So, for about 7% of the core die area, you increase the core speed by ~10-12%. Sounds like a decent trade-off... for now. I could see dropping branch prediction when we start to get a lot more cores, and adding more cores with the saved space.

    RISC vs CISC
    This was more of a debate a decade ago. Nowadays, modern CPUs have 200+ internal registers that are managed by the CPU, so you only see the standard x86 registers. Exposing extra registers helps up to a point, but in a modern desktop computer that's juggling many applications, it's better to let the CPU manage its own registers to help context switches. RISC can be good for low-power situations, though.

    Modern decoders could use a slimming down by removing old deprecated instructions. I would be interested in how much die space could be saved by slimming a decoder unit down to handle only the modern versions of old instructions. I know the i7 has quite a complex decoder; it can combine multiple x86 instructions into single internal instructions. And when the i7 detects a loop whose converted micro-ops all fit in the instruction cache, it will shut down the decoder to save power.

    Reply
  • HollyDOL - Wednesday, December 09, 2009 - link

    And I say we should all stop using gasoline based vehicles right now.

    It's the same thing... While there are some more or less working alternatives, you would effectively kill all traffic.

    Same with computers... If you decide to throw away x86 (and its 1234 extensions), or decide to reduce the set by trashing outdated instructions, you are no longer backward compatible and you are causing huge havoc... Will my software run on this reduced set or that reduced set? What do I need to make it work?

    Even though x86 plus its extensions gets more and more bloated over time, you have to keep backward compatibility in mind. And emulators? I have yet to see one that works 100%.

    Extensions being useless? Another point that I personally consider false. My computer runs the SETI@Home application... and the same task is done almost twice as fast using the optimized application compared to the normal one. Of my own software, I have one program cross-comparing huge sets of data. Using the 32-bit ISA I can compare 32x32 records in one step... running the same software as 64-bit I compare 64x64 records at once... yay, just recompiling the app made it 4 times faster...

    I don't like the ISA getting bigger and bigger, but I understand there is a reason behind it. To take it to extremes: if not for ISA extensions, we would still be running floating point operations in software emulation.

    my 2 cents
    Reply
  • - Thursday, December 10, 2009 - link

    The PC is a dinosaur and its x86 instruction set is a dodo bird. Relative to the iPhone, the PC is an underpowered, overpriced shell of "could have beens". The iPhone gives the illusion of new-age productivity: heck, I can flick my thumb to scroll, use two or three fingers to enlarge on two planes or three; I can watch movies, listen to music, and take it with me. Talk, internet, text, take and send pics... My computer uses a mouse, my productivity programs use menus as I click, click, click to get things done, and oh, the complexity of Word, Photoshop, C++. Rendering a photo still takes time at any cost, and who wants to spend $700 on a CPU for a 20% savings in productivity on a Saturday afternoon? And ya know, even if there were apps like the iPhone's for a PC, it would take a quad-core CPU/GPU to run and do what the iPhone does, thanks to Microsoft. It finally appears that Intel will no longer hinder the progress of CPUs, now that the CPU finally melds with the coprocessor; a hybrid, a fusion. The king is dead, long live the king.

    asH
    Reply
  • RadnorHarkonnen - Friday, December 11, 2009 - link

    I giggled like a little girl.

    Have you tried to simulate a Cisco router on an x86 CPU? A 16 MHz router can bring the latest quad core to its knees. And running a GUI on a Cisco router? Impossible.

    Different chips do different things, mate. And btw I prefer my APhone.
    Reply
  • ProDigit - Wednesday, December 09, 2009 - link

    Should Windows ever be written for an ARM processor, we'd see battery life gains, and probably also cost reduction.
    If MS moved its focus away from the x86 architecture, Intel would be done for!
    And AMD would rule, because they have knowledge of other architectures thanks to their GPU building.

    Then the world would be much easier and simpler!
    Intel would probably start making chips that support other architectures, and probably get some optimizations on them;
    But for some reason it's MS that decides what architecture leads!

    With many more newer Linux distributions supporting other architectures, AMD and Intel could start making those chips too, but I think they know that those markets are rather small...

    If MS came out with its first Windows Mobile platform that ran on anything other than x86/x64, we'd probably see a huge leap from one to the other.

    Netbooks, MIDS, cellphones, all would be able to run Windows, and quite energy efficient!
    I believe some netbooks have been tested where a certain Linux would give 5-6 hours on x86 CPUs, while giving between 8 and 10 hours on the ARM architecture.
    The ARM processor was a bit slower than the Atom, but nevertheless, if we'd see 2 hours of battery life gain on netbooks (about a 30% gain), that would be something to look forward to!

    That's a big step forward!
    Reply
  • Scali - Thursday, December 10, 2009 - link

    I don't think MS decides.
    Back in the days of Windows NT 4, Microsoft supported x86, MIPS, Alpha and PowerPC (and for this reason, Windows NT was designed from scratch to be highly portable... In fact, the 32-bit Windows NT executable format is even called Portable Executable).
    You could run x86 binaries via a dynamic recompiler.

    By the time Windows 2000 arrived, most of these architectures were no longer being used in servers or workstations, so they only supported x86 and Itanium (where Itanium again got a dynamic recompiler for x86 after it turned out that the hardware-emulation of x86 wasn't that efficient).
    Later they added x86-64 support, and that's where we are today. Aside from Itanium, non-x86 systems are no longer supported, because there's nothing that really has any significant marketshare worth supporting.

    Of course the alternative Windows CE/Mobile works on ARM, and supports most of the Windows API. But until recently, ARM was not powerful enough to run a complete up-to-date desktop OS, so there was no point in doing a 'full port' of the regular x86 Windows. Perhaps that time will come once ARM-based netbooks become more popular.
    It seems that MS would like to bring Windows 7 and Windows Mobile closer together anyway.
    Reply
  • cbemerine - Wednesday, December 30, 2009 - link

    "...But until recently, ARM was not powerful enough to run a complete up-to-date desktop OS, so there was no point in doing a 'full port' of the regular x86 Windows. Perhaps that time will come once ARM-based netbooks become more popular. ..."

    I would suggest that it has more to do with a lack of marketing than with the desktop OS, as I had a computer in the palm of my hand in 2005/2006 thanks to Linux, Maemo, OS 2008 and the Nokia Nxxx (N770/N800). If only Nokia had marketed it more effectively. Of course Google will do a much better job advertising their new open Linux Google Android phone than Nokia did/does, so perhaps that time is now.

    If by saying ARM is not powerful enough to run a complete up-to-date desktop OS you are referring to MS Windows specifically, then you are correct. The bloat and excessive memory requirements speak for themselves. I still lament the inability to reformat the hard drive, remove all the bloat, and run efficiently; with auto updates and auto upgrades, the bloat is back after the first update, and the poor user simply does not have any choice in the matter.

    However Linux has been running very effectively for many years in the embedded device space, with low levels of RAM memory and slower processors. I doubt you would suggest that Linux is not a desktop OS? At least I hope not.

    Now the Nokia N900 has cellular... I had everything else I needed with the Nokia N800.

    With the Nokia N800 I had: GPS; an H.264 high-def video codec; a webcam; 2 memory slots (I still have not filled up my two 4GB microSD memory cards, and I see both 16GB and 32GB for sub-$20 via Amazon of all places...it was a special months ago); a full web browser; mic/sound jacks and speakers; WiFi; Bluetooth; a touchscreen with stylus; a full-size Bluetooth keyboard; most important of all, root account access (so you can tweak/configure applications); an FM chip...

    In fact there were over 450 apps available for the N800 and OS 2008 (Linux Maemo) last year, not counting the many Linux repositories where you could download and install apps on the device.

    Now in addition to the Nokia N900, 1st quarter 2010 will bring the first unlocked (root accessible with blessings of company from the beginning) Google Android.

    It is a shame that the ability to install many applications on a computer this size, available since 2005/2006, has not been adequately advertised and marketed. Everyone I showed mine to wanted one. How many people believe the Nokia N900 is the first, when the only thing it has that the N800 (2006) does not is cellular? That's it.

    If you prefer the iPhone, Windows Mobile or other vendor locked OS and hardware, well that is a choice you made.

    The ARM is definitely powerful enough to run a desktop OS, but not every desktop OS. There are many versions of Linux, embedded or not, that will simply scream on that footprint (processor and memory). After all there are many Linux distros that will run just fine in 128MB of RAM, more is better, but they will do it!

    Perhaps you have picked the wrong desktop OS!
    Reply
  • Scali - Monday, January 04, 2010 - link

    '2006' qualifies as 'recently' with me. Making the rest of your post void. Reply
  • Penti - Saturday, December 12, 2009 - link

    Back in those days Microsoft even helped designing non-x86 systems.

    Windows NT was actually originally designed for Intel's RISC chip, the i860, but that target was dropped before completion. Or really, we might say it was OS/2.

    Anyway, Intel not developing Itanium does of course affect things; IA-64, or Itanium, is dead. New products aren't really coming. Compaq killed Alpha support when they bought DEC. And so on. The world wasn't so uniform before. Now there's mainly ARM, x86, and some POWER and SPARC; MIPS is still there in the embedded space too. In the mid '90s, MIPS, POWER, x86, SPARC, Alpha and PA-RISC were all big, and ARM was brewing then too. Of course, the end of PA-RISC development also stopped the old HP-UX market from evolving, and it didn't pick up much steam on Itanium. The latest Itaniums are dual-core 1.66 GHz 90nm processors and didn't get hardware virtualization until 2006, so it's understandable that you would much rather run high-end x86 servers, or even POWER or SPARC for that matter. Microsoft has no real interest in continuing support for Itanium either; they did release Server 2008 for it, but those machines were mainly used for MS SQL Server anyway, and ancient hardware makes it unappealing even for that. But of course MS had a role there: they decided to run with x86. By the time 64-bit computing became interesting for their market, both AMD and Intel had come out with x86 CPUs supporting it, so it was only natural for the databases to move there. Meanwhile Sun and IBM still have their UltraSPARC and POWER servers; they made more strategic choices, so the software was carried forward together with the hardware. MS could have supported IA-64 better and pushed it; so could Intel. We don't have to have the same CPUs in our desktops as in our servers, but it's mainly Intel that has decided that. They did try RISC CPUs, EPIC and so forth, but x86 is where they succeeded where they failed with the others. Itanium never really got any compelling server features, so maybe they didn't learn so much there. It was Intel in the '90s that was able to compete with and outdo MIPS and Alpha systems.
    I can think of the Pentium Pro, which was a huge performer; Alpha was preferred for some high-end use for a while, but was killed by Compaq and the failure of DEC. Apple is in a way the reason Windows never took off on PowerPC: no PReP-compatible Power Macintosh was released, and the clones were killed quickly. But all of that is pretty moot, because people developed for Windows to get away from the need to support multiple platforms and architectures. In a way, the success of x86 is that it isn't bound to a single vendor; that broad adoption couldn't really be achieved by anyone else. Of course, the home and lower-end corporate market has a lot to do with it. DOS lived for a long time; where games are concerned, it wasn't really until '97 that we started to see Windows games. The market moved to consolidate the diverse industry it was. Apple showed, however, that MS could have switched architectures if they had wanted to. But no such move was ever made, and then came the Pentium Pro as far as workstations and servers are concerned, then the P2, P3, etc. All the rest was ancient history by then, when talking about non-vendor-specific systems. Apple got the POWER desktop market, Sun moved away from their SPARC workstations, the workstation market as a whole disappeared, and the only player left in the desktop market was really x86. And it was really the only way to move away from the kind of multi-vendor market with vendor lock-ins that was prevalent. That applied to servers too.

    Regarding binary emulation and such: it isn't until recently that it could be done well, so I think it was right that, for example, Apple didn't switch to x86 with OS X right away; they needed that backwards compatibility for the Classic environment. So backwards compatibility matters, and it's only recently that the software has turned up and the hardware has become good enough to do it. Dropping x86 wouldn't have been easy 15 years ago, and it's not easy today with even more legacy baggage. But I don't think we need to any longer; as with the jump to the Pentium Pro, we have really fast and advanced products on x86 today. Dumping x86 to develop x86 doesn't make sense. Hardware-wise, the legacy of the oldest stuff is already gone, and I don't think it's a real problem for the decoders to handle what remains. The older stuff can be emulated just fine (see QEMU). But 16-bit BIOSes aren't completely gone yet, even though they have stopped developing them. Peripherals are important too, just to note. x86 has shown it can reinvent itself without resorting to a new ISA; therein lies its power. The legacy has just helped x86 CPUs. MS was in a way trapped on x86 too, as they were expected to continue supporting the machines sold with MS software. So it's not surprising to see them shine in a non-multi-vendor, more unified software climate.
    Reply
