Agner Fog, a Danish expert in software optimization, is making a plea for an open and standardized procedure for x86 instruction set extensions. At first sight, this may seem like a discussion that does not concern most of us. After all, the poor souls who have to write the insanely complex x86 compilers will take care of the complete chaos called "the x86 ISA", right? Why should the average developer, system administrator or hardware enthusiast care?

Agner goes into great detail on why the incompatible SSE-x.x additions and other ISA extensions were, and still are, a pretty bad idea, but let me summarize it in a few quotes:
  • "The total number of x86 instructions is well above one thousand" (!!)
  • "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."
  • "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"
  • The cost of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.
Summarized: Intel and AMD's proprietary x86 additions cost us all money. How much is hard to calculate, but our CPUs are consuming extra energy and underperforming because the decoders and execution units are unnecessarily complicated. The software industry is wasting quite a bit of time and effort supporting the different extensions.
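
To make the "CPU dispatching" quote concrete, here is a minimal sketch of what such code looks like in practice (the vector-add functions are invented for this illustration, and __builtin_cpu_supports() is a GCC/Clang builtin; this is not Agner's code):

    #include <stdio.h>

    /* Three interchangeable implementations. In real dispatched code the
       SSE2/SSE4.2 versions would use intrinsics; plain loops keep it short. */
    static void add_scalar(const float *a, const float *b, float *out, int n)
    { for (int i = 0; i < n; i++) out[i] = a[i] + b[i]; }

    static void add_sse2(const float *a, const float *b, float *out, int n)
    { for (int i = 0; i < n; i++) out[i] = a[i] + b[i]; /* SSE2 path */ }

    static void add_sse42(const float *a, const float *b, float *out, int n)
    { for (int i = 0; i < n; i++) out[i] = a[i] + b[i]; /* SSE4.2 path */ }

    typedef void (*add_fn)(const float *, const float *, float *, int);

    /* Query the CPU once and return the best implementation it supports.
       Every new ISA extension adds another branch here and another code
       path to write, test and maintain. */
    static add_fn select_add(void)
    {
        if (__builtin_cpu_supports("sse4.2")) return add_sse42;
        if (__builtin_cpu_supports("sse2"))   return add_sse2;
        return add_scalar;
    }

    int main(void)
    {
        float a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1}, out[4];
        select_add()(a, b, out, 4);
        printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }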
 
Not convinced, still thinking that this only concerns the HPC crowd? Virtualization platforms contain up to 8% more code just to support the incompatible virtualization instructions, which offer almost exactly the same features. Each VMM is 4% bigger because of this. So whether you are running Hyper-V, VMware ESX or Xen, you are wasting valuable RAM space. It is not dramatic of course, but it is unnecessary waste. Much worse is that this unstandardized x86 extension mess has made it a lot harder for datacenters to take the step towards a really dynamic environment where you can load balance VMs and thus move applications from one server to another on the fly. It is impossible to move (VMotion, live migrate) a VM from Intel to AMD servers, or from newer to (some) older ones, and in some situations you need to fiddle with CPU masks (and read complex tech documents) just to make it work. Should 99% of the market lose money and flexibility because 1% of the market might get a performance boost?
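
To give an idea of what "fiddling with CPU masks" means, here is a conceptual sketch (not any vendor's actual API): the hypervisor intercepts the guest's CPUID instruction and reports only the feature bits that every host in the cluster has, so a VM never starts using instructions that a migration target might lack.

    #include <stdint.h>

    /* Feature flag positions in CPUID leaf 1, register ECX. */
    #define FEAT_SSE3    (1u << 0)
    #define FEAT_SSSE3   (1u << 9)
    #define FEAT_SSE4_1  (1u << 19)
    #define FEAT_SSE4_2  (1u << 20)

    /* Cluster-wide baseline: the oldest host only has SSE3/SSSE3, so the
       SSE4.x bits are hidden from every guest, even on the newer hosts. */
    static const uint32_t baseline_ecx = FEAT_SSE3 | FEAT_SSSE3;

    /* Called by the VMM when a guest executes CPUID leaf 1: report only
       the lowest common denominator of the whole cluster. */
    uint32_t guest_cpuid_1_ecx(uint32_t host_ecx)
    {
        return host_ecx & baseline_ecx;
    }

The price of that flexibility is obvious: the guest can never use the newer instructions, even while it happens to be running on a host that has them.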

The reason why Intel and AMD still continue with this is that some people inside those companies feel it can create a "competitive edge". I believe this "competitive edge" is negligible: how many people have bought an Intel "Nehalem" CPU because it has the new SSE 4.2 instructions? How much software actually supports yet another x86 instruction addition?
 
So I fully support Agner Fog in his quest for a (slightly) less chaotic and more standardized x86 instruction set.
Comments

  • Griswold - Tuesday, December 15, 2009 - link

    First of all, it used to be called EM64T. Now Intel calls it Intel64.

    However, literally everyone calls it AMD64. Linux distros refer to it as AMD64, even Microsoft does so.

    So, before you call somebody a fanboy, you should stop being a fanboy yourself and get your facts straight. Makes it less embarrassing for you.
  • Scali - Tuesday, December 15, 2009 - link

    There are two sides to this story.
    Developers tend to call it 'AMD64' because that is the original name that AMD used.
    Hence, when you browse through folders, you'll often find AMD64 in filenames and directory names.

    However, the problem is that people who are less familiar with hardware won't understand that their Intel processor can run AMD64 code. It can be rather confusing. Hence, Microsoft uses x64 in product names and marketing material. It is a simple name, looks like the x86 which people are already familiar with, and doesn't have a direct link to any brand.
    Microsoft would probably just have used '64', but they already used that for Itanium products, so x64 is there to distinguish x86 from Itanium.
  • piroroadkill - Tuesday, December 8, 2009 - link

    Whoa, I'm not a fanboy at all; I have an Intel system. In fact, the last AMD processor I bought was a K6-2. But it's unavoidable to say that AMD invented the 64-bit x86 extensions we use today.

    "AMD licensed its x86-64 design to Intel, where it is marketed under the name Intel 64 (formerly EM64T)."

    So please, get your facts right.
  • johnsonx - Tuesday, December 8, 2009 - link

    Intel actually calls it EM64T. Anywhere outside of Intel, it's called AMD64. Fanboi.
  • bersl2 - Tuesday, December 8, 2009 - link

    Everybody has different names for it. x86-64. x86_64. x64 (BTW, I want the person who came up with that particular abomination taken out back and shot).

    Or better yet, stop the foolishness and just call it "64-bit x86". Everybody will know what you mean. Nobody will be offended.

    Or we could just switch to a *sane* instruction set. I almost don't care which one.
  • piroroadkill - Tuesday, December 8, 2009 - link

    Mostly AMD64, though:

    BSD systems such as FreeBSD, NetBSD and OpenBSD refer to both AMD64 and Intel 64 under the architecture name "amd64".

    Debian, Ubuntu, and Gentoo refer to both AMD64 and Intel 64 under the architecture name "amd64".

    Java Development Kit (JDK): The name "amd64" is used in directory names containing x86-64 files.

    Microsoft Windows: x64 versions of Windows use the AMD64 moniker... For example, the system folder on a Windows x64 Edition installation CD-ROM is named "AMD64"...

    Solaris: The "isalist" command in Sun's Solaris operating system identifies both AMD64- and Intel 64–based systems as "amd64".
  • npaladin2000 - Sunday, December 6, 2009 - link



    If AMD abandons all AMD-created extensions, say good-bye to the extension that is x64, since AMD is the one that created it, not Intel. In fact, it was specifically created to counter Intel's Itanium. We were very happy about that because Itanium stunk so badly at running x86 code.

    Maybe Intel should abandon all Intel-created extensions for AMD ones because AMD made x64?

    To some degree, you HAVE to have these guys competing, so we get to decide between the two of them (hence we now have x86-64 instead of Intel's nightmarish Itanium). Otherwise Intel makes all the decisions, and we'd probably still be trying to choose between x86 NetBurst and Itanium... which is kind of like trying to decide between being beaten with a hammer or a baseball bat.

  • wetwareinterface - Tuesday, December 8, 2009 - link

    The x86-64 extensions by AMD were quite good, and also superior to the extensions Intel created later.

    However, Itanium was not a bad CPU. Far from it: it was faster at running tasks than any x86 CPU, SPARC, POWER4 and later POWER5, etc.

    Itanium was a good product; it was held back by a lack of software to run on it, since it was a completely new ISA and only did x86 at all to maintain some backwards compatibility for organizations that might need it. Look at the old Top500 lists and where Itanium sat as a single CPU in benchmarks, and that was only at 1GHz. If Itanium had gained any traction at all, software would have been written for it specifically, Intel would have invested more R&D resources to mainstream it, and we'd be seeing 3.4GHz Itanium quad cores now.

    Itanium was a simple effort to do exactly what the proponent in the original article wants: to clean up the mess. Backwards compatibility is the problem with x86 right now, as a CPU and as a platform. Physical IRQs being limited to 16 (actually 15, because of another backwards compatibility issue) would not be a reality if we could ditch some of the crap baggage from x86. Yes, logical assignment by the operating system is the norm now, but imagine what we could do with a much larger IRQ range alone, let alone a revamped floating point instruction set that doesn't have to carry the baggage that makes the current x86 floating point instructions a joke.
  • ThaHeretic - Tuesday, December 8, 2009 - link

    So IA64's specific weakness was that they (HP/Intel) assumed it was easier to predicate logic in software than in hardware. What they learned was that this is not the case; it's just hard anywhere you try. You can only predicate so many branches in advance before you run out of functional units, no matter how wide your architecture is, and it requires explicit knowledge and tuning of the software/binary/compile process to account for this hardware. The need for recompilation for optimal performance is heavy, and even Intel, which arguably has the best compiler optimizers out there, has had great difficulty generating awesome binaries.

    EPIC (Explicitly Parallel Instruction Computing) isn't even new; it's just a rebrand of VLIW (Very Long Instruction Word), which in all previous incarnations ultimately failed and earned a bad reputation (e.g., VAX went the way of the dodo). Itanium is good in a very, very small niche market: multi-exabyte databanks.

    IA64 was never meant to "clean up the IA32 mess"; it was meant to address a totally different market. AMD64 (x86-64) was meant to clean up the IA32 (x86) mess to a large extent. A lot of old, system-specific stuff was removed from long mode. The x87 floating point stack was made obsolete in favor of the guaranteed inclusion of SSE1 and SSE2. Plus a doubling of the registers, etc. IA64 was always something totally different, never meant to replace IA32.
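
    A small illustration of that last point (my own sketch, not part of any spec): the same trivial C function gets x87 stack code when compiled for 32-bit x86, but SSE scalar code when compiled for x86-64, because every x86-64 CPU is guaranteed to have SSE2.

        /* sum.c -- same source, different default floating point code:
           gcc -m32 -O2 -S sum.c  ->  x87 stack instructions (fld/fadd/fstp)
           gcc -m64 -O2 -S sum.c  ->  SSE scalar instructions (addsd),
                                      since SSE2 is baseline for x86-64  */
        double sum(double a, double b)
        {
            return a + b;
        }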
  • bsoft16384 - Friday, December 11, 2009 - link

    Well, the problem is the assumption that predication is the solution to branch performance issues at all. The reality is that most branches are predictable enough that predication doesn't really buy you much. It's only when you have a highly unpredictable branch that branch prediction really breaks down, and that's when predication starts to be much more useful.

    Note that there are some predicated instructions on x86 as well, but not anywhere near the same scope as on Itanium.
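
    For anyone unfamiliar with the term, a tiny example of predication on x86 (my own illustration): at -O2, gcc/clang will usually compile the ternary below to a compare plus a conditional move (cmov) rather than a branch, so there is nothing left to mispredict. Itanium generalizes the idea by letting nearly every instruction be guarded by a predicate register.

        /* Typically becomes: cmp + cmov -- no conditional jump at all. */
        int max_branchless(int a, int b)
        {
            return (a > b) ? a : b;
        }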

    It's not quite correct that EPIC isn't new. EPIC is very VLIW-like, but it solves a number of VLIW problems (e.g. how to keep software compatibility for future CPU generations that are wider).

    The bottom line is that we've basically run out of ILP for most code, at least with current research. Increasing the instruction window doesn't get you much, and increasing the issue width doesn't get you much either.

    VLIW/EPIC works really well on some programs, but the bottom line is that the magic compilers that make VLIW/EPIC "better" than an out-of-order multiple-issue design don't really exist. ICC and other good compilers show us that VLIW/EPIC is better some of the time, on some code. In other cases, it's considerably worse.

    I know a lot of people who worked on Itanium (I grew up in Fort Collins, where the HP design team worked) and I remember the rhetoric well. Itanium was NOT just a mainframe CPU. Itanium was going to replace HP-PA (Itanium is in many ways very PA-RISC like), it was going to replace other RISC architectures, and eventually it was supposed to replace x86. It was supposed to be a server architecture, a workstation architecture, and eventually a desktop/mobile architecture.

    Many, many people at HP believed the rhetoric. Was that because they were naive or stupid? No. It's because VLIW (and by extension EPIC) always looks better on paper than it is in practice. VLIW allows you to have wider designs with less logic since you use far fewer resources on resolving instruction dependencies. The reality is that dependencies are sometimes very hard to resolve at compile time. The reality is that some code just doesn't have that much ILP to begin with. The reality is that code is often memory bottle-necked anyway.

    In a way, Itanium was like the Pentium 4. Both are brilliant on paper, and both perform more poorly in practice. The great irony is that Intel decided to push for more parallelism in one design (Itanium) and less in another (Pentium 4). Itanium was supposed to be faster because it was wider and therefore did more per clock. P4 was supposed to be faster because very high clocks would make up for lower IPC.

    The reality is that neither extreme really works.

    P4 ran out of gas because the process technology simply couldn't make a 10GHz P4. Architecturally, P4 was (and is) capable of very high clocks; P4 still has the clock records at over 8GHz. But a CPU needs to be manufactured, and leakage current (and other factors) prevented an 8GHz P4 from being practical.

    Itanium ran out of gas because you can only get so much from ILP. Itanium is wide, has more registers than you could ever want, and has huge caches. It's a 2+ billion transistor (1B+ per core) monstrosity, more than 3x as many as Lynnfield (i7) per core. Despite all the hype, Itanium didn't end up being simpler than out-of-order CPUs, and it didn't end up being dramatically faster per clock (except on certain applications).

    Are these faults of the IA-64 architecture, of the Itanium design, of Intel's manufacturing, or of the compilers and software? We'll probably never really know for sure. But we do know that CPU design is about making trade-offs, and that designs that look good on paper often perform poorly in practice.
