The proprietary x86 instruction extensions: a waste of time, money and energy
Date: December 6th, 2009
Author: Johan De Gelas
 
 

Agner Fog, a Danish expert in software optimization, is making a plea for an open and standardized procedure for x86 instruction set extensions. At first sight, this may seem like a discussion that does not concern most of us. After all, the poor souls who have to program the insanely complex x86 compilers will take care of the complete chaos called "the x86 ISA", right? Why should the average developer, system administrator or hardware enthusiast care?

Agner goes into great detail about why the incompatible SSE-x.x additions and other ISA extensions were and are a pretty bad idea, but let me summarize it in a few quotes:
  • "The total number of x86 instructions is well above one thousand" (!!)
  • "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."
  • "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"
  • The costs of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.
Summarized: Intel and AMD's proprietary x86 additions cost us all money. How much is hard to calculate, but our CPUs are consuming extra energy and underperforming because decoders and execution units are unnecessarily complicated. The software industry is wasting quite a bit of time and effort supporting different extensions.
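
To make the CPU dispatching quote concrete, here is a minimal sketch of what a dispatcher has to do. The feature names and the three kernels are stand-ins chosen for illustration (real code would query CPUID and call hand-tuned routines), so treat it as an outline of the idea rather than how any particular library does it.

```python
def sum_generic(values):
    """Baseline path that any x86 CPU can run."""
    return sum(values)


def sum_sse2(values):
    """Stand-in for an SSE2-tuned path (same result, different code)."""
    return sum(values)


def sum_avx(values):
    """Stand-in for an AVX-tuned path (same result, different code)."""
    return sum(values)


# Ordered from most to least preferred. Every new extension adds a row here
# plus another kernel that has to be written, tested and maintained.
DISPATCH_TABLE = [
    ("avx", sum_avx),
    ("sse2", sum_sse2),
]


def select_kernel(cpu_features):
    """Pick the best kernel for a set of (hypothetical) feature flag names."""
    for feature, kernel in DISPATCH_TABLE:
        if feature in cpu_features:
            return kernel
    return sum_generic


kernel = select_kernel({"sse2"})
print(kernel.__name__, kernel([1, 2, 3]))
```

Every extra row in the table means one more code path to write and maintain, which is the cost the quote is pointing at.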
 
Not convinced, still thinking that this only concerns the HPC crowd? The virtualization platforms contain up to 8% more code just to support the incompatible virtualization instructions, which offer almost exactly the same features. Each VMM is 4% bigger because of this. So whether you are running Hyper-V, VMware ESX or Xen, you are wasting valuable RAM space. It is not dramatic of course, but it is unnecessary waste. Much worse is that this unstandardized x86 extension mess has made it a lot harder for datacenters to make the step towards a really dynamic environment where you can load balance VMs and thus move applications from one server to another on the fly. It is impossible to move (vMotion, live migrate) a VM from Intel to AMD servers, from newer to (some) older ones, and you need to fiddle with CPU masks in some situations just to make it work (and read complex tech documents). Should 99% of the market lose money and flexibility because 1% of the market might get a performance boost?
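
The "CPU mask" fiddling boils down to set arithmetic: a VM may only see the features that every host in the pool offers. The sketch below shows just that arithmetic for two same-vendor hosts. The host names are made up, the feature lists are illustrative, and real hypervisors expose masks at the CPUID bit level, not as Python sets.

```python
def migration_mask(hosts):
    """Return the features that every host in the pool supports."""
    feature_sets = [set(features) for features in hosts.values()]
    if not feature_sets:
        return frozenset()
    return frozenset(set.intersection(*feature_sets))


def can_migrate(vm_features, target_features):
    """A VM can move only if the target offers everything the VM was shown."""
    return set(vm_features) <= set(target_features)


# Hypothetical pool: one newer and one older server from the same vendor.
pool = {
    "intel-new": {"sse2", "sse3", "ssse3", "sse4_1", "sse4_2"},
    "intel-old": {"sse2", "sse3", "ssse3"},
}

mask = migration_mask(pool)
print(sorted(mask))
print(can_migrate(pool["intel-new"], pool["intel-old"]))  # unmasked VM
print(can_migrate(mask, pool["intel-old"]))               # masked VM
```

The price of the mask is that the newer server's extra instructions go unused, so the extension ends up paid for in silicon but hidden from the software.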

The reason why Intel and AMD still continue with this is that some people inside feel that they can create a "competitive edge". I believe this "competitive edge" is negligible: how many people have bought an Intel "Nehalem" CPU because it has the new SSE 4.2 instructions? How much software is supporting yet another x86 instruction addition?
 
So I fully support Agner Fog in his quest for a (slightly) less chaotic and more standardized x86 instruction set.

110 Comments
What we need. by iwodo, 98 days ago
We need a MUCH cleaned-up x86.
I am sure Apple will be one of those that is interested.

Reply
RE: What we need. by MrPoletski, 95 days ago
IMHO,

The best way to go about it is to start phasing out old instructions, but doing it in a way that any program that now crashes because they are gone can be run up in a VM environment that emulates them.

With over a thousand instructions there will be serious overlap too, so start amalgamating similar instructions into one, again maintaining the VM environment that can emulate them.

I.e, move the decoding of old hat instructions into software and re-organise the instruction set.

Do it over the next 3 gens of processor.

Should work out ok.

Reply
RE: What we need. by Lucky Stripes 99, 88 days ago
Motorola did this back in the days of the 68xxx series. Anyone with an Amiga remember the 68040.library?

Whenever a program attempted to issue an instruction that was legal on a 68020 or 68030 but was illegal on the 68040, it generated a trap. The 68040.library contained various handlers that would then emulate the retired instruction in software.

You could do this today using a TSR under DOS or a kernel module under Windows, BSD or Linux.
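
For anyone who has not seen trap-and-emulate before, here is a toy sketch of the flow described above. The "CPU" is just a dictionary of operations, and the retired "mul" instruction and its handler are invented for illustration; this is not how 68040.library was actually written.

```python
class IllegalInstruction(Exception):
    """Raised by the toy CPU when an opcode is not implemented in hardware."""


# Operations the toy "hardware" still implements.
HARDWARE_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


def emulate_mul(a, b):
    """Software handler for the retired 'mul': repeated hardware adds."""
    result = 0
    for _ in range(abs(b)):
        result = HARDWARE_OPS["add"](result, a)
    return -result if b < 0 else result


# Trap handlers, playing the role of the emulation library.
EMULATION_HANDLERS = {"mul": emulate_mul}


def execute_in_hardware(op, a, b):
    if op not in HARDWARE_OPS:
        raise IllegalInstruction(op)
    return HARDWARE_OPS[op](a, b)


def run(op, a, b):
    """Try hardware first; on a trap, fall back to a software handler."""
    try:
        return execute_in_hardware(op, a, b)
    except IllegalInstruction:
        handler = EMULATION_HANDLERS.get(op)
        if handler is None:
            raise  # genuinely unknown instruction: let the program crash
        return handler(a, b)


print(run("add", 2, 3))
print(run("mul", 4, 3))
```

The old program keeps running unchanged; it just pays for a trap plus a slower software path whenever it hits a retired instruction.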

Reply
RE: What we need. by Scali, 84 days ago
Yup. And Motorola isn't the only one.
IBM also moved part of their POWER instruction set from hardware to software in much the same way.
It actually worked reasonably well on Amiga. I recall that even certain variations of mul were software-emulated.

Reply
A solution by Shining Arcanine, 98 days ago
Just don't buy AMD processors until they abandon their extensions in favor of Intel's extensions. Problem solved.

There is absolutely no reason for AMD to make proprietary extensions to x86 that contradict Intel's extensions. Intel made x86 and whatever competitive advantage there is to doing so is negated by the fact that they have so little market share that no one cares about optimizing for their hardware.

Reply
RE: A solution by psychobriggsy, 98 days ago
What about when AMD do the extensions first, and Intel does something different?

Examples: 3DNow! and AMD's Virtualisation instructions (which were more functional than Intel's, at least early on).

The sad thing is that it is the broken x86 architecture itself that requires special virtualisation instructions to be present.

I say that in around 2015 32-bit compatibility x86 should be relegated to a separate 32-bit core in the CPU for all backward compatibility, and the main CPU cores should be 64-bit only, no backwards compatibility, maybe even have the ISA tweaked to account for this (64-bit instruction prefixes not required, for example).

Reply
RE: A solution by wolfman3k5, 97 days ago
If you're talking about x86-64 abandoning 32bit support, then you're clueless. x86-64 is an extension of the 32bit instruction set. What you're saying here has been thought of by AMD when they designed x86-64.

As for the addition of proprietary x86 instructions, wasn't AMD the company that added 64 bit instructions to the x86 instruction set? Weren't they the ones who created AMD64 or x86-64 as it's referred to? That little proprietary instruction set is what's allowing every Joe Six pack to be able to use more than 4GB of RAM on their desktop.

Reply
> 4GB supported for a long time by darthscsi, 97 days ago
No, AMD64 did not allow more than 4GB of RAM, Physical Address Extension (PAE) did. PAE was introduced in the Pentium Pro and allows more than 4GB of physical memory in 32-bit mode. There is no technical requirement that the virtually addressable memory is the same size as the physically addressable memory. In PAE, the page tables map 32 bit virtual addresses to 36 bit physical addresses. Microsoft chooses not to enable this in consumer OSes (but does enable it in server OS builds). 32 bit x86 has supported 64 GB of RAM with 4 GB process virtual address space for a long time.

http://en.wikipedia.org/wiki/Physical_Address_Extension

Reply
RE: > 4GB supported for a long time by JHBoricua, 97 days ago
Umm, PAE is essentially a hack that comes with a performance penalty. Even though MS enabled its use in their Server OS line, only a very small number of applications can take advantage of it (SQL 200x comes to mind).

The poster is right in that the 64-bit extensions AMD introduced to the x86 architecture paved the way for both Server and Desktop Operating Systems to be able to natively address >4GB of RAM. Not to mention that it paved the way for a greater number of applications to be developed to run natively on 64-bit x86.

Reply
RE: > 4GB supported for a long time by GIBson3, 97 days ago
You aren't 100% correct in your point about moving to a 36 bit addressable space. Any IA-32 (aka x86) program is able to address a maximum of 32 bits' worth of memory; that's still a 4 gigabyte limit. The operating system has the ability to "page" programs to different 4 gig blocks under PAE. The x86_64 extension set (pioneered by AMD, and later duplicated by Intel) enables a (current) maximum of 48 bits addressable in virtual (that's 256 tebibytes) and 40 bits in physical (that's 1 tebibyte) address spaces, both of which can be pushed to 64/52 bits respectively. While PAE was the foundation on which x86_64's memory addressing system was based, when AMD was making the push for x86_64 Intel was taking the line that 64 bit isn't necessary in the home user space.
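
The sizes thrown around in this sub-thread all follow from 2 to the power of the address width. A quick sketch to check them (the helper name is just for illustration):

```python
GIB = 2**30  # bytes in one gibibyte
TIB = 2**40  # bytes in one tebibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2**address_bits


print(addressable_bytes(32) // GIB, "GiB: 32-bit virtual space per process")
print(addressable_bytes(36) // GIB, "GiB: 36-bit PAE physical space")
print(addressable_bytes(48) // TIB, "TiB: 48-bit x86_64 virtual space")
print(addressable_bytes(40) // TIB, "TiB: 40-bit x86_64 physical space")
```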

The standardization of the x86 instruction set makes a lot of sense, it would allow things like AMD64/EM64T to happen faster and more "evenly" instead of this 5 year battle between the two major x86 producers. While Intel may have created x86, they have certainly done their fair share of Messing with it. From a design standpoint the x86 instruction set is muddy compared to others, and really shouldn't have come out on top against others such as SPARC, ALPHA, Etc.

As for your point about Microsoft choosing not to enable it in consumer based OS's, look also at the limitations in place on the commercial OS's not all of those support "36 bit PAE" (all NT based os's support PAE, it's a question of whether or not they use the 36 bits instead of just 32). it's an artificial selling point on the part of Microsoft, there is no reason to limit consumer 32 bit OS's to 4 gigs of ram except to place a "premium" on the ability to use more than that.

Reply
RE: > 4GB supported for a long time by ThaHeretic, 96 days ago
@GIBson3: Yea mostly right; good post. PAE allowed the OS to address 36-bits of physical memory by expanding the physical address register from 32 to 64 bits of width, but each process is still limited to 4GB unless they setup some sort of memory file or block-like access.

There used to be a more significant performance hit for having PAE enabled, particularly with TLB efficiency (with PAE you get half as many pagetable entries fitting inside your TLB), but that performance hit has become practically negligible with modern TLBs, especially when using larger page sizes (i.e. hugepages).

The OS code to setup non-PAE memory is simpler and pagetable entries are smaller, but meh. If you're running in 64-bit mode (x64 whatever), you're using PAE to setup your pagetables, so they've gone to great lengths to negate the 64-bit hit.

Also, IA32 was rather advanced for virtualization. It supported 4 rings of execution back in the 70s and 80s, and supported all ISA level features needed for virtualization very, very early on in the academic discovery of virtualization. AMD64 restricted long mode (64-bit) execution to 2 rings of execution, which meant it did not provide sufficient and necessary conditions for virtualization, though in AMD's case they had an IOMMU with fencing that helped work around this. Anyway, that's one of the key facets of both Intel's and AMD's virtualization extensions: they add another ring of execution (-1 if you will) for hypervisor execution.

Reply
RE: > 4GB supported for a long time by ThaHeretic, 96 days ago
Eh I mistyped. I mean PAE expanded the physical address ENTRY (not register) in the pagetable from 32 to 64-bits. Because of this doubling of width, this meant that pagetables with the same number of entries occupy twice the space, and thus TLB's can only cache half as many entries.
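
The table-size part of this is plain arithmetic. Assuming the usual 4 KiB table page, with 4-byte entries in classic 32-bit paging and 8-byte entries under PAE, a sketch looks like this. It only shows how many entries fit in one table page; how a given TLB is affected depends on the hardware.

```python
TABLE_PAGE_BYTES = 4096  # one 4 KiB page holding a page table


def entries_per_table_page(entry_bytes):
    """How many page-table entries of a given size fit in one 4 KiB page."""
    return TABLE_PAGE_BYTES // entry_bytes


def table_bytes_for(entry_count, entry_bytes):
    """Memory needed to hold a given number of page-table entries."""
    return entry_count * entry_bytes


print(entries_per_table_page(4))   # classic 32-bit entries
print(entries_per_table_page(8))   # PAE 64-bit entries
print(table_bytes_for(1024, 8) // table_bytes_for(1024, 4))  # growth factor
```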

Reply
RE: > 4GB supported for a long time by Lucky Stripes 99, 87 days ago
The four protection rings found in IA32 have nothing to do with virtualization (in the traditional sense). They are a form of security domain, not unlike the access control methods for pages or segments on processors with a full memory management unit.

Furthermore, the IA32 instruction set has numerous difficulties with regard to virtualization. It traditionally fails to meet the Popek and Goldberg requirements for virtualization due to a number of unprivileged instructions that can modify sensitive status registers, interrupt registers and the stack.

AMD-V and Intel VT are supposed to restrict those instructions, in addition to adding the ability to run a hypervisor in ring privilege mode -1.
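
The Popek and Goldberg condition can be stated as a set test: every sensitive instruction must also be privileged, so that it traps when run outside the most privileged mode. Below is a rough sketch of that test. The two instruction sets are a small illustrative subset picked for the example, not a complete classification of IA32.

```python
def violating_instructions(sensitive, privileged):
    """Sensitive instructions that do NOT trap when run unprivileged."""
    return set(sensitive) - set(privileged)


def is_classically_virtualizable(sensitive, privileged):
    """True when every sensitive instruction is also privileged."""
    return not violating_instructions(sensitive, privileged)


# Illustrative subset only: a few instructions that trap outside ring 0 ...
PRIVILEGED = {"LGDT", "LIDT", "HLT"}
# ... and a few that touch or reveal system state, some of which do not trap.
SENSITIVE = {"LGDT", "LIDT", "POPF", "SGDT", "SMSW"}

print(sorted(violating_instructions(SENSITIVE, PRIVILEGED)))
print(is_classically_virtualizable(SENSITIVE, PRIVILEGED))
```

Any instruction left in the first set is one a pure trap-and-emulate hypervisor cannot intercept, which is the gap the hardware extensions (or binary translation and paravirtualization before them) have to cover.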

Reply
RE: > 4GB supported for a long time by Calin, 92 days ago
The official reason for the PAE limit of 4 GB on desktop operating systems is the drivers. Making drivers that work correctly with PAE on more than 32 bits is a bit more difficult.

Reply
RE: > 4GB supported for a long time by misium, 65 days ago
The official reason for the PAE limit is not the drivers but the fact that PAE is still a 32-bit architecture and thus uses the same 32-bit compilers with the same 32-bit pointers. 32-bit pointers mean 4GB of address space per process and that's it.

Reply
RE: A solution by darthscsi, 97 days ago
Intel once thought as you did, and created a processor with a 64-bit instruction set which was incompatible with x86. They wound up with a separate execution unit for x86 initially but have now dropped that in favor of binary emulation in SW. But you don't run an Itanium, do you? You have a processor with extensive backwards compatibility. You want a cleaner ISA? Vote with your dollars. (Yes, I've had several Alphas and have been sad to see that ISA die.)

Reply
RE: A solution by Shadowmaster625, 97 days ago
He's talking about having a dedicated x86 core to maintain backwards compatibility. This is a no-brainer. New multicore CPUs should only have one or two legacy cores; the rest should be more efficiently designed. I'm sure this will happen eventually, as soon as it becomes cheaper to design multicore CPUs in such an asymmetrical manner.

Reply
RE: A solution by MonkeyPaw, 96 days ago
Possibly like Fusion and OpenCL? Once GPUs come onto the CPU die and become standard, maybe we can see some processes move to a much cleaner ISA?

Reply
RE: A solution by phaxmohdem, 94 days ago
I don't know much about what exactly goes on at the instruction set level, but combining the functions of GPU and CPU seems to me like it would add further complexity to the ISA. The processor would need some way to discern which instructions are meant to be dispatched to the GPU shader cores, and which need to be sent to the regular CPU cores... Then it needs some rules for what to do with the data after it comes out of either the GPU or CPU pipeline.

Bottom line, is that while this utopian vision of a single ISA, unable to be modified by individual companies like Intel or AMD without consent, would perhaps improve things a little in the short run, lack of competition and the incentive to add something useful to your processor to set it apart would be detrimental to progress in the industry in the long run.


Reply
RE: A solution by Lucky Stripes 99, 87 days ago
Keep in mind that one major benefit of a CISC based instruction set is that you can theoretically achieve a greater code density than a RISC processor.

Look at ARM as an example. You need at least one 32-bit op to fetch and one 32-bit op to work the data. Under M68K, the whole thing can be done with a single 48-bit op. More complex forms of indirect addressing may require several more ops for ARM in order to get your offset. Under M68K, the offset is just added to the single op.

Sure, the ARM solution makes the prefetch and execution circuits much, much easier to implement. However, you end up taking a byte or two of overhead for each instruction versus the M68K. For IA32 which uses an even denser instruction set, the savings can be even greater.
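
Taking the instruction widths in this comment at face value (two 32-bit ops on ARM versus one 48-bit op on M68K for the same fetch-and-operate step), the density difference is easy to tally. The numbers come straight from the comment, not from any measurement:

```python
def sequence_bytes(op_widths_bits):
    """Total size in bytes of a sequence of ops given their widths in bits."""
    return sum(op_widths_bits) // 8


arm_bytes = sequence_bytes([32, 32])   # fetch op + work op
m68k_bytes = sequence_bytes([48])      # single combined op

print(arm_bytes, m68k_bytes, arm_bytes - m68k_bytes)
```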

Reply
RE: A solution by Scali, 84 days ago
I don't think that's a benefit anymore. These days memory and cache are relatively cheap. It's much easier to slap a few extra MB onto a system than it is to improve its performance per instruction.

Reply
RE: A solution by yuhong, 97 days ago
They already did abandon their own SSE5 in favor of AVX.

Reply
RE: A solution by npaladin2000, 97 days ago


If AMD abandons all AMD-created extensions, say good-bye to the extension that is x64, since AMD is the one that created it and not Intel. In fact, it was specifically created to contradict Intel's Itanium. We were very happy about that because Itanium stunk so bad at running x86 code.

Maybe Intel should abandon all Intel-created extensions for AMD ones because AMD made x64?

To some degree, you HAVE to have these guys competing, so we get to decide between the two of them (hence we now have x86-64 instead of Intel's nightmarish Itanium). Otherwise Intel makes all decisions, and we'd probably still be trying to choose between x86 NetBurst and Itanium...which is kind of like trying to decide between being beaten with a hammer or a baseball bat.



Reply
RE: A solution by wetwareinterface, 96 days ago
the x86-64 extensions by AMD were quite good and also superior to the extensions Intel created later.

however Itanium was not a bad cpu. far from it, it was faster running tasks than any other x86 cpu, sparc, power 4 then later 5, etc...

Itanium was a good product, it was held back by a lack of software to run on it as it was a completely new isa and only did x86 at all to maintain some backwards compatibility for organizations who might need it. look at the old top 500 lists and where Itanium sat as a single cpu in benchmarks and that was only at 1GHz. if Itanium had gained any traction at all software would have been written for it specifically and Intel would have invested more R&D resources to mainstream it and we'd be seeing 3.4 GHz Itanium quad cores now. Itanium was a simple effort to do exactly what the original proponent of x86 wants, to clean up the mess. backwards compatibility is the problem with x86 right now, as a cpu and as a platform. physical irqs being limited to 16 (actually 15 because of another backwards compatibility issue) would not be a reality if we could ditch some crap baggage from x86. yes logical assignment by the operating system is the norm now but imagine what we could do with a much larger irq range alone. let alone a revamped floating point instruction set that doesn't have to carry the baggage that makes the current x86 floating point instructions a joke.

Reply
RE: A solution by ThaHeretic, 96 days ago
So IA64's specific weakness was that they (HP/Intel) assumed it was easier to predicate logic in software than it was in hardware. What they learned was that this is not the case; it's just hard anywhere you try. You can only predicate so many branches in advance before you run out of functional units, no matter how wide your architecture is, and it requires explicit knowledge and tuning of the software/binary/compile process to account for this hardware. The need for recompilation for optimal performance is heavy, and even Intel, who have arguably the best compiler optimizers out there, have had great difficulty generating awesome binaries.

EPIC (Explicitly Parallel Instruction Computing) isn't even new, it's just a rebrand of VLIW (Very Long Instruction Word), which in all previous incarnations ultimately failed and earned a bad reputation, i.e. VAX went the way of the dodo. Itanium is good in a very, very small niche market: multi-exabyte databanks.

IA64 was never meant to "clean up the IA32" mess, it was meant to address a totally different market. AMD64 (x86-64) was meant to clean up the IA32 (x86) mess to a large extent. A lot of old stuff was removed from long mode, system specific stuff. Plus the x87 floating point stack was made obsolete by the guaranteed inclusion of SSE1&2. Plus a doubling of the registers, etc. IA64 was always something totally different, never meant to replace IA32.

Reply
RE: A solution by wetwareinterface, 95 days ago
You have missed the IA64 mark by a long shot. IA64 doesn't predicate logic in software, it allowed software to handle its own data and instruction width more efficiently. For instance you have to compare 2 16 bit values and fetch a 32 bit float. On x86 with no dependencies that's a lot of operations: 2 fetches for the 16 bit values, a compare, then at least 2 stores (because of a serious lack of registers), then another fetch. In IA64 it can do all three fetches at once, then store locally in a register the result of the 16 bit compare. That's just one case. There are several instances where IA64 simply kicks the crap out of x86 for doing what cpus do. The vliw is a means to an end; in VAX's case there weren't enough resources behind the concept to make it worthwhile. In IA64's there is an abundance of cpu horsepower to handle the concept of vliw. The compiler just has the ability to pack more fetches together if it can and do the job of the cpu ahead of time in organizing dependencies in some cases. The dependency resolve in the compiler was a bonus in the compiler to save even further cpu cycles on IA64 code and was necessary due to the software x86 emulation. It was only required for x86 emulation because Intel wanted to junk x86 entirely. In any system there is a lot of non-dependent data being fetched. The problem with x86 is you can't get too much ahead of time because of a lack of resources in the cpu and not many means to grab more at once. You can fetch to level 1 or level 2 cache in 64 bit chunks, but because of the crap isa of x86, taking them into the alu or registers is a one after the other step. IA64 sought to get rid of the limitations of x86 and go forward with a 64 bit isa that was new.

Motorola/IBM/Apple did the exact same thing moving to Power PC, and it worked well for them. It meant slow software emulation for older code but a dramatic increase in new code and a new more modern isa without a lot of garbage they didn't need anymore. Intel was trying to do the same thing, only they didn't have the partner in Microsoft that Motorola and IBM did in Apple. Meaning one focused on the mainstream desktop and willing to completely ditch legacy code and start over with a new cpu instruction set. Microsoft had a massively larger user base and it was extremely varied and couldn't just drop everything the way Apple could.

HP on the other hand in the server space could devote a separate effort to IA64. For HPC IA64 kicked the crap out of everything that then existed under HP-UX on a per cpu basis. The isa was very good even running at a MHz handicap. It took IBM going to Power 5 and ramping up the GHz and Intel not updating IA64 due to spending their resources on Core2 to finally beat it. Make no mistake, Itanium was a monster even at low clock speed. It just didn't get any software to run on its own isa except in a few instances, and those for HPC or server roles. You can't compare what IA64 can do with desktop centric performance benchmarks because you aren't running any IA64 code at all. You are running a cross isa emulator. And give Intel some credit on their jit compiler because it rocked. It took a completely foreign instruction set and ran at nearly the same speed as the cpus it was designed to run on, but on a foreign cpu to the code. People complained about the speed of IA64 running office and similar x86 apps under emulation as being like an older generation x86 cpu. Try running Pear PC (a Power PC emulator) and just time the install of OS8 even today on a core i7 920 overclocked to 4GHz and tell me how bad Intel's Itanium was at x86 emulation.


Lack of software on IA64 is what killed IA64 not the isa.

Also it was actually Intel's intent to transition to IA64's isa for the mainstream. First was server, then workstation Xeon motherboards would take either IA64 or x86 Xeons. Then the mainstream parts would come after. AMD threw the monkey wrench in the whole Xeon Itanium/x86 transition with the Opteron/x86-64 move.

Reply
RE: A solution by mgambrell, 92 days ago
I just want to clarify something here. Apple's handful of toady developers can be pushed around, but Microsoft doesn't have that clout over their hundreds of thousands of developers. It isn't even possible. I enjoy watching them try just to kick people off XP, and you think you could get them to ditch x86? Ha.

Reply
RE: A solution by cbemerine, 73 days ago
"...but Microsoft doesn't have that clout over their hundreds of thousands of developers. It isn't even possible. I enjoy watch them try just to kick people off XP..."

I do not know what planet you are living on, but they most certainly do have the clout to push every XP user off of it. While via the developers is one minor path; over the last 20+ years Microsoft has been more successful kicking people off older platform via the following methods: Hardware (Intel, Nvidia and others); Software (Corel, Novell and others); BIOS vendors: (all but Coreboot); and of course their own forced auto-updates and auto-upgrade process.

It's total vendor lock-in and has been so since midway through Windows 2000. The only way out is not to play...Linux, Unix or Mac OSX.

You're delusional to ignore past abuses and facts, though you are hardly alone.

My preferred method is to set a "7 Year Clock"; if after 7 years of actions on the part of Microsoft and those they influence, they are being a good corporate citizen and leaving FUD vendor lock-in tactics behind...based on their ACTIONS, not words...then and only then will I purchase their products. When a vendor causes problems with software/hardware I am running, I do not blame the software, but THAT VENDOR! It really is that simple.

Reply
RE: A solution by alxx, 94 days ago
Sorry, you're a bit wrong there.
VLIW is still heavily used by TI in their dsp cores.
Look at their C6000 series and C6400+ , also in the dsp unit in their OMAP cores used in a lot of mobile phones and in the dsps used in some base stations and a lot of other comms equipment.

A more correct statement would be vliw failed in general purpose computing.

http://www.eetasia.com/ART_8800445205_499489_NP_cb274e20.HTM
http://www.ece.umass.edu/ece/koren/architecture/VLIW/2/ti1.html
http://focus.ti.com/paramsearch/docs/pa...ilyId=132&sectionId=2&tabId=57

Interesting book
Embedded computing. A VLIW approach to architecture, compilers & tools

Reply
RE: A solution by bsoft16384, 93 days ago
Well, the problem is the assumption that predication is the solution to branch performance issues at all. The reality is that most branches are predictable enough that predication doesn't really buy you much. It's only in the situation when you have a highly unpredictable branch that branch prediction really breaks down, and then predication starts to be much more useful.

Note that there are some predicated instructions on x86 as well, but not anywhere near the same scope as on Itanium.

It's not quite correct that EPIC isn't new. EPIC is very VLIW-like, but it solves a number of VLIW problems (e.g. how to keep software compatibility for future CPU generations that are wider).

The bottom line is that we've basically run out of ILP for most code, at least with current research. Increasing the instruction window doesn't get you much, and increasing the issue width doesn't get you much either.

VLIW/EPIC works really well on some programs, but the bottom line is that the magic compilers that make VLIW/EPIC "better" than an out-of-order multiple-issue design don't really exist. ICC and other good compilers show us that VLIW/EPIC is better some of the time, on some code. In other cases, it's considerably worse.

I know a lot of people who worked on Itanium (I grew up in Fort Collins, where the HP design team worked) and I remember the rhetoric well. Itanium was NOT just a mainframe CPU. Itanium was going to replace HP-PA (Itanium is in many ways very PA-RISC like), it was going to replace other RISC architectures, and eventually it was supposed to replace x86. It was supposed to be a server architecture, a workstation architecture, and eventually a desktop/mobile architecture.

Many, many people at HP believed the rhetoric. Was that because they were naive or stupid? No. It's because VLIW (and by extension EPIC) always looks better on paper than it is in practice. VLIW allows you to have wider designs with less logic since you use far fewer resources on resolving instruction dependencies. The reality is that dependencies are sometimes very hard to resolve at compile time. The reality is that some code just doesn't have that much ILP to begin with. The reality is that code is often memory bottle-necked anyway.

In a way, Itanium was like the Pentium 4. Both are brilliant on paper, and both perform more poorly in practice. The great irony is that Intel decided to push for more parallelism in one design (Itanium) and less in another (Pentium 4). Itanium was supposed to be faster because it was wider and therefore did more per clock. P4 was supposed to be faster because very high clocks would make up for lower IPC.

The reality is that neither extreme really works.

P4 ran out of gas because the process technology simply couldn't make a 10GHz P4. Architecturally, P4 was (and is) capable of very high clocks; P4 still has the clock records at over 8GHz. But a CPU needs to be manufactured, and leakage current (and other factors) prevented an 8GHz P4 from being practical.

Itanium ran out of gas because you can only get so much from ILP. Itanium is wide, has more registers than you could ever want, and has huge caches. It's a 2+ billion transistor (1B+ per core) monstrosity, more than 3x as many as Lynnfield (i7) per core. Despite all the hype, Itanium didn't end up being simpler than out-of-order CPUs, and it didn't end up being dramatically faster per clock (except on certain applications).

Are these faults of the IA-64 architecture, of the Itanium design, of Intel's manufacturing, or of the compilers and software? We'll probably never really know for sure. But we do know that CPU design is about making trade-offs, and that designs that look good on paper often perform poorly in practice.

Reply
RE: A solution by piroroadkill, 97 days ago
Oh, so I guess we wouldn't be using AMD64 then?

Shut the hell up.

Reply
RE: A solution by Shining Arcanine, 96 days ago
It is now known as Intel E64MT.

Stop being a fanboy.

Reply
RE: A solution by johnsonx, 96 days ago
Intel actually calls it EM64T. Anywhere outside of Intel, it's called AMD64. Fanboi.


Reply
RE: A solution by bersl2, 96 days ago
Everybody has different names for it. x86-64. x86_64. x64 (BTW, I want the person who came up with that particular abomination taken out back and shot).

Or better yet, stop the foolishness and just call it "64-bit x86". Everybody will know what you mean. Nobody will be offended.

Or we could just switch to a *sane* instruction set. I almost don't care which one.

Reply
RE: A solution by piroroadkill, 96 days ago
Mostly AMD64, though:

BSD systems such as FreeBSD, NetBSD and OpenBSD refer to both AMD64 and Intel 64 under the architecture name "amd64".

Debian, Ubuntu, and Gentoo refer to both AMD64 and Intel 64 under the architecture name "amd64".

Java Development Kit (JDK): The name "amd64" is used in directory names containing x86-64 files.

Microsoft Windows: x64 versions of Windows use the AMD64 moniker... ...For example, the system folder on a Windows x64 Edition installation CD-ROM is named "AMD64"...

Solaris: The "isalist" command in Sun's Solaris operating system identifies both AMD64- and Intel 64–based systems as "amd64".


Reply
RE: A solution by piroroadkill, 96 days ago
Woah, I'm not a fanboy at all. I have an Intel system; in fact, the last AMD processor I bought was a K6-2. But it's unavoidable to say that AMD invented the 64 bit x86 extensions we use today.

"AMD licensed its x86-64 design to Intel, where it is marketed under the name Intel 64 (formerly EM64T)."

So please, get your facts right.

Reply
RE: A solution by Griswold, 89 days ago
First of all, it used to be called EM64T. Now Intel calls it Intel64.

However, literally everyone calls it AMD64. Linux distros refer to it as AMD64, even Microsoft does so.

So, before you call somebody a fanboy, you should stop being a fanboy and get your facts straight. Makes it less embarrassing for you.

Reply
RE: A solution by Scali, 89 days ago
There are two sides to this story.
Developers tend to call it 'AMD64' because that is the original name that AMD used.
Hence, when you browse through folders, you'll often find AMD64 in filenames and directory names.

However, the problem is that people who are less familiar with hardware won't understand that their Intel processor can run AMD64 code. It can be rather confusing. Hence, Microsoft uses x64 in product names and marketing material. It is a simple name, looks like the x86 which people are already familiar with, and doesn't have a direct link to any brand.
Microsoft would probably just have used '64', but they already used that for Itanium products, so x64 is there to distinguish x86 from Itanium.

Reply
RE: A solution by Calin, 96 days ago
Yes, AMD should no longer use the AMD64, 64-bit instructions and instead go with Intel's 64-bit instructions...
...wait, the Intel 64-bit instructions are AMD's 64-bit instructions

Reply
RE: A solution by WaltC, 92 days ago
"Just don't buy AMD processors until they abandon their extensions in favor of Intel's extensions. Problem solved."

Or we could solve it by not buying Intel cpus until Intel decided to go with 100% AMD instruction extensions (I actually haven't bought an Intel cpu since 1999, btw.) To some extent, that's actually what happened with Core 2 64-bit Intel x86 cpus, isn't it? AMD's allowed them to use x86-64 all these years, and since Intel threw in the towel and just wrote AMD a $1.25B check, their new cross-licensing agreement provides Intel with at least 5 more years of x86-64 utilization in its cpus.

I don't think that "not buying" either company's cpus is any kind of a solution, seriously. Most people aren't going to do that because most people don't care what brand of cpu they buy in their box--they're buying box brand and price, primarily, and don't know or care about the differences in x86 cpus inside.

I sympathize with the programmer's point of view, here--I really do. Standardizing instructions certainly would make things simpler for the programmer. However, I'm also a firm believer in competition, and two heads are always better than one, imo. x86-64 was 100% AMD's invention, and Intel had to pick it up because it was so successful. OTOH, there've been Intel instruction set extensions which AMD has picked up for the same reason. So for all intents and purposes, there are extraneous x86 extensions made by both companies which programmers should pretty much ignore--unless they want to specialize for a particular cpu--which means they'll be limiting themselves to a smaller market--which means they probably won't do it.

I think that if both companies "agreed" on a particular set of extensions then it would limit innovation and future product improvement and introduce a lot of stagnation into cpu development. It would surely simplify things for programmers, but it would also slow down product R&D.

The problem here is we've got two distinct viewpoints: the cpu manufacturers' and the programmers', and they aren't necessarily the same at all. Conflicts like this are inevitable in a competitive cpu market. It isn't Intel versus AMD that we are really talking about, it's Intel and AMD versus programmers who naturally would prefer to have everything much simpler...;)


Reply
RE: A solution by Targon, 63 days ago
Some basic facts, since you seem to have missed several generations worth of processor development:

Intel started trying to make AMD processors incompatible with certain applications by adding SSE. AMD responded with 3DNow!. As time went on, Intel stuck with the idea of using new instructions to make AMD processors unable to run certain applications, or unable to run them well, while AMD really didn't do it beyond the 3DNow! set.

The move from 32 bit to 64 bit processors in the home market is ALL due to AMD adding 64 bit instructions to the set of 32 bit instructions of the time. This was not a case of trying to make some useless set of instructions, but was a true desire to bring 64 bit processing to the masses while providing improved performance in 32 bit applications. If you want to kill all AMD extensions, then you kill 64 bit support since Intel copied AMD instructions. Intel 64 bit is Itanium, which is a failed platform, even if there are a handful of systems running it.

You can't blame AMD for the useless extra instructions when Intel is to blame.


Reply
RE: A solution by Exophase, 63 days ago
Many years ago, long before any of the "media extension" instruction sets came to be, a legal agreement was reached between Intel, AMD, and other x86 manufacturers that allowed them to freely implement any instruction set changes that the others made.

Intel didn't make SSE to try to make AMD processors incompatible. You have the order backwards anyway - 3DNow! came first with AMD K6-2, while SSE wasn't available until the later released Pentium 3. Intel went with SSE instead of 3DNow! because it's a less limited design, not because they wanted to split the market. This is indicated by the fact that AMD eventually moved to SSE support instead of 3DNow!.

I don't think any of the extensions are useless, although they might appear that way to people who don't have particular use for them. They're added because Intel or AMD believes that enough people will benefit from them, and this belief is usually based on programmer feedback. If you look at the instructions and possibly do a little research then it's not hard to see applications where they would prove beneficial. That doesn't make the decision justified in the long run, but I don't think that they're keen on adding expensive execution functionality to their cores just so they can have something to advertise. If that were their angle they'd probably just start making things up.

Reply
Unique Instructions by GourdFreeMan, 97 days ago
If you count all of the x86 instructions from different vendors, and treat uses of different types of source registers as different instructions, there are ~3000 of them. See http://www.nasm.us/doc/nasmdocb.html

In actual fact, though, even when only counting unique opcodes a large number of instructions are the same -- just treating the data in the source registers as being different sized, breaking up the vector registers differently, or doing the same integer operations in signed and unsigned modes.

Decoding and dispatching is not as hellacious as these numbers might suggest, as most instructions are encoded so their bits actually have meaning as to what functional subset they belong.
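
To illustrate what "bits that have meaning" looks like in practice, here is a small sketch that splits an x86 ModR/M byte into its three fields (2-bit mod, 3-bit reg, 3-bit r/m). The function name `decode_modrm` and the register table are mine, and the table assumes 32-bit operands with no REX prefix; a real decoder also has to handle prefixes, SIB bytes and displacements.

```python
# 32-bit general-purpose register names, indexed by their 3-bit encoding.
# Assumption: 32-bit operand size and no REX prefix.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm(byte: int) -> dict:
    """Split a ModR/M byte into its mod (2 bits), reg (3 bits) and rm (3 bits) fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M must be a single byte")
    return {
        "mod": byte >> 6,            # addressing mode; 0b11 means register-direct
        "reg": (byte >> 3) & 0b111,  # register operand (or opcode extension)
        "rm": byte & 0b111,          # register or memory operand
    }


if __name__ == "__main__":
    # The byte pair 01 D8 encodes "add eax, ebx": opcode 01, then ModR/M D8.
    fields = decode_modrm(0xD8)
    print(fields, REG32[fields["rm"]], REG32[fields["reg"]])
```

The same three-field layout is shared by a large part of the instruction set, which is one reason the raw instruction count overstates the decoding effort.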

I will concede there are many legacy instructions that clutter the instruction space (BCD anyone?). Backwards compatibility (with forward performance improvements) is generally the reason x86 won the processor wars, however...

Frankly, I am surprised Intel and its rivals have cooperated so well in respecting each other’s machine code to date. I could very well see Intel treating the x86 instruction space as its own and charging competitors to add their own proprietary extensions. Those who didn't play ball would have their old processors become incompatible with future generations of the x86 architecture... perhaps there are segments of their cross-licensing agreement with AMD (redacted in the public document) that forbid this?

Reply
RE: Unique Instructions by jensend, 97 days ago
If they did that they'd have at least $100 billion in antitrust fines rather than $1 billion.

Reply
RE: Unique Instructions by tygrus, 96 days ago
It would be interesting to see a comparison of the # of instructions in each instruction set (RISC vs x86).

Having memory addresses (or indirect) as sources makes a messy ISA and implementation. Explicitly load into register then calculations use up to three registers is much better (RISC). Only loads, stores and jumps use memory addresses. Old x87 FP stack was horrible.

Could Intel or AMD resurrect Alpha or design new RISC with sensible vector extension.

Could AMD create a CPU that starts executing both sides of a branch without committing, then discards the result of the wrong side? Like Hyperthreading, but the threads become clones.

Reply
RE: Unique Instructions by titan7, 96 days ago
When Apple moved from CISC 680x0 to RISC PowerPC they actually had MORE instructions than before. So don't get too hung up on the absolute instruction count.

Branch Prediction is on average no worse than 50% correct (if it guessed randomly), but often well above 90%. Doing both branches would mean 2x the power use for as little as 5% more speed overall.

Reply
RE: Unique Instructions by Zool, 95 days ago
"Branch Prediction is on average no worse than 50% correct (if it guessed randomly), but often well above 90%. Doing both branches would mean 2x the power use for as little as 5% more speed overall."
Then my question would be why branch prediction and speculation take up so much of the core die on today's AMD and Intel CPUs. For 5% more speed overall?
The thing is that with superscalar pipelines of 15 or more stages, the penalty is much more than 5%. It depends on how many branches the code actually contains. But they can't count on this and need to keep performance balanced in both cases. IBM POWER5 and POWER6 have 14-stage pipelines, Intel Nehalem has a 16-stage pipeline, and AMD Phenom (I could only find it for Opteron, which is the same) has a 12-stage integer/17-stage floating-point pipeline.
Doing both branches doesn't seem such a bad idea if you had a very simple core and could drop branch prediction/speculation.
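
The numbers being argued over can be put into a rough back-of-the-envelope model: extra cycles per instruction = branch fraction × misprediction rate × flush penalty. The sketch below is only that simple model, and every input is an assumption chosen for illustration (20% branches, 95% prediction accuracy, a 15-cycle penalty, a base CPI of 1.0); none of it is measured data for any real CPU.

```python
def mispredict_overhead(branch_fraction: float, accuracy: float,
                        penalty_cycles: float, base_cpi: float = 1.0) -> float:
    """Fractional slowdown caused by branch mispredictions, relative to perfect prediction.

    Simple model: each mispredicted branch costs a fixed number of cycles.
    """
    if not (0.0 <= branch_fraction <= 1.0 and 0.0 <= accuracy <= 1.0):
        raise ValueError("branch_fraction and accuracy must be between 0 and 1")
    extra_cpi = branch_fraction * (1.0 - accuracy) * penalty_cycles
    return extra_cpi / base_cpi


if __name__ == "__main__":
    # Illustrative inputs only: 20% branches, 95% accuracy, 15-cycle flush.
    print(f"{mispredict_overhead(0.20, 0.95, 15):.1%}")
    # Same code on a hypothetical 5-stage core with the same predictor.
    print(f"{mispredict_overhead(0.20, 0.95, 5):.1%}")
```

Under these assumed inputs the cost grows linearly with pipeline depth, which is the point about long pipelines: the deeper the pipeline, the more a few percent of mispredictions hurt.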

Reply
RE: Unique Instructions by GourdFreeMan, 96 days ago
"Having memory addresses (or indirect) as sources makes a messy ISA and implementation. Explicitly load into register then calculations use up to three registers is much better (RISC). Only loads, stores and jumps use memory addresses. Old x87 FP stack was horrible."

Everything is a trade-off. Consider the cache footprint and the absolute instruction length of the machine code for both architectures. There are no absolute wins, except in the rose-colored world of academia. I will concede the x87 FP stack was simply a dinosaur from a previous age of microprocessors, however.

"Could Intel or AMD resurrect Alpha or design new RISC with sensible vector extension."

You know, with a clean-slate design you can do anything... except be the dominate microarchitecture in the computing world. More seriously, the only market for new architectures is the HPC domain of supercomputers... which is probably much better served by research into heterogeneous computing systems with a small number of complex cores that handle branchy code augmented by a larger number of simpler cores that do fast vector processing (e.g. Cell, GPGPU, etc.).

Reply
RE: Unique Instructions by GourdFreeMan, 96 days ago
Whoops... spelling error. Change "dominate" to "dominant" in my previous post.

Reply
Copyright © 1997-2010 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.