AMD's 64-bit strategy - x86-64

We have yet to talk about a fundamental part of the K8/Opteron architecture - 64-bit support. The K8 is the world's first and only 64-bit x86 core; it is the only such core because AMD is the company that came up with the 64-bit extension to the x86 instruction set.

In the past, Intel had been the one to extend the x86 ISA (Instruction Set Architecture) beyond its 16-bit foundation. But once Intel hit 32-bit and started to look towards the future and 64-bit microprocessors, they wanted to rid themselves of the somewhat bulky x86 ISA and move towards something much more robust - and thus, IA-64 was born.

The IA-64 ISA is significantly better than the x86 ISA in a number of ways, but the discussion of IA-64 is beyond the scope of this article as we're here to focus on x86. The biggest problem with the IA-64 ISA, and thus with IA-64 microprocessors, is the lack of native x86 compatibility, which effectively keeps IA-64 processors from running over two decades' worth of software. Intel recognized this and equipped their IA-64 processors (Itanium, Itanium 2, etc.) with an x86-to-IA-64 decoder that takes x86 instructions and decodes them into IA-64 instructions. This decoder is not the most efficient decoder, nor is it the best way to run x86 code (the best way would be to run it natively on an x86 processor), and thus the Itanium and Itanium 2 offer quite poor performance in x86 applications.

The benefits of a 64-bit microprocessor architecture are mainly memory related; if you take two otherwise identical microprocessors and make one 64-bit and one 32-bit, the advantage of the 64-bit CPU is that it can address much more memory than the 32-bit CPU (2^64 bytes vs. 2^32 bytes). For those who were hitting the limits of 32-bit memory addressability (4GB), Intel's only high performance solution was to transition to Itanium; if all you're looking for is more than 4GB of memory and solid x86 performance, then you're out of luck from Intel's perspective.
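To put the 2^64 vs. 2^32 figures in perspective, here is a quick back-of-the-envelope calculation in Python. The helper name `addressable_bytes` is ours, and the results are only the ceilings implied by a given address width; an actual chip may wire up fewer address bits than its pointers allow.

```python
GIB = 2 ** 30  # bytes in one gigabyte (binary)


def addressable_bytes(address_bits: int) -> int:
    """Upper bound on byte-addressable memory for a given address width."""
    return 2 ** address_bits


limit_32 = addressable_bytes(32)
limit_64 = addressable_bytes(64)

print(limit_32 // GIB)       # 4 (the familiar 4GB ceiling)
print(limit_64 // GIB)       # 17179869184
print(limit_64 // limit_32)  # 4294967296, i.e. 2^32 times more
```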

AMD's 64-bit strategy is significantly different; AMD has always been focused on current customer needs, not on a vision of the computing future 5 - 10 years from now, and this is reflected in their 64-bit strategy. The strategy is simple and has been done before: stick with a high-performing x86 core, and simply extend the ISA to support 64-bit memory addressability - the end result is what AMD likes to call x86-64.

The features of x86-64, whose implementation in Opteron is called AMD64, are pretty straightforward:

1) Backwards compatibility with current x86 code
2) 8 new 64-bit general purpose registers (GPRs) as well as 64-bit versions of the original 8 x86 GPRs (only available in 64-bit long mode, described below)
3) SSE & SSE2 support along with 8 new SSE2 registers
4) Increased memory addressability for large dataset applications (only available in long mode, described below)
5) Solid performance in current 32-bit applications with support for 64-bit applications going forward, making it a good transitional processor
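For readers who want to see the register counts from the list above spelled out, the short Python sketch below enumerates them. The register names follow the usual x86/x86-64 naming (eax becomes rax, the new ones are r8 through r15, the SSE registers are xmm0 through xmm15); the helper functions themselves are purely illustrative.

```python
# The eight general purpose registers of 32-bit x86.
LEGACY_GPRS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def long_mode_gprs():
    """GPRs visible to a 64-bit application: 8 widened plus 8 new."""
    widened = ["r" + name[1:] for name in LEGACY_GPRS]  # eax -> rax, etc.
    added = [f"r{i}" for i in range(8, 16)]             # r8 .. r15
    return widened + added


def sse_registers(long_mode: bool):
    """SSE registers: 8 in legacy x86, 16 for 64-bit applications."""
    count = 16 if long_mode else 8
    return [f"xmm{i}" for i in range(count)]


print(len(LEGACY_GPRS), len(long_mode_gprs()))                  # 8 16
print(len(sse_registers(False)), len(sse_registers(True)))      # 8 16
```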

With the ability to run current x86 code as well as future x86-64 code, you can guess that the K8 has two operating modes; they are called "legacy" and "long."

In legacy mode, the K8 will run all native 16 or 32-bit x86 applications; the processor basically acts just as a K7 would.

Things get interesting in "long" mode, where a 64-bit x86-64 compliant OS is required; in this mode, the K8 can either operate in full 64-bit mode or in compatibility mode. Full 64-bit mode allows for all of the advantages of a 64-bit architecture to be realized, including 64-bit memory addressability. It is also the only mode in which applications can take advantage of the doubled number of general purpose registers, one of the major features of the K8 architecture.

Compatibility mode gives you none of the advantages of a 64-bit architecture on the application level, as it is designed for running 32-bit apps on a 64-bit OS (hence the name compatibility); the extra registers and 64-bit register extensions are ignored in this mode. Compatibility mode is important because of the 2GB process size limitation under Windows OSes. Although 32-bit Windows offers support for a maximum of 4GB of memory, each process can only use a maximum of 2GB of memory - the remaining 2GB is reserved for the OS. By running a 64-bit version of Windows (when released) and a 32-bit application, compatibility mode allows each 32-bit process to have up to a full 4GB of memory, with the OS using anything above that mark.
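The relationship between the OS, the application and the resulting operating mode can be summarized in a few lines of Python. This is only a sketch of the rules described above: the function names are ours, and the per-process memory figures are simply the Windows numbers quoted in this article, with `None` standing in for "no longer bound by 32-bit pointers".

```python
# Per-process memory ceiling under Windows, in GB, as quoted in the article.
WINDOWS_PROCESS_LIMIT_GB = {
    "legacy": 2,         # 32-bit OS keeps the other 2GB for itself
    "compatibility": 4,  # 32-bit app on a 64-bit OS gets the full 4GB
    "64-bit": None,      # more than 4GB; not limited by 32-bit pointers
}


def operating_mode(os_bits: int, app_bits: int) -> str:
    """Pick the K8 operating mode for a given OS / application pairing."""
    if os_bits < 64:
        if app_bits == 64:
            raise ValueError("64-bit applications need a 64-bit OS")
        return "legacy"
    # A 64-bit OS puts the processor in long mode, which has two sub-modes.
    return "64-bit" if app_bits == 64 else "compatibility"


def describe(os_bits: int, app_bits: int) -> dict:
    """Summarize what the application sees in the resulting mode."""
    mode = operating_mode(os_bits, app_bits)
    return {
        "mode": mode,
        "gprs": 16 if mode == "64-bit" else 8,  # extra registers need 64-bit mode
        "process_limit_gb": WINDOWS_PROCESS_LIMIT_GB[mode],
    }


for os_bits, app_bits in [(32, 32), (64, 32), (64, 64)]:
    print(os_bits, app_bits, describe(os_bits, app_bits))
```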

Finally, we have 64-bit long mode, where there is more than meets the eye. In addition to > 4GB memory addressability, applications running in this mode have access to twice as many named general purpose registers. Remember that registers are basically high speed memory locations on the microprocessor where temporary values are stored. For example, if you were to compute the sum of two numbers, both of those numbers as well as the final result would be stored in these registers.

The problem you run into, however, is what happens when you run out of registers. You can't simply add more registers, because older applications compiled for your instruction set architecture would not be able to recognize the new ones; if your ISA was originally designed with 8 registers, you're stuck with those 8 registers unless you change the ISA. That doesn't happen too often, as it tends to break backwards compatibility unless you take the AMD x86-64 route of extending the ISA, and even then you're still unable to offer more registers to older applications that were not compiled with the updated ISA in mind.

When you do run out of registers, there's no other option than to use main memory (or cache) as scratch space to store the temporary values. Unfortunately, even going to cache is wasteful when compared to storing data in these high speed memory locations, and thus it is imperative that you keep data in registers as often as possible. The more registers you have, the less likely you are to run out of them; it's as simple as that. The doubling of the number of registers definitely helps overall performance, assuming you're running a 64-bit OS and have applications that are compiled for 64-bit long mode to take advantage of the added registers.
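A toy model makes the effect of running out of registers easy to see. The Python sketch below is not how a real compiler allocates registers (it simply evicts the least recently used value, which is our simplifying assumption), and the trace of 12 temporaries is made up for illustration; it only counts how often a value has to be fetched from memory because it is not sitting in a register.

```python
from collections import OrderedDict


def memory_accesses(trace, num_registers):
    """Count loads from memory for a sequence of value uses.

    Toy model: values live in a register file of fixed size and the
    least recently used value is evicted (spilled) when it is full.
    """
    registers = OrderedDict()
    loads = 0
    for name in trace:
        if name in registers:
            registers.move_to_end(name)  # already in a register, no load
            continue
        loads += 1                       # must come from cache or memory
        if len(registers) >= num_registers:
            registers.popitem(last=False)  # spill the oldest value
        registers[name] = True
    return loads


# Hypothetical loop body that keeps reusing 12 temporaries, run 3 times.
trace = [f"t{i}" for i in range(12)] * 3

print(memory_accesses(trace, 8))   # 36: every use misses with 8 registers
print(memory_accesses(trace, 16))  # 12: only the initial loads with 16
```

With 8 registers the 12 temporaries keep pushing each other out, so every single use goes to memory; with 16 they all fit and only the first pass pays for loads.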
