A New SSE Instruction Set: AMD Announces SSE5by Ryan Smith on August 30, 2007 4:00 PM EST
- Posted in
While extending the x86 instruction set with new iterations of SSE has become a regular activity in the computing industry, many of these additions are in actuality a gradual reshaping of x86 processors. Although as a general purpose CPU design x86 doesn't have any hard limitations (given enough time you can do any kind of calculation required) it has had several weak points patched up over the years. The basis of identifying and patching these weak points has been looking at what processors - general and specialized - are doing well while x86 is doing poorly at the time. Each iteration of SSE so far has then implemented features that these other processors have to erase these weak points.
All told, SSE5 includes 46 unique "base" instructions, with many of those instructions featuring several variations that work on different data types. With all of these variations, the total number of instructions introduced altogether with SSE5 is 170. For comparison's sake, the entire original x86 instruction set was a mere 80 instructions.
With SSE5, AMD is focusing on 5 groups of instructions. Those groups are:
- Fused multiply accumulate (FMACxx) instructions
- Integer multiply accumulate (IMAC, IMADC) instructions
- Permutation and conditional move instructions
- Vector compare and test instructions
- Precision control, rounding, and conversion instructions
As we hinted to earlier, many of these instructions are implementations of features found elsewhere. DSPs in particular have been and continue to be a major source of new instructions for new versions of SSE, with many of these instructions allowing for a CPU to process data for specialized cases at DSP-like speeds.
Additionally, AMD has taken a particular interest in the weakness of the very core of x86, which is how the instructions are formed and handled. A single binary-form instruction for an x86 processor (or most other processors for that matter) is a combination of two parts: an opcode and operands. The opcode is the segment of the instruction that says what to do, the operands are the data elements that will be operated upon and any further specifiers the processor needs to execute the instruction. As far as the x86 instruction set is concerned, this is normally very cut & dry: 2 bytes for the opcode, and then the rest of the instruction is the operands, with the vast majority of instructions using between 0 and 2 data element operands.
AMD is making changes to both the opcode and operand design as part of SSE5, with the latter in particular intended to make many of the new performance-improving instructions possible. For the opcode, AMD is adding a third byte to the opcode - this is necessary to provide the bits needed to identify the new instructions, and provide some controls over the use of the new operand features. As for the operand, SSE5 includes numerous instructions that require the use of more than 2 data element operands; the format of the operands is not so much the point here as is the potential power of having additional operands. One way to improve performance is to operate on more pieces of data in a given instruction, and this requires the ability to address more than 2 data elements.