The SSE5 Instruction Set

While extending the x86 instruction set with new iterations of SSE has become a regular activity in the computing industry, many of these additions are in actuality a gradual reshaping of x86 processors. Although as a general purpose CPU design x86 doesn't have any hard limitations (given enough time you can do any kind of calculation required), it has had several weak points patched up over the years. These weak points have been identified by looking at what other processors, general and specialized, are doing well while x86 is doing poorly at the time. Each iteration of SSE so far has then implemented features borrowed from those processors to erase these weak points.

All told, SSE5 includes 46 unique "base" instructions, with many of those instructions featuring several variations that work on different data types. Counting all of these variations, SSE5 introduces 170 instructions in total. For comparison's sake, the entire original x86 instruction set was a mere 80 instructions.

With SSE5, AMD is focusing on 5 groups of instructions. Those groups are:

  • Fused multiply accumulate (FMACxx) instructions
  • Integer multiply accumulate (IMAC, IMADC) instructions
  • Permutation and conditional move instructions
  • Vector compare and test instructions
  • Precision control, rounding, and conversion instructions

As we hinted at earlier, many of these instructions are implementations of features found elsewhere. DSPs in particular have been and continue to be a major source of new instructions for new versions of SSE, with many of these instructions allowing a CPU to process data for specialized cases at DSP-like speeds.
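To make the multiply accumulate groups in the list above a little more concrete, here is a minimal sketch in plain C of the kind of loop they target: a dot product, the bread and butter of DSP workloads. The function name dot_product is our own and purely illustrative; nothing here uses SSE5 itself. Each pass through the loop is one multiply followed by one add into a running total, which is exactly the pair of operations a multiply accumulate instruction would collapse into a single step.

```c
#include <stddef.h>

/* Dot product of two arrays of n floats.
   Every loop iteration is one multiply-accumulate: acc = acc + a[i] * b[i].
   Without a multiply accumulate instruction this is a separate multiply
   and a separate add; with one, it could be a single instruction. */
float dot_product(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Whether a given compiler would actually emit a multiply accumulate instruction for this loop depends on the compiler and its settings, so treat this as an illustration of the pattern rather than a statement about generated code.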

Additionally, AMD has taken a particular interest in a weakness at the very core of x86, which is how the instructions are formed and handled. A single binary-form instruction for an x86 processor (or most other processors for that matter) is a combination of two parts: an opcode and operands. The opcode is the segment of the instruction that says what to do; the operands are the data elements that will be operated upon, plus any further specifiers the processor needs to execute the instruction. As far as the x86 instruction set is concerned, this is normally very cut and dried: 2 bytes for the opcode, and then the rest of the instruction is the operands, with the vast majority of instructions using between 0 and 2 data element operands.
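As a rough illustration of the opcode/operand split, consider the existing SSE instruction ADDPS xmm0, xmm1, which encodes as the three bytes 0F 58 C1: two opcode bytes followed by one operand byte (the ModRM byte) that names the two registers. This example and the ModRM field layout come from the general x86 encoding rules rather than from anything specific to SSE5, and the struct and function names below are hypothetical ones chosen for this sketch.

```c
#include <stdint.h>

/* Fields of a ModRM byte, the operand specifier that follows the opcode.
   (Struct and field names are illustrative.) */
struct modrm {
    uint8_t mod;  /* bits 7-6: addressing mode; 3 means both operands are registers */
    uint8_t reg;  /* bits 5-3: register number of the first operand */
    uint8_t rm;   /* bits 2-0: register number (or memory form) of the second operand */
};

/* Split a ModRM byte into its three fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (uint8_t)((byte >> 6) & 0x3);
    m.reg = (uint8_t)((byte >> 3) & 0x7);
    m.rm  = (uint8_t)(byte & 0x7);
    return m;
}

/* ADDPS xmm0, xmm1: opcode bytes 0F 58, then the ModRM operand byte C1. */
const uint8_t addps_xmm0_xmm1[3] = { 0x0F, 0x58, 0xC1 };
```

Decoding the last byte with decode_modrm(addps_xmm0_xmm1[2]) gives mod 3, reg 0 and rm 1, i.e. a register-to-register operation on xmm0 and xmm1. Note that there is only room here to name two registers, which is the limit the article goes on to discuss.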

AMD is making changes to both the opcode and operand design as part of SSE5, with the latter in particular intended to make many of the new performance-improving instructions possible. For the opcode, AMD is adding a third byte - this is necessary to provide the bits needed to identify the new instructions, and to provide some control over the use of the new operand features. As for the operands, SSE5 includes numerous instructions that require more than 2 data element operands; the format of the operands is not so much the point here as is the potential power of having additional operands. One way to improve performance is to operate on more pieces of data in a given instruction, and this requires the ability to address more than 2 data elements.
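The benefit of extra operands can be sketched with a toy model in C. Everything below is hypothetical: the four-entry register file, the helper names, and the instruction counts are ours, and none of it reflects SSE5's actual encoding or semantics. The assumption being modeled is that a two-operand instruction overwrites one of its inputs, so computing a*b + c while keeping a, b and c intact costs an extra copy, whereas an instruction that can name all of its inputs and its destination does the same work in one step.

```c
/* A toy register file; the layout is illustrative only. */
typedef struct { float r[4]; } regs;

/* Two-operand (destructive) forms: the destination is also an input. */
void mov2(regs *f, int dst, int src) { f->r[dst]  = f->r[src]; }
void mul2(regs *f, int dst, int src) { f->r[dst] *= f->r[src]; }
void add2(regs *f, int dst, int src) { f->r[dst] += f->r[src]; }

/* Multi-operand form: dst = a * b + c, with every input named explicitly. */
void madd4(regs *f, int dst, int a, int b, int c)
{
    f->r[dst] = f->r[a] * f->r[b] + f->r[c];
}

/* Compute r3 = r0 * r1 + r2 while leaving r0, r1 and r2 untouched.
   Each function returns the number of toy instructions it issued. */
int mac_two_operand(regs *f)
{
    mov2(f, 3, 0);  /* copy r0 so the multiply does not destroy it */
    mul2(f, 3, 1);  /* r3 = r3 * r1 */
    add2(f, 3, 2);  /* r3 = r3 + r2 */
    return 3;
}

int mac_multi_operand(regs *f)
{
    madd4(f, 3, 0, 1, 2);  /* r3 = r0 * r1 + r2 in one step */
    return 1;
}
```

Starting from r0 = 2, r1 = 3, r2 = 4, both routines leave 10 in r3; the two-operand version simply takes three toy instructions to get there instead of one.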



Comments

  • redpriest_ - Thursday, August 30, 2007

    SSE5 is far more robust than SSE4.
  • ltcommanderdata - Thursday, August 30, 2007

    I was wondering whether there are any copyrights to the SSE and MMX names that Intel owns? SSE was originally started as a polarized opposition to 3DNow!, but I think having both Intel and AMD developing something dubbed "SSE" without a unified standard will get very convoluted very quickly. Like an SSE4a that appears to be a superset of SSE4 but is actually exclusive and SSE5 being only a partial superset. I can only imagine what would happen if Intel decided to label their next instruction set "the real SSE5" or introduce an SSE6 that completely skips over AMD's SSE5.

    You know, one of the things I've found interesting is how there are little things that Intel and AMD are doing that can be viewed as appealing to Apple. Intel's Penryn for example has their Super Shuffle Engine which improves SSE packing, unpacking etc. which can be viewed as an attempt to meet the functionality of the Vector Permute Unit in the G4e and G5. Similarly, the move to 3-operand SSE also seems like an appeal to Altivec programmers.
  • MikeyJ79 - Thursday, August 30, 2007

    If I remember correctly, when AMD put MMX instructions in their K6 processor, Intel tried to sue them, but AMD won out. I don't remember the specific details, but I believe that was the gist of it.
  • rcc - Thursday, August 30, 2007

    You don't horn in on other companies' naming schemes, etc. It's actually more likely that someone seeing SSE5 will think it's an Intel spec than an AMD spec. I really dislike the whole lawsuit-happy system we have going, but AMD needs to be slapped for this one.

  • DigitalFreak - Thursday, August 30, 2007

    It's just AMD marketing BS, trying to make themselves look like they are leading again, like in the x64 days, instead of following. I wouldn't be surprised in the least if this ended up being called something else by the time it finally arrives.
  • Omega215D - Friday, August 31, 2007

    Like AMD64 for Intel is called EM64T? Despite their competitive nature they still license tech from each other here and there.
  • crimson117 - Thursday, August 30, 2007

    How does this benefit AMD?

    Does appearing to be (or becoming) a specification leader give them an advantage; for example is it easier for their chip designers to use AMD-developed specs than Intel-developed specs?

    It seems like this cannot be a low-cost endeavor.
