x87 floating point to integer conversion (fisttp)
Complex arithmetic (addsubps, addsubpd, movsldup, movshdup, movddup)
Video encoding (lddqu)
Graphics (haddps, hsubps, haddpd, hsubpd)
Thread synchronization (monitor, mwait)
The float to integer conversion is rather obvious in function, but some of the other instructions are a little mysterious. The complex arithmetic instructions speed up math on complex numbers. The hadd and hsub instructions are horizontal additions and horizontal subtractions: they combine adjacent elements within a register rather than matching elements of two registers. These allow faster processing of data stored "horizontally" in (for example) vertex arrays. Here is a 4-element array of vertex structures:
x1 y1 z1 w1 | x2 y2 z2 w2 | x3 y3 z3 w3 | x4 y4 z4 w4
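To make the horizontal operations concrete, here is a minimal sketch in portable C that models what haddps computes and uses it to sum the four components of each vertex in a horizontally stored array. It is a scalar model of the instruction's semantics, not the instruction or a compiler intrinsic, and the function names (haddps_model, sum_vertex_components) are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of haddps (hypothetical helper, not a real intrinsic):
   adds adjacent pairs within each source operand.
   dst = { a0+a1, a2+a3, b0+b1, b2+b3 } */
void haddps_model(const float a[4], const float b[4], float dst[4])
{
    float r[4];

    assert(a != NULL && b != NULL && dst != NULL);

    r[0] = a[0] + a[1];
    r[1] = a[2] + a[3];
    r[2] = b[0] + b[1];
    r[3] = b[2] + b[3];

    /* Copy through a temporary so dst may alias a or b. */
    for (size_t i = 0; i < 4; ++i)
        dst[i] = r[i];
}

/* v holds four vertices stored horizontally: x1 y1 z1 w1 x2 y2 z2 w2 ...
   After three horizontal adds, sums[i] = x + y + z + w of vertex i. */
void sum_vertex_components(const float v[16], float sums[4])
{
    float lo[4], hi[4];

    haddps_model(&v[0], &v[4], lo);   /* x1+y1, z1+w1, x2+y2, z2+w2 */
    haddps_model(&v[8], &v[12], hi);  /* x3+y3, z3+w3, x4+y4, z4+w4 */
    haddps_model(lo, hi, sums);       /* one full sum per vertex */
}
```

Three horizontal adds reduce four vertices to four per-vertex sums without first rearranging the data, which is the pattern used when finishing dot products on horizontally stored vectors.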
SSE and SSE2 are organized such that performance is better when processing vertical data, or structures that contain arrays; for example, a vertex structure with 4-element arrays for each component:
x1 x2 x3 x4
y1 y2 y3 y4
z1 z2 z3 z4
w1 w2 w3 w4
Generally, the preferred organizational method for vertices is the former (an array of vertex structures). Under SSE2, the compiler (or very unfortunate programmer) would have to reorganize the data during processing.
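The reorganization mentioned above amounts to a 4x4 transpose from the horizontal layout to the vertical one. The following is a minimal sketch in plain C, assuming hypothetical type and function names (Vertex, VertexBlock, aos_to_soa) that do not come from any particular library; real SSE code would typically do this with shuffle and unpack instructions rather than a scalar loop.

```c
#include <stddef.h>

/* Horizontal layout: one structure per vertex (hypothetical type). */
typedef struct { float x, y, z, w; } Vertex;

/* Vertical layout: one 4-element array per component (hypothetical type). */
typedef struct { float x[4], y[4], z[4], w[4]; } VertexBlock;

/* Transposes four horizontally stored vertices into the vertical layout,
   so that each component array can be loaded into one SSE register. */
void aos_to_soa(const Vertex in[4], VertexBlock *out)
{
    for (size_t i = 0; i < 4; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
        out->w[i] = in[i].w;
    }
}
```

This extra pass over the data is the cost that horizontal instructions can avoid in cases such as per-vertex sums.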