Thirteen New Instructions - SSE3

Back at IDF we learned about the thirteen new instructions that Prescott would bring to the world; although they were only referred to as the Prescott New Instructions (PNI) back then, it wasn't tough to guess that their marketing name would be SSE3.

The new instructions are as follows:

FISTTP, ADDSUBPS, ADDSUBPD, MOVSLDUP, MOVSHDUP, MOVDDUP, LDDQU, HADDPS, HSUBPS, HADDPD, HSUBPD, MONITOR, MWAIT

The instructions can be grouped into the following categories:

x87 to integer conversion
Complex arithmetic
Video Encoding
Graphics
Thread synchronization

You have to keep in mind that unlike the other Prescott enhancements we've mentioned today, these instructions require updated software before they deliver any benefit. Applications will have to be either recompiled or patched with these instructions in mind. With that said, let's highlight what some of these instructions do.

The FISTTP instruction converts x87 floating point values to integers, truncating the fractional part, so it will be used by applications that are not using SSE for their floating point math.
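
To make the conversion concrete: FISTTP stores an x87 value as an integer using truncation (round toward zero) regardless of the rounding mode set in the x87 control word, which is the behavior a C float-to-integer cast requires. The sketch below is plain C, not x87 assembly. The names `to_int_truncating` and `check_truncation` are made up for illustration, and whether a compiler actually emits FISTTP for the cast depends on the compiler and its target flags.

```c
#include <assert.h>

/* Hypothetical helper: a C cast from double to int must truncate toward
   zero. On x87 without SSE3 this usually means saving the control word,
   switching the rounding mode, storing, and restoring the control word;
   FISTTP performs the truncating store in a single instruction. */
int to_int_truncating(double x)
{
    return (int)x;
}

/* Small self-check showing the truncation behavior on both signs. */
void check_truncation(void)
{
    assert(to_int_truncating(2.75) == 2);
    assert(to_int_truncating(-2.75) == -2);
}
```

Note that truncation differs from the default x87 rounding mode (round to nearest), which is why the older store instructions needed the control word juggling in the first place.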

The ADDSUBPS, ADDSUBPD, MOVSLDUP, MOVSHDUP and MOVDDUP instructions are all grouped into the realm of "complex arithmetic" instructions. They are mostly designed to reduce latency when carrying out arithmetic on complex numbers. The move instructions are one example: they duplicate elements of a value as it is loaded into a register, which sets the operand up to be combined with other registers. The remaining complex arithmetic instructions are particularly useful in Fourier transforms and convolution operations, both of which are common in signal processing (e.g. audio editing) and heavy frequency calculations (e.g. voice recognition).
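
As a rough illustration of how these instructions fit together, here is a scalar C model of a packed complex multiply. It is a sketch under our own assumptions, not Intel's reference code: the `vec4` type and every `model_*` function are hypothetical stand-ins that mimic, lane by lane, what MOVSLDUP, MOVSHDUP and ADDSUBPS do to a 128-bit register. Real code would use the SSE3 intrinsics from `pmmintrin.h` (`_mm_moveldup_ps`, `_mm_movehdup_ps`, `_mm_addsub_ps`) instead.

```c
#include <assert.h>

/* Stand-in for one 128-bit XMM register holding four floats. */
typedef struct { float v[4]; } vec4;

/* MOVSLDUP model: duplicate the even-indexed elements -> {a0, a0, a2, a2}. */
vec4 model_movsldup(vec4 a)
{
    vec4 r = {{ a.v[0], a.v[0], a.v[2], a.v[2] }};
    return r;
}

/* MOVSHDUP model: duplicate the odd-indexed elements -> {a1, a1, a3, a3}. */
vec4 model_movshdup(vec4 a)
{
    vec4 r = {{ a.v[1], a.v[1], a.v[3], a.v[3] }};
    return r;
}

/* ADDSUBPS model: subtract in even lanes, add in odd lanes. */
vec4 model_addsubps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] - b.v[0], a.v[1] + b.v[1],
                a.v[2] - b.v[2], a.v[3] + b.v[3] }};
    return r;
}

/* Pre-SSE3 steps, modeled for completeness: MULPS and a pair-swapping shuffle. */
vec4 model_mulps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] * b.v[0], a.v[1] * b.v[1],
                a.v[2] * b.v[2], a.v[3] * b.v[3] }};
    return r;
}

vec4 model_swap_pairs(vec4 a)
{
    vec4 r = {{ a.v[1], a.v[0], a.v[3], a.v[2] }};
    return r;
}

/* Multiply two pairs of complex numbers stored as {re0, im0, re1, im1}.
   (a + bi)(c + di) = (ac - bd) + (bc + ad)i */
vec4 complex_mul2(vec4 x, vec4 y)
{
    vec4 y_re = model_movsldup(y);                       /* {c, c, ...}   */
    vec4 y_im = model_movshdup(y);                       /* {d, d, ...}   */
    vec4 t1   = model_mulps(x, y_re);                    /* {ac, bc, ...} */
    vec4 t2   = model_mulps(model_swap_pairs(x), y_im);  /* {bd, ad, ...} */
    return model_addsubps(t1, t2);                       /* {ac-bd, bc+ad, ...} */
}

/* Self-check: (1+2i)(5+6i) = -7+16i and (3+4i)(7+8i) = -11+52i. */
void check_complex_mul2(void)
{
    vec4 x = {{ 1.0f, 2.0f, 3.0f, 4.0f }};
    vec4 y = {{ 5.0f, 6.0f, 7.0f, 8.0f }};
    vec4 p = complex_mul2(x, y);
    assert(p.v[0] == -7.0f && p.v[1] == 16.0f);
    assert(p.v[2] == -11.0f && p.v[3] == 52.0f);
}
```

The point of the model is the instruction count: the duplicate-on-move and the mixed subtract/add each replace what would otherwise be extra shuffles and sign flips in the inner loop of an FFT or convolution.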

The LDDQU instruction is one Intel is particularly proud of, as it helps accelerate video encoding and is already implemented in the DivX 5.1.1 codec. More information on how it is used can be found in Intel's developer documentation.

In response to developer requests Intel has included the following instructions for 3D programs (e.g. games): HADDPS, HSUBPS, HADDPD, HSUBPD. Intel told us that developers are more than happy with these instructions, but just to make sure we asked our good friend Tim Sweeney - Founder and Lead Developer of Epic Games Inc. (the creators of Unreal, Unreal Tournament, Unreal Tournament 2003 and 2004). Here's what he had to say:

Most 3D programmers have been requesting a dot product instruction (similar to the shader assembly language dp4 instruction) ever since the first SSE spec was sent around, and the HADDP is a piece of a dot product operation: a pmul followed by two haddp's is a dot product.

This isn't exactly the instruction developers have been asking for, but it allows for performing a dot product in fewer instructions than was possible in the previous SSE versions. Intel's approach with HADDP and most of SSE in general is more rigorous than the shader assembly language instructions. For example, HADDP is precisely defined relative to the IEEE 754 floating-point spec, whereas dp4 leaves undefined the order of addition and the rounding points of the components additions, so different hardware implementing dp4 might return different results for the same operation, whereas that can't happen with HADDP.
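
The "multiply followed by two horizontal adds" recipe can be sketched with a scalar C model. The `vec4` type and the `model_*` functions below are hypothetical stand-ins for an XMM register and for the packed multiply (MULPS) and HADDPS instructions; production code would use `_mm_mul_ps` and `_mm_hadd_ps` from `pmmintrin.h` instead.

```c
#include <assert.h>

/* Stand-in for one 128-bit XMM register holding four floats. */
typedef struct { float v[4]; } vec4;

/* Packed multiply model (MULPS): lane-by-lane product. */
vec4 model_mulps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] * b.v[0], a.v[1] * b.v[1],
                a.v[2] * b.v[2], a.v[3] * b.v[3] }};
    return r;
}

/* HADDPS model: add adjacent pairs -> {a0+a1, a2+a3, b0+b1, b2+b3}. */
vec4 model_haddps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] + a.v[1], a.v[2] + a.v[3],
                b.v[0] + b.v[1], b.v[2] + b.v[3] }};
    return r;
}

/* Four-component dot product: one multiply, two horizontal adds. */
float dot4(vec4 a, vec4 b)
{
    vec4 m = model_mulps(a, b);   /* {a0b0, a1b1, a2b2, a3b3}            */
    vec4 h = model_haddps(m, m);  /* {m0+m1, m2+m3, m0+m1, m2+m3}        */
    h = model_haddps(h, h);       /* every lane now holds m0+m1+m2+m3    */
    return h.v[0];
}

/* Self-check: 1*5 + 2*6 + 3*7 + 4*8 = 70. */
void check_dot4(void)
{
    vec4 a = {{ 1.0f, 2.0f, 3.0f, 4.0f }};
    vec4 b = {{ 5.0f, 6.0f, 7.0f, 8.0f }};
    assert(dot4(a, b) == 70.0f);
}
```

The model also makes the order of additions explicit: (m0+m1) and (m2+m3) are formed first and then summed, which is the kind of fixed, well-defined ordering the comparison with dp4 is about.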

As far as where these instructions are used, Tim had the following to say:

Dot products are a fundamental operation in any sort of 3D programming scenario, such as BSP traversal, view frustum tests, etc. So it's going to be a measurable performance component of any CPU algorithm doing scene traversal, collision detection, etc.

The HSUBP ops are just HADDP ops with the second argument's sign reversed (sign-reversal is a free operation on floating-point values). It's natural to support a subtract operation wherever one supports an add.
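
A small scalar C model shows the relationship. We read "the second argument" as the second element of each adjacent pair, which is our interpretation of the quote; the `vec4` type and all function names below are hypothetical stand-ins for illustration, not real intrinsics (those would be `_mm_hadd_ps` and `_mm_hsub_ps`).

```c
#include <assert.h>

/* Stand-in for one 128-bit XMM register holding four floats. */
typedef struct { float v[4]; } vec4;

/* HADDPS model: add adjacent pairs -> {a0+a1, a2+a3, b0+b1, b2+b3}. */
vec4 model_haddps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] + a.v[1], a.v[2] + a.v[3],
                b.v[0] + b.v[1], b.v[2] + b.v[3] }};
    return r;
}

/* HSUBPS model: subtract adjacent pairs -> {a0-a1, a2-a3, b0-b1, b2-b3}. */
vec4 model_hsubps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] - a.v[1], a.v[2] - a.v[3],
                b.v[0] - b.v[1], b.v[2] - b.v[3] }};
    return r;
}

/* Flip the sign of the second element of each pair (lanes 1 and 3). */
vec4 negate_odd_lanes(vec4 a)
{
    vec4 r = {{ a.v[0], -a.v[1], a.v[2], -a.v[3] }};
    return r;
}

/* Self-check: HSUB gives the same result as HADD on sign-flipped inputs. */
void check_hsub_matches_hadd(void)
{
    vec4 a = {{ 5.0f, 3.0f, 10.0f, 4.0f }};
    vec4 b = {{ 1.0f, 1.0f,  8.0f, 2.0f }};
    vec4 s = model_hsubps(a, b);
    vec4 h = model_haddps(negate_odd_lanes(a), negate_odd_lanes(b));
    for (int i = 0; i < 4; i++)
        assert(s.v[i] == h.v[i]);
}
```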

So the instructions are useful and will lead to performance improvements down the road in games that take advantage of them. The instructions aren't everything developers have wanted, but it's good to see that Intel is paying attention to the game development community, something it has done a poor job of in the past.

Finally we have the two thread synchronization instructions - MONITOR and MWAIT. These two instructions work hand in hand to improve Hyper-Threading performance. MONITOR sets up a watch on a memory address, and MWAIT then parks the logical processor in an optimized wait state until something writes to that address. This lets the OS' idle thread, or other non-productive waiting threads generated by device drivers, stop competing for execution resources with whatever more useful thread the core is working on at the time. Unfortunately MONITOR and MWAIT both require OS support, meaning that we will be waiting for either Longhorn or the next Service Pack of Windows before these two instructions are put to use.

Intel would not confirm whether the instructions can be enabled by a simple service pack update; they simply indicated that they were working with Microsoft on including support for them. We'd assume that Intel would be a bit more excited if it could bring the instructions to Prescott users via a simple service pack update, which may indicate that we will have to wait for the next version of Windows before seeing these two in use.
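
For readers curious what an OS would actually do with the pair, the usual pattern is: arm the monitor on a flag, re-check the flag, then wait. The single-threaded C model below only simulates that protocol. Everything in it is a hypothetical stand-in: `monitor_model` and `mwait_model` are ordinary functions (the real instructions are generally only usable by the OS kernel), and the "other logical processor" that writes the flag is faked inside `mwait_model`.

```c
#include <assert.h>
#include <stddef.h>

/* Flag that another logical processor would set when work arrives. */
static volatile int work_ready = 0;

/* Address currently being watched by the modeled monitor hardware. */
static const volatile int *armed_addr = NULL;

/* MONITOR model: arm a watch on an address. */
void monitor_model(const volatile int *addr)
{
    armed_addr = addr;
}

/* MWAIT model: wait until a store hits the armed address. Real hardware
   would park the logical processor here; this model fakes the other
   processor's store so that the example terminates. */
void mwait_model(void)
{
    if (armed_addr == &work_ready) {
        work_ready = 1;     /* simulated store from another processor */
        armed_addr = NULL;  /* the monitor is disarmed by the wakeup  */
    }
}

/* Idle loop sketch: returns how many times it had to wait. */
int idle_until_work(void)
{
    int waits = 0;
    while (!work_ready) {
        monitor_model(&work_ready);
        if (work_ready)     /* re-check: the store may have landed    */
            break;          /* just before the monitor was armed      */
        mwait_model();
        waits++;
    }
    return waits;
}

/* Self-check: one wait is needed the first time, none once work is ready. */
void check_idle_loop(void)
{
    assert(idle_until_work() == 1);
    assert(idle_until_work() == 0);
}
```

Compared with a plain spin loop on the flag, the waiting thread executes nothing while parked, which is what frees up shared execution resources for the other thread on a Hyper-Threading processor.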

104 Comments

  • Stlr22 - Sunday, February 1, 2004

    post*
  • Stlr22 - Sunday, February 1, 2004

    KristopherKubicki

    Earlier you said that I should read the article.
    What was your point? What was it about my first pot that you disagreed with?
  • KristopherKubicki - Sunday, February 1, 2004

    #7:

    I agree 100% with Anand and Derek. This processor will be a non-event until we get in the 3.6GHz range. Similar to Northwood's launch.

    #10:

    Check out our price engine. We have already been listing the processor a week!

    http://www.anandtech.com/guides/priceguide.htm

    http://www.monarchcomputer.com/Merchant2/merchant....

  • cliffa3 - Sunday, February 1, 2004

    In the table on page 14 it shows that the 90nm P4@2.8 will have a 533 MHz FSB, but is that the case? I did some quick google research and can't find anything to support that...please confirm or correct, thanks.
  • NFactor - Sunday, February 1, 2004

    Yes, I must agree this is an amazing article, one of the best i have ever read. Thanks.
  • Xentropy - Sunday, February 1, 2004

    VERY interesting article. Thank you Anand and Derek! One of the best I've read on Anandtech, and I consider yours the best hardware site on the net!

    One correction, on page 7, you say, "if you want to multiply a number in binary by 2 you can simply shift the bits of the number to the right by 1 bit," but don't you mean shift to the left one bit (and place a zero at the end)? It's much like multiplying a decimal number by ten for obvious reasons.

    Anyway, it looks like the Prescott is somewhat of a non-event at this time. Just new cores that perform fundamentally the same as the current ones at current speeds. The real news will come later; Intel has just positioned itself for one hell of a speed ramp to come. Northwood was clearly at the end of the line. One analogy, I suppose, would be that Intel didn't fire any shots in the CPU war today, but they loaded their guns in preparation to fire.

    The coming year will be an exciting one for us hardware geeks. I'm interested in seeing how higher clocked Prescotts play out as well as whether anything 64-bit shows up before 2005 to support AMD's stance that we need it NOW.

    Again, thanks for a very thorough article!
  • Stlr22 - Sunday, February 1, 2004

    KristopherKubicki

    So what's your take on these new Prescotts?
  • KristopherKubicki - Sunday, February 1, 2004

    Anand scolded me for not reading the article :( I only read the conclusion and the graphs. Turns out the decision making isn't as clear-cut as it sounds.

    As for the thing with the Inquirer: lots of people had Prescotts. We had one back in August, I believe. The thing is they were horribly slow - 533FSB 2.8GHz. Everyone drew the conclusion that these were purposely slowed processors that were just for engineering purposes. While the Inq benched this processor, most people didn't, just because they were under the impression this was not to be the final production model. Hope that clears up some discrepancy about the validity.

    Cheers,

    Kristopher
  • wicktron - Sunday, February 1, 2004

    Hehe, I guess the Inq was right about this one. Where are all the Inq bashers and their claim of "fake" benchies? Haha, I laugh.
  • Stlr22 - Sunday, February 1, 2004

    KristopherKubicki - "read the article..."

    lol that might be a good idea, as I only browsed it and read the conclusion. :D
