Thirteen New Instructions - SSE3

Back at IDF we learned about the thirteen new instructions that Prescott would bring to the world; although they were only referred to as the Prescott New Instructions (PNI) back then, it wasn't tough to guess that their marketing name would be SSE3.

The new instructions are as follows:

FISTTP, ADDSUBPS, ADDSUBPD, MOVSLDUP, MOVSHDUP, MOVDDUP, LDDQU, HADDPS, HSUBPS, HADDPD, HSUBPD, MONITOR, MWAIT

The instructions can be grouped into the following categories:

x87 to integer conversion
Complex arithmetic
Video Encoding
Graphics
Thread synchronization

Keep in mind that unlike the other Prescott enhancements we've mentioned today, these instructions require updated software before they provide any benefit. Applications will have to be either recompiled or patched with these instructions in mind. With that said, let's highlight what some of these instructions do.
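To give a rough idea of what "patched with these instructions in mind" can look like, here is a minimal sketch of a run-time check that picks an SSE3 code path only when the CPU reports support for it. It assumes GCC or Clang on x86 (the `__builtin_cpu_supports` builtin); the function names `cpu_has_sse3` and `pick_code_path` are our own, purely for illustration.

```c
#include <stdbool.h>

/* Returns true when the running CPU reports SSE3 support.
   Assumption: GCC or Clang on x86; anything else falls back to false. */
bool cpu_has_sse3(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                 /* safe to call more than once */
    return __builtin_cpu_supports("sse3") != 0;
#else
    return false;                         /* unknown platform: take the safe path */
#endif
}

/* Hypothetical dispatcher: an application would decide once at start-up
   which version of its hot loops to run. */
const char *pick_code_path(void)
{
    return cpu_has_sse3() ? "sse3" : "baseline";
}
```

An application built this way keeps running on older CPUs and only switches to the new instructions where they exist.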

The FISTTP instruction speeds up x87 floating point to integer conversion; it will be used by applications that are not using SSE for their floating point math.
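The conversion in question is the ordinary truncating cast found in C and C++ code. The older x87 store instructions round according to the FPU control word, so a compiler had to switch the rounding mode around every cast; FISTTP always truncates toward zero. A minimal sketch of the source-level operation follows (the function names are ours, and whether a compiler actually emits FISTTP depends on its target flags):

```c
#include <stddef.h>

/* A C float-to-int cast truncates toward zero. On an x87 target with
   SSE3 enabled, this is the operation FISTTP performs in one step. */
int to_int_truncate(double x)
{
    return (int)x;
}

/* Hypothetical helper: convert a buffer of samples, one truncating
   cast per element. */
void convert_samples(const double *src, int *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = to_int_truncate(src[i]);
}
```

Note that truncation is not the same as rounding: 2.7 becomes 2 and -2.7 becomes -2.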

The ADDSUBPS, ADDSUBPD, MOVSLDUP, MOVSHDUP and MOVDDUP instructions all fall into the realm of "complex arithmetic" instructions. They are mostly designed to reduce latency when working with complex numbers. One example is the move instructions, which load a value into a register and duplicate parts of it so that it can be combined with other registers in fewer steps. The remaining complex arithmetic instructions are particularly useful in Fourier transforms and convolution operations, both common in any sort of signal processing (e.g. audio editing) or heavy frequency calculations (e.g. voice recognition).
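To make that a little more concrete, here is a scalar C model of three of the single-precision instructions, followed by a complex multiply built out of them. This is only a sketch of the semantics, not SSE3 code: the `vec4` type stands in for a 128-bit register holding two complex numbers as (real, imaginary) pairs, and `swap_pairs` stands in for an ordinary SSE shuffle. Real code would use the compiler's intrinsics (`_mm_moveldup_ps`, `_mm_movehdup_ps` and `_mm_addsub_ps` from `pmmintrin.h`) instead.

```c
/* Stand-in for one 128-bit XMM register holding four floats. */
typedef struct { float v[4]; } vec4;

/* MOVSLDUP: duplicate the even-indexed elements. */
vec4 movsldup(vec4 s)
{
    vec4 r = {{ s.v[0], s.v[0], s.v[2], s.v[2] }};
    return r;
}

/* MOVSHDUP: duplicate the odd-indexed elements. */
vec4 movshdup(vec4 s)
{
    vec4 r = {{ s.v[1], s.v[1], s.v[3], s.v[3] }};
    return r;
}

/* ADDSUBPS: subtract in even positions, add in odd positions. */
vec4 addsubps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] - b.v[0], a.v[1] + b.v[1],
                a.v[2] - b.v[2], a.v[3] + b.v[3] }};
    return r;
}

/* Plain packed multiply (MULPS, available since the first SSE). */
vec4 mulps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] * b.v[0], a.v[1] * b.v[1],
                a.v[2] * b.v[2], a.v[3] * b.v[3] }};
    return r;
}

/* Swap real and imaginary parts of each pair (a pre-SSE3 shuffle). */
vec4 swap_pairs(vec4 s)
{
    vec4 r = {{ s.v[1], s.v[0], s.v[3], s.v[2] }};
    return r;
}

/* Multiply two pairs of complex numbers:
   (ar + ai*i)(br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)*i */
vec4 complex_mul2(vec4 a, vec4 b)
{
    vec4 t1 = mulps(movsldup(a), b);              /* ar*br, ar*bi */
    vec4 t2 = mulps(movshdup(a), swap_pairs(b));  /* ai*bi, ai*br */
    return addsubps(t1, t2);
}
```

With a holding 1+2i and 3+4i and b holding 5+6i and 7+8i, `complex_mul2` yields -7+16i and -11+52i. The point is that the duplicate and add/subtract steps each map to a single instruction rather than a series of shuffles and sign flips.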

The LDDQU instruction is one Intel is particularly proud of, as it helps accelerate video encoding and is already used by the DivX 5.1.1 codec. More information on how it is used can be found in Intel's developer documentation.

In response to developer requests, Intel has included the following instructions for 3D programs (e.g. games): HADDPS, HSUBPS, HADDPD and HSUBPD. Intel told us that developers are more than happy with these instructions, but just to make sure we asked our good friend Tim Sweeney - Founder and Lead Developer of Epic Games Inc. (the creators of Unreal, Unreal Tournament, Unreal Tournament 2003 and 2004). Here's what he had to say:

Most 3D programmers have been requesting a dot product instruction (similar to the shader assembly language dp4 instruction) ever since the first SSE spec was sent around, and the HADDP is a piece of a dot product operation: a pmul followed by two haddp's is a dot product.

This isn't exactly the instruction developers have been asking for, but it allows for performing a dot product in fewer instructions than was possible in the previous SSE versions. Intel's approach with HADDP and most of SSE in general is more rigorous than the shader assembly language instructions. For example, HADDP is precisely defined relative to the IEEE 754 floating-point spec, whereas dp4 leaves undefined the order of addition and the rounding points of the component additions, so different hardware implementing dp4 might return different results for the same operation, whereas that can't happen with HADDP.
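Tim's "pmul followed by two haddp's" recipe can be modeled in a few lines of plain C. This is a sketch of the semantics only: `vec4` stands in for a 128-bit register, `mulps` models the packed multiply (the actual SSE mnemonic is MULPS), and `haddps` models the new horizontal add. Real code would use the `_mm_hadd_ps` intrinsic from `pmmintrin.h`.

```c
/* Stand-in for one 128-bit XMM register holding four floats. */
typedef struct { float v[4]; } vec4;

/* Packed multiply: element-by-element product. */
vec4 mulps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] * b.v[0], a.v[1] * b.v[1],
                a.v[2] * b.v[2], a.v[3] * b.v[3] }};
    return r;
}

/* HADDPS: add adjacent pairs, first operand in the low half of the
   result, second operand in the high half. */
vec4 haddps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] + a.v[1], a.v[2] + a.v[3],
                b.v[0] + b.v[1], b.v[2] + b.v[3] }};
    return r;
}

/* Four-component dot product: one multiply, two horizontal adds. */
float dot4(vec4 a, vec4 b)
{
    vec4 p  = mulps(a, b);     /* a0*b0, a1*b1, a2*b2, a3*b3        */
    vec4 h1 = haddps(p, p);    /* p0+p1, p2+p3, p0+p1, p2+p3        */
    vec4 h2 = haddps(h1, h1);  /* the full sum in every element     */
    return h2.v[0];
}
```

For (1, 2, 3, 4) and (5, 6, 7, 8) this returns 70. Note that the order of the additions is fixed by the instruction definition, which is the determinism Tim is referring to.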

As far as where these instructions are used, Tim had the following to say:

Dot products are a fundamental operation in any sort of 3D programming scenario, such as BSP traversal, view frustum tests, etc. So it's going to be a measurable performance component of any CPU algorithm doing scene traversal, collision detection, etc.

The HSUBP ops are just HADDP ops with the second argument's sign reversed (sign-reversal is a free operation on floating-point values). It's natural to support a subtract operation wherever one supports an add.
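The relationship Tim describes is easy to check with a scalar model. The sketch below is our own illustration, not SSE3 code: `vec4` stands in for a 128-bit register and `negate_odd` is a hypothetical helper that flips the sign of the second element in each pair. Real code would use the `_mm_hsub_ps` intrinsic from `pmmintrin.h`.

```c
/* Stand-in for one 128-bit XMM register holding four floats. */
typedef struct { float v[4]; } vec4;

/* HADDPS: add adjacent pairs of each operand. */
vec4 haddps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] + a.v[1], a.v[2] + a.v[3],
                b.v[0] + b.v[1], b.v[2] + b.v[3] }};
    return r;
}

/* HSUBPS: subtract the second element of each pair from the first. */
vec4 hsubps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] - a.v[1], a.v[2] - a.v[3],
                b.v[0] - b.v[1], b.v[2] - b.v[3] }};
    return r;
}

/* Hypothetical helper: reverse the sign of the second element of
   each pair. */
vec4 negate_odd(vec4 s)
{
    vec4 r = {{ s.v[0], -s.v[1], s.v[2], -s.v[3] }};
    return r;
}

/* Returns 1 when hsubps(a, b) matches haddps on the sign-flipped inputs. */
int hsub_matches_hadd(vec4 a, vec4 b)
{
    vec4 x = hsubps(a, b);
    vec4 y = haddps(negate_odd(a), negate_odd(b));
    return x.v[0] == y.v[0] && x.v[1] == y.v[1] &&
           x.v[2] == y.v[2] && x.v[3] == y.v[3];
}
```

Because IEEE 754 subtraction is defined as addition of the negated operand, the two forms give identical results for ordinary values.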

So the instructions are useful and will lead to performance improvements down the road in games that take advantage of them. The instructions aren't everything developers have wanted, but it's good to see that Intel is paying attention to the game development community, something it has done a poor job of in the past.

Finally we have the two thread synchronization instructions - MONITOR and MWAIT. These two instructions work hand in hand to improve Hyper-Threading performance. MONITOR sets up a memory address range for the hardware to watch, and MWAIT then puts the logical processor into an optimized wait state until something writes to that range. This lets the OS' idle thread, or other non-productive waiting threads generated by device drivers, step out of the way so the core can concentrate on whatever more useful thread it is working on at the time. Unfortunately MONITOR and MWAIT will both require OS support to be used, meaning that we will either be waiting for Longhorn or the next Service Pack of Windows for these two instructions.

Intel would not confirm whether the instructions can be enabled with a simple service pack update; they simply indicated that they were working with Microsoft on including support for them. We'd assume Intel would be a bit more excited if it could bring the instructions to Prescott users via a simple service pack update, so its reticence may indicate that we will have to wait for the next version of Windows before seeing these two in use.


104 Comments


  • Chadder007 - Monday, February 2, 2004

    I can't imagine how HOT that sucker will be when up to 5ghz!!!! 150oC??? LOL
    For the heat issues alone, im thinking about going AMD in my next rig.
  • CRAMITPAL - Monday, February 2, 2004

    Ace's Hardware summed it up well: Prescott is a DOG, or to be exact a HOT DOG ! See the picture in the review of the dog warming it's toes next to the Prescott powered PC. Talk about one sad CPU piece of crap...

    Here is the FLAME THROWER reality check:

    "Currently there is no reason to upgrade to Prescott, as the gaming performance is more or less ok, but many applications report pretty poor results. On top of that, the new Intel CPU gets hot very quickly and requires a well ventilated case. The Athlon 64 3200+ is not always the clear winner in games compared to 3.2 GHz Prescott, but the 3400+ will have little trouble beating the 3.4 GHz Prescott in most benchmarks. Prescott will have to scale incredibly quickly to outperform the Athlon 64, because the latter scales excellently with clockspeed, and we definitely prefer Cool'n'Quiet over Hot'n Prescott! "

    As shown this FLAME THROWER don't scale well, especially when it runs 15-20C hotter than an equal speed Northwood. Intel really fugged up this time. Ya gotta love seeing the Satan eat shit and choke! When every hardware review site on the planet, including THG's tells ya Prescott is a piece of crap, then you might as well resign to reality. DENIAL is futile!

    Dell will be selling FLAME THROWING PC Heaters to any gullible sheep foolish enough to buy a Prescott. A fool and his money are soon parted !
  • AnonymouseUser - Monday, February 2, 2004

    "Ummmm yea, kinda reminds me of cooking an egg on an Athlon XP"

    Yeah, kinda, except the Prescott can do the same work in about half the time. Sounds like something they should advertise that as a feature...
  • Stlr22 - Monday, February 2, 2004

    What happened to CRAM's post???
  • INTC - Monday, February 2, 2004

    #43 cliffa3 - http://www.x86-secret.com/articles/cpu/prescott/p4...

    It doesn't look good for P4G8X with either the 2.8/533 or the 800 MHz FSB flavors.
  • mkruer - Monday, February 2, 2004

    For those who missed it, X-bit gave a temperature comparison, for the all the chip.
    http://www.xbitlabs.com/articles/cpu/display/presc...

    Processor; Idle, Burn
    Pentium 4 (Prescott) 3.2GHz; 45oC, 61oC
    Pentium 4 (Northwood) 3.2GHz; 30oC, 48oC
    Pentium 4 Extreme Edition 3.2GHz; 32oC, 51oC

    This does not bode well for Intel unless they are going to make water cooling a standard.

    But this Quote sums it up nicely IMHO “I am scared to imagine what happens to Prescott when we close the system case…”

  • lmonds - Monday, February 2, 2004

    what??? no talk about heat on this chip? Come on anand this is vital info about prescott. Other sites are reporting temps up around 80c with the stock cooler. I understand that as it gets faster in mhz it will be a better performing chip but what kinda heat are we looking at at 4ghz? No way is a 80c chip going in any of my boxes. If keeping an intel badge on the front of my case means i have to have a delta fan in my box then you can forget about it.
  • Stlr22 - Monday, February 2, 2004

    :D
  • Captante - Monday, February 2, 2004

    Stlr22 ....Re post # 31 Hahahahahahahahahahahahahahahahahahahahahaaha!!!!
    That one had me cracking up for 5 minutes!
    It is good to laugh!!! :-)
  • Stlr22 - Monday, February 2, 2004

    Moreless a Prescott....
