Final Words

Finding good SSE3 benchmarks wasn't as easy as we would have liked. Other encoding suites respond to SSE3 the same way DivX and AutoGK do, showing little change. This suggests that the K8 architecture simply handles unaligned 128-bit loads well, leaving lddqu little to improve. On Intel's NetBurst, the lddqu instruction may have more impact.
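For readers unfamiliar with lddqu: it is the SSE3 unaligned 128-bit integer load, exposed in C as the `_mm_lddqu_si128` intrinsic from `<pmmintrin.h>`. SSE2's movdqu (`_mm_loadu_si128`) does the same job, and Intel documents lddqu as potentially faster than movdqu when the load crosses a cache-line boundary. The sketch below assumes GCC or Clang targeting x86; it loads sixteen bytes from a misaligned address both ways and sums them with psadbw. The function names are hypothetical, and the code only shows where the instruction slots in; it measures nothing.

```c
#include <assert.h>
#include <stdint.h>
#include <pmmintrin.h> /* SSE3 intrinsics; also pulls in the SSE2 ones */

/* Sum 16 bytes at p using the SSE3 LDDQU load. p may be unaligned. */
__attribute__((target("sse3")))
static uint32_t sum16_lddqu(const uint8_t *p)
{
    __m128i v = _mm_lddqu_si128((const __m128i *)p);
    /* PSADBW against zero yields one partial sum per 64-bit half. */
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());
    return (uint32_t)(_mm_cvtsi128_si32(s) + _mm_extract_epi16(s, 4));
}

/* Same computation with the SSE2 MOVDQU load, for comparison. */
static uint32_t sum16_loadu(const uint8_t *p)
{
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());
    return (uint32_t)(_mm_cvtsi128_si32(s) + _mm_extract_epi16(s, 4));
}

/* Check both loads on an address that is guaranteed to be misaligned. */
static void sum16_selftest(void)
{
    _Alignas(16) uint8_t buf[32];
    for (int i = 0; i < 32; i++)
        buf[i] = (uint8_t)i;
    /* buf is 16-byte aligned, so buf + 1 is not; bytes 1..16 sum to 136. */
    assert(sum16_lddqu(buf + 1) == 136);
    assert(sum16_loadu(buf + 1) == 136);
}
```

Both functions return the same result for any address; whether lddqu is any faster depends on the core, which is exactly what the encoding tests above could not show on K8.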

As far as physics and graphics go, the added instructions show potential in our synthetic test. For DCC, CAD, scientific, and other workstation software, the E4 stepping could offer a bit of a performance boost.
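Among the added instructions, ADDSUBPS, MOVSLDUP, and MOVSHDUP are a natural fit for the complex arithmetic common in scientific code. As an illustration only (not the code used in our synthetic test), here is a minimal sketch that multiplies two single-precision complex numbers per register, assuming GCC or Clang on x86; `cmul2` and `cmul2_selftest` are hypothetical names.

```c
#include <assert.h>
#include <pmmintrin.h> /* SSE3: _mm_addsub_ps, _mm_moveldup_ps, _mm_movehdup_ps */

/*
 * Multiply two pairs of complex floats stored interleaved as
 * {re0, im0, re1, im1}: out[k] = a[k] * b[k] for k = 0, 1.
 */
__attribute__((target("sse3")))
static void cmul2(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 b_re = _mm_moveldup_ps(vb);  /* MOVSLDUP: {b0r, b0r, b1r, b1r} */
    __m128 b_im = _mm_movehdup_ps(vb);  /* MOVSHDUP: {b0i, b0i, b1i, b1i} */
    /* Swap real and imaginary parts: {a0i, a0r, a1i, a1r} */
    __m128 a_swap = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 t1 = _mm_mul_ps(va, b_re);     /* {ar*br, ai*br, ...} */
    __m128 t2 = _mm_mul_ps(a_swap, b_im); /* {ai*bi, ar*bi, ...} */
    /* ADDSUBPS subtracts in even lanes and adds in odd lanes:
       {ar*br - ai*bi, ai*br + ar*bi, ...} */
    _mm_storeu_ps(out, _mm_addsub_ps(t1, t2));
}

static void cmul2_selftest(void)
{
    /* (1+2i)(5+6i) = -7+16i and (3+4i)(7+8i) = -11+52i */
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
    float r[4];
    cmul2(a, b, r);
    assert(r[0] == -7.0f && r[1] == 16.0f);
    assert(r[2] == -11.0f && r[3] == 52.0f);
}
```

Without ADDSUBPS, the sign flip on the even lanes needs an extra XOR with a sign mask or a separate subtract and blend, which is where SSE3 trims the instruction count.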

In the consumer space, Athlon 64 may not see as much benefit from SSE3, especially since our encoding tests turned up so little performance impact. SSE3 can be used in games, but the impact will likely be minimal: most games will remain graphics limited, so CPU-side improvements will have a hard time shining through. Of course, for those who like to use lower cost Athlon 64 processors in cheaper workstations, there could be some advantage.

When we take a look at the Opteron 252 in a workstation environment, we will be able to get a better view of what the total package has to offer. As our workstation tests will be in a DP environment, we'll be able to see how the higher bandwidth helps the Opteron shine.

We would like to have tested more applications in this report on SSE3 performance under the new AMD core. Of particular interest are LINPACK, FLOPS, STREAM, and various other tests, all of which we would need to recompile with proper SSE3 support. Since the Intel compiler is designed to optimize for Intel processors, we haven't had a viable source of high quality SSE3 compilation for AMD. Hand optimizing these benchmarks for SSE3 on Opteron would take more time than this short investigation allows. We may look into using GCC for this purpose in future tests. As for real world tests using SSE3, we haven't found many suitable candidates beyond video encoders.
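Whichever compiler produces the SSE3 code, the binary still has to decide at run time whether to use it. One vendor-neutral approach is to test the SSE3 feature flag (CPUID leaf 1, ECX bit 0) rather than the vendor string, so an SSE3-capable Opteron takes the same path as an SSE3-capable Pentium 4. A minimal sketch, assuming GCC or Clang on x86 and the compiler-supplied `<cpuid.h>`; the helper names and the "sse3"/"generic" labels are hypothetical.

```c
#include <assert.h>
#include <cpuid.h>  /* GCC/Clang wrapper around the x86 CPUID instruction */
#include <string.h>

/* Nonzero if CPUID leaf 1 reports SSE3 (ECX bit 0), whatever the vendor. */
static int cpu_has_sse3(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0; /* leaf 1 not supported */
    return (ecx & bit_SSE3) != 0;
}

/* Copy the 12-byte vendor string ("AuthenticAMD", "GenuineIntel", ...). */
static void cpu_vendor(char out[13])
{
    unsigned int eax, ebx, ecx, edx;
    out[0] = '\0';
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return;
    /* The string is stored in EBX, EDX, ECX order; x86 is little-endian,
       so copying the raw registers yields the characters in order. */
    memcpy(out, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}

/* Hypothetical dispatcher: choose a code path from the feature flag only. */
static const char *pick_code_path(void)
{
    return cpu_has_sse3() ? "sse3" : "generic";
}

static void cpu_selftest(void)
{
    char vendor[13];
    cpu_vendor(vendor);  /* informational only; not used for dispatch */
    assert(strlen(vendor) <= 12);
    const char *path = pick_code_path();
    assert(strcmp(path, "sse3") == 0 || strcmp(path, "generic") == 0);
}
```

The vendor string is read only to show that it plays no part in the decision.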

Current SSE3-optimized code paths will likely not show their strengths on Opteron/Athlon until the processors have been in developers' hands for a while. The Intel compiler is also head and shoulders above anything AMD has up its sleeve. But since SSE3 offers more choices for optimization and code simplification, compilers may have an easier time generating efficient code. Hand optimized code is still important for tight loops in critical sections of performance oriented code, and there, more powerful yet simple options implemented in hardware will help programmers better optimize their own code.
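As an example of the simpler options SSE3 gives hand optimizers, consider the horizontal sum at the end of a dot product. With SSE alone it takes a shuffle, a move, and two adds; with SSE3, two HADDPS instructions do the job. A sketch assuming GCC or Clang on x86, with hypothetical function names. The shorter sequence is easier to write, but it is not guaranteed to run faster, since HADDPS is not a single-cycle operation.

```c
#include <assert.h>
#include <pmmintrin.h> /* SSE3: _mm_hadd_ps */

/* SSE-only horizontal sum: shuffle, add, move high half, add. */
static float hsum_sse(__m128 v)
{
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); /* {v1, v0, v3, v2} */
    __m128 sums = _mm_add_ps(v, shuf); /* {v0+v1, v0+v1, v2+v3, v2+v3} */
    shuf = _mm_movehl_ps(shuf, sums);  /* low lanes = {v2+v3, v2+v3} */
    sums = _mm_add_ss(sums, shuf);     /* lane 0 = (v0+v1) + (v2+v3) */
    return _mm_cvtss_f32(sums);
}

/* SSE3 version: two HADDPS instructions. */
__attribute__((target("sse3")))
static float hsum_sse3(__m128 v)
{
    __m128 t = _mm_hadd_ps(v, v); /* {v0+v1, v2+v3, v0+v1, v2+v3} */
    t = _mm_hadd_ps(t, t);        /* lane 0 = (v0+v1) + (v2+v3) */
    return _mm_cvtss_f32(t);
}

/* A 4-wide dot product built on the SSE3 horizontal sum. */
__attribute__((target("sse3")))
static float dot4_sse3(const float a[4], const float b[4])
{
    return hsum_sse3(_mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

static void hsum_selftest(void)
{
    __m128 v = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    assert(hsum_sse(v) == 10.0f);
    assert(hsum_sse3(v) == 10.0f);
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
    assert(dot4_sse3(a, b) == 70.0f); /* 5 + 12 + 21 + 32 */
}
```

Both versions add the lanes in the same order, (v0+v1) + (v2+v3), so they round identically.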


48 Comments


  • DerekWilson - Monday, February 21, 2005

    #46 and #47

    aren't monitor and mwait like hardware semaphores/mutexes?
  • PrinceGaz - Saturday, February 19, 2005

    #46- MONITOR tells the processor to detect changes in memory locations (typically in cache), and MWAIT puts a thread into low-power "sleep" until those memory changes are detected.

    MONITOR and MWAIT are meaningless to processors that are only running a single thread, therefore the AMD SSE3 capable processors will just treat them as no operation (it has to be aware of their existence so it can skip past them correctly, rather than potentially crash as #44 asked).

    Instructions designed to put one thread to sleep so that the processor can use its full resources on the other thread are only relevant when a single processor is running two or more threads simultaneously. Single-threaded processors will always dedicate full resources to the thread they are running and not put them to sleep as that would be pointless. The O/S still handles thread-switching normally, regardless of how many threads the processor is running.
  • pxc - Friday, February 18, 2005

    #45, that would be disastrous (NOP for a wait condition would mean a thread wouldn't wake up). :P MONITOR and MWAIT are useful for HyperThreading, but not exclusive to HT. Any application with multiple threads can benefit from their use. AMD's internal implementations will of course be different, but the instructions will behave the same way.
  • PrinceGaz - Friday, February 18, 2005

    #44- they'll almost certainly just treat monitor and mwait as a NOP (no operation instruction)
  • quanta - Friday, February 18, 2005

    Since SSE3-based AMD64 CPUs don't have hyperthreading, will applications using monitor and mwait crash the Opterons?
  • Viditor - Friday, February 18, 2005

    Derek - By running at 75% quality, aren't you minimizing the effects of lddqu, as this is mainly of use for motion estimation (which is greatly reduced at lower quality settings...)?

    (Thanks to Mike S for pointing this out to me...)
  • Viditor - Friday, February 18, 2005

    PrinceGaz - "It definitely looks like these new E stepping chips run hotter"

    Unfortunately, you can't really tell with AMD chips...
    All we really know is that under absolutely NO circumstances will it run higher than 92.6W...
    I do wish TDP was a standard across all companies, but I guess that would be impractical.
    Sadly, this is a measurement that none of the review sites ever make...
  • Viditor - Friday, February 18, 2005

    Icehawk - "I've worked for several large corporations (Fortune 500) and none of them have AMD servers anywhere..."

    40% of the Fortune 500 companies are now using Opteron servers. A large percentage of the most powerful new supercomputers are Opterons.
    I am sure that while you worked for those companies they did not have Opterons, as this is only a recent development (over the last year).

    "most vendors only offer Intel boxes"

    This too has changed over the last year. As of now, the only major vendor to be Intel only is Dell. In fact, Sun has cancelled their Xeon line in favour of Opterons...

    http://www.theregister.co.uk/2005/02/10/sun_kills_...

  • PrinceGaz - Friday, February 18, 2005

    #39- although the 2.4GHz x50 D4 Opteron also has the same 85.3W TDP as the 2.2GHz part, the 2.6GHz x52 has a TDP of 92.6W which is higher than any other Opteron including the 130nm parts.

    It definitely looks like these new E stepping chips run hotter, but we need power consumption and temperature tests to say for sure.
  • Brunnis - Friday, February 18, 2005

    #38

    Well, AMD said that power requirements would drop for chips at the same frequency, if I remember correctly. The TDP doesn't say anything about processors currently available. For all we know the 85W figure could be for a future 4GHz Opteron. I'm exaggerating, but you get my point. :)
