AMD K8 E4 Stepping: SSE3 Performance
Date: February 17th, 2005
Topic: CPU & Chipset
Manufacturer: AMD
Author: Derek Wilson
 
 


Introduction

With this week's introduction of the x52 line of Opteron processors, AMD is giving us a little look into the future of their Athlon 64 line. As mentioned in our article on Monday, the new 2.6GHz speed grade is also introducing the new E4 stepping, which adds SSE3 support. The new Opteron also received a face lift in that it is fabbed on a 90nm process, runs coherent HT links at 1GHz, and comes in a shiny new organic package rather than the older ceramic.

The goal of this article is to take a quick look at what SSE3 brings to the table for Opteron and the future revision E Athlon 64 cores. As desktop parts do not enable coherent HT links at all, the 1GHz support won't matter there. Also, the newer A64 parts are already built on 90nm in organic packages. Other than the usual small tweaks we see between steppings, the only thing that will be new across the board for K8 processors is SSE3.

What exactly is SSE3? Intel introduced SSE3 as Prescott New Instructions last year. These instructions are generally additions to the SIMD (single instruction multiple data) capabilities of the processor. SIMD processing is based on the idea that sometimes processors must take large amounts of data and perform similar operations across the entire set. This lends itself well to things like audio and video processing. In these areas of computing, large amounts of data flow through the processor, undergoing roughly the same operations, in preparation for display. The philosophy behind SIMD lends itself well to graphics as well. Modern graphics cores incorporate many SIMD processing units in order to churn through vector and pixel data as fast as possible. SIMD processing has also largely overshadowed the use of the x87 floating point unit on x86 processors. Because of this, it is advantageous for AMD to support the extensions to SIMD Intel makes as quickly as possible.
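To make the SIMD idea concrete, here is a minimal sketch in portable C. It does not use real SSE instructions: the `vec4` type and the `vec4_mul` and `scale_samples` helpers are hypothetical names invented for this illustration, with `vec4` standing in for a 128-bit register holding four floats. The point is only that one packed operation handles four samples at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 128-bit SSE register: four packed floats. */
typedef struct { float lane[4]; } vec4;

/* Models one packed multiply: four independent products per "instruction". */
static vec4 vec4_mul(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[i] * b.lane[i];
    }
    return r;
}

/* Applies the same gain to every sample, four samples per step.
   Assumption for this sketch: n is a multiple of 4. */
static void scale_samples(float *samples, size_t n, float gain) {
    assert(n % 4 == 0);
    vec4 g = {{ gain, gain, gain, gain }};
    for (size_t i = 0; i < n; i += 4) {
        vec4 v = {{ samples[i], samples[i + 1], samples[i + 2], samples[i + 3] }};
        v = vec4_mul(v, g);
        for (int k = 0; k < 4; ++k) {
            samples[i + k] = v.lane[k];
        }
    }
}
```

Calling `scale_samples(buffer, 8, 0.5f)` on an eight-sample buffer takes two passes through the outer loop, where a purely scalar loop would take eight.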

With SSE3, Intel added 10 new instructions targeted at SIMD as well as 3 other instructions that don't touch the SSE registers (fisttp, monitor, mwait). Here's a brief list of SSE3 instructions and what they are for:
x87 floating point to integer conversion (fisttp)
Complex arithmetic (addsubps, addsubpd, movsldup, movshdup, movddup)
Video encoding (lddqu)
Graphics (haddps, hsubps, haddpd, hsubpd)
Thread synchronization (monitor, mwait)
The float to integer conversion is rather obvious in function, but some of the other instructions are a little mysterious. The complex math instructions speed up arithmetic on complex numbers (values with a real and an imaginary part). The hadd and hsub instructions are horizontal additions and horizontal subtractions: rather than combining matching elements of two registers, they add or subtract adjacent elements within a register. These allow faster processing of data stored "horizontally" in (for example) vertex arrays. Here is a 4-element array of vertex structures:
x1 y1 z1 w1 | x2 y2 z2 w2 | x3 y3 z3 w3 | x4 y4 z4 w4
SSE and SSE2 are organized such that performance is better when processing vertical data, or structures that contain arrays; for example, a vertex structure with 4-element arrays for each component:
x1 x2 x3 x4
y1 y2 y3 y4
z1 z2 z3 z4
w1 w2 w3 w4
Generally, the preferred organizational method for vertices is the former. Under SSE2, the compiler (or very unfortunate programmer) would have to reorganize the data into the vertical layout during processing.
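As a rough illustration of why horizontal adds suit the first layout, the sketch below models haddps in plain C and uses it to compute four dot products on vertices stored as x y z w, with no reshuffling into the vertical layout. The `vec4` type and the `model_haddps`, `model_mulps` and `dot4_horizontal` names are hypothetical, and this is a scalar model of the instruction's result rather than real SSE3 code.

```c
#include <assert.h>

/* Hypothetical model of a 128-bit register holding four floats. */
typedef struct { float lane[4]; } vec4;

/* Scalar model of haddps: adjacent pairs are summed within each operand.
   Result is { a0+a1, a2+a3, b0+b1, b2+b3 }. */
static vec4 model_haddps(vec4 a, vec4 b) {
    vec4 r = {{ a.lane[0] + a.lane[1], a.lane[2] + a.lane[3],
                b.lane[0] + b.lane[1], b.lane[2] + b.lane[3] }};
    return r;
}

/* Scalar model of mulps: lane-by-lane ("vertical") multiply. */
static vec4 model_mulps(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[i] * b.lane[i];
    }
    return r;
}

/* Dot product of four xyzw vertices against one 4-component vector.
   Each vertex stays in its horizontal layout; three horizontal adds
   reduce the four products per vertex to one sum per vertex. */
static vec4 dot4_horizontal(const vec4 verts[4], vec4 v) {
    vec4 p0 = model_mulps(verts[0], v);
    vec4 p1 = model_mulps(verts[1], v);
    vec4 p2 = model_mulps(verts[2], v);
    vec4 p3 = model_mulps(verts[3], v);
    vec4 h01 = model_haddps(p0, p1);   /* partial sums for vertices 0 and 1 */
    vec4 h23 = model_haddps(p2, p3);   /* partial sums for vertices 2 and 3 */
    return model_haddps(h01, h23);     /* { dot0, dot1, dot2, dot3 } */
}
```

Without a horizontal add, the same reduction would need extra shuffle operations to line the partial sums up before a vertical add could combine them.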

The lddqu instruction is designed to reduce the impact of 128-bit unaligned memory accesses, which happen quite often in video processing. It loads 256 bits (32 bytes) of data starting at a 16-byte aligned boundary and then extracts the requested 16 bytes from that 32-byte block. Under SSE2, 64-bit loads are executed and then the data is recombined.
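The load-wide-then-extract behavior described above can be sketched as a scalar model in C. This follows the article's description only and says nothing about how any particular core implements the instruction; `model_lddqu` and its parameters are hypothetical names, and memory is modeled as a byte array in which index 0 is treated as 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of an lddqu-style load.
   mem      : modeled memory, index 0 treated as 16-byte aligned (assumption)
   mem_size : number of valid bytes in mem
   addr     : possibly unaligned index of the 16 bytes the caller wants
   out      : receives the 16 requested bytes */
static void model_lddqu(const uint8_t *mem, size_t mem_size,
                        size_t addr, uint8_t out[16]) {
    size_t base = addr & ~(size_t)15;        /* round down to a 16-byte boundary */
    uint8_t block[32];

    assert(base + 32 <= mem_size);           /* the wide load must stay in bounds */
    memcpy(block, mem + base, 32);           /* one aligned 32-byte load */
    memcpy(out, block + (addr - base), 16);  /* pick out the requested 16 bytes */
}
```

For example, a request at index 21 rounds down to base 16, loads bytes 16 through 47, and returns bytes 21 through 36.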

In order to test these features as implemented by AMD, we tested an Opteron 250 against an Opteron 252. We were able to use CrystalCPUID to set the multiplier of the Opteron 252 (through PowerNow!) to 12 in order to match the 2.4GHz of the Opteron 250. This way, we have a direct clock-for-clock comparison of the two cores.
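The multiplier arithmetic behind that clock matching is simple enough to sketch. The 200MHz reference clock used here is an assumption that the article does not state, and `core_clock_mhz` is a hypothetical helper written for this illustration.

```c
#include <assert.h>

/* Assumed K8 reference clock in MHz; not stated in the article. */
enum { K8_REFERENCE_CLOCK_MHZ = 200 };

/* Core clock in MHz for a given CPU multiplier. */
static int core_clock_mhz(int multiplier) {
    assert(multiplier > 0);
    return multiplier * K8_REFERENCE_CLOCK_MHZ;
}
```

Under that assumption, a multiplier of 12 gives 2400MHz, matching the Opteron 250, while a multiplier of 13 gives the 2600MHz of the 2.6GHz speed grade.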

We ran both processors in HP's xw9300 workstation. We used a single CPU configuration and 4x 512MB of RAM at 3-3-3-8 timings. Windows XP SP2 was used in our tests. In an MP environment (with more memory bandwidth), the Opteron has a greater potential for improvement with SSE3. Unfortunately, we were unable to perform a direct comparison of the older and newer cores under a DP configuration. Attempting to use PowerNow! to adjust the multiplier with more than 1 processor installed resulted in a BSOD (machine check exception).


48 Comments - Last by DerekWilson, 1814 days ago
No Subject by jimmy43, 1818 days ago
In any case, AMD is slowly catching up to Intel in the media encoding segment..Hey more features, im not complaining!

No Subject by Fricardo, 1818 days ago
Do you guys have any word on when the revision E stepping comes out for the Athlon 64's? I wonder how long of a gap AMD wants to leave before releasing their desktop parts.

No Subject by skiboysteve, 1818 days ago
its funny how intel comes out with SSE, SSE2, SSE3... to compensate for weak x87 FP and a long pipe, but because of marketing AMD has to adopt these instructions as well on a very resilient cpu that doesnt have such pickiness about code... so slap on SSE2 sticker and the performance is no better.

you could almost blame the kick ass FP performance?

im not trying to be biased, but i mean, look at the numbers, its the truth. it takes a lot of work to make a long pipe work great in all areas.

No Subject by ozzimark, 1818 days ago
i know these are opterons, but are we going to get an overclocking article on the new core soon?

No Subject by dannybin1742, 1818 days ago
isn't this rev supposed to use strained silicon too?

No Subject by DerekWilson, 1818 days ago
Unfortunately, the platforms I have available to test the Opteron on (nforce 3 pro and nforce pro 2200) only offer overclocking in the form of nTune. And these platforms do not like being pushed out of spec.

We also have many more tests to run on these processors and platforms and don't wish to see an unfortunate lab accident consume our samples before we squeeze all the data out of them we are looking for.

If we finish all our planned tests with Opteron 252, we may look into overclocking. But that will sit on the back burner for some time either way.

No Subject by bigpow, 1818 days ago
Funny.
I work at one of the largest high tech company today and I can't find any of these Opteron servers. My friends also notice the same trend.
Large corporations are sticking with Intel, enough said.

Nice step forward for AMD, still far away to catch Intel.

For my PC, I use AMD AthlonXP (soon-to-be A64). I wouldn't go with Intel for my use. But then again, I wouldn't go with Opteron too.

Who's buying this Opteron again?

No Subject by SkAiN, 1818 days ago
Sorry for the blank post.

When I first began reading this article, I became excited, looking forward to seeing the benchmarks this "upgrade" was supposed to bring, especially in the area of encoding.

Then I saw the benchmarks.

Seriously, it looks as if AMD is getting the short end of the stick when it comes to the cross-licensing deal with Intel. Intel gets awesome new architecture, A64's get Intel's bogus hype...

No Subject by Samadhi, 1818 days ago
It has been written in a number of places that as well as adding SSE3 units the SSE2 units were to be improved in the latest chip revision.

Any chance we could get some SSE2 vs SSE2 results for the two processors tested in this article?

Copyright © 1997-2010 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.