AMD's Quad-Core Barcelona: Defending New Territory

Author: Johan De Gelas

by Johan De Gelas on September 10, 2007 12:15 AM EST

64-bit Linux HPC Performance: LINPACK

There is one kind of code where Core really ate the AMD CPUs for breakfast, and it was close to embarrassing: floating point intensive code that makes heavy use of vector SIMD, also called packed SSE (and SSE2/SSE3), runs up to twice as fast on a Xeon 5160 (3GHz) as on an Opteron 2222 (3GHz). This is also one of the reasons (though probably not the main one) why AMD was falling a bit behind in the gaming area.

AMD has gone to great lengths to improve the performance of 128-bit packed SSE instructions:

Instruction fetch has been doubled to 32 bytes
128-bit SSE computations now decode into a single micro-op (two in K8)
The load unit can load two 128-bit numbers from the L1 cache each cycle
FP reservation stations still have 36 entries, but they are now 128 bits wide instead of 64 bits
All three FPU execution units were widened to 128 bits (64 bits before)
The L2 cache has double the bandwidth to cope with this

Together with the excellent memory subsystem, Barcelona should be ready to take on the Intel Core architecture when it comes to pure SIMD/SSE power.
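
To make two of the items in the list above concrete, here is a small sketch of the arithmetic involved: how many values fit in one 128-bit SSE register, and how many micro-ops a stream of packed instructions decodes into at two micro-ops each (K8) versus one (Barcelona). The function names and the instruction count of 1,000 are illustrative assumptions, not measurements.

```python
# Illustrative arithmetic only; function names and the workload size are assumptions.

SSE_REGISTER_BITS = 128  # width of one packed SSE register


def lanes_per_register(element_bits: int) -> int:
    """Number of values of the given width that fit in one 128-bit register."""
    return SSE_REGISTER_BITS // element_bits


def micro_ops(packed_instructions: int, micro_ops_per_instruction: int) -> int:
    """Micro-ops produced when each packed instruction decodes into a fixed count."""
    return packed_instructions * micro_ops_per_instruction


if __name__ == "__main__":
    print("64-bit doubles per register:", lanes_per_register(64))
    print("32-bit floats per register:", lanes_per_register(32))

    n = 1000  # hypothetical number of 128-bit SSE instructions
    print("K8 micro-ops (two per instruction):", micro_ops(n, 2))
    print("Barcelona micro-ops (one per instruction):", micro_ops(n, 1))
```

Halving the micro-op count for the same instruction stream is only one piece of the picture; the wider load path, reservation stations, and execution units listed above are what allow those single micro-ops to complete at full width.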

Meet LINPACK, a benchmark application based on the LINPACK TPP code, which has become the industry standard benchmark for HPC. It solves large systems of linear equations by using a high performance matrix kernel. We used Intel's version of LINPACK, which uses the highly optimized Intel Math Kernel Library (MKL). The Intel MKL is quite popular, and in an Intel-dominated world AMD's CPUs have to be able to run Intel-optimized code well.

We used a workload of square matrices of sizes 5000 to 30000 in steps of 5000, and we ran four threads (dual dual-core) or eight threads (dual quad-core). As the system was equipped with 8GB of RAM, even the large matrices ran entirely in memory. LINPACK results are expressed in GFLOPS (billions of floating point operations per second). We'll start with the quad-core scores (one quad-core or two dual-core CPUs).
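
As a sanity check on the workload described above, the sketch below estimates the memory footprint of each matrix and converts a run time into GFLOPS. It assumes double-precision (8-byte) elements and the conventional LINPACK operation count of 2/3·n³ + 2·n² for solving an n×n system; the function names and the 600-second run time in the usage example are hypothetical, not measured results.

```python
# A minimal sketch under stated assumptions: 8-byte elements and the
# conventional LINPACK operation count of 2/3*n^3 + 2*n^2.

def matrix_gigabytes(n: int, bytes_per_element: int = 8) -> float:
    """Size of one n x n matrix in GB (10^9 bytes)."""
    return n * n * bytes_per_element / 1e9


def linpack_flops(n: int) -> float:
    """Floating point operations credited for solving an n x n system."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """Billions of floating point operations per second for a run of given length."""
    return linpack_flops(n) / seconds / 1e9


if __name__ == "__main__":
    # Matrix sizes from the article: 5000 to 30000 in steps of 5000.
    for n in range(5000, 30001, 5000):
        print(f"n={n:>5}: matrix is about {matrix_gigabytes(n):.1f} GB")

    # Hypothetical run time, chosen only to show the conversion.
    print(f"n=30000 in 600 s: {gflops(30000, 600.0):.1f} GFLOPS")
```

Under these assumptions the largest matrix (30000 x 30000) needs about 7.2 GB for the matrix alone, which is consistent with the statement that every run fit in the 8GB of RAM.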

Yes, this code is very Intel friendly but it does exist in the real world, and it is remarkably interesting. Look at what Barcelona is doing: it is outperforming a 60% higher clocked Opteron 2224 SE. That means that clock for clock, the third generation Opteron is no less than 142% faster. That is a massive improvement!

Thanks to meticulous tuning for Intel's cores, the Xeon still wins the benchmark. A 17% higher clocked Xeon 5345 is about 25-26% faster than Barcelona, but the days when this kind of code resulted in embarrassing defeats for AMD are over. We are very curious how a LINPACK compiled with AMD's math kernel libraries and other compilers would do, but the late arrival didn't leave us time for much recompiling.
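
The clock-for-clock comparisons in the two paragraphs above come down to a simple normalization: divide the raw performance ratio by the clock ratio. The sketch below works only from the percentages quoted in the text; the function name is an assumption, and the raw Barcelona-to-Opteron ratio it prints is implied by the quoted 60% and 142% figures rather than read from the graph.

```python
# A minimal sketch of clock-for-clock normalization, using only the
# percentages quoted in the text. The function name is hypothetical.

def per_clock_ratio(raw_ratio: float, clock_ratio: float) -> float:
    """Performance of chip A relative to chip B after normalizing for clock speed.

    raw_ratio   -- score of A divided by score of B
    clock_ratio -- clock of A divided by clock of B
    """
    return raw_ratio / clock_ratio


if __name__ == "__main__":
    # Barcelona vs. Opteron 2224 SE: the Opteron is clocked 60% higher, so
    # Barcelona's clock ratio is 1/1.60. "142% faster clock for clock" means
    # a per-clock ratio of 2.42, which implies this raw ratio:
    implied_raw = 2.42 * (1 / 1.60)
    print(f"Implied raw Barcelona/Opteron ratio: {implied_raw:.2f}")
    print(f"Per-clock ratio recovered: {per_clock_ratio(implied_raw, 1 / 1.60):.2f}")

    # Xeon 5345 vs. Barcelona: 17% higher clock, 25-26% faster overall.
    for raw in (1.25, 1.26):
        print(f"Xeon raw {raw:.2f} -> per clock {per_clock_ratio(raw, 1.17):.3f}")
```

Read this way, the quoted figures imply that the 2GHz Barcelona scores roughly 1.5 times what the Opteron 2224 SE does, and that the Xeon 5345's lead shrinks to roughly 7 to 8% once the clock difference is taken out.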

Now let's take a look at the eight thread results. We kept the Xeon 5160 (four threads) in this graph, so you can easily compare the results with the previous graph.

Normally you would expect that this kind of code with huge matrices has to access the memory a lot, but masterly optimization together with hardware prefetching ensures most of the data is already in the cache. The quad-core Xeon wins again, but the victory is a bit smaller: the advantage is 20%-23%. Let us see if Intel can still keep the lead when we look at a benchmark which is very SSE intensive and which is optimized for Intel CPUs, but this time it's developed by a third party.

46 Comments

kalyanakrishna - Tuesday, September 11, 2007 - link
I don't deny people use MKL ... I don't agree that anyone targeting performance on AMD Opteron will use MKL. No one running HPL/Linpack for a Top 500 submission would use MKL on Opteron. No one who wishes to test his Opteron for performance would use MKL to do so. No one wishing to have the fastest possible results from his Opteron will do so.

Even ISVs now provide code that is optimized for Xeon and Opteron separately.
JohanAnandtech - Tuesday, September 11, 2007 - link
Ok, point taken. Give us some time, and we'll follow up with new compilations of Linpack.
kalyanakrishna - Wednesday, September 12, 2007 - link
Thank you. Appreciate the effort.
leexgx - Monday, September 10, 2007 - link
And how often do you read AnandTech's previews and reviews?

Unlike this time, when Intel's Core 2 came out all the hype was real. Too bad for AMD.

This CPU is going to be good; the problem is whether it will be able to compete with Intel's new CPU when it comes out.

I'm still using an AMD system, if you're wondering, and so are all the rest of my PCs apart from my server, where I just threw in an old P4 mobo for in-house file sharing (all second-hand parts apart from the HDDs).
phaxmohdem - Monday, September 10, 2007 - link
I wonder if it would be feasible for AMD to take the Intel approach, and slap two of their new native quad cores together and release an octal core CPU in the near future. Or would they remain the multi-core purists they have become... Similarly I wonder if two 65nm Barcelona dies could even fit under that heat spreader... or come in under an acceptable thermal envelope.
Accord99 - Monday, September 10, 2007 - link
It won't fit on Socket F:

http://www.madboxpc.com/news/am2/AMD_barcelona.jpg
fic2 - Monday, September 10, 2007 - link
Page 8, 3DS Max 9 last paragraph:
"Dual 3GHz Opteron 2222 is capable of generating about 29 frames per hour", but then
"potential 3GHz Barcelona will be able to spit out ~35 frames per second". I think that is supposed to be ~35 frames per hour. Otherwise that is an extremely impressive speedup!
JohanAnandtech - Monday, September 10, 2007 - link
No, it is "per second". We used a Octalcore 2THz Barcelona there.

... Thanks, fixed that one :-)
phaxmohdem - Monday, September 10, 2007 - link
Got SuperPi times for that beast? ;)
Roy2001 - Monday, September 10, 2007 - link
Kentsfield has 2*143 mm^2 dies. Barcelona is 280+ mm^2. Penryn would be even smaller, 2*100 mm^2. So unless AMD can increase the frequency to 3.0+ GHz soon and price their new quad-core processors higher than Intel's, AMD would still be in the red unless it outsources Athlon 64 to TSMC.