Socket-AM2 Performance Preview
Since the new AM2 CPUs bring no major architectural changes, we wanted a quick and easy way to showcase the performance differences between AM2 and Socket-939. The table below lists all of our usual CPU benchmarks, with results for the same CPU in both Socket-939 and AM2 varieties, along with the performance benefit offered by AM2:
| Benchmark | Socket-939 (DDR-400) | Socket-AM2 (DDR2-800) | % Advantage (Socket-AM2) |
| --- | --- | --- | --- |
| PC WorldBench 5 | 115 | 115 | 0% |
| Business Winstone 2004 | 23.3 | 23.2 | -0.4% |
| Multimedia Winstone 2004 | 38.4 | 38.9 | 1.3% |
| SYSMark 2004 | 220 | 224 | 1.8% |
| ICC SYSMark 2004 | 282 | 286 | 1.4% |
| OP SYSMark 2004 | 171 | 175 | 2.3% |
| 3dsmax 7 | 2.38 | 2.38 | 0% |
| Adobe Premiere Pro 1.5 (Export w/ Adobe Media Encoder) | 130 s | 128 s | 1.5% |
| Adobe Photoshop CS2 | 210.6 s | 210.3 s | 0.1% |
| DivX 6.1 | 11.6 fps | 12.0 fps | 3.4% |
| WME9 | 35.2 fps | 35.6 fps | 1.1% |
| Quicktime 7.0.4 (H.264) | 3.63 min | 3.63 min | 0% |
| iTunes 6.0.1.4 (MP3) | 43 s | 43 s | 0% |
| Quake 4 - 10x7 (SMP) | 111.3 fps | 117.4 fps | 5.5% |
| Call of Duty 2 - 10x7 | 59.3 fps | 60.1 fps | 1.3% |
| F.E.A.R. - 10x7 | 92 fps | 94 fps | 2.1% |
| Multitasking Test (LAME + WME + Anti Virus + Zip) | 216.3 s | 213.4 s | 1.4% |
| ScienceMark 2.0 (Bandwidth) | 5007 MB/s | 6805 MB/s | 36% |
| ScienceMark 2.0 (Latency 512-byte stride) | 53.83 ns | 49.77 ns | 7.5% |
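
For readers who want to check the last column, here is a minimal sketch of how the percentage advantage can be computed. It assumes the difference is taken relative to the Socket-939 result and that time and latency rows are "lower is better"; the function name `percent_advantage` is ours, not something from the benchmark suites. This convention reproduces most rows to within rounding, though we can't be sure every row was rounded the same way.

```python
def percent_advantage(s939: float, am2: float, lower_is_better: bool = False) -> float:
    """Percent advantage of AM2 over Socket-939, relative to the Socket-939 result.

    Assumption: for time and latency rows a smaller number is the better result,
    so the sign is flipped to keep "positive" meaning "AM2 is ahead".
    """
    if lower_is_better:
        return (s939 - am2) / s939 * 100.0
    return (am2 - s939) / s939 * 100.0


# Values copied from the table above.
quake4 = percent_advantage(111.3, 117.4)                         # fps, higher is better
bandwidth = percent_advantage(5007, 6805)                        # MB/s, higher is better
latency = percent_advantage(53.83, 49.77, lower_is_better=True)  # ns, lower is better
premiere = percent_advantage(130, 128, lower_is_better=True)     # seconds, lower is better

for name, value in [("Quake 4", quake4), ("Bandwidth", bandwidth),
                    ("Latency", latency), ("Premiere Pro", premiere)]:
    print(f"{name}: {value:.1f}%")
```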
We'll start at the bottom of the table and work our way up. Rev F processors feature a 128-bit DDR2-800 memory controller, which works out to a peak theoretical memory bandwidth of 12.8GB/s. As you'd expect, that's twice the 6.4GB/s offered by the 128-bit DDR-400 controller in Rev E CPUs. A 36% increase in memory bandwidth according to ScienceMark is therefore to be expected, albeit a bit on the low side. The old DDR-400 memory controller is able to deliver 5GB/s out of a maximum of 6.4GB/s (about 78% of peak), but with AM2 we're only seeing 6.8GB/s out of a maximum of 12.8GB/s (about 53%). Even so, this is a huge step for AMD, as this is the first spin of Rev F silicon on which we've seen such a significant advantage in synthetic memory bandwidth over previous DDR-400 cores.
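The peak figures quoted here follow directly from bus width and transfer rate. As a rough sketch (the helper name `peak_bandwidth_gb_s` is ours, and we assume decimal units, i.e. 1GB/s = 1000MB/s, which is how the 6.4 and 12.8 figures are usually derived), the arithmetic and the resulting controller efficiency look like this:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Peak theoretical bandwidth in GB/s (decimal: 1 GB/s = 1000 MB/s)."""
    bytes_per_transfer = bus_width_bits / 8        # 128-bit bus -> 16 bytes per transfer
    mb_per_s = bytes_per_transfer * mega_transfers_per_s
    return mb_per_s / 1000.0


# DDR-400 and DDR2-800 perform 400 and 800 million transfers per second.
peak_ddr400 = peak_bandwidth_gb_s(128, 400)
peak_ddr2_800 = peak_bandwidth_gb_s(128, 800)

# ScienceMark 2.0 bandwidth results from the table, converted from MB/s to GB/s.
measured_ddr400 = 5007 / 1000.0
measured_ddr2_800 = 6805 / 1000.0

# Fraction of the theoretical peak that the benchmark actually measured.
efficiency_ddr400 = measured_ddr400 / peak_ddr400
efficiency_ddr2_800 = measured_ddr2_800 / peak_ddr2_800

print(f"DDR-400:  peak {peak_ddr400:.1f} GB/s, measured {efficiency_ddr400:.0%} of peak")
print(f"DDR2-800: peak {peak_ddr2_800:.1f} GB/s, measured {efficiency_ddr2_800:.0%} of peak")
```

The drop in efficiency is the reason a doubling of theoretical bandwidth shows up as only a 36% gain in the synthetic test.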
What's even more important than the increase in memory bandwidth is that access latency has been reduced by 7.5% compared with the DDR-400 memory controller in the Rev E cores. Lower latency and more bandwidth mean that, at bare minimum, performance won't go down. At least, not perceptibly: a 0.4% deficit in one test that has 1-2% variability is nothing to worry about.
It also doesn't guarantee that performance will go up, as you can see from the results above. If we only count the overall SYSMark score and leave out the synthetic tests, the real-world performance advantage averages out to a little under 1.3%. There are some special cases, such as Quake 4 and DivX, where performance goes up by a reasonable margin, which can be expected since both of those tasks are fairly bandwidth intensive and make good use of both cores. However, similar benchmarks such as F.E.A.R. and Windows Media Encoder 9 show smaller improvements, so the benefit is very dependent on the specific application and workload.
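The average quoted above can be checked with a few lines. This is only a sketch under our reading of which rows count: the two SYSMark sub-scores (ICC and OP) and both ScienceMark rows are left out, and every other row from the table is weighted equally. The dictionary name `real_world_advantage` is ours.

```python
from statistics import mean

# "% Advantage (Socket-AM2)" column, copied from the table.
# Excluded by assumption: ICC SYSMark, OP SYSMark and both ScienceMark rows.
real_world_advantage = {
    "PC WorldBench 5": 0.0,
    "Business Winstone 2004": -0.4,
    "Multimedia Winstone 2004": 1.3,
    "SYSMark 2004": 1.8,
    "3dsmax 7": 0.0,
    "Adobe Premiere Pro 1.5": 1.5,
    "Adobe Photoshop CS2": 0.1,
    "DivX 6.1": 3.4,
    "WME9": 1.1,
    "Quicktime 7.0.4 (H.264)": 0.0,
    "iTunes 6.0.1.4 (MP3)": 0.0,
    "Quake 4 - 10x7 (SMP)": 5.5,
    "Call of Duty 2 - 10x7": 1.3,
    "F.E.A.R. - 10x7": 2.1,
    "Multitasking Test": 1.4,
}

average_advantage = mean(real_world_advantage.values())
print(f"Average real-world advantage: {average_advantage:.2f}%")
```

With these 15 rows the mean lands just below 1.3%, and Quake 4 and DivX alone account for nearly half of the total.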
It's important to note that until recently, AM2 samples were not able to produce scores even on par with Socket-939, so the fact that we're seeing a performance increase at all is a major step from where we were just a couple of months ago. The real question is, is this all we get?

105 Comments
flemlion - Monday, April 10, 2006
This seems to be just a quickie review. The conclusion mentions that the usefulness of memory bandwidth increases as the CPU clock speed increases. But still, a lower speed was used for this test than for the DDR1-400 versus DDR1-500 evaluation. It seems to me this test should either have been done at different speeds to get a feel for this impact, or at minimum at the same speed as the DDR1-400 versus DDR1-500 article.
As a side note, it's also interesting to see that the test config makes no mention of the CPU speed that was used. If this is under NDA, then say so; if not, it just appears as hiding the details that would expose this article as gossip instead of information.
andrewln - Monday, April 10, 2006
I meant... Intel will see that the next generation of AMD is just 5% faster... wouldn't they tune down Conroe to match it, or make it just a bit faster than AMD, and sell it at a premium price? Since the demand will be almost the same.
1) AMD fanboys will keep on buying AMD
2) Intel fanboys will keep on buying Intel
3) But this time, people who want performance will be buying Intel (even though it's only 10% faster than the competitor, or $40)
This way, when AMD makes a new generation of processors, Intel only has to tune up Conroe, which is cheaper than making another big modification that might or might not work.
Conroe - Monday, April 10, 2006
They said 20%, and that's where they plan on staying. They could have more. The FX-62 has extra cache, it may give 10%, who knows?
Anand Lal Shimpi - Monday, April 10, 2006
Every FX-62 I've seen hasn't had any more cache than what's in the table in the review.
Take care,
Anand
Dfere - Monday, April 10, 2006
I’ve got to disagree: I don’t think it makes sense to even upgrade from a 754 system to AM2.
Why? Because if you remember nForce2 and all the MBs with “future-proof” DDR-400 support, the MB makers did not live up to their claims. For most recognized manufacturers, it took until the revision after DDR-400 memory became available before they got it right.
So I don’t see how AM2 can even be thought of as an upgrade path, especially before final revisions have been made in silicon. A MB you buy initially might work, but with future memory or processors… forget it. Anybody wanna take a bet ($1 will get you $10) that the first MBs out from, let’s say, ASUS will not allow for different memory timings or the latest memory in, say, March of ’07, let alone a top-of-the-line processor on the same date?
While the author did say many changes are still in the works, final silicon may not even have been achieved yet. How can buying a MB now be considered a possible upgrade in the future?
For this reason, and many price/performance reasons, I have a 754 system, and I will hope that after tax season ends I can build a 939 for a better price. That’s it.
The numbers in the review state this clearly. This is not about performance. And it will be expensive. The analysis on the forum here seems to be about expected future performance, when Anand (admittedly) and AMD (by not making announcements about performance) seem to indicate, and I explicitly do, that this is not about performance… yet, either. So how can this even be recommended as an upgrade path when there is very little real-world benefit, and future compatibility between a MB purchased now and later memory or processors is not even known?
I am an avid fan of AMD, but I think excess hype can kill a product as quickly as bad rumors.
HammerFan - Monday, April 10, 2006
I'm surprised that nobody has considered the bottlenecks in AMD's systems as of late. Recently, it seems that all AMD really needs to do with the K8 is keep squeezing more MHz out of it. Clearly the CPU has memory bandwidth to spare, so bring the rest of the processor up to speed. IIRC, AMD is starting to implement an improved version of SOI in their new CPU cores (or is it the 65nm cores?), which will help increase clock-speed headroom. Also, as quality continues to improve, AMD might be able to add higher clock speeds to take advantage.
just my $.02
HF
ozzimark - Monday, April 10, 2006
one thing that would REALLY help K8... follow in intel's footsteps with netburst and try to double-pump the ALU. faster SSE execution never hurts either :)
still - Monday, April 10, 2006
Double-pumping the ALU is only going to limit scaling and increase heat... what the K8 core really needs is a better L1 and L2 cache subsystem.... The L1 is sort of OK but getting old; it's the same one as in the K7 (7 years old). They improved the L2 of the K8 over the K7, but half-heartedly. It still has too narrow a path and too high a latency. I can just imagine what the K8 could do with a 4M low-latency cache that has a 256- or 512-bit-wide data path (+ ECC of course).
While they are at it, lower the L1 latency to 2 cycles. That alone is a 5-10% improvement.
And they need to seriously improve the SIMD execution units. The current AMD SIMD units are almost as lame as the Intel implementation of AMD's 64-bit instructions.
Oh yeah, and write some decent compilers to make use of the 64-bit goodness like the extra registers. Where are the promised 20% improvements?
The K8 core can scale better than Conroe and can crunch through more instructions/data if the cache subsystem can feed all these to the execution units. Albeit the K8 has to be clocked slightly higher to do that; such is the tradeoff of 3 vs. 4 IPC.
mino - Tuesday, April 11, 2006
1) The 3-cycle L1 on K7/K8 is as fast as is required; it follows from the internal structure of the scheduler and the pipeline that a 2-cycle cache would do almost no good. Also, they would have to reduce the L1 size to 32k+32k, which would hurt. It simply does not make sense to change the L1 at all, maybe on K8L, but IMHO 128k+128k would help much more than 2-cycle latency.
2) A 17-cycle L2 is PRETTY GOOD for a 1M L2 with an exclusive structure!!! IMHO it is possible to do 16-cycle, maybe 15, but nowhere near Dothan's 10-cycle. Also remember that a lower-latency L2 has scaling problems (that's why Intel made Prescott's L2 slower than NW's).
3) Concerning the memory subsystem (caches + memory) on single-socket K8/K8L, the biggest issue is the robustness (the number of in-flight accesses to memory) and latency of the memory controller. Solving this is not a trivial thing. IMHO adding a 2-4M L3 with random access of ~50 cycles would do.
4) On the >4 sockets front, all they need is effective caching of MOESI snoops.
You also forgot that K7/K8 is mostly a KISS architecture. It is just very well balanced, so it has good performance in the end. However, make one wrong change and you are screwed.
KISS == Keep It Simple Silly
About the "weak" SIMD implementation on AMD, don't fool yourselves, guys. The only x86 architecture faster than K8 on SSE/SSE2 is Netburst, aka SIMD-by-Intel.
About Conroe, it has ALUs and FPUs twice as wide as those of PIII/K7/K8, which means it has huge resources at its disposal to calculate SIMD.
The same goes for K8L 2 quarters later. That said, the K7/K8 core has far more FP power than the P6 architecture. On FP, Conroe and K8 are about equal, but K8L will wipe the floor with K8 and Conroe on FP. Conroe will wipe K8 on INT and still be faster than K8L by a decent margin.
Overall we are in for another PIII vs. K7 battle, with a single very important change: AMD has a platform, which it did not have back in the K7 vs. PIII days.
fitten - Thursday, April 13, 2006
I find the K8L a somewhat odd strategy. I guess they are targeting the Itanium market because Opterons already have a good part of the HPC market. Given that the HPC people are the ones that really care about FPU performance and that they are still a fairly small market segment, it seems an odd target. Integer performance rules the roost for servers... web, database, and just about everything else you can think of other than number-crunching simulations and the like. Desktop uses for the FPU are few, like games and some mathematical stuff. Intel is focusing on integer performance at least as much as FPU with Conroe (Conroe gets a good dose of both), which makes sense to me since so much of the work done on computers, both desktops and servers, is dominated by integer operations. K8L speculation says only FPU horsepower will be added... just doesn't seem like a sound decision to me.