Linux and L2 Cache; Sempron vs. Athlon

by Kristopher Kubicki on 8/18/2004 2:29 AM EST
59 Comments

  • n0cmonkey - Friday, August 27, 2004 - link

    HAHA! Still beaten by the VIA c3:
    [q]sempr0n
    aes-128 cbc 48860.49k 50275.58k 51223.21k 51349.16k 51565.91k

    [b]C3
    aes-128-cbc 13090.59k 51065.12k 174593.45k 426600.92k 735548.02k[/b]

    xp
    aes-128 cbc 42891.92k 43746.44k 44291.99k 44433.72k 44470.25k

    64 3000
    aes-128 cbc 54463.69k 56014.76k 56781.82k 57404.76k 57005.40k[/q]

    And that's on OpenBSD. ;)
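
    For anyone who wants to put a number on that gap, here is a minimal sketch that compares the first and last columns of the figures quoted above. It assumes the rows follow the usual `openssl speed` layout (block size growing left to right, values in thousands of bytes per second); the post does not say so explicitly, and the function and variable names are only illustrative.

```python
# Last column of each row quoted above (thousands of bytes per second).
# Assumption: columns follow the `openssl speed` layout, so the last
# column is the largest block size and the first is the smallest.
largest_block = {
    "sempr0n": 51565.91,
    "C3": 735548.02,
    "xp": 44470.25,
    "64 3000": 57005.40,
}
smallest_block = {
    "sempr0n": 48860.49,
    "C3": 13090.59,
    "xp": 42891.92,
    "64 3000": 54463.69,
}


def ratio_to_c3(column):
    """Return C3 throughput divided by each other chip's throughput."""
    c3 = column["C3"]
    return {name: c3 / value for name, value in column.items() if name != "C3"}


large_ratios = ratio_to_c3(largest_block)
small_ratios = ratio_to_c3(smallest_block)

for name in large_ratios:
    print(f"{name}: C3 is {small_ratios[name]:.2f}x at the smallest block, "
          f"{large_ratios[name]:.1f}x at the largest")
```

    On these figures the C3 only pulls ahead as the block size grows; at the smallest block size it is the slowest of the four.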
    Reply
  • balzi - Tuesday, August 24, 2004 - link

    woo-hoo.. let it be known, persistence pays. I even think it's biblical -- gotta be good. Reply
  • KristopherKubicki - Monday, August 23, 2004 - link

    Future-proofing is impossible :)

    But anyway, I am still going to stand by my conclusion, which was that for a 32-bit processor it's worth the difference in price. I've got half a dozen machines here that are dying for the K8 architecture but don't need 64-bit; Sempron is the better choice since it saves me money and doesn't shortchange me on performance.

    Kristopher
    Reply
  • ThePlagiarmaster - Sunday, August 22, 2004 - link

    Kristopher

    I won't argue the 10% of a machine as a whole (you're right). But then you're talking about a completely different machine. I thought we were just talking about a choice between a machine with just two different chips with all other things being equal, just as in the article...The point was which chip to buy, not which computer. Right? In both of my posts I'm talking $16, not $100. In this case, a bit of future-proofing for a measly $16 is a VERY good thing.

    I never mentioned anything about taking 10% from the whole machine (nor did I see that as a point in the article- I saw ONE config, with two different chips compared). Of course that's a huge difference and in that case the results wouldn't look like your article does which shows a pretty close run between the A64 2800+ and Sempron 3100.

    Yes, as I just posted, Win64 looks like 1st half next year. That's not years away (9 months tops). Once Win9x hit, most of the games within 1yr were 32bit (only EA games lagged and still ran on DOS for a while). If it's so far away, why are you running 64bit articles? How many 64bit chips do you think will be sold by next Xmas? (Intel's selling them now; numbers should rise drastically within the next 15 months.) We already know of some game makers saying 64bit is worth it, Unreal being the front runner.

    After looking here: http://www.xbitlabs.com/articles/cpu/display/athlo... I'm left wondering if any of your benchmarks are truly 64bit on a 64bit OS. Look at RSA Decrypt, that's 4+ times faster! RSA Encrypt is 3+ times faster! Gzip is 2 times faster. Good increase in DivX too (15% or so isn't bad). AES encrypt/decrypt shows 50%+. The worst result I see in the chart for 32bit app + 32bit OS vs. 64bit app + 64bit OS is 10%. That's the WORST result. With a BETA OS that dates back to Sept last year (machine only had 1GB of RAM also) and beta drivers. Can't get any slower this next year, can it? We know filters could show 57% improvement (AMD just showed this a few weeks back with the panorama deal, and 47% with the other app, Crafty Chess). How many people are buying digital cameras these days? Photo apps are getting easier and easier to use for newbs; filters are common even in the crappy low-end apps. We know the Sempron is slower than the 2800+, so comparing it in these benchmarks (as in xbitlabs) would show the Sempron even worse.

    Your own LAME/Linux 32bit vs 64bit HERE: http://www.anandtech.com/cpuchipsets/showdoc.aspx?... shows a 34% improvement. Why don't your LAME benchmarks now show this? Don't tell me MP3s aren't popular :) Your own quote in that article (well, Anand himself quoted, not YOU per se): "The performance improvement here is astounding - in 64-bit mode the Athlon 64 FX managed to finish the encode 34% quicker than in 32-bit mode, if these results are any hint of what could be in store for Windows users, there's a lot of promise behind the Athlon 64...assuming we get software support in time." So we have encryption (up to 4x+ faster), MP3s (up to 34% faster), DivX (15% faster), photo filter type crap (up to 57% faster), games (Unreal makers say 64bit is faster, yes, w/o 4GB+ memory too). What's left? Seems like you need to say $16 is worth it. If you are running 64bit Linux (I believe you are) and 64bit APPS ALSO (can't tell in article) then clearly Linux needs work. Beta Win64 stuff shows huge benefits as shown above. I think I showed enough proof on Win64 with 64bit above. So I end this here. No more I can say, nor should I have to.

    Viditor, yes (as anandtech confirmed in earlier sempron article) NXbit is on S754 variants, and NOT on S462 variants.
    Reply
  • Viditor - Saturday, August 21, 2004 - link

    Kris...it seems my question got lost. Does the Sempron have NX bit? Reply
  • KristopherKubicki - Saturday, August 21, 2004 - link

    balzi: Working on it.

    Kristopher
    Reply
  • KristopherKubicki - Saturday, August 21, 2004 - link

    ThePlagiarmaster: My whole argument has been you get an Athlon XP with an onboard memory controller, and you'll see larger performance gains from the memory controller than from the 64-bit things. Sure, if 64-bit was here now, I would probably side even stronger with you.

    >If sales people would do their jobs, nobody would ever buy these things.

    I disagree. A 10% difference in price is nothing to scoff at. There is a budget sector for a reason; I am sure YOU can afford the extra cost, but the fact of the matter is a lot of people cannot. If you could realistically shave 10% off every component in a computer, it becomes the difference between a $900 machine and a $1,000 machine - and those types of costs are significant.

    The argument of "*only* 10% in cost" for features not needed for years still is not valid. Have you checked the last shipping date for Windows 64-bit lately?

    Kristopher

    Reply
  • ThePlagiarmaster - Saturday, August 21, 2004 - link

    Viditor,

    Supposedly all S754 variants will have it. NONE of the S462's will. But I still can't see buying a sempron when we're only talking $20 for the real thing. Scratch that, $16 just checked Pricewatch. Pretty soon we'll be dealing with a lot more encrypted pages etc. Not to mention apps (filters are easily optimized) and games that will be 64bit friendly (no I'm not talking over 4gb needed here). In encryption and zipping alone we're talking more than 50% improvements. Some of the encryption stuff shows 2x-3x improvements. Why would anyone want a sempron? I guess I'm saying I have REAL problems with the conclusion in the article :) People should be told "PLEASE, buy the 64bit chip and forget Sempron - Do yourself a favor".
    If AMD wanted a value chip they should have just made it a BIT slower, not turned off features that are VERY important. I can understand the AthlonXP version, they're trying to keep the socket alive for mainboard makers etc, but a disabled S754 version just sucks. In 6 months people will be saying, "damn, you mean I could have had all that for $16 more?". Let's also not forget that it is actually a FASTER chip for that $16 also. Not by much, but it is faster in everything. People only buy these things because they are completely uninformed about what they are LOSING by saving $16. If sales people would do their jobs, nobody would ever buy these things.
    Reply
  • Viditor - Saturday, August 21, 2004 - link

    Kris - Thanks for the article.

    About the conclusion, one point that wasn't mentioned (and that I have been unable to find info on) is whether or not the Sempron has NX bit?
    If the answer is no, then that would be another benefit to going with the A64 2800+...
    Reply
  • KristopherKubicki - Saturday, August 21, 2004 - link

    Ace's options actually degrade performance on our test machine.

    Kristopher
    Reply
  • KristopherKubicki - Saturday, August 21, 2004 - link

    I am not making these up... really.


    Xeon 3.6GHz EM64T, 1GB DDR2-400, TSCP 1.8.1
    =================================================================
    linux:~/work/tscp181 # /opt/gcc-mainline/bin/gcc -v
    Reading specs from /opt/gcc-mainline/lib64/gcc/x86_64-suse-linux/3.4.1/specs
    Configured with: ../configure --enable-threads=posix --prefix=/opt/gcc-mainline --with-local-prefix=/usr/local --infodir=/opt/gcc-mainline/share/info --mandir=/opt/gcc-mainline/share/man --libdir=/opt/gcc-mainline/lib64 --libexecdir=/opt/gcc-mainline/lib64 --enable-languages=c,c++,f77,objc,java,ada --enable-checking --enable-libgcj --with-gxx-include-dir=/opt/gcc-mainline/include/g++ --with-slibdir=/lib64 --with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux
    Thread model: posix
    gcc version 3.4.1 20040508 (prerelease) (SuSE Linux)
    =================================================================

    -O3 -funroll-loops -frerun-cse-after-loop -march=nocona
    Nodes per second: 388145 (Score: 1.596)

    -O2 -funroll-loops -frerun-cse-after-loop -march=nocona
    Nodes per second: 365722 (Score: 1.504)

    -O3 -funroll-loops -frerun-cse-after-loop
    Nodes per second: 378021 (Score: 1.555)

    -O2 -funroll-loops -frerun-cse-after-loop
    Nodes per second: 365722 (Score: 1.504)

    -O3 -march=nocona -funroll-loops -fomit-frame-pointer -ffast-math -fprofile-arcs
    Nodes per second: 311526 (Score: 1.281)

    -O2 -march=nocona -funroll-loops -fomit-frame-pointer -ffast-math -fprofile-arcs
    Nodes per second: 299173 (Score: 1.230)

    -O2 -funroll-loops -fomit-frame-pointer -ffast-math -fprofile-arcs
    Nodes per second: 279724 (Score: 1.150)

    -O3 -funroll-loops -fomit-frame-pointer -ffast-math -fprofile-arcs
    Nodes per second: 299173 (Score: 1.230)
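
    To make the list above easier to compare, here is a small sketch that ranks the eight runs relative to the fastest one. The numbers are copied from this post; the dictionary layout and variable names are just illustrative. One caveat worth keeping in mind: GCC documents `-fprofile-arcs` as the flag that builds an instrumented binary, with `-fbranch-probabilities` as the follow-up compile that consumes the recorded profile, so the lower `-fprofile-arcs` figures may partly reflect instrumentation overhead rather than a finished profile-guided build. That is an assumption about how the runs were made, not something the post states.

```python
# Nodes per second for each flag set, copied from the runs listed above.
tscp_runs = {
    "-O3 -funroll-loops -frerun-cse-after-loop -march=nocona": 388145,
    "-O2 -funroll-loops -frerun-cse-after-loop -march=nocona": 365722,
    "-O3 -funroll-loops -frerun-cse-after-loop": 378021,
    "-O2 -funroll-loops -frerun-cse-after-loop": 365722,
    "-O3 -march=nocona -funroll-loops -fomit-frame-pointer -ffast-math -fprofile-arcs": 311526,
    "-O2 -march=nocona -funroll-loops -fomit-frame-pointer -ffast-math -fprofile-arcs": 299173,
    "-O2 -funroll-loops -fomit-frame-pointer -ffast-math -fprofile-arcs": 279724,
    "-O3 -funroll-loops -fomit-frame-pointer -ffast-math -fprofile-arcs": 299173,
}

best_nps = max(tscp_runs.values())

# Fraction of the best result achieved by each flag set.
relative_to_best = {flags: nps / best_nps for flags, nps in tscp_runs.items()}

# Print the runs from fastest to slowest.
for flags, nps in sorted(tscp_runs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{nps:>7}  {100 * relative_to_best[flags]:5.1f}%  {flags}")
```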

    Reply
  • Matthew Daws - Saturday, August 21, 2004 - link

    Not true. These options are on at least GCC 3.2.2, and on the P4 system I have access to (it's a university computer) I get 422K nodes/sec using the above compiler settings from Ace's.

    --Matt
    Reply
  • KristopherKubicki - Saturday, August 21, 2004 - link

    Matthew Daws: Again, he is using GCC 3.4.1, which has huge optimizations and is something we haven't moved over to yet.

    Kristopher
    Reply
  • ThePlagiarmaster - Saturday, August 21, 2004 - link

    Oops, forgot, MS says 1st half 2005 now for Win64. So we can expect it in June...ROFL. Still the Semprons will be eaten for lunch then by next xmas by 64bit chips that are only $20 more right now. Then again, AMD could just solve the problem by turning on 64bit for Semprons :)

    Plag
    Reply
  • Matthew Daws - Saturday, August 21, 2004 - link

    Kris,

    Sorry to keep harping on here. But if you look over at Ace's:

    http://www.aceshardware.com/forum?read=115094123

    You'll find the compiler options you need to get much better results (I'm getting 291K now, on a 2GHz celeron). The general opinion is that TSCP favours the P4 without some careful compiler work. The Athlon numbers, with stock compiler options, are probably OK. But the P4 numbers in the older article seem very low...

    --Matt
    Reply
  • ThePlagiarmaster - Saturday, August 21, 2004 - link

    I'm having a hard time with any recommendation of the Sempron over 64bit CPUs that are only 10% more (we're talking like $20 here). Nobody will use more than 4GB with these. That's a given. However, the 64bitness can't be overlooked. Look at the examples AMD has already shown (recently, for example). That panorama filter they showed with a 57% improvement in speed, and the other thing in the same news post showing a 47% improvement. AFAIK neither of these was using more than 4GB. This is with a BETA Win64!
    http://www.amd.com/us-en/Corporate/VirtualPressRoo...

    These are only two examples of TONS that will be on the way shortly (immediately following the OS from MS, that is). Intel is now backing this stuff too. Expect more 64bit ports, especially with MS finally getting off their collective ARSES and saying Windows64 will be done this year (nah, I say Jan/Feb...but the point's still valid). This stuff is coming (encryption shows HUGE benefits, and zipping too, with nowhere near 4GB), so why cut yourself from the game for $20? If $20 is going to break your bank, you have no business buying a PC. Spend it on your kids' diapers or shoes instead...LOL

    Plag
    Reply
  • Matthew Daws - Saturday, August 21, 2004 - link

    Kris,

    I found the following in the source file main.c for TSCP 1.8.1:

    /* Score: 1.000 = my Athlon XP 2000+ */

    Checking, this means that the author gets circa 243K nodes/sec with his Athlon XP 2000+. I think, in light of this, that my numbers seem correct and yours seem way off base.

    Cheers, --Matt
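
    The 243K figure can be cross-checked against the nodes/sec and score pairs Kris posted for his Xeon. The sketch below assumes the score is simply nodes/sec divided by a fixed baseline, which is inferred from the numbers rather than taken from the TSCP source; `tscp_score` is a hypothetical helper name.

```python
# (nodes per second, reported score) pairs from Kris's Xeon runs in this thread.
xeon_runs = [
    (388145, 1.596),
    (365722, 1.504),
    (378021, 1.555),
    (311526, 1.281),
    (299173, 1.230),
    (279724, 1.150),
]

# If score = nps / baseline, every pair should imply roughly the same baseline.
implied_baselines = [nps / score for nps, score in xeon_runs]
mean_baseline = sum(implied_baselines) / len(implied_baselines)


def tscp_score(nps, baseline=mean_baseline):
    """Hypothetical helper: estimate the score for a given nodes/sec figure."""
    return nps / baseline


print(f"implied baseline: about {mean_baseline / 1000:.1f}K nodes/sec")
print(f"estimated score for 269K nodes/sec: {tscp_score(269000):.3f}")
```

    Every pair lands within a few hundred nodes/sec of the same baseline, which is consistent with the comment in main.c.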
    Reply
  • balzi - Saturday, August 21, 2004 - link

    Helloooo.. !!!! am I using a mute account??
    is there any answer to the muddle of benchmark graphs.. please humour me by actually saying something.. Even 'I couldn't be stuffed fixing them' would be good.

    thanks
    Reply
  • PrinceGaz - Friday, August 20, 2004 - link

    40-bit physical address space is 1TB, the 48-bit virtual address space allows for a range of up to 256TB. I think that should be sufficient for the lifetime of the Opteron / Athlon 64. Reply
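
    The sizes quoted in this comment are easy to verify. A minimal sketch, using binary units (1 TB here meaning 2**40 bytes); the helper name is illustrative only:

```python
GIB = 2**30  # bytes in one gibibyte
TIB = 2**40  # bytes in one tebibyte


def addressable_bytes(bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2**bits


for label, bits in [("32-bit", 32), ("40-bit physical", 40), ("48-bit virtual", 48)]:
    size = addressable_bytes(bits)
    if size >= TIB:
        print(f"{label}: {size // TIB} TiB")
    else:
        print(f"{label}: {size // GIB} GiB")
```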
  • Matthew Daws - Friday, August 20, 2004 - link

    Kris: Oh, okay, sorry, yes, that makes sense. Have you tried the Windows executable yet? I've verified that with TSCP 1.7.3 I'm getting reasonable results, so it seems likely that my results with v1.8.1 are not too far off base...

    --Matt
    Reply
  • KristopherKubicki - Friday, August 20, 2004 - link

    Rys:

    Correct, but that doesn't mean it still does not lack 64-bit addressing ;) And in reality, how critical is it for any desktop CPU today to address more than 48 bits? Isn't that something like 1TB?

    Kristopher

    Reply
  • KristopherKubicki - Friday, August 20, 2004 - link

    Matthew: I read this:

    >You shouldn't see any difference with linux:
    >indeed, on a linux box I have access to, with GCC
    >3.2.2 (I *think* it's a P4 2.8GHz, but I'm not 100%
    >sure: I'm doing a remote-login right now, so cannot
    >check!) I get 365K with (-O3 -march=pentium4).

    So your 2.8C is getting the same marks as my 3.6 Nocona -- is what I meant.

    Kristopher
    Reply
  • Rys - Friday, August 20, 2004 - link

    You repeatedly mention the Sempron's 'lack of 64-bit addressing'. None of the CPUs on test, including the Athlon 64, can address a 64-bit memory space. All current AMD64 implementations can only address a 40-bit physical address space and a 48-bit virtual address space.

    Rys
    Reply
  • Matthew Daws - Friday, August 20, 2004 - link

    Kris,

    I am confused: in the link you gave me, the Xeon is getting circa 350K, which is way better than I am getting, as expected... Okay, so it's low clock for clock, but you said: "You're getting higher numbers than I got with my Xeon 3.6GHz chip."

    --Matt
    Reply
  • Matthew Daws - Friday, August 20, 2004 - link

    Kris,

    I've downloaded TSCP 1.7.3, which Tom Kerrigan has collected a lot of benchmarking data about. It also gives a MIPS rating: I get 2136 MIPS (with GCC -O3 -march=pentium4) and 2174 (with the included Windows benchmark). These compare well with the data on his website (http://home.comcast.net/~tckerrigan/bench.html), where this puts my 2GHz Celeron at about the same level as a P4 1800, which seems reasonable for a heavy CPU benchmark.

    I suggest that on your test system you run the precompiled Windows executable which Tom gives: this should give an approximate value, as Visual C++ and GCC produce roughly the same performance of code, and with this benchmark, switching between Windows and Linux really shouldn't make a difference. You might also try the earlier code, as I have just done, and then you'll have a 3rd party (namely Tom's list) to compare against...

    --Matt
    Reply
  • Wesley Fink - Friday, August 20, 2004 - link

    #32 - I would think the CPU scaling charts in Doom 3 at http://www.anandtech.com/cpuchipsets/showdoc.aspx?... would be all the proof you need to see the 3100+ is the better value. The 3100+ is 75.3FPS, the XP 3200+ is 68FPS, and the 2500+ is 55.6FPS.

    If DX9 game performance is not convincing, then you might refresh your memory in pages and pages of benchmarks comparing the 3100+ 754 and 2500+ Socket A in http://www.anandtech.com/cpuchipsets/showdoc.aspx?...
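
    For reference, the gaps between those three Doom 3 figures work out as follows. This is a quick sketch using only the FPS numbers quoted in this comment; `percent_faster` is an illustrative helper.

```python
# Doom 3 frame rates quoted above, in frames per second.
doom3_fps = {
    "3100+": 75.3,
    "XP 3200+": 68.0,
    "2500+": 55.6,
}


def percent_faster(fast, slow):
    """How much faster `fast` is than `slow`, as a percentage of `slow`."""
    return (fast / slow - 1.0) * 100.0


lead_over_3200 = percent_faster(doom3_fps["3100+"], doom3_fps["XP 3200+"])
lead_over_2500 = percent_faster(doom3_fps["3100+"], doom3_fps["2500+"])

print(f"3100+ vs XP 3200+: {lead_over_3200:.1f}% faster")
print(f"3100+ vs 2500+:    {lead_over_2500:.1f}% faster")
```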
    Reply
  • thornc - Friday, August 20, 2004 - link

    My main problem with the article is that the Athlon XP 2500+ with Barton core was not included!
    I am thinking of getting a new system and I intend to use a Barton XP, but I might change my mind if I see proof that the Sempron is a better deal.

    Reply
  • TauCeti - Friday, August 20, 2004 - link

    Hi Matthew, Kris and also hello to dougSF30 from siliconinvestor ;)

    Nice discussion here!

    I think there is a way to _really_ understand e.g. the TSCP benchmark scores on all AMD CPUs.

    AMD offers (for free) the "AMD CodeAnalyst™ Performance Analyzer for Linux 2.2" and the "AMD Simulation Utilities 2.1"

    afaik it is possible with those tools to profile any target code (in this case the TSCP bench) and then even simulate the execution of that code for different CPUs down to assembler code/cache state and even deeper into CPU execution unit usage.

    quote from AMD: "The data presented is at the assembly instruction level and is not intended to assist a programmer working in a high-level language. The detailed data on the execution of each instruction takes into account the previous instructions executed and the state of the processor caches. The data is obtained by running the target block of code, then using the debug capabilities of the processor to single step through each opcode to obtain an execution trace. This execution trace is then fed into a Processor Simulation that analyzes the execution."

    I think getting used to this tool and knowing how to interpret the results would be very fruitful for any reviewer (and programmer). Dresdenboy at mersenneforum.org used it to dissect the low SSE2 Performance of the AMD64s in the prime95 code.

    The downside is that this does not look like an easy task even for a 'normal' experienced programmer. I don't even know if code blocks as large as TSCP's bench can be used with the tool or if it is only suitable for inner-loop optimization.

    Tau
    Reply
  • balzi - Friday, August 20, 2004 - link

    So I take it by the ignorance that no one really cares if the graphs are readable -- oh well.. I'll still read Anandtech but maybe I won't enjoy it as much in the future. Reply
  • dougSF30 - Friday, August 20, 2004 - link

    Kris,

    It doesn't matter if Intel is formally using GHz any longer or not. (And GHz is still featured prominently in nearly every Intel part or system offered for sale today, but that is really beside the point.)

    The simplest way to put it is, whether or not any of us LIKES it, Sempron PR is not designed to be equivalent to A64 PR.

    Thus it is misleading to imply that there may be something wrong with the Sempron 3100+ PR rating based on relative performance with any A64 parts.

    Sure, I'd like it if AMD copied Intel model numbers for all their parts, but there may be a legal reason they are hesitant to do so.

    Anyway, just a small nitpick in an otherwise excellent review.

    Doug
    Reply
  • KristopherKubicki - Thursday, August 19, 2004 - link

    Hi Doug,

    Since I noticed you copied your email along in the comments, I'll copy your response back into the comments too:

    Hi Doug,

    I would agree with your logic except for the fact that Celerons no longer go by any sort of similar rating system. If you look back to May, before the Sempron line was announced, AMD was beginning to roll out a Product Code numbering scheme.

    http://www.anandtech.com/cpuchipsets/showdoc.aspx?...

    Then suddenly, this was scrapped from the roadmaps and AMD went back to the PR rating system, *after* the introduction of the Celeron D line. You and I know the Sempron 3100+ competes against a Celeron D ~340, but that is definitely obscured by the PR rating.

    Your claim is that since Intel uses a higher GHz rating for its older Celeron CPUs, AMD should be allowed to do the same for its budget Semprons. I don't think it's acceptable for AMD or Intel to use a PR or GHz rating system to sell their processors if they don't adhere to the same rating standards from processor to processor!

    Let's face it, AMD already does it with the 3400+ and the 3500+; dual channel or not they perform within 1% of each other! Do dual channel Athlons get a different rating system than single channel? In the same argument do we claim half cache processors get a different rating system than full cache ones?

    Kristopher Kubicki
    Senior Editor, AnandTech.com
    email: kristopher@anandtech.com
    Reply
  • dougSF30 - Thursday, August 19, 2004 - link

    Kris,

    AMD has made it very clear that the Sempron PR rating system is NOT equivalent to the A64 PR rating system.

    So you can't conclude that you cannot "vouch" for the Sempron rating of 3100+, compared to the A64 3000+ or 2800+, as those figures are NOT MEANT to be compared.

    Sempron PR is designed to rate against Celeron clockspeed, whatever AMD says officially about a "different suite" of benchmarks for legal reasons.

    And A64 PR is designed to rate against the full P4.

    So, given that Celeron performance is much less than P4 performance at the same clockspeed, another way to say this is:

    For a given level of performance, Celeron clock is much higher than P4 clock.

    Thus it follows automatically that Sempron PR is higher than A64 PR for a given level of performance.

    It's not "moot" because Intel is also labeling Celeron parts with model numbers... the point is still valid: Sempron PR is a completely different rating system than A64 PR.

    The only way AMD could have been less confusing would have been to copy the Intel Celeron model numbers, with the Sempron "330", "340", etc., but there may be reasons (legal?) they cannot do that.

    Pretty simple, no?

    Doug
    Reply
  • KristopherKubicki - Thursday, August 19, 2004 - link

    Matt did you see this as well:

    http://www.anandtech.com/linux/showdoc.aspx?i=2163...

    You're getting higher numbers than I got with my Xeon 3.6GHz chip.

    Kristopher
    Reply
  • Matthew Daws - Thursday, August 19, 2004 - link

    Some further comments about TSCP: I found an old article over at Ace's which used it. The numbers don't seem comparable, but the article does say that the P4 does very, very well. Still not sure why I am getting different numbers to you. Have you run the windows benchmark which can be downloaded: that should give an indication of the numbers you might expect on linux...

    --Matt
    Reply
  • PrinceGaz - Thursday, August 19, 2004 - link

    Interesting results but it would probably have been more relevant to the majority if it was the standard Windows benchmarks as everything was 32-bit.

    When I saw "L2 cache: Sempron vs Athlon" and "three 1.8GHz offerings from AMD", I really expected to see the Sempr0n 3100+ (256K), and Athlon 64 2800+ (512K) that you used, tested along with an Athlon 64 3200+ (1MB) set to a 9x multiplier. Then we'd see all three cache sizes on otherwise identical chips in 32-bit mode to truly show what effect L2 cache size has. Throwing in an Athlon XP instead as a third AMD 1.8GHz chip was rather meaningless as there are far too many other differences.

    The results do reflect what we've seen in the past that the 512K -> 256K L2 cache halving doesn't have a significant impact on performance in most apps, certainly not the crippling effect it has on the P4 architecture. Of course with the exclusive 128K L1 cache we're really only looking at a 40% (640K -> 384K) cache reduction.

    I've got to disagree with your conclusion I'm afraid. Given what is a very small price difference between the Sempr0n 3100+ and A64 2800+, spending the $20 extra for the A64 2800+ is a no brainer when you consider total system cost. Throw in just a S754 mobo and the performance difference alone already makes the A64 2800+ a viable option. People buying S754 systems aren't seriously looking to upgrade in the future (else they'd go S939). And being stuck with a Sempr0n 3100+ means you miss out on all the benefits of 64-bit in a year or two.
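
    The 40% cache figure mentioned earlier in this comment follows from counting L1 and L2 together, since an exclusive design does not duplicate L1 contents in L2. A minimal sketch of that arithmetic, using the sizes quoted in the comment (the names are illustrative):

```python
L1_KB = 128  # total L1 cache, as quoted in the comment above


def effective_cache_kb(l2_kb, l1_kb=L1_KB):
    """Usable cache when L2 is exclusive of L1: the two capacities add up."""
    return l1_kb + l2_kb


full_cache = effective_cache_kb(512)   # Athlon 64 2800+ (512K L2)
half_cache = effective_cache_kb(256)   # Sempron 3100+ (256K L2)

l2_only_reduction = 1 - 256 / 512
effective_reduction = 1 - half_cache / full_cache

print(f"L2 alone: {l2_only_reduction:.0%} smaller")
print(f"L1 + L2:  {full_cache}K -> {half_cache}K, {effective_reduction:.0%} smaller")
```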
    Reply
  • Matthew Daws - Thursday, August 19, 2004 - link

    Kris: Hmm, well, using -O3 -march=pentium4, under Windows, I get with my Celeron 2GHz:

    GCC 3.3.3 -- 272K
    GCC 3.4.1 -- 280K

    GCC 3.3.3 (-O2 -march=pentium4) -- 273K
    GCC 3.3.3 (-O3 only) -- 262K

    Just with pure clock-speed scaling, I'd expect 20% increase with a 2.4C, so 300K or so...

    You shouldn't see any difference with linux: indeed, on a linux box I have access to, with GCC 3.2.2 (I *think* it's a P4 2.8GHz, but I'm not 100% sure: I'm doing a remote-login right now, so cannot check!) I get 365K with (-O3 -march=pentium4). This seems to be pretty linear clock scaling, which we might expect if the memory usage is low...

    An Athlon *should* excel at this sort of test, at least given other benchmarks.

    Just to check: I am using the source-code from Tom Kerrigan, at http://home.comcast.net/~tckerrigan/tscp181.zip

    --Matt
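
    The clock-scaling argument above can be sketched as a simple proportion. This assumes performance scales linearly with clock speed and ignores the cache and bus differences between a Celeron and a P4, as well as the different GCC versions behind each measurement, so treat the output as a rough sanity check only. The function name is illustrative.

```python
def scale_by_clock(nps, from_ghz, to_ghz):
    """Estimate nodes/sec at a new clock, assuming purely linear scaling."""
    return nps * to_ghz / from_ghz


# Measured on the 2GHz Celeron with GCC 3.3.3, -O3 -march=pentium4.
celeron_nps = 272_000

predicted_2_4 = scale_by_clock(celeron_nps, 2.0, 2.4)
predicted_2_8 = scale_by_clock(celeron_nps, 2.0, 2.8)

# Figures reported in this thread for comparison.
measured_2_8 = 365_000  # the remote P4, believed to be 2.8GHz, GCC 3.2.2
measured_2_4 = 250_000  # Kris's 2.4C with GCC 3.3.3

print(f"2.4GHz: predicted {predicted_2_4:.0f}, reported {measured_2_4} "
      f"({measured_2_4 / predicted_2_4:.0%} of prediction)")
print(f"2.8GHz: predicted {predicted_2_8:.0f}, reported {measured_2_8} "
      f"({measured_2_8 / predicted_2_8:.0%} of prediction)")
```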
    Reply
  • KristopherKubicki - Thursday, August 19, 2004 - link

    Hi Matt: Just -O2/3 -march=athlon

    I emailed a few people about your results. I have a 2.4C here that only does 250K with GCC 3.3.3

    Kristopher
    Reply
  • Matthew Daws - Thursday, August 19, 2004 - link

    Kris: I don't have an AthlonXP, so I cannot comment. I was using GCC 3.4.1 (MinGW version for Windows) which might explain the difference. Still, I would expect any of the processors in this test to completely thrash my Celeron...

    What flags were you using with the AthlonXP? I'm pretty sure it's not an SSE(2) issue...

    --Matt
    Reply
  • KristopherKubicki - Thursday, August 19, 2004 - link

    Matthew Daws: I was getting wild results with my TSCP on the Athlon XP, which is why I didn't include it. I assumed there was some optimization somewhere that shouldn't have been.

    Which GCC version are you using? On the A64 platforms I've seen as much as 30% increases using GCC 3.4.1 over 3.3.3.

    Kristopher
    Reply
  • Gatak - Thursday, August 19, 2004 - link

    I would like to see a Gentoo 64bit Linux comparison. I know this would take a little longer to achieve, but it would probably show better what 64bit performance would be, as everything, including GCC and glibc, would be compiled for the platform. Reply
  • Matthew Daws - Thursday, August 19, 2004 - link

    As a followup to this, I've now realised that TSCP is a chess program! Thus it is most unlikely that GCC is getting any performance gain out of SSE(2) (although, again, it might be using a few SSE commands). That is, unless the source-code for TSCP explicitly uses SSE2, either via intrinsics, or via inline assembly.

    Having looked at the source-code, this is not the case. GCC is in no way making large use of SSE or SSE2. So I fully agree with you, Tau.

    Curious: On my Celeron 2GHz laptop, I get a score of 258K nodes/sec with the default executable I downloaded (TSCP 1.8.1). Compiling with GCC "-march=pentium4 -O3" I get 269K and with "-march=pentium4 -O2" I get 260K. Methinks something is wrong, as this is what Kris gets with an Athlon64 2800+.

    Kris: Maybe you need to look at what is going on here...
    Reply
  • Matthew Daws - Thursday, August 19, 2004 - link

    #6: GCC can indeed produce SSE(2) output. There are two modes for SSE: scalar and packed. In scalar, what you get is basically x87 with a flat register file: this makes compiler writing easier, and generally improves performance a bit (a lot for P4 systems, as they don't have the FXCH instruction for free anymore). In packed mode, SSE runs in proper SIMD mode, with possibly huge performance increases.

    Now, GCC can issue scalar SSE instructions: indeed, this seems to be the default for the 64-bit compiler, and on my 32-bit system, I notice GCC sneaking in some SSE instructions to do with integer to floating-point convert, say. Under certain -march options, GCC will do most floating-point math in scalar SSE (I am currently trying to help debug some issues with this under Windows, in fact).

    GCC cannot automatically issue packed code though, which I guess is what was bothering you: indeed, it takes a very, very clever compiler to automatically start doing SIMD stuff.

    However, this does mean that I am a little surprised that the AthlonXP was dropped for this test:

    i) AthlonXP DOES HAVE SSE, just not SSE2. As SSE2 only introduces support for "double" floating-point types (at least as far as GCC can exploit), does TSCP use double types?

    ii) As I mentioned above, moving from x87 to scalar SSE(2) only makes a noticeable difference on P4 systems: P3s and Athlons have much better x87 (hacks, one could say) so I wouldn't expect a huge difference.

    In summary, I wouldn't expect to see SSE2 make a huge difference here, but it is probably being used.

    --Matt
    Reply
  • theoldwizard - Thursday, August 19, 2004 - link

    I come from the "commercial" world where 64 bit processors (Alpha EV4, 5, 6 and 7 and UltraSPARC III) are really 64 bits. By this I mean all internal data paths, registers, etc, etc are really 64 bits.

    If an Athlon 64 is really a 32 bit core with extra opcodes and microcode to make it look like a 64 bit processor, I am very disappointed.

    Everyone has been saying the big advantage of 64 bits is the large address space to handle huge data sets. Trust me, in the "commercial" world, very few Alphas or Sun/SPARCs will ever have even close to 2**32 bytes of memory. The real advantage has always been in floating point, especially double precision floating point performance.

    www.SPEC.org has been benchmarking processors for many years, and several of their key benchmarks stress the double precision capability of the processor.

    So do any members of the Athlon 64 family have "true" 64 bit internal data paths and registers?

    Another tip from the Alpha engineers. External data buses were as wide as 256 bits! Helps to fill that cache fast!!
    Reply
  • balzi - Wednesday, August 18, 2004 - link

    And furthermore - the article states that there are 41 replies before this.. when only 14 show up -- this will be 15..
    "things are looking very fishy in Denmark"
    "ahh Switzerland?"
    "yes.. there too"
    Reply
  • balzi - Wednesday, August 18, 2004 - link

    I have to agree with johnsonx here.
    the graphs were extremely weird..
    The order of the entries was rarely related to anything at all - like normally, the winner would be first, followed by second, etc.. or maybe you'd keep the same order for many different graphs from one benchmark.

    The most annoying thing I came across was when a test was compiled with a bunch of flags.. the "Option" legend entries were exactly upside-down to the graphs.. my brain hurt trying to figure out what was benefitting where??.. owww!!!

    just some thorts.. hope they help.

    Balzi
    Reply
  • frinky525 - Wednesday, August 18, 2004 - link

    keep up the linux articles kris!

    jason tower
    trilug treasurer
    raleigh, nc
    Reply
  • KristopherKubicki - Wednesday, August 18, 2004 - link

    JohnsonX, I would agree with you except for the fact that the Sempron 3100+ is really just a Newcastle with half the cache disabled (and 64-bit disabled). The big difference is the on-CPU memory controller.

    Kristopher
    Reply
  • johnsonx - Wednesday, August 18, 2004 - link

    Regarding model numbers, whatever AMD says the model number targets, what's important to remember is that the model number is only meant to be compared within a single AMD processor family. In the current scheme, Sempron model numbers mean less performance than AthlonXP model numbers, which in turn mean less performance than Athlon64 model numbers.

    This mostly works well except when AMD mixes two processors from different architectures into the same family as they have with the Sempron; it's really tough to apply the same metric to a K7 and a K8.

    I'm not sure if AT has done this, but it might be interesting to compare an AXP 3200+ to a Sempron 3100+; in theory the extra 400MHz of core clock and extra 256k of cache should enable the AXP to outrun the Sempron in most cases.
    Reply
  • TrogdorJW - Wednesday, August 18, 2004 - link

    Well, one thing that the benchmarks do show is how the Sempron 3100+ compares with the XP2200+ when they both have the same amount of cache and clock speed. The bus speed is something of a factor, but I doubt that would make up the remaining deficit in performance. It's pretty clear that the integrated memory controller on the Sempron is more than enough to help it pass the Athlon XP in typical Linux use.

    It would be interesting to see an XP-M Barton core clocked at 1.8 GHz with a 9X multiplier, just to take the bus speed out of the equation. But really, it's academic: for the price, the Sempron 3100+ is a good buy.

    Regarding the conclusion with the comment on model numbers, I think it's fair enough for AMD to rate the Sempron against the Celeron. Which is to say, I hate model numbers in general, but you already know that. :)
    Reply
  • AnonymouseUser - Wednesday, August 18, 2004 - link

    Yes, more everyday apps for benches, please. I wanna know which of the CPUs will be fastest to compromise Windows XP on a fresh install, how fast XP can install IE toolbars and Comet Cursor, how many IE and Messenger popups can be done in one minute, how long it takes to run a full system virus and spyware scan, etc. Also, I need to know which one boots fastest for all the reboots necessary. :)

    FWIW, I think the XP 2200+ is a good choice for comparison. Same clock speed and cache of the Sempron 3100+ shows how much better the new core is.
    Reply
  • phaxmohdem - Wednesday, August 18, 2004 - link

    I would love to see an old school 1.8 GHz P4 400 FSB pitted against these processors. Not quite fair I know, but I like seeing how incredibly crappy those old P4s were. Clock for clock comparisons interest me though. Good article; however, as seems to be the general consensus, more everyday apps would be helpful for benchmarks. Reply
  • Illissius - Wednesday, August 18, 2004 - link

    It's not a bad review, but I don't entirely get the point (or rather, entirely don't). Using Linux makes good sense when you're benching either 64-bit and/or server processors, but this was neither. Most people who're actually deciding between an A64 2800+ and a Sempron 3100+ would've been much more interested in your standard benchmark suite of desktop applications and games. Reply
  • TauCeti - Wednesday, August 18, 2004 - link

    First: kudos for the new comparison. I would imagine myself still cursing (and worse) the unfair readers after the recent onslaught ;)

    Second: TSCP/SSE2

    Ok, I admit that I ditched compiler lectures at university BUT: Did GCC really generate SSE2 code for the TSCP sources?

    You wrote that you omitted the XP scores because of SSE2. Did you check if SSE2 code was generated on the AMD64s?

    I checked the TSCP source but I have no idea where the compiler would opt to use SSE2 at all.

    Please give me a hint (this is not ironic, I really want to know how the compiler managed to use SSE2 for TSCP)

    Tau
    Reply
  • johnsonx - Wednesday, August 18, 2004 - link

    In general, I found all the graphs to be oddly arranged. Since the point of the article was to compare the Sempron 3100+ to the A64 2800+, it would have been a little clearer if those two had always been graphed together. As it stands now, the 3100+ was always at the top of the multi-bar graphs, followed by the 3000+, then the 2800+ and finally the AXP. I kept having to jump over the 3000+ scores to see the benefit (or lack thereof) of the 512k cache. In general, I think the order of the graphs should be the same throughout the article, and any chip(s) that are the particular highlight of the article should be grouped together somehow.

    Secondly, I'd like to second the suggestion that an AthlonXP 2500+ would have made an interesting point of comparison as well, though I do realize the 1.8GHz Socket-754 chips were in fact the point of the article.

    Regards,

    Dave
    Reply
  • DerwenArtos12 - Wednesday, August 18, 2004 - link

    I really wish you had included the Athlon XP 2500+ Barton as a reference CPU, as it runs at 1.83GHz on the Socket A platform and has a 333 FSB, which is easier to compare to the 400 FSB A64 and 400 FSB Sempron 3100+. Plus that would give an idea of how the cache per platform makes a difference, as it has the 512k L2 cache to compare to the 256k L2 on the 2200+, plus the core revisions of going to Barton give a better idea of current competitors. The 2200+ is in a completely different price range than the other three processors here, whereas the 2500+ would also be closer there. Just my opinion, but I think it would have added a much better current-market performance comparison. Reply
  • skiboysteve - Wednesday, August 18, 2004 - link

    on your Gzip bench the graph is ordered oddly Reply
  • skiboysteve - Wednesday, August 18, 2004 - link

    weird Reply
  • KristopherKubicki - Wednesday, August 18, 2004 - link

    Something happened to the document engine and the article posted while I was still working on it. That has been fixed.

    Kristopher
    Reply
