We ran MySQLd 4.0.20d from 32-bit binaries precompiled by the SuSE team. The two benchmarks below are sql-bench's test-select and test-insert.


MySQL 4.0.20d - Test-Insert

MySQL 4.0.20d - Test-Select

It is very likely there are a few different factors at work here that put the Socket 754 marks better than the lowly 1.8GHz Socket 462 Athlon 2200+. For starters, we have no doubt the on-CPU memory controller for the Socket 754 family plays a critical role in hastening selects and inserts. Our whole database is stored in memory, so the largest issue becomes IO from the CPU to the RAM. The nForce3 northbridge is also far superior to the nForce2 used on the Athlon 2200+ platform, which also accounts for some IO gains.

Index Chess Benchmarks
Comments Locked

59 Comments

View All Comments

  • Gatak - Thursday, August 19, 2004 - link

    I would like to see a Gentoo 64bit Linux comparison. I know this would take a little longer to achieve, but It would probably show better what 64bit performance would be as everything, including GCC and GLibC would be compiled for the platform.
  • Matthew Daws - Thursday, August 19, 2004 - link

    As a followup to this, I've now realised that TSCP is a chess program! Thus it is most unlikely that GCC is getting any performace gain out of SSE(2) (although, again, it might be using a few SSE commands). That is, unless the source-code for TSCP explicitly uses SSE2, either via intrinsics, or via inline assembly.

    Having looked at the source-code, this is not the case. GCC is in no way making large use of SSE or SSE2. So I fully agree with you Tau

    Curious: On my Celeron 2GHz laptop, I get a score of 258 K Nodes/sec with the default executable I downloaded (TSCP 1.81). Compiling with GCC "march=pentium4 -O3" I get 269K and with "march=pentium4 -O2" I get 260K. Methinks something is wrong, as this is what Kris gets with an Athlon64 2800+

    Kris: Maybe you need to look at what is going on here...
  • Matthew Daws - Thursday, August 19, 2004 - link

    #6: GCC can indeed produce SSE(2) output. There are two modes for SSE: scalar and packed. In scalar, what you get is basically x87 with a flat register file: this makes compiler writing easier, and generally improves performance a bit (a lot for P4 systems, as they don't have the FXCHG intstruction for free anymore). In packed mode, SSE runs in proper SIMD mode, with possibly huge performance increases.

    Now, GCC can issue scalar SSE instructions: indeed, this seems to be the default for the 64-bit compiler, and on my 32-bit system, I notice GCC sneaking in some SSE instructions to do with integer to floating-point convert, say. Under certain -march options, GCC will do most floating-point math in scalar SSE (I am currently trying to help debug some issues with this under Windows, in fact).

    GCC cannot automatically issue packed code though, which I guess is what was bothering you: indeed, it takes a very, very clever compiler to automatically start doing SIMD stuff.

    However, this does mean that I am a little surprised that the AthlonXP was dropped for this test:

    i) AthlonXP DOES HAVE SSE, just not SSE2. As SSE2 only introduces support for "double" floating-point types (at least as far as GCC can exploit), does TSCP use double types?

    ii) As I mentioned about, moving from x87 to scalar SSE(2) only makes a noticable difference on P4 systems: P3 and Athlons have much better x87 (hacks one could say) so I wouldn't expect a huge difference.

    In summary, I wouldn't expect to see SSE2 make a huge difference here, but it is probably being used.

    --Matt
  • theoldwizard - Thursday, August 19, 2004 - link

    I come from the "commercial" world where 64 bit processors (Alpha EV4, 5, 6 and 7 and UltrSparc III) are realy 64 bits. By this I mean all internal data paths, registers, etc, etc are really 64 bits.

    If an Athalon 64 is really a 32 bit core with extra opcodes and microcode to make it look like a 64 bit processor I am very disappointed.

    Everyone has been saying the big advantage of 64 bits is the large address space to handle huge data sets. Trust me, in the "commercial" world, very few Alphas or SUN/Sparcs will ever have even close to 2**32 bytes of memory. The reall advantage has always been in floating point, especially double precision floating point performance.

    www.SPEC.org has been benchmarking processors for many years, and several of their key benchmarks stress the double precision capability of the processor.

    So do any members of the Athalon 64 family have "true" 64 bit internal data paths and registers ?

    Another tip from the Alpha engineers. External data buses were as wide a 256 bits ! Helps to fill that cache fast !!
  • balzi - Wednesday, August 18, 2004 - link

    And further more - the article states that there's 41 replies before this.. when only 14 show up -- this will be 15..
    "things are looking very fishy in Denmark"
    "ahh Switzerland?"
    "yes.. there too"
  • balzi - Wednesday, August 18, 2004 - link

    I have to agree with johnsonx here.
    the graphs were extremely weird..
    The order of the entries was rarely related to anything at all - like normally, the winner would be first, followed by second, etc.. or maybe you'd keep the same order for many different graphs from one benchmark.

    The most annoying thing I came across was when a test was compiled with a bunch of flags.. the "Option" legend entries were exactly upside-down to the graphs.. my brain hurt trying to figure out what was benefitting where??.. owww!!!

    just some thorts.. hope they help.

    Balzi
  • frinky525 - Wednesday, August 18, 2004 - link

    keep up the linux articles kris!

    jason tower
    trilug treasurer
    raleigh, nc
  • KristopherKubicki - Wednesday, August 18, 2004 - link

    JohnsonX, i would agree with you except the fact that the Sempron 3100+ is really just a Newcastle with half cache disabled (and 64-bit disabled). The big difference is the on CPU memory controller.

    Kristopher
  • johnsonx - Wednesday, August 18, 2004 - link

    Regarding model numbers, whatever AMD says the model number targets, what's important to remember is that the model number is only meant to be compared within a single AMD processor family. In the current scheme, Sempron model numbers mean less performance than AthlonXP model numbers, which in turn mean less performance than Athlon64 model numbers.

    This mostly works well except when AMD mixes two processors from different architectures into the same family as they have with the Sempron; it's really tough to apply the same metric to a K7 and a K8.

    I'm not sure if AT has done this, but it might be interesting to compare an AXP 3200+ to a Sempron 3100+; in theory the extra 400Mhz of core clock and extra 256k of cache should enable the AXP to outrun the Sempron in most cases.
  • TrogdorJW - Wednesday, August 18, 2004 - link

    Well, one thing that the benchmarks do show is how the Sempron 3100+ compares with the XP2200+ when they both have the same amount of cache and clock speed. The bus speed is something of a factor, but I doubt that would make up the remaining deficit in performance. It's pretty clear that the integrated memory controller on the Sempron is more than enough to help is pass the Athlon XP in typical Linux use.

    It would be interesting to see an XP-M Barton core clocked at 1.8 GHz with a 9X multiplier, just to take the bus speed out of the equation. But really, it's academic: for the price, the Sempron 3100+ is a good buy.

    Regarding the conclusion with the comment on model numbers, I think it's fair enough for AMD to rate the Sempron agains the Celeron. Which is to say, I hate model numbers in general, but you already know that. :)

Log in

Don't have an account? Sign up now