TSCP

We apologize for the broken TSCP Makefile in the previous review, which rendered our initial results inaccurate. Fortunately, we had posted the file, so readers were able to spot the error rather than blame the processors for the discrepancy. The larger issue many readers have brought to our attention is the severe difference in performance between optimization settings. Below you can see how various compile flags affected our benchmark scores.

The first benchmark is run with the optimization flags:

-O2 -funroll-loops -frerun-cse-after-loop
TSCP 1.8.1 -O2

The next benchmark is run with the optimization flags:

-O3 -funroll-loops -frerun-cse-after-loop
TSCP 1.8.1 -O3

Finally, we have the architecture-optimized flags:

(Intel) -O3 -march=nocona -funroll-loops -frerun-cse-after-loop
(AMD) -O3 -march=k8 -funroll-loops -frerun-cse-after-loop
TSCP 1.8.1 -O3 -march

You are reading these charts correctly: the -O3 flag actually penalizes the AMD CPU. We also compiled the program with -O2 -march=k8, but the score was virtually the same with or without the -march flag.
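Given the earlier Makefile mix-up, it can help to generate every build command from a single table, so that each configuration differs only where intended. The Python below is a minimal sketch of that idea. The flag sets are the ones listed above; the `gcc` invocation, the configuration labels, the helper names, and the TSCP source file list are our own hypothetical assumptions for illustration. The `percent_change` helper expresses a score relative to the -O2 baseline, so a penalty like the AMD -O3 result shows up as a negative number.

```python
# Sketch: build TSCP once per flag set so the configurations stay comparable.
# Assumptions (not from the article): the compiler is invoked as "gcc", and
# SOURCES is a hypothetical list of TSCP's C files; check your own copy.
import shlex

COMMON = ["-funroll-loops", "-frerun-cse-after-loop"]

# Flag sets as listed in the article; the -march value depends on the CPU.
# The dictionary keys are hypothetical labels.
FLAG_SETS = {
    "O2": ["-O2", *COMMON],
    "O3": ["-O3", *COMMON],
    "O3-march-intel": ["-O3", "-march=nocona", *COMMON],
    "O3-march-amd": ["-O3", "-march=k8", *COMMON],
}

SOURCES = ["board.c", "book.c", "eval.c", "main.c", "search.c"]  # hypothetical


def gcc_command(config, output="tscp"):
    """Return the argv list for one build configuration (not executed here)."""
    return ["gcc", *FLAG_SETS[config], "-o", output, *SOURCES]


def percent_change(baseline_nps, nps):
    """Change versus a baseline score in percent; negative means slower."""
    return (nps - baseline_nps) * 100.0 / baseline_nps


if __name__ == "__main__":
    # Print one shell-ready command per configuration, each with its own binary.
    for name in FLAG_SETS:
        print(name, shlex.join(gcc_command(name, f"tscp-{name}")))
```

Giving each configuration its own output binary keeps a stale build from silently standing in for another flag set.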

We have been told that others achieved much higher nodes-per-second scores using GCC 3.4.1 and the following flag set:

-O3 -march=athlon-xp -funroll-loops -fomit-frame-pointer -ffast-math -fbranch-probabilities

We did not have time to fully test GCC 3.4.1, although it is quite likely that the 3.4 series produces better-optimized code (particularly on x86_64 platforms).
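One caveat on the flag set above: -fbranch-probabilities is meant for a second compile that reads profile data written by running a program built with -fprofile-arcs, so reproducing those numbers implies a two-pass build rather than a single compile. The sketch below only assembles the command sequence and runs nothing. The source list, output name, and training run are hypothetical placeholders, and what counts as a representative training workload is an assumption left to the reader.

```python
# Sketch of a two-pass (profile-guided) build for the GCC 3.4.1 flag set.
# -fbranch-probabilities relies on profile data produced by running a
# -fprofile-arcs build, so the flags cannot simply be pasted into one compile.
# Assumptions: the sources, output name, and training step are hypothetical.

BASE = ["-O3", "-march=athlon-xp", "-funroll-loops",
        "-fomit-frame-pointer", "-ffast-math"]


def pgo_steps(sources, output="tscp"):
    """Return the ordered commands: instrumented build, training run,
    and the final rebuild that uses the collected profile."""
    # Pass 1: compile and link with instrumentation that records branch counts.
    instrumented = ["gcc", *BASE, "-fprofile-arcs", "-o", output, *sources]
    # Training: run a representative workload (hypothetical invocation).
    training_run = [f"./{output}"]
    # Pass 2: same flags plus -fbranch-probabilities, matching the quoted set.
    optimized = ["gcc", *BASE, "-fbranch-probabilities", "-o", output, *sources]
    return [instrumented, training_run, optimized]
```

Keeping every other flag identical between the two passes is deliberate: the second compile is expected to see the same code the profile was gathered from.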

Crafty

For good measure, we have added Crafty to our chess benchmarks section. Crafty was built only with the "make linux-amd64" target. Judging from the Makefile, the "AMD64" moniker seems slightly inappropriate; the comment for the relevant option reads:

#   -INLINE_AMD       Compiles with the Intel assembly code for FirstOne(),
#                     LastOne() and PopCnt() for the AMD opteron, only tested
#                     with the 64-bit opteron GCC compiler.
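For readers unfamiliar with these routines: Crafty represents the board as 64-bit bitboards, PopCnt() counts the set bits in one, and FirstOne() and LastOne() locate a set bit at either end of the word. Engines call these constantly, which is why hand-written assembly for them matters (x86 provides the bsf/bsr bit-scan instructions for such searches). The Python below is a portable sketch of the same operations, assuming bit 0 is the least significant bit. Crafty's own square numbering may differ, so the helper names are hypothetical and deliberately not Crafty's.

```python
# Portable versions of the bit operations the INLINE_AMD assembly speeds up.
# Assumptions: a bitboard is a 64-bit unsigned integer and bit 0 is the least
# significant bit; Crafty's FirstOne()/LastOne() may number bits differently.

MASK64 = (1 << 64) - 1


def pop_count(bb):
    """Number of set bits (the quantity PopCnt() computes)."""
    return bin(bb & MASK64).count("1")


def lowest_set_bit(bb):
    """Index of the least significant set bit, or -1 for an empty board."""
    bb &= MASK64
    # bb & -bb isolates the lowest set bit; bit_length() gives its position + 1.
    return (bb & -bb).bit_length() - 1


def highest_set_bit(bb):
    """Index of the most significant set bit, or -1 for an empty board."""
    return (bb & MASK64).bit_length() - 1
```

Returning -1 for an empty bitboard is our own convention for the sketch; real engines typically avoid calling a bit scan on an empty board at all.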

The benchmark was generated by running the "bench" command inside the program.

Crafty v19.15

Clearly, the difference between the two processors is quite severe in this instance. Although it is difficult to pin down an exact culprit, several architecture-specific optimizations were likely left untapped, which is also why we generally discourage leaning too heavily on optimization settings when comparing processors.

92 Comments

  • Decoder - Thursday, August 12, 2004

    Kris,

    Great review! I wish someone would benchmark AMD64 and EM64T in 64-bit mode with MORE THAN 4 Gigs of RAM. I heard EM64T takes a hit with more than 4 gigs.

  • offtangent - Thursday, August 12, 2004

    Kris,

    This was a great follow-up article, and it certainly cleared a lot of things up. I was just wondering if it's possible to use the SPEC benchmarks in addition to the ones you've used, so we can get the SPECint and SPECfp values to go with them. There are some published values for these on the SPEC website, but the setups for each of those published results are not the same, so it's difficult to put them in perspective. Since you usually have access to very similar setups, I was wondering if you could add those two tests to your set of benchmarks. Thanks!

    OT
  • Viditor - Thursday, August 12, 2004

    Kris - "To be honest I wouldn't have known about some of the mistakes I made had people not been so critical. I am not upset with the final outcome; it happens to everyone"

    And that is why AT is the first site I come to for information...
    Great job, and thanks!

    Cheers,
    Charles
  • T8000 - Thursday, August 12, 2004

    When I compare this review to the previous one, I see two interesting points:

    1. Most benchmarks ran a lot faster without hyperthreading, a scenario that was not tested here.

    2. When enough users (or a user with a lot of names) complain about their favorite product not winning the benchmarks, their product comes out better soon thereafter. I wonder if the Celeron 335 would have outperformed the Athlon 64 3800+ as well, had enough user names demanded it in the comments of the Celeron 3xx review.
  • trooper11 - Thursday, August 12, 2004

    I just wanted to say I'm a long-time AnandTech reader, and I applaud the work done with this review to clear up the problems with the previous one.

    It takes some guts to come out and admit things were done badly, and I respect the reviews more knowing you are willing to admit those things and work with the readers to solve the problem; some sites have a problem with that. I have been a fan of the site for several years and was very surprised by the first review, but now I see you trying to make up for it and move forward. I just want to thank you for the work done on this review.

    It may still not answer all the questions perfectly, but it certainly goes a long way compared with the first article. I look forward to follow-ups.
  • KristopherKubicki - Thursday, August 12, 2004

    #55: I had a typo when I moved the table back over to make it readable :) They are both registered CAS 3.

    I will work on the color issue more in the future; I just picked the default colors this time around.

    There are new Xeon processors, dubbed Irwindale, that use 2MB of L2 cache. However, the large caches on the Xeons you see now are L3.

    Kristopher
  • Anemone - Thursday, August 12, 2004

    It's a small word, but means a lot...

    Thank you.

  • 2002cbr600f4i - Thursday, August 12, 2004

    Kris,

    First off, MUCH better.... At least this seemed like a more fair fight.

    2 concerns/gripes/comments though...

    1) In the hardware config I noticed that one machine had unregistered CAS 2 memory and the other had registered CAS 3 memory. Since I know that Opteron requires registered memory, I'd assume that made the Opteron run the CAS 3 stuff. I really would have preferred to see a CAS 2 vs. CAS 2 fight (just to keep it apples to apples as much as possible).

    Second, (and this is a personal gripe against most benchmarking sites) either pick a color code for each brand's processor and use that color for ALL charts showing that processor, or always list them in the same order. Showing the "best one first" can be rather confusing when they're changing order from one chart to the next.

    One other thing... Doesn't the Xeon have more than 1MB L2 cache? I thought the newer ones were all using 2MB or more of that or L3???

    Anyhow, thanks for going back and redoing this work. I don't think any of us hates you personally, we just want to see FAIR and EVEN reporting in general across the board. This review has gone a long way towards restoring my faith in this site.

    --Mike
  • Pumpkinierre - Thursday, August 12, 2004

    Agree with 42 and 52: something is wrong with your statement on Blowfish. Also agree with 50 on the power of different optimisations (and it's early days for the Nocona). Thanks also for reawakening my interest in Linux.
  • adiposity - Thursday, August 12, 2004

    Hey snore...

    I noticed the unreadable table, too. I think it's some IE-specific code, because I could view it in IE, just not Firefox. You'd think Linux benchmarks would have Mozilla-compliant HTML :)

    Now, I don't know if it's just me, but I couldn't bring up the forum popup in Firefox, either. Why not?

    -Dan
