Original Link: https://www.anandtech.com/show/1428



Since the excuse to not compare Athlon 64s to Intel Pentium based processors has always been "you can't compare apples to oranges," we found ourselves fairly entertained to come into the possession of a 3.6GHz EM64T Xeon processor. EM64T is Intel's true x86_64 initiative. This 3.6GHz Xeon processor is actually the exact same CPU as the LGA775 Pentium 4F that we will see in just a few weeks. We are offering a preview of an unreleased processor on 64-bit Linux systems. Now, we have Intel and AMD 64-bit x86 processors, 64-bit Linux operating systems and a few days to get some benchmarking done.

We are going to run the benchmarks for this review slightly differently than we have in the past. We want to make our numbers easily replicable for those who have the necessary components, but we also want to show the fullest capabilities of the hardware that we have. Many of our previous benchmarks are not multithreaded (POV-Ray) or do not scale well. Unfortunately, this forces us to use a lot of synthetic benchmarks, but we feel that the overall results are accurate and reflective of the hardware used.

The delicate bit for this review was using the SuSE 9.1 Pro (x86_64) installation rather than compiling it from scratch (à la Gentoo). This was done to preserve the ability to replicate our benchmarks easily. Fedora Core 2 refused to install on the IA32e machine because there was no recognized AMD CPU.

Performance Test Configuration
Processor(s): Athlon 64 3500+ (130nm, 2.2GHz, 512KB L2 Cache)
    Intel Xeon 3.6GHz (90nm, 1MB L2 Cache)
RAM: 2 x 512MB PC-3500 CL2 (400MHz)
    2 x 512MB PC2-3200 CL3 (400MHz) Registered
Memory Timings: Default
Hard Drives: Seagate 120GB 7200RPM IDE (8MB buffer)
Operating System(s): SuSE 9.1 Professional (64 bit)
    Linux 2.6.4-52-default
    Linux 2.6.4-52-smp
Compiler: GCC 3.3.3
Motherboards: NVIDIA NForce3 250 Reference Board
    SuperMicro Tumwater X6DA8-G2 (Only 1 CPU)

As there may have been a little confusion from the last review, the DDR PC-3500 only runs at 400MHz. The Infineon Registered RDIMMs used on the Xeon run at slightly higher latencies. All memory runs in dual channel configurations. We removed 1 CPU for the tests in this benchmark, but since HyperThreading was enabled, we used the SMP kernel. During the second half of the benchmarks, SMP was disabled and the tests were re-run under the single CPU generic kernel. These are both 64-bit CPUs, and so, all benchmarks are run on 64-bit OSes with 64-bit binaries wherever possible.
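Since the tests switch between the SMP and the single CPU kernel, anyone replicating them may want to record which kernel and architecture each run actually used. The following is a minimal Python sketch of that idea; Python was not part of our test setup, and the function name `describe_host` is purely illustrative.

```python
import os
import platform


def describe_host():
    """Collect the host details that matter when comparing benchmark runs."""
    return {
        # Architecture string, e.g. "x86_64" on a 64-bit Linux install
        "machine": platform.machine(),
        # Kernel release, e.g. a string ending in "-smp" or "-default"
        "kernel": platform.release(),
        # Logical CPUs visible to the OS; HyperThreading shows up here
        "logical_cpus": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```

Saving this output next to each set of results makes it harder to mix up SMP and single CPU numbers later.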




Audio Encoding

Lame was compiled from source without optimizations. We only ran ./configure and make, without any flags. We realize that some people would like to verify our binaries and sample files for their own benchmarks. In order to save bandwidth and prevent copyright infractions, we will provide our test files and binaries under limited circumstances to serious inquiries. We ran lame on a 700MB .wav file using the command equivalent to the one below:

# lame sample.wav -b 192 -m s -h >/dev/null

Encoding time, lower is better.

lame 1.96
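For readers who prefer to script the measurement, here is a minimal Python sketch that times one run of a command with its standard output discarded, much like the redirect to /dev/null above. It is an illustration only, not the method used for our numbers; `time_command` and the script name in the comment are hypothetical.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once with stdout discarded; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Example (hypothetical script name):
    #   python time_command.py lame sample.wav -b 192 -m s -h
    if len(sys.argv) > 1:
        print(f"{time_command(sys.argv[1:]):.2f} s")
```

Wall-clock time includes process startup, so very short commands will be dominated by overhead; a long encode is a better fit for this approach.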

POV-Ray

Although POV-Ray is limited in application (particularly when compared against Mental Ray), it does provide a free open source solution for basic rendering. POV-Ray 3.50c was our choice of render engine for this benchmark. For benchmark specifics, we run the exact benchmark as specified by the POV-Ray official site. We use the precompiled RPM for this test.

Render Time in Seconds, less is better.

POV-Ray 3.50c

POV-Ray does not have multithread support, so we were not surprised to see the HyperThreading configuration slow down relative to the configuration without HT. We see the Athlon 64 processor pull way ahead; render tasks are extremely CPU and memory dependent. With the memory controller on the CPU, the Athlon 64 becomes the stronger offering in this situation.

GZip

To throw in some rudimentary tests for GZip, we used the included GZip 1.3.5 to compress the .wav file from the benchmark above. We do not want to limit our I/O on writing to the hard drive, so the operation is performed as below:

# time gzip -c sample.wav > /dev/null

Gzip 1.3.5
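The idea of keeping disk writes out of the measurement can also be sketched in Python by compressing a buffer entirely in memory. This uses Python's own zlib-based gzip module rather than the GZip 1.3.5 binary, so its timings are not comparable with our results; the function name `time_gzip` and the synthetic payload are assumptions for illustration.

```python
import gzip
import time


def time_gzip(data, level=6):
    """Compress bytes in memory; return (elapsed seconds, compressed bytes).

    Level 6 matches the gzip command-line default; Python's own default is 9.
    """
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return elapsed, compressed


if __name__ == "__main__":
    # Synthetic stand-in data, not the .wav file used in the article
    payload = b"benchmark sample data " * 100_000
    seconds, compressed = time_gzip(payload)
    print(f"{len(payload)} -> {len(compressed)} bytes in {seconds:.3f} s")
```

Reading the input into memory before starting the timer also removes disk reads from the measurement, which the command-line version does not do.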

Intel wins its first bout of the analysis, albeit not by much. We will find a recurring pattern later on with integer-based calculations and the Nocona Xeon processor. The entire Prescott family of Intel CPUs received a dedicated integer multiplier rather than continually using the floating point multiplier. This becomes extremely useful in some of our other benchmarks.


Database Performance

We will run the standard SQL-bench suite included with RPM MySQL 4.0.20d.

MySQL 4.0.20d - Test-Select

MySQL 4.0.20d - Test-Insert

Of all our benchmarks, SQL-bench is the most baffling. The extremely threaded database application performs particularly poorly with HyperThreading enabled. The Athlon 64 outperforms Intel again in this benchmark, and a lot of it can almost certainly be attributed to the on-die memory controller again.
Update: We copied the 32-bit marks from our previous testing instead of the 64-bit marks. You can view the previous articles from a month ago here. The graphs have also been updated.



Synthetic Benchmarks

Our Nocona server was set up in a remote location with little access, so we had limited time and could not run as many real world benchmarks as we are typically accustomed to. Fortunately, there are multitudes of synthetic benchmarks that we can use to deduce information quickly and constructively.

Sieve of Atkin (primegen)

Primegen is an older, but still useful library for generating prime numbers in order using the Sieve of Atkin. We compiled the Bernstein implementation by simply running "make". We ran the program like so:

# time ./primes 1 100000000000 > /dev/null

primegen 0.97

We found the benchmark to be extremely reliable and we replicated our figures continually with less than 1% difference.
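For readers unfamiliar with the algorithm, the following is a short Python sketch of the simple textbook (modulo 12) form of the Sieve of Atkin. It is not Bernstein's code, which is far more heavily optimized, and it is only practical for small limits; the function name is illustrative.

```python
def sieve_of_atkin(limit):
    """Return all primes <= limit using the simple modulo-12 Sieve of Atkin."""
    if limit < 2:
        return []
    is_prime = [False] * (limit + 1)

    # Flip candidates once per solution of each quadratic form.
    x = 1
    while x * x <= limit:
        y = 1
        while y * y <= limit:
            n = 4 * x * x + y * y
            if n <= limit and n % 12 in (1, 5):
                is_prime[n] = not is_prime[n]
            n = 3 * x * x + y * y
            if n <= limit and n % 12 == 7:
                is_prime[n] = not is_prime[n]
            n = 3 * x * x - y * y
            if x > y and n <= limit and n % 12 == 11:
                is_prime[n] = not is_prime[n]
            y += 1
        x += 1

    # Remove numbers divisible by the square of a prime.
    r = 5
    while r * r <= limit:
        if is_prime[r]:
            for k in range(r * r, limit + 1, r * r):
                is_prime[k] = False
        r += 1

    primes = [p for p in (2, 3) if p <= limit]
    primes.extend(n for n in range(5, limit + 1) if is_prime[n])
    return primes


if __name__ == "__main__":
    print(sieve_of_atkin(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

A number survives only if it was flipped an odd number of times and is not divisible by a prime square, which is why the second pass is needed.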

Super Pi

We ran the Linux compilation of Super Pi 2.0, which is a closed source application. We are not aware of which optimizations are compiled with the program and we are prohibited from redistributing the binaries. Please download the latest binaries from ftp://pi.super-computing.org/Linux. We ran the command:

# ./super_pi 20

Below is the program's output of calculation time in number of seconds.

Super Pi 2.0

After re-running the program several times, our benchmarks never deviated outside of 1%. In a mathematical operation-only situation, the Intel processor has outpaced the AMD offering twice now.
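One way to put a number on run-to-run consistency is to take the largest deviation of any single run from the mean and express it as a percentage. The Python sketch below shows that calculation under this one interpretation of "within 1%"; the function name and the sample times are hypothetical and are not measurements from this review.

```python
from statistics import mean


def max_deviation_pct(times):
    """Largest deviation of any run from the mean, as a percentage of the mean."""
    avg = mean(times)
    return max(abs(t - avg) for t in times) / avg * 100.0


if __name__ == "__main__":
    # Hypothetical run times in seconds, for illustration only
    sample_runs = [100.0, 100.5, 99.5]
    print(f"{max_deviation_pct(sample_runs):.2f}%")
```

A result below 1.0 from repeated runs would correspond to the kind of repeatability described above.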




Synthetic Benchmarks (continued)

TSCP

TSCP is a simple chess program, which you may read more about here. We compiled the program using our own Makefile, which you can download here. Once compiled, we ran the "bench" command inside the program. Using the -m64 flag provided no change in performance.

TSCP 1.8.1

As you can see, there appears to be no advantage with HyperThreading for this application. This also appears to be the largest lead that the Intel processor takes over the AMD over the course of our analysis.
Update: We have retested this part of the benchmark with the -O2 flag in the correct place for both machines. The score has changed to reflect this.

ubench

Finally, we have ubench, which stands as the definitive Unix synthetic benchmark. Feel free to learn more about the program here. We compiled the program using ./configure and make with no optimizations. The benchmark was run on a loop ten times to assure that we were getting a true average.

Ubench 0.32 - CPU

Ubench 0.32 - MEM

Ubench 0.32 - AVG

Here, we see HyperThreading working against the Xeon processor in a distinct fashion. According to the Ubench website, both of these machines with single processors outperform dual Xeon 2.4GHz machines, even though they are only running on one processor. The program runs several math-intensive floating point and integer operations over the course of three minutes.
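Averaging a benchmark over a fixed number of runs, as we did with the ten-run loop, is straightforward to sketch in Python. The example below assumes a caller-supplied function that performs one run and returns its scores as a dictionary; `average_runs`, `run_once`, and the score names in the usage example are hypothetical, and no particular ubench output format is assumed.

```python
from statistics import mean


def average_runs(run_once, runs=10):
    """Call run_once() `runs` times and average each named score."""
    results = [run_once() for _ in range(runs)]
    # Use the keys of the first result; every run is expected to report the same scores
    return {name: mean(r[name] for r in results) for name in results[0]}


if __name__ == "__main__":
    # Placeholder standing in for one real benchmark run; values are made up
    def fake_run():
        return {"CPU": 100, "MEM": 200}

    print(average_runs(fake_run))
```

In practice, `run_once` would launch the benchmark and parse its scores; keeping that step separate means the averaging logic does not depend on how any one tool prints its results.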




John the Ripper

Out of all of our synthetic benchmarks, John the Ripper is perhaps the most robust; we can benchmark a wide range of encryption algorithms with many or no options very easily and quickly. For this benchmark, we downloaded John the Ripper 1.6. We had originally intended to build the program with the x86 Linux make configuration (linux-x86-any-elf). Unfortunately, John did not want to play nicely with that idea. We only ran the Intel CPU with HyperThreading for this portion of the benchmark.

linux:~/john-1.6/src # make linux-x86-any-elf
ln -sf x86-any.h arch.h
make ../run/john ../run/unshadow ../run/unafs ../run/unique \
JOHN_OBJS="DES_fmt.o DES_std.o BSDI_fmt.o MD5_fmt.o MD5_std.o BF_fmt.o BF_std.o AFS_fmt.o LM_fmt.o batch.o bench.o charset.o common.o compiler.o config.o cracker.o external.o formats.o getopt.o idle.o inc.o john.o list.o loader.o logger.o math.o memory.o misc.o options.o params.o path.o recovery.o rpp.o rules.o signals.o single.o status.o tty.o wordlist.o unshadow.o unafs.o unique.o x86.o" \
CFLAGS="-c -Wall -O2 -fomit-frame-pointer -m486"
make[1]: Entering directory '/root/john-1.6/src'
gcc -c -Wall -O2 -fomit-frame-pointer -m486 -funroll-loops DES_fmt.c
'-m486' is deprecated. Use '-march=i486' or '-mcpu=i486' instead.
cc1: error: CPU you selected does not support x86-64 instruction set
make[1]: *** [DES_fmt.o] Error 1
make[1]: Leaving directory '/root/john-1.6/src'
make: *** [linux-x86-any-elf] Error 2

Undeterred, we proceeded to build John with the generic configuration instead. John optimizes itself during the build, so you may view the builds of each configuration here (Intel) and here (AMD).

For those of you who downloaded the text files, you already know that the Intel CPU has pulled ahead, at least according to John. Below are some of the scores John posted while testing the utility.

John the Ripper 1.6 - Blowfish x32

John the Ripper 1.6 - FreeBSD MD5

John the Ripper 1.6 - DES x725 64/64 BS

As we saw in the intensive math benchmarks, the Athlon 64 has trouble keeping up with the Intel CPU.




Conclusions

Although the Athlon 64 3500+ and the Xeon 3.6GHz EM64T processors were not necessarily designed to compete against each other, we found that comparing the two CPUs was more appropriate than anticipated, particularly in light of Intel's newest move to bring EM64T to the Pentium 4 line. Once we obtain a sample of the Pentium 4 3.6F, we expect our benchmarks to produce very similar results to the 3.6GHz Xeon tested for this review.

Without a doubt, the 3.6GHz Xeon trounces the Athlon 64 3500+ in math-intensive synthetic benchmarks. Again, this is not really a comparison between the two chips yet, but perhaps something of a marker of things to come. However, the real world benchmarks, with the exception of John the Ripper, are where AMD came out ahead. Even though John uses several different optimizations to generate hashes, in every case, the Athlon chip found itself at least 40% behind. Much of this is likely attributable to the additional math tweaking in the Prescott family core, and the lack of optimizations at compile time.

That's not to say that the Xeon CPU necessarily deserves excessive praise just yet. At the time of publication, our Xeon processor retails for $850 and the Athlon 64 3500+ retails for about $500 less. The 3.6F processor that the Xeon represents does not even exist in retail channels yet. Also, keep in mind that the AMD processor is clocked 1400MHz slower than the 3.6GHz Xeon. With only a few exceptions, the 3.6GHz Xeon outperformed our Athlon 64 3500+ in the synthetic benchmarks; whether that justifies the cost and thermal differences between these two processors is another question.
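The clock and price gaps quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only the figures from this article; the Athlon price is derived from "about $500 less" and is therefore approximate, and the helper name `percent_lower` is illustrative.

```python
XEON_MHZ = 3600        # 3.6GHz Xeon, from the test configuration
ATHLON_MHZ = 2200      # Athlon 64 3500+ at 2.2GHz
XEON_PRICE_USD = 850
ATHLON_PRICE_USD = XEON_PRICE_USD - 500  # "about $500 less", so roughly $350


def percent_lower(value, reference):
    """How far `value` sits below `reference`, as a percentage of `reference`."""
    return (reference - value) / reference * 100.0


if __name__ == "__main__":
    print(XEON_MHZ - ATHLON_MHZ)                          # 1400
    print(round(percent_lower(ATHLON_MHZ, XEON_MHZ), 1))  # 38.9
    print(round(percent_lower(ATHLON_PRICE_USD, XEON_PRICE_USD), 1))  # 58.8
```

In other words, the Athlon 64 runs at a clock speed roughly 39% lower while costing roughly 59% less, which is worth keeping in mind when reading the synthetic results.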

We will benchmark some SMP 3.6GHz Xeons against a pair of Opterons in the near future, so check back regularly for new benchmarks!

Update: We have addressed the issue with the -O2 compile option in TSCP, the miscopy from previous benchmarks in the MySQL benchmark, and various other issues here and there in the testing of this processor. Expect a follow-up article as soon as possible with an Opteron.
