32-bit vs. 64-bit Performance

Our entire benchmark suite to this point has run 32-bit applications under a 32-bit OS, mostly because there are no good 64-bit desktop applications for a popular 64-bit OS yet (not to mention the issues with 64-bit Windows XP we described earlier).

Under Linux, however, we don't have to wait for applications to be released in a 64-bit version; we can simply recompile them. Linux thus provides us with an excellent venue to see the tangible performance increases from exposing the additional general purpose registers in 64-bit mode.
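When recompiling the same source for both modes, it helps to confirm which mode a given binary was actually built for. The following is a minimal sketch, not something from our test setup: the function names and the `SKETCH_MAIN` macro are hypothetical, and it assumes a Linux x86 toolchain where `-m32` and `-m64` select the target mode.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Pointer width in bits for this build: 32 under -m32, 64 under -m64. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Width of long in bits; on Linux this tracks the pointer width. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

#ifdef SKETCH_MAIN /* hypothetical macro: define it to build a standalone program */
int main(void)
{
    /* Sanity check: only the two x86 modes discussed here are expected. */
    assert(pointer_bits() == 32 || pointer_bits() == 64);
    printf("pointers: %d-bit, long: %d-bit\n", pointer_bits(), long_bits());
    return 0;
}
#endif
```

Building this twice, once with `-m32 -DSKETCH_MAIN` and once with `-m64 -DSKETCH_MAIN`, should report 32-bit and 64-bit pointers respectively, assuming both sets of libraries are installed.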

We ran all benchmarks on Red Hat Enterprise 2.9.5WS (Taroon), a beta release, booted in single user mode to keep system services from interfering with benchmark results. Neither Red Hat 9 nor 9.0.93 Beta (Severn) supplies a 64-bit compiler or libraries, which is why we used Taroon.

The Taroon kernel initially had issues on startup, requiring us to disable APIC and ACPI support to get it to install. Once actually running, the OS was quite stable; however, DMA disk access was disabled for some reason.

We used the following compiler that came with Taroon:

gcc 3.2.3 20030502 (Red Hat Linux 3.2.3-16)

And the following kernel:

2.4.21-1.1931.2.393.ent

With this compiler and kernel we ran the following tests:

Whetstone

A simple C loop measuring floating point performance, configured to do double precision calculations.

Compiled with:
-O3 -msse2 -mfpmath=sse (and -m32 for 32-bit, -m64 for 64-bit)

The performance improvements due to 64-bit are in the 10 - 20% range we mentioned earlier.
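For readers unfamiliar with this style of test, the sketch below shows what a timed double-precision loop can look like. It is not the Whetstone source, which mixes many more operation types; the kernel (a partial sum of 1/i^2), the function names, and the `SKETCH_MAIN` macro are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical kernel: sum 1/i^2 for i = 1..iterations in double precision.
   Each pass does a multiply, a divide, and an add. */
double fp_kernel(long iterations)
{
    double sum = 0.0;
    long i;

    assert(iterations >= 0);
    for (i = 1; i <= iterations; i++) {
        double d = (double)i;
        sum += 1.0 / (d * d);
    }
    return sum;
}

/* Run the kernel once and return the CPU time it took, in seconds.
   The kernel's result is stored so the compiler cannot discard the loop. */
double time_fp_kernel(long iterations, double *result)
{
    clock_t start = clock();
    *result = fp_kernel(iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

#ifdef SKETCH_MAIN /* hypothetical macro: define it to build a standalone program */
int main(void)
{
    double result;
    double seconds = time_fp_kernel(100000000L, &result);
    printf("sum = %.10f, cpu time = %.3f s\n", result, seconds);
    return 0;
}
#endif
```

Compiling the same file with the flags listed above, once with -m32 and once with -m64, and comparing the reported times is the basic shape of the comparison; a real benchmark would repeat the run several times and use a far more varied workload.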

Bytemark

An old integer CPU benchmark (FP results were discarded) - for more information on the tests visit this site.

Compiled with:
-O3 -msse2 -mfpmath=sse (and -m32 for 32-bit, -m64 for 64-bit)

Here we do see a small 2% drop in performance when moving to 64-bit in one test; the rest of the tests, however, show a 0 - 15% improvement.

Lame 3.93

An MP3 encoder; we encoded a 40-minute .wav file (403MB).
Lame args: -b 192 -m s -h --quiet <file> - >/dev/null
(192kbps, simple stereo, high quality, output to nothing to avoid disk hits)
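As a sanity check on the workload size, the arithmetic can be sketched as below. It assumes the .wav file is CD-quality PCM (44.1kHz, 16-bit, stereo), which is not stated above, and it ignores the small WAV header; the function names and the `SKETCH_MAIN` macro are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Size in bytes of uncompressed PCM audio. */
long pcm_bytes(long sample_rate_hz, int channels, int bytes_per_sample, long seconds)
{
    assert(sample_rate_hz > 0 && channels > 0 && bytes_per_sample > 0 && seconds >= 0);
    return sample_rate_hz * channels * bytes_per_sample * seconds;
}

/* Size in bytes of a constant-bitrate stream (bitrate given in bits per second). */
long cbr_bytes(long bitrate_bps, long seconds)
{
    assert(bitrate_bps > 0 && seconds >= 0);
    return bitrate_bps * seconds / 8;
}

#ifdef SKETCH_MAIN /* hypothetical macro: define it to build a standalone program */
int main(void)
{
    long seconds = 40L * 60L;                    /* 40 minutes */
    long wav = pcm_bytes(44100L, 2, 2, seconds); /* assumed CD-quality input */
    long mp3 = cbr_bytes(192000L, seconds);      /* -b 192 */

    printf("input:  %ld bytes (%ld MB in units of 1024*1024)\n",
           wav, wav / (1024L * 1024L));
    printf("output: %ld bytes at 192kbps\n", mp3);
    return 0;
}
#endif
```

Under that assumption, 40 minutes of audio comes to 423,360,000 bytes, or about 403MB when a megabyte is counted as 1024 x 1024 bytes, which matches the file size quoted above. The 192kbps output would be 57,600,000 bytes, roughly a 7.35:1 reduction.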

Compiled with:
-O3 -fomit-frame-pointer -fno-strength-reduce -malign-functions=4 -funroll-loops -ffast-math -msse2 -mfpmath=sse (again, -m32 for 32-bit, -m64 for 64-bit)

The performance improvement here is astounding: in 64-bit mode the Athlon 64 FX managed to finish the encode 34% quicker than in 32-bit mode. If these results are any hint of what could be in store for Windows users, there's a lot of promise behind the Athlon 64, assuming we get software support in time.
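A figure like "34% quicker" can be read two ways, as time saved or as extra throughput, and the two give different numbers for the same pair of runs. The sketch below shows both calculations; the timings in it are made-up values chosen only to illustrate the arithmetic, not our measured encode times, and the function names and `SKETCH_MAIN` macro are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage of run time saved going from t_old to t_new (same job). */
double percent_time_saved(double t_old, double t_new)
{
    assert(t_old > 0.0 && t_new > 0.0);
    return 100.0 * (t_old - t_new) / t_old;
}

/* Percentage gain in throughput (work per second) for the same job. */
double percent_throughput_gain(double t_old, double t_new)
{
    assert(t_old > 0.0 && t_new > 0.0);
    return 100.0 * (t_old / t_new - 1.0);
}

#ifdef SKETCH_MAIN /* hypothetical macro: define it to build a standalone program */
int main(void)
{
    /* Made-up timings in seconds, for illustration only. */
    double t32 = 100.0, t64 = 66.0;

    printf("time saved:      %.1f%%\n", percent_time_saved(t32, t64));
    printf("throughput gain: %.1f%%\n", percent_throughput_gain(t32, t64));
    return 0;
}
#endif
```

With these example timings, a run that takes 34% less time corresponds to roughly 52% more throughput; conversely, 34% more throughput corresponds to only about 25% less time.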

We wanted to do a transcode benchmark but that didn't work out: one library tripped over a bug in gcc and transcode refused to compile. It actually forced a compile error because a structure came out padded, meaning its authors didn't expect anyone to run it on a 64-bit machine just yet.
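We don't know exactly how transcode implements its check, but a deliberate compile error on unexpected structure padding is commonly done with a compile-time size assertion. The sketch below shows the general technique using the classic negative-array-size trick; the `BUILD_ASSERT` macro, both structures, and the `SKETCH_MAIN` macro are hypothetical and not taken from the transcode source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Compile-time assertion for older compilers: an array type with a
   negative size is rejected, so a false condition stops the build. */
#define BUILD_ASSERT(cond, name) \
    typedef char build_assert_##name[(cond) ? 1 : -1]

/* Hypothetical header made only of fixed-size fields: 8 bytes in both modes. */
struct packet_header {
    unsigned char  tag;    /* 1 byte  */
    unsigned char  flags;  /* 1 byte  */
    unsigned short length; /* 2 bytes */
    unsigned int   offset; /* 4 bytes */
};
BUILD_ASSERT(sizeof(struct packet_header) == 8, packet_header_is_8_bytes);

/* Hypothetical structure whose layout depends on the mode: under -m64 the
   pointer grows to 8 bytes and 4 bytes of padding appear after `id`. */
struct node {
    int   id;
    void *next;
};

/* Uncommenting the next line builds under -m32 but fails under -m64: */
/* BUILD_ASSERT(sizeof(struct node) == 8, node_is_8_bytes); */

size_t node_size(void)
{
    return sizeof(struct node);
}

#ifdef SKETCH_MAIN /* hypothetical macro: define it to build a standalone program */
int main(void)
{
    printf("packet_header: %lu bytes, node: %lu bytes\n",
           (unsigned long)sizeof(struct packet_header),
           (unsigned long)node_size());
    return 0;
}
#endif
```

Code written this way refuses to build rather than silently reading or writing a structure with the wrong layout, which is consistent with the failure we saw.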


122 Comments


  • Anonymous User - Thursday, September 25, 2003 - link

    The Athlon64 FX doesn't have a multiplier lock either, but we never saw any results from that. Also I don't think a chip overclocking well means it's designed for "higher clock speeds".
  • Anonymous User - Thursday, September 25, 2003 - link

    toms just revised their review, "Update Sept 24,2003: Unfortunately we have made a mistake in the original article: In addition to the official P4 EE 3.2GHz we had included benchmark scores of the P4 Extreme 3.4GHz and 3.6GHz. These values were planned for a future THG article and were not intended to be included here. We would like to apologize especially to those readers who misinterpreted our charts. The two bars of the P4 Extreme 3.4GHz and 3.6GHz have now been removed. However, this issue does not affect our conclusion as we have only compared the official P4 3.2GHz EE to all other test candidates in our original article. For your information: The press sample of the P4 Extreme provided by Intel does not have a multiplier lock and is already designed for higher clock speeds. "
  • Anonymous User - Thursday, September 25, 2003 - link

    #81
    I also question why toms have a review to overclock P4 3.2 EE to 3.6 to win every performance chart. Is it fair to AMD? I like Intel CPU but I also like fair review.
  • Anonymous User - Thursday, September 25, 2003 - link

    AMD needs to almost give this thing away so that it can sell well, thus attracting a flood of 64-bit developers. I think they should even do this to the detriment of their profit margins, because if this doesn't sell well then all the software won't be developed. It's kinda like the chicken and the egg here, and I think AMD should take a beating now in terms of $ to get this thing out and get 64-bit in the hands of the people. If everyone has it, the software will follow.
  • Anonymous User - Thursday, September 25, 2003 - link

    Logic dictates that people who use the term "fanboy" are mentally disturbed persons who feel the need to categorize others into a certain group to make themselves feel better. On a side note though, I think the Athlon64 3200+ is the winner given its current availability, price, and performance. I’m just curious as to how far AMD hopes to scale the processor for the remainder of the year; though I already know there will be a 3400+ release in short time, I am wondering if there will be a 3600+ release in anticipation of Prescott. I’m also curious as to how quickly AMD will transition it to 90nm, as I’m thinking one of the main reasons AMD hasn’t really made a full effort in mass producing K8 processors is the manufacturing costs at 130nm. Either way it’s nice to see such a chip out, especially at the price it is being quoted for (though it seems some people are having fits that they can’t buy A64s for $100).
  • Anonymous User - Thursday, September 25, 2003 - link

    I think Intel is faring pretty well considering that AMD has reduced latency fourfold with its integrated memory controller, increased transistor performance by 30% with SOI, and doubled cache to 1MB. I think Intel will only close the gap with the upcoming Prescott but will pull ahead with LGA 775 Prescott and Grantsdale with PCI Express. Fanboys, save your speeches. Argue with logic.
  • Anonymous User - Wednesday, September 24, 2003 - link

    When is somebody going to come up with "folding" for people? We could use all the extra time people have on their hands debating what chip is better to access their brain power and come up with cures for world hunger and A.I.D.S., and to introduce fanboys to fangirls. That being said, I appreciate all your opinions in helping me decide what chip to buy. Taking into account the processing power I need for work and play, I have decided to buy an Xbox and a typewriter and forgo the 64 or P4EE.
  • Anonymous User - Wednesday, September 24, 2003 - link

    THIS FANBOY CRAP HAS TO STOP HOW NERDY CAN U BE??i am glad i am not so much into computers as most of u ;)...watch if one of these companies go out of business u see the survivor amd or intel making poor performing cpu's sold for $$$$ with a "take it of leave it" attitude...QUIT THE FANBOY CRAP truth is these companies don't give a shite about you only that little friend in your pocket that holds ur money
  • sprockkets - Wednesday, September 24, 2003 - link

    The PM people believe that since they see the current situation in that Intel pays everyone not to use AMD, and that makes them a niche market. It's not due to AMD being slower or more error prone. Let's face it, Intel is bigger and has more to deal with, but as I've said before, they also can waste millions, perhaps a billion or so on Itanium and it's going nowhere. Perhaps it will now, but it's pretty stupid to see why. Sure it doesn't suffer from x86 legacy code. But look at what it took to get there, redoing software, apps, hardware, and a huge 400mm die. The Alpha people look to turn it into something, but that's alpha that made it something, otherwise it sucks.

    It's pretty stupid to argue here that the P4 3.2 GHz is faster or the emergency (good one :) ) edition is. The Xeon or even Itanium architecture, with the CPUs sharing a FSB and memory via a hub or northbridge, sucks compared to the HyperTransport architecture the Opteron uses, and no amount of clock speed or memory speed is going to change that.

    I wonder if Intel can now use its own Itaniums instead of Alphas to run its chip production line.
  • Anonymous User - Wednesday, September 24, 2003 - link

    #91, That would be an expected outcome when half the tests are media/encoding benchmarks which are optimized for HT/SSE2. Not that there is anything wrong with that, just a simple note.
