Benchmarks MySQL: 64 bit versus 32 bit

With sixteen registers and more than 4 GB of physical and virtual memory, 64 bit software should have nothing but advantages. To see how much advantage the new 64 bit binaries offer, we tested both the Xeon Irwindale and the Opteron on 32 bit MySQL and 64 bit MySQL, both version 4.0.18.

The Opteron was tested on the MSI board for these tests, contrary to previous tests where the Iwill board was used. The Intel CPU was running on the Intel board as in all previous tests.

| Concurrency | Dual Xeon (Irwindale) 3.6GHz with HT, 64 bit | Dual Xeon (Irwindale) 3.6GHz with HT, 32 bit | Dual Opteron 248, 64 bit | Dual Opteron 248, 32 bit | Dual Xeon (Irwindale) 3.6GHz, 64 bit versus 32 bit | Dual Opteron 248, 64 bit vs 32 bit |
|---|---|---|---|---|---|---|
| 1 | 286 | 245 | 324 | 261 | 16% | 24% |
| 2 | 450 | 379 | 532 | 421 | 19% | 26% |
| 5 | 497 | 534 | 642 | 485 | -7% | 32% |
| 10 | 517 | 563 | 691 | 509 | -8% | 36% |
| 20 | 545 | 631 | 692 | 527 | -14% | 31% |
| 35 | 506 | 616 | 670 | 514 | -18% | 30% |
| 50 | 495 | 559 | 666 | 516 | -11% | 29% |
| AVG | 512 | 580 | 672 | 510 | -12% | 32% |
| MAX | 545 | 631 | 692 | 527 | -14% | 31% |

This is really remarkable, as the Xeon does not benefit from 64 bit at all. Worse, it pays an average performance penalty of about 12% for moving over to 64 bit. The Opteron, however, thrives on 64 bit and gets a boost of about 30% from it.
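The percentages in the last two columns can be reproduced from the raw scores. The snippet below is a minimal sketch, not the authors' tooling: the helper name `pct_change` is made up for illustration, and the numbers are copied from the AVG row of the table above.

```python
def pct_change(new, old):
    """Relative change of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Scores copied from the AVG row of the 64 bit versus 32 bit table.
# The dictionary layout is an illustrative assumption.
avg_scores = {
    "Dual Xeon (Irwindale) 3.6GHz with HT": {"64bit": 512, "32bit": 580},
    "Dual Opteron 248": {"64bit": 672, "32bit": 510},
}

for cpu, score in avg_scores.items():
    change = pct_change(score["64bit"], score["32bit"])
    print(f"{cpu}: {change:+.1f}% going from 32 bit to 64 bit")
```

Rounded to whole percentages, this gives the -12% and 32% shown in the AVG row. Some individual rows in the table differ by a point from what this formula yields on the rounded scores, presumably because the published percentages were computed from unrounded measurements.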

Now, it is possible that the 64 bit binary is simply very well optimized for the Opteron. The 64 bit compiler used by the MySQL engineers (gcc, obviously not the Intel compiler) might not have the necessary optimizations to get the best out of the Xeon architecture. That is probably the most important reason why the difference (+30% versus -12%) is so big.

However, when you take a look at the DB2 numbers, you will notice that the Xeon runs about 2% to 3% slower, while the Opteron gains a 12% boost from 64 bit. IBM's 32 bit binaries make the Xeon run as fast as the best Opterons. Once we turn to 64 bit binaries, the Opteron gets the upper hand again. So, there is more to it: for some reason, the Xeon is not too happy with 64 bit binaries. We can only speculate, but maybe (some) 64 bit calculations have to cycle twice through the ALUs of the Prescott/Nocona/Irwindale architecture.

The consequence is that a Xeon running a 32 bit application is quite a bit faster than the competition, but once you switch to 64 bit, the Xeon does not stand a chance against the Opteron.

Benchmarks MySQL: Single core versus Dual core

Some of you might already get nervous: where is the dual core Opteron? SUSE SLES 9 Linux was a little more stubborn. With the original SLES 9 kernel 2.6.5-97, the dual Opteron would just crash. We applied Service Pack 1 (2.6.5-157smp) and the new Opteron would boot and recognize the two cores, but the second CPU was disabled because of APIC IRQ problems.

Therefore, we were only able to run the dual core Opteron on Gentoo with a 2.6.12 kernel. The Iwill board still had trouble running two cores, so we ran the tests on the MSI board. To give you an idea of how Gentoo and the new kernel compare to SUSE SLES 9 SP1, and how the Iwill K8ES compares to MSI's K8Master2-FAR, we ran a few tests with SUSE on the MSI board too.

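| Concurrency | Dual Core Opteron 875 - MSI - Gentoo | Dual Opteron 248 - MSI - Gentoo | Dual Opteron 248 - MSI - SUSE | Dual Opteron 248 - Iwill - SUSE | Dual core vs Dual CPU | Gentoo vs SUSE | Iwill vs MSI |
|---|---|---|---|---|---|---|---|
| 1 | 288 | 270 | 324 | 264 | 7% | -17% | -19% |
| 2 | 463 | 443 | 532 | 461 | 4% | -17% | -13% |
| 5 | 583 | 558 | 642 | 591 | 5% | -13% | -8% |
| 10 | 616 | 601 | 691 | 670 | 2% | -13% | -3% |
| 20 | 648 | 610 | 692 | 683 | 6% | -12% | -1% |
| 35 | 664 | 611 | 670 | 659 | 9% | -9% | -2% |
| 50 | 628 | 579 | 666 | 662 | 8% | -13% | -1% |
| AVG | 628 | 592 | 672 | 653 | 6% | -12% | -3% |
| MAX | 664 | 611 | 692 | 683 | 9% | -12% | -1% |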

SUSE SLES 9 SP1 is quite a bit faster than a standard tuned Gentoo installation: Gentoo trails by 12% on average. Some of the changes in kernel 2.6.12 might have traded performance for more stability.

The second CPU on the MSI board does not have its own local memory, and has to access the RAM via the HyperTransport connection to the crossbar switch of the first CPU. Just like one dual core Opteron, the two CPUs have to share the bandwidth of one dual channel memory bus. Therefore, the comparison of one dual core Opteron and two single core Opterons at the same clock speed is very interesting: it gives us some insight into how much performance is gained by letting the two cores talk over the System Request Queue instead of over the HyperTransport connection. How much does this design boost performance? Quite a bit, according to our benchmarks. This relatively simple design decision offers a 6% performance increase.

The Iwill board is a tiny bit slower than the MSI board, and that might raise some eyebrows. However, VTune tells us that the Xeon Nocona (1 MB L2) needs to access RAM only 2% of the time. Assuming that the Opteron with its 1 MB cache needs about the same, it is clear that memory bandwidth is not going to determine the results by much. Slightly more aggressive timings (and thus lower latency) or clock speeds might give MSI the edge. These tiny performance differences are not important, however.

Benchmarks MySQL: Hyperthreading?

What can hyperthreading do for MySQL performance?

| Concurrency | Dual Xeon (Irwindale) 3.6GHz with HT | Dual Xeon (Irwindale) 3.6GHz no HT | HT on vs HT off |
|---|---|---|---|
| 1 | 286 | 287 | 0% |
| 2 | 450 | 457 | -2% |
| 5 | 497 | 559 | -11% |
| 10 | 517 | 583 | -11% |
| 20 | 545 | 561 | -3% |
| 35 | 506 | 573 | -12% |
| 50 | 495 | 570 | -13% |
| AVG | 512 | 569 | -10% |
| MAX | 545 | 583 | -7% |

Amazingly, Hyperthreading decreases performance by quite a bit: about 10% on average. This leads to a rather weird conclusion: if you want maximum MySQL (read) performance from your Xeon server, you have to disable Hyperthreading and run in 32 bit mode. The former is of course not dramatic. The latter might, in some cases, be a serious limitation.

Benchmarks MySQL InnoDB: Intel versus AMD

What if we swap the MyISAM engine for the ACID compliant, row level locking InnoDB engine under the hood of MySQL? Surely that should improve scaling, as the MyISAM table locking mechanism is simple, but could be one of the reasons why it scales less well in multi-CPU configurations. Let us take a look.
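Before the numbers, the difference in lock granularity can be pictured with a toy model. The sketch below is only an illustration and says nothing about how MySQL implements either engine: the class names `TableLockedStore` and `RowLockedStore` and the helper `hammer` are hypothetical. With a single table lock, writers to different rows still have to wait for each other; with one lock per row, they only wait when they touch the same row.

```python
import threading


class TableLockedStore:
    """Toy store guarded by ONE lock for the whole table (coarse granularity)."""

    def __init__(self, n_rows):
        self.rows = [0] * n_rows
        self.table_lock = threading.Lock()

    def increment(self, row_id):
        # Every writer queues on the same lock, whichever row it touches.
        with self.table_lock:
            self.rows[row_id] += 1


class RowLockedStore:
    """Toy store with one lock PER ROW (fine granularity)."""

    def __init__(self, n_rows):
        self.rows = [0] * n_rows
        self.row_locks = [threading.Lock() for _ in range(n_rows)]

    def increment(self, row_id):
        # Writers only contend when they update the same row.
        with self.row_locks[row_id]:
            self.rows[row_id] += 1


def hammer(store, n_threads=4, updates_per_thread=1000):
    """Run n_threads writers, each updating its own row, and return the rows."""

    def worker(row_id):
        for _ in range(updates_per_thread):
            store.increment(row_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.rows


if __name__ == "__main__":
    print(hammer(TableLockedStore(4)))
    print(hammer(RowLockedStore(4)))
```

Both stores end up with the same, correct contents; what differs is how often a writer has to wait for a lock held by an unrelated update. Whether that difference shows up in a benchmark depends on the mix of reads and writes, which is why the measurements below matter more than the model.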

| Concurrency | Dual Xeon (Irwindale) 3.6GHz with HT, with InnoDB | Single Xeon (Irwindale) 3.6GHz with HT, with InnoDB | Dual Xeon (Irwindale) 3.6GHz without HT, with InnoDB | Dual Opteron 248, Dual Channel, with InnoDB | Single Opteron 248, Dual Channel, with InnoDB |
|---|---|---|---|---|---|
| 1 | 207 | 191 | 210 | 216 | 192 |
| 2 | 283 | 201 | 303 | 312 | 223 |
| 5 | 324 | 219 | 334 | 396 | 259 |
| 10 | 319 | 204 | 360 | 397 | 242 |
| 20 | 301 | 199 | 330 | 357 | 236 |
| 35 | 281 | 193 | 308 | 353 | 221 |
| 50 | 274 | 181 | 298 | 333 | 209 |
| AVG | 300 | 199 | 326 | 366 | 233 |
| MAX | 324 | 219 | 360 | 397 | 259 |

The InnoDB engine is at about 60% of the speed of the MyISAM engine. Let us analyze these numbers in detail.

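| Concurrency | Dual versus Single Xeon | Dual versus Single Opteron | Dual Opteron vs Dual Xeon | HT on vs off |
|---|---|---|---|---|
| 1 | 8% | 13% | 3% | -2% |
| 2 | 41% | 40% | 3% | -6% |
| 5 | 48% | 53% | 19% | -3% |
| 10 | 57% | 64% | 10% | -11% |
| 20 | 51% | 51% | 8% | -9% |
| 35 | 45% | 60% | 15% | -9% |
| 50 | 51% | 59% | 12% | -8% |
| AVG | 51% | 57% | 13% | -8% |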

Yes, we only used the 2.2 GHz Opteron 248, due to time constraints. We tested with this CPU because we also tried to get some numbers on the dual core Opteron 275 (also 2.2 GHz), but as you know, we could not get that CPU running with both cores in SUSE SLES 9 SP1. It is pretty clear that a 2.6 GHz Opteron 252 would bring in another 16% - 18%. So, even with a different engine, the Opteron keeps outperforming the Xeon by a significant margin. This margin can again be lowered by disabling Hyperthreading.
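The 16% - 18% estimate for the Opteron 252 is consistent with simple clock scaling. The sketch below is an illustration under stated assumptions, not a measurement: the function name `clock_scaling_gain` and the `efficiency` parameter (the fraction of the extra clock speed that turns into extra throughput) are introduced here for the example.

```python
def clock_scaling_gain(base_ghz, target_ghz, efficiency=1.0):
    """Percent throughput gain from a clock speed increase.

    efficiency=1.0 assumes throughput scales perfectly with clock speed,
    which is an upper bound; lower values model memory and I/O stalls.
    """
    return (target_ghz / base_ghz - 1.0) * efficiency * 100.0


# Opteron 248 (2.2 GHz) versus Opteron 252 (2.6 GHz).
upper_bound = clock_scaling_gain(2.2, 2.6)
conservative = clock_scaling_gain(2.2, 2.6, efficiency=0.9)  # 0.9 is an assumed value

print(f"Perfect scaling: {upper_bound:.1f}%")
print(f"90% efficient:   {conservative:.1f}%")
```

A 2.6 GHz part has roughly 18% more clock than a 2.2 GHz one, so 18% is the ceiling and anything slightly below it reflects less than perfect scaling.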

The Opteron scales a little better than the Xeon in this test. All in all, InnoDB scales better than the MyISAM engine, but not spectacularly so: a second CPU offers a 50% - 57% boost instead of a 40% - 41% one.
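Another way to read these scaling figures is as a speedup and a per-CPU efficiency. The snippet below is a rough sketch using the AVG row of the InnoDB table; the helper name `scaling` is an assumption made for this example.

```python
def scaling(single_score, dual_score, n_cpus=2):
    """Return (speedup, efficiency) of an n_cpus system over a single CPU.

    efficiency is the speedup divided by the CPU count: 1.0 would mean
    the second CPU doubles throughput.
    """
    speedup = dual_score / single_score
    return speedup, speedup / n_cpus


# AVG scores from the InnoDB table: (single CPU, dual CPU).
systems = {
    "Xeon (Irwindale) 3.6GHz with HT": (199, 300),
    "Opteron 248": (233, 366),
}

for name, (single, dual) in systems.items():
    speedup, efficiency = scaling(single, dual)
    print(f"{name}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

A speedup of about 1.5x on two CPUs means each CPU delivers roughly three quarters of what it does on its own, so a sizeable share of the second CPU is still lost to contention or other bottlenecks.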

What happens if we use the Dual core Opteron 275? To make this work, we had to resort to the Gentoo distribution again, with the 2.6.12 kernel. All CPUs are running at 2.2 GHz.

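| Concurrency | Dual Dual Core 875 | Single Dual Core 875 | Dual Opteron 248 | Dual Dual core vs One Dual core | Dual core vs Dual single |
|---|---|---|---|---|---|
| 1 | 199 | 206 | 200 | -3% | 3% |
| 2 | 308 | 305 | 293 | 1% | 4% |
| 5 | 397 | 368 | 338 | 8% | 9% |
| 10 | 401 | 379 | 345 | 6% | 10% |
| 20 | 400 | 359 | 308 | 11% | 17% |
| 35 | 388 | 342 | 305 | 14% | 12% |
| 50 | 361 | 322 | 290 | 12% | 11% |
| AVG | 389 | 354 | 317 | 10% | 12% |
| MAX | 401 | 379 | 345 | | |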

InnoDB does not scale better with 4 cores than MyISAM. On the contrary, both engines show very small performance benefits from more than 2 cores. Interestingly, once again, the dual core CPU is quite a bit faster than our dual CPU (single core) machine. A 10% bonus is nothing to sneeze at, especially when you consider that server boards with only one socket are quite a bit cheaper. It seems that one dual core Opteron is an ideal solution for a rather powerful MySQL database server.

Next, we test with an enterprise database solution: DB2 8.2.


  • JohanAnandtech - Saturday, June 18, 2005 - link

    Mino, thanks for pointing that out. Enabling the query cache has nothing to do with "stressful". It has to do with accelerating a few queries that are run over and over again. That is very interesting for reducing the response time of a website serving up the latest article, but it is not limited by CPU power at all.

  • JohanAnandtech - Saturday, June 18, 2005 - link

    To the people who make a fuss about disabling the query cache: this has nothing to do with the Opteron not performing well in that situation. Single Xeon: 980 queries/s. Dual Xeon: 985 queries/s. Opteron 250: 1020 queries/s. Get it now why I say "other bottlenecks started to kick in"?

    It is impossible that a dual Xeon can't outperform a single one in these tests. We tried to find the bottleneck and even used a quad Opteron 850 as client. The client was not the problem. My bet is on the network latency, but I have no knowledge of tools to profile the complete machine. The disk was not the problem, we tested that. Network bandwidth neither. My bet is on the network latency, or even the OS, as the bottleneck kicked in a lot sooner with kernel 2.4.
  • mino - Friday, June 17, 2005 - link

    #32 try to think for a moment
    "Because the Opteron can't perform that well in stressful situations you won't post the scores?"

    If the CPU is not the bottleneck in the query cache scenario then why test the effect of CPU at all !!!

    You remind me of a friend of mine who "tested" the effect the "FSB" has on an A64 system NOT having an FSB at all !!! ;-)
    Funny guy indeed.

    And about the Intel compiler not being used.
    Like it or not, it IS a fact that it is not widely adopted, especially among the target audience of this site and article.

    BTW given the past experience intel compiler would produce better code even on AMD systems so don't be so sure! Best code for K7 is made by intelcc set to PIII config. Albeit it does not use 3DNow! functionality at all.
  • ElMoIsEviL - Friday, June 17, 2005 - link

    I think I have to agree with #20, as much as I am un-biased I feel this test was doctored by AMD... it resembles the tests we see released by Apple often...

    "We didn't use the Intel compiler version as we have reason to believe that this version is not used a lot in the real world. We might try it out in a future article."

    Translation, "with the intel compiler AMD lost so being a marketing force for AMD we opted not to post those scores".

    and also as was mentioned before...
    "The "query cache" was off, as we wanted to test worst case performance. In some cases, the query cache was able to push a single Xeon to 1000 queries per second, and the CPU was still capable of doing more, as the CPU load was at 50% - 70%."

    Why not?
    Because the Opteron can't perform that well in stressful situations you won't post the scores?

    Seriously.. this test is the biggest load of BS I have ever read... and I'm a current AMD adopter.
  • JohanAnandtech - Friday, June 17, 2005 - link

    Viditor, it is possible that the IOMMU might have to do something with it.

    The IOMMU is a memory mapping unit sitting between the I/O bus and physical memory.

    Memory mapping is AFAIK only necessary if a certain device (PCI devices come to mind) can not do a 64 bit DMA. Now it seems that almost everything inside the newest Intel southbridges can do 64 bit DMA.

    So the IOMMU can only play a role when the driver is a 32 bit only, and the memory mapping has to happen. Now I would think that Intel would have an advantage here with their ultra modern southbridges. There might be a device that I am overlooking of course. Maybe our SCSI controller... But I don't think so.
  • Viditor - Friday, June 17, 2005 - link

    Johan, if you're still reading (great article BTW)...
    A question I have had for quite awhile now is what effect the IOMMU has on these tests.
    The reasons I'm asking are
    1. I noticed that there was quite a disparity between the AMD and Intel 64bit performance (which you mentioned).
    2. I know that one difference between the 2 platforms is that AMD has a hardware IOMMU (of sorts) and Intel (at present) does not.
    3. I saw a thread last year with Linus T mentioning this quite a bit. He seemed to think that this would impair the EM64T substantially...

    Your thoughts?
  • JohanAnandtech - Friday, June 17, 2005 - link

    If your database is running many "identical databases".... I meant "queries"

  • JohanAnandtech - Friday, June 17, 2005 - link

    Juhl: It was 2.6.12rc5.

    Viditor: thanks for the helpful comment. Indeed, if you turn on the query cache, your CPU is doing very little.
    Everybody else: note the "identical" word in Viditor's quote. If your database is running many identical databases, then you are not going to spend time reading this kind of article: you simply buy the cheapest decent server. Any CPU today can run 1000s of queries if everything comes out of the query cache.

    Running benchmarks with the query cache on is simply not interesting. The query cache is all about accelerating the IDENTICAL queries that are run from time to time. You might reserve a bit of RAM to make sure that the most common queries (getting the latest article of a website for example) are run faster.

    But those numbers don't tell you anything about the load that your server is going to be able to take. You want worst case performance numbers!
  • Viditor - Friday, June 17, 2005 - link

    Questar - the reason the query cache was turned off (guessing here) is to more reasonably simulate a real-world test. Obviously in this test, the same queries are repeated quite often. But that is not usually the case in the real world...
    For those who don't know what the heck a "query cache" is:

    "the query cache stores the text of a SELECT query together with the corresponding result that was sent to the client. If the identical query is received later, the server retrieves the results from the query cache rather than parsing and executing the query again"
  • Questar - Friday, June 17, 2005 - link


    We don't know, it specifically says Xeon. We don't have any idea what happens on an Opteron.
