Benchmarks MySQL 4.0.18: Intel versus AMD

A Linux database server report would not be complete without the open source database MySQL. Many of our readers requested that we test with both MyISAM (default storage engine in MySQL 3.x) and InnoDB (default storage engine in MySQL 4.x), so we performed many more tests than last time.

It must be said that the MySQL results had a large margin of error (3% - 4%) compared to DB2, especially at high levels of concurrency.

Here is our MySQL configuration:

           Read_buffer=2GB
           Port=3306
           socket = /var/lib/mysql/mysql.sock
           skip-locking
           set-variable = max_user_connections= 2000
           set-variable = max_connections= 2000
           key_buffer=2G
           Read_buffer=2G
           table_cache=1024
           tmp_table=128M
           max_heap_table=256M
           read_rnd_buffer = 64M
           thread_cache=16
           net_buffer_length=16k

The " query cache" was off, as we wanted to test worst case performance. In some cases, the query cache was able to push a single Xeon to 1000 queries per second, and the CPU was still capable of doing more, as the CPU load was at 50% - 70%. At 1000 queries/s and more, other bottlenecks started to kick in, such as the latency of the network driver, the operating system and so on.

All numbers are expressed in queries per second. All concurrency tests below 5 are not reliable enough to make any firm conclusion as the margin of error is much higher.

Concurrency Dual Xeon (Gallatin)
with L3 cache
Single Xeon (Gallatin)
with L3 cache
Dual Xeon (Nocona)
with HT
Single Xeon (Nocona)
with HT
Dual Xeon (Irwindale)
3.6GHz with HT
Dual Core Intel
3.2GHz
Dual Opteron 250
2.4Ghz
Single Opteron 250
2.4GHz
Single Opteron 252
2.6 GHz
1 243 248 280 277 286 233 290 298 319
2 357 317 423 338 450 344 438 370 399
5 466 356 473 358 497 442 543 435 470
10 505 361 521 375 517 487 629 465 502
20 496 350 531 371 545 507 670 455 498
35 508 355 555 371 506 490 665 470 507
50 497 348 526 368 495 502 669 472 508
                   
AVG 494 354 521 368 512 486 635 460 497
MAX 508 361 555 375 545 507 670 472 508

Those were the raw numbers. Let us now analyse this...

Concurrency Dual versus Single Xeon Galatin Dual versus Single Xeon Nocona/ Irwindale Dual Opteron 250 vs Single
1 -2% 1% -3%
2 12% 25% 18%
5 31% 32% 25%
10 40% 39% 35%
20 42% 43% 47%
35 43% 50% 41%
50 43% 43% 42%
       
AVG 40% 41% 38%

MySQL ISAM is an incredibly fast database engine in our benchmark situation: it handles the same workload about twice as fast as DB2. I have to emphasize "our benchmark situation" because we cannot forget that our workload is mainly about reading the database and not writing. And of course, it must be said that the MySQL ISAM engine does less work on each query than DB2; it does not support transaction-safe (ACID compliant) commit, rollback, and crash recovery capabilities.

MySQL, as we have also noticed 6 months ago, doesn't seem to scale as well as DB2. At best, you get a 40% - 45% performance increase when the concurrency level is high enough. When we move to quad CPUs, we only get a 20% - 30% increase while DB2 still offers a 70% increase. The better scaling of DB2 means that with enough CPUs, it runs almost as fast as the MySQL ISAM engine, and offers all the transaction-safe capabilities as a bonus.

Let us check if the architectural differences between the CPUs make a difference . Again, don't pay too much attention to the results of the lower concurrency levels.

Concurrency Dual Xeon Irwindale versus Nocona (3,6 GHz) Xeon Nocona (3,6 GHz) vs Galatin (3,06) Opteron 2.6 vs Nocona 3.6 Opteron 2.6 vs Pentium-D Xeon Nocona 3,6 GHz vs Pentium-D
1 2% 12% 15% 37% 19%
2 6% 7% 18% 16% -2%
5 5% 1% 31% 6% -19%
10 -1% 4% 34% 3% -23%
20 3% 6% 34% -2% -27%
35 -9% 5% 37% 4% -24%
50 -6% 6% 38% 1% -27%
           
AVG -2% 4% 35% 2% -24%
MAX -2% 4% 36% 0% -26%

The bigger L2-cache of the Xeon Irwindale did nothing more than compensate for the slightly higher latency of the L2-cache. The Xeon Irwindale and Nocona perform alike.

MySQL, unless you get the special Intel Compiler optimized version, remains the stronghold of the Opteron. The fastest (single core) Opteron outperforms the best Intel CPU by a 35% margin. We didn't use the Intel compiler version as we have reason to believe that this version is not used a lot in the real world. We might try it out in a future article.

The relatively limited scaling also means that high clocked single CPUs can be an interesting option. This is illustrated by the Opteron 252 2.6 GHz, which outperforms the dual core Pentium-D 3.2 GHz by a small margin.

Benchmark Configuration Benchmarks (continued)
POST A COMMENT

45 Comments

View All Comments

  • JohanAnandtech - Saturday, June 18, 2005 - link

    Mino, thanks for pointing that out. Query cache enabling has nothing to do with "stressful". It has to do with accelarting a few queries that are run over and over again. Which is very interesting for reducing the response time of a website serving up the last article, but which is not limited by CPU power at all.



    Reply
  • JohanAnandtech - Saturday, June 18, 2005 - link

    To the people who make a fuss about disabling the query cache: this has nothing to with the Opteron not performing well in that situation. Single Xeon: 980 queries/s. Dual xeon: 985 queries/s Opteron 250: 1020 queries/s . Get it now why I say "other bottlenecks started to kick in"?

    It impossible that a dual xeon can't outperform a single one in these tests. We tried to find the bottleneck and even used a quad opteron 850 as client. The client was not the problem. My bet is on the network latency, but I have no knowledge of tools to profile the complete machine. The disk was not the problem, we tested that. Network bandwidth neither. My bet is on the network latency, or even the OS as the bottleneck kicked in a lot sooner w kernel 2.4
    Reply
  • mino - Friday, June 17, 2005 - link

    #32 try to think for a moment
    "Because the Opteron can't perform that well in stressful situations you won't post the scores?"

    If the CPU is not the bottleneck in the query cache scenario then why test the effect of CPU at all !!!

    You reminded me friend of mine who "tested" effect the "FSB" has on A64 system NOT having an FSB at all !!! ;-)
    Funny guy indeed.

    And about an intel compiler not beeing used.
    Like it or not, It IS a fact that it is not widely adopted especially among the target audience of this site an article.

    BTW given the past experience intel compiler would produce better code even on AMD systems so don't be so sure! Best code for K7 is made by intelcc set to PIII config. Albeit it does not use 3DNow! functionality at all.
    Reply
  • ElMoIsEviL - Friday, June 17, 2005 - link

    I think I have to agree with #20, as much as I am un-biased I feel this test was doctored by AMD... it ressembles the tests we see released by Apple often...

    "We didn't use the Intel compiler version as we have reason to believe that this version is not used a lot in the real world. We might try it out in a future article."

    Translation, "with the intel compiler AMD lost so being a marketing force for AMD we opted not to post those scores".


    and also as was mentioned before...
    ""The " query cache" was off, as we wanted to test worst case performance. In some cases, the query cache was able to push a single Xeon to 1000 queries per second, and the CPU was still capable of doing more, as the CPU load was at 50% - 70%."

    Why not?
    Because the Opteron can't perform that well in stressful situations you won't post the scores?

    Seriously.. this test is the biggest load of BS I have ever read... and I'm a current AMD adopter.
    Reply
  • JohanAnandtech - Friday, June 17, 2005 - link

    Viditor, it is possible that the IOMMU might have to do something with it.

    The IOMMU is a memory mapping unit sitting between the I/O bus and physical memory.

    Memory mapping is AFAIK only necessary if a certain device (PCI devices come to mind) can not do a 64 bit DMA. Now it seems that almost everything inside the newest Intel southbridges can do 64 bit DMA.

    So the IOMMU can only play a role when the driver is a 32 bit only, and the memory mapping has to happen. Now I would think that Intel would have an advantage here with their ultra modern southbridges. There might be a device that I am overlooking of course. Maybe our SCSI controller... But I don't think so.
    Reply
  • Viditor - Friday, June 17, 2005 - link

    Johan, if you're still reading (great article BTW)...
    A question I have had for quite awhile now is what effect the IOMMU has on these tests.
    The reasons I'm asking are
    1. I noticed that there was quite a disparity between the AMD and Intel 64bit performance (which you mentioned).
    2. I know that one difference between the 2 platforms is that AMD has a hardware IOMMU (of sorts) and Intel (at present) does not.
    3. I saw a thread last year with Linus T mentioning this quite a bit. He seemed to think that this would impair the EM64T substantially...

    Your thoughts?
    Reply
  • JohanAnandtech - Friday, June 17, 2005 - link

    If your database is running many "identical databases".... I meant "queries"

    Reply
  • JohanAnandtech - Friday, June 17, 2005 - link

    Juhl: It was 2.6.12rc5.

    Viditor: thanks for the helpful comment. Indeed, if you turn on the query cache, your CPU is doing very little.
    Everybody else: note the "identical" word in viditor's quote. If your database is running many identical databases, than you are not going to spend time reading this kind of article: you simply buy the cheapest decent server. Any CPU today can run 1000s of querries if everything comes out the query cache.

    Running benchmarks with the query cache on is simply not interesting. The query cache is all about accelerating the IDENTICAL queries that are run from time to time. You might reserve a bit of RAM to make sure that the most common queries (getting the latest article of a website for example) are run faster.

    But those numbers don't tell you anything about the load that your server is going to be able to take. You want worst case performance numbers!
    Reply
  • Viditor - Friday, June 17, 2005 - link

    Questar - the reason the query cache was turned off (guessing here) is to more reasonably simulate a real-world test. Obviously in this test, the same queries are repeated quite often. But that is not usually the case in the real world...
    For those who don't know what the heck a "query cache" is:

    "the query cache stores the text of a SELECT query together with the corresponding result that was sent to the client. If the identical query is received later, the server retrieves the results from the query cache rather than parsing and executing the query again"
    Reply
  • Questar - Friday, June 17, 2005 - link

    #23,

    We don't know, it specifically says Xeon. We don't have any idea what happens on an Opteron.
    Reply

Log in

Don't have an account? Sign up now