MySQL Results: Scaling

Back to our main subject: astute readers have probably already noticed a strange anomaly, so let us analyze it further. If you look closely at our quad-core and dual-core x86 measurements, you'll notice that the scaling is negative: four cores deliver fewer queries per second than two. To make this clearer, we averaged the results for all concurrency levels of 5 and higher.

MySQL Linux (Queries/s)

                                    Sun T1            MSI K2-102A2M     Xeon 5160         MSI K2-102A2M
                                    4/8 cores 1 GHz   Opteron 275       Woodcrest 3 GHz   Opteron 280
Average Dual-core (T1: quad-core)   362               749               996               805
Average Quad-core (T1: octal-core)  433               590               904               622
Speedup Dual to Quad                20%               -21%              -9%               -23%
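
The speedup row is simply the ratio of the two averages, minus one. As a minimal sketch (Python is our choice here; the function and dictionary names are illustrative and not part of our benchmark tooling), the percentages can be recomputed from the averages listed above:

```python
def scaling_percent(dual_qps: float, quad_qps: float) -> int:
    """Percentage change in queries/s when the core count doubles."""
    return round((quad_qps / dual_qps - 1) * 100)


# Averages (queries/s) copied from the MySQL Linux table: (dual-core, quad-core).
# For the Sun T1 the pair is (quad-core, octal-core).
results = {
    "Sun T1": (362, 433),
    "Opteron 275": (749, 590),
    "Xeon 5160": (996, 904),
    "Opteron 280": (805, 622),
}

for system, (dual, quad) in results.items():
    print(f"{system}: {scaling_percent(dual, quad):+d}%")
```

A negative result means that the configuration with twice as many cores delivered fewer queries per second.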


This is nothing short of amazing. It looks like a measurement error, but it is not: these benchmarks have been checked, verified and checked again, and they are accurate. The x86 systems running Linux perform better with two cores than with four, while the T1 running Solaris actually gains performance going from 4 to 8 cores.

So who is guilty? Linux or the Opteron system? We had to test with Solaris on the Opteron to be sure. However, the Serverworks chipset of our MSI 1U server was not supported by x86 Solaris. So we went back to our homebuilt server, based on the MSI K8N Master2-FAR.

MySQL Solaris (Queries/s)

                                    Sun T1            Opteron 280       Opteron 280
                                    4/8 cores 1 GHz   Solaris           Linux
Average Dual-core (T1: quad-core)   362               456               799
Average Quad-core (T1: octal-core)  433               605               625
Speedup Dual to Quad                20%               33%               -22%


This puts the performance of our UltraSPARC T1 in a whole different perspective. First of all, it is clear that while MySQL might not be the most scalable database, the current Linux kernel is not helping matters. We did tweak the Linux kernel in two ways: the 2.6.15 kernel was optimized for either Intel's or AMD's architecture, and the AMD build also got NUMA support.

So what is going on here? We talked to our MySQL guru (P. Zaitsev), and it turns out that in some circumstances MySQL can cause trouble for the Linux mutex (mutual exclusion) implementation, a problem called "mutex ping-pong". A mutex ensures that a thread cannot access data in main memory while another thread holds the lock on it.
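
To illustrate what a mutex does, here is a minimal Python sketch. It is not MySQL's or Linux's code, and the names counter, counter_lock, worker and run are hypothetical. Several threads update one shared counter, and the lock ensures that only one of them touches it at a time:

```python
import threading

counter = 0
counter_lock = threading.Lock()  # the mutex guarding `counter`


def worker(iterations: int) -> None:
    """Increment the shared counter, taking the lock for every update."""
    global counter
    for _ in range(iterations):
        # Only one thread at a time can hold the lock; the others wait here.
        with counter_lock:
            counter += 1


def run(num_threads: int = 4, iterations: int = 10_000) -> int:
    """Start the workers, wait for all of them, and return the final count."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=worker, args=(iterations,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run())  # 4 threads x 10,000 increments, none lost
```

Every thread has to queue for the same lock, so extra threads add waiting rather than throughput for this part of the work. Note that CPython's global interpreter lock serializes the threads anyway, so the sketch shows the locking semantics only, not the performance effect we measured.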

However, this seems to be more of a MySQL problem than a Linux one, as other databases such as DB2 scale very well on Linux. For DB2 under the same load, we noticed a performance increase of no less than 80-85% when going from two to four cores. Also, with some loads, the bad scaling kicks in later than it does with our "Select dominated" load. Intel's performance labs told us that they ran into the same problem.

These issues are not as severe as the problems we encountered with MySQL on Mac OS X. Note that Apple seems to have recognized the problem and seems to offer a workaround. We'll report back with other MySQL workloads to investigate the MySQL scaling problem further.

PostgreSQL Results

PostgreSQL 8.0.7, another open source database, uses processes and not threads to deal with connections. The consequence is that the benchmark numbers are a lot more stable: once each core is busy with its own process, you get almost maximum performance. In other words, the results didn't change much between 5, 10 or 25 concurrent users. To keep things simple, we only list the numbers with 20 users, which results in peak performance. The queries per second numbers at 5 and 25 were only a few percent lower. We did not include the Sun T2000 server as the optimal PostgreSQL configuration is still under investigation.

PostgreSQL 8.0.7 (Queries/s)
DL385 1 x Opteron 280                   517
Intel 2 x Xeon "Irwindale" 3.6 GHz      448
MSI 1U 1 x Opteron 275                  490
MSI 1U 1 x Opteron 280                  524
Intel 1 x Xeon 5160 WC 3 GHz            673


Another clear victory for Woodcrest. On the Opteron, every 10% increase in clock speed seems to result in a 7% performance increase. So if we extrapolate, a 3 GHz Opteron would arrive at 616 queries per second.
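
As a rough check of that extrapolation, here is a small Python sketch with a hypothetical helper. It assumes that performance grows by 0.7% for each 1% of extra clock speed, and it takes the Opteron 280's clock as 2.4 GHz, a figure that is not listed in the table above:

```python
def extrapolate_qps(base_qps: float, base_ghz: float, target_ghz: float,
                    perf_per_clock: float = 0.7) -> float:
    """Estimate queries/s at a higher clock, assuming linear scaling.

    perf_per_clock is the assumed fraction of a clock speed increase that
    shows up as extra performance (7% per 10% gives 0.7).
    """
    clock_gain = target_ghz / base_ghz - 1
    return base_qps * (1 + perf_per_clock * clock_gain)


# Opteron 280 result from the PostgreSQL table: 524 queries/s.
# Assumption: the Opteron 280 runs at 2.4 GHz.
estimate = extrapolate_qps(524, 2.4, 3.0)
print(round(estimate))  # 616
```

This is only a linear estimate; nothing guarantees that the same ratio would hold all the way up to 3 GHz.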

91 Comments

  • rayl - Thursday, June 08, 2006 - link

    "Best Performance/Watt in the high end "

    Which part of performance per watt do you not understand? Do more, pay less.
  • MrKaz - Thursday, June 08, 2006 - link

    Dual Opteron 275 HE 2CPU's (275HE) - 4 GB RAM 192 Watts!!!
    Dual Opteron 275 2CPU's - 4 GB RAM 239 Watts!!!
    Dual Xeon 5160 3 GHz 2 CPU's - 4 GB RAM 245 Watts!!!

    http://www.intel.com/performance/server/xeon/ppw.h...
    Even Intel numbers show Xeon 3.6Ghz on par with AMD (obvious fake)

    And "do more, pay less" does not work the way you say in the server market. While your PC does a lot of work (processing) with a computer game, most servers stand there doing almost nothing. Our servers, for example, do almost zero from 0:00 to 8:00. Even during the day they work very little. Our Xeon 2.4 is more than enough, and I think most people think the same. Of course this depends a lot on what you do, but this is the general case. I think you know why virtualization is very important, right?
  • rayl - Thursday, June 08, 2006 - link

    Isn't this obvious to you? Those are power consumption numbers at 100% CPU load. This is where the performance/watt number really matters.

    If you're running idle, the power saving mode starts kicking in, you'll need a separate table to draw your conclusion.

    Why this preoccupation with power consumption? 6-watts for a performance leap; it's moot.
  • coldpower27 - Thursday, June 08, 2006 - link


    It is interesting to note that the delta between one Woodcrest 5160 and two is 59W, as reported by TechReport. Since the TDP of the Woodcrest 5160 is 80W and the TDP of the Woodcrest 5148 is 40W, we can extrapolate: I expect the 5148 to spew about 30W per processor, about 29W less than the 5160.

    245W - (2x29W) = 187W

    This brings the low-power Woodcrest system to roughly the same power usage as the HE Opteron 275s, even with the heat-spewing FB-DIMMs, and with higher performance per watt. Pretty impressive.
  • Questar - Thursday, June 08, 2006 - link

    Yeah, I'm worried about those six watts of power when I'm getting twice the performance.
  • fikimiki - Thursday, June 08, 2006 - link

    You forgot about Intel chipset consumption - 22 Watts.
    So Intel has 245+22=267 vs. 192 and even if you are running in power-saving mode, chipset is running all the time...
  • coldpower27 - Thursday, June 08, 2006 - link

    No, wrong. They measured the system power consumption, which is why the Woodcrest systems are so hungry in comparison to the Opteron: the FB-DIMMs are what is eating away at the wattage.

    So in the end it's 223 + 22 = 245, if indeed the chipset is consuming 22W.
  • Questar - Thursday, June 08, 2006 - link

    That was system power consumption - it included the chipset, dufus.
  • Saist - Wednesday, June 07, 2006 - link

    I am going to make the argument that evaluating only one version of Linux in this type of situation is not a good idea in and of itself. Not to knock Gentoo directly, it is a fine distro in itself, but it has a very small slice of the Linux market. It would have made more sense for Anandtech to have benchmarked using other distribution types for a couple of reasons.

    The first reason is the ability to duplicate the tests. This is actually a strike against Gentoo for what the operating system is. While it is possible to duplicate an installation of Gentoo and the applications used, generating an exact copy of the exact configuration used without a clear description of the compile targets used is very hard. This means that anybody wishing to reproduce these results on their own will be very hard-pressed to do so.

    The second reason is commercial and residential use. Gentoo has its market, that market just isn't very widespread. It would have made more sense for Anandtech to have tested an RPM based distro such as Mandriva, RedHat, Fedora Core, Novell Suse, or OpenSuse against a .deb based distro such as Debian(sid), Ubuntu, Mepis, or Xandros. The reason why it would have made more sense is that .deb and .rpm distros are actually used in the commercial and residential spheres, and used in great quantities. Had Anandtech used a distribution that is in active use, it would mean more to buyers currently looking to replace their Windows computers with a new system.

    It would only be in the interest of providing a point of perspective that one would test a different type of Linux distribution like Gentoo or Slackware.

    Going back to the first point, had Anandtech benchmarked these on a Debian based system it would be fairly easy to duplicate the tests. Anandtech would just need to list the base version of the Debian distro they used, list the apt-repositories they pulled from, and the applications in apt that were pulled. Anybody else who comes along afterwards with a Debian based distro would easily be able to duplicate the steps and the benchmarks.

    The overall point is that while it is nice to see a non-dedicated Linux site approaching hardware, this isn't the way to approach it. As it stands now, the Anandtech tests are useless, regardless of whatever results the benchmarks returned.
  • BasMSI - Thursday, June 08, 2006 - link

    These tests are also 100% useless.....
    The MSI K2-102 is NUMA aware....
    But for some reason the K8N-Master isn't shown in the graphs....that board is NOT NUMA aware.
    I'm also missing the HP server everywhere in the graphs.

    I really believe all these tests are done on the K8N-Master board for all Opteron tests.
    No way the graphs are showing all the systems.

    These tests are a total fraud, letting us believe Intel all of a sudden became that fast.
    No way on earth I believe any of these results.

    Also, why use Gentoo? Why not Debian 64bit?
    This puzzles me, as Gentoo is compiled but not known to be faster on every system.
    Why not use precompiled Linuxes? Like Debian 64bit....that one is stable as hell and incredibly fast!
    Too many parameters missing here to make any judgement at all.
    Do it better, this is 100% rubbish.

    Bas.
