MySQL 5.5.17 "Percona Server"

Many readers asked us why we only tested MySQL in a virtualized environment and not on "native" Linux. Indeed, it has been years since we tested MySQL "natively". The reason is simple: MySQL 5.1 and earlier versions scaled poorly beyond 4-8 cores, so there was little incentive to run them on modern dual-socket servers. However, MySQL 5.5 has been available since December 2010, and it should feature much improved scalability. Even better, the people at Percona released their own version of MySQL and the InnoDB storage engine: Percona Server with XtraDB. This MySQL/InnoDB combination is engineered for even better scalability.

To test this, we installed Percona Server 5.5.17-55 (Release 22.1, November 2011) on top of Ubuntu 11.10 x86-64 Linux with the 3.0.0-14 kernel. This kernel was the latest stable version at the time and is "Bulldozer/Interlagos aware".

We migrated the "Nieuws.be" database to MySQL to have a test similar to our SQL Server test. That migration is not perfect, as not all stored procedures were successfully converted, so you should not use the benchmark results below to compare SQL Server and MySQL. However, the profile of the test is the same: it is 99% complex selects that scan large parts of the database. The database is several tens of gigabytes instead of one.

MySQL Sysbench

The results are abysmal for the latest Interlagos Opteron. The best Xeon score is 84% better than the best Opteron score. The results indicate what went wrong: the 8-thread Opteron 6220 at 3GHz scores better than the 16-thread Opteron 6276 at 2.3GHz. A clockspeed advantage of 30% has prevailed over twice as many threads. So we suspect that the scaling problems are not gone, at least in this test.
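To make the comparison concrete, here is a minimal Python sketch that uses only the clock speeds and thread counts quoted above. The `aggregate_ghz` helper is a hypothetical "threads times GHz" yardstick introduced for illustration, not a performance model and not something the benchmark itself measures.

```python
# Clock speeds and thread counts as quoted in the text.
opteron_6220 = {"threads": 8, "ghz": 3.0}
opteron_6276 = {"threads": 16, "ghz": 2.3}

# Relative clock advantage of the 6220 over the 6276.
clock_advantage = opteron_6220["ghz"] / opteron_6276["ghz"] - 1.0

# How many more hardware threads the 6276 offers.
thread_ratio = opteron_6276["threads"] / opteron_6220["threads"]


def aggregate_ghz(cpu):
    """Hypothetical yardstick: threads multiplied by clock speed."""
    return cpu["threads"] * cpu["ghz"]


# On paper, the 6276 has the larger "threads x GHz" budget.
budget_ratio = aggregate_ghz(opteron_6276) / aggregate_ghz(opteron_6220)

print(f"clock advantage of the 6220: {clock_advantage:.0%}")
print(f"thread ratio 6276/6220:      {thread_ratio:.1f}x")
print(f"paper budget 6276/6220:      {budget_ratio:.2f}x")
```

On paper the 6276 has roughly one and a half times the aggregate budget of the 6220, so a workload that scaled well across threads should favor it. The fact that it loses anyway is what points to a scaling problem rather than a raw throughput problem.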

Let us take a closer look by performing the same test on a different number of threads and cores. The BIOS of the SuperMicro H8DGU-F allowed us to disable the second integer unit or one or more modules of the new Opterons. (Disabling both at the same time was not possible.) The Asus Z8PS-D12-1U was more flexible: we could disable Hyper-Threading and/or several cores of the Xeon. Here are the scaling results.

MySQL

First, we focus on the results with few cores and threads. Two Bulldozer modules are capable of slightly outperforming four cores of the Opteron Magny-Cours. The ideas behind Bulldozer are sound: two modules are smaller (157 mm²) and more power efficient than four K10 cores (231 mm²). At the same time, they perform on par with the Xeon X5650, which is clocked higher, at the same number of threads. At eight threads this is still the case, and the gap between the newer and older Opteron widens in favor of the former.

Beyond eight threads, the new Opteron starts to scale badly. Doubling the number of modules to eight delivers a very small 5% performance advantage. Double the number of modules again and you end up with negative scaling. To make matters worse, the Xeon doesn't have this problem. From eight to 16 threads we get a 76% performance boost. The end result is that a quad-core Xeon beats the best Opteron by a large margin. Let us investigate the matter further.
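One way to put the two scaling figures above side by side is parallel efficiency: the speedup actually achieved divided by the ideal speedup for the added resources. The sketch below assumes that both steps double the available hardware threads (four to eight Opteron modules, eight to 16 Xeon threads); the function name and the factor of two are illustrative choices, not part of the benchmark.

```python
def parallel_efficiency(gain, resource_factor=2.0):
    """Achieved speedup divided by the ideal speedup.

    gain            -- relative performance gain, e.g. 0.05 for +5%
    resource_factor -- how much the thread count grew (assumed 2x here)
    """
    speedup = 1.0 + gain
    return speedup / resource_factor


# Figures quoted in the text for the step from 8 to 16 threads.
opteron = parallel_efficiency(0.05)  # four -> eight modules, +5%
xeon = parallel_efficiency(0.76)     # eight -> 16 threads, +76%

print(f"Opteron efficiency: {opteron:.1%}")
print(f"Xeon efficiency:    {xeon:.1%}")
```

An efficiency of 100% would mean perfect scaling, and 50% means the extra threads added nothing. By this measure the Opteron step sits just above the "nothing gained" line, while the Xeon step keeps most of the ideal gain.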

46 Comments


  • Scali - Friday, February 10, 2012

    No, because if you read the ENTIRE benchmark configuration page, you'd see that all the AMD systems had 2 CPUs as well.
  • Scali - Saturday, February 11, 2012

    Oh, and while we're at it... the Intel system had only 48 GB of 1333 memory, where the AMDs had 64 GB of 1600 memory.
    (Yes, Bulldozer is THAT bad)
  • PixyMisa - Saturday, February 11, 2012

    Or rather, MySQL scales that poorly.

    What we can tell from this article is that if you want to run a single instance of MySQL as fast as possible and don't want to get involved with subtle performance tuning options, the Opteron 6276 is not the way to go.

    For other workloads, the result can be very different.
  • JohanAnandtech - Saturday, February 11, 2012

    Feel free to send me a suggestion on how to set up another workload. We know how to tune MySQL. So far none of these settings helped. The issue discussed (spinlocks) cannot be easily solved.
  • Scali - Saturday, February 11, 2012

    I'm not sure if you bothered to read the entire article, because MySQL was not the only database that was tested.
    There were also various tests with MS SQL, and again, Interlagos failed to impress compared to both Magny Cours-based Opterons and the Xeon system.
  • JohanAnandtech - Saturday, February 11, 2012

    The clockspeed of the RAM has a small impact here. 64 vs 48 GB does not matter.
  • Scali - Saturday, February 11, 2012

    Not saying it does... Just pointing out that the AMD system had more impressive specs on paper, yet failed to deliver the performance.
  • JohanAnandtech - Saturday, February 11, 2012

    Again, it is not CMT that makes AMD's transistor count explode but the combination of 2x L3 caches and 4x 2M L2-caches. You can argue that AMD made poor choices concerning caches, but again it is not CMT that made the transistor count grow.

    I am not arguing that AMD's performance/billion transistors is great.
  • Scali - Saturday, February 11, 2012

    I think you are looking at it from the wrong direction.
    You are trying to compare SMT and CMT, but contrary to what AMD wants to make everyone believe, they are not very similar technologies.
    You see, SMT enables two threads to run on one physical core, without adding any kind of execution units, cache or anything. It is little more than some extra logic so that the OoOE buffers can handle two thread contexts at the same time, rather than one.

    So the thing with SMT is that it REDUCES the transistorcount required for running two threads. By nearly 100%.
    CMT, on the other hand, does not reduce the transistorcount nearly as much. So if you are merely looking at an 'explosion of transistor count', you are missing the point of what SMT really does.

    Other than that, your argument is still flawed. Even an 8-thread Bulldozer has a higher transistor count than the 12-thread Xeon here. It's not just cache. CMT just doesn't pack as many threads per transistor as SMT does... and to make matters worse, CMT also has a negative impact on single-threaded performance (which again, if you are looking at it from the wrong direction, may look like better scaling in threadcount... but effectively, both with low and high threadcounts, the Xeon is the better option... and this is just a midrange Xeon compared to a high-end Interlagos. The Xeon can scale to higher clockspeeds, improving both single-threaded and multithreaded performance for the same transistorcount).

    So what your article says is basically this:
    CMT, which is nearly the same as having full cores, especially in integer-only tasks such as databases, since you have two actual integer cores, has nearly the same scaling in threadcount as conventional multicore CPUs.
    Which has a very high 'duh'-factor, since it pretty much *is* conventional multicore.
    It does not reduce transistorcount, nor does it improve performance, so what's the point?
  • JohanAnandtech - Friday, February 10, 2012

    Semantics :-). I can call it a core with CMT, or a module with 2 cores. Both are valid.
