MySQL OLTP

Currently we don't have a good transactional (OLTP) benchmark that works with our vApus stress test client, so we went back to the MySQL Sysbench utility. Sysbench allows us to place an OLTP load on a MySQL test database, and you can choose the regular test or the read-only test. We chose read-only because even with several SSDs our benchmark remained disk I/O limited. Our current Promise JBOD does not work with SATA SSDs, so we can only use the three remaining SATA interfaces in our Supermicro server.

The read-only setting makes the test less representative of the real world, but a Sysbench test is rather synthetic anyway. The main reason we tested with Sysbench is to generate a huge number of queries that each select only a very small part of a table (one row or a few rows), so we can see how our platforms behave in this kind of scenario. And the results were very different from our OLAP benchmarks.

Since we could not use the capabilities of our vApus client, we were not able to perform an in-depth analysis like we did on the MS SQL Server tests. Yes, Sysbench allows you to test with any number of threads you like, but there is no "think time" feature. That means all queries fire off as quickly as possible, so you cannot simulate "light" and "medium" loads.
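
To illustrate why the missing think time matters, here is a minimal sketch based on the interactive response time law for closed-loop tests: with N client threads, response time R and think time Z, throughput is X = N / (R + Z). The function name and all numbers below are hypothetical placeholders, not measurements from our Sysbench runs, and the formula treats R as a given value even though in practice it changes with load.

```python
def closed_loop_throughput(threads, response_time_s, think_time_s=0.0):
    """Interactive response time law: X = N / (R + Z).

    threads         -- number of concurrent client threads (N)
    response_time_s -- response time per query in seconds (R)
    think_time_s    -- pause between queries in seconds (Z); 0 means no think time
    """
    cycle = response_time_s + think_time_s
    if cycle <= 0:
        raise ValueError("response time plus think time must be positive")
    return threads / cycle


# Hypothetical example: 32 threads, 3 ms per query.
# Without think time, every thread fires its next query immediately.
no_think = closed_loop_throughput(32, 0.003)

# With an assumed 97 ms think time, the same 32 threads offer a far lighter load.
with_think = closed_loop_throughput(32, 0.003, think_time_s=0.097)

print(f"no think time:   {no_think:.0f} queries/s")
print(f"with think time: {with_think:.0f} queries/s")
```

With no think time, the only way to lower the offered load is to lower the thread count, which is why "light" and "medium" loads cannot be simulated the way they can with a client that pauses between requests.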

The response times are very small, which is typical for an OLTP test. To take them into account, we are showing you the highest throughput at around 3 ms (2.8 ms to 3.3 ms). We tested with 1 million records, but 10 million records gave very similar results.
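
The selection rule described above can be sketched roughly as follows: from a set of runs at different thread counts, keep only those whose response time falls within 2.8 ms to 3.3 ms and report the one with the highest throughput. The `Run` record, the helper name, and the sample numbers are assumptions made up for illustration; they are not our actual results.

```python
from typing import NamedTuple, Optional, Sequence


class Run(NamedTuple):
    threads: int             # number of Sysbench threads used for the run
    tps: float               # measured throughput (transactions per second)
    response_ms: float       # reported response time in milliseconds


def best_run_in_window(runs: Sequence[Run],
                       low_ms: float = 2.8,
                       high_ms: float = 3.3) -> Optional[Run]:
    """Return the highest-throughput run whose response time lies in the window."""
    candidates = [r for r in runs if low_ms <= r.response_ms <= high_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.tps)


# Hypothetical placeholder data, NOT measured results.
example_runs = [
    Run(threads=8, tps=2500.0, response_ms=2.1),
    Run(threads=16, tps=4800.0, response_ms=2.9),
    Run(threads=24, tps=5600.0, response_ms=3.2),
    Run(threads=32, tps=5900.0, response_ms=4.0),
]

best = best_run_in_window(example_runs)
print(best)
```

Fixing the response time window keeps the comparison fair: a platform cannot score a higher throughput simply by letting latency climb well beyond 3 ms.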

MySQL Sysbench Read Only

The Intel X5650 gets a 30% boost from SMT, which is more or less equal to adding two extra cores (compare Xeon X5650, which is a hex-core, and the E5640 quad-core). This shows that this benchmark scales well over more cores, threads, or clusters.

The second integer cluster inside the new Opteron offers 40% more performance. So once again, CMT does the job. The Opteron 6276 does well but does not really break away from the pack. For example, if we take the small clockspeed advantage of the 6276 into account, the new Opteron is hardly faster than its predecessor.

How about power? We didn't test all configurations but the Xeon X5650, Opteron 6174 and Opteron 6176 are in the same league. The huge increase in TPC-C performance that AMD touts is to a significant extent the result of using better SSDs, but we estimate that the new chip is about 20-30% faster than the previous one. The new Opteron appears very capable when it comes to OLTP.

46 Comments

  • sonofgodfrey - Thursday, February 9, 2012 - link

    Have you explicitly tested one socket vs. two sockets? We've found an immense increase in contention once a cache-line has to be shared between sockets on some systems.
  • JohanAnandtech - Friday, February 10, 2012 - link

    That is one suggestion I will try out next week. Thanks!
  • Klimax - Thursday, February 9, 2012 - link

    Hello.

    Nice tests.

    However I would like to see MySQL tested on Windows Server 2008 R2
    Would be an interesting comparison.

    (Especially due to http://channel9.msdn.com/shows/Going+Deep/Arun-Kis... )
  • Klimax - Thursday, February 9, 2012 - link

    Title of post is wrong... (I have deleted second thing and forgot to fix title)
  • Scali - Thursday, February 9, 2012 - link

    Unless I'm mistaken, the Xeon 5650 is a 1.17B transistor chip, where the Interlagos 6276 is a 2.4B transistor chip.
    In that light, doesn't that make Intel's SMT implementation a lot better than CMT?
    I mean, yes CMT may give more of a performance boost when you increase the threadcount. But considering the fact that AMD spends more than twice the number of transistors on the chip... well, that's pretty obvious.
    AMD might as well just have used conventional cores.
    The true strength of SMT is not so much that it improves performance in multithreaded scenarios, but that it does so at virtually no extra cost in terms of transistors (and with little or no impact on the single-threaded performance either).
  • JohanAnandtech - Friday, February 10, 2012 - link

    Interlagos is a 1.2 billion transistor chip (maybe 1.3 but anyway). Most of those transistors are spent on the L3 cache: about 0.5 billion. Only 213 million transistors are in a module and each module contains a 2 MB L2-cache, probably good for 120 million transistors. That leaves 90 million transistors to the core, and it has been stated that the second cluster added 12%. So that second cluster costs about 12 million transistors, or 48 million on the total 4 module die. That is less than 5% of the total transistor count but you get a 30-90% performance boost!

    So for AMD, this was clearly a great choice.

    SMT is perfect for Intel, as the Intel architecture puts all instructions in one big ROB.

    For very low IPC server workloads, I think the CMT approach gives better results. Unfortunately AMD lowered some of the CMT benefits by keeping the data cache so small and the associativity of the I-cache so low.
  • Scali - Friday, February 10, 2012 - link

    Uhhh, I think you're wrong here... the 4-module Bulldozer is a 1.2B chip (Zambezi). But you tested the 8-module Interlagos (16 threads), which is TWO Zambezi dies in one package.
    Hence 2*1.2 = 2.4B transistors.
  • JohanAnandtech - Friday, February 10, 2012 - link

    Ok, it is two chips of 1.2 billion. That doesn't change anything about our analysis of CMT.
  • Scali - Friday, February 10, 2012 - link

    Not in the article, because you did not factor in transistor count (which is the flaw I tried to point out in the first place... comparing two chips, where one has twice the transistor count of the other, is quite the apples-to-oranges comparison. One would expect a chip with twice the transistor count to be considerably better in multithreading scenarios, not 'catching up' to the smaller chip).

    But in your above post, I think it changes everything about your analysis. All your figures have to be done times two.
    Which makes it a very poor comparison, not only to Intel, but also to AMD's own previous line of CPUs.
    The 6174 Magny Cours is actually beating Interlagos, with 'only' 12 threads, no kind of CMT/SMT, and 'only' 1.8B transistors.

    How does that make CMT look like a great choice for AMD?
  • slycer.tech - Friday, February 10, 2012 - link

    From what I read on the benchmark configuration page, Anand used 2x Intel Xeon X5650. So 2x 1.17B = 2.34B. I think that is comparable to the AMD CPU used in this test. Am I right?
