MySQL 5.6.0

Last time we made a small error in our script, causing the Sysbench test to write to our SSDs anyway. That did not make the test invalid, but we really want to isolate CPU performance, so we corrected the script. Due to these changes, however, you cannot compare these results with any similar Sysbench-based benchmarking we have done before.

The Intel servers were running Percona Server 5.6 (the best-optimized MySQL server for x86); the ThunderX system was running a special ThunderX-optimized version of MySQL 5.6. We used sysbench 0.5 (instead of 0.4) and implemented the (Lua) scripts that allow us to use multiple tables (8 in our case) instead of the default one. According to Cavium, there is still a lot of headroom to improve MySQL performance. A ThunderX-optimized version of Percona Server 5.7 should improve performance quite a bit.

For our testing we used the read-only OLTP benchmark, which is slightly less realistic, but a good first indication of MySQL SELECT performance.

MySQL Sysbench Read-only

A single ThunderX core is capable of 270 transactions/s and scales well: with 32 threads and one thread per core we still get about 8000 tr/s (or 250 tr/s/core). But beyond that point, scaling is much worse: add another 16 cores and we only get 17% more performance.
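To make the scaling numbers easier to follow, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above; the 48-core throughput is not stated directly and is derived here on the assumption that "17% more" is relative to the 32-core result. The function names are ours and purely illustrative.

```python
# Throughput figures quoted in the text (approximate values).
SINGLE_CORE_TPS = 270.0   # one thread on one core
TPS_32_CORES = 8000.0     # "about 8000 tr/s" with 32 threads on 32 cores

# Assumption: "17% more performance" is relative to the 32-core result.
TPS_48_CORES = TPS_32_CORES * 1.17


def per_core(tps, cores):
    """Average transactions per second delivered by each core."""
    return tps / cores


def scaling_efficiency(tps, cores, single_core_tps=SINGLE_CORE_TPS):
    """Fraction of ideal linear scaling (1.0 means perfect scaling)."""
    return tps / (cores * single_core_tps)


def marginal_per_core(tps_before, cores_before, tps_after, cores_after):
    """Extra transactions per second gained per additional core."""
    return (tps_after - tps_before) / (cores_after - cores_before)


if __name__ == "__main__":
    for cores, tps in [(1, SINGLE_CORE_TPS), (32, TPS_32_CORES), (48, TPS_48_CORES)]:
        print(f"{cores:2d} cores: {tps:7.0f} tr/s, "
              f"{per_core(tps, cores):5.1f} tr/s/core, "
              f"efficiency {scaling_efficiency(tps, cores):.0%}")
    extra = marginal_per_core(TPS_32_CORES, 32, TPS_48_CORES, 48)
    print(f"each of the last 16 cores adds about {extra:.0f} tr/s")
```

Under these assumptions, the average per-core throughput falls from 250 tr/s at 32 cores to roughly 195 tr/s at 48 cores, and each of the last 16 cores contributes only about 85 tr/s, less than a third of what a single core delivers on its own.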

MySQL Sysbench Read-only: 95th percentile response time

But when we look at the response times, things look a lot less rosy. The ThunderX is a lot slower when handling the heavier SQL statements.

It is clear that the ThunderX is not a good fit for high-frequency trading and other database-intensive applications. However, when MySQL serves as just a backend for a website and satisfies simple "get data x or y" requests, the 4 extra ms are a small nuisance.

82 Comments


  • vivs26 - Wednesday, June 15, 2016

    Not necessarily (read up on Amdahl's law and diminishing returns). The performance actually depends on the workload. Having a million cores guarantees nothing in terms of performance unless the workload is parallelizable, which in the real world is not as often the case as we think. I'm curious to see how a Xeon merged with Altera programmable fabric performs compared to ARM on a server.
  • maxxbot - Wednesday, June 22, 2016

    Technically true but every generation that millstone gets a little smaller, the die area and power needed to translate x86 into uops isn't huge and reduces every generation.
  • jardows2 - Wednesday, June 15, 2016

    Interesting. Faster in a few workloads where heavy use of multi-thread is important, but significantly slower in more single thread workloads. For server use, you don't always want parallelized tasks. The results are pretty much across the board for all the processors tested: If the ThunderX was slower, it was slower than all the Intel chips. If it were faster, it was faster than all but the highest end Intel Chips. With the price only being slightly lower than the cheapest Intel chip being sold, I don't think this is going to be a Xeon competitor at all, but will take a few niche applications where it can do better.

    With no significant energy savings, we should be looking forward to the ThunderX2 to see if it will bring this into a better alternative.
  • ddriver - Wednesday, June 15, 2016

    There is hardly a server workload where you don't get better throughput by throwing more cores and servers at it. Servers are NOT about parallelized tasks, but about concurrent tasks. That's why while desktops are still stuck at 8 cores, server chips come with 20 and more... Server workloads are usually very simple, it is just that there are a lot of them. They are so simple and take so little time it literally makes no sense parallelizing them.
  • jardows2 - Wednesday, June 15, 2016

    In the scenario you described, the single-thread performance takes on even more importance, thus highlighting the advantage the Xeons currently have in most server configurations.
  • niva - Wednesday, June 15, 2016

    Not if the Xeon doesn't have enough cores to actually process 40+ single-threaded tasks concurrently.
  • hechacker1 - Wednesday, June 15, 2016

    But kernels and VMware know how to schedule multiple threads on 1 core if it's not being fully utilized. Single-threaded IPC can make up for not having as many cores. See the iPhone SoCs for another example.
  • ddriver - Wednesday, June 15, 2016

    Not if you have thousands of concurrent workloads and only like 8 cores. As fast as each core might be, the overhead from workload context switching will eat it up.
  • willis936 - Thursday, June 16, 2016

    Yeah if each task is not significantly longer than a context switch. Context switches are very fast, especially with processors with many sets of SMT registers per core.
  • ddriver - Thursday, June 16, 2016

    If what you suggest is correct, then Intel would not be investing chip TDP in more cores but in higher clocks and better single-threaded performance. Clearly this is not the case, as they are pushing 20 cores at a fairly modest 2.4 GHz.
