« November 2015 | Main | September 2015 »

Monday, 26 October, 2015

MySQL Performance: Yes, we can do more than 1.6M QPS (SQL) on MySQL 5.7 GA

It was exactly 2 years now since we reached 500K QPS with MySQL 5.7 -- it was a great milestone, and the highest ever result seen on MySQL with true SQL queries ;-)) And this was on 32cores-HT machine.. (the linked article contains the whole long story how we arrived to 250K QPS first, then 275K, 350K, 440K, and finally 500K QPS) -- the main improvement here is coming from greatly redesigned transactions and transaction list management in MySQL 5.7.

However, when the 40cores-HT server arrived to MySQL LAB we were not yet ready to show more than 500K QPS -- there were still some other contentions on external to InnoDB level.. Once these all were fixed, we finally reached 645K QPS on 40cores-HT server and pushed the limit more far yet ;-) -- this is what we were able to announce a year ago, during OOW2014.

Since then, an important work was made in MySQL 5.7 code to resolve all known contentions on the "server" side to let InnoDB run on a full speed without limits.. The work was done, but no one from our LAB servers was able to show the potential benefit of all these changes.. - in fact our servers became too old over a time, and it was no more clear what is the real limit of 5.7 and how far we can go today..

So, we started to look for our HW upgrade, and Intel new CPU chips were announced exactly on time (see http://wccftech.com/intel-unleashes-haswell-ex-xeon-e7-v3-processors-18-cores-45-mb-l3-cache-12-tb-ddr4-memory-support-57-billion-transistors/ -- sounds really good "57-billion-transistors" for MySQL 5.7 ;-)) -- and finally in collaboration with Intel we were able to get our hands on our new server with these new CPUs pretty quickly ;-)

Note :

  • the server we used is exactly the same as we have with 32cores-HT and 40cores-HT
  • same 4CPU sockets, etc.
  • just that per CPU socket we got 18cores-HT now (instead of 8 or 10 before)
  • and a slightly higher CPU frequency - 2.5Ghz instead of previous 2.3Ghz
  • however, the difference is not only in the number of cores and CPU frequency ;-)
  • the chip itself is really much more powerful !..
  • NOTE: this CPU chips are already widely comercially available and from a long time ready for production
  • (e.g. I'm not telling you here something new ;-))

Used HW config :
 Intel(R) Xeon(R) CPU E7-8890 v3 (codename Haswell-EX)
 4 CPU sockets x 18 cores-HT (144 CPU threads)
 2.5 Ghz, 512GB RAM
 Linux kernel 3.16


well, a long story short -- here is the QPS graph with the results I've obtained on this machine :



So, yes, our new record is over 1.6M QPS of true SQL queries !!!!! ;-))
(same OLTP_RO Point-Selects as before, same conditions, etc.)

And this is really killing! ;-))

NOTE: from profiler stats I've monitored this is still NOT a limit !!! we're scaling on such kind of workloads perfectly well! -- the limit is just a CPU time..

NOTE: this is near x2.5 times better -vs- 40cores-HT the best 5.7 results, so it's pretty amazing to see such a huge progress on Intel chips (40cores is 2.3Ghz, 72cores is 2.5Ghz, but it's not just frequency and amount of CPU cores)...

NOTE: all other RO results are also mostly x2.5 times better ;-))

An this was yet in August 2015..

Let's see now how well can scale today other vendors on this server and the same workload.

Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :

Observations :
  • MySQL 5.7 is simply x3 times better than any other..
  • no comments ;-)


Now, what about scalability? (each Engine is now running only on 1, then 2, then all 4CPU sockets)

MySQL 5.7 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :


MySQL 5.5 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :


MySQL 5.6 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :

Percona Server 5.6 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :


MariaDB 10.1 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :

SUMMARY :
  • MySQL 5.7 is showing a near perfect scalability up to 72cores-HT on 4CPU sockets
  • no one of other engines not scaling even up to 2CPU sockets (32cores-HT)
  • interesting that MySQL 5.6 and Percona 5.6 are able to out-pass 500K QPS here
  • not MariaDB 10.1


Well, now what if you have a single table hot in your workload? -- this was from a long time a very painful problem in MySQL, which is finally completely resolved with MySQL 5.7 :
Sysbench RO Point-Selects 10M x 1-table 72cores-HT :
Observations :
  • near the same performance on MySQL 5.7 on one 10M table as on 8x 1M tables
  • cannot say the same about others: something like 20% regression everywhere..
  • NOTE: x8 Sysbench processes are running here in parallel to get a rid of Sysbench's own contentions.

And scalability?

MySQL 5.7 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :

MySQL 5.5 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :

MySQL 5.6 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :

Percona Server 5.6 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :


MariaDB 10.1 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :

No comments ;-)


Now, let's get a look on impact of this new Intel CPU..

MySQL 5.7 Scalability @Sysbench RO Point-Selects 1M x 8-tables :

Observations :
  • it's amazing to see how the gap is big...
  • NOTE: even on 8 concurrent users the result is near x2 times better than on any other server..
  • killing.. ;-)


MySQL 5.7 Scalability @Sysbench RO Point-Selects 10M x 1-table :

Same here..

Well, it's really amazing to see here that on the new CPU with 2 sockets (36cores-HT) I'm way faster than on 40cores-HT (4 sockets) before.. -- seems to me we should be prepared to see a lot of changes in MySQL Performance with these CPU chips..

Few more details about the presented test :
  • I've used exactly the same config params as mentioned 2 years ago (see: http://dimitrik.free.fr/blog/archives/2013/10/mysql-performance-the-road-to-500k-qps-with-mysql-57.html)
  • each engine as before was tested with tuned spin wait delay= 6/ 96 (helps a lot to lower InnoDB contentions here)
  • the best obtained results is used then for each load level
  • exactly the same Sysbench binaries as before, same scripts, etc..
  • just that the MySQL 5.7 version is the latest one ;-)
  • NOTE: for my curiosity I've also tested older 5.7 binaries as well.. -- no one was able to reach 1.6M QPS, and some even not reached 1M.. -- this is just to make it clear: we're reaching 1.6M QPS not just because the HW is better ;-))
  • MySQL 5.7 is really scaling much more far here (and you need the latest version)

my.conf:
 max_connections=4000
 key_buffer_size=200M
 low_priority_updates=1
 table_open_cache = 8000
 back_log=1500
 query_cache_type=0
 table_open_cache_instances=16

# files
 innodb_file_per_table
 innodb_log_file_size=1024M
 innodb_log_files_in_group = 3
 innodb_open_files=4000

# buffers
 innodb_buffer_pool_size=32000M
 innodb_buffer_pool_instances=32
 innodb_additional_mem_pool_size=20M
 innodb_log_buffer_size=64M
 join_buffer_size=32K
 sort_buffer_size=32K

# innodb
 innodb_checksums=0
 innodb_doublewrite=0
 innodb_support_xa=0
 innodb_thread_concurrency=0
 innodb_flush_log_at_trx_commit=2
 innodb_max_dirty_pages_pct=50
 innodb_use_native_aio=1
 innodb_stats_persistent = 1
 innodb_spin_wait_delay= 6 / 96

# perf special
 innodb_adaptive_flushing = 1
 innodb_flush_neighbors = 0
 innodb_read_io_threads = 4
 innodb_write_io_threads = 4
 innodb_io_capacity = 4000
 innodb_purge_threads=1
 innodb_adaptive_hash_index=0

# monitoring
 innodb_monitor_enable = '%'
 performance_schema=OFF



in case you need it, Linux Sysbench binaries available from here :

NOTE: interesting that on this server using different Sysbench binaries did not show any difference, while using UNIX socket is still faster -vs- IP port for this workload..

The Sysbench command used to run Point-Selects test via UNIX socket (starting 8 processes in parallel):
LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench --num-threads=$1 --test=oltp --oltp-table-size=1000000 \
--oltp-dist-type=uniform --oltp-table-name=sbtest_1M_$n \

--max-requests=0 --max-time=$2 --mysql-socket=/SSD_raid0/mysql.sock \
--mysql-user=dim --mysql-password=dim --mysql-db=sysbench \
--mysql-table-engine=INNODB  --db-driver=mysql \
--oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \
--oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on \
--oltp-read-only=on run  > /tmp/test_$n.log &

The Sysbench command used to run Point-Selects test via IP port (starting 8 processes in parallel):
LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench --num-threads=$1 --test=oltp --oltp-table-size=1000000 \
--oltp-dist-type=uniform --oltp-table-name=sbtest_1M_$n \

--max-requests=0 --max-time=$2 --mysql-host=127.0.0.1 --mysql-port=5700 \
--mysql-user=dim --mysql-password=dim --mysql-db=sysbench \
--mysql-table-engine=INNODB  --db-driver=mysql \
--oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \
--oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on \
--oltp-read-only=on run  > /tmp/test_$n.log &


What else to say? Still hard to believe, but seems like we're entering completely new levels in MySQL Performance...

Kudos MySQL & Intel Teams !!! ;-)

So far, I have much more yet details to share with all other testings already previewed on http://www.mysql.com/why-mysql/benchmarks/  as well, hope to publish them very soon, stay tuned ;-))

NOTE: if you're looking for some live news
  • come to reach me during my tomorrow's talk on MySQL Connect @OOW2015 about MySQL Performance ("MySQL Performance: Demystified Tuning and Best Practices", Moscone South—262, 10/27/15, 17:15 - 18:00) [Slides: PDF  ]
  • or meet all of us during BOF "Meet MySQL Team" or MySQL Community Reception (Tuesday, Oct 27 @ 7:00 PM - 9:00 PM | Jillian's at Metreon) just after my talk ;-))

Rgds,
-Dimitri
Posted by Dimitri at 20:58 - Comments...
Categories: MySQL