Tuesday, 09 February, 2016
Next article from the MySQL 5.7 Performance stories, now about OLTP_RW
scalability (if you missed any previous ones, see 1.6M
SQL Query/sec (QPS) with MySQL 5.7, 1M
SQL Query/sec on mixed OLTP_RO / true
Point-Selects performance / over
100K Connect/sec Rate / Re:Visiting
nnoDB vs MyISAM Performance -- all with MySQL 5.7)..
Before we'll start looking on OLTP_RW results, let me explain first why we payed so many attention to MySQL 5.7 Performance in RO (read-only) workloads (and all my previous posts were mostly about RO as well).. -- the reason is very simple: there is no great RW performance if RO is lagging.. And also because we were pretty bad on RO before 5.7 ;-))
Let's get a look on the following graphs :
- the graphs are representing the test results obtained more than 2 years ago..
- they are all obtained from the same 32cores-HT server (4CPU sockets, each with 8cores-HT)
- and we were looking for the best possible MySQL server performance on this host by limiting MySQL instance to be running on 1/2/4CPUs (8/16/32cores) and using/not-using CPU HyperThreading (HT) (16cores-HT vs 16cores, etc.)..
So, what we observed over 2 years when MySQL 5.7 development was just started ?..
Here are the results obtained on OLTP_RO workload on MySQL 5.5 / 5.6 / and 5.7 on that time :
on MySQL 5.5 :
- the results on 16cores-HT are x2 times better than on 32cores..
on MySQL 5.6 :
- the results on 32cores are just slightly better than on 16cores-HT
- as well the difference between 32cores vs 32cores-HT results is pretty small..
on MySQL 5.7 :
- same as on 5.6, the results on 32cores are just slightly better than on 16cores-HT
- but near no difference at all in 32cores vs 32cores-HT results..
- and, the most painful, is that an overall result is worse than on MySQL 5.6 (!)..
- this was the first painful point from where MySQL 5.7 was started over 2 years ago ;-))
- (and probably you're better understanding now why we're so happy to see MySQL 5.7 scaling really well today and easily reaching now over 1M QPS on the same OLTP_RO workload ;-))
But well, let's go back 2 years ago again, and see also what it was about OLTP_RW workload on that time :
The following are the similar test results on MySQL 5.5/ 5.6/ 5.7 , but about OLTP_RW :
I think you may observe the same tendency by yourself :
- MySQL 5.5 is scaling up to only 16cores-HT
- on MySQL 5.6 and 5.7 the results on 32cores are better than on 16cores
- the benefit from CPU HyperThreading is better seen on 32cores-HT now (but not that big as on 16cores-HT)
- however, MySQL 5.7 is better "resisting" to a higher concurrent users load
- while the Max peak TPS is still reached by MySQL 5.6, and not 5.7 ;-))
- but the most killing here is not this..
- in fact the presented OLTP_RW results are intentionally presented in QPS (Query/sec) and not in TPS (Transactions/sec)
- this is making OLTP_RW results "comparable" with OLTP_RO ;-))
from where you may discover the painful point #2 :
- over 2 years ago our OLTP_RW performance was better than OLTP_RO (!!!)
- and this was true for all presented MySQL versions on that time..
- NOTE : OLTP_RW workload is including OLTP_RO ;-))
- NOTE (again) : to be exact, OLTP_RW is extending OLTP_RO by adding write operations (INSERT, DELETE, UPDATE), so we're writing to the disk, we're logging every transaction, we're hitting transaction/REDO locking, and we're still reaching a higher QPS level than a pure OLTP_RO running fully in-memory... -- and this is all because our transactions management in InnoDB on that time was very heavy on locks and did not scale at all..
- Hope you can better understand now our frustration level 2 years ago, and challenges we faced on that time ;-))
That's why so many efforts were spent to improve InnoDB performance in MySQL 5.7 on RO workloads.. -- was this challenge fully completed?.. -- not yet (some specific cases (block lock, AHI, etc.) are still remaining; then many new functionality features were added in MySQL 5.7 over a time, and adding more code making an overall code path more long as well, so on low load RO workloads you may observe some slight regressions with MySQL 5.7 today.. -- however, as soon as your load is growing, you'll see a real benefit from improved MySQL 5.7 scalability ;-)) Le's say that with MySQL we got a rid of the "main scalability show-stopper" for RO workloads! - and, of course, we don't stop here, work in progress, and yet more other improvements are in our TODO list ;-))
Now, what about MySQL 5.7 Performance on RW workloads ?..
- the main InnoDB RW scalability show-stopper (generally and particularly in MySQL 5.7) is REDO log locking (log_sys mutex)
- well, to be exact, log_sys contention is the "final" show-stopper ;-))
while before hitting log_sys, you may hit and be blocked by :
index lock contention (big stopper for RW workloads, was
finally fixed since MySQL 5.7 only.. -- before the only possible
"workaround" was to use partitioning (this will split your hot
table in several tables (partitions), means split your index as
well, means split your contention by the number of partitions,
transaction lock (trx_sys mutex) -- greatly improved in
MySQL 5.7 too
- lock_sys overhead -- lowered in MySQL 5.7, but need yet to be more improved..
- AHI (Adaptive Hash Index) contention (btr_search_latch RW-lock) -- there is a huge story behind it, but to make it short - you're better to disable it on RW workloads, as every data modification is involving AHI update (e.g. write lock), and you're quickly hitting a serialization here.. (work in progress to improve it)..
- index lock contention (big stopper for RW workloads, was finally fixed since MySQL 5.7 only.. -- before the only possible "workaround" was to use partitioning (this will split your hot table in several tables (partitions), means split your index as well, means split your contention by the number of partitions, etc)..
- but well, as soon as you're using MySQL 5.7, your main RW "scalability limit" will be mostly log_sys contention ;-))
- and, unfortunately, we were not able on MySQL 5.7 timeframe to improve this part of code as much as we made it for RO issues..
- a true fix is requiring a complete REDO log management re-design, and our timing was not favorable here..
- however, a probe prototype of the potential new solution showed us a great improvement (you can see its impact in the past LinkBench test results on MySQL 5.7)..
- the amazing part of this probe patch was that we were able to reach the same or better performance while using innodb_flush_log_at_trx_commit=1 (and flushing REDO log on every transaction) vs innodb_flush_log_at_trx_commit=2 (flushing REDO log only once per second).. -- this clearly proved that the main issue here is not the IO related fsync() of REDO log file, but the REDO log management itself..
- but well, we're not yet there ;-))
- so, while our MySQL 5.7 scalability on RW workloads got more better with innodb_flush_log_at_trx_commit=2, we're not really better with innodb_flush_log_at_trx_commit=1 yet (and on low loads / small HW configs you may see no difference vs MySQL 5.6) -- in fact getting other contentions lowered, the log_sys contention became more hot, and there is nothing to do with it, except to get it fixed, so the work in progress is here too ;-)) -- while with MySQL 5.6 you may still hit instead many other problems which were fixed only since MySQL 5.7, so the best answer here will be only your own test validation..
Well, this was about internal contentions which may limit RW scalability. While there are still few more factors :
trx_commit (trx) -- already mentioning before
(innodb_flush_log_at_trx_commit=0/2/1) and, of course, flushing REDO
log data to disk on every transaction commit
(innodb_flush_log_at_trx_commit=1) for sure will bring more penalty if
you're flushing REDO only once per second
(innodb_flush_log_at_trx_commit=2) -- while the risk here is to loose
the last second transaction(s) only (and maybe even nothing if your OS
& storage did not crash or if you're using semi-sync replication, or
even less than last 1 sec (because in reality REDO log with
innodb_flush_log_at_trx_commit=2 is still flushed more often than once
per second), and even many "serious companies" are doing so, etc.etc.)
-- but well, you're always better to evaluate what is valid for your
own production ;-))
flush_method -- as you're writing to disk, you have to choose
the way how your page writes will be flushed to the disk.. -- InnoDB
has several options here (and you may find many discussions around and
different people defending different option preferences, etc.) -- I'd
say from all the past experience and fighting various issues with FS
cache, my preferred option here will be to use O_DIRECT (or
O_DIRECT_NOFSYNC when available) combined with AIO
(innodb_flush_method=O_DIRECT_NOFSYNC and innodb_use_native_aio=1).
And, curiously, I'm still prefer EXT4 (while many are claiming XFS is
better) -- will post my observations later about ;-))
double_write (dblwr) -- the only solution InnoDB has to protect
your data from partially written pages on system crash (so, InnoDB
will write each page twice: first on dblwr buffer disk space (sys
tablespace), and once the write is confirmed, the page is written on
its own place (and if on that write the system will crash, the valid
page copy will be recovered from dblwr)) -- while I often hear that on
the "modern HW" not need to care about, the risk is still here ;-))
and it's still up to you to decide will you turn this protection ON or
OFF (innodb_doublewrite=1/0). However, there are several alternatives
- you may buy Fusion-io flash card and use their NVMFS filesystem which is supporting "atomic IO writes" (so each page write is confirmed to be fully written) -- MySQL 5.7 is supporting this card automatically (combined with O_DIRECT)
- you may use "secured" by-design FS (like ZFS for ex. or ZFS Appliance) -- such a storage solution by definition will garantee you'll never loose any bit of your data ;-)) (on the same time don't be surprised your writes are going slower -- each write (and read!) is hardly verified) -- while this may still be faster than the current dblwr..
- or use FS with data journal (like EXT4, but you have to use O_DSYNC with it, so some FS cache related surprises are potentially possible ;-))
I'd say the HW-based "atomic IO writes" solution is looking as the
most strong.. -- but we're working here as well to bring yet more
possible options, so stay tuned ;-))
purge -- a kind of "garbage collector" in InnoDB, running in
background, can be configured with several "purge threads", however
you may still see it lagging in your RW workload (can be observed as a
growing or remaining high "History List" via "show engine innodb
status" or via InnoDB METRICS table) -- the problem with constantly
lagging purge is that your data space can be finally completely filled
up with a "trash", and your whole database processing will be stopped
due no more free disk space available.. The good news with MySQL 5.7
that if even purge is lagging during a high load, it'll be still able
to catch up once the load become low and "auto-magically" free
the disk space used by UNDO images (this is available only since
MySQL 5.7, and in all previous versions the only solution to get all
this disk space back was to drop the whole InnoDB instance and restore
it from a backup or import from a dump).. -- so, it's important to
configure several purge threads to make such a space recovery faster
adaptive flushing -- I'll not go too much in details here as
the topic is extremely interesting and worth a dedicated article
about, so here will just mention that since MySQL 5.7 you can have
several "flushing threads" (cleaners) working in parallel -- the
initial analyze about what is going odd was made yet more than 3 years
ago with MySQL 5.6 (see: http://dimitrik.free.fr/blog/archives/2012/10/mysql-performance-innodb-buffer-pool-instances-in-56.html
for details) -- however this was only the first step in this
adventure, and a more advanced design was required ;-)) -- well, we're
not yet "perfect" here, yet more to come, will just mention here that
using 4 threads is usually ok (innodb_page_cleaners=4), then the IO
capacity setting should be adapted to your workload and your storage
(ex. innodb_io_capacity=2000 innodb_io_capacity_max=10000), and there
is no more danger to use bigger REDO log files (recovery processing is
going much more faster now than before, as well only a "really needed"
REDO space is used, as well a previously existing "read-on-write"
issue on REDO logs was fixed since MySQL 5.7, so using 8GB REDO, or
bigger is no more a problem (innodb_log_file_size=1024M
innodb_log_files_in_group=8) -- well, sorry to skip the details here,
will provide them all later..
checksums -- as soon as you're using crc32 option, you're fine
;-)) however, keep in mind that this is not impacting your scalability
limits, this is a pure "overhead" (your performance levels will still
scale with the same tendency, just that the response times will be
- there are some other points/tuning/etc. are coming in the game as well, but let's keep the list short just with the most important ones ;-))
After all this "preface", let's focus now on the OLTP_RW benchmark testing (hope it was not too much boring until now ;-))
So far, my main goal on the following testing is to mainly analyze the scalability of MySQL 5.7 on OLTP_RW workload :
- means, I don't need a too big database (I'm not testing the storage here ;-))
so, the dataset should be :
- not too small to run fully on CPU caches level ;-))
- and not too big either to not involve IO reads (otherwise, again, we're testing the storage performance ;-))
My HW platform :
- for my tests I'll use the 72cores-HT server running OracleLinux-7.2 and having flash storage
- why 72cores ?..
- in fact this is a 4CPU sockets server (18cores-HT per CPU socket)
- so, I can easily test scalability on 1CPU (18cores-HT), 2CPU (36cores-HT) and 4CPU (72cores-HT) by binding my MySQL server to run exclusively on these CPU cores..
- then, these CPUs are the latest CPU chips from Intel, they are really way more powerful comparing to what I have on my older machines..
- and this is where the whole HW tendency is going -- you'll see these CPUs on all "big" and "commodity" HW, and even 18cores-HT per CPU is not a limit either, so there are really fun times are coming (and if you're still thinking that "commodity" HW is a host with 4cores -- it's a good time to wake up ;-))
While my main interest here is about MySQL 5.7, I'm also curious to see what are the limits on the other MySQL Engines as well, and I have the following on my list :
MySQL Engines :
- MySQL 5.7
- MySQL 5.6
- MySQL 5.5
- Percona Server 5.6
- MariaDB 10.1
Test Scenario :
- from the previous OLTP_RO test I've already observed that all engines are worse vs MySQL 5.7 when a single table only is used in OLTP test.. -- so, no need to waste a time again to point to the same problem..
- let's focus then on x8-tables OLTP_RW Sysbench test workload, each table of 1M
- before each test the database is completely restored from its backup (clean dataset for each test)
- the load is progressively growing from 8, 16, 32, .. up to 1024 concurrent users
- each load level is kept at least for 5min (was enough to get an understanding about scalability limits, while I'd prefer more longer steps, while in the current case there was no way to run more longer iterations, as to cover all planned test conditions the whole testing already took over 2 weeks non-stop running ;-))
each MySQL Engine is tested within the following configurations :
- trx2 -- innodb_flush_log_at_trx_commit=2 && innodb_doublewrite=0 (default)
- trx1 -- innodb_flush_log_at_trx_commit=1 && innodb_doublewrite=0
- trx1-dblwr1 -- innodb_flush_log_at_trx_commit=1 && innodb_doublewrite=1
each configuration is also tested with the following tuning
- ccr0-sd6 -- innodb_thread_concurrency=0 (default) && innodb_spin_wait_delay=6 (default)
- ccr64-sd6 -- innodb_thread_concurrency=64 && innodb_spin_wait_delay=6
- ccr0-sd6 -- innodb_thread_concurrency=0 && innodb_spin_wait_delay=96
- ccr64-sd6 -- innodb_thread_concurrency=64 && innodb_spin_wait_delay=96
- and, finally, all configurations + all tuning combinations are tested on 1, then 2, then 4 CPU sockets (18cores-HT, 36cores-HT, 72cores-HT)..
- the best obtained results for each Engine from any tested combinations then used to compare performance in different configurations (best-to-best comparison)..
I think I need to explain here a little bit more in details the impact of the mentioned tuning options :
thread_concurrency : a well known InnoDB tuning to limit the
amount of concurrently running threads (usually no more required since
MySQL 5.7 for RO workloads, but still helping for RW -- as we're
writing and for sure will involve IO operations + manage various
raw/data locking (via mutexes/RW-locks, etc.) -- there is still a
significant benefit possible with an "optimal" thread concurrency
limitation. Which setting could you consider optimal?.. -- I'd say you
need to analyze which peak performance level you're reaching on your
workload without concurrency limit (innodb_thread_concurrency=0) and
see how many concurrent user sessions are running during this period
-- this will be then your main concurrency target (by not allowing
more than N concurrent threads you'll be able to keep your performance
stable even with a higher load (well, at least not to see it quickly
going down ;-)) -- in my cases the most optimal setting was 64 until
now (innodb_thread_concurrency=64), while in your case it may be
something different as well (this tuning is fully dynamic, so you may
do live experiments on any running workload at any time you want ;-))
- spin_delay : and this tuning is directly related to how internal lock primitives (mutexes/RW-locks) are "spinning" on a lock wait (threads waiting on a lock will "sleep" a given delay between spins before to re-try to acquire a lock again) -- the important point here is that a waiting thread in InnoDB will not really "sleep" on delay, but rather execute a "pause" instruction to CPU, so the CPU will switch to execute another thread(s), and waiting thread will come back as soon as its "pause" is finished (for this reason "show mutex" output about mutex/RW-locks spins/waits is better reflecting as for today InnoDB internal waits stats (as the time spent on a wait is not really wasted)). The question is then which value will be the most optimal here?.. -- again, you can get it only by testing by yourself ;-)) (this tuning is also dynamic) -- the 6 is default value, and I'm usually using 96 (innodb_spin_wait_delay=96) for big enough systems. Again, for RO workloads since MySQL 5.7 it's no more required, while for RW workloads we'll hit log_sys mutex contention for sure, and such a tuning usually may help.. The only problem here is that this setting is applied to all lock primitives together, so you really need to do experiments yourself to see what is better for you. However, by getting rid of hot contentions with every new improvement in InnoDB, we're progressively making the need of such a tuning obsolete.. (work in progress, stay tuned ;-))
Now, let me show the impact of this tuning by example :
- the following graph is representing MySQL 5.7 results on OLTP_RW test
- there are 4 results for the same MySQL 5.7, just with different concurrency/spin_delay tuning settings: ccr=0 / 64, sd=6 / 96
- as you can see, tuning the spin_delay for this Engine in this workload giving the most important impact..
- with spin_delay=6 (sd6) we're getting a better performance up to 64 concurrent users
- however with spin_delay=96 (sd96) we're going more far up to 128 users, and then able to keep near the same level of performance on a higher load as well..
- interesting that in this case tuning thread concurrency helps only for sd6 setting, and has no impact on sd96
- (but by the past experience I know it helps a lot on IO-bound workloads, so no reason to not test it ;-))
The same tuning was applied to all other Engines, and then the best obtained results collected (Max(QPS) or Max(TPS)) for each test case.
Now, if you're curious, let me show you yet few more details about :
- so, the next following graphs is representing "live" stats data corresponding to the obtained above results
from the left to the right you can see 4 tests with the same MySQL
5.7, but configured with :
- #1) sd6, ccr0
- #2) sd6, ccr64
- #3) sd96, ccr0
- #4) sd96, ccr64
- the first graph is showing reached Commit/sec rate (TPS)
- the second one is the amount of concurrent user sessions
- and the third graph is showing corresponding mutex/RW-locks spin waits reported by InnoDB :
- as you can see the default #1) case is hitting the highest lock contentions and reaching the lowest TPS..
- tuning concurrency=64 in the case #2) is helping to lower waits on other locks, except log_sys, and also helps to avoid a TPS drop on a higher load..
- tuning spin_delay=96 in case #3) lowering finally log_sys and giving us the highest TPS result here
- adding concurrency=64 in case #4) lower spin waits yet more, but then the processing becomes "too relaxed", and TPS results is not better, while becomes more stable ;-))
While if we will go yet more in details about observed spin waits we may discover the following (adding CPU Usage% and Perf Profiler stats):
- well, just to show you that only once spin_delay was set to 96 we're starting to use CPU time fully..
- however, we're spending 15%, 20%, or over 25% in the "sleeping" code (ut_delay())
- and this is where our future potential gain is ;-))
Very hope the next MySQL/InnoDB version will get a rid of all these lock contentions and use HW way more efficiently.. -- let's see ;-))
Now, let's go back to the beginning of all this story (over 2 years ago), and get a look where we're finally today!
So far, just as a reminder, here are the results on OLTP_RO workload obtained on the all mentioned MySQL Engines on the same 72cores-HT server (and published before) :
Sysbench OLTP_RO 1M x 8-tables @72cores-HT (QPS) :
As you can see, MySQL 5.7 is reaching here 1M QPS, while MySQL 5.6 (and other "5.6 based") Engines are blocked around 400K QPS...
Now, what about OLTP_RW ?..
Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=2 dblwr=0 (QPS) :
- NOTE : this is OLTP_RW results with Engines configured with trx_commit=2 and dblwr=0, so the max possible OLTP_RW performance is reached here..
- NOTE : the results are presented in QPS (and not TPS) to be "compatible" with OLTP_RO
- so far MySQL 5.7 is reaching 800K QPS here, the best result from all Engines, and its OLTP_RW result is lower than 1M QPS OLTP_RO (as naturally should be "expected")
- Percona Server 5.6 is on the second positions with its not far from 600K QPS, and way higher than its 400K QPS obtained on OLTP_RO..
- the 3rd is MySQL 5.6, 500K QPS on OLTP_RW, while 400K QPS on OLTP_RO
- 4th is MariaDB 10.1 with 450K QPS on OLTP_RW and higher result as well than on OLTP_RO..
So far, the OLTP_RO vs OLTP_RW target was finally reached by MySQL 5.7 only. Time for other challenges, work in progress ;-))
Let's now go back to OLTP_RW results more in details and see the impact of all tested configurations.
Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=2 dblwr=0 (TPS) :
MySQL 5.7 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=2 dblwr=0 (TPS) :
MySQL 5.6 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=2 dblwr=0 (TPS) :
MySQL 5.5 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=2 dblwr=0 (TPS) :
Percona Server 5.6 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=2 dblwr=0 (TPS) :
MariaDB 10.1 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=2 dblwr=0 (TPS) :
Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=0 (TPS) :
MySQL 5.7 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=0 (TPS) :
MySQL 5.6 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=0 (TPS) :
MySQL 5.5 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=0 (TPS) :
Percona Server 5.6 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=0 (TPS) :
MariaDB 10.1 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=0 (TPS) :
Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=1 (TPS) :
MySQL 5.7 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=1 (TPS) :
MySQL 5.6 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=1 (TPS) :
MySQL 5.5 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=1 (TPS) :
Percona Server 5.6 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=1 (TPS) :
MariaDB 10.1 Scalability @Sysbench OLTP_RW 1M x 8-tables @72cores-HT, config: trx_commit=1 dblwr=1 (TPS) :
From the presented above OLTP_RW results you can see that :
- switching trx_commit=1 is giving the biggest impact on performance..
- adding dblwr=1 here is not impacting too much due a relatively not too hard flushing involved by the tested workload (not too hard for the used flash storage)
- with trx_commit=2 the absolute winner is MySQL 5.7
- with trx_commit=1 on 72cores the winner is MySQL 5.7, while on 36cores is Percona Server 5.6 (small, but visible gain), and on 18cores rather MySQL 5.5 (surprise! ;-)) just that it has a drop on 1024 users
- with trx_commit=1 + dblwr=1 the result is not much different from just trx_commit=1 :
INSTEAD OF SUMMARY :
- there was a really huge gain made in scalability improvement in MySQL 5.7 !..
- reaching over 40K TPS on OLTP_RW is the highest ever result I've seen on MySQL until now ;-)
- however, there is yet more to do in MySQL for efficiency and further scalability improvements in RW workloads..
- the main challenges are around REDO log management
- while on heavy IO-bound RW workload the double_write becomes a huge problem as well (more about later, in the next articles)..
- well, work in progress, stay tuned ;-))
As usually, any comments are welcome! And thank you for using MySQL! ;-))
The Sysbench command used to run OLTP_RW test via IP port (starting 8 processes in parallel):
$ LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench --num-threads=$1 \ --test=oltp --oltp-table-size=1000000 \ --oltp-dist-type=uniform --oltp-table-name=sbtest_1M_$n \ --max-requests=0 --max-time=$2 --mysql-host=127.0.0.1 --mysql-port=5700 \ --mysql-user=dim --mysql-password=dim --mysql-db=sysbench \ --mysql-table-engine=INNODB --db-driver=mysql \ run > /tmp/test_$n.log &
the my.conf I've used during the tests :
[mysqld] # general table_open_cache = 8000 table_open_cache_instances=16 back_log=1500 query_cache_type=0 max_connections=4000 # files innodb_file_per_table innodb_log_file_size=1024M innodb_log_files_in_group=3 innodb_open_files=4000 # buffers innodb_buffer_pool_size= 32000M innodb_buffer_pool_instances=32 innodb_log_buffer_size=64M # tune innodb_checksums=0 innodb_doublewrite= 0 / 1 innodb_support_xa=0 innodb_thread_concurrency=0 / 64 innodb_flush_log_at_trx_commit=2 / 1 innodb_flush_method=O_DIRECT_NOFSYNC innodb_max_dirty_pages_pct=90 innodb_max_dirty_pages_pct_lwm=10 innodb_lru_scan_depth=4000 innodb_page_cleaners=4 join_buffer_size=32K sort_buffer_size=32K innodb_use_native_aio=1 innodb_stats_persistent = 1 innodb_spin_wait_delay=6 / 96 # perf special innodb_adaptive_flushing = 1 innodb_flush_neighbors = 0 innodb_read_io_threads = 4 innodb_write_io_threads = 4 innodb_io_capacity=2000 innodb_io_capacity_max=4000 innodb_purge_threads=4 innodb_max_purge_lag_delay=30000000 innodb_max_purge_lag=0 innodb_adaptive_hash_index=0 # Monitoring innodb_monitor_enable = '%' performance_schema=OFF
Friday, 11 December, 2015
Next article from the MySQL 5.7 Performance stories (if missed, see 1.6M
SQL Query/sec (QPS) with MySQL 5.7, 1M
SQL Query/sec on mixed OLTP_RO , true
Point-Selects performance and over
100K Connect/sec Rate -- all with MySQL 5.7)..
The today's article will be about re:visited MyISAM -vs- InnoDB performance comparison within MySQL 5.7 -- in fact the main and detailed article related to these engines comparison I've already published in 2012 but with MySQL 5.6 (just before MySQL 5.6 became GA) -- however, since then I'm constantly re:asked "And what about MySQL 5.7 ?" -- so, the following is the answer ;-))
First of all, let's summarize little bit what was already observed with MySQL 5.6 before :
- so, on Full Text Search (FTS) InnoDB was already way better than MyISAM, no need to replay..
- (on OLTP RW with its table-locking design MyISAM will always loose -vs- row-locking InnoDB, no need to waste a time either)
- then, on mixed OLTP_RO workload : InnoDB was slightly, but better (215K QPS) -vs- MyISAM (200K QPS)
- on Simple-Ranges queries workload : InnoDB was much better (170K QPS) -vs- MyISAM (95K QPS)
and on Point-Select queries workload : InnoDB was much worse (250K
QPS) -vs- MyISQM (430K QPS)
- NOTE: however, InnoDB was able to reach 450K QPS in experimental on that time "read_only mode", this showed the potential gain possible "in theory" to reach on InnoDB by by-passing the whole transactions layer, but not too much useful in real life (if writes are not allowed), so for MySQL 5.6 times it was more likely a hacking rather a real solution..
- the whole transactional layer was greatly improved in InnoDB
- as well the whole MySQL Server code got a rid of all known scalability bottlenecks
- so, we're having a much higher expectation to scale today with MySQL 5.7 than with 5.6 in the past ;-)
However, keeping this all in mind, it's not yet really clear if since then the gap between InnoDB and MyISAM was increased or decreased in MySQL 5.7 -- because general improvements in MySQL Server code made also MyISAM running faster.. -- and as usual, only a real test will give us a real answer ;-)
So far, this was the main reason to re:visit 3 years old obtained results, and my new results presented below are coming from the following :
- HW config : 32cores-HT server (exactly the same I've used 3 years ago) and a newer 40cores-HT server (to observe a tendency)
- OS : the same Oracle Linux 6.5
Test workloads :
- load level : 8, 16, 32, .. 1024 concurrent user sessions
- test cases: Sysbench OLTP_RO, RO Simple-Ranges, RO Distinct-Ranges, RO Point-Selects
- datasets : single table with 10M rows, 8 tables with 1M rows each
And, finally, here are the results :
Mixed OLTP_RO workload
dataset 10M x 1-table @32cores-HT :
dataset 1M x 8-tables @32cores-HT :
dataset 10M x 1-table @40cores-HT :
dataset 1M x 8-tables @40cores-HT :
dataset 10M x 1-table @32cores-HT :
dataset 1M x 8-tables @32cores-HT :
dataset 10M x 1-table @40cores-HT :
dataset 1M x 8-tables @40cores-HT :
dataset 10M x 1-table @32cores-HT :
dataset 1M x 8-tables @32cores-HT :
dataset 10M x 1-table @40cores-HT :
dataset 1M x 8-tables @40cores-HT :
dataset 10M x 1-table @32cores-HT :
dataset 1M x 8-tables @32cores-HT :
dataset 10M x 1-table @40cores-HT :
dataset 1M x 8-tables @40cores-HT :
- in MySQL 5.7 both InnoDB and MyISAM engines are giving better results than before on the same workloads and the same HW
- all the results obtained on 40cores-HT server are better than on 32cored-HT
- the most critical Point-Select workload is now leaving only a small gain to MyISAM -vs- InnoDB (and it's easy to understand - the overhead of transactions is still present in InnoDB, even if the code was greatly improved for scalability.. -- however, this is now nothing about hacking, but a true, production ready, re:designed code!)..
- then, on all other test workloads - InnoDB is doing just better than MyISAM
- while on all the tests using a single table - InnoDB is far way faster than MyISAM, even on Point-Selects
As usual, any comments are welcome! Thank you for using MySQL ! (and preparing your upgrade to MySQL 5.7 asap ;-))
MySQL 5.7 rocks! ;-)
Friday, 27 November, 2015
This article is continuing the MySQL 5.7 Performance story, started from 1.6M
QPS on MySQL 5.7 details post , then 1M
QPS on mixed OLTP_RO with MySQL 5.7 article, and detailed story
the Point-Selects performance is so critical (and why the 1M result
published by MariaDB is not fair)..
The current story will be about Connect/sec (connect/disconnect) performance improvement in MySQL 5.7 - such kind of metric is very critical for any application which cannot use persistent connections all the time (and many web apps are in such a case). Well, I'd say MySQL from the beginning was extremely good for its lightweight connections, and made in the past the base of success for many web solutions.. However, time is going, and we're no more with 4cores as "commodity hardware" (this is rather a smart-watch today ;-)) - so, there was a need to speed-up this Connect rate to match higher workloads. This was already greatly done in MySQL 5.6, and finally yet more improved in MySQL 5.7 - you may read all details about directly from our developers - I'll just present here a short summary about where we're today..
So far, first of all, how to test the Connect/sec performance of your MySQL server instance? - the most simple way here is just to use a standard Sysbench kit, load 10M rows into sysbench database (1 table or several tables, no matter -- the main show-stopper here is the Connect code itself), and then run the following :
#!/bin/bash # ---------------------------------------------------------------- # Connect/sec test # ---------------------------------------------------------------- for Users in 8 16 32 64 128 256 512 1024 do LD_PRELOAD=/usr/lib64/libjemalloc.so.1 sysbench --num-threads=$Users \ --test=oltp --oltp-table-size=10000000 \ --db-ps-mode=disable --oltp-dist-type=uniform --oltp-table-name=sbtest_10M \ --max-requests=0 --max-time=300 --mysql-socket=/tmp/mysql.sock \ --mysql-user=dim --mysql-password=dim --mysql-db=sysbench \ --mysql-table-engine=INNODB --db-driver=mysql \ --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \ --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on \ --oltp-read-only=on --oltp-reconnect-mode=query --oltp-connect-delay=0 run sleep 30 done # ----------------------------------------------------------------
means on every point-select query your client session will re-connect, and the final QPS result will give you the max Connect/sec rate your MySQL instance is able to reach for a given amount of concurrent users.
And here are the results obtained from older to newer generations Intel-based Linux servers :
12cores-HT @2.9Ghz :
32cores-HT @2.3Ghz :
40cores-HT @2.3Ghz :
72cores-HT @2.5Ghz :
Instead of SUMMARY :
- Connect/sec performance is mainly depending on the MySQL "connect/disconnect" code itself + CPU chip speed
- it's hard to speak about "scalability" here as the max possible Connect/sec rate limit is reached pretty quickly and depending on IP stack performance as well..
- tuning "thread_cache_size" to something bigger than zero is helping here, but not too much..
- MySQL 5.7 showing the best performance here regardless the HW platform, and reaching over 100K Connect/sec on the latest Intel CPU chip
- there is a clear better-and-better tendency in MySQL 5.5 => 5.6 => 5.7 results
- and we still can do yet more better with MySQL 5.8 ! (question of time and resources - so, please, send us your feedback/ votes/ wishes if you want to see it ;-))
MySQL 5.7 rocks! ;-))
Wednesday, 11 November, 2015
This article is continuing the MySQL 5.7 Performance story, started from 1.6M
QPS details post, and followed by 1M
QPS OLTP_RO article. However, the current story will not be mostly
about MySQL 5.7, but also about announced on the same time MariaDB 10.1
So far, MySQL Team was proud to show 1.6M QPS on Point-Select (SQL) queries, and MariaDB 10.1 GA announce was also claiming an ability to reach 1M QPS, also on Point-Selects, but on POWER8 HW. And I may be only happy for MariaDB team for their progress on POWER systems, except just one small detail related to how their 1M QPS result was obtained..
But first of all, what are these Point-Selects and what is so special with this workload ?..
- point-select is representing a single SQL query reading a row by its primary key (PK)
- a workload based on such queries is very similar to any generic key-value store solution, but via SQL
- usually a point-select is extremely fast and doing a round-trip from end-to-end in database engine
- so any internals overhead in engine code, any contentions, any scalability limits are seen very quickly..
- historically such kind of workload did not scale very well on MySQL/InnoDB tandem..
- on MySQL Server side there was a huge amount of internal locks around every SQL query execution
- while on the InnoDB side there was a huge general overhead related to transactions management + related locks..
- the first step ahead was made here in MySQL 5.6
- but the real changes came only with MySQL 5.7 ;-)
So, our 1.6M QPS on Point-Selects with MySQL 5.7 we were able to reach under the most heavy conditions :
- each Sysbench "client" thread is executing a single point-select query per iteration
- no transactions clauses used, no grouping..
- single query round-trip is driving the response time latency
- a fair simulation of a key-value store by SQL access
and the Sysbench execution command line is looking then as the following:
LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench --num-threads=$1 \ --test=oltp --oltp-table-size=1000000 \ --oltp-dist-type=uniform --oltp-table-name=sbtest_1M_$n \ --max-requests=0 --max-time=$2 --mysql-socket=/SSD_raid0/mysql.sock \ --mysql-user=dim --mysql-password=dim --mysql-db=sysbench \ --mysql-table-engine=INNODB --db-driver=mysql \ --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \ --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on \ --oltp-read-only=on run > /tmp/test_$n.log &
Now, about 1M QPS result published on MariaDB 10.1 :
- if you'll look in details published in their article, instead of --oltp-point-selects=1 the --oltp-point-selects=1000 was used
- and these 1K point-selects then grouped and executed within a single transaction!.. (--oltp-skip-trx=off is used (default))
Adapting these test conditions to my Sysbench command line this will then give the following :
LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench --num-threads=$1 \ --test=oltp --oltp-table-size=1000000 \ --oltp-dist-type=uniform --oltp-table-name=sbtest_1M_$n \ --max-requests=0 --max-time=$2 --mysql-socket=/SSD_raid0/mysql.sock \ --mysql-user=dim --mysql-password=dim --mysql-db=sysbench \ --mysql-table-engine=INNODB --db-driver=mysql \ --oltp-point-selects=1000 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \ --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=off \ --oltp-read-only=on run > /tmp/test_$n.log &
How does this change the initial workload test case ?
- in one word : completely ;-)
- by grouping 1000 queries within a single transaction we're lowering a lot the whole related transaction locks..
- and as the result, an overall QPS numbers are growing! ;-)
- but did you see many real cases when a single MySQL session is grouping 1000 SELECT queries within a single transaction ?..
- I may still imagine a need to run several SELECTs within a single transaction to guarantee a read consistency...
- but such kind of queries are usually not short ones, so not really good to show a high QPS ;-)
- but well, even they are really short as point-selects.. -- what kind of application will run them by 1000 in a single shot ?..
- such kind of a "workaround" we proposed yet to our users when started to ship MySQL 5.6, but always mentioned that this is as a workaround, not a final solution (as this could help when you're really have many fast SELECTs and you can really group then within a single transaction without breaking your apps logic).. -- but as soon as potential amount of SELECTs to group is small, the transaction overhead is quickly eating your gain..
- so, the real solution here is - MySQL 5.7 ;-)
Let me show now all this by example, using the test scenario as proposed by MariaDB and just vary the amount of point-selects executed within a single transaction. Let's start with 50 point-selects (which is already too much as to me, but well) :
50 point-selects in transaction :
- MySQL 5.7 is still reaching its 1.6M QPS as well
- MariaDB 10.1 is reaching over 1.4M QPS (so, not only on Power while running within a "workaround" test conditions)..
Then the following are the results with 25, 10, and 5 point-selects executed within the same transaction :
25 point-selects in transaction :
10 point-selects in transaction :
5 point-selects in transaction :
- as you can see, less queries executed within the same single transaction - lower QPS is reached..
- so, MySQL 5.7 is going from 1.6M QPS to 1.2M QPS
- while MariaDB 10.1 is going from 1.4M QPS to just 600K QPS..
On the last test, with 5 point-selects within a transaction the lower QPS is also impacted by round-trips of BEGIN and COMMIT statements around transaction.. What the result will be if I'll not use transactions?
5 point-selects without transaction :
- as you can see, MySQL 5.7 is going back to its 1.6M QPS
- while MariaDB 10.1 is going yet more down to just slightly higher than 400K QPS only..
To get a better understanding of the transaction statements impact let's get a look on the following graph comparing MySQL 5.7 in the last test with 5 point-selects within transaction (left side) -vs- 5 point-selects without transaction (right side) :
- as you can see in overall amount of Queries/sec the left side is even higher! (BEGIN and COMMIT are also counted as queries by MySQL stats)
- and as we're wasting CPU cycles to process BEGIN and COMMIT - we're doing less SELECTs as the result..
- while on the right side (without transactions) the whole CPU time is spent for SELECTs, and we're reaching 1.6M QPS then ;-)
- it's up to you to consider whenever MariaDB 10.1 is really reaching 1M QPS, or 400K QPS only ;-))
- however, what is important for me here after all : you can really reach 1.6M QPS with MySQL 5.7 whatever the test conditions are used here ;-))
Then, to avoid any kind of speculations about new Intel chips as the reason of excellent MySQL 5.7 scalability results, let me just remind you about the results obtained on the old severs (32cores-HT and 40cores-HT), running the old Intel CPUs (similar 4CPU sockets each, but just 8cores-HT and 10cores-HT per CPU socket, both are running at 2300Mhz frequency). The following are the results obtained on the same Sysbench RO Point-Select workload :
- single point-select per iteration, no transactions
- 8 tables of 1M rows are used in the first test
- 1 table of 10M rows is used in the second test
- as you can see there is a huge gap between MySQL 5.7 and any other engine on 32, 40 and 72cores-HT HW..
- the gap on 72cores-HT HW is way bigger only because MySQL 5.7 is continuing to scale and reaching a yet more higher result, while all other engines are already reached their limits and cannot go anymore further..
- NOTE : only MySQL 5.7 is showing near the same QPS results on both 8-tables and 1-table workloads
INSTEAD OF SUMMARY :
From where is coming such a great scalability gain in MySQL 5.7 ?
- first of all it's a continuous improvement process started yet since MySQL 5.6 with kernel_mutex split + RO transactions on InnoDB side, and many various internal contentions improvements on the MySQL Server side
- then, on MySQL 5.7 InnoDB side : a whole transactions management redesign, improved lock management, redesigned index locking, etc...
- MySQL 5.7 Server side : resolved contentions around MDL, THR_lock, LOCK_grant, LOCK_plugin, and all other "next-level" locks fixes.. -- in fact for today there is no more any known/visible internal contentions in MySQL 5.7 except coming from InnoDB ;-)
- so far, all the credit is to our great MySQL Engineering Team! (all the listed stuff above is related to fundamental changes invented and implemented by our Engineering, taking us months and years of heavy work)..
and, of course, huge thanks to MySQL Community for all feedback we
have about ;-)
When MariaDB will be able to scale as far as MySQL 5.7 ?
- when it'll move to InnoDB SE from MySQL 5.7
and when on its Server side the lock contention in the "lock free"
table definition cache code will be fixed ;-)
When Percona Server will scale as far as MySQL 5.7 ?
- since Percona Server will move to the MySQL 5.7 code base
- and XtraDB moves to InnoDB code base from MySQL 5.7
not before ;-)
- or both are re:implementing all these changes in their 5.6 / 10.1 code ;-)
- And hope I don't say you here any news, because the same story was already with MySQL 5.6 too, but who cares? ;-))
Well, what to say.. -- #MySQL 5.7 rocks! ;-)
Friday, 06 November, 2015
Slides from my talk during MySQL Central @OpenWorld 2015 are available
from here now :
they should be soon available from the OpenWorld site as well.
Thursday, 05 November, 2015
This article is following the MySQL 5.7 Performance series started from 1.6M
QPS details post
Let's focus now on the Sysbench "mixed" OLTP_RO workload (so, not only Point-Selects, but all RO queries as it was initially designed in OLTP_RO scenario in Sysbench), which is composed of :
- x10 point-selects
- x1 simple-ranges
- x1 order-ranges
- x1 sum-ranges
- x1 distinct-ranges
Note about test conditions :
- in this and in the previous article we're seeking for MySQL & InnoDB scalability limits ;-)
- so, yes, the client and server are running locally on the same host (we're not testing here network stuff vendors)
- the load is not IO-bound either, very quickly the whole dataset is reaching BP and remains in memory (we're not testing IO vendors here)
- again, all the focus here is on MySQL scalability limits, nothing else..
Note about tested workloads :
- why we're paying so many attention to these "simple" queries from Sysbench ?..
- well, just consider it like the "entry ticket" for any database engine ;-)
- these simple tests in fact are extremely aggressive, and bombarding all engine internals very heavily..
- the point-select, for ex., is extremely fast and crossing on every pass the whole engine stack from entry to end point & back..
- while the distinct-ranges is extremely aggressive on malloc() scalability + in-memory temp tables management..
- etc. etc. etc..
- so, if your production workload is based on much more complex queries, and you're suspecting scalability issues -- rather to stay negative, send us your test workload and share your observations -- we cannot improve everything within a one single shot, right? -- there is still so many things to do yet..
- but I just may say you one thing : without improved scalability on "simple" loads any "complex" load scalability is just impossible.. -- so, stay tuned, we're coming ;-)
Note about the tested HW :
- I was told several times during the MySQL Central conference and other user meeting that 72cores config is waaay toooo big..
- guys, is it really big ?? ;-)
- just think that this is the same 4CPU sockets server as we used before..
- just that today's Intel chip has 18cores per socket -vs- 10cores we saw before, and the CPU chip itself was also greatly improved..
- well, if 4CPU is still toooo biiiig for you, will be 2CPU is "good enough" ?..
- but even with 2CPU you're having 36cores-HT here ;-)
- and this 2CPU 36cores-HT config is already showing a way better QPS than we observed on 4CPU 40cores-HT before !!
- what to do with this then ?? ;-)
- keeping in mind that the next Intel chip will give you 48cores-HT (24cores-HT per CPU socket)..
- so, you want it, or not -- scalability is THE MUST today to give MySQL users a full power of their HW ;-))
Well, a long story short, let's go directly to the test results.
Sysbench OLTP_RO 1M x 8-tables 72cores-HT :
- MySQL 5.7 is reaching 1M QPS on OLTP_RO !! (while seems like there is still a room for progress)..
- on the second position is MySQL 5.6, and in max-to-max comparison MySQL 5.7 is doing x2.5 times better than 5.6
- Persona Server 5.6 is taking #3position
- and, surprisingly MariaDB 10.1 here the worse, except MySQL 5.5
- Note that except MySQL 5.7, the difference between MySQL 5.5 result and other engines is not that big..
- this is because QPS gain here is coming for them mostly from "mixed" ranges queries, which was already not that bad in 5.5
- (and if you payed attention to the previous article, you may see that on point-selects MySQL 5.5 is reaching the same max QPS as all other engines, except 5.7)..
- so, yes, the huge gain you're seeing here for MySQL 5.7 is coming due all this heavy remastering started from TRX-list re-design 2 years ago and continued on all levels till 5.7 GA release..
- indeed, this was not easy, but yes, we done it !!! ;-))
What about scalability now?
MySQL 5.7 Scalability @Sysbench OLTP_RO 1M x 8-tables 72cores-HT :
MySQL 5.6 Scalability @Sysbench OLTP_RO 1M x 8-tables 72cores-HT :
MySQL 5.5 Scalability @Sysbench OLTP_RO 1M x 8-tables 72cores-HT :
Percona Server 5.6 Scalability @Sysbench OLTP_RO 1M x 8-tables 72cores-HT :
MariaDB 10.1 Scalability @Sysbench OLTP_RO 1M x 8-tables 72cores-HT :
- only MySQL 5.7 is really scaling here up to 72cores-HT
- except MySQL 5.5, all other engines are having better QPS on 32cores-HT -vs- 18cores-HT
- however, this gain is not that big.. - but MySQL 5.6 is doing better than others
- MariaDB 10.1 is the worse again, just better than the oldest MySQL 5.5..
Now the same workload, but using only a single 10M rows table.
Sysbench OLTP_RO 10M x 1-table 72cores-HT :
- MySQL 5.7 is the best again, but not yet reaching 1M QPS, a room for improvement ;-)
- MySQL 5.6 and Percona Server 5.6 on the #2 position
- and MariaDB 10.1 closing the group, just before MySQL 5.5
MySQL 5.7 Scalability @Sysbench OLTP_RO 10M x 1-table 72cores-HT :
MySQL 5.6 Scalability @Sysbench OLTP_RO 10M x 1-table 72cores-HT :
MySQL 5.5 Scalability @Sysbench OLTP_RO 10M x 1-table 72cores-HT :
Percona Server 5.6 Scalability @Sysbench OLTP_RO 10M x 1-table 72cores-HT :
MariaDB 10.1 Scalability @Sysbench OLTP_RO 10M x 1-table 72cores-HT :
- the result is very similar to 8-tables results, except that Percona Server 5.6 is very close with MySQL 5.6 here
- while MySQL 5.7 is just the best ;-)
And, similar to the previous article, here are the results reflecting the evolution of tested HW and CPU power :
MySQL 5.7 Scalability @Sysbench OLTP_RO 1M x 8-tables :
MySQL 5.7 Scalability @Sysbench OLTP_RO 10M x 1-table :
Indeed, the difference is very impressive ;-)
Over last 2 weeks in all discussions I've got many times the same question : do we really need to upgrade our HW to get the best from MySQL Performance ?...
Let me show you then just few following graphs :
here is HW evolution related to MySQL 5.5 on OLTP_RO the presented workload :
- as you can see, 5.5 was the best on 12cores-HT config on the old HW
- it was not able to show anything better on 32 or 40cores-HT configuration..
- however, moved to the newer CPU chip, even MySQL 5.5 is going x2 times faster here! ;-)
- and as you can see from the 5.5 scalability graphs, 1CPU socket is just enough here..
- and, unfortunately, 1CPU will be also your limit for MySQL 5.5 Performance ;-)
now the same for MySQL 5.6 :
- MySQL 5.6 is scaling better, and already able to show over 50% better performance on an old, but bigger HW
- but on the newer CPU it's doing yet x2 times better (even without scaling well)..
And, of course, once you'll move to MySQL 5.7, you'll be able to go yet more far on these new 2CPU or 4CPU sockets :
- no comments ;-))
And to finish, few more details about the test conditions :
my.conf -- used exactly the same config settings as in the previous article.
The Sysbench command used to run OLTP_RO test via IP port (starting 8 processes in parallel):
$ LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench --num-threads=$1 \ --test=oltp --oltp-table-size=1000000 \ --oltp-dist-type=uniform --oltp-table-name=sbtest_1M_$n \ --max-requests=0 --max-time=$2 --mysql-host=127.0.0.1 --mysql-port=5700 \ --mysql-user=dim --mysql-password=dim --mysql-db=sysbench \ --mysql-table-engine=INNODB --db-driver=mysql --oltp-skip-trx=on \ --oltp-read-only=on run > /tmp/test_$n.log &
Should I say once more again MySQL 5.7 rocks !!! ;-))
Monday, 26 October, 2015
It was exactly 2 years now since we
reached 500K QPS with MySQL 5.7 -- it was a great milestone,
and the highest ever result seen on MySQL with true SQL queries ;-)) And
this was on 32cores-HT machine.. (the linked article contains the whole
long story how we arrived to 250K QPS first, then 275K, 350K, 440K, and
finally 500K QPS) -- the main improvement here is coming from greatly
redesigned transactions and transaction list management in MySQL 5.7.
However, when the 40cores-HT server arrived to MySQL LAB we were not yet ready to show more than 500K QPS -- there were still some other contentions on external to InnoDB level.. Once these all were fixed, we finally reached 645K QPS on 40cores-HT server and pushed the limit more far yet ;-) -- this is what we were able to announce a year ago, during OOW2014.
Since then, an important work was made in MySQL 5.7 code to resolve all known contentions on the "server" side to let InnoDB run on a full speed without limits.. The work was done, but no one from our LAB servers was able to show the potential benefit of all these changes.. - in fact our servers became too old over a time, and it was no more clear what is the real limit of 5.7 and how far we can go today..
So, we started to look for our HW upgrade, and Intel new CPU chips were announced exactly on time (see http://wccftech.com/intel-unleashes-haswell-ex-xeon-e7-v3-processors-18-cores-45-mb-l3-cache-12-tb-ddr4-memory-support-57-billion-transistors/ -- sounds really good "57-billion-transistors" for MySQL 5.7 ;-)) -- and finally in collaboration with Intel we were able to get our hands on our new server with these new CPUs pretty quickly ;-)
- the server we used is exactly the same as we have with 32cores-HT and 40cores-HT
- same 4CPU sockets, etc.
- just that per CPU socket we got 18cores-HT now (instead of 8 or 10 before)
- and a slightly higher CPU frequency - 2.5Ghz instead of previous 2.3Ghz
- however, the difference is not only in the number of cores and CPU frequency ;-)
- the chip itself is really much more powerful !..
- NOTE: this CPU chips are already widely comercially available and from a long time ready for production
- (e.g. I'm not telling you here something new ;-))
Used HW config :
Intel(R) Xeon(R) CPU E7-8890 v3 (codename Haswell-EX) 4 CPU sockets x 18 cores-HT (144 CPU threads) 2.5 Ghz, 512GB RAM Linux kernel 3.16
well, a long story short -- here is the QPS graph with the results I've obtained on this machine :
So, yes, our new record is over 1.6M QPS of true SQL queries !!!!! ;-))
(same OLTP_RO Point-Selects as before, same conditions, etc.)
And this is really killing! ;-))
NOTE: from profiler stats I've monitored this is still NOT a limit !!! we're scaling on such kind of workloads perfectly well! -- the limit is just a CPU time..
NOTE: this is near x2.5 times better -vs- 40cores-HT the best 5.7 results, so it's pretty amazing to see such a huge progress on Intel chips (40cores is 2.3Ghz, 72cores is 2.5Ghz, but it's not just frequency and amount of CPU cores)...
NOTE: all other RO results are also mostly x2.5 times better ;-))
An this was yet in August 2015..
Let's see now how well can scale today other vendors on this server and the same workload.
Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :
- MySQL 5.7 is simply x3 times better than any other..
- no comments ;-)
Now, what about scalability? (each Engine is now running only on 1, then 2, then all 4CPU sockets)
MySQL 5.7 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :
MySQL 5.5 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :
MySQL 5.6 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :
Percona Server 5.6 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :
MariaDB 10.1 Scalability @Sysbench RO Point-Selects 1M x 8-tables 72cores-HT :
- MySQL 5.7 is showing a near perfect scalability up to 72cores-HT on 4CPU sockets
- no one of other engines not scaling even up to 2CPU sockets (32cores-HT)
- interesting that MySQL 5.6 and Percona 5.6 are able to out-pass 500K QPS here
- not MariaDB 10.1
Well, now what if you have a single table hot in your workload? -- this was from a long time a very painful problem in MySQL, which is finally completely resolved with MySQL 5.7 :
Sysbench RO Point-Selects 10M x 1-table 72cores-HT :
- near the same performance on MySQL 5.7 on one 10M table as on 8x 1M tables
- cannot say the same about others: something like 20% regression everywhere..
- NOTE: x8 Sysbench processes are running here in parallel to get a rid of Sysbench's own contentions.
MySQL 5.7 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :
MySQL 5.5 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :
MySQL 5.6 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :
Percona Server 5.6 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :
MariaDB 10.1 Scalability @Sysbench RO Point-Selects 10M x 1-table 72cores-HT :
No comments ;-)
Now, let's get a look on impact of this new Intel CPU..
MySQL 5.7 Scalability @Sysbench RO Point-Selects 1M x 8-tables :
- it's amazing to see how the gap is big...
- NOTE: even on 8 concurrent users the result is near x2 times better than on any other server..
- killing.. ;-)
MySQL 5.7 Scalability @Sysbench RO Point-Selects 10M x 1-table :
Well, it's really amazing to see here that on the new CPU with 2 sockets (36cores-HT) I'm way faster than on 40cores-HT (4 sockets) before.. -- seems to me we should be prepared to see a lot of changes in MySQL Performance with these CPU chips..
Few more details about the presented test :
- I've used exactly the same config params as mentioned 2 years ago (see: http://dimitrik.free.fr/blog/archives/2013/10/mysql-performance-the-road-to-500k-qps-with-mysql-57.html)
- each engine as before was tested with tuned spin wait delay= 6/ 96 (helps a lot to lower InnoDB contentions here)
- the best obtained results is used then for each load level
- exactly the same Sysbench binaries as before, same scripts, etc..
- just that the MySQL 5.7 version is the latest one ;-)
- NOTE: for my curiosity I've also tested older 5.7 binaries as well.. -- no one was able to reach 1.6M QPS, and some even not reached 1M.. -- this is just to make it clear: we're reaching 1.6M QPS not just because the HW is better ;-))
- MySQL 5.7 is really scaling much more far here (and you need the latest version)
max_connections=4000 key_buffer_size=200M low_priority_updates=1 table_open_cache = 8000 back_log=1500 query_cache_type=0 table_open_cache_instances=16 # files innodb_file_per_table innodb_log_file_size=1024M innodb_log_files_in_group = 3 innodb_open_files=4000 # buffers innodb_buffer_pool_size=32000M innodb_buffer_pool_instances=32 innodb_additional_mem_pool_size=20M innodb_log_buffer_size=64M join_buffer_size=32K sort_buffer_size=32K # innodb innodb_checksums=0 innodb_doublewrite=0 innodb_support_xa=0 innodb_thread_concurrency=0 innodb_flush_log_at_trx_commit=2 innodb_max_dirty_pages_pct=50 innodb_use_native_aio=1 innodb_stats_persistent = 1 innodb_spin_wait_delay= 6 / 96 # perf special innodb_adaptive_flushing = 1 innodb_flush_neighbors = 0 innodb_read_io_threads = 4 innodb_write_io_threads = 4 innodb_io_capacity = 4000 innodb_purge_threads=1 innodb_adaptive_hash_index=0 # monitoring innodb_monitor_enable = '%' performance_schema=OFF
in case you need it, Linux Sysbench binaries available from here :
NOTE: interesting that on this server using different Sysbench binaries did not show any difference, while using UNIX socket is still faster -vs- IP port for this workload..
The Sysbench command used to run Point-Selects test via UNIX socket (starting 8 processes in parallel):
LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench --num-threads=$1 --test=oltp --oltp-table-size=1000000 \ --oltp-dist-type=uniform --oltp-table-name=sbtest_1M_$n \ --max-requests=0 --max-time=$2 --mysql-socket=/SSD_raid0/mysql.sock \ --mysql-user=dim --mysql-password=dim --mysql-db=sysbench \ --mysql-table-engine=INNODB --db-driver=mysql \ --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \ --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on \ --oltp-read-only=on run > /tmp/test_$n.log &
The Sysbench command used to run Point-Selects test via IP port (starting 8 processes in parallel):
LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench --num-threads=$1 --test=oltp --oltp-table-size=1000000 \ --oltp-dist-type=uniform --oltp-table-name=sbtest_1M_$n \ --max-requests=0 --max-time=$2 --mysql-host=127.0.0.1 --mysql-port=5700 \ --mysql-user=dim --mysql-password=dim --mysql-db=sysbench \ --mysql-table-engine=INNODB --db-driver=mysql \ --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \ --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on \ --oltp-read-only=on run > /tmp/test_$n.log &
What else to say? Still hard to believe, but seems like we're entering completely new levels in MySQL Performance...
Kudos MySQL & Intel Teams !!! ;-)
So far, I have much more yet details to share with all other testings already previewed on http://www.mysql.com/why-mysql/benchmarks/ as well, hope to publish them very soon, stay tuned ;-))
NOTE: if you're looking for some live news
- come to reach me during my tomorrow's talk on MySQL Connect @OOW2015 about MySQL Performance ("MySQL Performance: Demystified Tuning and Best Practices", Moscone South—262, 10/27/15, 17:15 - 18:00) [Slides: PDF ]
- or meet all of us during BOF "Meet MySQL Team" or MySQL Community Reception (Tuesday, Oct 27 @ 7:00 PM - 9:00 PM | Jillian's at Metreon) just after my talk ;-))
Monday, 21 September, 2015
I may only admit that #PerconaLive Conference becomes only better and
better year to year. And I'm happy to be there and have an occasion to
tell you all the latest news about MySQL 5.7 Performance as it for
And of course, if we're speaking about benchmarks, I'll not speak only about MySQL 5.7, I'll not miss either Percona Server 5.6 and MariaDB 10.1 ;-) -- hope to publish all these results later, over a time.. I've realized that I'm blogging less this year -- most of the time is going today to dig the performance problems we're hitting and looking for real solutions for them. Hope to get both fixed soon ;-) While during the conference, please, don't hesitate to ask questions and report your problems -- many persons from #MySQLTeam will be there, so be sure you'll be listened! And if you will don't miss their talks as well - this way you'll don't miss them for sure ;-)
Well, time to leave a sunny Paris..
And join a sunny Amsterdam ;-)
But who cares about weather ? ;-)
Day #1 is ready to start..
My talk is on Wednesday, just before a lunch. So, please, come hungry (for questions and sharing I mean ;-))
UPDATE : the slides from my talk are now available from here :
Tuesday, 14 April, 2015
MySQL 5.7-RC1 is available now since the last week and you may find a lot about all the new improvement coming from our Team Blog and many other blog posts on this site as well. While in this article I'll just mention that with MySQL 5.7 all scalability limits are going more and more far! -- it's not that we resolved all of them, no, just that we become better and better ;-)
And one of the huge win coming now with MySQL 5.7-RC1 is that for the
first time ever we're getting the same performance levels on a single
hot table as on several tables used in parallel!
Here is the Max QPS result on the Sysbench RO Point-Select workload with 8-tables running on 40cores-HT server :
And here is the same load, but bombarding only one single table :
Now a full Sysbench OLTP_RO workload on 8-tables :
And on 1-table :
The story is pretty fun here, and I'll tell it in its full version (and many other "fun" stories) this afternoon (1:20pm) during my talk at Percona Live'15:
UPDATE: Slides are available from here: http://dimitrik.free.fr/Presentations/MySQL_Perf-Benchmarks-PLive_2015-dim.pdf
What to add.. - MySQL 5.7 really rocks!! kudos MySQL Team! ;-)
Tuesday, 17 February, 2015
There were so many valuable articles already written by others over past
years explaining all details about InnoDB transaction isolation modes
and how to deal with this. So, I'll avoid to repeat what was already
said ;-) -- my attention attracted the performance study made by PeterZ
and published in the following article: http://www.percona.com/blog/2015/01/14/mysql-performance-implications-of-innodb-isolation-modes/
-- the article is very good and providing a good analyze of the observed
problem which is solved by using READ-COMMITTED transaction isolation
instead of REPEATABLE-READ (which is default in InnoDB).. The natural
question is coming then: why don't we have then the READ-COMMITTED mode
by default?.. Is there any danger?..
Let's then investigate together..
First of all, you should keep in your mind not only the theory, but also a way in which all this stuff is implemented within InnoDB :
- transaction isolation / MVCC in InnoDB is implemented via ReadViews
- every time a ReadView is created, a mutex (trx_sys) should be acquired
- in REPEATABLE-READ mode a ReadView is created on transaction start
- in READ-COMMITTED mode a ReadView is created on every statement(!)
- means, if your statements are short -- you may hit a storm on trx_sys mutex contention in your workload..
Of course, all depends on a workload, and if you're lucky, you will probably see only a benefit here (all is possible, right? - at least in theory)..
Let me show you now some cases where you'll feel yourself much less lucky ;-)
For my test cases I'll use :
- 40cores-HT server
- running OEL 6.5 2.6.32-504 kernel
- extremely fast flash storage (Fusion-io ioMemory)
- each workload is using 64 concurrent users
3 test cases executed on each workload :
- REPEATABLE-READ mode is configured (RR)
- READ-COMMITTED mode is configured (RC)
- READ-UNCOMMITTED mode is configured (RU)
DBT-2 500W Workload (TPC-C) :
- you may already observe here a slightly lower TPS on both 2nd (RC) and 3rd (RU) test cases comparing to the first one (RR)
- the "regression" is not very big, but notable
- let's get a look now on internal lock contentions within InnoDB..
- a jumping contention on trx_sys mutex is very well seen for RC and RU tests
- however it's not yet too big to make a serious damage..
Now, let's move to a heavy Sysbench OLTP_RW -- I've slightly changed the "classic" OLTP_RW here by adding a Read/Write ratio in my tests :
- initially the load is starting with 128:1 ratio (mostly Reads, less Writes)
- then 16:1
- then 4:1
- then 2:1
- and finally 1:1
I'm using this test case also to evaluate the impact of writes in transactions, etc..
Sysbench OLTP_RW 32x10M-tables, rw128/16/4/2/1 :
- here the impact is more then just notable ;-)
- and this is only due trx_sys mutex contention? or something else?..
- oh, indeed, trx_sys is jumping too high now!..
- and could it be even more worse??
let's see ;-)
The next workload has a code name "OLTP_RW-p_sel10" - all reads in this test are replaced by 10 point-selects, that's all, making the load much more aggressive on writes and short and fast on reads :
Sysbench OLTP_RW-p_sel10 32x10M-tables, rw128/16/4/2/1 :
- indeed, seeing a x2 time worse performance is really killing..
- and still due trx_sys mutex??
no comments ;-))
Well, you may still say that it's just because this server is too big and that's why I'm observing all these contentions, and you'll be not far from the reality -- on smaller machines all these contentions are, of course, lower - but! still "notable" ;-))
The same OLTP_RW-p_sel10, but on 20cores-HT :
(while many x2 CPU Intel machines today are having more than 32cores-HT in total, so a "small" HW becomes a big one ;-))
INSTEAD OF SUMMARY :
- so, what should we finally conclude from all this presented stuff???..
- PeterZ told us a so nice story, and now you're coming with your b*** and showing that PeterZ was wrong...
- Stop, guys, PeterZ was not wrong!!! ;-)
- ?? -- so, you're lying ??????
- and I'm not lying either ;-))
- well, what you should keep in mind is that there is no a "silver bullet" and the most universal answer in most of the cases will be "it depends" ;-))
- and with InnoDB Transaction Isolation is the same story here!
THE GENERAL RULE could be the following :
- if your queries and transactions are short : use rather the default REPEATABLE-READ mode!
- if your queries are long and reading a lot of data which are likely to be modified by other transactions in parallel : then use the READ-COMMITTED mode - this will allow you to read already committed changes while your query is progressing and avoid to be lost in scanning of the old page images instead (as PeterZ well showed you in his story ;-))
- Repeatable Read Isolation Level in InnoDB - How Consistent Read View Works
- Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels
- The basics of the InnoDB undo logging and history system
- If Eventual Consistency Seems Hard, Wait Till You Try MVCC
- SET TRANSACTION Syntax
As usual, any comments are welcome!