« Linux | Main | news »

Tuesday, 17 February, 2015

MySQL Performance : Impact of InnoDB Transaction Isolation Modes in MySQL 5.7

There were so many valuable articles already written by others over past years explaining all details about InnoDB transaction isolation modes and how to deal with this. So, I'll avoid to repeat what was already said ;-) -- my attention attracted the performance study made by PeterZ and published in the following article: http://www.percona.com/blog/2015/01/14/mysql-performance-implications-of-innodb-isolation-modes/ -- the article is very good and providing a good analyze of the observed problem which is solved by using READ-COMMITTED transaction isolation instead of REPEATABLE-READ (which is default in InnoDB).. The natural question is coming then: why don't we have then the READ-COMMITTED mode by default?.. Is there any danger?..

Let's then investigate together..

First of all, you should keep in your mind not only the theory, but also a way in which all this stuff is implemented within InnoDB :

  • transaction isolation / MVCC in InnoDB is implemented via ReadViews
  • every time a ReadView is created, a mutex (trx_sys) should be acquired
  • in REPEATABLE-READ mode a ReadView is created on transaction start
  • in READ-COMMITTED mode a ReadView is created on every statement(!)
  • means, if your statements are short -- you may hit a storm on trx_sys mutex contention in your workload..

Of course, all depends on a workload, and if you're lucky, you will probably see only a benefit here (all is possible, right? - at least in theory)..

Let me show you now some cases where you'll feel yourself much less lucky ;-)

For my test cases I'll use :

  • 40cores-HT server
  • running OEL 6.5 2.6.32-504 kernel
  • extremely fast flash storage (Fusion-io ioMemory)
  • each workload is using 64 concurrent users
  • 3 test cases executed on each workload :
    • REPEATABLE-READ mode is configured (RR)
    • READ-COMMITTED mode is configured (RC)
    • READ-UNCOMMITTED mode is configured (RU)

DBT-2 500W Workload (TPC-C) :

Observations :
  • you may already observe here a slightly lower TPS on both 2nd (RC) and 3rd (RU) test cases comparing to the first one (RR)
  • the "regression" is not very big, but notable
  • let's get a look now on internal lock contentions within InnoDB..
Lock contentions :

Observations :
  • a jumping contention on trx_sys mutex is very well seen for RC and RU tests
  • however it's not yet too big to make a serious damage..

Now, let's move to a heavy Sysbench OLTP_RW -- I've slightly changed the "classic" OLTP_RW here by adding a Read/Write ratio in my tests :
  • initially the load is starting with 128:1 ratio (mostly Reads, less Writes)
  • then 16:1
  • then 4:1
  • then 2:1
  • and finally 1:1

I'm using this test case also to evaluate the impact of writes in transactions, etc..

So far:

Sysbench OLTP_RW 32x10M-tables, rw128/16/4/2/1 :

Observations :
  • here the impact is more then just notable ;-)
  • and this is only due trx_sys mutex contention? or something else?..
Lock contentions :

Observations :
  • oh, indeed, trx_sys is jumping too high now!..
  • and could it be even more worse??

let's see ;-)

The next workload has a code name "OLTP_RW-p_sel10" - all reads in this test are replaced by 10 point-selects, that's all, making the load much more aggressive on writes and short and fast on reads :

Sysbench OLTP_RW-p_sel10 32x10M-tables, rw128/16/4/2/1 :

Observations :
  • indeed, seeing a x2 time worse performance is really killing..
  • and still due trx_sys mutex??
Lock contentions :

no comments ;-))

Well, you may still say that it's just because this server is too big and that's why I'm observing all these contentions, and you'll be not far from the reality -- on smaller machines all these contentions are, of course, lower - but! still "notable" ;-))

The same OLTP_RW-p_sel10, but on 20cores-HT :

(while many x2 CPU Intel machines today are having more than 32cores-HT in total, so a "small" HW becomes a big one ;-))

  • so, what should we finally conclude from all this presented stuff???..
  • PeterZ told us a so nice story, and now you're coming with your b*** and showing that PeterZ was wrong...
  • Stop, guys, PeterZ was not wrong!!! ;-)
  • ?? -- so, you're lying ??????
  • and I'm not lying either ;-))
  • ???.....
  • well, what you should keep in mind is that there is no a "silver bullet" and the most universal answer in most of the cases will be "it depends" ;-))
  • and with InnoDB Transaction Isolation is the same story here!
  • THE GENERAL RULE could be the following :
    • if your queries and transactions are short : use rather the default REPEATABLE-READ mode!
    • if your queries are long and reading a lot of data which are likely to be modified by other transactions in parallel : then use the READ-COMMITTED mode - this will allow you to read already committed changes while your query is progressing and avoid to be lost in scanning of the old page images instead (as PeterZ well showed you in his story ;-))
useful links for more info :
As usual, any comments are welcome!

Posted by Dimitri at 23:50 - Comments...
Categories: MySQL

Monday, 17 November, 2014

MySQL Performance: 5.7 and RDS Aurora, so what?.. ;-)

It was very fun for me to read last week the announces about RDS Aurora - specially the parts related to its performance gain over MySQL: Aurora is claimed to show x5 times better performance than MySQL! However, without publishing any details about ;-) -- and the only details I was able to find until now and group together are the following:

  • Aurora is a proprietary closed source database engine, "compatible" with MySQL (so, not an improved MySQL fork, as many expected..)
  • Sysbench workloads were used to evaluate Aurora performance
  • the only currently published Sysbench results are the following:
    • 500K SELECT/sec
    • 100K UPDATE/sec
  • again, no details about any of the tests..
  • in some articles instead of UPDATE/sec performance was mentioned INSERT/sec, but keeping in mind that there is simply no pure INSERT test available by default within Sysbench workloads, I may only suppose the mentioned 100K writes/sec are corresponding to the UPDATE/sec performance..

Fine then. But, so what?..

Let's see now where we are with MySQL Performance for today?.. ;-))

Yet more than a year ago we already announced 500K SELECT/sec with MySQL 5.7 on Sysbench (and you may read a full story about, if you got it missed). And this year during MySQL Central @OpenWorld we already presented 645K SELECT/sec with the latest MySQL 5.7! - here are the graphs taken from a live load just this morning :
  • the first graphs is showing the SELECT/sec level
  • the second graph - the corresponding amount of concurrent users (sessions) starting with 8, then 16, 32, 64, 128, 256 users

Sysbench SELECT Performance @MySQL 5.7 :

This is a single MySQL instance running on a single HW server. And as you see, we're reaching 645K SELECT/sec since 128 concurrent users. Is the 645K QPS level is the max limit for this server?.. - of course not, it should be possible to do yet more better here by improving our SQL layer, because on the same HW we're able to reach 1M QPS via InnoDB Memcached plugin (where all the data are going the same way directly from InnoDB, but bypassing all the SQL layers).. Work in progress ;-))

Then, what about UPDATE/sec performance?..

Again, a single HW server, single MySQL 5.7 instance, few simple SSD disks (even not a super fast flash storage as we have from Fusion-io and LSI in our LAB) :
  • the first graphs is showing the UPDATE/sec level
  • the second graph - the corresponding amount of concurrent users (sessions) starting with 8, then 16, 32, 64, 128, 256 users

Sysbench UPDATE Performance @MySQL 5.7 :

As you can see, we're doing even slightly more than 100K UPDATE/sec here ;-)

And you may ask then: Why MySQL Team @Oracle did not publish such a great result on UPDATE performance?.. - Well, just because there is nothing to be proud of for the moment.. ;-)) Of course, if you'll run the same workload on HDD storage you'll get much more worse results (for ex. on this server with HDD I have only 2K UPDATE/sec).. - so, for sure, SSD is much more efficient for random I/O writes. However, we're not doing better when using a much more faster flash storage than SSD, and this is our main headache today for all IO-bound workloads ;-) -- the limit today is still in the MySQL/InnoDB code itself.. - and we know where, and we're working hard to get it fixed.. And once this stuff will be improved, then we'll have something to be proud of, and then you'll hear about us for sure ;-))

Now, looking back to RDS Aurora results on Sysbench.. - so, what?.. MySQL 5.7 is already doing better! ;-)

Well, still looking for claimed x5 times performance difference..

(NOTE: if you have any troubles to reach over 500K SELECT/sec or 100K UPDATE/sec on MySQL 5.7, I have a webinar right this Wednesday to tell you all about MySQL/InnoDB internals and related tuning, don't hesitate to join: http://www.mysql.com/news-and-events/web-seminars/tuning-and-best-practices-for-developers-and-dbas/)

Posted by Dimitri at 16:40 - Comments...
Categories: MySQL

Wednesday, 29 October, 2014

Speaking about MySQL 5.7 Performance @Percona Live London 2014

Percona Live London 2014 is starting next week and I'm happy to speak there about MySQL 5.7 Performance, Scalability & Benchmarks latest news. Indeed, MySQL 5.7 has a lot of positive changes and impressive improvements. And, looking on MySQL 5.7 Performance, there are many cases where the gap comparing to MySQL 5.6 today is simply amazing! ;-) -- but, of course, there are still some pending issues which are requiring yet more work to reach the expected speed-up.. So, I'll tell you all about the most hot stories around, while I'm also always curious about your issues you're hitting on your production systems. Please, don't hesitate to share them - this is helping us to make MySQL yet more better! ;-)

UPDATE - my slides are here : http://dimitrik.free.fr/Presentations/MySQL_Perf-Benchmarks-PLUK_2014-dim-key.pdf 


Posted by Dimitri at 23:40 - Comments...
Categories: MySQL, news, secondary

Wednesday, 08 October, 2014

Indeed, MySQL 5.7 rocks : OLTP_RO/RW 1-table Benchmarks

This is the next part of the stories about MySQL 5.7 Performance..

So far, the previous story was about reaching 645K QPS with SQL queries, while in reality it's only a half of the full story ;-) -- because when last year we've reached 500K QPS due a huge improvement on the TRX-list code, the same improvement made a negative impact on the all single-table test workloads..

What happened finally :

  • the new code changes dramatically lowered contention on TRX-list (trx_sys mutex)
  • which is made MDL related locking much more hot..
  • and if one table becomes hot on a workload, MDL lock contention then is hitting its highest level..

So far, it was clear that MDL is needed a fix. Specially seeing that on 8-tables workload we're reaching 645K QPS. However there was a dilemma: should we push the TRX-list change to the 5.7 code, or wait first for MDL improvement?.. -- and we finally decided to push the changes, even if for some period of time we'll need to accept a regression on all single-table workload..

So, what we got the last year :
  • 500K QPS on 8-tables point-select
  • less than 200K QPS on 1-table.. - which was even worse than in MySQL 5.6 ;-)

It was clear, we have to fix MDL code asap, but the MDL story was not that simple either :
  • Dmitry Lenev made a quick dirty patch just removing completely all MDL related code from the source tree..
  • so, of course, we all expected to see a huge QPS jump after that on this experimental code, right? ;-)
  • however, for our big surprise, QPS just remained on exactly the same level...
  • what is going odd?..
  • in fact instead of the MDL contention we moved to the THR_lock mutex contention!
  • what is interesting that until the MDL code is present, we don't see any sign of THR_lock ;-)
  • and it's only since MDL is no more here -- THR_lock is firing!..
  • finally Dmitry made another dirty patch removing all THR_lock related code too..
  • and bingo! - without MDL & THR_lock code we doubled QPS at once, and it became clear that both contentions should be fixed to bring a speed-up on single-table workloads...

All my kudos here to Dmitry Lenev, who worked very hard to find the most optimal solution to fix both problems. There were several prototypes, less or more successful, until the final solution out-passed all our expectations -- just because it provided us the same QPS level as the binary which has no MDL nor THR_lock code at all!!! - and this is awesome! ;-))

I was happy to follow all this work very closely and play with each new update. Let me show you now just a short "making off" which is representing pretty well a summary of the one step in the progress in this work:

Observations :
  • there are 4 tests running the same OLTP_RO point-select workload on 8/16/32..1024 users :
    • #1 is using the original 5.7 code
    • #2 is using #1 + MDL fix
    • #3 is using #2 + removed THR_lock code
    • #4 is using #1 + removed MDL code + removed THR_lock code

  • as you can see, the original code (#1) is suffering of MDL rwlock contention and not out-passing 200K QPS..
  • the MDL fix in #2 is helping little bit to reach more than 200K QPS, but then the THR_lock mutex contention is killing an overall performance..
  • by removing completely the THR_lock code (#3) we can see the potential level of performance we should have if THR_lock contention was also fixed
  • then in #4 we can see the QPS level on 5.7 if there was no MDL & THR_lock code at all..
  • and what remarkable here that #3 and #4 are showing exactly the same performance!
  • so, the MDL fix worked extremely well to run as efficiently as if MDL code was removed ;-)
  • the next step was to do the same with THR_lock to get the solution simply perfect! ;-)
  • and it's exactly what was done finally in 5.7 and can be seen now in DMR5..

May only say kudos MySQL Runtime Team! kudos Dmitry Lenev!!! ;-))

Let me show you now the final impact of all these changes (and keep in mind that just a year ago the MySQL 5.7 results here were worse on all the following tests comparing to MySQL 5.6)..

Sysbench OLTP_RO 1-table 10M :

Sysbench OLTP_RO Point-Selects 1-table 10M :

Sysbench OLTP_RO Simple-Ranges 1-table 10M :

Sysbench OLTP_RW 1-table 10M trx2 :

Sysbench OLTP_RW 1-table 10M trx1 :

As you can see, the same performance improvement we can see on OLTP_RW workloads as well, where :
  • trx2 : means innodb_flush_log_at_trx_commit=2
  • trx1 : means innodb_flush_log_at_trx_commit=1
More about RW workloads in the next articles (and about "MySQL-5.7-futexV3" too ;-))

to be continued..
Posted by Dimitri at 11:22 - Comments...
Categories: MySQL

Tuesday, 07 October, 2014

MySQL Central @OpenWorld 2014 was simply great!

And I'm not saying this because I'm MySQL/Oracle employee ;-)) - just expressing my personal feeling while flying back to Paris..

I really enjoyed this conference now much more than all previous years :

  • first of all having 5 days instead of 2 allows to have much more networking than before (and usually it's the main goal for every conference to create a lot of informal discussions and exchange on opinions)..

  • then, the content - all excellent, no words.. there was so much new stuff to discover and learn from other experiences, that I'm pretty sure that each person at one moment during the conference should feel a kind of "brain overflow" ;-) and what is good that there were talks for any levels of skills -- so, it was very easy to progress on understanding of each topic quickly and smoothly..

  • organization was just perfect : all in one place, no more running over different floors/buildings, etc. ;-) having 30min breaks between talks is really good as it allows to have an exchange related to presented topics without impacting timing of the next talk / session (while we could probably finish little bit later as well to have yet more talks?)..

  • MySQL Customer Advisory Forum (CAF) - yet bigger and very well organized as usually (while I'm not sure if all customers / users are aware about this great possibility to discuss directly with MySQL engineering, express all your wishes & issues, get the info from the source about what is new, what is coming, what is planned, etc.) -- well, I know that many regretted on the last minute to have missed subscription, so keep in mind this for the next year ;-))

  • MySQL Reception - really good! specially that doors were open to anyone, including persons who are not attending the conference -- a true MySQL Community event!

Few wishes anyway for the next year :
  • would be great to have MySQL posters around the place where conference is going ;-)

  • would be great to have a kind of "closing conference" keynote(s), etc. to make a summary, to thank all attendees & organizers, to have some fun instead just see doors closing and people leaving..

But well, all-in-all - simply great MySQL Conference, which may become only better over a time! so, book your agenda ahead for the next year -- pretty sure it'll be yet more fun ;-)

Bye-Bye, SFO...

Posted by Dimitri at 10:52 - Comments...
Categories: MySQL, news, secondary

My Presentation Slides from MySQL Central @OpenWorld 2014

As promised, here are my presentation slides from my talks on MySQL Central @OpenWord 2014 :

  • MySQL 5.7 Benchmarks [PDF]
  • MySQL Performance : Demystified Tuning & Best Practices [PDF]

Feel free to ping me if you have any questions, while I'll also try to cover some of explained topics within the next blog posts, so stay tuned ;-)


Posted by Dimitri at 10:40 - Comments...
Categories: MySQL, news, secondary

Monday, 29 September, 2014

Indeed, MySQL 5.7 rocks : OLTP_RO Point-Selects 8-tables Benchmark

Indeed, MySQL 5.7 rocks ;-)

This is the part1 of the following blog posts about various benchmark results on MySQL 5.7 - and this particular one is dedicated to the Sysbench OLTP_RO Point-Select 8-tables workload.

We've already published the over 500K QPS on SQL Point-Selects before @32cores-HT server, so may reconfirm it again with MySQL 5.7 DMR5 -vs- other engines :

and now on a similar server, but with 40cores-HT we're able to confirm 645 QPS on the same workload!

If you missed the long story about how we arrived on such a performance level and how to reproduce the test - you may find all here. This workload is the most killing from all Read-Only Sysbench OLTP tests.. And it's really for the first time we started to scale here with MySQL! - look here (all results are from the 40cores-HT server) :

MySQL 5.5 :

MySQL 5.6 :

MySQL 5.7 :

Percona 5.6 :

MariaDB 10.1 :

As you can see, only MySQL 5.7 rocks here ;-)

However, if you remember the full story, this workload is extremely depending on every extra bits and even insignificant overhead.. - personally it's very rare that I saw a RDBMS workload to be so close to HPC problematic ;-)) -- to remind you that to reach 500K QPS (and 645K QPS here) was necessary to use UNIX socket instead of IP socket and an older Sysbench version binary (v0.4.8) which has just little bit less instructions than the v0.4.13 I'm currently use and will be able to get only 575K QPS on this server..

will tell you yet more during my talk on Wednesday, 1/Oct.

to be continued..

Posted by Dimitri at 23:33 - Comments...
Categories: MySQL

Sunday, 28 September, 2014

MySQL Central @ OpenWorld 2014 : Speaking about MySQL Performance

Oracle OpenWorld 2014 is starting today, and MySQL Central Conference since this year is taking a full part of it. I'm very happy to be there and have 2 talks :

so I'll be able to say more this time and mostly focus on the Benchmarks results & analyzes on the first talk, and then mainly focus on the tuning and the latest MySQL 5.7 design improvements to allow you to reach yet more higher & stable performance levels.

I have several "fun" stories to tell you around the latest findings around MySQL 5.7 Performance we made, and problems we've solved (or newly found ;-)) Don't know if I'll have a time to blog about all of this one day.. - while telling you these stories live during the conference is much more easier, so hope to keep you awake and have fun ;-) Indeed, there is no doubt, MySQL 5.7 rocks and it will be the next the best MySQL release ever.. - and I'll tell you why ;-)

Then, once again, I'm regretting I cannot clone myself to attend all the conference talks in parallel.. - there are so many valuable presentations in the program that it's really hard to decide where to go.. I may only advice you to attend all of them (you and your clones ;-)) -- this excellent Agenda Summary is telling all..

And don't miss the MySQL Community Reception @OOW 30/Sep. 7PM-9PM (you need just to register yourself via this link to help organizers with logistics ;-))

see you there!

Posted by Dimitri at 23:50 - Comments...
Categories: MySQL, news, secondary

Friday, 19 September, 2014

MySQL Performance: More in depth with LinkBench Workload on MySQL 5.7, MariaDB 10.1 and also Percona Server 5.6

This is the next chapter of the story started in my previous article and related to the updated results on LinkBench workload published by MariaDB..

Keeping in mind that the obtained results are completely opposite from both sides, I've started to investigate then the same LinkBench-150GB 64 concurrent users workload from the "most favorable" possible test conditions on the same 40cores-HT server as in my previous article:

  • InnoDB Buffer Pool (BP) = 150G (so, all the data may remain in memory)
  • innodb_flush_log_at_trx_commit=2 (so, REDO write are not flushed once per second and not on every transaction)
  • no checksum, no double-write, no compression, no atomic IO, etc..
  • just a base-to-base comparison..

And the probe result was the following, with a still better TPS on MySQL 5.7 (left) than on MariaDB 10.1 (right) :

NOTE: I'm obtaining my TPS numbers from reported by MySQL server its Commit/sec rate. I did not use for that the results from the LinkBench log file.. From what I can read in the log file, my TPS x 3 is less or more equal to the level of operations/sec reported by LinkBench, but I'm not looking for the moment for the best result in absolute, but rather a difference in performance between MySQL and MariaDB engines..

But well, my surprises started then come one ofter one ;-)
  • first, discussing details with Fusion-io Team, I've discovered that my flash card is not supporting atomic IO ;-)
  • so, all the previous tests with atomic IO on MariaDB you may ignore (and indeed, this explains also why the results are exactly the same with and without atomic IO ;-))
  • then I discovered that 32cores reported in MariaDB config testing are in fact 16cores-HT (16cores with hyper-threading, 2 threads per core).. - and this could play a significant role as having less CPU cores may lower an overall concurrency during processing, and as the result - a lower lock contention.. - but is it so? - have to check..
  • but the most killing one was the last one: from Jan's comments I understood that MariaDB 10.1 is using XtraDB by default, and even this version of XtraDB code was modified to have some features from MariaDB's own changes on InnoDB, and InnoDB engine should be used than as plugin (while I was pretty sure I'm using their InnoDB code as all their new options were accepted)..

So far, I've then replayed the same test again, but within the following conditions:
  • MySQL server is bound to 20cores-HT only
  • MariaDB is using its own InnoDB plugin
  • EXT4 is used as filesystem to avoid any other potential surprises ;-)
  • 1h or 1.5h test duration to avoid doubts about dramatically dropping TPS levels..

The result was very surprising, and I've finally added also Percona Server 5.6 into the game to clarify the things ;-)

And here is what I've observed, from left to right :
  • MySQL 5.7
  • MariaDB 10.1 with XtraDB (default)
  • MariaDB 10.1 with InnoDB (plugin)
  • Percona Server 5.6

Observations :
  • indeed, MariaDB with its InnoDB plugin is performing better then with its XtraDB which is used by default..
  • on the same time Percona Server 5.6 is preforming way better than MariaDB with its XtraDB..
  • so, I may just suspect that there are some problems in MariaDB with XtraDB integrations..
  • while Percona Server is doing better than MariaDB with any engine
  • and MySQL 5.7 is doing better than anyone ;-)

What about lock contentions?

The picture is completely different here too:

Observations :
  • for my big surprise, the index lock contention is going down on both MariaDB and Percona Server
  • log_sys mutex contention is dominating for all..

So far, let's see if now running on full 40cores we have a better performance?

From left to right, the same test is executed on 20cores-HT and then on 40cores-HT on:
  • MySQL 5.7
  • MariaDB 10.1 with InnoDB (plugin)
  • Percona Server 5.6

Observations :
  • surprisingly a very small gain on MySQL 5.7 on move from 20 to 40cores..
  • near no gain at all on MariaDB..
  • and yet more surprising - a regression on Percona Server..
  • indeed, there is something yet more to investigate as there is no reason to not win at least 50% in TPS on 40cores with 64 concurrent users..
  • NOTE: I'm not blaming LinkBench here, as it's very possible that the show-stopper is inside of the database code rather in the test code itself..

And hope an analyze of the lock contentions will say us little bit more:

Observations :
  • as you can see, the log_sys contention is growing on the move from 20 to 40cores on all engines
  • while Percona Server is also hitting a growing lock_sys time..

But if this is only about the locking, then probably we can improve the thing little bit by tuning the InnoDB spin wait delay?

The same test on MySQL 5.7, but with progressively increased spin wait delay :
  • innodb_spin_wait_delay=6 (default)
  • innodb_spin_wait_delay= 12, 24, 48, 96, 192, ...

Observations :
  • the best TPS level is reached with spin delay=48 : 35K TPS instead of the initial 30K ;-)
  • on 96 TPS is less or more the same as with 48, but on 192 performance is going down..

And it's well seen on the graph with contentions :

Now, let's replay the same test on the same 40cores-HT config, but with applied innodb_spin_wait_delay=48 on all engines and adding one more for REDO log flushing:
  • the first test for each engine with innodb_flush_log_at_trx_commit=2 (trx2 : REDO fsync once per second)
  • and the second one with innodb_flush_log_at_trx_commit=1 (trx1 : REDO fsync on every transaction)

Observations :
  • there are several surprises here..
  • MySQL 5.7 : 35K TPS on trx2, but only 24.5K TPS on trx1..
  • MariaDB 10.1 : 28K TPS on trx2, and 24.5K TPS on trx1..
  • Percona 5.6 : only 27K TPS on trx2, but 28K TPS on trx1! ;-))
  • well, if on MariaDB the gap between trx2 and trx1 results is as expected (redo writes fsync() on every transaction with trx1 cannot be free)..
  • while on MySQL 5.7 it looks rather as regression (should be fixed)
  • and on Percona Server there is rather a regression on trx2 ;-)

Then, to understand why all this happens you have just to look on the corresponding mutex contentions graph reported by PFS:

However, the most surprising for me here the drop on MySQL 5.7 from 35K TPS to 24.5K TPS just because we switched from innodb_flush_log_at_trx_commit=2 to innodb_flush_log_at_trx_commit=1.. - I'd not expect such a big regression just because of REDO fsync, specially when we're using an extremely fast flash storage.. And indeed, the story will likely change for 5.7 very quickly -- the following graph representing the result on exactly the same workload, but on the next-step-dev 5.7 version (after DMR5), get a look on this:

While the result with innodb_flush_log_at_trx_commit=2 is slightly lower than 35K TPS (need some more tuning/testing to find an optimal internals setting), but with innodb_flush_log_at_trx_commit=1 the result is even little bit bigger than 35K TPS!!!

All this means that in MySQL 5.7 we may be potentially not too far from having near the same level of performance for innodb_flush_log_at_trx_commit=1 setting as for innodb_flush_log_at_trx_commit=2 whenever it's possible.. - stay tuned! ;-)

Ok, now what will happen if the Buffer Pool size will be reduced to 75GB ? (instead of 150GB)...

The same test but with BP size=75G :

Observations :
  • there is not too much difference anymore between trx2 and trx1 on all engines..
  • a huge wait time part now is attributed to the IO reads..
  • all engines are starting from the less or more the same TPS level on the beginning (except MySQL 5.7 which is starting from a higher TPS)
  • but then over a time starting to decrease...
  • MySQL 5.7 is loosing less than any others, and remains the best from all engines (and if one day I'll have a patience to run such a test for 24h I'll do to see the end of the graph ;-))
However, it's easy to understand why MariaDB and Percona Server are regressing over a time.. -- if you'll look on the lock wait graphs you'll see a constantly increasing wait time on the index lock.. - but no more for MySQL 5.7 ;-)

Any difference to expect if only 20cores will be user for database server?

The same test, but on 20cores-HT :

On 20cores-HT the overall result just little bit slower, but very similar to 40cores-HT..

And contentions levels are looking very similar as well, including growing over a time the index lock waits on MariaDB and Percona Server:

Instead of SUMMARY :
  • indeed, the plugin InnoDB in MariaDB 10.1 seems to run better than their default XtraDB..
  • however, it looks like there is just something wrong with XtraDB integration in MariaDB, because Percona Server itself is showing the same or a better result than MariaDB running with its own InnoDB plugin..
  • as well I did not see any single point where MariaDB could be better on base-to-base conditions comparing to MySQL 5.7
  • further investigation will come later once I'll have in my server the Fusion-io flash card supporting atomic IO and all the latest NVMFS stuff..
  • more code changes in MySQL 5.7 are in pipe to better analyze these strange scalability issues and more higher and bigger database workloads.. - note: we're yet far here from reaching even a half of potential performance of this flash storage, so there is definitively a lot of things to improve ;-)

So, stay tuned ;-)

Additional NOTES : (as you're asking)

>> HOWTO monitor Commit/sec in MySQL ?
  • you're getting on a regular interval (say every 10sec.) the value of the "Com_commit", for ex. :
  • mysql> show global status like 'Com_commit';
  • then do a subtraction from the current value the previous one, and divide it by your timeout interval (say 10sec)
  • that's all ;-)

>> HOWTO monitor mutex time waits in MySQL ?
  • you're starting MySQL server with PFS enabled and wait events instrumentation ON:
  • performance_schema=ON
  • performance_schema_instrument='%synch%=on'
  • then on a regular time interval executing the following SQL query, and getting a diff from the current values vs the previous ones and divide them by your time interval
  • SQL query:
mysql> select EVENT_NAME, max(SUM_TIMER_WAIT)/1000000000000 as WaitTM 
        from events_waits_summary_by_instance group by 1 order by 2 desc limit 25;
| EVENT_NAME                                | WaitTM   |
| wait/io/file/innodb/innodb_data_file      | 791.3204 |
| wait/synch/mutex/innodb/fil_system_mutex  |  25.8183 |
| wait/synch/rwlock/innodb/btr_search_latch |   5.2865 |
| wait/io/file/innodb/innodb_log_file       |   4.6977 |
| wait/synch/rwlock/sql/LOCK_grant          |   4.4940 |
25 rows in set (0.06 sec)

As well you can use MySQL Enterprise Monitor (MEM) or use my tools from my site (dim_STAT @ http://dimitrik.free.fr) to get all this and other stuff ready for action..
Posted by Dimitri at 23:53 - Comments...
Categories: MySQL

Friday, 15 August, 2014

MySQL Performance: Analyzing LinkBench Workload on MySQL 5.7 and MariaDB 10.1

Reading the article published on MariaDB site about performance evaluation of MySQL 5.7 -vs- MariaDB 10.1 I've got a double feeling: from one side I have no any reason to have a doubt in obtained results, and from another side - my feeling says me there is something going odd.. But well, I don't have any attention to blame anybody (and sorry if you think so) -- my willing here is only to understand what is going on, why, and what are the challenges on the observed test workload(s). I'll cover here only my observations on the LinkBench workload as it was not yet enough tested and involving many questions around..

LinkBench Benchmark :

  • developed by Facebook Team, so represents a true real life workload
  • written in Java
  • from the beginning since it creation was reported as very sensible to the index lock contention in InnoDB
  • for this reason was modified to have partitions in the most hot table (to split a single index contention)
  • this helped to improve performance, but index lock contention remained limiting..
  • the issue with InnoDB index lock was fixed only in MySQL 5.7 giving a significant performance boost in LinkBench workload

What is looking odd for me in MariaDB's test results :
  • indeed, having compression feature within any database engine is very valuable
  • however, compression by itself cannot improve an overall performance (except if you have some limitation on other levels, like slow storage, etc.)..
  • and on extremely fast Fusion-IO flash storage there is no doubt that performance result will be better when compression is not used, and it's exactly what we see in the presented results..
  • but then it's hard to believe for me that without using any compression MySQL 5.7 which is having the fix for InnoDB index lock contention is running slower than MariaDB 10.1 which is based on InnoDB from MySQL 5.6 and not having such a fix..

Well, I may be wrong as well, but the following is what I'm observing on LinkBench workload on my system.

Configuration :
  • Linux box with 40cores-HT 2.3Ghz running Oracle Linux 6.5
  • Fusion-IO flash storage using NVMFS filesystem
  • LinkBench database of 150G

Tuning :
  • let's start with a Buffer Pool (BP) = 50% of database size to follow MariaDB test conditions (so, 75G in my case)
  • I'm usually using 32 BP instances
  • as BP will not be able to keep the whole data set, we may expect constant IO reads
  • so a high activity on LRU to expect + LRU flushing
  • means LRU depth should be tuned well, I'll set it to 4000 (not too big, not too small, as it's per BP instance, this will give me 32 x 4000 free pages amount to expect which should cover a potential page IO read speed from Fusion-IO)
  • REDO log size is also important, as a bigger REDO is allowing to delay page flushing and keep flushing/checkpoint activity more smooth -- 12GB REDO here seems to be enough
  • adaptive hash index (AHI) -- in most RW workloads it's better to have disabled (and even in some RO), but curiously here has no impact at all..
  • using AIO + O_DIRECT is the must
  • all other setting is just following general "best practices" for most of workloads..
  • Performance Schema (PFS) is enabled + mutex instrumentation too

TEST #1 : LinkBench-150G 64users with 75GB Buffer Pool

There are 5 test cases executed in the following order:
  • #1 - MySQL 5.7 no doublewrite (innodb_flush_log_at_trx_commit=1, checksums crc32, innodb_doublewrite=0)
  • #2 - MySQL 5.7 secure ((innodb_flush_log_at_trx_commit=1, with checksums crc32, innodb_doublewrite=1)
  • #3 - MariaDB 10.1 no doublewrite (same options as MySQL 5.7 in #1)
  • #4 - MariaDB 10.1 secure
  • #5 - MariaDB 10.1 with atomic writes (without doublewrite, but atomic writes instead)

Each test case was running during 30min (if was several hours initially, but then I've decided to reduce the test duration time --because once the load became stable on the beginning, then not too much is happening later, and I don't have unlimited time ahead ;-))

Observations :
  • without doublewrite MySQL 5.7 is near x2 times faster than MariaDB 10.1 (20K TPS -vs- 12K TPS)
  • within "secure" configuration MySQL 5.7 is loosing 25% in performance (20K -> 15K TPS)
  • interesting that this workload is pretty sensible to REDO flushing (sync)
  • but the main overhead is coming from a doublewrite anyway..
  • however, as you can see, it's still better than MariaDB
  • interesting to see that atomic writes in MariaDB are not lowering too much TPS
  • so, once atomic writes will be available in MySQL 5.7 I may expect the same 20K TPS in secure config here
  • NOTE: to avoid doublewrite overhead you may also use O_DSYNC flush setting combined with EXT4 data journaling as it was recently demonstrated by Percona

So far, why MariaDB is getting a lower TPS than MySQL 5.7 ?..

Let's get a look on what Performance Schema is reporting:

Observations :
  • MariaDB 10.1 is hitting a severe contention on the index lock
  • while MySQL 5.7 is not having it anymore
  • and that's why MariaDB is running slower on this workload..

Let's check there is nothing abnormal going with page flushing :

Observations :
  • Checkpoint Age did not reach any critical level (max 7GB from 12GB in REDO was used)
  • no LRU single page flush either, so the amount of free pages was always enough..

What is the page IO read rate in this workload?

Observations :
  • MySQL 5.7 is requiring 20K / 18K free pages/sec
  • while MariaDB 10.1 is only 10K..
  • the excessive page scanning seen in MariaDB is fixed in MySQL 5.7 (still remains in MySQL 5.6, but in fact not impacting a final performance)

Well, all observed is confirming my initial worry.

Now, let's see how both engines will perform within "expected" test conditions (Facebook Team is mentioning in LinkBench notes that the data set should be bigger at least x10 times than a Buffer Pool size to reproduce their environment).. -- and for sure, having only 16GB Buffer Pool within the same workload will involve much more page IO reads, where storage IO level will play a huge role.

TEST #2 : LinkBench-150G 64users with 16GB Buffer Pool

Observations :
  • without a doublewrite MySQL 5.7 is out passing MariaDB 10.1 by x2.5 times (10K TPS -vs- 4K TPS)..
  • in a "secure" mode MySQL 5.7 is still x1.5 times better (over 6K TPS -vs- 4K TPS)
  • NOTE: I've also replayed the MariaDB test with using its multithreaded flushing feature (innodb_mtflush_threads=16, innodb_use_mtflush=1) -- this is helping to reach 4500-5000 TPS, but still not enough to attend the MySQL 5.7 level..

What about internal contentions?

Observations :
  • MySQL 5.7 is blocked mainly by fil_system mutex and log_sys mutex contentions
  • fils_system mutex contention is killing on all IO-bound workloads (and that's why testing Read-Only workloads is important too to better understand the problems on Read+Write ;-))
  • bunch of other lock contentions is blocking MariaDB here..

Any flushing related issues?

Observations :
  • Checkpoint Age did not reach even 2GB
  • no LRU single page flush either
  • all ok..

What about page IO read rate?

Observations :
  • free page rate is much higher, but not enormous (I've already observed 70K-80K pages/sec on this server with Fusion-IO on IO-bound workloads)
  • page rate on MariaDB is lower than on MySQL 5.7

Well, yet another reason why MySQL 5.7 just rocks and will be the next the best ever MySQL release :-)

I'm not pretending either the config setting I'm using and workload conditions are the most optimal here. I'm just sharing what I'm observing and curious for any input/idea about how this workload performance could be improved..

Few other notes:
  • Compression : indeed, compression on IO level is much more simple an optimal for any application (not only databases), and more and more will come over a time for sure with a goal to reach as max as possible a performance level of "uncompressed" workload..
  • LinkBench : seems to me needs some more love and more testing.. - would be happy to discuss with developers and any other observations related to this workload. It give me an impression to be "self limited" as with 64 concurrent users CPU is not really used, looks like there are some internal waits are happening during processing. I've also suspected garbage collection time on JVM level, but monitoring JVM stats showed that it's not so.. Open to any discussion ;-)

MySQL configuration setting I've used :

# myisam
  ft_max_word_len = 16

# general
  table_open_cache = 8000

# files

# buffers
  innodb_buffer_pool_size= 75000M / 16000M

# tune
  innodb_doublewrite= 0 / 1 

  innodb_stats_persistent = 1

# perf special
  innodb_adaptive_flushing = 1
  innodb_flush_neighbors = 0
  innodb_read_io_threads = 16
  innodb_write_io_threads = 16

# Monitoring
  innodb_monitor_enable = '%'

# MariaDB specific:
#  innodb_mtflush_threads=16
#  innodb_use_mtflush=1
# with atomic writes :

Any comments are welcome! ;-)

Posted by Dimitri at 17:44 - Comments...
Categories: MySQL