« Linux | Main | Oracle »

Friday, 13 April, 2012

MySQL Performance: 5.5 and 5.6-labs @TPCC-like

It's now over a year when there was a quite interesting exchange with Vadim and other Percona guys due their TPCC-like benchmark results on MySQL 5.5 - I've replayed on that time the same tests, but with a better tuned configuration parameters to demonstrate that with a little bit more love, MySQL 5.5 may keep TPCC-like workload very well ;-)) However, it was clear that Adaptive Flushing in 5.5 needs some improvement (which is now implemented in 5.6), but there were also several open questions regarding I/O level and O_DIRECT performance which I was able to clarify for me only during last year IO testing on a decent Linux box + storage..

Summarizing all this stuff, you may understand my curiosity to see how well (or not) MySQL 5.6-labs is running today on Percona's TPCC-like workload ;-))

But let's start from the system setup first:

  • so, I'm using a 32cores bi-thread Intel box, 128GB RAM
  • running Oracle Linux 6.2
  • storage: x3 SDD in RAID0
  • filesystem: XFS mounted with "noatime,nodiratime,nobarrier,logbufs=8" options

Same TPCC-like workload as before:
  • 500W data volume
  • 32 concurrent users (initially it was only 16 on the previous tests, but I think it'll be pity to run only 16 users when you have 32cores Linux box ;-))


Then, my first test will be with MySQL 5.5 - even it's GA since more than one year and no any major changes allowed to the code, it was still improved over a time and will give us a good idea about its state for today.. From the previous testing I've retained that to help Adaptive Flushing in 5.5 the "innodb_io_capacity" value should be set to a something huge -- as the algorithm in 5.5 is not following very well the REDO activity, we should avoid to limit it in IO capacity (and when it'll decide to do a big I/O request we should let it happen ;-))

MySQL 5.5 configuration:

[mysqld]

max_connections=4000
table_open_cache = 8000
back_log=1500
query_cache_type=0

# files
innodb_file_per_table
innodb_log_file_size=1024M
innodb_log_files_in_group=3

# buffers
innodb_buffer_pool_size=64000M
innodb_buffer_pool_instances=16
innodb_log_buffer_size=64M

# tune
innodb_checksums=0
innodb_doublewrite=0
innodb_support_xa=0
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit=2
innodb_flush_method= O_DIRECT
innodb_max_dirty_pages_pct=50

# perf special
innodb_adaptive_flushing=1
innodb_read_io_threads = 16
innodb_write_io_threads = 16
innodb_io_capacity = 20000
innodb_purge_threads=1


Note that I did not reduce this time the dirty pages percentage to enforce the dirty pages flushing - I'm expecting here that my I/O level will be fast enough to keep the flush activity aligned with REDO logs.. Let's see what is the result now ;-)


MySQL 5.5 results:
Observations :
  • there are some initial QPS drops due critical levels in Checkpoint Age
  • but over a time performance become more and more stable, reaching something not far from 110K QPS..
Also, I did not write this blog post intentionally for 5.5 (my main goal was to test 5.6-labs), but I'm surprised to not see here huge drops and frozen activity for long amount of time observed and reported by guys from Percona and TokuDB.. - sure there are also QPS drops on my graphs too, but it's nothing comparable to what was reported by others.. Maybe with a little bit more love you may also reach a better and more stable results?.. ;-)

Well, lest move to MySQL 5.6.4 now - it was the last 5.6 release just before 5.6-labs, so Adaptive Flushing was still very similar to what we have in 5.5 (even if some improvement, like page_cleaner thread, were introduced).. So, nothing different within 5.6.4 configuration setup (same 20K IO capacity), except monitoring via METRICS table is enabled.

MySQL 5.6.4 results :

Observations :
  • QPS performance is reaching 160K QPS over a time
  • there are more activity drops comparing to 5.5..
  • all QPS drops are related to reaching critical levels in Checkpoint Age.. - so it was a time to improve AF here ;-))
  • we're reaching near 8000 pages/sec in flushing - good to know to adapt the Max IO capacity in 5.6-labs ;-)


MySQL 5.6-labs configuration:
  • same as 5.5, except the following:
  • innodb_io_capacity = 4000
  • innodb_max_io_capacity = 12000


MySQL 5.6-labs results :

Observations
  • seems we're unable to flush more than 8000 dirty pages/sec here.. (interesting why?..)
  • so, nothing surprising that Checkpoint Age is remaining in critical level most of the time..
  • however, there are near no QPS drops at all on the "good" part of graph (up to 11:25 ;-))
  • we're reaching 170K QPS now !..
  • but after 11:25 there is a huge performance drop.. - why?..


Let's look more in depth:
  • the main current InnoDB bottleneck on TPCC-like workload is index lock contention..
  • but since 11:25 there is a log_sys mutex contention which is jumping up too!..
  • the only reason for log_sys to come in such a configuration is to get REDO log files out of filesystem cache (then on most of REDO I/O writes there will be read-on-write involved, because the REDO I/O operation is not aligned to the filesystem block size..) - this will be fixed soon in 5.6 too ;-)
  • however, since I'm using O_DIRECT, and only MySQL is running on my server, what other I/O activity may remove REDO log files from the filesystem cache?...


The answer:
  • from the graph above you may see that "mysqld" process is reaching 128GB memory usage by the end of test..
  • of course, when there is no more memory on the system, there is no more place for REDO logs in the filesystem cache either ;-))
  • comparing with 5.6.4, seems there was a memory leak bug introduced in 5.6 trunk recently ;-))
  • so, I'm happy to find it, but keep it in mind for the moment when trying 5.6-labs on Read+Write workloads - I think it'll be fixed soon..


Well, I'm pretty happy with what I saw.. (except the memory leak bug of course ;-)) -- but I see now the potential we're having under 5.6-labs.. And with all other further improvement which are currently in pipe, MySQL 5.6 should be again yet another the best ever MySQL release after 5.5 !..

Stay tuned.. ;-)

Rgds,
-Dimitri
Posted by Dimitri at 12:39 - [] Comments...
Categories: MySQL

MySQL Performance: Improved Adaptive Flushing in 5.6-labs

I'd start this blog post with "I've had a dream".. ;-))

Seriously, it was my dream from a long time that "official" MySQL code comes with a more improved Adaptive Flushing solution in InnoDB.. - and now, with 5.6-labs release, this dream becomes a reality ;-))

The story with Adaptive Flushing is so long.. Started by Yasufumi after his brilliant analyze of the flushing problem in InnoDB, then integrated in XtraDB as Adaptive Checkpoint, then came in alternative way as Adaptive Flushing in "official" InnoDB code, then yet more improved within Percona Server, etc. etc. etc.. (my last post about was one year ago, and it was already a call to action rather a general discussion about performance improvements ;-))

However, it took a time to get similar improvement within an "official" InnoDB code, but not because people are lazy or not open to changes, but just because the already implemented solution within InnoDB was simply "good enough" in many cases ;-)) specially with appearing of the page_cleaner thread in 5.6.. The only weak point it has is that it's unable to keep load spikes (but in this point no one of existing solutions is good, even in XtraDB), and it may take a long time before to "adapt" to a given workload..

Let's get a look on the following graph:

Explanations:

  • this a light dbSTRESS RW workload with 32 concurrent user sessions
  • MySQL 5.5 and 5.6 are both having a similar drops in TPS level (left part of graph)
  • these drops are due Checkpoint Age reaching max REDO log space
  • so urgent burst flushing (furious flushing) is involved to create a room of free space within REDO, otherwise the whole transactional processing is remaining frozen..
  • however, over a time such drops are less and less important, so some users may probably even live with it..
  • but on the right part of the graph you may see 5.6-labs solution in action, and there is no more any drops at all (except the initial one which is related to adaptive hash index initialization)
  • as well, 5.6-labs is keeping a quite significant marge in Checkpoint Age in case there will some spikes happen in the load with increased activity (where 5.5 will almost have TPS drops every time..)

To be sure we're not reinventing the wheel, we even ported most interesting solutions to the 5.6 prototype trunk, and I've tested all of them on the more hard RW workload:

From the left to right:
  • 5.6-native
  • 5.6 + "estimate" from Percona
  • 5.6 + "keep avg" from Percona
  • 5.6-labs
  • Percona 5.5 (this one I've tested just to be sure nothing was wrong with a logic in a ported code)

As you may see directly from these graphs that our solution (implemented now in 5.6-labs) looked more attractive from the beginning.. And I'm very happy to be able to contribute to it, as several of my previously expressed ideas were accepted to be a part of this solution too ;-))


So, what was changed in the improved Adaptive Flushing (AF) in MySQL 5.6-labs:
  • first of all the logic is moved to the REDO log space occupancy rather tracking estimation of pages to flush as before
  • more REDO space is filled, and more aggressively becomes InnoDB is flushing
  • less REDO is filled - more relaxed is flushing..
  • the relation between REDO occupancy and flushing power level is progressive and non-linear..
  • all the time we're also considering the average REDO filling speed and average amount of flushed pages
  • these averages are helping to guide the AF decisions to better adapt dirty pages flushing according database activity..
  • then only point which is remaining yet open is a case when pages flushing is unable to follow REDO writes simply because of weak I/O layer.. - we already have a solution here too, but it's not yet part of 5.6-labs ;-)


From the user/DBA point of view the AF configuration as looking as simple as following:
  • the IO capacity setting is still remaining (and used for most maintenance tasks within InnoDB - you should consider it now as kind of AVG I/O write activity you're expecting to see from your database server in general, specially useful if your MySQL server is running with many other applications on the same machine (or other MySQL instances as well))..
  • the "Max" IO capacity is introduced to say how high InnoDB is allowed to go in flushing to be able to follow REDO log activity
  • the Low Watermark is introduced to say since what percentage of REDO log space filling the AF processing should be activated - depending on your workload it may vary.. (default is 10%)
  • all configuration variables are dynamic and can be changed at any moment ;-)

So, once you've set your limits - InnoDB then will try to do the best with what it has :-)


And all AF internals are available for monitoring via METRICS table:
  • buffer_flush_adaptive_total_pages -- getting delta of this value divided by the time interval giving you the current speed of pages flushing due AF
  • buffer_flush_avg_page_rate -- the average pages flushing rate seen and considered within AF, and if everything is going well with your I/O level, this value should less or more follow the previous one ;-)
  • buffer_flush_n_to_flush_requested -- the amount of pages requested for flushing by AF (and which may be quite different from both previous counters because the avg REDO filling speed is also coming in the game, and may limit a real flushing by pages age)..
  • buffer_flush_lsn_avg_rate -- LSN avg rate seen by AF (corresponding to the REDO logs feeling speed and used as age limitation barrier when flushing is involved -- so we may then flush as many pages as requested or less, because of this age rate).. - when on the I/O level everything is going well, this value should less or more follow your REDO MB/sec write speed, and grow in case of lagging to increase the current age barrier for flushing..

There are also some other counters, but let's skip them to keep stuff simple ;-)
Now, replaying the same RW workload presented before on MySQL 5.6-labs with:
  • innodb_io_capacity = 2000
  • innodb_max_io_capacity = 4000

will give us the following:


So,
  • AF is requesting to flush 2000 pages according its estimation
  • but in reality flushing 1000 pages/sec is quite enough to keep Checkpoint Age on a reasonable level
  • 70K QPS is reached..

But what will be changed since there will be some activity spikes?..

I'll mix now this workload with Sysbench OLTP_RW - additional 16/32 users sessions will come and leave the database activity with their RW transactions:

Observations:
  • Checkpoint Age is jumping up during spikes in activity..
  • within a current I/O configuration InnoDB is unable to flush faster than 1500 pages/sec..
  • which is resulting in QPS drops during these spikes..
  • What can we do to help here?..
    • we may improve the I/O level performance (sure will help! ;-))
    • or expect that is the spikes are not too long in time, with a bigger REDO logs there will be more room to resist to them..


So, now replaying the same test case, but now with using 8GB REDO logs total space:

Observations:
  • having a bigger REDO space automatically reduced the flushing rate during "quiet periods" (from 1500 to 500 pages/sec)
  • and we're reaching 1500 pages/sec flushing only during spikes now..
  • the marge on the REDO log space is big enough to keep Checkpoint Ages jumps under critical level
  • which is resulting in a total absence of QPS drops ;-)


Or the same workload on a served with less powerful CPUs, but more powerful storage level:

Observations:
  • QPS level is lower due lower CPU speed
  • but dirty pages flushing level is way faster when requested, easily reaching 3000 pages/sec
  • so we're not observing any QPS drops on activity spikes as Checkpoint Ages levels were never critical..

Looking on these results we may be only happy, because none of ones was possible to achieve with MySQL 5.5 and previously in 5.6 (both will keep Checkpoint Age too close to the critical limit, so any activity spike will bring a TPS drop)..


After these "complex" tests, let's get a look on a "simple" Sysbench OLTP_RW between 5.5 and 5.6-labs:

As you can see, as soon as you have to flush due reaching critical Checkpoint Age levels - as soon you'll have TPS drops too.. It's what we're observing in 5.5 and before in 5.6 - but I hope time is changed since 5.6-labs ;-))

The solution is not yet final, but at least very promising and pretty easy to tune ;-) but more testing will give us more ideas about how the final code should look like.. - so don't hesitate to test it on your own systems and your own workloads!! ;-))

As the final word, I'd like to thank Inaam from InnoDB team for his patience in implementation and adoption of all the ideas we have around into a single and truly integrated within InnoDB code solution!.. Kudos Inaam! :-))

I'd say, and you may just trust me, since 5.6-labs benchmarking MySQL will become much more interesting! ;-))

Rgds,
-Dimitri
Posted by Dimitri at 0:53 - [] Comments...
Categories: MySQL

Tuesday, 10 April, 2012

MySQL Performance: 5.6-labs is opening a new era..

I think every performance engineer in his life is time to time meeting some curious performance problems within an application (not only MySQL) which is looking out of understanding and over a time only continuing to increase a headache.. (until not resolved, of course.. ;-))

I've met many of them in the past while working in the Sun Benchmark Center, but did not expect to meet it so soon once started my official work in MySQL team ;-))

The problem is looked as simple as nothing else, and that's why it was several time more painful than anything else too :-))

Let me explain: since of the beginning of this year I've got for MySQL performance testing a 32cores (bi-thread) Intel server running Oracle Linux. And ss soon I've got it, I've deployed my tools on it and started my tests..

The main surprise for me came on the Read-Only test:
  • what can me more simply than Read-Only workload?..
  • specially when the data work set is small enough to fit the InnoDB Buffer Pool and remain fully cached?..
  • the test case is a simple Sysbench OLTP_RO workload, so nothing surprising here, many times tested by others..
  • running the test on both MySQL 5.5 and 5.6, to be sure nothing wrong came with 5.6 code..
But let's get a look on the difference in results obtained with 1, 16 and 32 concurrent users:
  • well, on a single user thread performance numbers may be quite random and vary from test to test..
  • but what is blocking any performance increase when we're moving from 16 concurrent users to 32?..
  • there are 32cores on the server and from the HW power point there is no limit yet..
  • from the SW point: other applications are scaling well on this server, then regarding InnoDB - if on 5.5 it's at least reporting the increased contention on the "kernel_mutex", then on 5.6 it's reporting just nothing!..
  • same "problem" with PFS (Performance Schema) - no any increased contention is reported..

We're simply lacking and instrumentation somewhere to find the source of the problem..

So, then I've started a long weeks of testing + in depth code profiling to understand what's going on.. Of course, I did not spend all of my time on resolving only this problem :-)) But you may understand as well than any other issues became less critical in my mind, because when you're hitting such a huge scalability limit without any contentions reported by the code.. - do you worry then about response time in your tests or any other benchmarks?.. - it's just looking like a huge bug or invisible contention.. - you're turning around it.. able to see the impact.. but still cannot see why it happens :-))

Well, to make it shorter - at one point of time a quantity of test results and analyzed cases should move to a quality of the final improvement in the code ;-)) so, when there was a time for our planned Performance Meeting in Paris, there was enough of data to discuss from my side, and this scalability limit was one of the main points in agenda..

Our brainstorming meeting was a true success! And I'd say it's a real pleasure when you're living such kind of events in your life.. - summarized all our skills together, we profiled the code dipper and dipper, arriving finally to assembler level to discover what is going on ;-)) There was such a strong complicity in this teamwork that we quickly understood that we simply MUST fix this issue ;-)) And finally we got it fixed!.. - it was even hard to believe the solution was so close.. Well, generally you know that it may happen in "some cases", but that it happens with MySQL code which is RDBMS and not at all HPC related?? (HPC - High Performance Computing)..

The scalability issue came from the effect that we got some code zones (data structures) in MySQL/InnoDB which became too hot on CPU cacheline!.. I'll not repeat the same details explained by Mikael in his blog post, but rather just will present you few most spectacular graphs comparing performance before and after G5 (the code name we used to call this improvement feature):


- over 50% performance improvement!.. ;-))

An even more spectacular on Sysbench RO SELECT RANGES test:


- up to x6 times(!) faster ;-))
And it's for a long time now I did not see something so spectacular in MySQL Performance ;-))

All these improvement and many other are available right now within MySQL 5.6-labs release!
Don't wait to try it!

what else to say?.. - I'm just proud to be part of it :-)) and enjoying my MySQL "full time" more than ever.. ;-))

Stay tuned, more is coming ;-))
We're living a great time..

Rgds,
-Dimitri

P.S. BTW, did I forget to say you we're hiring @MySQL ?.. ;-))

Posted by Dimitri at 17:39 - [] Comments...
Categories: MySQL

MySQL Performance: Game Of Contentions..

Recently running yet one of series of benchmark on MySQL 5.6, I've met the following case:

  • the workload is Read+Write, 256 concurrent user sessions running non-stop..
  • Performance Schema (PFS) is enabled and Synch events are monitored (both counted and timed)..
  • accidentally I've not fully disabled the query cache (QC) in MySQL config (query_cache_type=0 was missed)
  • and at once PFS reported it to be the top wait event during this workload:


Test #1 -- @5.6 with PFS=on, default query_cache_type:
NOTE: we're reaching in this configuration ~100K QPS..

So, of course, I've expected to see much more higher QPS once I've set query_cache_type=0, don' you?.. ;-)
The second graphs are representing the result obtained with these changes:


Test #2 -- @5.6 with PFS=on, query_cache_type=0:


So far:
  • 80K-90K QPS instead of 100K QPS observed initially..
  • wait time on the "btr_search_latch" lock contention is now replacing the same wait event time as the QC before..

Hm.. - my tuning "improvement" is resulting in a worse performance finally..
But keeping in mind that "btr_search_latch" is a RW-lock (and PFS overhead become higher on hot RW-locks), it's possible to suspect here PFS overhead..

Let's replay with PFS disabled now:

Test #3 -- @5.6 with PFS=off, query_cache_type=0:

Oh! it's much better! - we're reaching 100K QPS again! (well, near reaching 100K, to be honest.. - 98K QPS probably, but well, it seems to be better anyway ;-))

And what if now I'll comment again the "query_cache_type" setting and leave it by default?..

Test #4 -- @5.5 with PFS=off, default query_cache_type:

Hm... - no difference!.. - so, there is no impact at all with QC query_cache_type leaved by default???.. :-))

Let's summarize now:
  • 100K QPS with PFS=on, query_cache_type=default
  • 90K QPS with PSS=on, query_cache_type=0
  • 98K QPS with PFS=off, query_cache_type= default or 0


So far:
  • from the last 2 tests: seems that "query_cache_type" setting is not having any performance impact..
  • from the first 2 tests: "query_cache_type" has performance impact ;-)
  • from the tests #2 and #3: PFS has performance impact when enabled..
  • and we still have the best performance result here when PFS is enabled + "query_cache_type" is not turned OFF ;-))
  • NOTE: the test workload is exactly the same in all cases..
  • NOTE: it's exactly the same MySQL binary used in all cases too, so if PFS has an overhead, it'll be the same one whenever PFS is turned ON.. ;-))

What is wrong here?..

Nothing.. :-)

The problem is that since some time we're entering with MySQL code in a new game of contentions which is very well seen in a presented example:
  • PFS overhead is higher whenever RW-lock contention is higher..
  • enabling "query_cache_type" is adding an additional overhead (as seen from the Test #1)..
  • but this additional overhead by its action is lowering the contention on "btr_search_latch" RW-lock ;-))
  • lowering RW-lock contention is reducing on its turn PFS overhead..
  • and finally resulting in the better performance, even it sounds surprisingly ;-))


And, finally, the mutex waits seen by InnoDB for all 4 tests are looking like following:

Quite surprising impact on "btr_search_latch" contention ;-))

However, there are many other contentions as well, and this post is just a short example about.. - I'll use it as a reference in following posts this week and in a near future when discussing about various "lock contentions impact" in MySQL ;-))

Rgds,
-Dimitri

Posted by Dimitri at 12:37 - [] Comments...
Categories: MySQL

Thursday, 19 January, 2012

MySQL Performance: Overhead of Optimizer Tracing in MySQL 5.6

Last week I was faced to a strange problem while benchmarking MySQL 5.6: some of my user sessions become "frozen" periodically and finally my whole test was not valid..

More in-depth analyze shown that:

  • sessions are "frozen" when executing a SELECT statement..
  • sessions are not really frozen, but waiting on read!..
  • the read seems to be a kind of long read, involving many I/O operations..
  • all makes think that query is execution a full table scan (200M rows, 40GB table) and ignoring indexes, so instead of 1ms I have >5min response time..

But the problem is completely "random" - it's not happening all of the time and not in all users sessions. And every time I'm executing EXPLAIN for any given "wrong" query - it's always giving me a right execution plan.. So, seems that something is going wrong on the time when queries are executed concurrently, and not when I'm trying to analyze them alone. Which means that I have to catch the execution state exactly on the time when the execution plan is wrong!..

How to do it?.. - since MySQL 5.6 there was an Optimizer Tracing feature introduced which is helping a lot to understand the reason of choices made by optimizer. There is a good short HOWTO available on the Forge MySQL. While regarding my stress code, I have to add the following queries after CONNECT for my user sessions:

   SET OPTIMIZER_TRACE="enabled=on,end_marker=on";
   SET OPTIMIZER_TRACE_MAX_MEM_SIZE=1000000;

And then in case if in the user session the last executed query took more than 60sec, user then executed the following query to save the last query optimizer trace into a file:

   SELECT TRACE INTO DUMPFILE '/tmp/trace.$(ID)-$(REF)' FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

As it's much more simple to read this trace from a file rather SELECT output ;-))

So far, all my 32 concurrent users were now running with Optimizer Trace enabled (currently the trace may be enabled only from the session itself, there is no any global setting available for security reasons). And very quickly I was able to get trace files for my problematic cases! What is great that the trace is containing the whole SQL query in question, so it was very easy then to replay the same SELECT query and obtain the optimizer trace when this query is executed with a right plan. Then simply compare two traces, easy no? ;-)

Skiping all details, the source of problem is shown by the following lines:

right execution plan:

...
                  "table_scan": {                                                        
                    "rows": 197451683,                                                   
                    "cost": 4.24e7                                                       
...

wrong execution plan:

...
                  "table_scan": {                                                        
                    "rows": 1,                                                           
                    "cost": 2.3                                                           
...
    So, for some reasons MySQL Optimizer is getting an information from InnoDB that my table containing only one single row instead of 200M rows.. Seems I was very lucky until now to never meet such a problem, but looks like the issue is real, and may happen time to time in InnoDB databases during table statistics updates. To avoid such kind of problems, MySQL 5.6 introduced InnoDB persistent statistics - once enabled, the statistics will be updated only on the next time of ANALYZE TABLE execution! So, if your table is changed frequently and widely, you have to plan regular ANALYZE of all such tables (via cron or MySQL Event Scheduler, depending what is a more simple for you).

And YES! by adding in my conf file:

# innodb pers.stats
innodb_analyze_is_persistent=1


all my problems were gone! no more broken execution plan anymore!..

The very positive from my observations was:
  • less than 10% overall performance degradation once Optimizer Tracing was enabled on all(!) 32 concurrent sessions
  • near no degradation at all when InnoDB persistent statistics were enabled
  • and finally I was able quickly find the source of my problem! ;-))

But my "problematic" workload was more I/O-centric rather CPU-bound (even CPU was used near 100%)... So, what will be an overhead in the "most worse" case when the workload is more CPU-bound and queries response time is less than 1ms?.. ;-)

Let's get a look on the following graphs representing 3 test cases:
  • default configuration as before
  • persistent statistics are enabled on InnoDB
  • optimizer tracing is enabled within all user sessions
All tests were executed with 32 concurrent users on 12 cores server.


Read-Only Test

Observations :
  • default: 35700 TPS
  • with persistent stats: 35600 TPS
  • with enabled optimizer tracing: 28500 TPS => 20% degradation..


Read+Write Test


Observations :
  • default: 48870 TPS
  • with persistent stats: 48860 TPS
  • with enabled optimizer tracing: 41100 TPS => 16% degradation..


So, keeping in mind these degradation levels, I'm pretty ready today to add ahead in my code of all perf tools an option to enable Optimizer Tracing on demand! - such a feature added in to any application using MySQL may save you days of debugging! Then, even 20% of performance degradation is nothing comparing to difference in bad response time.. And if I'm tracing only one user session, the global performance degradation will be less than 1% ;-) while 20% difference on a web application (for ex.) you may even not see, as the network latency sometimes may give you even more surprises ;-))

And seems that Persistent Statistics in InnoDB is the must! It just needs to have a solution for the most straight forward adoption on the user land..

BTW, did you try already these features?.. You should ;-)

Any comments and feedbacks are very welcome!..

Rgds,
-Dimitri

Other resources to read:

Posted by Dimitri at 15:41 - [] Comments...
Categories: MySQL, Tools/ dbSTRESS

Friday, 06 January, 2012

MySQL Performance: Linux I/O

It was a long time now that I wanted to run some benchmark tests to understand better the surprises I've met in the past with Linux I/O performance during MySQL benchmarks, and finally it happened last year, but I was able to organize and present my results only now..

My main questions were:

  • what is so different with various I/O schedulers in Linux (cfq, noop, deadline) ?..
  • what is wrong or right with O_DIRECT on Linux ?..
  • what is making XFS more attractive comparing to EXT3/EXT4 ?..

There were already several posts in the past about impact on MySQL performance when one or another Linux I/O layer feature was used (for ex. Domas about I/O schedulers, Vadim regarding TPCC-like performance, and many other) - but I still did not find any answer WHY (for ex.) cfq I/O scheduler is worse than noop, etc, etc..

So, I'd like to share here some answers to my WHY questions ;-))
(while for today I still have more questions than answers ;-))


Test Platform

First of all, the system I've used for my tests:
  • HW server: 64 cores (Intel), 128GB RAM, running RHEL 5.5
  • the kernel is 2.6.18 - as it was until now the most common Linux kernel used on Linux boxes hosting MySQL servers
  • installed filesystems: ext3, ext4, XFS
  • Storage: ST6140 (1TB on x16 HDD striped in RAID0, 4GB cache on controller) - not a monster, but fast enough to see if the bottleneck is coming from the storage level or not ;-))


Test Plan

Then, my initial test plan:
  • see what the max possible Read/Write I/O performance I can obtain from the given HW on the raw level (just RAW-devices, without any filesystem, etc.) - mainly I'm interested here on the impact of Linux I/O scheduler
  • then, based on observed results, setup more optimally each filesystem (ext3, ext4, XFS) and try to understand their bottlenecks..
  • I/O workload: I'm mainly focusing here on the random reads and random writes - they are the most problematic for any I/O related performance (and particularly painful for databases), while sequential read/writes may be very well optimized on the HW level already and hide any other problems you have..
  • Test Tool: I'm using here my IObench tool (at least I know exactly what it's doing ;-))



TESTING RAW DEVICES

Implementation of raw devices in Linux is quite surprising.. - it's simply involving O_DIRECT access to a block device. So to use a disk in raw mode you have to open() it with O_DIRECT option (or use "raw" command which will create an alias device in your system which will always use O_DIRECT flag on any involved open() system call). Using O_DIRECT flag on a file opening is disabling any I/O buffering on such a file (or device, as device is also a file in UNIX ;-) - NOTE: by default all I/O requests on block devices (e.g. hard disk) in Linux are buffered, so if you'll start a kind of I/O write test on, say, your /dev/sda1 - you'll obtain a kind of incredible great performance ;-)) as no data probably will not yet even reach your storage and in reality you'll simply test a speed of your RAM.. ;-))

Now, what is "fun" with O_DIRECT:
  • all your I/O requests (read, write, etc.) block size should be aligned to 512 bytes (e.g. be multiplier of 512 bytes), otherwise your I/O request is simply rejected and you get an error message.. - and regarding to RAW devices it's quite surprising comparing to Solaris for ex. where you're simply instead of /dev/dsk/device using /dev/rdsk/device and may use any block size you want..
  • but it's not all.. - the buffer you're using within your system call involving I/O request should also be allocated aligned to 512 bytes, so mainly you have to allocate it via posix_memalign() function, otherwise you'll also get an error.. (seems that during O_DIRECT operations there is used some kind of direct memory mapping)
  • then, reading the manual: "The O_DIRECT flag on its own makes at an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC that data and necessary metadata are transferred. To guarantee synchronous I/O the O_SYNC must be used in addition to O_DIRECT" - quite surprising again..
  • and, finally, you'll be unable to use O_DIRECT within your C code until you did not declare #define _GNU_SOURCE

Interesting that the man page is also quoting Linus about O_DIRECT:
"The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances." Linus

But we have to live with it ;-))

And if you need an example of C or C++ code, instead to show you the mine, there is a great dev page on Fusion-io site.

So far, what about my storage performance on the RAW devices now?..

Test scenario on RAW devices:
  • I/O Schedulers: cfq, noop, deadline
  • Block size: 1K, 4K, 16K
  • Workload: Random Read, Random Write

NOTE: I'm using here 1K block size as the smallest "useful" size for databases :-)) then 4K as the most aligned to the Linux page size (4K), and 16K - as the default InnoDB block size until now.

Following graphs are representing 9 tests executed one after one: cfq with 3 different block sizes (1K, 4K, 16K), then noop, then deadline. Each test is running a growing workload of 1, 4, 16, 64 concurrent users (processes) non-stop bombarding my storage subsystem with I/O requests.


Read-Only@RAW-device:

Observations :
  • Random Read is scaling well for all Linux I/O Schedulers
  • Reads reported by application (IObench) are matching numbers reported by the system I/O stats
  • 1K reads are running slightly faster than 4K (as expected as it's a "normal" disks, and transfer of a bigger data volume reducing an overall performance, which is normal)..



Write-Only @RAW-device:

Observations :
  • looking on the graph you may easily understand now what is wrong with "cfq" I/O scheduler.. - it's serializing write operations!
  • while "noop" and "deadline" are continuing to scale with a growing workload..
  • so, it's clear now WHY performance gains were observed by many people on MySQL workloads by simply switching from "cfq" to "noop" or "deadline"


To check which I/O scheduler is used for your storage device:
# cat /sys/block/{DEVICE-NAME}/queue/scheduler

For ex. for "sda": # cat /sys/block/sda/queue/scheduler

Then set "deadline" for "sda": # echo deadline > /sys/block/sda/queue/scheduler
To set "deadline" as default I/ scheduler for all your storage devices you may boot your system with "elevator=deadline" boot option. Interesting that by default many Linux systems used "cfq". All recent Oracle Linux systems are shipped with "deadline" by default.



TESTING FILESYSTEMS

As you understand, there is no more reasons to continue any further tests by using "cfq" I/O scheduler.. - if on the raw level it's already bad, it cannot be better due any filesystem features ;-)) (While I was also told that in recent Linux kernels "cfq" I/O scheduler should perform much more better, let's see)..

Anyway, my filesystem test scenario:
  • Linux I/O Scheduler: deadline
  • Filesystems: ext3, ext4, XFS
  • File flags/options: osync (O_SYNC), direct (O_DIRECT), fsync (fsync() is involved after each write()), fdatasync (same as fsync, but calling fdatasync() instead of fsync())
  • Block size: 1k, 4K, 16K
  • Workloads: Random Reads, Random Writes on a single 128GB file - it's the most critical file access for any database (having a hot table, or a hot tablespace)
  • NOTE: to avoid most of background effects of caching, I've limited an available RAM for FS cache to 8GB only! (all other RAM was allocated to the huge SHM segment with huge pages, so not swappable)..


Also, we have to keep in mind now the highest I/O levels observed on RAW devices:
  • Random Read: ~4500 op/sec
  • Random Write: ~5000 op/sec

So, if for any reason Read or Write performance will be faster on any of filesystems - it'll be clear there is some buffering/caching happening on the SW level ;-))

Now, let me explain what you'll see on the following graphs:
  • they are already too many, so I've tried to bring more data on each graph :-))
  • there are 12 tests on each graph (x3 series of x4 tests)
  • each serie of tests is executed by using the same block size (1K, then 4K, then 16K)
  • within a serie of 4 tests there are 4 flags/options are used one after one (osync, direct, fsync, fdatasync)
  • each test is executed as before with 1, 4, 16, 64 concurrent user processes (IObench)
  • only one filesystem per graph :-))

So, let's start now with Read-Only results.


Read-Only @EXT3:

Observations :
  • pretty well scaling, reaching 4500 reads/sec in max
  • on 1K reads: only "direct" reads are really reading 1K blocks, all other options are involving reading of 4K blocks
  • nothing unexpected finally :-)



Read-Only @EXT4:

Observations :
  • same as on ext3, nothing unexpected



Read-Only @XFS:

Observations :
  • no surprise here either..
  • but there were one surprise anyway ;-))

While the results on Random Read workloads are looking exactly the same on all 3 filesystems, there are still some difference in how the O_DIRECT feature is implemented on them! ;-))

The following graphs are representing the same tests, but only corresponding to execution with O_DIRECT flag (direct). First 3 tests are with EXT3, then 3 with XFS, then 3 with EXT4:


Direct I/O & Direct I/O

Observations :
  • the most important here the last graph showing here the memory usage on the system during O_DIRECT tests
  • as you may see, only with XFS the filesystem cache usage is near zero!
  • while EXT3 and EXT4 are still continuing cache buffering.. - may be a very painful surprise when you're expecting to use this RAM for something else ;-))


Well, let's see now what is different on the Write Performance.


Write-Only @EXT3:

Observations :
  • the most worse performance here is with 1K blocks.. - as default EXT3 block size is 4K, on the 1K writes it involves a read-on-write (it has to read 4K block first, then change corresponding 1K on changes within it, and then write the 4K block back with applied changes..)
  • read-on-write is not happening on 1K when O_DIRECT flag is used: we're really writing 1K here
  • however, O_DIRECT writes are not scaling at all on EXT3! - and it explains me finally WHY I've always got a worse performance when tried to use O_DIRECT flush option in InnoDB on EXT3 filesystem! ;-))
  • interesting that the highest performance here is obtained with O_SYNC flag, and we're not far from 5000 writes/sec for what the storage is capable..



Write-Only @EXT4:

Observations :
  • similar to EXT3, but performance is worse comparing to EXT3
  • interesting that only with O_SYNC flag the performance is comparable with EXT3, while in all other cases it's simply worse..
  • I may suppose here that EXT3 is not flushing on every fsync() or fdatasync(), and that's why it's performing better with these options ;-)) need to investigate here.. But anyway, the result is the result..

What about XFS?..


Write-Only @XFS:

Observations :
  • XFS results are quite different from those of EXT3 and EXT4
  • I've used a default setup of XFS here, and was curios to not observe the impact of missed "nobarrier" option which was reported by Vadim in the past..
  • on 1K block writes only O_DIRECT is working well, but in difference from EXT3/EXT4 it's also scaling ;-) (other options are giving poor results due the same read-on-write issue..)
  • 4K block writes are scaling well with O_SYNC and O_DIRECT, but still remaining poor with other options
  • 16K writes are reporting some anomalies: while with O_SYNC nothing is going wrong and it's scaling well, with O_DIRECT there is some kind of serialization happened on 4 and 16 concurrent user processes.. - and then on 64 users things then came back to the normal.. Interesting that is was not observed with 4K block writes.. Which remains me the last year discussion about page block size in InnoDB for SSD, and the gain reported by using 4K page size vs 16K.. - just keep in mind that sometimes it may be not related to SSD at all, but just to some filesystem's internals ;-))
  • anyway, no doubt - if you have to use O_DIRECT in your MySQL server - use XFS! :-)

Now, what is the difference between a "default" XFS configuration and "tuned" ??..

I've recreated XFS with 64MB log size and mounted with following options:
# mount -t xfs -o noatime,nodiratime,nobarrier,logbufs=8,logbsize=32k

The results are following..


Write-Only @XFS-tuned:

Observations :
  • everything is similar to "default" config, except that there is no more problem with 16K block size performance
  • and probably this 16K anomaly observed before is something random, hard to say.. - but at least I saw it, so cannot ignore ;-))

Then, keeping in mind that XFS is so well performing on 1K block size, I was curious to see if thing will not go even better if I'll create my XFS filesystem with 1K block size instead of default 4K..


Write-Only @XFS-1K:

Observations :
  • when XFS is created with 1K block size there is no more read-on-write issue on 1K writes..
  • and we're really writing 1K..
  • however, the performance is completely poor.. even on 1K writes with O_DIRECT !!!
  • why?..


The answer is came from the Random Reads test on the same XFS, created with 1K block size.


Read-Only @XFS-1K:



Observations :
  • if you followed me until now, you'll understand everything from the last graph, reporting RAM usage.. ;-))
  • the previously 8GB free RAM is no more free here..
  • so, XFS is not using O_DIRECT here!
  • and you may see also that for all reads except O_DIRECT, it's reading 4K for every 1K, which is abnormal..

Instead of SUMMARY
  • I'd say the main point here is - "test your I/O subsystem performance before to deploy your MySQL server" ;-))
  • avoid to use "cfq" I/O scheduler :-)
  • if you've decided to use O_DIRECT flush method in your MySQL server - deploy your data on XFS..
  • seems to me the main reason why people are using O_DIRECT with MySQL it's a willing to avoid to deal with various issues of filesystem cache.. - and there is probably something needs to be improved in the Linux kernel, no? ;-)
  • could be very interesting to see similar test results on the other filesystems too..
  • things may look better with a newer Linux kernel..

So far, I've got some answers to my WHY questions.. Will be fine now to get a time to test it directly with MySQL ;-)

Any comments are welcome!

Rgds,
-Dimitri
Posted by Dimitri at 10:48 - [] Comments...
Categories: Linux, MySQL, Tools/ IObench

Wednesday, 21 December, 2011

MySQL Performance: 5.6.4 @dbSTRESS Benchmark

MySQL 5.6.4 milestone is here and I'd like to share some benchmark results I've obtained with it on dbSTRESS workload.

First of all, I was curious to observe a single user session performance (I did not look on it for a while, and there were several persons reporting we got a performance degradation in 5.6 comparing to 5.5 and 5.1).. - Seems it's very workload depending, because on dbSTRESS I've obtained the following:

Read-Only :
  • 5.1 : 1800 TPS
  • 5.5 : 1750 TPS (but sometimes jumps to 2700 TPS)
  • 5.6.2: 2600 TPS
  • 5.6.4: 2600 TPS
Read-Write :
  • 5.1: 2700 TPS
  • 5.5: 2600 TPS
  • 5.6.2 : 3300 TPS
  • 5.6.4: 3300-3600 TPS

The results as they are, I'm planning to profile them to analyze in depth (and periodic jumps to 2700 TPS on read-only in 5.5 are looking confusing). However, few points:
  • In 5.6.4 it was possible to suspect the effect of the Read-Only transactions optimization feature , but the results on 5.6.2 is the same or quite similar, and it did not have this feature yet..
  • Then, to compare apples to apples, I've compiled all binaries without Performance Schema - so P_S improvements made in 5.6 are not playing here, just pure code base..
So, all it makes me think that improvement you're observing (or not) are depending on your workload, so more workload will be analyzed - better the MySQL code will be designed ;-)) (I'm keeping in mind sysbench test cases made by Percona, but my Call for Workloads remains open ;-))

Also, a quick workload profiling is already showing me that in some parts of code we're spending more time in 5.6 comparing to 5.5, so there is still yet a room for improvement in 5.6, even it's looking better than 5.5 here ;-))

Now, let's see how the things are going on the multi-user workload.

Few words about the test conditions:
  • Server: X4140 12cores bi-thread 2900Mhz, 72GB RAM
  • Storage: RAID0 on x2 internal disks, there is an SSD also, but too small to host my database :-)
  • OS: OEL 6.1 running UEK kernel
  • FS: XFS mounted with noatime,nobarrier option
  • Workload: growing from 1 to 512 concurrent users, Read-Only, Read+Update, Read+Write (Write= Delete+Insert+Update)
  • Comparing 5.5 and 5.6.4
Nothing special on the MySQL configuration, exactly the same as usual, but:
  • Buffer Pool is 32GB, 16 instances
  • I'm attentionally not using O_DIRECT here as my internal disks will not be able to follow on random reads (and I'm delegating it to the FS cache), while we may still have some expectation for redo and data writes ;-)
  • same for redo flushing: innodb_flush_log_at_trx_commit=2 is used

So far..

Read-Only Workload

Observations :
  • 5.6 is running slightly better here, however keeps the TPS level up to 512 users pretty well
  • the main bottleneck is on btr_search_mutex, but it remains stable due limited number of cores on this server (12 cores)
  • in 5.5 contention is also combined with kernel_mutex
  • NOTE: the trx mutex contention observed before is gone in 5.6 due RO optimization available since 5.6.4! ;-)
  • NOTE: it's still need to be fixed for btr_search_mutex - it's abnormal we have this contention while there is no more pages loaded during this period of time into the Buffer Pool. - need to be fixed ;-)


Read+Update Workload

Observations :
  • 5.6 is looking better here, and RO optimization feature is slightly helping here too..
  • a short presence of contention on the log mutex in 5.5 is signaling here that one moment there was and I/O wait on redo writes in 5.5, but even before it, on the low load, 5.6 is still showing a better performance..
  • kernel_mutex + btr_search_mutex waits are the main killers for 5.5
  • lock mutex waits are killing 5.6 since 512 concurrent users..


Read+Write Workload

Observations :
  • 5.6 is still performing better (while not giving impression of stability..- it's due contentions on my "storage" level ;-))
  • index mutex seems to be the main bottleneck here, looks like it's time for me to include also a test with partitions ;-)) but I'd love to see index mutex fixed as well ;-))


Well, not bad at all for this 5.6.4 release, even there are still yet many things to do (like to fix adaptive flushing, etc. ;-))

Then, I was curious if some other stuff may improve performance/stability:
  • using small sort and join buffers (32K instead of 2M and 128K respectively) -- no difference..
  • using innodb_sync_array_size=32 in 5.6 (instead of default 1) -- no difference..
Well, this workload is not involving contentions on this stuff, so not really surprised..
Next:
  • InnoDB thread concurrency = 24 :
    • Read-Only: no difference
    • Read+Update: more stable and better performance up to 512 users on 5.6 (due reduced lock mutex contention)
    • Read+Write: more stable, but significantly lower performance..

While the impact is very depending on the workload, the feature like that is very welcome here! - the experimental concurrency management in 5.6 should be improved to become production ready. As well having improved Thread Pool extension here will change the things too! ;-)) To be honest, I think even once we'll fix any scalability contentions within MySQL and InnoDB, there will be still a need for a kind of concurrency management (e.g. thread/session pool, resource management) - any HW server + running OS has a limit of a number of tasks it can manage optimally.. - then after this limit you'll always observe a performance decrease ;-)) and the goal fo such a self-manager will be to never out-pass such a limit..

But well, time is pressing, and there was a one feature I wanted to test from a long time - a huge REDO log! (bigger than 4GB).


Read+Update with a bigger REDO size

As a start point I've decided to just replace x3 logs of 1GB by x12 logs = 12GB in total. With 12GB of REDO space there is should be always enough of REDO and the only flushing which should be involved in this case is due dirty pages percentage limit (which is supposed to be quite cool ;-))

But for my big surprise the performance was worse:

Observations :
  • indeed, performance is worse with 12GB REDO vs 3GB..
  • such a degradation is explained by log mutex waits..

Hmm.. - is my internal storage became a bottleneck now?? - Let's move REDO logs to the SSD storage (it's small, but quite enough to keep REDO logs ;-))

And it's not really better:

Observations :
  • well, it looks very stable since REDO was moved to SSD, and performance is better ;-))
  • however, TPS level is still lower comparing to the initial results..
  • and WHY I still have waits on the log mutex??? is my SSD not fast enough either?? ;-))


Let's get a closer look on the I/O activity now:

Observations :
  • as you can see, InnoDB is reporting no page reads, only writes!
  • while from the system I/O stats we can see there were reads present during both last tests (and there is no reads on the initial one)
  • why?..
  • the answer is quite simple here:
    • the REDO log writes in the current InnoDB code are not aligned to any block size..
    • so, most of REDO writes are involving Read-On-Write operation (as the write record is not aligned to FS block size, FS has to read the block first, apply the changes from the record, and then write the modified block to disk)..
    • until the REDO size was "small enough", all these reads during Read-On-Write were reading blocks from FS cache ;-))
    • now, as REDO was increased to 12GB, once we've finished to write the last log and switching to write to the first one, the first log file will have many blocks already missed in FS cache, so they will be physically read from disk!
    • and even SSD is fast, it's not as fast as FS cache sitting in the RAM ;-))

So, here we have an excellent illustration why we have to find a way to write REDO logs with O_DIRECT option (no need to use FS cache for files which are used just for write-only), AND/OR align log writes to the FS block size! (well, in the current design O_DIRECT may have a negative impact as it makes impossible to use a chain: hold_a_lock->write()->release_a_lock->fflush() - with O_DIRECT we'll involve physical write() while holding a lock.. - so such a chain should be redesigned; OR we may simply use a block size aligned writes and involve fadvise() to limit a caching of REDO logs! -- BTW, these points were already discussed with MarkC, VadimTk and all InnoDB team, so it's not something new :-)) but I was surprised to hit this issue so radically :-)) Then, since we're all agree that it should be fixed, it's probably a time to implement it now? ;-))

To go till the end now, let's see if things will be better with 3GB REDO size (as initially), but on SSD:

Observations :
  • yes, we reached the same 50,000 TPS as before! ;-)
  • no more log mutex waits, as expected..
  • however, missing some workload stability..


And with 6GB REDO on SSD?.. -

Observations :
  • the last result is the most stable from all of we have seen until now :-))
  • quite interesting that Checkpoint Age did not out-pass 4.5GB in case with 6GB REDO log..

Seems that improving Adaptive Flushing will be the next step here.. (as well, block size aligned REDO log writes, btr_search_latch and lock mutex contentions ;-))

It's also great to see a growing interest on MySQL 5.6 - db4free.net is already proposing sandboxes with MySQL 5.6, and I was told that some other web hosting companies are actively evaluating MySQL 5.6 adoption now.. Let's see ;-)

Time for vacations now! ;-))

Some fun stuff..
if you continued to read until here ;-)) This year is finishing, and there were many fun events.. - it's a long time now I wanted to share few small videos from Harmony-2011 Conference (Helsinki, Finland) - it was a Baltic OUG Conference with MySQL stream, very warm and open mind people, I really enjoyed it :-)) And here are 2 videos from there:

  • Ronald Bradford playing a "Cold War Game" with a Russian guy - HD / MP4
  • Opera time @Conference (no words.. just "wow!") - HD / MP4

Happy Holidays! Happy New Year! Happy Vacations! ;-))

Rgds,
-Dimitri

Posted by Dimitri at 22:19 - [] Comments...
Categories: MySQL, Tools/ dbSTRESS

Thursday, 08 December, 2011

MySQL Performance: Call for Workload Scenarios

Just in case you've missed my initial post last week (there were some problems on the Planet MySQL site, and it did not appear correctly)..

I'm looking for Test Scenarios to extend our Benchmark Workloads to run on MySQL Server in way to improve MySQL performance and find the most optimal solutions to your problems.. - More closed to production workloads we're testing, better MySQL will keep your production workloads in the future! easy, no?.. ;-)

Well, it's just a reminder from my initial post (and I cannot believe you have nothing to share ;-))
(hoping there will be no problems on the Planet MySQL site this time ;-))

Rgds,
-Dimitri

Posted by Dimitri at 13:26 - [] Comments...
Categories: MySQL

Thursday, 01 December, 2011

MySQL Performance: Full Time since Today ;-)

Until now, all my work around MySQL Performance was just a part of fun rather my main job... But things are changing since today ;-) I'm moving to MySQL Team, and will work on MySQL Performance full time from now on! ;-) Which is meaning yet more fun (I hope ;-)), more blog posts (I'm sure ;-)), more benchmarks (of course), and more brain work (even more than sure ;-))..

And as I've mentioned benchmarks - many of you complained over a time that we're not really testing what we should.. Well, everything is relative and depending directly on what you're trying to fix or analyze.. ;-) So, I'll not stop running dbSTRESS or Sysbench workloads (or TPCC-like, DBT-2, etc.).. But, I'm also looking for other various test scenarios to reproduce different MySQL performance problems you're observing in your production, or your customer's sites, etc. Please, leave comments on this post, or contact me directly if you prefer. I'll try then to group all such test scenarios together and then ship them as extension to dbSTRESS or any other way (I don't have yet a final test kit, but many ideas are running in my head ;-)), publish performance results and their analyzes (as usual).

As well if you have in mind some top priority performance problems in MySQL 5.5/5.6 - please share them too. I promise you to add them all into my TODO list, and once the impact is validated - see how better it should be fixed (but, of course, I have to be able to reproduce it first ;-)) I'm already have in head issues with adaptive flushing, purge lag delay, Performance Schema overhead, etc. - but you may see other things and have other priorities..

So don't wait - share! ;-)

Rgds,
-Dimitri

Posted by Dimitri at 11:30 - [] Comments...
Categories: MySQL, x-files...

Friday, 16 September, 2011

MySQL Performance: High Load & New Thread Pool in 5.5


This post is the following part of the previous one (http://dimitrik.free.fr/blog/archives/2010/11/mysql-performance-55-and-innodb-thread-concurrency.html) discussing about concurrency management within MySQL/InnoDB. At the end of the article I've expressed my expectations to see one day a kind of "Advanced Thread Pool" integrated within MySQL. Today this expectation become a reality with newly available Thread Pool extension for MySQL (see more about on http://blogs.oracle.com/MySQL/entry/new_commercial_extensions_for_mysql and Sysbench benchmark results on http://www.mysql.com/products/enterprise/scalability.html).

And of course, to follow my previous blog post, I'll present you my own results obtained on the same dbSTRESS workload as before. The following graphs are representing MySQL 5.5 performance levels under dbSTRESS workload while Thread Pool is used or not. As you may read from my previous post, using "innodb_thread_concurrency" setting is not helping anymore to keep performance stable on a high number of concurrent user sessions. And the most optimal solution was to limit MySQL server execution within a limited set of CPU/cores (if contention cannot be reduced, we may at least avoid to amplify it by reducing concurrency). Let's see now how the Thread Pool extension is helping here..

And few test graphs will tell you it better than many words ;-))

There are 4 parts on each graph corresponding to 4 tests running one after other:
  • MySQL is running within 8 cores taskset
  • MySQL is running within 16 cores taskset
  • MySQL is running on 32 cores without taskset
  • MySQL is running on 32 cores without any taskset and with Thread Pool enabled (with thread_pool_size=16)

On each test the load is growing step by step from 1 to 2, 4, 8, 16, 32 .. 1024 concurrent user sessions. While the load is growing, InnoDB internal contention become more and more hot. Limiting MySQL server to 8 cores is giving the most stable result on the "standard" MySQL configuration. While the Thread Pool is "naturally" helping here to keep the level of contentions more constant whatever load we have and avoid performance drop on a high number of concurrent sessions.

And it's very clearly seen from graphs how drastically the InnoDB mutex contention is reduced.


Read-Only Workload


Observations :
  • using taskset on 8 cores gives the most stable result on the "standard" MySQL 5.5 configuration (20.000 TPS, 21.000 TPS in peak), however performance is decreasing on 1024 sessions (under 15.000 TPS)..
  • then when Thread Pool is used the peak performance is higher: 27.000 TPS vs 21.000, and on 1024 sessions it's still higher as well (17.500 vs 14.000 TPS)
  • the gain with Thread Pool here is due a dramatic drop on kernel_mutex contention!..



Read+Update Workload


Observations :
  • "standard" 5.5 is more optimal on 8 cores taskset again
  • but with Thread Pool you have yet more better overall performance: 40.000 TPS vs 35.000, and way better result on 1024 sessions: 27.000 TPS vs 10.000..
  • the obtained gain is again due lower contention on the kernel_mutex


Read+Write Workload

Observations :
  • again taskset on 8 cores is more optimal for "standard" MySQL 5.5
  • and Thread Pool again gives the best results: 22.500 TPS vs 19.000, then overall performance is better too, and finally on 1024 sessions the result is better again: 10.000 TPS vs 3.000 TPS
  • once again the gain is due a lower mutex contention! (kernel_mutex + index mutex)

So far, Thread Pool is helping well to avoid amplification of InnoDB mutex contentions with a growing level of concurrent user sessions. Also it seems to be able to manage execution of these sessions very well! (Note that OS scheduler has no idea what is going inside of MySQL internals, while Thread Pool is knowing exactly the state of each thread and for ex. will switch execution to another thread if the current one is involving a lock or an I/O request, etc.)..

Well, I may only invite you to run your own tests on MySQL 5.5 with Thread Pool and share your experience! :-))

As usually, any comments are welcome..

Posted by Dimitri at 15:47 - [] Comments...
Categories: MySQL