by Dimitri SSC Team, 2009 Sun Microsystems Inc. |
Benchmark Information |
Customer Name(s): SSC Team
NDA: no
Contact Information:
- dimitri (at) sun.com
Dates: Jun.2009
Keywords: MySQL, InnoDB, Ahead Flushing, purge, purge lag, innodb_max_purge_lag, log size, dirty pages, performance
Hardware Configuration |
Server(s):
- M5000 8CPU SPARC64-VII (quad-core bi-thread) 2400Mhz, 64GB RAM, 2x FC-port 4Gbit
Storage:
- ST6140 2x LUN (500GB RAID1 8HD), each LUN is connected by its own fiber channel to the host
Software Configuration |
System:
- Solaris 10 u6
- UFS
Application(s):
- MySQL 5.4
- dbSTRESS (injector)
Abstract |
Overview: After publishing a benchmark report in May about InnoDB Dirty pages & Log size impact, I received by email several very interesting comments, and one of them pointed to the purge lag problem and the InnoDB innodb_max_purge_lag parameter created specially for this purpose! So I was very curious to see how it might help my workload... The following study is about how innodb_max_purge_lag works now and what may be done to make things better :-))
Goal(s): I'm not claiming any absolute truth here :-) My goal is to understand what's going on with purge in my workload and to see if anything else can be improved..
Result(s): are following :-)
Benchmark Scenario |
My benchmark scenario will be exactly the same as in my previous test - and by chance I still have exactly the same server :-)) To reproduce the same problem again (if it still exists, of course :-)) I've prepared the following dbSTRESS scenario:
Test Scenario :
- during 5 minutes a Read-Only warm-up
- then during 1 hour a Read+Write workload
- each session operates on a fixed list of objects (the choice is random, but the list is fixed, and the InnoDB buffer pool is sized big enough to keep all necessary pages cached!)
From my previous tests I observed that the max throughput on 16 cores is generally reached with 32 sessions (with innodb thread concurrency equal to either 0 or 16). So, the other settings for this scenario:
Config :
- CPU cores: 16
- concurrent sessions: 32
- think time: 0
Each test is executed :
- with 1024MB redo log size
- or 128MB redo to compare
For every test, check :
- dirty pages percentage
- total time of dirty page flushing at the end of the test
- reached performance level
- History length
- DML delays (if any)
However, I'm concentrating only on MySQL 5.4 testing now.
InnoDB Purge Lag |
According to the InnoDB documentation (and the InnoDB code as well :-)), the innodb_max_purge_lag variable controls how to delay INSERT, UPDATE, and DELETE operations when purge operations are lagging. The default value is 0 (no delays). The InnoDB transaction system maintains a list of transactions that have index records delete-marked by UPDATE or DELETE operations. Let the length of this list be purge_lag. When purge_lag exceeds innodb_max_purge_lag, each INSERT, UPDATE, and DELETE operation is delayed by ((purge_lag/innodb_max_purge_lag)×10)–5 milliseconds. The delay is computed at the beginning of a purge batch, every ten seconds. The operations are not delayed if purge cannot run because of an old consistent read view that could see the rows to be purged.
However, as I explained before, the purge routine is not called as expected - the Main Thread is looping all the time on trx_purge() - but let's see more in depth how it works.
Purge lag is also called the Rollback segment History length within the InnoDB code, and that's the name I prefer :-))
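To make the documented formula more concrete, here is a small standalone sketch of my own (not the actual InnoDB code, and with the overflow guard omitted) reproducing this delay computation:

```c
/* Standalone sketch of the documented delay computation:
   delay = ((purge_lag / innodb_max_purge_lag) * 10 - 5) milliseconds,
   expressed below in microseconds as in trx0purge.c, and applied
   only when the ratio exceeds 1 (overflow guard omitted). */
unsigned long
purge_delay_usec(unsigned long purge_lag, unsigned long max_purge_lag)
{
	float	ratio;

	if (max_purge_lag == 0) {
		return(0);	/* 0 means 'infinity': never delay */
	}

	ratio = (float) purge_lag / max_purge_lag;

	if (ratio <= 1) {
		return(0);	/* purge is not lagging enough */
	}

	return((unsigned long) ((ratio - .5) * 10000));
}
```

For example, with innodb_max_purge_lag=100000 and a purge lag of 8,000,000 the ratio is 80, so each DML statement would be delayed by (80 - 0.5) × 10000 = 795,000 microseconds - almost 0.8 second(!), far beyond the 5ms minimum.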
InnoDB Code Instrumentations |
To be able to monitor InnoDB internals, I've added the following stats:
- Flush calls (a DTrace script tracing who is calling the buf_flush_batch() function and how often)
- innodbSTAT (based on "show innodb status" - to check the buffer pool state)
- added "History len" value into "show innodb status"
- added DML delay values into "show innodb status"
The added code looks like:
fputs("--------------\n"
"PURGE STATUS\n"
"--------------\n", file);
fprintf(file, "History len: %d\nDML Delay: %d max: %d\n",
trx_sys->rseg_history_len, srv_dml_needed_delay, srv_dml_needed_delay_max);
srv_dml_needed_delay_max = 0;
As the DML delay (srv_dml_needed_delay) may vary a lot at any time, I've added a DML-delay-max value (srv_dml_needed_delay_max) - it keeps the max delay value reached since the previous "show innodb status" call (and is set to zero after).
First Observations |
As you may see from my previous tests, a smaller redo log size gives better TPS stability on InnoDB, while a bigger redo log gives a higher performance level but generates periodic bursts of data flushing, which may partially freeze throughput as a side effect... And even if I don't like any kind of artificial limit too much (the storage box is not even fully used here), perhaps the innodb_max_purge_lag option may bring a little bit more stability?..
Let's see..
First test :
- innodb_max_purge_lag=0
- innodb_max_purge_lag=1000000
- innodb_max_purge_lag=100000
- innodb_max_purge_lag=10000
To my big surprise, the History length exceeded 8,000,000 in all cases and there was NO DIFFERENCE between these tests..
Why?..
Analyzing the code |
Let's take a look at the InnoDB code where the innodb_max_purge_lag value is analyzed. File : innobase/trx/trx0purge.c
/* If we cannot advance the 'purge view' because of an old
'consistent read view', then the DML statements cannot be delayed.
Also, srv_max_purge_lag <= 0 means 'infinity'. */
if (srv_max_purge_lag > 0 && !UT_LIST_GET_LAST(trx_sys->view_list))
{
float ratio = (float) trx_sys->rseg_history_len
/ srv_max_purge_lag;
if (ratio > ULINT_MAX / 10000) {
/* Avoid overflow: maximum delay is 4295 seconds */
srv_dml_needed_delay = ULINT_MAX;
} else if (ratio > 1) {
/* If the history list length exceeds the
innodb_max_purge_lag, the
data manipulation statements are delayed
by at least 5000 microseconds. */
srv_dml_needed_delay = (ulint) ((ratio - .5) * 10000);
}
}
As you may see, innodb_max_purge_lag is considered only when UT_LIST_GET_LAST(trx_sys->view_list) returns zero! - meaning there is no longer any "consistent read" running and needing "unpurged" data. Tracing this condition, I discovered it never returns zero while there is at least one select running within the database! So, my purge lag setting was simply ignored !
What is dangerous here?
- first of all - as the history length becomes bigger and bigger, the DML delay value may be calculated way higher than 5ms(!) - and once there are no more reads in the database, a single DML statement may be delayed for several minutes(!) and break throughput stability! - that's probably why I've found several posts in forums about possible InnoDB freezes when the innodb_max_purge_lag option is used...
- second - I don't see any reason why we cannot delay DML statements for a few milliseconds even if there are some "consistent reads" running on the database?.. - what is the problem?.. - probably I'm missing something here, but I'll try the code without the UT_LIST_GET_LAST() condition check.
Updated code:
/* If we cannot advance the 'purge view' because of an old
'consistent read view', then the DML statements cannot be delayed.
Also, srv_max_purge_lag <= 0 means 'infinity'. */
//(dim) if (srv_max_purge_lag > 0 && !UT_LIST_GET_LAST(trx_sys->view_list))
if (srv_max_purge_lag > 0)
{
float ratio = (float) trx_sys->rseg_history_len
/ srv_max_purge_lag;
if (ratio > ULINT_MAX / 10000) {
/* Avoid overflow: maximum delay is 4295 seconds */
srv_dml_needed_delay = ULINT_MAX;
} else if (ratio > 1) {
/* If the history list length exceeds the
innodb_max_purge_lag, the
data manipulation statements are delayed
by at least 5000 microseconds. */
srv_dml_needed_delay = (ulint) ((ratio - .5) * 10000);
}
if (srv_dml_needed_delay > srv_dml_needed_delay_max)
	srv_dml_needed_delay_max = srv_dml_needed_delay;
}
I've also added the srv_dml_needed_delay_max variable to keep the max DML delay value set during the last loop.
Testing updated code |
To keep it short, I'll skip all the various tests I've done to find the most optimal combination and just present a short summary before going further. Purge lag summary (with the updated code!):
- from my observations, the main benefit of setting the innodb_max_purge_lag variable is that it makes the final dirty pages flushing (once the one-hour R+W workload is finished) much shorter! - instead of 5-15 minutes I've got a stable 1-2 minutes(!)
- the most optimal setting was innodb_max_purge_lag=200000 in my case (and the history length did not exceed this value during the whole test! - I remind you it jumped to 8 million before!)
- the max DML delay never exceeded 5ms
However :
- there is still something wrong with dirty pages recycling (but Mark Callaghan already seems to be looking closely at this issue)
- as before, the TPS level is lower with a 128MB redo log, but throughput is more stable
- and with 1024MB the TPS level is higher, but it generates a periodic "furious flushing"...
Let's see what else may be done here...
Improving Dirty Pages Flushing code |
As I explained before during my previous tests, under a high Read+Write workload the Master Thread is mostly looping on the trx_purge() call and may never even reach the code checking whether the innodb_max_dirty_pages_pct value is reached (nor the Adaptive checkpoint code added by the Percona team)... After monitoring dirty page flush calls with a DTrace script, I may list here the buf_flush_batch() callers:
- srv_master_thread() - when everything is working as expected (but not under high load)
- buf_flush_free_margin() - when the buffer pool has no more free pages
- log_check_margins() - when the redo log has no more free space, so the oldest dirty pages must be flushed to disk to free the corresponding space within the redo log
The most "furious" flushing happens when buf_flush_batch() is called by log_check_margins() - the demand is so urgent and the flushing so important that it creates a partial freeze of the workload - all InnoDB time is spent flushing a part of the data as soon as possible, and then everything depends on the disk storage subsystem performance...
What may be changed here?..
From my point of view, when flushing is invoked from the log_check_margins() function it's already too late - data should be flushed ahead to avoid an I/O burst! Analyzing the Master Thread code, I've split it into 3 critical parts:
- Part-1 : if the load is not high enough, the percentage of dirty pages (innodb_max_dirty_pages_pct) is verified here, and if the percentage is exceeded - pages are flushed with a 100% IO_CAPACITY setting; I've tested some low values here too, but in this case flushing becomes very aggressive and slows down global performance...
- Part-2 : here page flushing is invoked ahead if there is not too much activity within the database...
- Part-3 : is the most critical place, because if the load is very high, the Main Thread never leaves the purge loop! And from my point of view, a kind of Ahead Flushing should be invoked here!...
Initial code of purge loop :
n_pages_purged = 1;
last_flush_time = time(NULL);
while (n_pages_purged) {
if (srv_fast_shutdown && srv_shutdown_state > 0) {
goto background_loop;
}
srv_main_thread_op_info = "purging";
n_pages_purged = trx_purge();
current_time = time(NULL);
if (difftime(current_time, last_flush_time) > 1) {
srv_main_thread_op_info = "flushing log";
log_buffer_flush_to_disk();
last_flush_time = current_time;
srv_sync_flush++;
}
}
My additional changes :
- changes are involving ahead flushing within a purge loop
- check if the half of the innodb_max_dirty_pages_pct value is reached
- if so, queue page flushing with 20% of IO_CAPACITY
- ideally setting of these condition values should be available via my.conf
n_pages_purged = 1;
last_flush_time = time(NULL);
while (n_pages_purged) {
if (srv_fast_shutdown && srv_shutdown_state > 0) {
goto background_loop;
}
srv_main_thread_op_info = "purging";
n_pages_purged = trx_purge();
//(dim)
if (UNIV_UNLIKELY(buf_get_modified_ratio_pct()
> srv_max_buf_pool_modified_pct/2 )) {
n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST,
PCT_IO(20), ut_dulint_max);
skip_sleep = TRUE;
}
current_time = time(NULL);
if (difftime(current_time, last_flush_time) > 1) {
srv_main_thread_op_info = "flushing log";
log_buffer_flush_to_disk();
last_flush_time = current_time;
srv_sync_flush++;
}
}
Re-Testing with Ahead Flushing enabled |
Combining Purge Lag changes + Ahead Flushing, I've got the following result:
- very short final flushing once database activity is stopped (purge lag = 200000)
- due to Ahead Flushing, there is stable background flushing of dirty pages, so log_check_margins() is never invoked!
- as log_check_margins() is never invoked, the "furious" flushing is gone ! :-) and my TPS throughput became very stable! (finally :-))
- very stable dirty page percentage
The following few graphs compare 2 cases: a 128MB redo log (stable, but aggressive) and Ahead Flushing with a 1024MB redo log (a performance level equal to the plain 1024MB redo log, but less aggressive and without the periodic freezes due to "furious flushing" :-))
Setting used in my.conf :
- innodb_max_dirty_pages_pct = 60
- innodb_max_purge_lag = 200000
- innodb_log_file_size = 128M or 1024M + AheadFlushing enabled
TPS level and Response times |
Observations :
- as seen before, the TPS level is higher with a 1024MB redo log, but with Ahead Flushing it has also become stable now!
- response times are better from the beginning, while with a 128MB log it still needs some time to become stable
- NOTE : the 128MB configuration also benefits here from the improvement made by my Purge Lag changes (!)
I/O Activity and DML delays |
Observations :
- curiously, I/O activity (writing) is much lower with Ahead Flushing enabled...
- I may explain it only by the supposition that some pages are modified several times, and with less aggressive flushing we're not flushing all page versions but only the latest one!
- DML delays are also less frequent with Ahead Flushing (but a more accurate measurement will be needed to confirm this)..
Buffer Pool and Dirty Page flushing |
Observations :
- the dirty pages level stays stable in both configurations
- 128M : aggressive flushing is invoked from log_check_margins() during the first half of the test, then, due to the Purge Lag changes, flushing progressively comes back to "normal"
- 1024M + AheadFlushing : log_check_margins() is never called; flushing is less aggressive and is called only from the Master Thread during the whole test duration!
SUMMARY |
Based on the presented performance study, here is the list of changes I'm suggesting to apply to the MySQL 5.4 InnoDB engine. Purge Lag :
- verify whether checking for the presence of a "consistent read" is really needed within the purge lag code (what do you think?)
- add a kind of innodb_max_dml_delay configuration parameter to avoid any unexpectedly long DML delays!
- add indicators about History length and DML delays into "show innodb status" output!
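As an illustration of the innodb_max_dml_delay idea - the parameter name and the variable below are hypothetical, nothing like this exists in the current code - the cap could be as simple as:

```c
/* Hypothetical ceiling on the computed DML delay, in microseconds
   (srv_max_dml_delay is my invention, not an InnoDB variable);
   0 would mean 'no cap', keeping today's behavior. */
unsigned long	srv_max_dml_delay = 100000;	/* e.g. 100ms max */

unsigned long
capped_dml_delay(unsigned long computed_delay)
{
	if (srv_max_dml_delay > 0 && computed_delay > srv_max_dml_delay) {
		return(srv_max_dml_delay);
	}

	return(computed_delay);
}
```

This way, even a multi-million History length could never freeze a DML statement for minutes.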
Ahead Flushing : (if the idea is accepted :-))
- add a kind of innodb_ahead_dirty_pages_pct - a percentage of dirty pages level when Ahead Flushing should be activated
- add a kind of innodb_ahead_io_capacity_pct - an I/O capacity percentage to use when Ahead Flushing is activated
- add indicators about flushing calls (function names and number of calls) into "show innodb status" !!!
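To sketch how these two hypothetical parameters could replace the hard-coded values from my patch above (the /2 threshold and PCT_IO(20)) - again, the names and code below are my assumption, not existing InnoDB code:

```c
/* Hypothetical tunables (my invention, not InnoDB variables): */
unsigned long	srv_ahead_dirty_pages_pct = 50;	/* % of max_dirty_pct */
unsigned long	srv_ahead_io_capacity_pct = 20;	/* % of io_capacity */

/* Returns the number of pages to flush in one Ahead Flushing batch,
   or 0 if the dirty page ratio is still below the activation level. */
unsigned long
ahead_flush_batch(unsigned long modified_ratio_pct,	/* current dirty % */
		  unsigned long max_dirty_pct,	/* innodb_max_dirty_pages_pct */
		  unsigned long io_capacity)	/* I/O capacity, pages/sec */
{
	unsigned long	threshold = max_dirty_pct
				* srv_ahead_dirty_pages_pct / 100;

	if (modified_ratio_pct <= threshold) {
		return(0);	/* not dirty enough yet: no ahead flushing */
	}

	return(io_capacity * srv_ahead_io_capacity_pct / 100);
}
```

With the defaults above and innodb_max_dirty_pages_pct=60, ahead flushing would kick in once more than 30% of the buffer pool pages are dirty, flushing batches of 20% of the I/O capacity - exactly the behavior hard-coded in my patch.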
Any comments are welcome! :-))