Created: 2009-06-08
Last modified: 2009-06-30




MySQL Performance: InnoDB Purge & Ahead Flushing






by Dimitri


SSC Team, 2009
Sun Microsystems Inc.





This information is provided for guidance only and does not commit Sun Microsystems.

Table of contents



Benchmark Information
Customer Name(s): SSC Team

NDA: no

Contact Information:
     - dimitri (at) sun.com

Dates: Jun.2009

Keywords: MySQL, InnoDB, Ahead Flushing, purge, purge lag, innodb_max_purge_lag, log size, dirty pages, performance

Hardware Configuration
Server(s):
     - M5000 8CPU SPARC64-VII (quad-core, bi-thread) 2400MHz, 64GB RAM, 2x FC-port 4Gbit

Storage:
     - ST6140 2x LUN (500GB RAID1 8HD), each LUN is connected by its own fiber channel to the host

Software Configuration
System:
     - Solaris 10 u6
     - UFS

Application(s):
     - MySQL 5.4
     - dbSTRESS (injector)


Abstract
Overview: After publishing in May a benchmark report about the InnoDB dirty pages & log size impact, I received several very interesting comments by email, and one of them pointed to the purge lag problem and the InnoDB innodb_max_purge_lag parameter created specially for this purpose! So I was very curious to see how it might help me with my workload...

The following study is about how innodb_max_purge_lag works today and what may be done to make things better :-))

Goal(s): I'm not claiming any absolute truth here :-) My goal is to understand what's going on with purge in my workload and see if anything else can be improved..

Result(s): are presented below :-)


Benchmark Scenario
My benchmark scenario will be exactly the same as in my previous test - and by chance I still have exactly the same server :-))

To reproduce the same problem again (if it still exists, of course :-)) I've prepared the following dbSTRESS scenario:

Test Scenario :

From my previous tests I observed that the max throughput on 16 cores is generally reached with 32 sessions (both with innodb_thread_concurrency equal to 0 and to 16). So, other settings for this scenario:

Config :

Each test is executed :

For every test to check :

However, I'm concentrating only on MySQL 5.4 testing now.

InnoDB Purge Lag
According to the InnoDB documentation (and the InnoDB code as well :-)) the innodb_max_purge_lag variable controls how to delay INSERT, UPDATE, and DELETE operations when purge operations are lagging. The default value is 0 (no delay).

The InnoDB transaction system maintains a list of transactions that have index records delete-marked by UPDATE or DELETE operations. Let the length of this list be purge_lag. When purge_lag exceeds innodb_max_purge_lag, each INSERT, UPDATE, and DELETE operation is delayed by ((purge_lag/innodb_max_purge_lag)×10)–5 milliseconds. The delay is computed at the beginning of a purge batch, every ten seconds. The operations are not delayed if purge cannot run because of an old consistent read view that could see the rows to be purged.
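To make this formula concrete, here is a minimal, self-contained C sketch of the delay arithmetic (the dml_delay_usec() function name is mine, not InnoDB's). For example, with innodb_max_purge_lag=1,000,000 and a purge lag of 2,000,000 each row operation gets delayed by (2×10)–5 = 15 milliseconds:


      #include <stdio.h>

      /* sketch only: compute the DML delay (in microseconds) from the purge
         lag and the innodb_max_purge_lag setting, as described above */
      static unsigned long dml_delay_usec(unsigned long purge_lag,
                                          unsigned long max_purge_lag)
      {
          double ratio = (double) purge_lag / (double) max_purge_lag;

          if (ratio <= 1.0) {
              return 0;        /* purge keeps up: no delay at all */
          }
          /* (ratio * 10 - 5) milliseconds, expressed in microseconds */
          return (unsigned long) ((ratio - 0.5) * 10000.0);
      }

      int main(void)
      {
          /* purge lag of 2,000,000 with innodb_max_purge_lag = 1,000,000
             gives (2 * 10) - 5 = 15 ms per row operation */
          printf("%lu usec\n", dml_delay_usec(2000000UL, 1000000UL));
          return 0;
      }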

However, as I explained before, the purge routine is not called as expected: the Master Thread is looping all the time on trx_purge() - but let's look more in depth at how it works.

Purge lag is also called the Rollback Segment History length within the InnoDB code, and it's the name I prefer :-))

InnoDB Code Instrumentations
To be able to monitor InnoDB internals, I've added the following stats:

The added code looks like this:

      fputs("--------------\n"
            "PURGE STATUS\n"
            "--------------\n", file);

      fprintf(file, "History len: %lu\nDML Delay: %lu max: %lu\n",
              (ulong) trx_sys->rseg_history_len,
              (ulong) srv_dml_needed_delay,
              (ulong) srv_dml_needed_delay_max);

      srv_dml_needed_delay_max = 0;   /* reset the max after each report */

As DML-delay (srv_dml_needed_delay) may vary a lot at any time, I've added a DML-delay-max value (srv_dml_needed_delay_max) - it keeps the max delay value reached since the last "show innodb status" call (and is reset to zero afterwards).

First Observations
As you may see from my previous tests, a smaller redo log size gives better TPS stability on InnoDB, while a bigger redo log gives a higher performance level but generates periodic bursts of data flushing, which may partially freeze throughput as a side effect...

And even if I don't much like any kind of artificial limit (the storage box is not even fully used here), maybe the innodb_max_purge_lag option can bring a little more stability?..

Let's see..

First test :

To my big surprise, the History length exceeded 8,000,000 in all cases and there was NO DIFFERENCE between these tests..

Why?..

Analyzing the code
Let's have a look at the InnoDB code where the innodb_max_purge_lag value is analyzed.

File : innobase/trx/trx0purge.c


      /* If we cannot advance the 'purge view' because of an old
      'consistent read view', then the DML statements cannot be delayed.
      Also, srv_max_purge_lag <= 0 means 'infinity'. */

      if (srv_max_purge_lag > 0
          && !UT_LIST_GET_LAST(trx_sys->view_list)) {

          float ratio = (float) trx_sys->rseg_history_len
                  / srv_max_purge_lag;

          if (ratio > ULINT_MAX / 10000) {
              /* Avoid overflow: maximum delay is 4295 seconds */
              srv_dml_needed_delay = ULINT_MAX;
          } else if (ratio > 1) {
              /* If the history list length exceeds the
              innodb_max_purge_lag, the data manipulation
              statements are delayed by at least 5000
              microseconds. */
              srv_dml_needed_delay = (ulint) ((ratio - .5) * 10000);
          }
      }

As you may see, innodb_max_purge_lag is considered only when UT_LIST_GET_LAST(trx_sys->view_list) returns zero! - meaning there is no longer any "consistent read" running that needs "unpurged" data. Tracing this condition, I discovered it never returns zero while there is at least one SELECT running within the database! So my purge lag setting was simply ignored!

What is dangerous here?

Updated code:


      /* If we cannot advance the 'purge view' because of an old
      'consistent read view', then the DML statements cannot be delayed.
      Also, srv_max_purge_lag <= 0 means 'infinity'. */

      //(dim) if (srv_max_purge_lag > 0
      //(dim)     && !UT_LIST_GET_LAST(trx_sys->view_list)) {
      if (srv_max_purge_lag > 0) {

          float ratio = (float) trx_sys->rseg_history_len
                  / srv_max_purge_lag;

          if (ratio > ULINT_MAX / 10000) {
              /* Avoid overflow: maximum delay is 4295 seconds */
              srv_dml_needed_delay = ULINT_MAX;
          } else if (ratio > 1) {
              /* If the history list length exceeds the
              innodb_max_purge_lag, the data manipulation
              statements are delayed by at least 5000
              microseconds. */
              srv_dml_needed_delay = (ulint) ((ratio - .5) * 10000);
          }

          //(dim) keep the max delay value seen since the last
          //(dim) "show innodb status" call
          if (srv_dml_needed_delay > srv_dml_needed_delay_max) {
              srv_dml_needed_delay_max = srv_dml_needed_delay;
          }
      }

I've also added a srv_dml_needed_delay_max variable to keep the max DML delay value set during the last loop.

Testing updated code
To keep it short, I'll skip all the various tests I've done to find the most optimal combination and just present a short summary before going further.

Purge lag summary (with updated code!):

However :

Let's see what else may be done here...

Improving Dirty Pages Flushing code
As I explained before during my previous tests, under a high Read+Write workload the Master Thread is mostly looping on the trx_purge() call and may never even reach the code checking whether the innodb_max_dirty_pages_pct value has been reached (nor the Adaptive Checkpoint code added by the Percona team)...

After monitoring dirty page flush calls with a DTrace script, I may list here the buf_flush_batch() callers:

The most "furious" flashing happens when buf_flush_batch() is called by log_check_margins() - the demand is so urgent and flushing is so important that it creates a partial freeze of workload - all InnoDB time is spent to flush a part of data as soon as possible and then all depends on the disk storage subsystem performance...

What may be changed here?..

From my point of view, when flushing is initiated from the log_check_margins() function it's already too late - data should be flushed ahead of time to avoid an I/O burst! Analyzing the Master Thread code, I've split it into 3 critical parts:

Initial code of the purge loop :


      n_pages_purged = 1;
      last_flush_time = time(NULL);

      while (n_pages_purged) {
          if (srv_fast_shutdown && srv_shutdown_state > 0) {
              goto background_loop;
          }

          srv_main_thread_op_info = "purging";
          n_pages_purged = trx_purge();

          current_time = time(NULL);

          if (difftime(current_time, last_flush_time) > 1) {
              srv_main_thread_op_info = "flushing log";

              log_buffer_flush_to_disk();
              last_flush_time = current_time;
              srv_sync_flush++;
          }
      }



My additional changes :


      n_pages_purged = 1;
      last_flush_time = time(NULL);

      while (n_pages_purged) {
          if (srv_fast_shutdown && srv_shutdown_state > 0) {
              goto background_loop;
          }

          srv_main_thread_op_info = "purging";
          n_pages_purged = trx_purge();

          //(dim) Ahead Flushing: if more than half of the allowed dirty
          //(dim) pages percentage is already reached, flush a moderate
          //(dim) batch now instead of waiting for the redo log margins
          //(dim) to force a "furious" flush
          if (UNIV_UNLIKELY(buf_get_modified_ratio_pct()
                            > srv_max_buf_pool_modified_pct / 2)) {

              n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST,
                      PCT_IO(20), ut_dulint_max);
              skip_sleep = TRUE;
          }

          current_time = time(NULL);

          if (difftime(current_time, last_flush_time) > 1) {
              srv_main_thread_op_info = "flushing log";

              log_buffer_flush_to_disk();
              last_flush_time = current_time;
              srv_sync_flush++;
          }
      }


Re-Testing with Ahead Flushing enabled
Combining Purge Lag changes + Ahead Flushing, I've got the following result:

The following few graphs compare 2 cases: a 128MB redo log (stable, but aggressive) and Ahead Flushing with a 1024MB redo log (a performance level equal to the 1024MB redo log, less aggressive, and without periodic freezes due to "furious flushing" :-))

Settings used in my.cnf :



SUMMARY
Based on the presented performance study, here is the list of changes I'm suggesting to apply to the MySQL 5.4 InnoDB engine.

Purge Lag :

Ahead Flushing : (if the idea is accepted :-))

Any comments are welcome! :-))