
Monday, 11 April, 2011

MySQL Performance: 5.6 Notes, part 1 - Discovery...

I was lucky to get an early look at the MySQL 5.6 features, and here are my notes about the performance improvements and pending issues observed on my benchmark workloads.. The story finally became very passionate, and that's why way too long for a single blog post :-)) so I've decided to split it into several parts and will publish all of them over a day..

Here is the first part: Discovery of MySQL 5.6.2 as it is..

The new MySQL 5.6 version brings many "fundamental" improvements within the InnoDB code:
  • first of all, a split of the "kernel_mutex" was introduced! - it was one of the hottest contentions observed in MySQL 5.5 and limited InnoDB performance in so many cases..
  • purge processing became multi-threaded! - so we may expect to no longer see purge lagging, or at least to reduce it considerably on heavy writes..
  • page cleaning and flushing activity was removed from the Master Thread and assigned to the Page Cleaner (a dedicated thread)
  • the METRICS table was introduced to make monitoring easier ;-)) (see the quick query sketch just below)
  • the Performance Schema was improved and extended with a lot of new stuff..
There are other new features as well, but the listed ones were at the top of my priority list to test and see what kind of new limits or contentions we'll discover now in 5.6 :-)) - as I always say to our customers when they come to our Benchmark Center: "you may have a brilliant idea, but it may still be killed by a poor or non-optimal implementation.."
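For example, the new counters can then be read with a simple query like the following (just a minimal sketch: the counters must be enabled first, and the exact counter/subsystem names may slightly differ depending on your 5.6 build):

-- enable all counters at runtime (same effect as innodb_monitor_enable='%' in my.cnf)
SET GLOBAL innodb_monitor_enable = '%';

-- then check, for ex., the purge-related counters
SELECT NAME, SUBSYSTEM, COUNT, STATUS
  FROM INFORMATION_SCHEMA.INNODB_METRICS
 WHERE SUBSYSTEM = 'purge'
 ORDER BY NAME;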

HW Platform - For my tests I've used a 48-core AMD box running Oracle Linux 5.5 and having a fast enough I/O level (Sun's Flash cards were used here, but I'm skipping any details about them because I'm not seeking I/O performance - the goal is to stress MySQL and see the existing contentions within its code..) - so I'm not using O_DIRECT, to avoid entering any of the I/O issues I've observed in previous tests :-))

SW Configuration : I did not use any specific system settings different from the "default", and again (to stay compatible) used the EXT4 filesystem. The following "base" MySQL configuration settings were used:

[mysqld]
#-------------------------------------------------------
max_connections=2000
key_buffer_size=200M
low_priority_updates=1
sort_buffer_size = 2097152
table_open_cache = 8000
# files
innodb_file_per_table
innodb_log_file_size=1024M
innodb_log_files_in_group=3
innodb_open_files=4000

# buffers
innodb_buffer_pool_size=16000M
innodb_buffer_pool_instances=16
innodb_additional_mem_pool_size=20M
innodb_log_buffer_size=64M

# tune
innodb_checksums=0
innodb_doublewrite=0
innodb_support_xa=0
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit=2
### innodb_flush_method= O_DIRECT
innodb_max_dirty_pages_pct=50
### innodb_change_buffering= none

# perf special
innodb_adaptive_flushing=1
innodb_read_io_threads = 16
innodb_write_io_threads = 16
innodb_io_capacity = 4000
innodb_purge_threads=1
### innodb_max_purge_lag=400000
### innodb_adaptive_hash_index = 0

# transaction isolation
### transaction-isolation = REPEATABLE-READ
### transaction-isolation = READ-COMMITTED
### transaction-isolation = READ-UNCOMMITTED

# Monitoring via METRICS table since MySQL 5.6
innodb_monitor_enable = '%'
#-------------------------------------------------------
All these settings were used "by default" within every test, except where other details are explicitly given..
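Just a side note: the effective values may always be double-checked from any client session, for ex.:

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%';
SHOW GLOBAL VARIABLES LIKE 'innodb_purge_threads';
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';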

Load Generator : I don't think you'll be surprised that I've used dbSTRESS :-)) (as usual).. My initial plan was to test all the workloads with various scenarios, and I was especially curious to see what the next bottleneck will be on the workloads which were previously limited by the contention on the "kernel_mutex" (you may have a look at my previous posts about it)..

How to read the following graphs :
  • first of all, there is a title on each graph ;-)
  • then the legend on the right side explaining the meaning of the curves..
  • and then, if nothing else is specified, I've tested a step-by-step growing workload, where each step runs x2 times more concurrent user sessions than the previous one..

So, time for testing now ;-)


First bad surprise..

Well, the first bad surprise came right from the beginning, on the very first Read-Only probe tests - MySQL 5.6 was way slower than 5.5!.. and the main killing factor was an incredibly increased contention on the "btr_search_latch" mutex..

I was very surprised to see it so hot on a Read-Only workload.. - well, not really, as in my understanding this contention usually appeared until now when new data pages were being loaded into the buffer pool, so even on a Read-Only activity the contention was still observed as long as most of the pages were not yet kept in the buffer pool..

My feeling here is that before the split, the "kernel_mutex" limited the impact of the "btr_search_latch", as instead of one contention there were two :-)) but now, when the "kernel_mutex" contention is gone, sessions are fighting much harder for the "btr_search_latch".. - and I don't know why this contention still remains even once all the accessed pages have already been loaded into the buffer pool?..
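Side note: such mutex/latch waits may also be observed with "SHOW ENGINE INNODB MUTEX", or via the Performance Schema with a query like the one below (just a sketch: the exact wait event names for btr_search_latch / trx_sys may differ between builds, and the corresponding instruments have to be enabled):

SELECT EVENT_NAME, COUNT_STAR,
       SUM_TIMER_WAIT / 1000000000 AS sum_wait_ms   -- timers are reported in picoseconds
  FROM performance_schema.events_waits_summary_global_by_event_name
 WHERE EVENT_NAME LIKE 'wait/synch/%innodb%'
 ORDER BY SUM_TIMER_WAIT DESC
 LIMIT 10;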

Here are the TPS and Mutex Waits graphs from the beginning of the Read-Only test, just from 1 to 16 users, and it's clearly seen that performance degradation is already starting since 16 users:


On the workload scenario which was previously blocked by the "kernel_mutex" contention in MySQL 5.5, things are not any better on 5.6, due to the "trx_sys" mutex contention now:


The workload is still the same Read-Only scenario, but we're executing only "simple" queries here (SEL1 - a single-row join).. As you can see, on 5.5 performance is limited by growing kernel_mutex waits, while on 5.6 we're hitting the "trx_sys" contention since 32 concurrent sessions..
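(just to give an idea about the query "shape" - the real SEL1 statement and the dbSTRESS schema are of course different, the table and column names below are purely illustrative - it's a single-row join of the kind:)

SELECT o.name, s.ref, z.note
  FROM obj_table o
  JOIN sec_table s  ON s.obj_id = o.id
  JOIN zone_table z ON z.id     = s.zone_id
 WHERE o.id = 12345;   -- hypothetical tables/columns, just to show a single-row join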

At the same time I was curious why this issue was not seen before on other workloads (like sysbench for ex.), and realized that you need a server with at least 24 cores to hit it. For example, if I limit MySQL to only 16 cores, the issue nearly disappears (which is normal, as I'm simply forcing a lower concurrency)..

Here is the graph of the same test but having MySQL server running on 48, 16 and 24 cores:

As you can see, the trx_sys contention can be reduced here by limiting MySQL to 16 cores, but I cannot consider this a solution.. MySQL 5.6 should scale far beyond 5.5, so let's forget about workarounds :-))


Next step: the "btr_search_latch" mutex is related to the Adaptive Hash Index (AHI) feature. Many people advise disabling AHI to remove any use of the "btr_search_latch" mutex within the InnoDB code, and some even report better performance when the AHI is turned OFF.. - Well, personally, until now I've only seen performance degradation with AHI=OFF, but things may be different on different workloads.. In my case it's not helping (even if the result is better compared to the initial one, it's still worse than with 5.5):
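For reference, AHI can be switched OFF either from the config file or dynamically (innodb_adaptive_hash_index is a dynamic variable), for ex.:

SET GLOBAL innodb_adaptive_hash_index = OFF;
-- or within my.cnf:
--   innodb_adaptive_hash_index = 0
SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index';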

Retesting with innodb_adaptive_hash_index=OFF :



yes, it's removing the "btr_search_latch" bottleneck!!

Then I expected to see the buffer pool contention as it was in 5.5 before.. - but if I remember well, this part of the code was improved in 5.6, so on 16 sessions TPS continued to increase ;-)) but then on 32 sessions TPS dropped due to "index mutex" locks (initially created within a dictionary).. - and then, with a growing load, the "trx_sys" contention is killing performance completely..


MySQL 5.6.2-latest version

Finally I was lucky to discover that I was not using the "latest" 5.6 trunk - as you can see, things are changing very quickly in the 5.6 development, and the issues I observed were already fixed one week before.. :-))

Retesting with the MySQL 5.6.2-latest version:


Observations:
  • the high contention on the btr_search_latch mutex since 16 sessions is gone!
  • contention on the btr_search_latch mutex now starts since 32 concurrent sessions, as before..
  • since 64 sessions the trx_sys mutex contention is added too..
  • since 256 sessions the trx_sys mutex contention is dominating..
  • disabling AHI completely removes the btr_search_latch mutex contention, but it's not improving performance:
    • even before any other contention comes into the game, the TPS level is already lower with AHI=OFF
      (for ex.: on 8 sessions TPS is 7,500 with AHI=OFF, while it's 10,000 with AHI=ON)
    • then we hit a contention on the buffer pool index..
      NOTE: buffer pool mutex contention may be very random - it all depends on how the data arrived into the buffer pool.. - if one of the BP instances is more accessed than the others, we're back to a situation similar to a single BP instance.. (see the query sketch just after this list)
    • then once again the trx_sys mutex takes the dominating position..
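Regarding the NOTE above: a quick way to see if one of the BP instances is more loaded than the others is to check the per-instance stats (a sketch, assuming this I_S table is present in your 5.6 build):

SELECT POOL_ID, POOL_SIZE, DATABASE_PAGES, MODIFIED_DATABASE_PAGES
  FROM INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS
 ORDER BY POOL_ID;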

Questions:
  • why do we still have a contention on the btr_search_latch mutex even once all the accessed pages are already loaded into the buffer pool ???
  • any idea about reducing the trx_sys mutex contention ??..

And now retesting the SEL1 Read-Only:

It's way better than before, but performance is still limited by the "trx_sys" mutex contention..
NOTE: testing with the READ-COMMITTED and READ-UNCOMMITTED transaction isolation levels did not change anything at all..
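(the isolation level may be switched either per session or globally - "tx_isolation" is the variable name in 5.5/5.6 - for ex.:)

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- or globally, for all new sessions:
SET GLOBAL tx_isolation = 'READ-UNCOMMITTED';
SELECT @@global.tx_isolation, @@session.tx_isolation;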

And now the same test on 5.6-latest but with a MySQL server limited to 24 and 32 cores:


Observations :
  • on 24 cores the ~max TPS level is kept up to 512 concurrent sessions
  • on 32 cores it's up to 256 sessions only..
  • on 48 cores performance is already decreasing since 64 sessions..

And now retesting the initial Read-Only also on 16, 24 and 32 cores:
What is interesting: limiting thread concurrency by binding the MySQL process to fewer cores even improved the peak performance obtained by 5.6 on this test!..

To be continued.. (see part #2)..



Posted by Dimitri at 15:56
Categories: MySQL