Friday, 15 August, 2014

MySQL Performance: Analyzing LinkBench Workload on MySQL 5.7 and MariaDB 10.1

Reading the article published on MariaDB site about performance evaluation of MySQL 5.7 -vs- MariaDB 10.1 I've got a double feeling: from one side I have no any reason to have a doubt in obtained results, and from another side - my feeling says me there is something going odd.. But well, I don't have any attention to blame anybody (and sorry if you think so) -- my willing here is only to understand what is going on, why, and what are the challenges on the observed test workload(s). I'll cover here only my observations on the LinkBench workload as it was not yet enough tested and involving many questions around..

LinkBench Benchmark :

  • developed by Facebook Team, so represents a true real life workload
  • written in Java
  • from the beginning since it creation was reported as very sensible to the index lock contention in InnoDB
  • for this reason was modified to have partitions in the most hot table (to split a single index contention)
  • this helped to improve performance, but index lock contention remained limiting..
  • the issue with InnoDB index lock was fixed only in MySQL 5.7 giving a significant performance boost in LinkBench workload


What is looking odd for me in MariaDB's test results :
  • indeed, having compression feature within any database engine is very valuable
  • however, compression by itself cannot improve an overall performance (except if you have some limitation on other levels, like slow storage, etc.)..
  • and on extremely fast Fusion-IO flash storage there is no doubt that performance result will be better when compression is not used, and it's exactly what we see in the presented results..
  • but then it's hard to believe for me that without using any compression MySQL 5.7 which is having the fix for InnoDB index lock contention is running slower than MariaDB 10.1 which is based on InnoDB from MySQL 5.6 and not having such a fix..

Well, I may be wrong as well, but the following is what I'm observing on LinkBench workload on my system.

Configuration :
  • Linux box with 40cores-HT 2.3Ghz running Oracle Linux 6.5
  • Fusion-IO flash storage using NVMFS filesystem
  • LinkBench database of 150G

Tuning :
  • let's start with a Buffer Pool (BP) = 50% of database size to follow MariaDB test conditions (so, 75G in my case)
  • I'm usually using 32 BP instances
  • as BP will not be able to keep the whole data set, we may expect constant IO reads
  • so a high activity on LRU to expect + LRU flushing
  • means LRU depth should be tuned well, I'll set it to 4000 (not too big, not too small, as it's per BP instance, this will give me 32 x 4000 free pages amount to expect which should cover a potential page IO read speed from Fusion-IO)
  • REDO log size is also important, as a bigger REDO is allowing to delay page flushing and keep flushing/checkpoint activity more smooth -- 12GB REDO here seems to be enough
  • adaptive hash index (AHI) -- in most RW workloads it's better to have disabled (and even in some RO), but curiously here has no impact at all..
  • using AIO + O_DIRECT is the must
  • all other setting is just following general "best practices" for most of workloads..
  • Performance Schema (PFS) is enabled + mutex instrumentation too



TEST #1 : LinkBench-150G 64users with 75GB Buffer Pool

There are 5 test cases executed in the following order:
  • #1 - MySQL 5.7 no doublewrite (innodb_flush_log_at_trx_commit=1, checksums crc32, innodb_doublewrite=0)
  • #2 - MySQL 5.7 secure ((innodb_flush_log_at_trx_commit=1, with checksums crc32, innodb_doublewrite=1)
  • #3 - MariaDB 10.1 no doublewrite (same options as MySQL 5.7 in #1)
  • #4 - MariaDB 10.1 secure
  • #5 - MariaDB 10.1 with atomic writes (without doublewrite, but atomic writes instead)

Each test case was running during 30min (if was several hours initially, but then I've decided to reduce the test duration time --because once the load became stable on the beginning, then not too much is happening later, and I don't have unlimited time ahead ;-))


Observations :
  • without doublewrite MySQL 5.7 is near x2 times faster than MariaDB 10.1 (20K TPS -vs- 12K TPS)
  • within "secure" configuration MySQL 5.7 is loosing 25% in performance (20K -> 15K TPS)
  • interesting that this workload is pretty sensible to REDO flushing (sync)
  • but the main overhead is coming from a doublewrite anyway..
  • however, as you can see, it's still better than MariaDB
  • interesting to see that atomic writes in MariaDB are not lowering too much TPS
  • so, once atomic writes will be available in MySQL 5.7 I may expect the same 20K TPS in secure config here
  • NOTE: to avoid doublewrite overhead you may also use O_DSYNC flush setting combined with EXT4 data journaling as it was recently demonstrated by Percona

So far, why MariaDB is getting a lower TPS than MySQL 5.7 ?..

Let's get a look on what Performance Schema is reporting:

Observations :
  • MariaDB 10.1 is hitting a severe contention on the index lock
  • while MySQL 5.7 is not having it anymore
  • and that's why MariaDB is running slower on this workload..


Let's check there is nothing abnormal going with page flushing :

Observations :
  • Checkpoint Age did not reach any critical level (max 7GB from 12GB in REDO was used)
  • no LRU single page flush either, so the amount of free pages was always enough..


What is the page IO read rate in this workload?


Observations :
  • MySQL 5.7 is requiring 20K / 18K free pages/sec
  • while MariaDB 10.1 is only 10K..
  • the excessive page scanning seen in MariaDB is fixed in MySQL 5.7 (still remains in MySQL 5.6, but in fact not impacting a final performance)

Well, all observed is confirming my initial worry.

Now, let's see how both engines will perform within "expected" test conditions (Facebook Team is mentioning in LinkBench notes that the data set should be bigger at least x10 times than a Buffer Pool size to reproduce their environment).. -- and for sure, having only 16GB Buffer Pool within the same workload will involve much more page IO reads, where storage IO level will play a huge role.


TEST #2 : LinkBench-150G 64users with 16GB Buffer Pool


Observations :
  • without a doublewrite MySQL 5.7 is out passing MariaDB 10.1 by x2.5 times (10K TPS -vs- 4K TPS)..
  • in a "secure" mode MySQL 5.7 is still x1.5 times better (over 6K TPS -vs- 4K TPS)
  • NOTE: I've also replayed the MariaDB test with using its multithreaded flushing feature (innodb_mtflush_threads=16, innodb_use_mtflush=1) -- this is helping to reach 4500-5000 TPS, but still not enough to attend the MySQL 5.7 level..

What about internal contentions?

Observations :
  • MySQL 5.7 is blocked mainly by fil_system mutex and log_sys mutex contentions
  • fils_system mutex contention is killing on all IO-bound workloads (and that's why testing Read-Only workloads is important too to better understand the problems on Read+Write ;-))
  • bunch of other lock contentions is blocking MariaDB here..

Any flushing related issues?


Observations :
  • Checkpoint Age did not reach even 2GB
  • no LRU single page flush either
  • all ok..

What about page IO read rate?

Observations :
  • free page rate is much higher, but not enormous (I've already observed 70K-80K pages/sec on this server with Fusion-IO on IO-bound workloads)
  • page rate on MariaDB is lower than on MySQL 5.7


Well, yet another reason why MySQL 5.7 just rocks and will be the next the best ever MySQL release :-)

I'm not pretending either the config setting I'm using and workload conditions are the most optimal here. I'm just sharing what I'm observing and curious for any input/idea about how this workload performance could be improved..

Few other notes:
  • Compression : indeed, compression on IO level is much more simple an optimal for any application (not only databases), and more and more will come over a time for sure with a goal to reach as max as possible a performance level of "uncompressed" workload..
  • LinkBench : seems to me needs some more love and more testing.. - would be happy to discuss with developers and any other observations related to this workload. It give me an impression to be "self limited" as with 64 concurrent users CPU is not really used, looks like there are some internal waits are happening during processing. I've also suspected garbage collection time on JVM level, but monitoring JVM stats showed that it's not so.. Open to any discussion ;-)

MySQL configuration setting I've used :
[mysqld]
  max_connections=4000

# myisam
  key_buffer_size=4000M
  ft_max_word_len = 16
  low_priority_updates=1

# general
  table_open_cache = 8000
  table_open_cache_instances=16
  back_log=1500
  query_cache_type=0

# files
  innodb_file_per_table
  innodb_log_file_size=1024M
  innodb_log_files_in_group=12
  innodb_open_files=4000

# buffers
  innodb_buffer_pool_size= 75000M / 16000M
  innodb_buffer_pool_instances=32
  innodb_log_buffer_size=64M

# tune
  innodb_checksums=1
  innodb_checksum_algorithm=crc32
  innodb_doublewrite= 0 / 1 
  innodb_support_xa=0
  innodb_thread_concurrency=0
  innodb_flush_log_at_trx_commit=1
  innodb_flush_method=O_DIRECT
  innodb_max_dirty_pages_pct=90
  innodb_max_dirty_pages_pct_lwm=10
  innodb_lru_scan_depth=4000
  innodb_page_cleaners=4

  join_buffer_size=32K
  sort_buffer_size=32K
  innodb_use_native_aio=1
  innodb_stats_persistent = 1
  innodb_spin_wait_delay=6

# perf special
  innodb_adaptive_flushing = 1
  innodb_flush_neighbors = 0
  innodb_read_io_threads = 16
  innodb_write_io_threads = 16
  innodb_io_capacity=15000
  innodb_purge_threads=4
  innodb_max_purge_lag_delay=30000000
  innodb_max_purge_lag=1000000
  innodb_adaptive_hash_index=0

# Monitoring
  innodb_monitor_enable = '%'
  performance_schema=ON
  performance_schema_instrument='%sync%=on'

# MariaDB specific:
  innodb_compression_algorithm=0
#  innodb_mtflush_threads=16
#  innodb_use_mtflush=1
# with atomic writes :
  innodb_use_fallocate=1 
  innodb_use_atomic_writes=1



Any comments are welcome! ;-)

Rgds,
-Dimitri
Posted by Dimitri at 17:44 - Comments...
Categories: MySQL

Saturday, 12 July, 2014

dim_STAT v.9.0-u14 Core Update is here!

I'm happy to present you the latest CoreUpdate-14 for dim_STAT v.9.0 !

This update includes many minor fixes and small improvements making life better (like few more options on CLI scripts, more smart logic on dim_STAT-Server start/stop, few more options in web interface, updated look & feel, etc. etc.) - well, all these changes are inspired by my own and many other users daily usage of this tool. So, indeed, it remains alive and still helpful ;-)

However, there is a completely new feature I'd present to you which is coming with CoreUpdate-14 - and it's calling "Screenshots" ;-)

What are Screenshots? - in fact this feature was inspired by a more and more often need to save somewhere all or most of the graphs I'm seeing right now, while analyzing collecting data, rather generate them later when the work is done and observed issue was solved.. ReportTool is great for the post-analyze work, or for a straight forward reporting, etc. (you know exactly the time intervals you want to see, stats to graph, and so on) -- while during a live performance analyze you may have many intermediate graphs you're looking on which are part of the whole story board during your investigation, and they are all important as well as the base of your step-by-step logic and your final conclusion..

So far, Screenshots :

  • at any time you see any PNG graphs on your dim_STAT web page you'll also a small [Screenshot] button on the bottom of the page
  • as soon as you click on [Screenshot] button a new Tab/Window will be opened within your web browser and containing all the graphs you're currently seeing + the 2 special fields: Title and Notes to allow you to add some annotations related to your graphs (as well you may select all the graphs or only a part of them)..
  • similar to ReportTool, the wiki-style syntax is allowed within Notes content, while HTML syntax is also allowed (see details from the dim_STAT UsersGuide)
  • once editing is finished, you click on [Save] and your given Snapshot is saved
  • each Snapshot has a [PermaLink] (a static URL link you may use to share your Snapshot page with any users having an access to your dim_STAT web server)
  • at any time you may re-edit your Snapshot once again, or duplicate it to make a different version (ex.: less graphs, shorter Notes, etc.)
  • you can delete Snapshots as well, but be careful - there is no restore ;-)
  • also, Snapshots data are partially saved within your database, so they are depending on it, and from one database you cannot access Snapshots of another one (except you're clicking on a [PremaLink], but this will just print you a static document, without any editing possibility)..

The Snapshots page is looking like that:


So, at any time you may select to see :
  • the latest N Screenshots
  • the Screenshots matching a Title pattern and ordered by Time or Title, Ascending or Descending..
  • link to [PermaLink] is pointing to a URL with a static page with Snapshot content
  • link to [tar.Z] is pointing to a compressed TAR archive containing the whole Snapshot data (so can be sent as it and then deployed on any other computer as an HTML document)..

Publishing Snapshots :
  • at any time one or several Snapshots may be selected for publishing
  • a published document is then no more depending on a database, has its own Title and Notes, and containing all the data from the selected Snapshots
  • once generated, a published document will have as well its own [PermaLink] and [tar.Z], but also a [PDF] link as sometimes sharing a PDF document may be more preferable ;-)
  • the published document may still be re-edited in place (then PDF and tar.Z will be re-generated on every Save)
  • keep in mind that all data of a published document are kept only locally within its own directory and nowhere else (nothing in database, etc.) - so, re-editing is really implemented in place and based on content files
  • saving a document under a different title will create a new document leaving the original as it
  • re-editing can be disabled for a given document if its directory contains a file with a name ".noedit"
  • re-editing of all published documents may be disabled globally if the "pub" (upper) directory contains ".noedit" file
  • there is no way to delete a published document via a web interface (and this is made intentionally, as all "published" is expected to remain forever) -- while of course you may delete it manually, all these documents are just simple files after all.. ;-))

Few comments about internals :
  • as soon as you're involving any Snapshot action, there will be created a missed dim_Snapshot table in your current database and additional directories on your dim_STAT web server (/apps/httpd/home/docs/Snapshots/*)
  • Snapshot path: /apps/httpd/home/docs/Snapshots/data/{DBNAME}/{Snapshot-ID}
  • Published Document path: /apps/httpd/home/docs/Snapshots/pub/{Title}

Well, it's much more fun to use Snapshots than speak about, so hope you'll adapt it very quickly and and enjoy as me ;-)

The upgrade process is as usual :
  • 1.) download the latest tgz (WebX_apps-v90-u14.tgz) file from the CoreUpdates repository: http://dimitrik.free.fr/Core_Updates/
  • 2.) backup your current apps scripts: $ cd /opt/WebX ; tar czf apps-bkp.tgz apps
  • 3.) deploy the CoreUpdate-14 scripts bundle: $ cd /opt/WebX; tar xzf /path/to/WebX_apps-v90-u14.tgz
  • 4.) enjoy ;-)

As usually, any feedback is welcome!


Posted by Dimitri at 1:32 - Comments...
Categories: dim_STAT

Thursday, 15 May, 2014

MySQL Tech Day @Paris, 22/May-2014


The next MySQL TechDay is taking place in Paris, 22/May (the next week!!!) - if you're MySQL lover and will be in Paris area this day - hurry up to register on the event page and attend it - trust me, you'll not regret ;-))

We're continuing to follow our TechDay tradition:

  • the event is completely free (but places are limited, so you have to be registered to attend)
  • the content is pure technical and directly from Oracle engineering, no marketing ;-)
  • this is a true full day event, and we're reserving enough time to go in depth for each presented stuff..
  • the event is taking place in Oracle office in a pretty wide and comfortable amphitheater, covered by WiFi, so you may twit live about #mysqltechday and remain "connected" if this is a part of your constrains ;-)
  • we're starting at 10:30 to let you arrive "stressless" regardless traffic issues and distance (we know from previous experience that many arriving from different places in France, far away from Paris, and also some will come even from different countries! - Brussels, London, Birmingham, Dublin are already in our map for now ;-))
  • for those who will arrive earlier, a hot coffee with some sweats will be already waiting since 10:00 as a bonus ;-)
  • note: if you're arriving via public transportation keep in mind there is a direct tram going to the Oracle office from La Defense station (15min and you're arrived)..
  • and to finish with organization points:
    • around 13:00 we'll have a lunch in Oracle enterprise restaurant,
    • around 15:30 a coffee break
    • and around 17:30 we're expecting to finish (and let you in the same "stressless" conditions arrive at home ;-))

And now about the content..
  • very briefly we'll provide you an overview about the latest tech news from the MySQL Team

  • then, as promised from the last TechDay, I'll tell you the whole story about heavy OLTP workloads:
    • In-Memory and IO-bound, Read-Only and Read+Write..
    • their problems, solutions, workarounds, and improvements already made in MySQL 5.7
    • there was a long and hard work made since then, the result are surprising and amazing on the same time - and there are still many questions remaining without an answer.. ;-)
    • and, as promised, this time with a full deep dive into InnoDB internals -- we'll dig in details all the story with InnoDB flushing and purge, what was wrong before MySQL 5.5, what remained wrong in 5.5, improved in 5.6, redesigned and probably fixed in 5.7 -- how read-on-write issues were resolved, why parallel + improved flushing was implemented, what can be wrong and how to tune LRU flushing -- I'll tell you ALL ;-))
    • I'll have 2 hours to tell you the whole story, so be sure, you'll have for your time ;-)

  • then, again, as promised, we'll have Mark LEITH as our special guest during this event!
    • last time Mark was unable to come due unexpected "management issues"..
    • while this time we fixed all issues ahead, and just crossing fingers now for the flying conditions, as Mark will fly from UK to Paris the same day ;-)
    • if you did not attend any Mark's talks before, I'd present him as a "Practical MySQL Performance Schema Magician" !! ;-)
    • Performance Schema (PFS) is a gold mine of various valuable information about your MySQL instance
    • while entering in any huge gold mine you may feel yourself little bit lost.. ;-)
    • but Mark will show you how easy to find there your way in practice, and how powerful solutions built around PFS could be..
    • Mark will also present you his "ps_helper" - a collection of scripts he made to simplify practical PFS usage - this is a really great stuff, I'd compare it to what DTrace Toolkit made for DTrace -- you may use many scripts as they are just straightforward, then learn by example and create many new ones adapted explicitly to what you need, etc..
    • and trust me, some examples will really surprise you about how deep you may go with PFS ;-)
    • the best will be if you'll come with your laptop with installed latest MySQL 5.7 (or 5.6) on it and play with presented stuff by yourself..
    • BTW, ps_helper is fully integrated now within 5.7 and taking part of "sys" schema
    • as well, if you are not already doing, think to use MySQL Workbench 6.1+ : while this GUI tool is simply great for many general DBA tasks, it also introduced since v.6.1 a very helpful interface to discover, request and configure PFS via GUI.. - the tool is free and can be downloaded from here: http://dev.mysql.com/downloads/tools/workbench/ (Linux, MacOSX and Windoze versions)
    • and of course Mark will speak about the latest MySQL Enterprise Monitor (MEM) version - it's fully using now PFS in its metrics and the result is really amazing.. - Mark will tell you all about and show you a live demo, and if you want to try your hands on - you may start from here: http://www.mysql.com/products/enterprise/monitor.html (the tool is not free, but has a long enough trial period to try)..

Well, I hope you'll have a lot of fun and a lot of food for your brain this day! ;-))

See you there!
And also think ahead about other tech topics you'll happy so see covered the next MySQL TechDay..

Useful event links:
UPD: my talk was based on my Percona Live 2014 slides
Posted by Dimitri at 14:28 - Comments...
Categories: MySQL

Wednesday, 02 April, 2014

MySQL 5.7 just rocks! ;-)

A next MySQL 5.7 milestone release is available an it just rocks! ;-)

few benchmark results to see where we're today comparing:

  • MySQL 5.7 / 5.6 / 5.5
  • Percona Server 5.6 / 5.5
  • MariaDB 10 / 5.5

for all engines the latest available versions were used; the data set is fitting memory size, so the main focus is on the internal contentions here: already fixed for some engines, or still remained for another ones ;-)

Sysbench OLTP_RO 8-tables :


Sysbench OLTP_RO Point-Selects 8-tables :


Sysbench OLTP_RW 8-tables :


All details about these benchmark results and others (IO-bound OLTP_RW, Uniform & Pareto, DBT2, LinkBench) - I'll present during my talk tomorrow at PerconaLive 2014. I will also cover: all internals about InnoDB flushing design and how we're making it yet more improved it in 5.7, fixed and pending issues we have for today, the impact of InnoDB purge, filesystem and flash storage choices on IO-heavy workloads, and several still pending unexplained mysteries around.. - prepare your brain for some storming ;-)

If you're attending Percona Live, don't miss the following talks from MySQL Team :

as well we'll be present all here this evening during our "Meeting MySQL Team" BOF session to answer any questions you want and discuss any issues you have - your valuable feedback helps us to make MySQL yet more better! - and nothing is better here than live and fair face to face discussions.. - so, don't miss ;-)

Also, to get an overview of all the new features and improvements coming within this 5.7 milestone release - you may find many interesting information for you by reading Geir's article - http://mysqlserverteam.com/the-mysql-5-7-4-milestone-release-is-available/
UPD: my slides are here - MySQL_Perf-Percona_Live_2014-dim-key.pdf 
Posted by Dimitri at 16:37 - Comments...
Categories: MySQL

Friday, 31 January, 2014

MySQL & Friends @ FOSDEM-2014

February was yet so far.. - and finally it's just tomorrow, starting with MySQL & Friends Dev Room at FOSDEM 2014 in Brussels. I have a talk about "Starting with MySQL PERFORMANCE SCHEMA" - in fact I would call it rather "Using PFS with zero configuration" ;-) -- many people are thinking PFS is complicate, while in reality it's very simple, and just need little bit of love ;-) Since MySQL 5.6 PFS is enabled by default, and as the result - there are several very useful instrumentation stats available out-of-the-box, and my talk will be about them..

Of course I'll speak about MySQL Performance as well, and feel free to ask any questions about.

Also, don't miss talks from our MySQL Team :

See you all there!

(and hurry up to not miss MySQL & Friends Community Dinner - only few places left)

Rgds,
-Dimitri, heading to FOSDEM in train ;-)

UPD: my slides are here - MySQL_PFS_2014-dim.pdf 

Posted by Dimitri at 15:10 - Comments...
Categories: MySQL