dim_STAT : v.9.0 CoreUpdate-20-12

Just realized I did not post any notes about dim_STAT CoreUpdates during the last 3 years, so will try to fix it now with the following short summary ;-))

Read more...

MySQL Performance : Understanding InnoDB IO Internals & "Checkpointing"

Few weeks ago with a big curiosity I was reading several articles published by Percona about TPCC Benchmark results and MySQL 8.0 "checkpointing" issues..

Unfortunately, in these articles there was no any explanation nor any tentative to understand what is going on, an probably at least try and validate some "first coming in mind" tuning / troubleshooting options.. (And even no any try to show in action so often advertised PMM, and see on what it'll point ?)..

All in all, in the following article I'll try to feel up the "white holes" left in this TPCC testing..

Read more...

MySQL Performance : TPCC "Mystery" [SOLVED]

The TPCC workload "mystery" exposed in the following post was already clarified the last year, and I've presented explanations about the observed problem during PerconaLIVE-2019. But slides are slides, while article is article ;-)) So, I decided to take a time to write a few lines more about, to keep this post as a reference for further TPCC investigations..

The "mystery" is related to observed scalability issues on MySQL 8.0 under the given TPCC workload -- just that on the old aged DBT-2 workload (TPCC variation) I was getting much higher TPS when running on 2 CPU Sockets, comparing to1 CPU Socket, which is was not at all the case for Sysbench-TPCC.

Read more...

MySQL Performance : XFS -vs- EXT4 Story

This post was remaining in stand-by for a long time, specially that I was expecting that observed issues will be fixed soon. But time is going, and the problems are remaining. And I'm constantly asked "why, Dimitri, you're suggesting now to use XFS, while in the past you always suggested EXT4 ??" -- hope the following article will clarify you the "why" and maybe motivate you to do your own evaluations to see how well the things are working for you on your own systems under your own workloads..

NOTE : this will also clarify why the new Double Write did not appear in MySQL 8.0 in 2018, as it was planned, but only recently (http://dimitrik.free.fr/blog/posts/mysql-80-perf-new-dblwr.html)

Read more...

MySQL Performance : The New InnoDB Double Write Buffer in Action

The new MySQL-8.0.20 release is coming with re-designed InnoDB Double Write Buffer (DBLWR), and, indeed, it's one huge historical PITA less.. -- why it was so painful and cost us much blood in the past, I could not better explain than already done it in the following article yet from 2018 about MySQL on IO-bound workloads.. The story is not complete, as it's missing the 2019's chapter (will tell it later, np) -- but if you'll (re)read the mentioned above article first, you'll better understand the next ;-))

But at least the current post is only about good news now -- the new DBLWR and how it helps to solve historical MySQL performance problems ! -- and as one picture is better than million words, I'll try to save 3M words here (as there are 3 pictures in this article ;-))

Well, I'll also skip all new design details (I think Sunny will better explain them all himself "from the first hands") -- I'll only mention the following :

  • the DBLWR is no more part of "system tablespace", and can be placed anywhere you like (so, if you have a possibility to use a different storage for DBLWR files, you can totally get a rid of DBLWR impact on your main storage) -- but by default, DBLWR is stored in the same directory as your DATA
  • you can configure how many DBLWR files you want to use
  • and also how many pages per DBLWR files to have (which is also directly related to final DBLWR file size)

For more details about config options you can check MySQL 8.0 doc about DBLWR explaining this all in details.

Enough words, let's go to the test results..

Read more...

My Slides about MySQL 8.0 Scalability & Benchmarks from #PerconaLIVE 2019 Austin, TX

Here are my slides about MySQL 8.0 Scalability & Benchmarks from Percona LIVE 2019 in Austin, TX. (and I also realized I forgot to publish my slides from MySQL pre-FOSDEM 2019 Day about MySQL Performance Tuning, so fixing it now ;-))

P.S. Percona LIVE was "just awesome" ! ;-))

Rgds,
-Dimitri

Comments...

dim_STAT : Collect MySQL & System stats @Linux locally

Generally you have much bigger benefit with dim_STAT when you're collecting and analyzing your data live (online). However, there maybe still many situations when this is simply not possible (due security restrictions, wide remote distance, or simply externally inaccessible hosts, etc.) -- for such cases dim_STAT has special tool EasySTAT allowing you to collect all the needed stats locally on the given machine, and then later to upload them to your dim_STAT server for post-analyze. To simplify overall setup, EasySTAT is integrated into STAT-service of dim_STAT and can be directly involved from there, also following the same rules for MySQL access and so on (e.g. if your STAT-service is already operational, then your EasySTAT will be too ;-))

The following article is explaining how to deploy and start EasySTAT on your Linux server from scratch, even if you never hear about..

Read more...

My Slides about MySQL 8.0 Performance from #OOW18 and #PerconaLIVE 2018


As promised, here are slides about MySQL 8.0 Performance from my talks at Oracle Open World 2018 and Percona LIVE Europe 2018 -- all is combined into a single PDF file to give you an overall summary about what we already completed, where we're going in the next updates within our "continuous release", and what kind of performance issues we're digging right now.. ;-))

Also, I'd like to say that both Conferences were simply awesome, and it's great to see a constantly growing level of skills of all MySQL Users attending these Conferences ! -- hope you'll have even more fun with MySQL 8.0 now ;-))

Comments...

MySQL Performance : 8.0 on IO-bound OLTP_RW vs Percona Server 5.7

This article is inspired by Percona blog post comparing MySQL 8.0 and Percona Server 5.7 on IO-bound workload with Intel Optane storage. There are several claims made by Vadim based on a single test case, which is simply unfair. So, I'll try to clarify this all based on more test results and more tech details..

But before we start, some intro :

InnoDB Parallel Flushing -- was introduced with MySQL 5.7 (as a single-thread flushing could no more follow), and implemented as dedicated parallel threads (cleaners) which are involved in background once per second to do LRU-driven flushing first (in case there is no more or too low amount of free pages) and then REDO-driven flushing (to flush the oldest dirty pages and allow more free space in REDO). The amount of cleaners was intentionally made configurable as there were many worries that these threads will use too much CPU ;-)) -- but at least configuring their number equal to number of your Buffer Pool (BP) Instances was resulting in nearly the same as if you have dedicated cleaner-per-BP-instance.

Multi-threaded LRU Flusher -- was introduced in Percona Server 5.7, implementing dedicated LRU cleaners (one thread per BP instance) independently running in background. The real valid point in this approach is to keep LRU cleaners independent to so called "detected activity" in InnoDB (which was historically always buggy), so whatever happens, every LRU cleaner remains active to deliver free pages according the demand. While in MySQL 5.7 the same was expected to be covered by involving "free page event" (not what I'd prefer, but this is also historical to InnoDB). However, on any IO-bounded workload I've tested with MySQL 5.7 and 8.0 by configuring 16 BP instances with 16 page cleaners and with LRU depth setting matching the required free page rate -- I've never observed lower TPS comparing to Percona..

Single Page Flushing -- historically, in InnoDB when a user thread was not able to get a free page for its data, it was involving a "single page flush" itself, expecting to get a free page sooner -- the motivation behind such an approach was "better to try to do something than just do nothing". And this was blamed so often.. -- while, again, it's largely exaggerated, because the only real problem here is coming due a historical "limited space" for single page flush in DoubleWrite Buffer, and that's all. To be honest, making this option configurable could allow anyone to evaluate it very easily and decide to keep it ON or not by his own results ;-))

DoubleWrite Buffer -- probably one of the biggest historical PITA in InnoDB.. -- the feature is implemented to guarantee page "atomic writes" (e.g. to avoid partially written pages, each page is written first to DoubleWrite (DBLWR) place, and only then to its real place in data file). It was still "good enough" while storage was very slow, but quickly became a bottleneck on faster storage. However, such a bottleneck you could not observe on every workload.. -- despite you have to write your data twice, but as long as your storage is able to follow and you don't have waits on IO writes (e.g. not on REDO space nor on free pages) -- your overall TPS will still not be impacted ;-)) The impact is generally becomes visible since 64 concurrent users (really *concurrent*, e.g. doing things on the same time). Anyway, we addressed this issue yet for MySQL 5.7, but our fix arrived after GA date, so it was not delivered with 5.7 -- on the same time Percona delivered their "Parallel DoubleWrite", solving the problem for Percona Server 5.7 -- lucky guys, kudos for timing ! ;-))

Now, why we did NOT put all these points on the first priority for MySQL 8.0 release ?
  • there is one main thing changes since MySQL 8.0 -- for the first time in MySQL history we decided to move to "continuous release" model !
  • which means that we may still deliver new changes with every update ;-))
  • (e.g. if the same was possible with 5.7, the fix for DBLWR would be already here)
  • however, we should be also "realistic" as we cannot address fundamental changes in updates..
  • so, if any fundamental changes should be delivered, they should be made before GA deadline
  • and the most critical from such planned changes was our new REDO log implementation !
  • (we can address DBLWR and other changes later, but redesigning REDO is much more complex story ;-))
  • so, yes, we're aware about all the real issues we have, and we know how to fix them -- and it's only a question of time now..

So far, now let's see what of the listed issues are real problems and how much each one is impacting ;-))

For my following investigation I'll use :
  • the same 48cores-HT 2S Skylake server as before
  • x2 Optane drives used together as a single RAID-0 volume via MDM
  • same OL7.4, EXT4
  • Sysbench 50M x 8-tables data volume (same as I used before, and then Vadim)
  • Similar my.conf but with few changes :
    • trx_commit=1 (flush REDO on every COMMIT as before)
    • PFS=on (Performance Schema)
    • checksums=on (crc32)
    • doublewrite=off/on (to validate the impact)
    • binlog=off/on & sync_binlog=1 (to validate the impact as well)

Test scenarios :
  • Concurrent users : 32, 64, 128
  • Buffer Pool : 128GB / 32GB
  • Workload : Sysbench OLTP_RW 50Mx8tab (100GB)
  • Config variations :
    • 1) base config + dblwr=0 + binlog=0
    • 2) base config + dblwr=1 + binlog=0
    • 3) base config + dblwr=0 + binlog=1
    • 4) base config + dblwr=1 + binlog=1
    • where base config : trx_commit=1 + PFS=on + checksums=on

NOTE : I did not build for all the following test results "user friendly" charts -- all the graphs are representing real TPS (Commit/sec) stats collected during the tests, and matching 3 load levels : 32, 64, and 128 concurrent users.

128GB BUFFER POOL

So far, let's start first with Buffer Pool =128GB :

OLTP_RW-50Mx8tab | BP=128G, trx=1, dblwr=0, binlog=0

Comments :
  • with Buffer Pool (BP) of 128GB we keep the whole dataset "cached" (so, no IO reads)
  • NOTE : PFS=on and checksums=on -- while TPS is mostly the same as in the previous test on the same data volume (where both PFS & checksums were OFF), they are not impacting here
  • as well from the previous test you can see that even if the data are fully cached in BP, there is still an impact due used storage -- the result on Intel SSD was way lower than on Intel Optane
  • so, in the current test result you can see that MySQL 8.0 also getting better benefit from a faster storage comparing to Percona Server, even if the given test is mostly about REDO related improvements ;-))

And now, let's switch DobleWrite=ON :

Read more...

MySQL Performance : 8.0 GA on IO-bound TPCC

This post is mainly inspired by findings from the previous testing of MySQL 8.0 on TPCC workload(s) and observations from IO-bound Sysbench OLTP on Optane -vs- SSD. But also by several "urban myths" I'm often hearing when discussing with users about their IO-bound OLTP performance problems :

Myth #1 : "if I'll double the number of my storage drives -- I'll get x2 times better TPS !"
  • this was mostly true during "HDD era", and again..
  • (ex.: a single thread app doing single random IO reads from a single HDD will not go faster by doing the same from 2x HDD -- similar like single thread workload will not run faster on 8CPU cores -vs- 2CPU cores, etc.)
  • all depends on your workload and how many parallel IO operations you're involving..
  • indeed, it is much more easier to saturate HDD, but it's much more harder to do it with modern SSD/NVMe
  • NOTE : we're speaking about OLTP (e.g. if you started to observe full table scans in your workload -- means you're already doing something wrong ;-))
  • simple rule : if you're not saturating your storage on any of its limits => you'll not see any gain by adding more drives, you'll probably just have a bigger storage space, that's all.

Myth #2 : "I'll go faster with flash drive which is claimed capable x2 times more IOps in specs than my current drive !"
  • if you're expecting to run OLTP workload, rather pay attention to IO latency first !
  • sometimes it may be not mentioned these x2 times more IOps were obtained with x4 times more IO threads ;-))
  • e.g. a drive capable of x2 times more IOps but with x4 times higher latency will still be x4 times slower than your current drive on 1 user load, and 2, and 4, and .. maybe up to 64 ? (depends on where your current drive the limit is reached) -- and if you don't have more than 64 concurrent users ? -- then you'll never see your x2 times more IOps, but rather x4 times worse TPS ;-))

Test it yourself - this is the only advice I could give you !
  • because "only a real test will give you a real answer" (and I'm repeating to say it again and again.. ;-))
  • testing your own workload will give you the best answer !
  • otherwise you may still use some generic benchmark workloads which are representative for you
  • for ex. your new flash drive may be look better from all points and passing very well all generic pure IO tests, but show x3 times worse results once used by MySQL -- and this is just because, for ex., every involved fsync() will take x3 times more time, etc. (based on real story, no kidding ;-))

So far, for the following story I'll use :
  • Sysbench-TPCC workload, 10x100W (x10 of 100 warehouses, ~100GB data, here is why)
  • Same Skylake server as before, same config, etc.
  • and the same Optane & SSD as in the previous testing, except that I'll also add to the game x2 Optane drives used as a single RAID-0 MDM volume !
  • EXT4 is used on top of each drive or volume

Starting Test scenario :
  • concurrent users : 1, 2, 4, .. 1024
  • trx_commit : 1 (flush REDO on every COMMIT)
  • Buffer Pool (BP) : 128GB (in-memory), 32GB (IO-bound)

Sorry, no "user friendly" graphs this time, but hope it'll still be easy to understand the results ;-))

On the first graph you'll see :
  • 3 test results : with x2 Optaine first, then with single Optane, and then single SSD
  • on each test the load is going from 1, 2, 4 .. up to 1024 concurrent users
  • the Commits/sec curve is showing obtained TPS level

and the first result is with 128GB BP :


InnoDB Buffer Pool 128GB

Comments :
  • first of all, as you can see, TPS is not better with two -vs- one Optane
  • as we're not hitting any single limit of Optane drive, there is no any gain by using x2 ;-))
  • also, regardless the used data volume is ~100GB, we're not observing here such a big difference between Optane -vs- SSD as it was with Sysbench OLTP on a similar data size and still "in-memory"..
  • this because TPCC workload is much less aggressive on IO writes, so REDO flushing is less impacted..

now, how TPS will be impacted if BP size was smaller, just 32GB ?

Read more...