Saturday, 30 December, 2017

dim_STAT v.9.0 CoreUpdate-17-12 is here !

The year is coming to its end, and as a kind of "end-of-year" gift, I'm happy to announce that a fresh new CoreUpdate-17-12 is available as of now ! ;-))

IMPORTANT : this is a very "sensitive" update, and you cannot simply apply it on top of an already deployed dim_STAT instance as before.. -- several issues had to be fixed within the "under the hood" binaries to make the new stuff work properly, so the new code simply has no chance of working with the old binaries.. So, I decided to make it all a "single shot" move -- aligning the new update shipment with a move to "64bit"-only support :

  • i.e. this is not a new version of dim_STAT
  • this is just a "remastered" 64bit release + several visible and internal fixes
  • all 32bit releases remain "as is" -- it makes no sense to continue supporting them anymore..
  • 64bit versions of dim_STAT v.9.0 are available for Linux and MacOSX (macOS)
  • any further CoreUpdate will work only with the 64bit version having the latest supported binaries..

So, what about this "new stuff" requiring changes deep enough to go as far as remastering the binaries ?.. -- to make a very long story short, it's all about support for SVG images ! ;-)) -- I started to develop dim_STAT exactly 20 years (!!) ago.. (hard to believe time is flying so fast..) -- initially dim_STAT used Java Applets (very lightweight (yes! it was, 20 years ago ;-)) and really cool, "live", etc.) -- but then Java support in browsers became heavier and heavier, so I added PNG image support (which brought back the initial lightweight feel and made dim_STAT "usable" again ;-)) -- and today, things are changing again ;-)) -- "retina" and other "high resolution" screens are becoming more and more popular, and on these screens my previously "good enough" PNG graphs look just "ugly" (to be polite ;-)). After testing and analyzing tons of various JS-based (and other) live graphing tools/libs, I finally settled on SVG ! -- it's already supported by most web browsers, still lightweight, and has a huge advantage -- it's extremely "readable" !! and you can scale (!!) it as much as you want very easily ;-)) (so, no more color confusion and plain ugly graphics ;-))

An SVG graph looks like this :


well, this is just a snapshot, but I hope you can already get an idea of how much "cleaner" your graphs could be ;-))

NOTE : some web browsers (e.g. FireFox) may require you to disable any "smallest" font size limit to draw SVG images correctly (as the text you see on your graphs is rendered by your browser itself).

Now, a few words about how SVG is integrated in dim_STAT :
  • you now have a new option for the Graph image : SVG (default)
  • in fact, whenever you select SVG or PNG, your graph is generated in both formats at the same time
  • however, the same image "size" may look different in PNG vs SVG..
  • for this reason there is an additional option : SVG Scale (x1.0 by default, but you may change it as you like)
  • and most dim_STAT "modules" now honor the SVG option ;-))

So, what is new in CoreUpdate-17-12 :
  • SVG support !
    • mandatory : remastered dim_STAT v.9.0 64bit version
    • always generating PNG & SVG together
    • SVG Scale option (web interface)
    • default SVG Scale=1.0
  • Snapshots :
    • always containing PNG & SVG
    • editing :
      • allow image reordering by mouse dragging
      • use SVG or PNG images according to what was recently used..
    • redesigned Snapshots list :
      • select criteria : Title pattern, order by time / title
      • show all || first / last [N]
      • option : show first N SVG/PNG images..
      • option : show notes..
      • clicking on a title link invokes snapshot editing
    • redesigned Published Snapshots :
      • have now also PNG and SVG document versions (HTML)
      • HTML documents are now composed from the first 2 images of each Snapshot + its notes, then a <<More>> link points to the full Snapshot page..
      • PDF documents format remains the same (single document starting with a list of links to all included Snapshots)..
  • Multi-Line stats :
    • "CacheList" feature to speed-up name look-ups ! (generated once, then only re-used; can be re-generated at any time on demand)
    • allow bigger varchar limits for the Add-On "name" column : 8/ 16/ 24/ 32/ 48/ 64/ 80/ 96/ 128
  • Report Tool :
    • you can now add Published Snapshots to your reports !
    • code remastering / cleanup
    • better / more accurate formatting
    • removed external / Java editors
    • minor bug fixes..
  • Bookmarks :
    • Duplicate existing Preset
    • Edit Preset (via checkbox list + reorder)
    • code remastering..
  • STAT-service :
    • STAT-service script now saves PID on start
    • new commands: status, restart
    • status : check if process with saved PID is running
    • stop : call x.killSTAT with saved PID
    • x.killSTAT : remastered to use CMD or PID
    • restart : calls itself with stop, then start
    • respecting chkconfig standard
    • respecting LSB standard
    • to setup under systemd :
      • # ln -s /apps/STATsrv/STAT-service /etc/init.d
      • # systemctl enable STAT-service
      • # service STAT-service start
      • # service STAT-service status
      • # /apps/STATsrv/STAT-service status
    • involving an ALARM signal : self-eject after a 10x Timeout non-activity period in all MySQL/ PG/ ORA scripts !!!
      • => this could be very important in case your monitored database engine becomes "frozen", but still continues to accept new connections..
      • dim_STAT will then not see any stats coming from your database and will assume the connection was lost..
      • so dim_STAT will start a new stat command..
      • which may still connect to your database, but not go any further, as the database is frozen..
      • and this may continue in an endless loop, every time the "connection timeout" expires..
      • so your server will get tons of stat commands started on its side, all trying to collect stats from your frozen database engine..
      • to avoid this, each stat command now has its own "non-activity alarm" -- if nothing was collected from a given database within 10x the Timeout period, the stat command will self-eject on the alarm event ;-))
    • innodbMUTEX : advanced aggregation with tot@ / max@ / avg@ wait stats !
    • Linux netLOAD_v2 : reports also MB/sec + fixed counters overflow
    • Linux mysqlSTACK / ProcSTACK (based on quickstack from Yoshi (@FB))
    • Linux mysqlCPU / ProcCPU (based on tiptop, not compatible with "perf", use with caution)
    • Linux PerfCPUSTAT (based on "perf", reports CPU HW counters)
    • Linux PerfSTAT_v2 (improved PerfSTAT)
      • extended with Cycles/s numbers !!
      • new option : -Event ... (ex: -Event cycles:u)
    • EasySTAT :
      • extended on Linux with : mysqlSTACK, mysqlCPU, netLOAD_v2, PerfCPUSTAT, PerfSTAT_v2
      • now also auto-generating start/end log messages (to simplify post-analyze)
      • LoadDATA.sh now allows passing args from the command line (path to BatchLOAD and DB name)
  • General :
    • Analyze Top-N values : choice => MIN/ MAX/ AVG/ 95% !
    • export data in CSV format via Analyze page & dim_STAT-CLI
    • tag LOG Messages with different colors according to text pattern !
    • on stat name checkboxes => popup with the note description (when available)
    • improved auto-scaling of Y-axis values according to image size !
    • improved default HTML style, re-look, etc.
    • New Add-Ons & their Bookmarks :
      • mysqlSTACK
      • mysqlCPU
      • ProcSTACK
      • ProcCPU
      • netLOAD_v2
      • PerfCPUSTAT
      • PerfSTAT_v2
      • ...
    • minor bug fixes..
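By the way, the new status / stop / restart commands of the STAT-service script follow the classic PID-file pattern -- here is a minimal sketch of that logic only (a dummy "sleep" stands in for the real daemon, PIDFILE is a hypothetical path chosen for this illustration, and the real script calls x.killSTAT rather than a plain kill) :

```shell
#!/bin/sh
# Minimal sketch of the PID-file logic described above. The real
# STAT-service script manages its own daemon and calls x.killSTAT ;
# here a dummy "sleep" stands in for the daemon, and PIDFILE is a
# hypothetical path used just for this illustration.
PIDFILE=/tmp/STAT-service.pid

svc_start() {
  sleep 300 &                  # dummy daemon in place of STAT-service
  echo $! > "$PIDFILE"         # save its PID on start
}

svc_status() {
  # status : check if the process with the saved PID is still running
  if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
  then echo running
  else echo stopped
  fi
}

svc_stop() {
  # stop : the real script passes the saved PID to x.killSTAT
  [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" 2>/dev/null
  rm -f "$PIDFILE"
}

svc_restart() {                # restart : calls stop, then start
  svc_stop
  svc_start
}

case "${1:-}" in
  start)   svc_start ;;
  status)  svc_status ;;
  stop)    svc_stop ;;
  restart) svc_restart ;;
esac
```

the same shape is what lets the script live happily under chkconfig / LSB / systemd, as shown in the setup commands above.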

Hope you'll enjoy all this new stuff as much as I already do ;-))

For those who are already using dim_STAT and want to preserve their collected data -- here are a few simple steps to make the migration to the latest 64bit version smooth :
  • first of all -- no matter whether you used a 32bit or 64bit (or SPARC, etc.) version before, the instructions are exactly the same
  • however, even though the version remains the same, we're moving to new remastered binaries..
  • so the whole process looks like a "migration" (rather than a simple CoreUpdate apply)
  • (and if your data are critical, better to try your migration steps on another host first ;-))
  • so, let's suppose you already deployed dim_STAT into "/apps" on your host1
  • 1) stop dim_STAT : # /apps/ADMIN/dim_STAT-Server stop
  • 2) backup your whole (!!) database directory : /apps/mysql/data
  • 3) if you created any Reports or Snapshots, then backup also the whole web server docs directory : /apps/httpd/home/docs
  • 4) now on host2 install the latest 64bit dim_STAT version (assuming it'll be installed to the same default "/apps" path)
  • 5) restore your "docs" backup into "/apps/httpd/home/docs" on host2
  • 6) from your database backup, restore everything to "/apps/mysql/data" on host2 except the following directories :
    • /apps/mysql/data/mysql
    • /apps/mysql/data/performance_schema
    • /apps/mysql/data/dim_00
  • 7) start dim_STAT on host2 : # /apps/ADMIN/dim_STAT-Server start
  • 8) connect to your newly installed dim_STAT on host2 and check that you can find all your previously collected data and created documents / snapshots..
  • 9) if something goes odd.. -- then ping me back, let's see what is going wrong ;-))
  • 10) if all is fine, then test it for a while, and once everything is really ok, you can comfortably migrate your host1 too ;-)) (and don't hesitate to ping me with good news too ;-))
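For convenience, the steps above can be collected into one small script -- just a sketch : the paths follow the default "/apps" install, while BACKUP is a hypothetical staging directory invented for this example; with DRY_RUN=1 every command is only printed, so you can review the plan before really executing it on your hosts :

```shell
#!/bin/sh
# Sketch of the migration steps above. BACKUP is a hypothetical
# staging directory; with DRY_RUN=1 every command is only printed
# instead of being executed.
DRY_RUN=1
BACKUP=/backup/dim_STAT
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# on host1 : stop dim_STAT and back up the data + web docs
run /apps/ADMIN/dim_STAT-Server stop
run cp -rp /apps/mysql/data      "$BACKUP/data"
run cp -rp /apps/httpd/home/docs "$BACKUP/docs"

# on host2 : after installing the 64bit dim_STAT into "/apps",
# restore the docs, then every database except mysql,
# performance_schema and dim_00 (those must come from the new install)
run cp -rp "$BACKUP/docs" /apps/httpd/home/
for d in "$BACKUP"/data/*
do
  case "$(basename "$d")" in
    mysql|performance_schema|dim_00) ;;   # skip : keep the new ones
    *) run cp -rp "$d" /apps/mysql/data/ ;;
  esac
done
run /apps/ADMIN/dim_STAT-Server start
```

once the dry-run plan looks right, switch DRY_RUN=0 and execute the host1 and host2 parts each on the right machine.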

Crossing fingers.. -- hope all will go just fine for you ;-))

And to finish.. -- there is yet "one more thing"..

This "one more thing" was inspired by observing many dim_STAT users doing some completely "unexpected things" (unexpected to my mind, but looking very "natural" to those who do them) -- honestly, I'm really surprised by all the use cases I've seen over the past years (even users generating graphs via CLI commands invoked from app servers and then grouping them into various documents, charts, etc.) -- but the most painful for me was to see users trying to invoke "web oriented" actions in dim_STAT via curl, wget, etc.. -- this could end up badly, as there can be tons of options within each POST/GET request, and all such calls could simply break after further CoreUpdates..

So, to make your life simpler for such use cases, let me present the REST-like Interface, available since CoreUpdate-17-12 ! ;-))

The dim_STAT-REST interface supports the following commands right now :
  • DB_LIST -- list all available databases
  • DB_CREATE -- create a new database
  • HOST_LIST -- print current host list in database
  • HOST_ADD -- add new hostname into host list in database
  • HOST_STATS -- request the list of available STATS from host STAT-service
  • COLLECT_LIST -- list all available STAT collects in database
  • COLLECT_NEW -- create and start a New Collect in database
  • COLLECT_STOP -- stop Active Collect(s) in database
  • COLLECT_RESTART -- restart Stopped Collect(s) in database
  • LOG_MESSAGE -- add a LOG Message to database

The output of each command comes as simple "pre-formatted" ASCII text, easy to parse and check.

Here is an example of "COLLECT_LIST" output :
==========================================================================================
 dim_STAT-REST (dim) v.1.0
==========================================================================================
 > CMD: COLLECT_LIST
  ----------------------------------------------------------------------------------------
      1 | goldgate     | -OFF- | 1998-12-18 16:28:27 |  15 sec. | Demo 1
      6 | test         | -OFF- | 2002-10-20 23:37:01 |  15 sec. | test
      7 | bezout       | -OFF- | 2003-06-26 13:46:51 |  30 sec. | test err
      9 | fidji        | -OFF- | 2003-09-17 13:23:12 |  10 sec. | test MPXIO + nocanput
     12 | test         | -OFF- | 2003-10-06 17:15:43 |  20 sec. | test MTB bug
     15 | localhost    | -OFF- | 2004-09-26 21:48:49 |  20 sec. | test
     16 | localhost    | -OFF- | 2004-09-26 21:51:59 |  20 sec. | test
     17 | gauss        | -OFF- | 2004-10-01 12:29:37 |  20 sec. | test RESTART
     18 | neel         | -OFF- | 2004-10-01 12:30:00 |  20 sec. | test RESTART
     19 | monod        | -OFF- | 2004-10-01 12:33:52 |  20 sec. | test RESTART
     20 | monod        | -OFF- | 2004-10-01 20:36:37 |  20 sec. | test RESTART
     21 | localhost    | -OFF- | 2004-10-12 20:30:51 |  15 sec. | test statOEE
     22 | dimitri      | -OFF- | 2007-01-21 21:06:43 |   5 sec. | test IObench
     23 | dimitri      | -OFF- | 2009-06-15 16:36:49 |  10 sec. | System Load...
  ----------------------------------------------------------------------------------------

 > OK
==========================================================================================
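Such output is indeed trivial to post-process from a script -- for example, here is a little awk sketch (parse_collects is just a helper name invented here; the field layout follows the sample output above, and in real life the input would come from a curl call to the COLLECT_LIST command) :

```shell
# Extract the Collect ID and hostname from a COLLECT_LIST reply :
# keep only the data rows (those containing "|" separators) and
# trim the padding spaces. The here-doc below stands in for a real
# reply piped from curl.
parse_collects() {
  awk -F'|' '/\|/ {
    id=$1; host=$2
    gsub(/ /, "", id); gsub(/ /, "", host)
    print id, host
  }'
}

parse_collects <<'EOF'
     22 | dimitri      | -OFF- | 2007-01-21 21:06:43 |   5 sec. | test IObench
     23 | dimitri      | -OFF- | 2009-06-15 16:36:49 |  10 sec. | System Load...
EOF
```

which prints "22 dimitri" and "23 dimitri" -- enough to feed Collect IDs into further commands, for example.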


but I think it's simpler to discover all this by yourself -- right now, try to execute the following command from your shell to test a dim_STAT installed on your host2 (port 80) :
 $ curl -L "http://host2:80/cgi-bin/WebX.mySQL/dim_STAT/x.REST"

this will print the following help message :
==========================================================================================
 dim_STAT-REST (dim) v.1.0
==========================================================================================
 > Usage: curl -L "http://host2:80/cgi-bin/WebX.mySQL/dim_STAT/x.REST?CMD=Command[&..options..]" 

    CMD=DB_LIST                    -- list all available databases 

    CMD=DB_CREATE                  -- create a new database "Name"
      &DB=Name                        -- db name
      &Engine=InnoDB|MyISAM           -- db engine (InnoDB or MyISAM)
      &Passwd=password                -- optional password to protect admin actions 

    CMD=HOST_LIST                  -- print current host list in database "Name"
      &DB=Name                        -- db name 

    CMD=HOST_ADD                   -- add new hostname into host list in database "Name"
      &Host=hostname                  -- new hostname (format: [alias/]hostname[:Port])
      &DB=Name                        -- db name
      &RESET=1                        -- optionally: reset hostlist to empty 

    CMD=HOST_STATS                 -- request the list of available STATS from host STAT-service
      &Host=hostname                  -- alias OR hostname (format: [alias/]hostname[:Port])
      &DB=Name                        -- db name 

    CMD=COLLECT_LIST               -- list all available STAT collects in database "Name"
      &DB=Name                        -- db name 

    CMD=LOG_MESSAGE                -- add a LOG Message to database "Name"
      &DB=Name                        -- db name
      &Message=text                   -- text message
      [&Host=hostname]                -- hostname (multiple Host args can be used)
      [&ID=id]                        -- Collect ID (multiple ID args can be used)
                                      * Host and ID are optional :
                                        > if ID is given : use provided ID(s) only
                                        > if no ID nor Host : add the message to all active collects
                                        > if only Host : add the message to active collects matching hostname(s) 

    CMD=COLLECT_NEW                -- create and start a New Collect in database "Name"
      &DB=Name                        -- db name
      &Host=hostname                  -- hostname (only one Host can be used at a time)
      &Timeout=Nsec                   -- STATs timeout in seconds
      &Title=title                    -- Collect title
      &STATS=list                     -- list of STATs to collect: stat1[,stat2[,stat3]...] or "all"
                                         all: means all STATs available from Host STAT-service
      [&LOG=filename]                 -- full filename of LOG file to watch
      [&Message=text]                 -- text message to log Collect start 

    CMD=COLLECT_STOP               -- stop Active Collect(s) in database "Name"
      &DB=Name                        -- db name
      [&Message=text]                 -- text message to log on Collect(s) stop
      [&Host=hostname]                -- hostname (multiple Host args can be used)
      [&ID=id]                        -- Collect ID (multiple ID args can be used)
                                      * Host and ID are optional :
                                        > if ID is given : use provided ID(s) only
                                        > if no ID nor Host : stop all active collects
                                        > if only Host : stop active collects matching hostname(s) 

    CMD=COLLECT_RESTART            -- restart Stopped Collect(s) in database "Name"
      &DB=Name                        -- db name
      [&Message=text]                 -- text message to log on Collect(s) restart
      [&Host=hostname]                -- hostname (multiple Host args can be used)
      [&ID=id]                        -- Collect ID (multiple ID args can be used)
                                      * Host and ID are optional :
                                        > if ID is given : use provided ID(s) only
                                        > if no ID nor Host : restart all recently stopped collects
                                        > if only Host : restart recently stopped collects matching hostname(s) 

    ...
==========================================================================================

## ERROR:
=> CMD is not filled !! 

==========================================================================================


while you can see many options listed there, many actions are simplified "by default" -- and to explain this, let me show you a simple use case of the whole workflow by example :
  • you're starting testing with, say, "Customer 1234"
  • so, first you create a dedicated database CU_1234_YourName
  • (adding your name to the dbname to be sure the name is unique ;-))
  • then you add to this database the hosts you want to use (say: host_N1, host_N2, host_N3)
  • (note: these can also be the same HW server running several STAT-services (each one for a different db instance, etc.))
  • once you're ready, you're :
    • starting a New Collect for host_N1
    • starting a New Collect for host_N2
    • starting a New Collect for host_N3
    • NOTE: instead of building the list of stats you want to collect from your hosts, you can use the "STATS=all" option, which will collect everything -- this could be dangerous (sometimes too much is too much ;-)) -- but you can easily limit "everything" to "just what you need" by editing the STAT-service "access" file (leaving uncommented there only the stats you'll really need) -- so, again, you can keep your own "pre-configured" STAT-service tarball, deploy & start it on your host(s) before any testing, and then in your scripts always use just "STATS=all" (regardless of which system you're using, delegating the choice to your pre-defined STAT-service config ;-))
  • after which, you can run your first test workload..
  • your test script may contain, at any place, a command to send a Log Message
  • NOTE: without mentioning any ID in the command, the Message will be automatically added to all currently active Collects !
  • (so, in your scripts you don't even need to worry which exact hosts are used, etc. -- all you need to know is the URL of your dim_STAT server and the dbname you're using ;-))
  • suppose you got your first results and now need feedback from Customer/Dev about them, so no more tests for the moment.. -- then you just invoke COLLECT_STOP for your database
  • NOTE: without any ID provided, the command will stop all active collects within your database (so, no need to worry you forgot any ;-))
  • then, a few days later, you have more info and need to run other tests under the same conditions and on the same hosts..
  • so, all you need to do is invoke the COLLECT_RESTART command -- and again, without any ID and by only giving the DBNAME, the tool will restart the most recently stopped collects ;-))
  • and in case you need to run some tests only on, say, "host_N2" => you then just give "DB=CU_1234_YourName&Host=host_N2" and the tool will automatically find the most recently created Collect corresponding to "host_N2" and restart it !
  • similarly, your test script continues to send Log Messages, and if only the host_N2 Collect is active during this time => then only the host_N2 Collect will log them, and not the other Collects ;-))
  • and then again, invoking COLLECT_STOP with no ID will stop all your running collects -- no need to worry about missing any of them ;-))
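And the whole workflow above then fits in a dozen lines of shell -- again just a sketch : the R() wrapper, db and host names are placeholders taken from the example, and each REST call is only printed here (drop the "echo" inside R() to really execute it via curl) :

```shell
# Sketch of the workflow above. DIM points to your dim_STAT server,
# DB/host names are the placeholders from the example. R() only
# prints each REST call -- drop the "echo" to really run curl.
DIM="http://host2:80/cgi-bin/WebX.mySQL/dim_STAT/x.REST"
DB="CU_1234_YourName"
R() { echo curl -L "$DIM?$1"; }

R "CMD=DB_CREATE&DB=$DB&Engine=InnoDB"          # dedicated database
for h in host_N1 host_N2 host_N3
do
  R "CMD=HOST_ADD&DB=$DB&Host=$h"               # register the hosts
  R "CMD=COLLECT_NEW&DB=$DB&Host=$h&Timeout=10&Title=test&STATS=all"
done

R "CMD=LOG_MESSAGE&DB=$DB&Message=test-1-start" # goes to ALL active collects
# ... run your test workload here ...
R "CMD=COLLECT_STOP&DB=$DB"                     # no ID => stop everything

# a few days later : restart, e.g. only on host_N2
R "CMD=COLLECT_RESTART&DB=$DB&Host=host_N2"
```

note that in a real script any Message text containing spaces or special characters would need URL-encoding.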

Well, don't hesitate to ping me if you need any more details !

That's all for the moment. As usual, all the stuff above is available for free download from my site :
- http://dimitrik.free.fr
Any comments are welcome ! Happy New Year & Have Fun ! ;-))

Rgds,
-Dimitri
Posted by Dimitri at 0:29
Categories: dim_STAT, Linux

Monday, 23 October, 2017

MySQL Performance: 8.0 re-designed REDO log & ReadWrite Workloads Scalability

This post follows the story of MySQL 8.0 Performance & Scalability started with the article about 2.1M QPS obtained on Read-Only workloads. The current story will now cover our progress on Read-Write workloads..

Historically, our Read-Only scalability was a big pain, as Read-Only (RO) workloads were often slower than Read-Write (sounds very odd: "add Writes to your Reads to go faster", but this was our reality ;-)) -- things were largely improved here since MySQL 5.7, where we broke the 1M QPS barrier and reached 1.6M QPS for the first time. However, improving Writes or mixed Read+Write (RW) workloads is a much more complex story..

What are the main scalability show-stoppers in MySQL 5.7 and the current 8.0 release candidate for RW and IO-bound workloads ? -- the most "killing" ones are the following :

  • REDO log contention blocks your whole transaction processing from going faster..
  • Transaction (TRX) management very quickly becomes a huge bottleneck as soon as your transactions are fast and/or short..
  • internal locking and row locking (LOCK) will quickly kill your performance as soon as your data access pattern is not uniform, etc..
  • and yet more : historically, as soon as you involve any IO operations, they all go via one single and global locking path (the fil_system mutex), which makes the use of faster storage solutions (flash) simply useless..

so, it was definitely time to get our hands on all this ;-))

The whole story about it is pretty amazing, but I have to be short, so :
  • in short : we know exactly what we have to do to get rid of all of this
  • we have working prototype code allowing us to expect pretty important potential gains in RW and pure Write workloads
  • the only real problem here is that the road from "prototype" to "production quality" code is much longer than anyone could expect (even me ;-))
  • so within the MySQL 8.0 GA timeframe we could not deliver all the fixes we have, and we had to go by priority here..
  • and priority #1 from the list of issues above goes, for sure, to the REDO and IO problems, as the most common show-stoppers for most RW workloads today
  • the 8.0 planned changes are not yet final, but you may already get a first idea about them by trying our "preview" MySQL 8.0-labs release

This article, though, will be mostly about the changes we're doing for REDO.

And in fact, the story around InnoDB REDO contains many surprises :
  • historically, very often the perception of how REDO activity impacts overall InnoDB performance was seen as a balance between performance -vs- security in the "trx_commit" tuning settings, e.g. :
    • innodb_flush_log_at_trx_commit=1 : flushing (fsync) REDO on every COMMIT
    • innodb_flush_log_at_trx_commit=2 : flushing REDO only once per second
    • general observation : using innodb_flush_log_at_trx_commit=2 gives better performance
    • common conclusion :
      • fsync operations have an important cost; doing them less frequently helps performance
      • use innodb_flush_log_at_trx_commit=2 if you want better performance and can accept losing 1 sec of the last transactional activity in case of a power off..

  • and the main key point in this perception is : "doing fsync to flush REDO is costly"
  • while even 20 years ago there were many storage solutions capable of greatly improving write performance (like arrays having a battery-protected cache on the controller, or simple write-cache chips, etc.) -- and these are particularly efficient with REDO writes, which are small and fully sequential..
  • however, most of the time the slowness of trx_commit=1 was mostly attributed to "frequent fsync calls" rather than to the REDO design itself..
  • our very first suspicions about the REDO design started 2 years ago, when Sunny implemented a probe dirty patch just to see the potential impact of a different approach to REDO queueing.. => which gave a completely unexpected result : surprisingly, observed performance on the same RW workload was better with trx_commit=1 compared to trx_commit=2..
  • after which it became clear that the whole issue is rather related to the REDO design itself, while the frequently invoked fsync is not its main reason, but just amplifies the problem..

During the past few years we came up with several tentative improvements to the InnoDB REDO log design before arriving at an understanding of what exactly we need ;-)) -- our main target was to improve performance when trx_commit=1 is used (true security, as we do flush REDO on every COMMIT), and from "do less fsync to improve performance" we came to the conclusion "let's rather be driven by storage capacity".

So, this is how the new REDO design differs from the old one (very simplified) :


if before users were constantly fighting for permission to write to REDO, in the new design they are not fighting anymore :
  • User threads send their records to the Log Buffer (lock-free)
  • the Log Writer thread pushes the records further, from the Log Buffer to the FS cache (buffered write())
  • after which the Log Flusher thread invokes fsync() to flush REDO asap
  • if required, User threads wait to be notified that their records are flushed (in case of a COMMIT, for ex.)
  • the whole chain is asynchronous and event-driven
  • we're not trying to write or flush more or less often.. -- all IO activity is driven by storage capacity
  • if the storage is capable of writing quickly (low write latency) => fsync()s will be more frequent, but with smaller amounts of data
  • otherwise there will be less frequent fsync()s, but with a bigger amount of data to flush
  • in the end, the whole processing rate will depend only on the storage's capacity to write REDO records fast enough !

This new model is only the first step in further REDO improvements. Currently, by resolving bottlenecks on the REDO layer, we're unlocking user threads to do more work, which results in yet hotter bottlenecks on the TRX and LOCK layers. So, there is still a lot of work ahead, and indeed, we're only at the beginning..

However, with the new model we discovered a few surprises we did not expect at the beginning :


this is all related to "low level loads" :
  • with the old REDO, when you have only 1-2 concurrent users, they are not fighting too much for writes
  • while with the new REDO, all the processing work is event-driven and follows a chain of events from thread to thread
  • and a chain of events between different threads can hardly compete in efficiency -vs- a single thread doing all the work alone, without any notification waits, etc..
  • our final solution for low loads is not yet finalized, but several options are being considered
  • one of the options : involve CPU spinning in thread waits -- at the cost of an additional 20% of CPU usage, this allows us to match the old REDO's TPS on 1-2 users, and already on a 4 concurrent users load to reach a higher TPS than before !

The following "dirty" snapshot of intermediate benchmark results on a pure UPDATE workload, comparing MySQL 5.6/ 5.7/ 8.0-rc/ 8.0-labs/ 8.0-labs-spinning, could give you an idea of what kind of headache we may have :

and our main expectation is that, by combining all these bits together + involving some kind of auto-tuning (or "auto-guided" tuning, etc.), we'll come up with something really stable and easily "tunable" for all kinds of loads ;-))

While for the time being, here are a few benchmark results we're currently obtaining on Sysbench RW workloads :
  • Server : 48cores-HT, 2CPU sockets (2S) 2.7Ghz (Skylake), OL7.3
  • Storage : Intel Optane PCIe 375GB, EXT4

Sysbench OLTP_RW
  • workload : New Sysbench OLTP_RW
  • data volume : 8 tables of 10M rows each
  • encoding : latin1
  • user load levels : 1, 2, 4, .. 1024
  • engines : MySQL 8.0-labs, MySQL 5.7, MySQL 5.6
  • innodb_flush_log_at_trx_commit=1
Observations :
  • 30% gain over MySQL 5.7
  • 50% gain over MySQL 5.6
  • NOTE: in the OLTP_RW workload the majority of queries are Reads, so it's really great to see such a gain, because the main gain in Read scalability was already reached with MySQL 5.7 ;-)
  • at the same time, TRX and LOCK bottlenecks still remain..


Sysbench UPDATE-NoIDX
  • workload : New Sysbench UPDATE-no_index
  • data volume : 8 tables of 10M rows each
  • encoding : latin1
  • user load levels : 1, 2, 4, .. 1024
  • engines : MySQL 8.0-labs, MySQL 5.7, MySQL 5.6
  • innodb_flush_log_at_trx_commit=1
Observations :
  • 100% gain over MySQL 5.7
  • 50% gain over MySQL 5.6
  • NOTE: there is no mistake in the results ;-)
    • 5.7 really is that much worse -vs- 5.6 on this workload..
    • so, we're very happy to finally fix this gap with 8.0 !

The following config settings were used during the presented benchmark workloads :
[mysqld]
# general
 max_connections=4000
 table_open_cache=8000
 table_open_cache_instances=16
 back_log=1500
 default_password_lifetime=0
 ssl=0
 performance_schema=OFF
 max_prepared_stmt_count=128000
 skip_log_bin=1
 character_set_server=latin1
 collation_server=latin1_swedish_ci
 transaction_isolation=REPEATABLE-READ

# files
 innodb_file_per_table
 innodb_log_file_size=1024M
 innodb_log_files_in_group=32
 innodb_open_files=4000

# buffers
 innodb_buffer_pool_size=32000M
 innodb_buffer_pool_instances=16
 innodb_log_buffer_size=64M

# tune
 innodb_doublewrite=0
 innodb_thread_concurrency=0
 innodb_flush_log_at_trx_commit=1
 innodb_max_dirty_pages_pct=90
 innodb_max_dirty_pages_pct_lwm=10

 join_buffer_size=32K
 sort_buffer_size=32K
 innodb_use_native_aio=1
 innodb_stats_persistent=1
 innodb_spin_wait_delay=6

 innodb_max_purge_lag_delay=300000
 innodb_max_purge_lag=0
 innodb_flush_method=O_DIRECT_NO_FSYNC
 innodb_checksum_algorithm=none
 innodb_io_capacity=4000
 innodb_io_capacity_max=20000
 innodb_lru_scan_depth=9000
 innodb_change_buffering=none
 innodb_read_only=0
 innodb_page_cleaners=4
 innodb_undo_log_truncate=off

# perf special
 innodb_adaptive_flushing=1
 innodb_flush_neighbors=0
 innodb_read_io_threads=16
 innodb_write_io_threads=16
 innodb_purge_threads=4
 innodb_adaptive_hash_index=0

# monitoring
 innodb_monitor_enable='%'

NOTE:
  • yes, I know, PFS is turned OFF; the results comparing PFS=on/off are coming later, no worries ;-))
  • with a 32GB Buffer Pool the whole dataset remains in memory, only writes are going to the storage
  • checksums were not enabled either, as they have no impact in this workload
  • (the test results comparing checksum impact are coming later too)
  • other tuning details I'll explain in the next articles..

Also, I've intentionally skipped here all the problems related to the InnoDB Double Write Buffer (the only feature protecting you today from partially written pages (unless you're using a COW FS (like ZFS or similar))) -- this feature as it is has become a huge bottleneck by itself.. -- our fix was ready as far back as MySQL 5.7, but missed the GA timeframe, so it was delayed to 8.0, where it met a list of several pre-requirements before being allowed to be pushed -- but it's finally only a question of time now to see the fix applied and delivered as part of the MySQL 8.0 features..

In case you want to replay the same tests, you may follow the same instructions as in the previous post to set up the scripts and load the test data, then just execute :

OLTP_RW :
cd /BMK
for nn in 1 2 4 8 16 32 64 128 256 512 1024 
do 
   sh sb_exec/sb11-OLTP_RW_10M_8tab-uniform-ps-trx.sh $nn 300
   sleep 60
done

UPDATE-NoIDX :
cd /BMK
for nn in 1 2 4 8 16 32 64 128 256 512 1024 
do 
   sh sb_exec/sb11-OLTP_RW_10M_8tab-uniform-upd_noidx1-notrx.sh $nn 300
   sleep 60
done
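If you then want to collect the final TPS numbers from all these runs, a small helper can extract them from the saved sysbench output. This is just a minimal sketch, assuming the "transactions: N (X per sec.)" summary line printed by the New Sysbench (the `parse_tps` name is mine, and the result file naming is up to you) :

```shell
# parse_tps: print the "per sec." transactions rate from a saved sysbench
# output file (assumes the usual "transactions: N (X per sec.)" summary line)
parse_tps() {
   sed -n 's/.*transactions:.*(\([0-9.]*\) per sec\.).*/\1/p' "$1"
}
```

e.g. `parse_tps OLTP_RW_64usr.out` would print just the TPS figure for a 64-user run, ready for plotting.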

So far, we're expecting to see significant progress on RW performance with MySQL 8.0 ! However, regardless of the positive overall benchmark results compared to the previous MySQL version, we're still far from scaling on Writes.. -- work in progress, stay tuned, yet more to come.. ;-))

and THANK YOU for using MySQL !

Rgds,
-Dimitri

Posted by Dimitri at 16:19 - Comments...
Categories: MySQL

Wednesday, 04 October, 2017

MySQL Performance : 2.1M QPS on 8.0-rc

The first release candidate of MySQL 8.0 is here, and I'm happy to share a few performance stories about it. This article will be about the "most simple" one -- our in-memory Read-Only performance ;-))

However, the test workload used here was chosen for two reasons :


Going ahead to the second point, the main worry about the New Sysbench was about its LUA overhead (the previous version 0.5 was running slower than the old 0.4 due to LUA) -- long story short, I can confirm now that the New Sysbench is running as fast as the oldest "most lightweight" Sysbench binary I have in use ! so, KUDOS Alex !!! ;-))

Regarding the improvements coming with MySQL 8.0 on Read-Only workloads, I'd mention :
  • several "overheads" were fixed
  • the most notable one is related to UTF8, of course
  • however, even latin1-related functions were improved a little bit
  • but this was only about "overheads", and nothing about "scalability"
  • because the main "scalability" leap was already made with MySQL 5.7 two years ago ;-))
  • so, our main merit with MySQL 8.0 here will rather be NOT TO LOSE the already obtained gain !
  • (agree, it sounds very odd, but if you just look at the list of all the new features coming with 8.0 you can imagine our code path is not going to get shorter, right ? ;-))
  • so the fair test here will be to compare 8.0 vs 5.7 and 5.6 with latin1 encoding
  • (for UTF8 the winner is 8.0, and by far, as you already know)

The most "sensible" RO workload in Sysbench is Point-Selects, so here is my test scenario:
  • workload : New Sysbench RO point-selects
  • data volume : 8 tables of 10M rows each
  • encoding : latin1
  • user load levels : 1, 2, 4, .. 1024
  • engines : MySQL 8.0, MySQL 5.7, MySQL 5.6
  • server : 96cores-HT 4CPU sockets 2.2Ghz (Broadwell), OL7.3

and here is the result :


Observations :
  • 2.1M SQL Query/sec for MySQL 8.0 -- our new Max QPS record in MySQL history !
  • which is great, no doubt !
  • however, there is a small but clearly visible QPS regression on lower load levels..
  • which is not really cool (even if it can easily be explained by the increased code path + new DD + etc. etc.)
  • well, I'm adding it to my list of "low load" performance issues and will investigate later..

So far, the 2.1M QPS result was obtained on the "old" Broadwell CPU, and there are no bottlenecks observed (only potential overheads), so for the moment I have no idea what to expect from the same workload on 4-socket Skylake -- I will share the results once I have such a box in my hands ;-))

Then, many people are constantly asking me how to reproduce the presented results, so here are all the exact details you may need :

1) install and start your MySQL 8.0 instance (rc or current labs release)

2) the config settings I've used are here :
[mysqld]

# general
 max_connections=4000
 table_open_cache=8000
 table_open_cache_instances=16
 back_log=1500
 default_password_lifetime=0
 ssl=0
 performance_schema=OFF
 max_prepared_stmt_count=128000
 skip_log_bin=1
 character_set_server=latin1
 collation_server=latin1_swedish_ci
 transaction_isolation=REPEATABLE-READ

# files
 innodb_file_per_table
 innodb_log_file_size=1024M
 innodb_log_files_in_group=3
 innodb_open_files=4000

# buffers
 innodb_buffer_pool_size=32000M
 innodb_buffer_pool_instances=16
 innodb_log_buffer_size=64M

# tune
 innodb_doublewrite=0
 innodb_thread_concurrency=0
 innodb_flush_log_at_trx_commit=1
 innodb_max_dirty_pages_pct=90
 innodb_max_dirty_pages_pct_lwm=10

 join_buffer_size=32K
 sort_buffer_size=32K
 innodb_use_native_aio=1
 innodb_stats_persistent=1
 innodb_spin_wait_delay=6

 innodb_max_purge_lag_delay=300000
 innodb_max_purge_lag=0
 innodb_flush_method=O_DIRECT_NO_FSYNC
 innodb_checksum_algorithm=none
 innodb_io_capacity=4000
 innodb_io_capacity_max=20000
 innodb_lru_scan_depth=9000
 innodb_change_buffering=none
 innodb_read_only=0
 innodb_page_cleaners=4
 innodb_undo_log_truncate=off

# perf special
 innodb_adaptive_flushing=1
 innodb_flush_neighbors=0
 innodb_read_io_threads=16
 innodb_write_io_threads=16
 innodb_purge_threads=4
 innodb_adaptive_hash_index=0

# monitoring
 innodb_monitor_enable='%'
 

NOTE: yes, I know, PFS is turned OFF -- the results comparing PFS=on/off are coming later, no worry ;-))

4) download the tarball with my scripts + sysbench binary + my.conf, and then untar it into the "/" directory

5) this will create the /BMK directory with all the stuff inside, so edit the ".bench" file to provide the account details to connect to your MySQL instance (user, passwd, host, port)

6) create "sysbench" database

7) run the load data script :
$ cd /BMK
$ sh sb_exec/sb11-Prepare_10M_8tab-InnoDB.sh
this will load 8 tables with 10M rows each

8) run ANALYZE on each table within the sysbench database to be sure your stats for these tables are up-to-date
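This ANALYZE step can be scripted too. A minimal sketch, assuming the 8 tables are named sbtest1..sbtest8 (the naming used by the Sysbench prepare scripts) -- it only prints the statements, so you can pipe them into your mysql client :

```shell
# generate "ANALYZE TABLE sbtestN;" for the 8 sysbench tables;
# pipe the output into your mysql client against the sysbench database
SQL=""
for i in 1 2 3 4 5 6 7 8
do
   SQL="${SQL}ANALYZE TABLE sbtest${i};"
done
echo "$SQL"
```

e.g. `sh gen_analyze.sh | mysql sysbench` (the gen_analyze.sh name is just for illustration; add your user/password options as needed).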

9) run the test :
$ cd /BMK
$ for nn in 1 2 4 8 16 32 64 128 256 512 1024 
do 
   sh sb_exec/sb11-OLTP_RO_10M_8tab-uniform-ps-p_sel1-notrx.sh $nn 300
   sleep 60
done

10) enjoy ;-))

stay tuned, more to come..

Rgds,
-Dimitri

Posted by Dimitri at 3:33 - Comments...
Categories: MySQL

Tuesday, 09 May, 2017

MySQL Performance: 8.0-dev Progress Details

As promised, here is the follow-up to the MySQL 8.0-dev Progress teaser ;-)

so, yes, we expect to see the hottest contentions gone, as seen in the following graph :

Observations :

  • the graph represents the spin waits / spin rounds events happening during the same Sysbench Update-NoKEY workload
  • the load is progressing from 8 concurrent users to 16, 32, .. 512
  • 3 engines are tested one after another : MySQL 5.7, MySQL 8.0 (current DMR), MySQL 8.0-dev (current prototype)
  • first, all 3 engines are running on 22cores-HT only (1 CPU socket)
  • then, a second time : on 44cores-HT (2 CPU sockets)
  • and the most important thing to retain from this graph is that on 8.0-dev we see all the main hot contentions gone ;-)
And not having all these contentions results in the following :

Observations :
  • one of the most painful things about current InnoDB WRITE performance is its scalability.. (in fact, the absence of scalability ;-))
  • the problem is not new; it's just that since READ scalability was greatly improved in MySQL 5.7, seeing WRITEs remain not scaling has become even more painful..
  • and this is because of all the overhead and contentions we have on the REDO level, transactions & lock management, etc..
  • so, MySQL 5.7 and the 8.0 DMR are limited by this, and moving from 1 CPU socket to 2 CPU sockets cannot help here (in fact things may become only worse, as with more CPU cores all these contentions will only get higher)..
  • while the 8.0-dev code gives us a huge expectation to finally see all these problems gone, and potentially get 2x better performance even on a single CPU socket !! ;-))
  • NOTE : we're not only running for "high numbers" here -- there was a long and hard battle to see a performance improvement at low load as well, so a positive difference is already seen at 8 users, and even more since 16 ;-)

However, this is not the only benefit.. -- the story goes much further, because all this deep remastering potentially allows us to get rid of all the overhead we see when READ COMMITTED (RC) transaction isolation is used.. -- an example of such an overhead you can see here :
- http://dimitrik.free.fr/blog/archives/2015/02/mysql-performance-impact-of-innodb-transaction-isolation-modes-in-mysql-57.html

and this was one of many reasons why REPEATABLE READ (RR) transaction isolation has historically been the default within InnoDB (and still continues to be the default in MySQL 5.7)..

While with 8.0-dev things could potentially finally change ;-)

The following graphs represent the results from the Sysbench OLTP_RW workload comparing MySQL 5.7 and 8.0-dev :

Observations :
  • first, the test is executed with RR isolation on both 5.7 and 8.0-dev
  • yes, 8.0-dev is doing better than 5.7, but the most important part comes next ;-)
  • in the next step both are executed with RC isolation
  • and then you can see TPS on 5.7 going lower..
  • while on 8.0-dev the TPS level remains just the same on RC as on RR !! ;-)
So, crossing fingers, if all goes as expected, this could allow us for the first time to have RC isolation as the default in InnoDB.. -- what does it mean for all MySQL users and their apps ?.. => a huge overall experience change + less headache + fewer locking conflicts, etc. etc. etc. (and, not to go very far, just to mention yesterday's article from Alex : https://www.percona.com/blog/2017/05/08/chasing-a-hung-transaction-in-mysql-innodb-history-length-strikes-back/)..
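And in case you want to replay such an RC vs RR comparison on your own workload already today, switching the isolation mode is a one-line change in the my.cnf style I'm using in my configs (READ-COMMITTED here instead of the REPEATABLE-READ you've seen in my previous settings) :

```
# switch to READ-COMMITTED (instead of the historical REPEATABLE-READ default)
 transaction_isolation=READ-COMMITTED
```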

Well, work in progress, crossing fingers, stay tuned ;-))

the last remark: just to bring to your attention once more that MySQL 8.0 has moved to UTF8 charset by default !!! => so, if you're planning to run any tests with the 8.0 DMR, please remember this !! -- use UTF8 on both sides (server and client) and consider the UTF8 overhead, or switch the server back to latin1 if your "client" apps are latin1.. -- otherwise you'll not be happy ;-))
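i.e. switching the server back to latin1 is just these two lines in my.cnf (the same ones as in the configs from my previous posts) :

```
# keep latin1 (MySQL 8.0 moved to UTF8 by default)
 character_set_server=latin1
 collation_server=latin1_swedish_ci
```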

more benchmark results and further observations about MySQL 8.0-dev can be found in my slides :
- MySQL-8.0-dev-Benchmarks-Scalability-Apr.2017-dim.pdf

Thank you for using MySQL ! and Go MySQL !! ;-))

Rgds,
-Dimitri
Posted by Dimitri at 21:05 - Comments...
Categories: MySQL

Friday, 14 April, 2017

MySQL Performance: 8.0-dev Progress..

As you already know, a new MySQL 8.0 milestone release is available (and I hope you did not miss all the news coming from the MySQL Server Team site -- starting with the what's new article and followed by many others (and you'll see yet more to come ;-))..

There are also many good changes improving overall MySQL 8.0 performance. However, to see a real boost on OLTP workloads you'll need a little bit more patience.. -- we're attacking InnoDB fundamentals.. -- the parts of the design which have probably remained mostly unchanged since InnoDB's creation ;-)) -- you can easily understand that such work has a long road from idea/prototype to a final release.. At the same time, our "Preview" results are looking very encouraging, and I'll be happy to tell you more about them during my talk @PerconaLive, 27/Apr 1:50pm :

- https://www.percona.com/live/17/sessions/mysql-80-performance-scalability-benchmarks

to give you a first idea about what is coming, I'll post here a single "teaser" graph about pure UPDATE performance on a 22cores-HT (1 CPU socket), then 44cores-HT server (2 CPU sockets (kind of today's "commodity" HW ;-)) -- the graph is taken from the results comparing MySQL 5.7 vs current 8.0 vs 8.0-dev (which is still in dev progress), but instead of TPS/QPS you can see here just the levels of InnoDB internal contentions :




to help you a little bit to find the diff within these graphs, in short :

  • yes, log_sys contention is gone in MySQL 8.0-dev ;-)
  • and log_write too..
  • and trx_sys..
  • and lock_sys..
  • and...

If this is saying something to you, you could already get a first idea about what kind of TPS boost you may expect and what kind of possibilities it opens.. ;-) -- but if not, don't worry.. -- I'll tell you all about it during my talk in 2 weeks ;-)

(yes, slides will be published, no worry; and yes, this article will be also updated with more details)

until then, stay tuned.. ;-))

And THANK YOU for using MySQL !!!

UPDATE: the follow up article is here - http://dimitrik.free.fr/blog/archives/2017/05/mysql-performance-80dev-progress-details.html

Rgds,
-Dimitri

Posted by Dimitri at 2:10 - Comments...
Categories: MySQL