The following is a short HOWTO about deployment and use of the Benchmark-kit (BMK-kit). The main idea of this kit is to simplify your life in running various MySQL benchmark workloads with less pain and fewer potential errors.
Generally it's as simple as the following :
$ bash /BMK/sb_exec/sb11-Prepare_50M_8tab-InnoDB.sh 32    # prepare data

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  # run OLTP_RW for 5min on each load level..
  bash /BMK/sb_exec/sb11-OLTP_RW_50M_8tab-uniform-ps-trx.sh $users 300
  sleep 15
done
Preface
Since v.1.0, I see the new (Lua-based) Sysbench as a load generator "platform" (and no longer as just yet another test app). All proposed tests in the kit are running on Sysbench, so all results outputs are similar, and nothing more than a single Sysbench binary is required to run any of them. The Sysbench binary provided in the kit is statically compiled with the MySQL client lib, so generally it should just work.
This kit and its workload scenarios are proposed "as is" and "as I prefer" them. In case you do not agree with something or prefer other approaches in testing -- feel free to implement yourself whatever you desire, but don't expect me to debate and defend every detailed point, etc.. I'm just doing things my way and sharing it here in case it can be useful for someone else as well. I'm not pretending to hold the absolute truth, and you're always free to do it your way -- and this is where the beauty of life comes from.. ;-))
Setup BMK-kit
The test kit requires minimal configuration, but has several "configurable" places in its scripts. For example, it uses the /BMK directory by default as its main home place -- but this can be easily changed via the BMK_HOME env. variable, etc..
I'll use /BMK as the default home in all the following steps, just mind that you can deploy the toolkit anywhere and simply point the BMK_HOME env. variable to your directory.
So far, to deploy and set up the stuff, just do the following :

- download the BMK-kit.tgz tarball from : http://dimitrik.free.fr/BMK-kit.tgz
- deploy it anywhere, or from the / directory as root user :

# cd /
# tar xzf /path/to/BMK-kit.tgz

- this will create the /BMK directory and deploy all the needed scripts inside
- now you'll need to edit the /BMK/.bench config file to provide the connection details to your MySQL Server (user, password, host, IP port, path to UNIX socket and db name)
NOTE : by default Sysbench will use a client connection with the "native" password method to remain compatible with older MySQL versions, so for MySQL 8.0 you'll need to use a user account created with the mysql_native_password authentication plugin (or simply set it as the global default just for the time of testing)

for example :

mysql> CREATE USER 'dim'@'%' IDENTIFIED WITH mysql_native_password BY "dim";

or in my.cnf :

[mysqld]
default_authentication_plugin=mysql_native_password
...
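To double-check which authentication plugin a given test account actually uses, a quick query like the following can help (the 'dim' account here is just the one from the example above) :

$ mysql -uroot -p -e "SELECT user, host, plugin FROM mysql.user WHERE user='dim'"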
- you may also need to re-assign (if needed) the name of the Sysbench binary you're willing to use -- all current scripts will use sb11 (codename for sysbench v.1.1), and by default it points to the sysbench-1.1-new2020 binary (Linux/x64), but there is also another one with SSL support (see : Special Sysbench binary for MySQL 8.0, below), and also the same pair of binaries for Linux/ARM64
- sb11_MAX value : this value in the .bench file was added for cases when a very large number of user connections in MySQL should be tested (more than 10K, for ex.) -- because sometimes something can go odd on the system or in Sysbench itself and impact all your tests, and periodically it will not be able to connect (or disconnect, or whatever) any new users anymore. One of the workarounds to run such high load test workloads is to start more Sysbench processes in parallel, and sb11_MAX is here to say how many users per single Sysbench process should be assigned (currently it's intentionally set by default to a very big value, so just a single Sysbench process will be used by default, but you can change it to a lower value (for ex. 1000) if you're hitting this kind of problem).
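For reference, these two settings live in the .bench config file and look roughly like the following (a hypothetical excerpt with illustrative values -- check the shipped file for the exact defaults) :

# /BMK/.bench (excerpt)
# Sysbench binary used by all sb11-* scripts
sb11=$BMK_HOME/sysbench-1.1-new2020
# max number of users per single Sysbench process
# (lower it to ex. 1000 if you hit connection issues on very high load levels)
sb11_MAX=1000000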
So far, at this step you're ready to start your first Benchmark Workload ! -- be sure your MySQL Server is running, create your database, and follow the next steps.
The steps for any test workload in BMK-kit are all similar :
- you "prepare" your database first
- then just execute a test workload you wish with loaded data
Every script name is "explicit" by itself, self-explaining what kind of test will be executed and with which options, etc., to reduce the potential level of errors to the minimum..
Localhost & MySQL
In case you're not aware, MySQL has a few historical config conventions which can be pretty confusing, but we have to keep them in the code due to backward compatibility :

- when you set localhost as host name => user connections go via UNIX socket
- when you use 127.0.0.1 as host name => user connections go via the local host IP stack (loopback)
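You can observe the same convention with the plain mysql client (just an illustration, use any account you have) :

$ mysql -h localhost -u dim -p     # "localhost" => connection via UNIX socket
$ mysql -h 127.0.0.1 -u dim -p     # explicit IP => connection via TCP loopback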
Just always be sure what exact type of connections you want to test, because communications via UNIX socket are going faster (sometimes much faster, up to 50% gain), but this can be totally unrealistic for your final results if at the end you're planning to use TCP/IP connections. On the other side, using UNIX socket allows you to skip the IP stack and then probably better see other bottlenecks, etc.. -- all this is up to you, just always be aware of what you're doing :-))
NOTE : the full difference in performance when connections are going via UNIX socket -vs- IP loopback still remains unexplained, but it seems to be a pure SW problem in the Linux kernel, as similar gaps in performance are observed with Linux on both x64 and ARM64 -- for ex. see here : http://dimitrik.free.fr/blog/posts/mysql-performance-80-ga-ip-port-vs-unix-socket-impact.html
Special Sysbench binary for MySQL 8.0
By default the Sysbench binaries I'm shipping with BMK-kit are compiled statically (so you don't need any additional libs), and they are supposed to work with all MySQL versions starting from MySQL 5.5. However, MySQL 8.0 introduced several new features, and one of them is the caching_sha2_password authentication method, which did not exist before. And as at some point I needed to test this feature and also combine it with SSL, I came to the conclusion to ship a special Sysbench binary for MySQL 8.0.
The name of this binary is sysbench-1.1-new2020-mysql80-ssl, and it was compiled with openssl-1.1.1 libs, which allows it to work with both openssl-1.1.1 and openssl-1.0 libraries, regardless of what your MySQL Server is using. However, in case you need an explicit Sysbench binary using only one of the OpenSSL libs, there are 2 binaries : sysbench-1.1-new2020-mysql80-ssl10 and sysbench-1.1-new2020-mysql80-ssl111L (respectively compiled with OpenSSL-1.0 and 1.1.1L).

NOTE : binaries compiled with OpenSSL-1.1.1 are also perfectly working with a MySQL Server running with OpenSSL-3.0, just in case.
All these binaries are not compiled statically, so they need the MySQL client lib libmysqlclient.so.21, which is shipped within the /BMK/lib-* directories, and the .bench file has an additional LD_LIBRARY_PATH setting adding the required /BMK/lib-* to the path (it ships the OpenSSL libs as well), so all you have to do to get it working is to enable the right path for the libraries to use :

/BMK/.bench :

...
#-----------------------------------------
# MySQL 8.0 Client and SSL Libs..
#
# use openssl-1.0
## SSL=ssl10

# use openssl-1.1.1l
SSL=ssl111L

# set LD libs path accordingly
export LD_LIBRARY_PATH=$BMK_HOME/lib-$SSL:$LD_LIBRARY_PATH
# ...
- use SSL=ssl10 if you test connections with openssl-1.0
- use SSL=ssl111L if you test connections with openssl-1.1.1l (default)
And one big advantage of this dedicated binary is that to run tests via SSL connections, all you need is just to add the --mysql-ssl=REQUIRED option to each test script, and then it just works ! (e.g. no need to provide keys / certificates / etc.)
In Short
In short, all you need to do to run your tests on MySQL 8.0 with SSL connections is to follow these two steps :

1) in the .bench file, uncomment the "special" binary for MySQL 8.0 :

# -- special MySQL 8.0, with SSL 1.0 or 1.1.1l, dynamic MySQL Client lib --
sb11=$BMK_HOME/sysbench-1.1-new2020-mysql80-$SSL

2) then additionally use the --mysql-ssl=REQUIRED option in every script, like this :

$ time bash /BMK/sb_exec/sb11-Prepare_10M_8tab-InnoDB.sh 32 --mysql-ssl=REQUIRED
$ bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-trx.sh 128 120 --mysql-ssl=REQUIRED
and so on.. ;-))
Sysbench "Original" Workloads
First of all you'll need to load the data. I'll show here the most common test scenarios, but you can explore further and choose whatever you like.
Sysbench "original" workload scenarios are pretty good "entry ticket" test cases to evaluate your MySQL instance capacity, as well as the system you're using, and finally your platform as a whole.
The data set with 8 tables of 10M rows each (10Mx8tab) is the most common starting point, as it's not too small (so you're not just running from CPU cache only) and not too big (so you're not involving too much IO activity once you have writes). The dataset will represent something like 20GB of data, so using a BP size of 32GB is more than enough to keep all the data in memory.
Preparing 10Mx8tab data set
To prepare your data, you'll need to just execute the following :
$ time bash /BMK/sb_exec/sb11-Prepare_10M_8tab-InnoDB.sh 32
NOTE : 32 -- is the number of user connections to be used for the parallel data load (I've modified the original Sysbench scripts to make it possible, so in the above example there will be 4 user connections loading each table in parallel). You can increase this number if your system capacity allows it and if the load goes faster (you need to measure yourself to see which level of parallelism is the most optimal in your case). By default all "Prepare" scripts are using 32 user connections.
Read-Only Point-Selects Workload
The Point-Select workload is the most simple one : it's doing pure "key => value" lookups, which test the full MySQL code stack for overall code efficiency and best possible query execution latency. Staying fully in memory, it's the most simple way to evaluate your system setup and its scalability limits. Only RAM access and CPU power will be involved, so generally if you're already observing some performance issues in this test workload -- most often the used system / platform is to blame ;-))
A simple example of how to run the test :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RO_10M_8tab-uniform-ps-p_sel1-notrx.sh $users 300
  sleep 15
done
Comments :
- in this example we execute the point-select workload with growing load levels
- starting with 1 user connection, we then go with 2, 4, and so on up to 1024 concurrent users, where 300 is the execution time (5min) for each level
- sb11-OLTP_RO_10M_8tab-uniform-ps-p_sel1-notrx.sh is the script executing the required test workload (and all other test scripts are auto-generated in the same manner)
- for execution it requires 2 command line arguments : the number of user connections to use (1st arg.) and the execution time (2nd arg.) -- after which you can provide a few more "specific" arguments (if required) -- see more about this later..
- a few comments about the abbreviations used in the "name" of the script :
  - sb11 -- means the script will be executed with sysbench-1.1 (the sb11 value in the .bench config)
  - OLTP_RO_10M_8tab -- Read-Only workload, 8 tables with 10M rows each
  - uniform -- using "uniform" access pattern
  - ps -- using prepared statements
  - p_sel1 -- using only 1 "point-select" query from the OLTP_RO scenario
  - notrx -- don't use transactions
-
Similar "logic" is used in all other scripts for test workloads, so by the name of the script you can directly say what kind of load scenario will be generated (like if you have the same script, but without ps
-- this will mean it'll execute the same workload, but without prepared statements, or if you'll have socket
in the script name, you'll know that instead of IP port (default) it'll use UNIX socket instead, and so on)..
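For example, following this naming convention, the same point-select run but without prepared statements would look like the following (the script name is simply derived from the scheme above -- double-check it's present in /BMK/sb_exec) :

$ bash /BMK/sb_exec/sb11-OLTP_RO_10M_8tab-uniform-p_sel1-notrx.sh 64 300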
OLTP_RO
OLTP is a "mixed" Read-Only test workload and more "complex", as it consists of :
- 10 point-select queries
- 1 simple range query
- 1 ordered range query
- 1 sum range query
- 1 distinct range query
e.g. 14 queries in total. And while point-selects are mostly not "sensitive" to wrong configurations as long as you used a big enough BP size to keep the data in memory (32GB), all the other queries are not so ;-))
For example :
- latin1 or UTF8 charsets : there is a significant difference whether you're using the latin1 or UTF8 charset for your data (the overhead of UTF8 before MySQL 8.0 was much bigger -- see the following articles for more details : http://dimitrik.free.fr/blog/posts/mysql-performance-80-and-utf8-impact.html and http://dimitrik.free.fr/blog/posts/mysql-performance-80-ga-more-in-depth-latin1-utf8mb4.html)
- additionally, if you use UTF8 (default since MySQL 8.0) and your sort_buffer is not configured big enough, you'll get a horrible experience ;-)) (a 256KB value is ok for the current example)
- you need to pay attention to the malloc library you're using (my current choice is tcmalloc-4.4.5, which comes with the OL7 repo)
- NOTE : the distinct range query, for ex., can be very aggressive on memory allocations, as well as on Temp table space usage, take care ;-))
A simple example now of how to run the OLTP_RO test :

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RO_10M_8tab-uniform-ps-notrx.sh $users 300
  sleep 15
done

as you can see, the only difference compared to the "point-select" execution is the missing p_sel1 in the name of the script ;-))
NOTE : Read-Only workloads are generally used without transactions, and the results are reported as QPS (queries/sec). If for some reason you need to use transactions when investigating this workload's performance, mind that begin and commit are reported by Sysbench as queries, and you need to discard them from the final QPS number yourself to not report "fake" results ;-))
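As a back-of-the-envelope sketch of this correction (assuming the 14-query OLTP_RO transaction described above) : with transactions enabled, Sysbench counts 14 + BEGIN + COMMIT = 16 "queries" per transaction, so the effective QPS is the reported QPS multiplied by 14/16, for ex. :

$ reported_qps=80000
$ echo "effective QPS = $(( reported_qps * 14 / 16 ))"
effective QPS = 70000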
OLTP_RW
OLTP Read+Write is using transactions, and within the same transaction it's executing all the Read-Only queries from OLTP_RO, and then 2 UPDATEs, 1 INSERT, and 1 DELETE query. Many MySQL configuration / tuning options come into play here, but first of all -- the Storage you're using on your system !
A simple example now of how to run the OLTP_RW test :

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-ps-trx.sh $users 300
  sleep 15
done
UPDATE-NoKEY (update no index)
If the Point-Select workload is the most aggressive Read-Only workload, the UPDATE-NoKEY test is the same for Write-Only :
- it's not changing any index data (NoKEY / no index)
- so, all UPDATEs of data are happening in-place
- only the full stack of transaction management is tested for its efficiency
- this code path was improved a lot since MySQL 8.0
- but we're still NOT scaling on this workload due to the present trx_sys contention (work in progress)
- so, you should not be surprised to see better results on 1 CPU Socket -vs- 2 CPU Sockets
- this workload is executed without transactions
- pure UPDATE "bombarding"
A simple example of how to run the UPDATE-NoKEY test :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-upd_noidx1-notrx.sh $users 300
  sleep 15
done
and yes, the main difference with the OLTP_RW script is the presence of upd_noidx1 in the script name, telling it to execute only the "update no-index" query from the OLTP_RW scenario..

And if you replace upd_noidx1 with upd_idx1 -- you'll be running the UPDATE-KEY (update index) workload, which is no longer an "in-place" UPDATE as it'll touch secondary index values and involve yet more internal locks and contentions.
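For ex., the same growing-load run for UPDATE-KEY would then look like this (the script name is simply derived by the substitution above -- double-check it's present in /BMK/sb_exec) :

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-upd_idx1-notrx.sh $users 300
  sleep 15
done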
50Mx8tab data set
So far, the 10M x 8 tables data set is pretty good for quick (generally "in-memory") MySQL / system evaluations. But to go further and test it all deeper, you can switch to 50M rows per table (still with 8 tables), which will result in over 100GB of data space and allow you to test :
- in-memory workload with 128GB BP size (check if QPS / TPS will still be the same as with 10M rows before, and if it's not so => try to explain "why" ;-))
- partially IO-bound with 64GB BP size
- and heavily IO-bound with 32GB BP size (and will also fully evaluate your storage performance)
To execute any similar test as before, but now with 50M-row tables -- just change 10M to 50M in the script names from the above examples, and you're done ;-))
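For ex. (these are the same 50M scripts shown in the very first example at the top of this HOWTO) :

$ time bash /BMK/sb_exec/sb11-Prepare_50M_8tab-InnoDB.sh 32
$ bash /BMK/sb_exec/sb11-OLTP_RW_50M_8tab-uniform-ps-trx.sh 64 300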
NOTE : in the same way, by using 500M in the script names you'll be using 1TB of data in all your tests, which quickly becomes more challenging, especially if your system is limited in RAM.
Data Access Pattern
In parallel with "uniform" access, Sysbench also allows "pareto" and a few other options for the access pattern. Historically, "uniform" is NOT the default in Sysbench, which over time resulted in many fake results obtained and published by various vendors..
For me, the most interesting and worth using are the following 3 :
- uniform -- fully random and uniform access, wide bombarding, hard life in IO-bound
- pareto -- random, but "grouped" access, going around not far from the same rows, creating concurrent access to the same rows, which involves row locks and scalability limits
- zipfian -- some rows are accessed more often than others, but in a wider way, so generally not creating row locking during execution
At the end of this article you can see the benchmark results obtained with all available data access options in Sysbench under different test conditions (8 tables -vs- single table, in-memory -vs- IO-bound, etc.) -- I hope from these results it'll be more clear why I selected only these 3 options to be used.
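Following the script naming convention, switching the access pattern is just a matter of the script name -- for ex. (I'm assuming here that the pareto variant of the script was generated in your /BMK/sb_exec, adjust the name if not) :

$ bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-pareto-ps-trx.sh 64 300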
New Extensions For "Original" Sysbench Workloads
I've added a few more extensions to the "original" Sysbench scripts :

- --table-name=name : base name to use for table names (def: sbtest)
- --mysql-table-partitions=N : use N range(id) partitions for each table (def: 0)
- --mysql-table-compression=name : extra table transparent compression option, ex.: lz4
- --rnd-data=N : percentage (%) of random data in row values : 1 .. 100 (to be able to evaluate higher/lower compression possibilities, def: 100)
- --extra-cols=N : add N extra columns to table(s)
- --extra-cols-type=TYPE : data type to use for extra columns (ex. VARCHAR(32))
- --extra-cols-default=value : default value to use for extra columns
- --extra-cols-options=... : extra columns options
- --select-star=on/off : use SELECT * ... in point-selects and ranges (def: off)
- --rnd-loop=N : use random-loop instead of rand() : 0=def, 1=chunk, 2=shift, 3=loop (def: 0)
- --update-range-size=N : allow range UPDATEs (instead of single-row UPDATEs by default)
- --load-mode=name : initial Data Load Mode [original, parallel, parallel_ordered] (def: parallel)
- --load-bulk-size=N : number of rows to use in bulk INSERTs during parallel* Load of data (def: 1000)
- --trx-retry=on/off : replay the same transaction again in case of error / ROLLBACK / DEADLOCK (def: off)
- --mysql-session-options=... : extra session options
  - ex.: --mysql-session-options="set session sort_buffer_size=1000; set session join_buffer_size=256000"
  - in fact this can be any kind of SQL statement(s) separated by ; which will be executed at the start of each user connection
- --mysql-query-hint=... : use the given Query Hint in all queries
  - ex.: --mysql-query-hint="MAX_EXECUTION_TIME(100)"
- --sleep-before-commit=N : extra option for more advanced test scenarios to simulate longer data locking by sleeping N usec. before COMMIT (def: 0)
- --sleep-after-query=N : extra option for more advanced test scenarios to simulate longer user "think time" by sleeping N usec. after each query (def: 0)
You can add any of these options as extended arguments to the scripts, for ex. :
$ time bash /BMK/sb_exec/sb11-Prepare_10M_8tab-InnoDB.sh 32 \
    --rnd-data=50 --mysql-table-compression=lz4

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-ps-trx.sh $users 300 \
      --rnd-data=50 --mysql-table-compression=lz4
  sleep 15
done
Re-Connect Workloads
Historically there was always a need to test OLTP workloads which are constantly re-connecting to MySQL Server (like Web servers executing PHP queries and disconnecting once the page is generated, and so on). In the past, the real need for such an evaluation was mostly to know how many re-connects/sec we can deliver with MySQL -- e.g. the idea is to :
- connect
- execute a quick SELECT
- disconnect
And for this kind of test scenario we were using the Point-Select workload with a re-connect after each SELECT query. To cover this need, I've implemented and added an explicit Lua script doing Point-Selects with re-connects and the corresponding Shell scripts (may add more reconnect workloads later, let's see..)
For ex. to start Point-Selects re-connect on 10Mx8tab data set :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RO_10M_8tab-uniform-p_sel1-reconnect-notrx.sh $users 300
  sleep 15
done
IMPORTANT
- it makes no sense to use prepared statements on re-connect workloads, so this option is ignored
- mind to allow TCP reuse on your "client" host to not run out of sockets :
$ sudo sysctl net.ipv4.tcp_tw_reuse=1
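And if your client host gets rebooted between test runs, you may want to make this setting persistent -- just a standard sysctl approach, the file name below is up to you :

$ echo "net.ipv4.tcp_tw_reuse=1" | sudo tee /etc/sysctl.d/99-bmk.conf
$ sudo sysctl --system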
TPCC
The TPCC workload was ported by the Percona Team to Sysbench-1.1 (Lua scripts), which simplified a lot any further investigations on this workload and hacking the code to check various test conditions. As the test is executed by Sysbench, the usage steps are exactly the same as with other Sysbench scripts (and way more simple than the historically used DBT2 open source implementation of TPCC, etc.)
Finally, this allowed me to propose a "workaround" to by-pass the excessive index lock contentions involved in the "original" TPCC implementation and make it also part of the test cases -- see this article for more details : http://dimitrik.free.fr/blog/posts/mysql-80-tpcc-mystery.html
So far, you can now use both variations -- using NULL or DEFAULT values in queries.
NOTE : to use the NULL variation you need to explicitly use scripts having "-NULL" in their name, which is not the default..
Preparing TPCC data set
The "size" of data sets in TPCC is "measured" in number of "data-ware-houses" (DWH) created in database. This number is labeled as W
(e.g. 1000W means thousand DWH, and so on)..
Historically, MySQL was not scaling really well on TPCC workload, and to lower internal MySQL contentions, but still run TPCC workload, it was possible to run several TPCC tests in parallel against the same MySQL Server instance (this was at least allowing to see if "generally" MySQL will be ever able to keep higher load once internal contentions will be fixed). The same approach was implemented in Sysbench-TPCC scripts, but in a yet more simple way -- you can simply use more data sets in parallel from the same Sysbench process (like if you have several TPCC databases in the same MySQL instance, and TPCC will use them all in parallel).
The most common use cases are to use 1 or 10 TPCC data sets.
To prepare your data for a single TPCC data set with 1000W, just execute the following :
$ time bash /BMK/sb_exec/sb11-Prepare_TPCC_1000W-InnoDB-NoFK.sh 32
And for x10 TPCC data sets with 100W (to have the same 1000W data volume in total), execute the following :
$ time bash /BMK/sb_exec/sb11-Prepare_TPCC_10x100W-InnoDB-NoFK.sh 32
A few comments :

- NoFK -- historically the most common use case for the TPCC workload on MySQL was "without foreign keys" (FK); you can skip it in the script name to test "with FK"
- NoFK2 -- same as NoFK, but also not creating the secondary indexes which are supposed to be used by FK
- NULL -- (when present in the script name) means to use NULL values (as in the original TPCC), otherwise DEFAULT values are used in the db schema and queries, which lowers index lock contentions and allows better overall scalability (can be easily adapted for any production workload, etc.)
for ex. if you're really willing to use NULL values :
$ time bash /BMK/sb_exec/sb11-Prepare_TPCC_1000W-NULL-InnoDB-NoFK.sh 32
NOTE : generally it makes NO SENSE to test the TPCC workload with a data set smaller than 1000W in total ! -- adopt from the beginning your testing with 1000W (or more) of data to not waste your time and base your conclusions on useless results.. -- However, if your target is to reproduce a test workload with a high level of "data concurrency" (e.g. many users fighting for access to the same data), then using a smaller number of warehouses makes perfect sense ! -- e.g. with a 100W data set, for ex., you'll get exactly this ! -- so, the main message here is probably "always understand what you're doing" and then everything should be just fine ;-))
Run TPCC Workload
Finally the most simple step in the whole story :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-TPCC_1000W.sh $users 300
  sleep 15
done
or for 10x100W data set :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-TPCC_10x100W.sh $users 300
  sleep 15
done
NOTE : just mind to respect the same "NULL or not NULL" naming as you used in your "Prepare" script initially, and also the same data set size, and you're then all good ;-))
New Extensions For "Original" TPCC Workload
I've added a few more extensions to the "original" TPCC scripts :

- --trx-retry=on/off : retry the same transaction again on ROLLBACK / error (def: off)
- --trx-debug=N : debug mode - execute only one of the 5 transactions (--trx-debug=[1-5]), (def: 0 (off))
- --use-fk=N : use foreign keys -- 0:no but sec.idx, 1:yes+sec.idx, 2:none (def:1)
- --for-update=N : use FOR UPDATE -- 0:no, 1:yes, 2:enforced (def:1)
- --force-primary=0/1 : force PRIMARY INDEX in some queries -- 0:no, 1:yes (def:0)
- --mysql-table-compression=name : extra table transparent compression option, ex.: 'lz4'
- --mysql-session-options=... : extra session options
  - ex.: --mysql-session-options="set session sort_buffer_size=1000; set session join_buffer_size=256000"
  - in fact this can be any kind of SQL statement(s) separated by ; which will be executed at the start of each user connection
Same as for "original" Sysbench workloads, you can add any of these options as extended parameter for test scenario scripts.
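For ex., a TPCC run combining some of these options could look like the following (just an illustration of passing the extra arguments -- the option values here are arbitrary) :

$ bash /BMK/sb_exec/sb11-TPCC_1000W.sh 128 300 --trx-retry=on --use-fk=2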
dbSTRESS
This benchmark test case has a long history and is based on a real customer workload. Surprisingly, in the past, it helped to point out many deep problems in MySQL / InnoDB design. However, over time systems and MySQL itself became much faster, and the old original implementation started to be too slow and inefficient. So, once Sysbench-1.1 appeared, it was "natural" for me to plan to port the dbSTRESS workload to Sysbench. Now it's done ! ;-))
The database schema in db_STRESS is composed of only 5 tables and simulates a sort of library (or stock) object placement :
- STAT
- HISTORY
- OBJECT
- SECTION
- ZONE
And there are the following relations ("by reference IDs" ) between these tables:
[STAT] <--(1:M)-- [HISTORY] --(20:1)--> [OBJECT] --(N:1)--> [SECTION] --(P:1)--> [ZONE]
For each OBJECT record there are 20 records of HISTORY (a kind of historical trace of the last 20 changes), while the number of records in the STAT, SECTION and ZONE tables is fixed:
- STAT: 1000 rows
- SECTION: 100 rows
- ZONE: 10 rows
STAT records are used as a "tag" (or status) for HISTORY records. Each HISTORY record may have a different "tag" than others, and any HISTORY record can change its "tag" over time.
Then, SECTION and ZONE records define the OBJECT "placement" -- there are 10 Zones, and each Zone has 10 Sections, so 100 Sections in total. And any given OBJECT belongs to one of these Sections. No OBJECT record can ever change its "placement".
Considering the fixed sizes of the STAT, SECTION and ZONE tables, and the fixed 1:20 relation between OBJECTs and HISTORY -- it's obvious that the database size is driven by the number of OBJECTs you'll create.
All relations between records from different tables are managed by corresponding "reference IDs" from each side (generally referencing the "parent" PRIMARY KEY). Historically, to simplify RDBMS comparisons, no foreign keys or other constraints were used. For the moment I'm leaving it as is, but over time I'll probably add an option to use FK (on/off), let's see..
Similar to other Sysbench workloads, you can also create several dbSTRESS "instances" to be able to "split" locks over several OBJECT tables and the tables corresponding to them.
Preparing Your Dataset
The logic here is pretty similar to the other workloads. For example, to prepare a dataset with 10M OBJECTs, you can just execute the following :
$ time bash /BMK/sb_exec/sb11-Prepare_dbSTRESS_10M-InnoDB.sh 32
Or to have 10 OBJECT tables with 1M rows each :
$ time bash /BMK/sb_exec/sb11-Prepare_dbSTRESS_10x1M-InnoDB.sh 32
Going a little bit ahead, I need to mention that you may hit pretty different bottlenecks during the dbSTRESS workload depending on whether the INDEX in the HISTORY table is created as PRIMARY KEY or not (by default it's not, like it can be in many apps) -- "why" there is such an impact I'll let you discover yourself, but for testing with the schema having a PK in HISTORY you can just use the scripts having "-HPK" in the name, like :
$ time bash /BMK/sb_exec/sb11-Prepare_dbSTRESS_10M-InnoDB-HPK.sh 32
Test Scenarios
db_STRESS generates an OLTP workload to stress the database as much as possible. The performance level of the workload is measured mainly in TPS (transactions per second), but you also have the freedom to have more or fewer queries per "transaction", that's why any result should be presented with its scenario context to make sense (same for QPS). During any given transaction we are first randomly choosing an OBJECT reference and then performing READ or READ and WRITE operations :
A READ operation consists of 2 SELECT queries :
- SEL1: read related OBJECT --> SECTION --> ZONE data by the given OBJECT ID
- SEL2: read related HISTORY --> STAT data by the given OBJECT ID
A WRITE operation consists of :
- delete one HISTORY record by OBJECT ID + insert new HISTORY record for the given OBJECT
- update one HISTORY record of the given OBJECT
NOTE : delete & insert orders are always present together if executed in WRITE operation. This is done to protect your database from "intentional" constant growing in the size. However, if in your case during dbSTRESS workload you'll have InnoDB Purge lagging, you can easily hit "unintentional" database size growing due increase of UNDO space.
Additionally you can precise the ratio between dbSTRESS transactions having only READ operation, or both READ & WRITE -- this is allowing you to make the test case more or less aggressive on data writes.
By default the following test scenarios are proposed :
- sb11-dbSTRESS_10M-RO-notrx.sh : read-only with SEL1 and SEL2 queries
- sb11-dbSTRESS_10M-RO-SEL1-notrx.sh : read-only with SEL1 query
- sb11-dbSTRESS_10M-RO-SEL2-notrx.sh : read-only with SEL2 query
- sb11-dbSTRESS_10M-RW1-trx.sh : read+write, R/W ratio 1:1
- sb11-dbSTRESS_10M-RW10-trx.sh : read+write, R/W ratio 10:1
- sb11-dbSTRESS_10M-RW1-UPD-trx.sh : read+write, but write = UPDATE only, R/W ratio 1:1
- sb11-dbSTRESS_10M-RW10-UPD-trx.sh : read+write, but write = UPDATE only, R/W ratio 10:1
Comments :
- hopefully it's obvious that 10M in the script name means 10M OBJECT records
- RO : read-only, and if SEL1 or SEL2 is added, then only the given SELECT query is used
- trx or notrx : whether to use transactions (BEGIN/COMMIT) or not
- RW means read+write, followed by the R/W ratio (RW1 = 1:1, RW10 = 10:1, and so on)
- and if there is UPD in the name, then only the UPDATE query is executed in the WRITE operation
And to make it more complex :
- xDEAD : don't use "anti-deadlock" limits in ID ranges
- xREF : allow READ/WRITE operations to use different OBJECT IDs within the same transaction
Example of dbSTRESS Workload
Preparing 10M dataset :
$ time bash /BMK/sb_exec/sb11-Prepare_dbSTRESS_10M-InnoDB.sh 32
Running Read-Only workloads :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RO-notrx.sh $users 300
  sleep 15
done

$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RO-SEL1-notrx.sh $users 300
  sleep 15
done

$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RO-SEL2-notrx.sh $users 300
  sleep 15
done
Running Read-Write workloads :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RW1-trx.sh $users 300
  sleep 15
done

$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RW10-trx.sh $users 300
  sleep 15
done
Full dbSTRESS options for Sysbench
There are also other options which you can add when using the above scripts, or when executing the dbSTRESS scripts directly with Sysbench :
- --obj-table-size=num : number of rows in the OBJECT table (def: 1000000)
- --obj-tables=num : number of OBJECT tables (def: 1)
- --mysql-storage-engine=name : Storage Engine to use for tables (def: InnoDB)
- --mysql-table-options=ops : extra table options, ex.: 'organization=heap'
- --ordered-load=on/off : initial Load of data is ordered by PK (def: off)
- --SEL1=num : number of SEL1 queries per transaction (def: 1)
- --SEL2=num : number of SEL2 queries per transaction (def: 1)
- --updates=num : number of UPDATE queries per transaction (def: 1)
- --delete-inserts=num : number of DELETE/INSERT combinations per transaction (def: 1)
- --rw-ratio=num : ratio between RO and RW transactions (def: 1)
- --hpk=on/off : use PRIMARY KEY for the HISTORY table (def: off)
- --anti-dead=on/off : anti-deadlock, each session is using its own REF range (def: on)
- --same-ref=on/off : use the same OBJECT REF value within the same transaction (def: on)
- --skip-trx=on/off : execute all queries in AUTOCOMMIT mode (def: off)
- --for-update=on/off : use FOR UPDATE in all SELECTs (def: off)
- --mysql-table-compression=name : extra table transparent compression option, ex.: 'lz4'
- --mysql-session-options=... : extra session options
  - ex.: --mysql-session-options="set session sort_buffer_size=1000; set session join_buffer_size=256000"
  - in fact this can be any kind of SQL statement(s) separated by ; which will be executed at the start of each user connection
So, for example, if I want to force my default Read-Write test to also use FOR UPDATE in all SELECT queries, I can just add --for-update=on to the args :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RW1-trx.sh $users 300 --for-update=on
  sleep 15
done
SYNC_file Module
The SYNC_file.lua module was added to BMK-kit to provide a simple way to synchronize the start of many threads, or many Sysbench processes, or many remote client hosts -- all expected to start the given test at the same time.
This module is integrated into all the above test workloads and works as follows :

- for any test script you can provide the additional option --sync-file=filename
- NOTE : this "filename" should be a full path
- if this option is used, the test script will create the given file once all Sysbench threads have finished their initialization and are all connected to MySQL Server
- after that, all Sysbench threads will spin in a wait loop as long as the given sync-file is not removed.. -- once it's removed, all the threads are released to run !
- by default, threads are looping with 10ms sleeps between file checks
- however, on small systems with many threads started this can create high CPU usage
- but you can change the default spin time via --sync-wait=N (N milliseconds)
Let's have a look at an example : 512 users of the TPCC-1000W workload, using the /tmp/start file for synchronization and 20ms spin loops :

$ bash /BMK/sb_exec/sb11-TPCC_1000W.sh 512 300 --sync-file=/tmp/start --sync-wait=20
...
=> SYNC-file : synchronization via /tmp/start
=> SYNC-file : ready to start..
- first you'll see the confirmation message that your sync-file is considered and used
- and once all 512 threads have finished their initialization and are all connected to MySQL Server -- you'll get the "ready to start" message
- at this point all Sysbench threads are just looping in wait as long as the sync-file is not removed

And once the sync-file is gone, the test is started, and you'll see the following confirmation message :
=> SYNC-file : started !
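So, on the coordinating side (the same host, or any other client host that sees the same path) all you have to do to "fire" the test is to remove the file :

$ rm /tmp/start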
General Notes About Test Workloads
There are a few important points worth mentioning, as some of them could be in disagreement with different people, while some others are simply important to keep in mind to avoid considering "false positive" or simply "fake" results..
Single Table -vs- Many Tables
There are as many opinions on this topic as people you'll discuss it with.. -- there are even voices loudly crying that the TPCC workload (just for ex.) is "obsolete" as it's not using hundreds of tables like most production systems would have today.. And you can already see many people running Sysbench with hundreds of small tables, and so on..
Why do all such claims make no sense to me ?
- first of all, always keep in mind "why" you're running your Benchmark Workload
- if your goal is to reproduce a Production Scenario => focus your efforts on this, but reproduce it exactly to match your Production
- but if you're running "generic" Workloads with already prepared Scenarios, then try to understand each test case first
- because every "generic" Workload is here to point you at some potential problems, or simply to help you evaluate your DB Engine, its config setup, and the whole OS/HW stack you're using
- the goal of any Single Table Workload is to show you what will happen if one of the tables in your Production environment becomes "hot" and gets most of the concurrent access from all active users
- while Multi Table Workloads are here to show you how the overall processing will change if you could somehow "dispatch" your users to several tables instead (or force them to use different partitions of the same table, and so on)..
- however, not all test scenarios used for Benchmarking really respect such a "split" of activity in Multi Table Workloads -- pretty often the choice of the table to use can be totally random, which in its turn will periodically "randomly transform" your Workload into a Single Table one, then Multi Table, etc.. -- to avoid this kind of "surprise", in all my scripts all users are explicitly "attributed" to given table(s) (e.g. if you run 64 users on the 8 tables OLTP_RW workload you'll have 8 groups of users, where each group is using its own table only)
IO-bound Workloads
Curiously, it's very easy to get "fake" or "false positive" results on IO-bound workloads. You should be extremely attentive to what you're doing and to what is going on. And first of all always mind to test all load levels, and not just one of them (like the Percona guys are doing pretty often, etc.)..
IO-Bound on Single-Table :
- the biggest problem for Single-Table IO-Bound will be the historical File IO lock
- you can observe this as fil_shard mutex contention in MySQL 8.0 and fil_sys contention in earlier versions..
- since MySQL 8.0 this historical global lock was "sharded" (so, you'll have little chance to observe this contention in Multi-Table IO-Bound)
- however, in earlier versions it's just a global lock involved for every File IO operation, which can quickly hurt you, and having faster storage will not help..

NOTE : the same problem can still happen in MySQL 8.0 on Multi-Table IO-Bound if you place all your tables in the same file (tablespace) -- this is the reason to use dedicated files for all your tables which have high IO activity !
IO-Bound on Multi-Tables :
- the biggest problem with Multi-Table IO-Bound workloads is to get fake results..
- how can you recognize that a result is "fake" ?
- first of all, look attentively at your IO traffic ! -- if you see your TPS going up, but NOT your IO activity, there is a huge chance your result is totally fake..
- why can this happen ?
- it's very easy in a Multi-Table workload for one of the tables to become better "cached" in the InnoDB Buffer Pool, so all queries to this table will go from memory and be executed way faster than any others, which will falsify the "global picture", giving you the impression of improved performance, while it's not so at all ;-))
- to check if performance was really improved, try to replay the same workload with the same global data volume, but on Single-Table -- this will give you the final verdict
- as well, mind to test many concurrent user load levels to build a more complete picture (e.g. 1, 2, 4, .. 1024 users)
A "classic" example of "fake" IO-Bound OLTP_RW results you can see here :
- you can see a great "improvement" on 128 users here with a huge TPS jump !
- however, this TPS result is simply "fake"
- and you can clearly see that during this 128 users load level the IO activity is showing exactly the opposite ;-))
- as well, it's good to test all the other user load levels as it was done here, as you can then also more clearly see that the TPS result on 128 users is not matching the "overall tendency"..
Since I got a few more questions about "fake" IO-bound results :
- first of all -- keep in mind that the data access pattern is "uniform"
- and the data set is much bigger than the BP size
- which means there is a very high probability that most page accesses will involve an IO read
- at the same time this is the OLTP_RW workload (e.g. we have writes)
- which means that we'll quickly have many dirty pages
- so, before being able to read a new page we'll need to find a free page in the BP
- and to get a free page we need to evict one from the LRU, which will quickly mean flushing the page first before evicting it
- e.g. this will all be about IO activity
- and the main target here is "IO-bound correctness" of the test case (like when we want to evaluate a new Storage vendor, or some special FS features, etc.)
- if "correctness" cannot be confirmed -- the result is "fake" (or "false positive" if you like ;-))
Now :
- we can see that at some point TPS goes much higher, but IO activity becomes lower..
- how could this ever be possible if we're expecting to have a heavy IO-bound test case ?
- because if we're processing at a higher TPS rate, then we have yet more dirty pages, right ?
- and again, the access is "uniform", e.g. the probability to hit the same data is low !!
- we could suspect that on higher load levels we simply have more users, so "potentially" some already cached data will be reused, and TPS will become higher, why not ?
- ok, but still : more changes => more IO writes and at least similar IO reads, but it's not so..
- then, why on the next load level does TPS go down ?? with more users the "probability" to use cached data should only be higher, so what ?
- in fact the main reason for this TPS gain is the following :
- we're using 8 tables, and at some point in time one of the tables becomes more cached in the BP than the others
- so, users which are working with this table will get a higher rate of COMMITs
- thus the global TPS will be higher
- while users working with less cached tables will get their TPS lower
- and this was easy to observe in the past when Sysbench could work only with 1 table, so to work with 8 tables we needed to start 8 Sysbench processes in parallel
- all in all, IO-bound "correctness" is not respected here, and the given TPS result is "fake" and should be ignored if the goal is to evaluate Storage or IO-related / FS features..
IP port -vs- UNIX socket
The results obtained on any test workload are generally always better if UNIX socket is used instead of IP port (even if it's the "loopback" interface for localhost).. The same is valid for Linux on both Intel/AMD/x64 and ARM64 systems. And the gap in results may go to more than 30% difference ! -- see for ex. here : http://dimitrik.free.fr/blog/posts/mysql-performance-80-ga-ip-port-vs-unix-socket-impact.html
Currently test scripts with "socket" in their names are generated only for the "original" Sysbench workloads, as they represent the most aggressive tests. I did not see any need to have them for TPCC or dbSTRESS, but this can change if these workloads also see some speed-up..
NOTE : using "loopback" or UNIX socket makes sense only if you're testing your MySQL Server "locally" on the same system, but is your production also running "locally" ? -- and if not, then mind also to test the same, but over a real Network, to be sure it'll not be your major bottleneck even before MySQL.
When to use Uniform / Pareto / Zipfian access patterns in Sysbench
Once again, always mind what exactly you want to test, and then use the "random" option accordingly :
Uniform : totally random access to data, so no concurrent access to the same data is to be expected
- good for any general evaluation of your overall setup config and the HW you're using
- also very good for IO-bound Workload simulations, as even a not very big dataset can give you some very representative results to evaluate your Storage, Filesystem, etc..
- as such access is mostly "predictable", this allows you to better understand the limits of your system, tuning setup, and so on..
Pareto : this access pattern is the opposite of "uniform" and will involve concurrency on data !
- very good to evaluate the capabilities of the DB Engine to manage data locking
- as well as deadlock detection and the ability to progress even within hot data locking conditions
- particularly interesting will be to test the --trx-retry=on additional option to see how well your DB Engine manages users which are trying to replay the same transaction in case of a rollback (some Engines are skipping deadlock detection in early stages and aborting the active transaction, so if the user starts a new transaction with different data => this will show a higher overall TPS, but could be not "fair" for real-world apps where the user will retry the same transaction again and again)..
Zypfian : similar to "pareto", except there will be mostly no data concurrency
- good for cases when you want to understand what kid of performance to expect from your DB Engine when not whole db data are constantly accessed, but only part of it
- this will not change anything too much if your whole db is cached in memory
- however, it'll be totally different in IO-Bound conditions :
- you can evaluate if your storage is generally already "good enough" if most of accessed data will fit your memory (BP size)
- or if you'll place most accessed data to faster storage
- etc..
All Together
To summarize all the General Notes explained above, let's take a look at how the combination of choices from the following list will impact the final OLTP_RW workload performance :
- Random Pattern :
- uniform
- pareto
- special (default in original Sysbench)
- gaussian
- zipfian
- Scenario :
- Single-Table
- Multi-Table
- Workload conditions :
- In-Memory (no IO reads)
- Partially IO-bound (more than 60% of data can be cached in BP)
- IO-bound (less than 33% of data can be cached in BP)
In-Memory
OLTP_RW, 10Mx8tab (20GB) :
OLTP_RW, 50Mx1tab (12.5GB) :
- curiously, the only really impacting rand pattern is "pareto"
- the impact is much higher when a single table is used -vs- 8 tables
Still, all workloads are in-memory. What about IO-bound ?..
Partially IO-bound | BP size=64G
OLTP_RW, 50Mx8tab (100GB) :
OLTP_RW, 400Mx1tab (100GB) :
Heavily IO-bound | BP size=32G
OLTP_RW, 50Mx8tab (100GB) :
OLTP_RW, 400Mx1tab (100GB) :
Hope it's all clear now ? -- happy testing ;-))
Sysbench Output
Just in case you've never seen Sysbench results output before, here are a few examples. Well, first of all, the default output "format" is exactly the same for all the above mentioned workloads, which is extremely useful for understanding the obtained results ;-))
Generally, once the test is started, you'll see the corresponding start messages, after which at the end you'll get a result summary reported by Sysbench, similar to the following OLTP_RW result with 1024 users :
sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1024
...
Threads started!
...
SQL statistics:
    queries performed:
        read:                            15985872
        write:                           4567392
        other:                           2283696
        total:                           22836960
    transactions:                        1141848 (3801.40 per sec.)
    queries:                             22836960 (76028.07 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

Throughput:
    events/s (eps):                      3801.4034
    time elapsed:                        300.3754s
    total number of events:              1141848

Latency (ms):
         min:                                   19.91
         avg:                                  269.10
         max:                                 4241.68
         95th percentile:                      493.24
         sum:                            307269899.28

Threads fairness:
    events (avg/stddev):           1115.0859/17.93
    execution time (avg/stddev):   300.0683/0.02
The most important metrics in the result summary are :
- QPS (queries/sec, which is 76K in above example)
- TPS (transactions/sec, which is 3.8K above)
- Query response times :
- avg (269ms above)
- 95% percentile (493ms above)
- max (4.2 sec(!) above)
Which is generally enough to get an idea about the delivered performance :
- QPS and TPS tell you about your processing throughput
- while the QPS -vs- TPS ratio gives an idea of how many queries on average were executed per transaction
- (in the above example there are roughly 20 queries per transaction)
- and the response times tell you about the overall processing stability :
- in the above example the 95% time is about x10 times lower than max
- meaning that the max time is rather exceptional, and not common
- on the other side, the 95% time is nearly x2 times higher than avg
- meaning that most response times are really around avg
- so, the overall processing was rather stable, and the only thing we need to know is whether a 493ms response time is acceptable in the given case, or not
However, personally, the most interesting part for me is to also see the real-time Sysbench stats reported for the currently running test workload. This feature is enabled in Sysbench by using --report-interval=N, where N is the time interval in seconds.
For example, with OLTP_RW :
$ bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-trx.sh 256 300 --report-interval=1
which will provide you this kind of output during the run :
[ 1s ] thds: 1024 tps: 3148.59 qps: 77199.23 (r/w/o: 56873.00/13033.03/7293.21) lat (ms,95%): 549.52 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 1024 tps: 3972.10 qps: 81953.76 (r/w/o: 56304.64/17705.92/7943.21) lat (ms,95%): 442.73 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 1024 tps: 4359.29 qps: 84721.70 (r/w/o: 60014.04/15987.07/8720.59) lat (ms,95%): 442.73 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 1024 tps: 4057.11 qps: 81225.16 (r/w/o: 57150.52/15964.43/8110.22) lat (ms,95%): 419.45 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 1024 tps: 4410.87 qps: 81122.65 (r/w/o: 54418.42/17899.48/8804.74) lat (ms,95%): 475.79 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 1024 tps: 3577.92 qps: 79383.33 (r/w/o: 58109.78/14097.70/7175.85) lat (ms,95%): 434.83 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 1024 tps: 4176.04 qps: 81865.78 (r/w/o: 55109.52/18416.18/8340.08) lat (ms,95%): 419.45 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 1024 tps: 4243.74 qps: 84787.79 (r/w/o: 61201.24/15087.07/8499.48) lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00
[ 9s ] thds: 1024 tps: 4476.26 qps: 85672.03 (r/w/o: 58498.43/18239.07/8934.52) lat (ms,95%): 356.70 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 1024 tps: 4072.91 qps: 83152.23 (r/w/o: 59076.74/15911.66/8163.83) lat (ms,95%): 390.30 err/s: 0.00 reconn/s: 0.00
...
- from which you're getting your current QPS and TPS
- 95% query response times
- and the presence of errors, if any
Personally, I'd suggest you always execute any Sysbench workload with live stats, because :
- if your test for some unexpected reason crashes at the end, you'll not get the summary
- while from the live stats output you may still get an overall idea about the final result
- as well, I'm always using a 1 sec. interval for live reports
- this will help you to discover if there were any processing stalls during the test
- because having stalls in production can have a much bigger negative impact than anything else !! -- always keep it in mind..