The following is a short HOWTO about deployment and use of the Benchmark-kit (BMK-kit). The main idea of this kit is to simplify your life in running various MySQL benchmark workloads with less pain and fewer potential errors.
Generally it's as simple as the following :
$ bash /BMK/sb_exec/sb11-Prepare_50M_8tab-InnoDB.sh 32    # prepare data

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  # run OLTP_RW for 5min on each load level..
  bash /BMK/sb_exec/sb11-OLTP_RW_50M_8tab-uniform-ps-trx.sh $users 300
  sleep 15
done
Preface
Since v.1.0, I see the new (Lua-based) Sysbench as a load generator "platform" (and no longer as just yet another test app). All proposed tests in the kit are running on Sysbench, so all results outputs are similar, and nothing more than a single Sysbench binary is required to run any of them. The Sysbench binary provided in the kit is statically compiled with the MySQL client lib, so generally it should just work.
This kit and its workload scenarios are proposed "as is" and "as I prefer" them. In case you do not agree with something or prefer other approaches in testing -- feel free to implement yourself whatever you desire, but don't expect me to debate and defend every detailed point, etc.. I'm just doing things my way and sharing it here in case it can be useful for someone else as well. I'm not pretending to hold the absolute truth, and you're always free to do it your way -- and this is where the beauty of life comes from.. ;-))
Setup BMK-kit
The test kit requires minimal configuration, but has several "configurable" places in its scripts. For example, it uses the /BMK directory by default as its main home place -- but this can be easily changed via the BMK_HOME env. variable, etc..
I'll use /BMK as the default home in all the following steps, just mind that you can deploy the toolkit anywhere and simply point the BMK_HOME env. variable to your directory.
So far, to deploy and set up the stuff, just do the following :

- download the BMK-kit.tgz tarball from : http://dimitrik.free.fr/BMK-kit.tgz
- deploy it anywhere, or from the / directory as root user :

# cd /
# tar xzf /path/to/BMK-kit.tgz

- this will create the /BMK directory and deploy all the needed scripts inside
- now you'll need to edit the /BMK/.bench config file to provide the connection details to your MySQL Server (user, password, host, IP port, path to UNIX socket and db name)
NOTE : by default Sysbench will use a client connection with the "native" password method to remain compatible with older MySQL versions, so for MySQL 8.0 you'll need to use a user account created with the mysql_native_password authentication plugin (or simply set it as the global default just for the time of testing)

for example :

mysql> CREATE USER 'dim'@'%' IDENTIFIED WITH mysql_native_password BY "dim";

or in my.cnf :

[mysqld]
default_authentication_plugin=mysql_native_password
...
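To double-check which authentication plugin a given test account actually uses, a quick query like the following can help (the 'dim' account here is just the one from the example above) :

$ mysql -uroot -p -e "SELECT user, host, plugin FROM mysql.user WHERE user='dim'"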
- you may also need to re-assign (if needed) the name of the Sysbench binary you're willing to use -- all current scripts will use sb11 (codename for sysbench v.1.1), and by default it points to the sysbench-1.1-new2020 binary (Linux/x64), but there is also another one with SSL support (see : Special Sysbench binary for MySQL 8.0, below), and also the same pair of binaries for Linux/ARM64
- sb11_MAX value : this value in the .bench file was added for cases when a very large number of user connections in MySQL should be tested (more than 10K, for ex.) -- because sometimes something can go odd on the system or in Sysbench itself and impact all your tests, and periodically it will not be able to connect (or disconnect, or whatever) any new users anymore. One of the workarounds to run such high load test workloads is to start more Sysbench processes in parallel, and sb11_MAX is here to say how many users per single Sysbench process should be assigned (currently it's intentionally set by default to a very big value, so just a single Sysbench process will be used by default, but you can change it to a lower value (for ex. 1000) if you're hitting this kind of problem).
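For reference, these two settings live in the .bench config file and look roughly like the following (a hypothetical excerpt with illustrative values -- check the shipped file for the exact defaults) :

# /BMK/.bench (excerpt)
# Sysbench binary used by all sb11-* scripts
sb11=$BMK_HOME/sysbench-1.1-new2020
# max number of users per single Sysbench process
# (lower it to ex. 1000 if you hit connection issues on very high load levels)
sb11_MAX=1000000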
So far, at this step you're ready to start your first Benchmark Workload ! -- be sure your MySQL Server is running, create your database, and follow the next steps.
The steps for any test workload in BMK-kit are all similar :
- you "prepare" your database first
- then just execute a test workload you wish with loaded data
Every script name is "explicit" by itself, self-explaining what kind of test will be executed and with which options, etc., to reduce the potential level of errors to the minimum..
Localhost & MySQL
In case you're not aware, MySQL has a few historical config conventions which can be pretty confusing, but we have to keep them in the code due to backward compatibility :

- when you set localhost as host name => user connections go via UNIX socket
- when you use 127.0.0.1 as host name => user connections go via the local host IP stack (loopback)
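You can observe the same convention with the plain mysql client (just an illustration, use any account you have) :

$ mysql -h localhost -u dim -p     # "localhost" => connection via UNIX socket
$ mysql -h 127.0.0.1 -u dim -p     # explicit IP => connection via TCP loopback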
Just always be sure what exact type of connections you want to test, because communications via UNIX socket are going faster (sometimes much faster, up to 50% gain), but this can be totally unrealistic for your final results if at the end you're planning to use TCP/IP connections. On the other side, using UNIX socket allows you to skip the IP stack and then probably better see other bottlenecks, etc.. -- all this is up to you, just always be aware of what you're doing :-))
NOTE : the full difference in performance when connections are going via UNIX socket -vs- IP loopback still remains unexplained, but it seems to be a pure SW problem in the Linux kernel, as similar gaps in performance are observed with Linux on both x64 and ARM64 -- for ex. see here : http://dimitrik.free.fr/blog/posts/mysql-performance-80-ga-ip-port-vs-unix-socket-impact.html
Special Sysbench binary for MySQL 8.0
By default the Sysbench binaries I'm shipping with BMK-kit are compiled statically (so you don't need any additional libs), and they are supposed to work with all MySQL versions starting from MySQL 5.5. However, MySQL 8.0 introduced several new features, and one of them is the caching_sha2_password authentication method, which did not exist before. And as at some point I needed to test this feature and also combine it with SSL, I came to the conclusion to ship a special Sysbench binary for MySQL 8.0.
The name of this binary is sysbench-1.1-new2020-mysql80-ssl, and it was compiled with openssl-1.1.1 libs, which allows it to work with both openssl-1.1.1 and openssl-1.0 libraries, regardless of what your MySQL Server is using. However, in case you need an explicit Sysbench binary using only one of the OpenSSL libs, there are 2 binaries : sysbench-1.1-new2020-mysql80-ssl10 and sysbench-1.1-new2020-mysql80-ssl111L (respectively compiled with OpenSSL-1.0 and 1.1.1L).

NOTE : binaries compiled with OpenSSL-1.1.1 are also perfectly working with a MySQL Server running with OpenSSL-3.0, just in case.
All these binaries are not compiled statically, so they need the MySQL client lib libmysqlclient.so.21, which is shipped within the /BMK/lib-* directories, and the .bench file has an additional LD_LIBRARY_PATH setting adding the required /BMK/lib-* to the path (it ships the OpenSSL libs as well), so all you have to do to get it working is to enable the right path for the libraries to use :

/BMK/.bench :

...
#-----------------------------------------
# MySQL 8.0 Client and SSL Libs..
#
# use openssl-1.0
## SSL=ssl10

# use openssl-1.1.1l
SSL=ssl111L

# set LD libs path accordingly
export LD_LIBRARY_PATH=$BMK_HOME/lib-$SSL:$LD_LIBRARY_PATH
# ...
- use SSL=ssl10 if you test connections with openssl-1.0
- use SSL=ssl111L if you test connections with openssl-1.1.1l (default)
And one big advantage of this dedicated binary is that to run tests via SSL connections, all you need is just to add the --mysql-ssl=REQUIRED option to each test script, and then it just works ! (e.g. no need to provide keys / certificates / etc.)
In Short
In short, all you need to do to run your tests on MySQL 8.0 with SSL connections is to follow these two steps :

1) in the .bench file, uncomment the "special" binary for MySQL 8.0 :

# -- special MySQL 8.0, with SSL 1.0 or 1.1.1l, dynamic MySQL Client lib --
sb11=$BMK_HOME/sysbench-1.1-new2020-mysql80-$SSL

2) then additionally use the --mysql-ssl=REQUIRED option in every script, like this :

$ time bash /BMK/sb_exec/sb11-Prepare_10M_8tab-InnoDB.sh 32 --mysql-ssl=REQUIRED
$ bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-trx.sh 128 120 --mysql-ssl=REQUIRED
and so on.. ;-))
Sysbench "Original" Workloads
First of all you'll need to load the data. I'll show here the most common test scenarios, but you can explore further and choose whatever you like.
Sysbench "original" workload scenarios are pretty good "entry ticket" test cases to evaluate your MySQL instance capacity, as well as the system you're using, and finally your platform as a whole.
The data set with 8 tables of 10M rows each (10Mx8tab) is the most common starting point, as it's not too small (so you're not just running from CPU cache only) and not too big (so you're not involving too much IO activity once you have writes). The dataset will represent something like 20GB of data, so using a BP size of 32GB is more than enough to keep all the data in memory.
Preparing 10Mx8tab data set
To prepare your data, you'll need to just execute the following :
$ time bash /BMK/sb_exec/sb11-Prepare_10M_8tab-InnoDB.sh 32
NOTE : 32 -- is the number of user connections to be used for the parallel data load (I've modified the original Sysbench scripts to make it possible, so in the above example there will be 4 user connections loading each table in parallel). You can increase this number if your system capacity allows it and if the load goes faster (you need to measure yourself to see which level of parallelism is the most optimal in your case). By default all "Prepare" scripts are using 32 user connections.
Read-Only Point-Selects Workload
The Point-Select workload is the most simple one : it's doing pure "key => value" lookups, which test the full MySQL code stack for overall code efficiency and best possible query execution latency. Staying fully in memory, it's the most simple way to evaluate your system setup and its scalability limits. Only RAM access and CPU power will be involved, so generally if you're already observing some performance issues in this test workload -- most often the used system / platform is to blame ;-))
A simple example of how to run the test :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RO_10M_8tab-uniform-ps-p_sel1-notrx.sh $users 300
  sleep 15
done
Comments :
- in this example we execute the point-select workload with growing load levels
- starting with 1 user connection, we then go with 2, 4, and so on up to 1024 concurrent users, where 300 is the execution time (5min) for each level
- sb11-OLTP_RO_10M_8tab-uniform-ps-p_sel1-notrx.sh is the script executing the required test workload (and all other test scripts are auto-generated in the same manner)
- for execution it requires 2 command line arguments : the number of user connections to use (1st arg.) and the execution time (2nd arg.) -- after which you can provide a few more "specific" arguments (if required) -- see more about this later..
- a few comments about the abbreviations used in the "name" of the script :
  - sb11 -- means the script will be executed with sysbench-1.1 (the sb11 value in the .bench config)
  - OLTP_RO_10M_8tab -- Read-Only workload, 8 tables with 10M rows each
  - uniform -- using "uniform" access pattern
  - ps -- using prepared statements
  - p_sel1 -- using only 1 "point-select" query from the OLTP_RO scenario
  - notrx -- don't use transactions
-
Similar "logic" is used in all other scripts for test workloads, so by the name of the script you can directly say what kind of load scenario will be generated (like if you have the same script, but without ps
-- this will mean it'll execute the same workload, but without prepared statements, or if you'll have socket
in the script name, you'll know that instead of IP port (default) it'll use UNIX socket instead, and so on)..
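For example, following this naming convention, the same point-select run but without prepared statements would look like the following (the script name is simply derived from the scheme above -- double-check it's present in /BMK/sb_exec) :

$ bash /BMK/sb_exec/sb11-OLTP_RO_10M_8tab-uniform-p_sel1-notrx.sh 64 300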
OLTP_RO
OLTP is a "mixed" Read-Only test workload and more "complex", as it consists of :
- 10 point-select queries
- 1 simple range query
- 1 ordered range query
- 1 sum range query
- 1 distinct range query
e.g. 14 queries in total. And while point-selects are mostly not "sensitive" to wrong configurations as long as you used a big enough BP size to keep the data in memory (32GB), all the other queries are not so ;-))
For example :
- latin1 or UTF8 charsets : there is a significant difference whether you're using the latin1 or UTF8 charset for your data (the overhead of UTF8 before MySQL 8.0 was much bigger -- see the following articles for more details : http://dimitrik.free.fr/blog/posts/mysql-performance-80-and-utf8-impact.html and http://dimitrik.free.fr/blog/posts/mysql-performance-80-ga-more-in-depth-latin1-utf8mb4.html)
- additionally, if you use UTF8 (default since MySQL 8.0) and your sort_buffer is not configured big enough, you'll get a horrible experience ;-)) (a 256KB value is ok for the current example)
- you need to pay attention to the malloc library you're using (my current choice is tcmalloc-4.4.5, which comes with the OL7 repo)
- NOTE : the distinct range query, for ex., can be very aggressive on memory allocations, as well as on Temp table space usage, take care ;-))
A simple example now of how to run the OLTP_RO test :

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RO_10M_8tab-uniform-ps-notrx.sh $users 300
  sleep 15
done

as you can see, the only difference compared to the "point-select" execution is the missing p_sel1 in the name of the script ;-))
NOTE : Read-Only workloads are generally used without transactions, and the results are reported as QPS (queries/sec). If for some reason you need to use transactions when investigating this workload's performance, mind that begin and commit are reported by Sysbench as queries, and you need to discard them from the final QPS number yourself to not report "fake" results ;-))
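As a back-of-the-envelope sketch of this correction (assuming the 14-query OLTP_RO transaction described above) : with transactions enabled, Sysbench counts 14 + BEGIN + COMMIT = 16 "queries" per transaction, so the effective QPS is the reported QPS multiplied by 14/16, for ex. :

$ reported_qps=80000
$ echo "effective QPS = $(( reported_qps * 14 / 16 ))"
effective QPS = 70000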
OLTP_RW
OLTP Read+Write is using transactions, and within the same transaction it's executing all the Read-Only queries from OLTP_RO, and then 2 UPDATEs, 1 INSERT, and 1 DELETE query. Many MySQL configuration / tuning options come into play here, but first of all -- the Storage you're using on your system !
A simple example now of how to run the OLTP_RW test :

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-ps-trx.sh $users 300
  sleep 15
done
UPDATE-NoKEY (update no index)
If the Point-Select workload is the most aggressive Read-Only workload, the UPDATE-NoKEY test is the same for Write-Only :
- it's not changing any index data (NoKEY / no index)
- so, all UPDATEs of data are happening in-place
- only the full stack of transaction management is tested for its efficiency
- this code path was improved a lot since MySQL 8.0
- but we're still NOT scaling on this workload due to the present trx_sys contention (work in progress)
- so, you should not be surprised to see better results on 1 CPU Socket -vs- 2 CPU Sockets
- this workload is executed without transactions
- pure UPDATE "bombarding"
A simple example of how to run the UPDATE-NoKEY test :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-upd_noidx1-notrx.sh $users 300
  sleep 15
done
and yes, the main difference with the OLTP_RW script is the presence of upd_noidx1 in the script name, telling it to execute only the "update no-index" query from the OLTP_RW scenario..

And if you replace upd_noidx1 with upd_idx1 -- you'll be running the UPDATE-KEY (update index) workload, which is no longer an "in-place" UPDATE as it'll touch secondary index values and involve yet more internal locks and contentions.
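For ex., the same growing-load run for UPDATE-KEY would then look like this (the script name is simply derived by the substitution above -- double-check it's present in /BMK/sb_exec) :

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-upd_idx1-notrx.sh $users 300
  sleep 15
done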
50Mx8tab data set
So far, the 10M x 8 tables data set is pretty good for quick (generally "in-memory") MySQL / system evaluations. But to go further and test it all deeper, you can switch to 50M rows per table (still with 8 tables), which will result in over 100GB of data space and allow you to test :
- in-memory workload with 128GB BP size (check if QPS / TPS will still be the same as with 10M rows before, and if it's not so => try to explain "why" ;-))
- partially IO-bound with 64GB BP size
- and heavily IO-bound with 32GB BP size (and will also fully evaluate your storage performance)
To execute any similar test as before, but now with 50M-row tables -- just change 10M to 50M in the script names from the above examples, and you're done ;-))
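For ex. (these are the same 50M scripts shown in the very first example at the top of this HOWTO) :

$ time bash /BMK/sb_exec/sb11-Prepare_50M_8tab-InnoDB.sh 32
$ bash /BMK/sb_exec/sb11-OLTP_RW_50M_8tab-uniform-ps-trx.sh 64 300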
NOTE : in the same way, by using 500M in the script names you'll be using 1TB of data in all your tests, which quickly becomes more challenging, especially if your system is limited in RAM.
Data Access Pattern
In parallel with "uniform" access, Sysbench also allows "pareto" and a few other options for the access pattern. Historically, "uniform" is NOT the default in Sysbench, which over time resulted in many fake results obtained and published by various vendors..
For me, the most interesting and worth using are the following 3 :
- uniform -- fully random and uniform access, wide bombarding, hard life in IO-bound
- pareto -- random, but "grouped" access, going around not far from the same rows, creating concurrent access to the same rows, which involves row locks and scalability limits
- zipfian -- some rows are accessed more often than others, but in a wider way, so generally not creating row locking during execution
At the end of this article you can see the benchmark results obtained with all available data access options in Sysbench under different test conditions (8 tables -vs- single table, in-memory -vs- IO-bound, etc.) -- I hope from these results it'll be more clear why I selected only these 3 options to be used.
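Following the script naming convention, switching the access pattern is just a matter of the script name -- for ex. (I'm assuming here that the pareto variant of the script was generated in your /BMK/sb_exec, adjust the name if not) :

$ bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-pareto-ps-trx.sh 64 300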
New Extensions For "Original" Sysbench Workloads
I've added a few more extensions to the "original" Sysbench scripts :

- --table-name=name : base name to use for table names (def: sbtest)
- --mysql-table-partitions=N : use N range(id) partitions for each table (def: 0)
- --mysql-table-compression=name : extra table transparent compression option, ex.: lz4
- --rnd-data=N : percentage (%) of random data in row values : 1 .. 100 (to be able to evaluate higher/lower compression possibilities, def: 100)
- --extra-cols=N : add N extra columns to table(s)
- --extra-cols-type=TYPE : data type to use for extra columns (ex. VARCHAR(32))
- --extra-cols-default=value : default value to use for extra columns
- --extra-cols-options=... : extra columns options
- --select-star=on/off : use SELECT * ... in point-selects and ranges (def: off)
- --rnd-loop=N : use random-loop instead of rand() : 0=def, 1=chunk, 2=shift, 3=loop (def: 0)
- --update-range-size=N : allow range UPDATEs (instead of single-row UPDATEs by default)
- --load-mode=name : initial Data Load Mode [original, parallel, parallel_ordered] (def: parallel)
- --load-bulk-size=N : number of rows to use in bulk INSERTs during parallel* Load of data (def: 1000)
- --trx-retry=on/off : replay the same transaction again in case of error / ROLLBACK / DEADLOCK (def: off)
- --mysql-session-options=... : extra session options
  - ex.: --mysql-session-options="set session sort_buffer_size=1000; set session join_buffer_size=256000"
  - in fact this can be any kind of SQL statement(s) separated by ; which will be executed at the start of each user connection
- --mysql-query-hint=... : use the given Query Hint in all queries
  - ex.: --mysql-query-hint="MAX_EXECUTION_TIME(100)"
- --sleep-before-commit=N : extra option for more advanced test scenarios to simulate longer data locking by sleeping N usec. before COMMIT (def: 0)
- --sleep-after-query=N : extra option for more advanced test scenarios to simulate longer user "think time" by sleeping N usec. after each query (def: 0)
You can add any of these options as extended arguments to the scripts, for ex. :
$ time bash /BMK/sb_exec/sb11-Prepare_10M_8tab-InnoDB.sh 32 \
    --rnd-data=50 --mysql-table-compression=lz4

$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-ps-trx.sh $users 300 \
      --rnd-data=50 --mysql-table-compression=lz4
  sleep 15
done
Re-Connect Workloads
Historically there was always a need to test OLTP workloads which are constantly re-connecting to MySQL Server (like Web servers executing PHP queries and disconnecting once the page is generated, and so on). In the past, the real need for such an evaluation was mostly to know how many re-connects/sec we can deliver with MySQL -- e.g. the idea is to :
- connect
- execute a quick SELECT
- disconnect
And for this kind of test scenario we were using the Point-Select workload with a re-connect after each SELECT query. To cover this need, I've implemented and added an explicit Lua script doing Point-Selects with re-connects and the corresponding Shell scripts (may add more reconnect workloads later, let's see..)
For ex. to start Point-Selects re-connect on 10Mx8tab data set :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-OLTP_RO_10M_8tab-uniform-p_sel1-reconnect-notrx.sh $users 300
  sleep 15
done
IMPORTANT
- it makes no sense to use prepared statements on re-connect workloads, so this option is ignored
- mind to allow TCP reuse on your "client" host to not run out of sockets :
$ sudo sysctl net.ipv4.tcp_tw_reuse=1
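And if your client host gets rebooted between test runs, you may want to make this setting persistent -- just a standard sysctl approach, the file name below is up to you :

$ echo "net.ipv4.tcp_tw_reuse=1" | sudo tee /etc/sysctl.d/99-bmk.conf
$ sudo sysctl --system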
TPCC
The TPCC workload was ported by the Percona Team to Sysbench-1.1 (Lua scripts), which simplified a lot any further investigations on this workload and hacking the code to check various test conditions. As the test is executed by Sysbench, the usage steps are exactly the same as with other Sysbench scripts (and way more simple than the historically used DBT2 open source implementation of TPCC, etc.)
Finally, this allowed me to propose a "workaround" to by-pass the excessive index lock contentions involved in the "original" TPCC implementation and make it also part of the test cases -- see this article for more details : http://dimitrik.free.fr/blog/posts/mysql-80-tpcc-mystery.html
So far, you can now use both variations -- using NULL or DEFAULT values in queries.
NOTE : to use the NULL variation you need to explicitly use scripts having "-NULL" in their name, which is not the default..
Preparing TPCC data set
The "size" of data sets in TPCC is "measured" in number of "data-ware-houses" (DWH) created in database. This number is labeled as W
(e.g. 1000W means thousand DWH, and so on)..
Historically, MySQL was not scaling really well on TPCC workload, and to lower internal MySQL contentions, but still run TPCC workload, it was possible to run several TPCC tests in parallel against the same MySQL Server instance (this was at least allowing to see if "generally" MySQL will be ever able to keep higher load once internal contentions will be fixed). The same approach was implemented in Sysbench-TPCC scripts, but in a yet more simple way -- you can simply use more data sets in parallel from the same Sysbench process (like if you have several TPCC databases in the same MySQL instance, and TPCC will use them all in parallel).
The most common use cases are to use 1 or 10 TPCC data sets.
To prepare your data for a single TPCC data set with 1000W, just execute the following :
$ time bash /BMK/sb_exec/sb11-Prepare_TPCC_1000W-InnoDB-NoFK.sh 32
And for x10 TPCC data sets with 100W (to have the same 1000W data volume in total), execute the following :
$ time bash /BMK/sb_exec/sb11-Prepare_TPCC_10x100W-InnoDB-NoFK.sh 32
A few comments :

- NoFK -- historically the most common use case for the TPCC workload on MySQL was "without foreign keys" (FK); you can skip it in the script name to test "with FK"
- NoFK2 -- same as NoFK, but also not creating the secondary indexes which are supposed to be used by FK
- NULL -- (when present in the script name) means to use NULL values (as in the original TPCC), otherwise DEFAULT values are used in the db schema and queries, which lowers index lock contentions and allows better overall scalability (can be easily adapted for any production workload, etc.)
for ex. if you're really willing to use NULL values :
$ time bash /BMK/sb_exec/sb11-Prepare_TPCC_1000W-NULL-InnoDB-NoFK.sh 32
NOTE : generally it makes NO SENSE to test the TPCC workload with a data set smaller than 1000W in total ! -- adopt from the beginning your testing with 1000W (or more) of data to not waste your time and base your conclusions on useless results.. -- However, if your target is to reproduce a test workload with a high level of "data concurrency" (e.g. many users fighting for access to the same data), then using a smaller number of warehouses makes perfect sense ! -- e.g. with a 100W data set, for ex., you'll get exactly this ! -- so, the main message here is probably "always understand what you're doing" and then everything should be just fine ;-))
Run TPCC Workload
Finally the most simple step in the whole story :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-TPCC_1000W.sh $users 300
  sleep 15
done
or for 10x100W data set :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024
do
  bash /BMK/sb_exec/sb11-TPCC_10x100W.sh $users 300
  sleep 15
done
NOTE : just mind to respect the same "NULL or not NULL" naming as you used in your "Prepare" script initially, and also the same data set size, and you're then all good ;-))
New Extensions For "Original" TPCC Workload
I've added a few more extensions to the "original" TPCC scripts :

- --trx-retry=on/off : retry the same transaction again on ROLLBACK / error (def: off)
- --trx-debug=N : debug mode - execute only one of the 5 transactions (--trx-debug=[1-5]), (def: 0 (off))
- --use-fk=N : use foreign keys -- 0:no but sec.idx, 1:yes+sec.idx, 2:none (def:1)
- --for-update=N : use FOR UPDATE -- 0:no, 1:yes, 2:enforced (def:1)
- --force-primary=0/1 : force PRIMARY INDEX in some queries -- 0:no, 1:yes (def:0)
- --mysql-table-compression=name : extra table transparent compression option, ex.: 'lz4'
- --mysql-session-options=... : extra session options
  - ex.: --mysql-session-options="set session sort_buffer_size=1000; set session join_buffer_size=256000"
  - in fact this can be any kind of SQL statement(s) separated by ; which will be executed at the start of each user connection
Same as for "original" Sysbench workloads, you can add any of these options as extended parameter for test scenario scripts.
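For ex., a TPCC run combining some of these options could look like the following (just an illustration of passing the extra arguments -- the option values here are arbitrary) :

$ bash /BMK/sb_exec/sb11-TPCC_1000W.sh 128 300 --trx-retry=on --use-fk=2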
dbSTRESS
This benchmark test case has a long history and is based on a real customer workload. Surprisingly, in the past, it helped to point out many deep problems in MySQL / InnoDB design. However, over time systems and MySQL itself became much faster, and the old original implementation started to be too slow and inefficient. So, once Sysbench-1.1 appeared, it was "natural" for me to plan to port the dbSTRESS workload to Sysbench. Now it's done ! ;-))
The database schema in db_STRESS is composed of only 5 tables and simulates a sort of library (or stock) object placement :
- STAT
- HISTORY
- OBJECT
- SECTION
- ZONE
And there are the following relations ("by reference IDs" ) between these tables:
[STAT] <--(1:M)-- [HISTORY] --(20:1)--> [OBJECT] --(N:1)--> [SECTION] --(P:1)--> [ZONE]
For each OBJECT record there are 20 records of HISTORY (a kind of historical trace of the last 20 changes), while the number of records in the STAT, SECTION and ZONE tables is fixed:
- STAT: 1000 rows
- SECTION: 100 rows
- ZONE: 10 rows
STAT records are used as a "tag" (or status) for HISTORY records. Each HISTORY record may have a different "tag" than others, and any HISTORY record can change its "tag" over time.
Then, SECTION and ZONE records define the OBJECT "placement" -- there are 10 Zones, and each Zone has 10 Sections, so 100 Sections in total. And any given OBJECT belongs to one of these Sections. No OBJECT record can ever change its "placement".
Considering the fixed sizes of the STAT, SECTION and ZONE tables, and the fixed 1:20 relation between OBJECTs and HISTORY -- it's obvious that the database size is driven by the number of OBJECTs you'll create.
All relations between records from different tables are managed by corresponding "reference IDs" from each side (generally referencing the "parent" PRIMARY KEY). Historically, to simplify RDBMS comparisons, no foreign keys or other constraints were used. For the moment I'm leaving it as is, but over time I'll probably add an option to use FK (on/off), let's see..
Similar to other Sysbench workloads, you can also create several dbSTRESS "instances" to be able to "split" locks over several OBJECT tables and the tables corresponding to them.
Preparing Your Dataset
The logic here is pretty similar to the other workloads. For example, to prepare a dataset with 10M OBJECTs, you can just execute the following :
$ time bash /BMK/sb_exec/sb11-Prepare_dbSTRESS_10M-InnoDB.sh 32
Or to have 10 OBJECT tables with 1M rows each :
$ time bash /BMK/sb_exec/sb11-Prepare_dbSTRESS_10x1M-InnoDB.sh 32
Going a little bit ahead, I need to mention that you may hit pretty different bottlenecks during the dbSTRESS workload depending on whether the INDEX in the HISTORY table is created as PRIMARY KEY or not (by default it's not, like it can be in many apps) -- "why" there is such an impact I'll let you discover yourself, but for testing with the schema having a PK in HISTORY you can just use the scripts having "-HPK" in the name, like :
$ time bash /BMK/sb_exec/sb11-Prepare_dbSTRESS_10M-InnoDB-HPK.sh 32
Test Scenarios
db_STRESS generates an OLTP workload to stress the database as much as possible. The performance level of the workload is measured mainly in TPS (transactions per second), but you also have the freedom to have more or fewer queries per "transaction", that's why any result should be presented with its scenario context to make sense (same for QPS). During any given transaction we are first randomly choosing an OBJECT reference and then performing READ or READ and WRITE operations :
A READ operation consists of 2 SELECT queries :
- SEL1: read related OBJECT --> SECTION --> ZONE data by the given OBJECT ID
- SEL2: read related HISTORY --> STAT data by the given OBJECT ID
A WRITE operation consists of :
- delete one HISTORY record by OBJECT ID + insert new HISTORY record for the given OBJECT
- update one HISTORY record of the given OBJECT
NOTE : delete & insert orders are always present together if executed in WRITE operation. This is done to protect your database from "intentional" constant growing in the size. However, if in your case during dbSTRESS workload you'll have InnoDB Purge lagging, you can easily hit "unintentional" database size growing due increase of UNDO space.
Additionally you can precise the ratio between dbSTRESS transactions having only READ operation, or both READ & WRITE -- this is allowing you to make the test case more or less aggressive on data writes.
By default the following test scenarios are proposed :
- sb11-dbSTRESS_10M-RO-notrx.sh : read-only with SEL1 and SEL2 queries
- sb11-dbSTRESS_10M-RO-SEL1-notrx.sh : read-only with SEL1 query
- sb11-dbSTRESS_10M-RO-SEL2-notrx.sh : read-only with SEL2 query
- sb11-dbSTRESS_10M-RW1-trx.sh : read+write, R/W ratio 1:1
- sb11-dbSTRESS_10M-RW10-trx.sh : read+write, R/W ratio 10:1
- sb11-dbSTRESS_10M-RW1-UPD-trx.sh : read+write, but write = UPDATE only, R/W ratio 1:1
- sb11-dbSTRESS_10M-RW10-UPD-trx.sh : read+write, but write = UPDATE only, R/W ratio 10:1
Comments :
- hopefully it's obvious that 10M in the script name means 10M OBJECT records
- RO : read-only, and if SEL1 or SEL2 is added, then only the given SELECT query is used
- trx or notrx : whether to use transactions (BEGIN/COMMIT) or not
- RW means read+write, followed by the R/W ratio (RW1 = 1:1, RW10 = 10:1, and so on)
- and if there is UPD in the name, then only the UPDATE query is executed in the WRITE operation
And to make it more complex :
- xDEAD : don't use "anti-deadlock" limits in ID ranges
- xREF : allow READ/WRITE operations to use different OBJECT IDs within the same transaction
Example of dbSTRESS Workload
Preparing 10M dataset :
$ time bash /BMK/sb_exec/sb11-Prepare_dbSTRESS_10M-InnoDB.sh 32
Running Read-Only workloads :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RO-notrx.sh $users 300
  sleep 15
done

$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RO-SEL1-notrx.sh $users 300
  sleep 15
done

$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RO-SEL2-notrx.sh $users 300
  sleep 15
done
Running Read-Write workloads :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RW1-trx.sh $users 300
  sleep 15
done

$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RW10-trx.sh $users 300
  sleep 15
done
Full dbSTRESS options for Sysbench
There are also other options which you can add when using the above scripts, or when executing the dbSTRESS scripts directly with Sysbench :
- --obj-table-size=num : number of rows in the OBJECT table (def: 1000000)
- --obj-tables=num : number of OBJECT tables (def: 1)
- --mysql-storage-engine=name : Storage Engine to use for tables (def: InnoDB)
- --mysql-table-options=ops : extra table options, ex.: 'organization=heap'
- --ordered-load=on/off : initial Load of data is ordered by PK (def: off)
- --SEL1=num : number of SEL1 queries per transaction (def: 1)
- --SEL2=num : number of SEL2 queries per transaction (def: 1)
- --updates=num : number of UPDATE queries per transaction (def: 1)
- --delete-inserts=num : number of DELETE/INSERT combinations per transaction (def: 1)
- --rw-ratio=num : ratio between RO and RW transactions (def: 1)
- --hpk=on/off : use PRIMARY KEY for the HISTORY table (def: off)
- --anti-dead=on/off : anti-deadlock, each session is using its own REF range (def: on)
- --same-ref=on/off : use the same OBJECT REF value within the same transaction (def: on)
- --skip-trx=on/off : execute all queries in AUTOCOMMIT mode (def: off)
- --for-update=on/off : use FOR UPDATE in all SELECTs (def: off)
- --mysql-table-compression=name : extra table transparent compression option, ex.: 'lz4'
- --mysql-session-options=... : extra session options
  - ex.: --mysql-session-options="set session sort_buffer_size=1000; set session join_buffer_size=256000"
  - in fact this can be any kind of SQL statement(s) separated by ; which will be executed at the start of each user connection
So, for example, if I want to force my default Read-Write test to also use FOR UPDATE in all SELECT queries, I can just add --for-update=on to the args :
$ for users in 1 2 4 8 16 32 64 128 256 512 1024 2048
do
  bash /BMK/sb_exec/sb11-dbSTRESS_10M-RW1-trx.sh $users 300 --for-update=on
  sleep 15
done
SYNC_file Module
The SYNC_file.lua module was added to BMK-kit to provide a simple way to synchronize the start of many threads, or many Sysbench processes, or many remote client hosts -- all expected to start the given test at the same time.
This module is integrated into all the above test workloads and works as follows :

- for any test script you can provide the additional option --sync-file=filename
- NOTE : this "filename" should be a full path
- if this option is used, the test script will create the given file once all Sysbench threads have finished their initialization and are all connected to MySQL Server
- after that, all Sysbench threads will spin in a wait loop as long as the given sync-file is not removed.. -- once it's removed, all the threads are released to run !
- by default, threads are looping with 10ms sleeps between file checks
- however, on small systems with many threads started this can create high CPU usage
- but you can change the default spin time via --sync-wait=N (N milliseconds)
Let's have a look at an example : 512 users of the TPCC-1000W workload, using the /tmp/start file for synchronization and 20ms spin loops :

$ bash /BMK/sb_exec/sb11-TPCC_1000W.sh 512 300 --sync-file=/tmp/start --sync-wait=20
...
=> SYNC-file : synchronization via /tmp/start
=> SYNC-file : ready to start..
- first you'll see the confirmation message that your sync-file is considered and used
- and once all 512 threads have finished their initialization and are all connected to MySQL Server -- you'll get the "ready to start" message
- at this point all Sysbench threads are just looping in wait as long as the sync-file is not removed

And once the sync-file is gone, the test is started, and you'll see the following confirmation message :
=> SYNC-file : started !
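So, on the coordinating side (the same host, or any other client host that sees the same path) all you have to do to "fire" the test is to remove the file :

$ rm /tmp/start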
General Notes About Test Workloads
There are a few important points worth mentioning, as some of them could be in disagreement with different people, while some others are simply important to keep in mind to avoid considering "false positive" or simply "fake" results..
Single Table -vs- Many Tables
There are as many opinions on this topic as people you'll discuss it with.. -- there are even voices loudly crying that the TPCC workload (just for ex.) is "obsolete" as it's not using hundreds of tables like most production systems would have today.. And you can already see many people running Sysbench with hundreds of small tables, and so on..
Why do all such claims make no sense to me ?
- first of all, always keep in mind "why" you're running your Benchmark Workload
- if your goal is to reproduce a Production Scenario => focus your efforts on this, but reproduce it exactly to match your Production
- but if you're running "generic" Workloads with already prepared Scenarios, then try to understand each test case first
- because every "generic" Workload is here to point you at some potential problems, or simply to help you evaluate your DB Engine, its config setup, and the whole OS/HW stack you're using
- the goal of any Single Table Workload is to show you what will happen if one of the tables in your Production environment becomes "hot" and gets most of the concurrent access from all active users
- while Multi Table Workloads are here to show you how the overall processing will change if you could somehow "dispatch" your users to several tables instead (or force them to use different partitions of the same table, and so on)..
- however, not all test scenarios used for Benchmarking really respect such a "split" of activity in Multi Table Workloads -- pretty often the choice of the table to use can be totally random, which in its turn will periodically "randomly transform" your Workload into a Single Table one, then Multi Table, etc.. -- to avoid this kind of "surprise", in all my scripts all users are explicitly "attributed" to given table(s) (e.g. if you run 64 users on the 8 tables OLTP_RW workload you'll have 8 groups of users, where each group is using its own table only)
IO-bound Workloads
Curiously, it's very easy to get "fake" or "false positive" results on IO-bound workloads. You should be extremely attentive to what you're doing and to what is going on. And first of all always mind to test all load levels, and not just one of them (like the Percona guys are doing pretty often, etc.)..
IO-Bound on Single-Table :
- the biggest problem for Single-Table IO-Bound will be the historical File IO lock
- you can observe this as fil_shard mutex contention in MySQL 8.0 and fil_sys contention in earlier versions..
- since MySQL 8.0 this historical global lock was "sharded" (so, you'll have little chance to observe this contention in Multi-Table IO-Bound)
- however, in earlier versions it's just a global lock involved for every File IO operation, which can quickly hurt you, and having faster storage will not help..

NOTE : the same problem can still happen in MySQL 8.0 on Multi-Table IO-Bound if you place all your tables in the same file (tablespace) -- this is the reason to use dedicated files for all your tables which have high IO activity !
IO-Bound on Multi-Tables :
- the biggest problem with Multi-Table IO-Bound workloads is to get fake results..
- how can you recognize that a result is "fake" ?
- first of all, look attentively at your IO traffic ! -- if you see your TPS going up, but NOT your IO activity, there is a huge chance your result is totally fake..
- why can this happen ?
- it's very easy in a Multi-Table workload for one of the tables to become better "cached" in the InnoDB Buffer Pool, so all queries to this table will go from memory and be executed way faster than any others, which will falsify the "global picture", giving you the impression of improved performance, while it's not so at all ;-))
- to check if performance was really improved, try to replay the same workload with the same global data volume, but on Single-Table -- this will give you the final verdict
- as well, mind to test many concurrent user load levels to build a more complete picture (e.g. 1, 2, 4, .. 1024 users)
A "classic" example of "fake" IO-Bound OLTP_RW results you can see here :
- you can see a great "improvement" on 128 users here with a huge TPS jump !
- however, this TPS result is simply "fake"
- and you can clearly see that during this 128 users load level the IO activity is showing exactly the opposite ;-))
- as well, it's good to test all the other user load levels as it was done here, as you can then also more clearly see that the TPS result on 128 users is not matching the "overall tendency"..
Since I got a few more questions about "fake" IO-bound results :
- first of all -- keep in mind that the data access pattern is "uniform"
- and the data set is much bigger than the BP size
- which means there is a very high probability that most page accesses will involve an IO read
- at the same time this is the OLTP_RW workload (e.g. we have writes)
- which means that we'll quickly have many dirty pages
- so, before being able to read a new page we'll need to find a free page in the BP
- and to get a free page we need to evict one from the LRU, which will quickly mean flushing the page first before evicting it
- e.g. this will all be about IO activity
- and the main target here is "IO-bound correctness" of the test case (like when we want to evaluate a new Storage vendor, or some special FS features, etc.)
- if "correctness" cannot be confirmed -- the result is "fake" (or "false positive" if you like ;-))
Now :
- we can see that at some point TPS goes much higher, but IO activity becomes lower..
- how could this ever be possible if we're expecting to have a heavy IO-bound test case ?
- because if we're processing at a higher TPS rate, then we have yet more dirty pages, right ?
- and again, the access is "uniform", e.g. the probability to hit the same data is low !!
- we could suspect that on higher load levels we simply have more users, so "potentially" some already cached data will be reused, and TPS will become higher, why not ?
- ok, but still : more changes => more IO writes and at least similar IO reads, but it's not so..
- then, why on the next load level does TPS go down ?? with more users the "probability" to use cached data should only be higher, so what ?
- in fact the main reason for this TPS gain is the following :
- we're using 8 tables, and at some point in time one of the tables becomes more cached in the BP than the others
- so, users which are working with this table will get a higher rate of COMMITs
- thus the global TPS will be higher
- while users working with less cached tables will get their TPS lower
- and this was easy to observe in the past when Sysbench could work only with 1 table, so to work with 8 tables we needed to start 8 Sysbench processes in parallel
- all in all, IO-bound "correctness" is not respected here, and the given TPS result is "fake" and should be ignored if the goal is to evaluate Storage or IO-related / FS features..
IP port -vs- UNIX socket
The results obtained on any test workload are generally always better if UNIX socket is used instead of IP port (even if it's the "loopback" interface for localhost).. The same is valid for Linux on both Intel/AMD/x64 and ARM64 systems. And the gap in results may go to more than 30% difference ! -- see for ex. here : http://dimitrik.free.fr/blog/posts/mysql-performance-80-ga-ip-port-vs-unix-socket-impact.html
Currently test scripts with "socket" in their names are generated only for the "original" Sysbench workloads, as they represent the most aggressive tests. I did not see any need to have them for TPCC or dbSTRESS, but this can change if these workloads also see some speed-up..
NOTE : using "loopback" or UNIX socket makes sense only if you're testing your MySQL Server "locally" on the same system, but is your production also running "locally" ? -- and if not, then mind also to test the same, but over a real Network, to be sure it'll not be your major bottleneck even before MySQL.
When to use Uniform / Pareto / Zipfian access patterns in Sysbench
Once again, always mind what exactly you want to test, and then use the "random" option accordingly :
Uniform : totally random access to data, so no concurrent access to the same data is to be expected
- good for any general evaluation of your overall setup config and the HW you're using
- also very good for IO-bound Workload simulations, as even a not very big dataset can give you some very representative results to evaluate your Storage, Filesystem, etc..
- as such access is mostly "predictable", this allows you to better understand the limits of your system, tuning setup, and so on..
Pareto : this access pattern is the opposite of "uniform" and will involve concurrency on data !
- very good to evaluate the capabilities of the DB Engine to manage data locking
- as well as deadlock detection and the ability to progress even within hot data locking conditions
- particularly interesting will be to test the --trx-retry=on additional option to see how well your DB Engine manages users which are trying to replay the same transaction in case of a rollback (some Engines are skipping deadlock detection in early stages and aborting the active transaction, so if the user starts a new transaction with different data => this will show a higher overall TPS, but could be not "fair" for real-world apps where the user will retry the same transaction again and again)..
Zypfian : similar to "pareto", except there will be mostly no data concurrency
- good for cases when you want to understand what kid of performance to expect from your DB Engine when not whole db data are constantly accessed, but only part of it
- this will not change anything too much if your whole db is cached in memory
- however, it'll be totally different in IO-Bound conditions :
- you can evaluate if your storage is generally already "good enough" if most of accessed data will fit your memory (BP size)
- or if you'll place most accessed data to faster storage
- etc..
All Together
To summarize all the General Notes explained above, let's take a look at how the combination of choices from the following list will impact the final OLTP_RW workload performance :
- Random Pattern :
- uniform
- pareto
- special (default in original Sysbench)
- gaussian
- zipfian
- Scenario :
- Single-Table
- Multi-Table
- Workload conditions :
- In-Memory (no IO reads)
- Partially IO-bound (more than 60% of data can be cached in BP)
- IO-bound (less than 33% of data can be cached in BP)
In-Memory
OLTP_RW, 10Mx8tab (20GB) :
OLTP_RW, 50Mx1tab (12.5GB) :
- curiously, the only really impacting rand pattern is "pareto"
- the impact is much higher when a single table is used -vs- 8 tables
Still, all workloads are in-memory. What about IO-bound ?..
Partially IO-bound | BP size=64G
OLTP_RW, 50Mx8tab (100GB) :
OLTP_RW, 400Mx1tab (100GB) :
Heavily IO-bound | BP size=32G
OLTP_RW, 50Mx8tab (100GB) :
OLTP_RW, 400Mx1tab (100GB) :
Hope it's all clear now ? -- happy testing ;-))
Sysbench Output
Just in case you've never seen Sysbench results output before, here are a few examples. Well, first of all, the default output "format" is exactly the same for all the above mentioned workloads, which is extremely useful for understanding the obtained results ;-))
Generally, once the test is started, you'll see the corresponding start messages, after which at the end you'll get a result summary reported by Sysbench, similar to the following OLTP_RW result with 1024 users :
sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1024
...
Threads started!
...
SQL statistics:
    queries performed:
        read:                            15985872
        write:                           4567392
        other:                           2283696
        total:                           22836960
    transactions:                        1141848 (3801.40 per sec.)
    queries:                             22836960 (76028.07 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

Throughput:
    events/s (eps):                      3801.4034
    time elapsed:                        300.3754s
    total number of events:              1141848

Latency (ms):
         min:                                   19.91
         avg:                                  269.10
         max:                                 4241.68
         95th percentile:                      493.24
         sum:                            307269899.28

Threads fairness:
    events (avg/stddev):           1115.0859/17.93
    execution time (avg/stddev):   300.0683/0.02
The most important metrics in the result summary are :
- QPS (queries/sec, which is 76K in above example)
- TPS (transactions/sec, which is 3.8K above)
- Query response times :
- avg (269ms above)
- 95% percentile (493ms above)
- max (4.2 sec(!) above)
Which is generally enough to get an idea about the delivered performance :
- QPS and TPS tell you about your processing throughput
- while the QPS -vs- TPS ratio gives an idea of how many queries on average were executed per transaction
- (in the above example there are roughly 20 queries per transaction)
- and the response times tell you about the overall processing stability :
- in the above example the 95% time is about x10 times lower than max
- meaning that the max time is rather exceptional, and not common
- on the other side, the 95% time is nearly x2 times higher than avg
- meaning that most response times are really around avg
- so, the overall processing was rather stable, and the only thing we need to know is whether a 493ms response time is acceptable in the given case, or not
However, personally, the most interesting part for me is to also see the real-time Sysbench stats reported for the currently running test workload. This feature is enabled in Sysbench by using --report-interval=N, where N is the time interval in seconds.
For example, with OLTP_RW :
$ bash /BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-trx.sh 256 300 --report-interval=1
which will provide you this kind of output during the run :
[ 1s ] thds: 1024 tps: 3148.59 qps: 77199.23 (r/w/o: 56873.00/13033.03/7293.21) lat (ms,95%): 549.52 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 1024 tps: 3972.10 qps: 81953.76 (r/w/o: 56304.64/17705.92/7943.21) lat (ms,95%): 442.73 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 1024 tps: 4359.29 qps: 84721.70 (r/w/o: 60014.04/15987.07/8720.59) lat (ms,95%): 442.73 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 1024 tps: 4057.11 qps: 81225.16 (r/w/o: 57150.52/15964.43/8110.22) lat (ms,95%): 419.45 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 1024 tps: 4410.87 qps: 81122.65 (r/w/o: 54418.42/17899.48/8804.74) lat (ms,95%): 475.79 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 1024 tps: 3577.92 qps: 79383.33 (r/w/o: 58109.78/14097.70/7175.85) lat (ms,95%): 434.83 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 1024 tps: 4176.04 qps: 81865.78 (r/w/o: 55109.52/18416.18/8340.08) lat (ms,95%): 419.45 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 1024 tps: 4243.74 qps: 84787.79 (r/w/o: 61201.24/15087.07/8499.48) lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00
[ 9s ] thds: 1024 tps: 4476.26 qps: 85672.03 (r/w/o: 58498.43/18239.07/8934.52) lat (ms,95%): 356.70 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 1024 tps: 4072.91 qps: 83152.23 (r/w/o: 59076.74/15911.66/8163.83) lat (ms,95%): 390.30 err/s: 0.00 reconn/s: 0.00
...
- from which you're getting your current QPS and TPS
- 95% query response times
- and the presence of errors, if any
Personally, I'd suggest you always execute any Sysbench workload with live stats, because :
- if your test for some unexpected reason crashes at the end, you'll not get the summary
- while from the live stats output you may still get an overall idea about the final result
- as well, I'm always using a 1 sec. interval for live reports
- this will help you to discover if there were any processing stalls during the test
- because having stalls in production can have a much bigger negative impact than anything else !! -- always keep it in mind..