In MySQL 8.0 we introduced a totally new design for InnoDB REDO Log management. The main difference was about implementing a lock-free solution for user threads, and use dedicated REDO threads for all background IO write work.
for more details, see an excellent and very detailed article by Pawel : https://dev.mysql.com/blog-archive/mysql-8-0-new-lock-free-scalable-wal-design/
However, over a time we also added an option to let users to switch REDO threads=OFF to enforce REDO log processing efficiency in some particular cases. Unfortunately this feature created a lot of confusions for MySQL users, and many ones interpreted this in different ways, providing different and sometimes opposite advices, etc..
My main advice will be always : test each feature yourself and within your own Production environment !! -- and at the end, this is all what matters ! -- regardless what you'll read on the Net (including the following post ;-))
In case you don't have Production workloads for testing (like me), or afraid to break Production and want to do some initial probe experiments -- there are many generic benchmark workloads available around MySQL, so you can play with this to help you get a better understanding of what is going on.
And this is what I'll try to do here -- to give you yet-one-more-advice about REDO Log threads ;-))
REDO Log Threads
So, when it makes sense to keep REDO threads=ON and when switch them OFF ? -- the generic historical advice was the following :
- on high-concurrency workloads -- better to keep REDO threads=ON
- on low-concurrency workloads -- switching REDO threads=OFF may provide a better performance
Sounds correct, except what to do if your Production activity is constantly switching from low concurrency to high concurrency, and back again, etc. ? -- does it mean you need to choose what is more critical for you -- low load or high load, and then configure REDO threads accordingly ? -- I don't think so.. -- I'd rather say "it depends" ;-))
Now, it "depends" on what ? -- I'd say on few, but important things :
- the number of CPU Cores you have on your server
- the processing pattern of your Production Workload
- data safety constraints related to keep Binlog enabled or not
and maybe also on few others, but I'll stop on these 3 for the moment ;-))
Number of CPU Cores
While generally a cost of adding a new thread to your code processing is pretty low, to deliver an efficient processing on the System with Few CPU Cores requires a different strategy comparing to Many CPU Cores System:
- when each thread is doing its independent work, it's all going fine, it'll just scale according your overall CPU capacity
- however, the story will change once you have ping-pong dependency between tasks executed by different threads
- in fact with a low number of CPU Cores you can only run a low number of threads on the same time -- so, you can get higher processing efficiency if you'll allow to execute more work to a thread-A which already got CPU time, rather delegate this work to thread-B which should be yet waked-up, which will do the work and then need to notify thread-A about..
All in all, in short -- on systems with few CPU Cores it could be more efficient to let user threads to do REDO job once they are already got CPU time (rather they send data changes first, and then wait when REDO thread will get CPU time on its turn to be able to do REDO job, and then wait on notification, and so on)..
NOTE : and this is not really either related to long-running or short transactions ! -- there is a work to be done, and this is about overall efficiency to execute this work.
Well, this is all "in theory", which may be still different "in practice", because it'll always depend on many various factors (where major part is still related to the code implementation anyway) -- So, it's a long time I wanted to validate (or reject) the above proposal by real testing, and the following is what I have to share about.
Test Systems
- BM.48 : 48cores-HT Intel Server, 192GB RAM, local NVMe
- BM.24 : 24cores-HT Intel Server, 256GB RAM, local NVMe
- E4.16 : OCI VM, 256GB RAM, BV Storage
And we'll try to simulate various range of small-to-big systems by binding MySQL Server process via "taskset" to the following numbers of CPU cores (whenever it's possible and makes sense):
- 4cores-HT
- 6cores-HT
- 8cores-HT
- 12cores-HT
- 16cores-HT
- 24cores-HT
- 48cores-HT
"-HT" means CPU Core Hyper Threads are enabled
Test Workloads
All test workloads are "generic" Sysbench-based test scenarios, using 100GB data set (1000W for TPCC and 50M x 8 tables for other OLTP tests):
- TPCC-1000W
- OLTP_RW
- OLTP_RW+ (same as OLTP_RW, but without range SELECTs)
- OLTP_RW++ (same as OLTP_RW, but SELECTs = only 1 point-select)
- UPDATEs bombarding
- INSERTs bombarding
you can find all the details about test details and exact execution options from the BMK-kit HOWTO doc : http://dimitrik.free.fr/blog/posts/mysql-perf-bmk-kit.html
All the tests are executed with a progressively growing load levels :
- concurrent users : 1, 2, 4, 8, ... 2048
- each load level remains running for 5min
Test Results
The following test results are obtained with MySQL 8.4 LTS using the new auto-configured defaults:
[mysqld] max_connections = 10000 innodb_dedicated_server = ON ### uncomment for REDO threads=OFF : # innodb_log_writer_threads=0 ### uncomment for Binlog=OFF : # skip_log_bin=1
The same tests are executed with Binlog=ON (default) and Binlog=OFF to clarify the impact of Binlog along with REDO threads=ON/OFF. The tests are executed "locally" on BM.24 server, and "over-Network" on BM.48 and E4.16 to be more close to Production conditions.
The labels used in graphs are :
-
RT0
: REDO threads=OFF -
RT1
: REDO threads=ON -
binlog0
: Binlog=OFF -
binlog1
: Binlog=ON
The results are presented in the following order :
- first all the tests executed with Binlog=OFF
- then the same, but with Binlog=ON
- and finally a single graph for each of the tests showing RT0/1 and binlog0/1 results together to get a full picture for each test case
NOTE : there are many graphs with various results, but you don't really need to focus on the absolute numbers to get an idea about the "tendencies" in the results and understand an overall impact of one or another config setting..
TPCC | Binlog=OFF
4 cores
8 cores
12 cores
16 cores
24 cores
48 cores
OLTP_RW | Binlog=OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
OLTP_RW+ | Binlog=OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
OLTP_RW++ | Binlog=OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
UPDATEs | Binlog=OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
INSERTs | Binlog=OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
TPCC | Binlog=ON
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
OLTP_RW | Binlog=ON
4 cores
8 cores
12 cores
16 cores
24 cores
48 cores
OLTP_RW+ | Binlog=ON
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
OLTP_RW++ | Binlog=ON
4 cores
8 cores
12 cores
16 cores
24 cores
48 cores
UPDATEs | Binlog=ON
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
INSERTs | Binlog=ON
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
TPCC | Binlog=ON/OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
OLTP_RW | Binlog=ON/OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
OLTP_RW+ | Binlog=ON/OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
OLTP_RW++ | Binlog=ON/OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
UPDATEs | Binlog=ON/OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
INSERTs | Binlog=ON/OFF
4 cores
6 cores
8 cores
12 cores
16 cores
24 cores
48 cores
SUMMARY
If you're running MySQL Server without Binlog (Binlog=OFF) :
- decision about switching REDO threads=ON/OFF will directly depend on how your workload is write-intensive
- with high level of writes you can get the benefit from having REDO threads=ON (current default in MySQL 8.4 LTS) even on small systems !
- and this is regardless of high or low concurrency
Then, if you're running MySQL Server with Binlog (Binlog=ON) :
- on small systems, up to 16cores, you may "generally" see better performance with REDO threads=OFF
- while on bigger systems the gap between REDO threads=ON/OFF becomes more and more lower, and you're more safe with keeping REDO threads=ON
- and this is again regardless of higher or lower concurrency, and just about the size of your system
- interesting that this is also mostly regardless if your workload is write-intensive or not -- Binlog impact aligns this to the same range
Keeping in mind that Binlog=ON by default since MySQL 8.0, the future auto-config implementation of MySQL defaults can be extended to auto-adaptive switching of REDO threads=ON/OFF according to the system size.
IMPORTANT : once again, defaults are defaults and results are results, while your own Production Workload can still be totally different ! -- so, always base your decision on your own experience !! -- and adapt your config settings only according to your own test results on your systems ! -- and if not in Production, then with test workloads which are really representative for your Production ! Rather consider any kind of external or generic test results as "information for your interest" and something to validate first by yourself, and only then apply if it's really valid for you..
Thank you for using MySQL !