MySQL Performance : more in depth with latin1 and utf8mb4 in 8.0 GA

Looking on my previously obtained results on Read-Only (RO) tests for latin1 and UTF8 charsets, one question continued to turn in my mind :

- if MariaDB 10.3 is hitting a so deep drop on "distinct-ranges" workload :

- why then this is not impacting the "mixed" OLTP_RO workload results (which is containing "distinct-ranges" query too) :

The answer was within the test title :

I've missed one zero in my scripts while preparing initial tests.. ;-))
so, the "distinct-ranges" test was using range size=10 (instead of 100 by default)
while "mixed" OLTP_RO remained with default settings, and used range size=100 for all range tests..
was the use of a smaller range size which that much impacted MariaDB ?..
(generally people are expecting to see wider range queries to be more impacting than the smaller ones, right ?)..

To clarify this all -- I decided to replay RO tests with a more detailed analyze..

However, few remarks :

I'll still use the same 48cores-HT Skylake server as in the previous tests
but this time I'll only replay the tests on full 48cores-HT config
(e.g. running only on 2CPU Sockets (2S), skipping 1CPU Socket (1S) tests)
this will no more "hide" scalability issues (if any)
(e.g. before, if on 1S QPS was higher than on 2S => I was always presenting a higher possible QPS for a given database version -- but this time there will be only 2S results)..

For ex. this is the case of MariaDB for the above test :

Comments :

this is the same mentioned "distinct-ranges" workload in question
the left side of the graph representing the result on 1S
and the right side - on 2S
as you can see, QPS result on 1S is higher than on 2S
so, MariaDB is not scaling on this workload up to 2S
MySQL 5.6 is in the same situation.. -- but it was so much done already since 5.6, right ? ;-))

But well, the most important in the whole story that you understand the "tendency" -- what is scaling, what is not ; what is really impacting and how much ; how MySQL 8.0 is now is fixing this all together and why ;-))

Re-Tested workloads scenarios :

Sysbench workloads : OLTP_RO "mixed", point-selects, simple-ranges, ordered-ranges, distinct-ranges
range size in "ranges" tests : 10, 100
dataset : 8 tables of 10M rows each
concurrent user sessions : 1, 2, 4, .. 1024
charsets labels :

"latin1" : latin1 with latin1_swedish_ci collation
"utf8mb4-generic" : utf8mb4 with utf8mb4_general_ci collation
"utf8mb4-unicode" : utf8mb4 with utf8mb4_unicode_ci collation

Why there are 2 test cases for utf8mb4 charset ? -- the difference is in "collection" details, because for "generic" use of utf8mb4 configuring utf8mb4_general_ci collection could be "good enough", however NOT "accurate" ! -- and for "accurate" support you have to use utf8mb4_unicode_ci collation (which represents a higher overhead)..

You can find all the details about from here :

- https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html

Fortunately, with MySQL 8.0 for utf8mb4 charset you can use utf8mb4_0900_ai_ci collation which is both "accurate" and much more efficiently implemented !

For my.conf details see the previous article.

Results with latin1

OLTP_RO 10Mx8tab "mixed" latin1 :

OLTP_RO 10Mx8tab "point-selects" latin1 :

OLTP_RO 10Mx8tab "simple-ranges" range=10 latin1 :

OLTP_RO 10Mx8tab "simple-ranges" range=100 latin1 :

OLTP_RO 10Mx8tab "ordered-ranges" range=10 latin1 :

OLTP_RO 10Mx8tab "ordered-ranges" range=100 latin1 :

OLTP_RO 10Mx8tab "distinct-ranges" range=10 latin1 :

OLTP_RO 10Mx8tab "distinct-ranges" range=100 latin1 :

Comments :

biggest InnoDB Read-Only scalability issues were fixed in MySQL 5.7
all vendors who adopted InnoDB 5.7 code base got a benefit from these changes (MariaDB included)
so the gap comparing to MySQL 5.6 here is just "as expected"
seeing lower QPS on all "ranges" tests with a bigger range size is also "as expected" (reading x10 times more rows within a single query for sure has an additional cost, right ?)
however, what is not expected for me, is to see MariaDB not scaling at all on "distinct-ranges" workload..
using range size=100 just helping to "hide" the problem, while range size=10 is rather pointing on it clearly !
seems like I have to add the range size=10 to all my further tests since now, and also execute "mixed" OLTP_RO both ways (range size=100 (default) and 10)..
NOTE : what kind of queries are sensible to be impacted by pointed problem ? -- at least :

any "SELECT distinct ..." queries
and "SELECT ... group by ..."

not cool..

Results with utf8mb4-generic

OLTP_RO 10Mx8tab "mixed" utf8mb4-generic :

OLTP_RO 10Mx8tab "point-selects" utf8mb4-generic :

OLTP_RO 10Mx8tab "simple-ranges" range=10 utf8mb4-generic :

OLTP_RO 10Mx8tab "simple-ranges" range=100 utf8mb4-generic :

OLTP_RO 10Mx8tab "ordered-ranges" range=10 utf8mb4-generic :

OLTP_RO 10Mx8tab "ordered-ranges" range=100 utf8mb4-generic :

OLTP_RO 10Mx8tab "distinct-ranges" range=10 utf8mb4-generic :

OLTP_RO 10Mx8tab "distinct-ranges" range=100 utf8mb4-generic :

Comments :

indeed, moving to UTF8 is changing the game..
however, as soon as your buffers are configured big enough to not involve temp.tables for row processing, most of queries may still run not much slower than latin1 if they are not doing row comparisons (sorting, grouping, etc.) -- this is a clear case for "point-selects", and pretty similar for "simple-ranges"..
while for "ordered-ranges" and "distinct-ranges" the story is different -- and you can see much more bigger impact there ! -- as well, with a bigger range size more rows are processed, which may only yet more amplify the overhead..
and again, it was "generic" case.. -- if you need a real "accuracy", the impact then on these "sensible" queries is even more big ! -- just look on the results below..

Results with utf8mb4-unicode

OLTP_RO 10Mx8tab "mixed" utf8mb4-unicode :

OLTP_RO 10Mx8tab "point-selects" utf8mb4-unicode :

OLTP_RO 10Mx8tab "simple-ranges" range=10 utf8mb4-unicode :

OLTP_RO 10Mx8tab "simple-ranges" range=100 utf8mb4-unicode :

OLTP_RO 10Mx8tab "ordered-ranges" range=10 utf8mb4-unicode :

OLTP_RO 10Mx8tab "ordered-ranges" range=100 utf8mb4-unicode :

OLTP_RO 10Mx8tab "distinct-ranges" range=10 utf8mb4-unicode :

OLTP_RO 10Mx8tab "distinct-ranges" range=100 utf8mb4-unicode :

Comments :

as expected, point-selects and simple-ranges are not more impacted than before..
however, we cannot say the same about all other workloads ;-))
and MySQL 8.0 is the only one doing the best in all test cases !

INSTEAD OF SUMMARY

the biggest InnoDB scalability issues on Read-Only workloads were fixed since MySQL 5.7
all vendors who adopted InnoDB 5.7 code base, benefit the same improvements
MariaDB 10.3 still has scalability issues, the most dramatic ones are on distinct-ranges (impacting your "SELECT distinct .." and "SELECT .. group by .." queries).. -- and yet more bigger impact to expect if you're planning to use UTF8..
MySQL 8.0 is going further by defining a new gold standard on keeping highest QPS not only for latin1, but also for utf8mb4 charset !

Thank you for using MySQL ! -- stay tuned ;-))

Rgds,

-Dimitri

DimitriK's (dim) Weblog

MySQL Performance : more in depth with latin1 and utf8mb4 in 8.0 GA