Last modified: 2009-04-18
MySQL Performance: Perf Version (zero build)
by Dimitri
SSC Team, 2009
Sun Microsystems Inc.
This information is provided for guidance only and does not commit Sun Microsystems.
Customer Name(s): SSC
Keywords: MySQL, MySQL PerfVersion, InnoDB, Percona, XtraDB, db_STRESS
- M8000: 16CPU SPARC64-VI 2200MHz, dual-core dual-thread, 256GB RAM, 2x HBA FC-port 2Gbit
- ST6140: 2x LUNs (RAID1 1TB each), 2Gb Fiber Channel connection, each LUN is connected by its own fiber channel to the server
- MySQL v.5.1.30 64bit
Overview: Recently the Sun/MySQL Perf Project announced a probe release of the very first MySQL Perf version code. The new features are experimental, but they looked promising and solve performance problems slightly differently compared to other approaches...
So, I absolutely wanted to try it and see how well it would keep up with a db_STRESS workload compared to the default InnoDB plugin, as well as the previously tested XtraDB engine.
- Compare Perf version vs other InnoDB "variants"
- Check new scalability limits
- Find if any new way to improve performance again
Result(s): - see SUMMARY :-)
The db_STRESS test scenarios are still the same (see the previous reports for details).
However, the redo logs will be placed on a separate LUN of the storage box, as was done in the final configuration of the XtraDB testing.
From the first probe test, the result on the read-only workload is absolutely exciting! The Perf version easily outperforms the InnoDB plugin as well as XtraDB, by 50%!!
At the same time, it seems more investigation will be needed on the read+write load...
Read-Only Probe Stress Test: XtraDB vs Perf
- 9,000 TPS max for Perf version
- 6,100 TPS max for XtraDB (from previous tests on this machine)
- Perf version easily outperforms XtraDB! (and the default InnoDB as well)
Read+Write Probe Stress Test: XtraDB vs Perf
- In the first steps the Perf version performs better than XtraDB
- Then, starting from 4 concurrent sessions, the performance level suddenly drops...
- Where is the problem?..
- A more in-depth investigation is needed here...
Investigating on Read+Write performance...
Analyzing the probe read+write tests in more detail, I was surprised to see a better result within a "cold" run compared to the "warm" one (meaning with a cold vs. warm InnoDB cache)...
Until now, to avoid any I/O and other external "secondary effects" when comparing test results, I've executed all tests in the following order:
- restart MySQL server to free buffer cache
- run "cold" read-only (RW=0) test with empty cache
- run "cold" read+write (RW=1) test with zero dirty pages
- run "warm" read-only (RW=0) test with partially filled cache
- run "warm" read+write (RW=1) test with partially filled cache
As the "warm" tests usually showed better results, I always used only them to compare engines... But in this case "cold" outperformed "warm" - how?..
To reach a common "Compatible Base" for all tested InnoDB variations, there are at least 2 straightforward solutions:
Increase the execution time of each read+write test:
- (+) free buffers will run out in all cases on a longer test
- (-) still no guarantee it'll generate the same conditions :-)
- (-) and the global execution time will be increased as well...
Vary the size of the buffer pool:
- and I prefer this one, even if it'll still generate more work again :-)
- my idea is to replay all the tests again with buffer pool = 6GB and 12GB
- a 12GB pool will give enough space to run all tests while still having free pages
- a 6GB pool will force all tests to run without free buffers for sure, under the same bad conditions
- comparing the performance impact between 12G and 6G buffers may give new ideas for further optimizations ;-)
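A quick page-count sketch shows why these two sizes sit on opposite sides of the working set (assuming InnoDB's default 16KB page size; the working-set figures are the rough ones observed during the tests):

```python
# Page arithmetic for the two buffer pool sizes.
# Assumes InnoDB's default 16KB page size.
PAGE_SIZE = 16 * 1024
GB = 1024 ** 3

def pool_pages(pool_bytes):
    """Number of pages a buffer pool of the given size can hold."""
    return pool_bytes // PAGE_SIZE

pages_12g = pool_pages(12 * GB)  # 786,432 pages (~the 800,000 reported)
pages_6g = pool_pages(6 * GB)    # 393,216 pages

# Rough working set seen during the tests: ~400,000 cached data pages
# plus up to ~100,000 modified pages.
working_set = 400_000 + 100_000

print(pages_12g > working_set)  # True: 12G keeps free buffers
print(pages_6g < working_set)   # True: 6G is guaranteed to run out
```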
All tests here run with a large enough number of free buffers.
Perf version is outperforming all other candidates!
Read-Only Stress Test @Pool=12G: Perf vs XtraDB vs InnoDB
- Perf version is a true winner! :-)
- near 100% gain compared to the default InnoDB plugin!
Read+Write (RW=1) Stress Test @Pool=12G: Perf vs XtraDB vs InnoDB
- Perf version is still a strong winner here, even if the TPS gap is not as significant as on the read-only workload :-)
- NOTE: RW=1 means there is one write transaction for every read
Read+Write (RW=10) Stress Test @Pool=12G: Perf vs XtraDB vs InnoDB
- Perf version is still a strong winner here!
- the TPS gap is mostly due to better response time on read operations in the Perf version
- NOTE: RW=10 means there is one write transaction for every 10 reads
All tests here run with the buffer pool out of free buffers (except the read-only workload, so no need to present it again)
Read+Write (RW=1) Stress Test @Pool=6G: Perf vs XtraDB vs InnoDB
The Perf version only partially outperforms the other candidates here; with a higher load it becomes more equal, comparable with XtraDB...
Read+Write (RW=10) Stress Test @Pool=6G: Perf vs XtraDB vs InnoDB
However, at the RW=10 ratio the Perf version still shows a huge performance gap due to its more efficient concurrency management, which significantly decreases read transaction response time.
Analyzing more and more in depth, I may now summarize my observations on the free buffer impact:
- note1: all read+write tests are running with innodb_flush_log_at_trx_commit=2
- note2: the disk storage is not even 100% busy; there is no true storage-related I/O bottleneck
- however, a severe performance drop is observed once the buffer pool is out of free pages
- all "classic" explanations of this degradation I've found say: "it's normal - once the buffer pool is out of free pages, the engine starts to flush dirty pages to make room, and usually it meets an I/O bottleneck and performance drops..."
- according to my observations it's not really true:
- there are no increased I/O write operations (they even decrease)
- the storage box is not overloaded...
- HOWEVER: the read I/O operations come back!!
- and in my understanding, here is what's going on:
- instead of flushing dirty pages, the engine removes cached pages!
- which results in physical reads from disk (probably mostly random)
- a random read here is much more costly than a write! (the storage has a battery-protected write cache)
- and I think these reads are the main source of the performance degradation!
Read+Write (RW=1) Stress Test Perf version: impact 12G vs 6G pool
Read+Write (RW=1) Stress Test XtraDB: impact 12G vs 6G pool
More about the "free buffer issue".
I think there is a wrong logic somewhere (probably a bug, or something is missing in the design, etc.) - I know it sounds strange, but let's follow the details:
- initially the buffer pool is free (800,000 pages)
(see graph above)
- after 20min of read-only activity, 50% of the buffer space is used
to cache data; no more reads from disk; max performance reached
(database-pages = 400,000 pages)
- now writes are added to the reads:
- the modified-db-pages value increases with the growing workload (up to 100,000 pages)
- the free-buffers value decreases nearly proportionally
- the database-pages value increases slightly
- once the load is finished:
- dirty pages are fully flushed
- the modified-db-pages value goes to zero
- the free-buffers value remains the same
- the database-pages value remains the same too
- I may still accept that "modified-db-pages" is just a marker, and setting it
to zero doesn't mean any buffers are freed, etc. Why not, even if the increased value of
database-pages doesn't completely match the gap. BUT! During my workload mostly
the same pages are modified all the time! So, whatever is happening, it should become
stable with time, no?.. However, if I repeat the same read+write load 2 or 3 times
again - I'm out of free buffers!
- My feeling is that something is generally wrong with the buffer management,
and probably garbage is not removed on time (or something is going wrong with the counters?)
How do you explain it?..
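To make the expectation concrete, here is a toy bookkeeping model (purely hypothetical, not InnoDB's actual code) in which dirtying an already-cached page never consumes an extra free buffer; under such logic, repeating the same read+write load should leave the free-buffers count stable after the first pass:

```python
# Toy buffer pool bookkeeping: a page takes a free buffer only on its
# first access; modifying an already-cached page costs nothing extra.
class ToyPool:
    def __init__(self, total_pages):
        self.free = total_pages
        self.cached = set()

    def touch(self, page_id):
        if page_id not in self.cached:
            self.cached.add(page_id)
            self.free -= 1  # first access consumes a free buffer

    def modify(self, page_id):
        self.touch(page_id)  # dirtying a cached page is free

pool = ToyPool(total_pages=800_000)
for _ in range(3):               # repeat the same read+write load 3 times
    for page in range(100_000):  # the same "hot" pages every time
        pool.modify(page)

print(pool.free)  # 700000: stable after the first pass, unlike the real runs
```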
Honestly, today the biggest I/O problem for any database is the random read.
We have tons of solutions to optimize log writes and writes in general, but
there is nothing to optimize a random read except keeping the data in the database
cache! Otherwise you have to really read the data from the storage, and it's the
longest I/O operation you may request, just because the storage box has exactly
the same problem! :-))
- the InnoDB buffer pool is out of free buffers
- getting free pages by removing some data from the "read cache" is the worst possible idea here...
- it brings new random reads and kills the global TPS level (a 5ms read is killing compared to 0.2ms writes)
As you may see from the graphs, a stable 5,000 TPS vs. ~3,000 TPS avg is a huge difference!
Probably we may try an option here saying "I prefer to flush harder and keep
my cache warm"? (which is what innodb_max_dirty_pages_pct is supposed to do normally, but
it's not the case here)... And again, there seems to be something to do with the "priority"
of the pages to be removed...
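The 5ms vs. 0.2ms figures quoted above are enough for a back-of-envelope model of the damage (illustrative arithmetic only, not a measurement):

```python
# Average I/O wait per transaction when a fraction of reads miss the
# buffer pool and hit the disk. Latencies are the rough figures above.
DISK_READ_MS = 5.0  # random read from storage
WRITE_MS = 0.2      # write absorbed by the battery-backed write cache

def avg_io_ms(miss_ratio, reads_per_tx=1, writes_per_tx=1):
    """Average milliseconds of I/O wait per transaction;
    cache hits are counted as free."""
    return reads_per_tx * miss_ratio * DISK_READ_MS + writes_per_tx * WRITE_MS

print(avg_io_ms(0.0))  # 0.2 ms: fully cached, only the cheap write remains
print(avg_io_ms(0.1))  # 0.7 ms: just 10% disk reads more than triple the wait
```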
InnoDB Concurrency Management
Due to the high number of locks, having a concurrency limit within InnoDB is a good thing (see: http://dimitrik.free.fr/db_STRESS_BMK_2008.html#note_5231 )
At the same time, all InnoDB concurrency implementations only take care of "internal contentions" and forget about "external" factors:
- the default concurrency model is pretty well explained here: http://www.mysqlperformanceblog.com/2006/06/05/innodb-thread-concurrency/
- the Perf version avoids mutexes and is based on self-sleep timers (Mikael can explain it in much more detail than anyone :-))
- but anyway, no model seems to take care of waits on external events
- from my observations, an "active" thread is not removed from the "active list" while it's doing a physical read from disk!
- that's why threads that are active but waiting on a read break the performance level and block other threads from working...
- it's clearly seen on the read-only workload with a "cold" buffer cache
- at the same time, with innodb_thread_concurrency = 0 there are no performance "jumps"
Probably I need to give more details here about these performance "jumps" :-)
Let's take an example:
- Let's say we already have 16 concurrent sessions running and reading from
the database; over time most of the data sits in the buffer pool (probably
a very optimistic case, but let's keep it), and our innodb_thread_concurrency = 16
- Now 16 new concurrent sessions arrive and start to read too; the buffer pool
still has plenty of free space, so the new sessions have enough room to cache their data;
- What may we expect in this situation:
- The average response time may increase due to high initial response times for the new sessions
- The TPS level should stay at least the same, as all the "old" sessions are reading
from the cache and should not get any penalty from the "new" ones
- Instead, the TPS level drops dramatically... (The same situation is observed
with XtraDB as well as with the original InnoDB plugin.) Only with time does it increase again and reach its previous level
- NOTE: it doesn't happen with concurrency = 0 (but there are other problems with 0 :-))
- I'm pretty sure that if a thread starting a disk read() were removed from the active list, it would improve global performance a lot!
What do you think?..
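This "active but waiting on disk" effect can be illustrated with a tiny capacity model (purely illustrative numbers and logic, not InnoDB's actual implementation): with 16 concurrency slots, even a modest cache-miss ratio wastes most of the capacity if the slot is held across the disk wait.

```python
# Capacity model: N concurrency slots shared by operations that either
# hit the cache (fast) or need a physical read (slow). If a thread
# keeps its slot while waiting on disk, the slot does no useful work.
SLOTS = 16
CACHED_MS = 0.2  # operation served from the buffer pool
DISK_MS = 5.0    # operation waiting on a random disk read

def ops_per_sec(miss_ratio, release_on_read):
    # Average time a slot is occupied per operation.
    if release_on_read:
        occupied = CACHED_MS  # slot is handed back during the disk wait
    else:
        occupied = miss_ratio * DISK_MS + (1 - miss_ratio) * CACHED_MS
    return SLOTS / occupied * 1000

print(round(ops_per_sec(0.1, release_on_read=False)))  # 23529
print(round(ops_per_sec(0.1, release_on_read=True)))   # 80000
```

In this toy model, a 10% miss ratio cuts throughput by more than 3x when slots are held across the disk wait, which matches the shape (not the exact numbers) of the drops seen in the graphs.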
Example of COLD Read-Only Test: XtraDB vs Perf version
The 1600-user test is more of a "weakness test":
- the number of concurrent sessions grows with time
- each session sleeps 1 sec. between transactions
- ideally, all 1600 users should get their processing time
- on the read-only workload it's OK with limited InnoDB concurrency (1600 TPS reached at the end)
- on read+write the load is twice as high, and it's quite hard to obtain 3200 TPS at the end
- however, some engines perform better here than others...
- XtraDB seems to be the most stable here!
- Perf version is losing for the moment...
- The concurrency management still needs to be improved
- As the read-only workload performs even better at 16 cores with the Perf version, I tried to see if having more cores might help here: as you may see, it's even worse...
Well, there is still room for improvement! Let's take it as a challenge :-)
Test 1600 Users Read+Write (RW=1): InnoDB default @8cores
Test 1600 Users Read+Write (RW=1): XtraDB @8cores
Test 1600 Users Read+Write (RW=1): Perf version @8cores
Test 1600 Users Read+Write (RW=1): Perf version @16cores
Here you may find more detailed information about all previously described tests.
- see InnoDB CPU Usage: the Perf version has the smallest Sys% CPU usage!
- Perf version has fewer spin locks
- InnoDB Log Writes/sec is quite impressive in some cases (and gives you a real idea of what to expect from your storage once innodb_flush_log_at_trx_commit is set to 1 :-))
- 24 cores seems to be too much for the moment :-)
- the redo logs are placed on a separate storage LUN, so you may easily see the level of I/O activity on the data and the redo logs.
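For reference, the settings varied across these tests would look roughly like this in my.cnf (an illustrative fragment: the file paths are hypothetical, and only the listed values were actually changed between runs):

```ini
[mysqld]
innodb_buffer_pool_size        = 12G   # also tested with 6G
innodb_flush_log_at_trx_commit = 2     # used for all read+write tests
innodb_thread_concurrency      = 16    # 0 = unlimited (also tested)
# redo logs on their own LUN, separate from the data files:
innodb_log_group_home_dir      = /lun2/redo   # hypothetical path
datadir                        = /lun1/data   # hypothetical path
```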
STATs Tests 12G Pool, 8 cores, tx=2
STATs Tests 12G Pool, 8 cores, tx=2
STATs Tests 6G Pool, 8 cores, tx=2
STATs Tests 6G Pool, 8 cores, tx=2
STATs Tests with 16/24 cores, and concurrency= 16/32
STATs Tests with 16 cores
- NOTE: XtraDB was already better compared to the default InnoDB plugin (see the previous report)
- Perf version outperforms XtraDB by 50% on the read-only workload at 8 cores!
- Perf version outperforms XtraDB by 20% on the read+write workload at 8 cores!
- On 16 cores the Perf version continues to increase performance, while XtraDB and the default InnoDB drop dramatically! (2-3x slower)
- On 16 cores and the read-only workload, the Perf version is twice(!) as fast as XtraDB on 8 cores!
- The Perf version is not keeping up with the workload on the 1600-user test (while it's less of a problem for XtraDB and InnoDB)
- enabling the default concurrency management (available via a my.cnf option) may be used as a workaround here for the moment...
Need to investigate:
- The free-buffers impact is quite severe: need to investigate why cached pages are removed prior to (or instead of) flushing dirty pages (or is flushing perhaps not going fast enough?? At the same time the storage is still not fully used)...
- The concurrency model should be improved to keep 1600 users at least as well as the default InnoDB engine does.
- Concurrent throughput may be improved a lot if an active thread starting a disk read() removes itself from the "active list" beforehand and goes back to the queue: a disk read takes much more time than reading from the cache, and yielding may unlock another thread waiting for CPU time.
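The last point can be sketched as a generic admission-control pattern (a hypothetical sketch using a counting semaphore, not InnoDB's actual internals): the thread gives its "active" slot back just before blocking on the disk read, and re-queues afterwards.

```python
import threading

# At most 16 threads are counted as "active" at any moment.
active_slots = threading.BoundedSemaphore(16)

def read_page(page_id, is_cached, cache_read, disk_read):
    """Read a page, holding an active slot only while doing CPU work."""
    active_slots.acquire()
    try:
        if is_cached(page_id):
            return cache_read(page_id)
        # Leave the active list before blocking on disk, so another
        # runnable thread can use the slot during the I/O wait.
        active_slots.release()
        try:
            data = disk_read(page_id)
        finally:
            active_slots.acquire()  # go back to the queue after the read
        return data
    finally:
        active_slots.release()
```

Here the semaphore plays the role of the "active list": the disk-bound thread rejoins the queue only once its data is actually in memory.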