dim_STAT User's Guide. by Dimitri |
Add-On Statistics |
One of the most powerful features of dim_STAT is the ability to integrate your own statistic programs with the tool. Once added, they will be considered by dim_STAT as being the same as the standard set of STAT(s) and give you the same kind of service: Online Monitoring, Up-Loading, Analyzing, Reporting, etc. However, the choice of external stat programs is so wide that it's quite impossible to design a wrapper for each and every format. Therefore, I've decided to limit the input recognizer to just 2 formats (which covers maybe 95% of needs) and leave it to you to write, if necessary, your own wrapper and modify the output to one of the supported formats. Formats supported by dim_STAT:
- SINGLE-Line: with one output line per measurement (ex: vmstat)
- MULTI-Line: with several output lines per measurement (ex: iostat)

To be correctly interpreted, your stat program should produce a stable output. This means the same format for data lines, at least one data line per measurement in the MULTI case, a constant time interval, etc. Lines not containing data have to be declared, so that they can be ignored by dim_STAT. NOTE: lines shorter than 4 characters are considered as "spam" and will be ignored! Let's look at some examples...
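To make the two accepted shapes concrete, here is a minimal sketch of what a wrapper's stdout could look like in each format (all names and numbers below are invented for illustration, not from any real stat program):

```shell
# SINGLE-Line: one data line per measurement (two numeric columns here)
single_line() {
  printf '  %d %d\n' 42 100
}

# MULTI-Line: a header line plus one line per monitored object
multi_line() {
  echo 'disk  reads  writes'
  printf '%-5s %5d %6d\n' sd0 10 5
  printf '%-5s %5d %6d\n' sd1 3 1
}

single_line
multi_line
```

Whatever the source of your data, as long as your wrapper keeps printing lines of one of these two shapes at a constant interval, dim_STAT can parse it.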
Example of SINGLE-Line command integration |
Let's assume we want to monitor the read/write cache hit rate on the system. This information can be retrieved using "sar":

$ sar -b 1 1000000000000000
SunOS sting 5.9 Generic_112233-05 sun4u    07/09/2004

18:10:13 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
18:10:14       0       1     100       0       0     100       0       0
18:10:15       0      14     100       0       0     100       0       0
18:10:16       0       7     100       0       0     100       0       0
18:10:17       0       0     100       0       0     100       0       0
18:10:18       0       0     100       0       0     100       0       0
18:10:19       0     135     100       0       0     100       0       0
18:10:20       0       0     100       0       0     100       0       0
18:10:21       0      69     100       0       2     100       0       0
18:10:22       0      86     100       0       2     100       0       0
18:10:23       0       0     100       0       0     100       0       0
18:10:24       0       0     100       0       0     100       0       0
18:10:25       0       0     100       0       0     100       0       0
...

What we are interested in are the 4th and 7th columns of the sar output, while ignoring any lines containing "*SunOS*" or "*read*". Following the "Integrate New Add-On-STAT" link:
Step 1: FIRST INFO |
Let's give the new Add-On the name CacheHIT. We need only 2 columns from the output line (4th and 7th value). This is a "Single-Line" output... Click on "New"...
Step 2: INTEGRATION |
During this step we need to explain what we want to run and which information we'll need:

Description: CacheHIT via SAR
Shell Command: sar -b %i 1000000000000000
Ignore Lines: any lines containing "*SunOS*" or "*read*"
Data Descriptions:
- During execution of sar %i will be replaced with the time interval in seconds.
- The command name doesn't matter here because it is only used as an alias for the STAT-service. Have a look at the "access" file section: it is possible to name the shell command "toto" and map it in the access file to /usr/bin/sar as an alias.
Create!!
- ColumnName - leave it as it is, if you don't need to access the database directly. Note: there are 2 reserved columns for Collect-ID and measurement No.
- Data Type - if you're not sure, set it to "Float"; otherwise use "Int"
- Column# on input - in our case we need columns 4 and 7
- Short Name - single word descriptions, here %rcache and %wcache
- Full Name - description to be used where detailed information is needed
- Use in Multi-Host - if you choose "Yes" the corresponding value will be automatically enabled in Multi-Host mode for analyzing several hosts at once.
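Before creating the Add-On, it is worth dry-running the column extraction that dim_STAT will perform. The awk one-liner below is only an illustration of the "ignore lines + pick columns 4 and 7" logic, not dim_STAT's actual parser; the sample is a shortened copy of the sar transcript above:

```shell
# a captured sar sample: a SunOS banner, a header (contains "read"),
# and one data line
sample='SunOS sting 5.9 Generic_112233-05 sun4u
18:10:13 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
18:10:14 0 1 100 0 0 100 0 0'

# drop the declared "ignore" lines, keep columns 4 (%rcache) and 7 (%wcache)
printf '%s\n' "$sample" | awk '!/SunOS/ && !/read/ { print $4, $7 }'
```

Note that the header line is caught by the "*read*" pattern (it contains "bread/s"), which is exactly why those two ignore patterns are declared.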
Created! |
What's Next? Will it work now? Yes! IF YOU DID NOT FORGET to give your STAT-service access to this new command! This is a very common error. If you want to collect "CacheHIT" data from server "S", be sure that the STAT-service on "S" is given execution permission for the "sar" command. Add the following lines to your /etc/STATsrv/access file:

# CacheHIT Add-On command
sar /usr/sbin/sar
#

And now it'll work! :-)) NOTE: for security reasons and for a cleaner "stat to command" relationship, it is preferable to create a specific script 'CacheHIT.sh' for our new add-on, and then use it instead of direct access to the 'sar' command. The Add-On shell command then needs to be changed to "CacheHIT %i". Example:

$ cat /etc/STATsrv/bin/CacheHIT.sh
#!/bin/ksh
exec /usr/sbin/sar -b $1 1000000000000000

$ CacheHIT.sh 5
...

$ tail -3 /etc/STATsrv/access
# CacheHIT Add-On command
CacheHIT /etc/STATsrv/bin/CacheHIT.sh
#
Anti-Spam Filter |
IMPORTANT: There is an anti-spam filter feature that is always active during data collecting. It rejects any input line shorter than 4 characters. If your newly made stat command prints only one small column of numbers, you need to add leading spaces to ensure the data is accepted by dim_STAT.
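A minimal sketch of such padding, assuming a wrapper that prints one small number per line (the numbers are invented): awk prepends three spaces, so every line reaches the 4-character minimum.

```shell
# pad a one-column output with leading blanks so every line is >= 4 chars
printf '%s\n' 10 50 40 | awk '{ printf( "   %s\n", $0 ) }'
```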
MULTI-Line Add-On command integration |
Multi-Line integration is quite similar to Single-Line, except for a few additional things:
- Line Separator pattern: this is by default "new-line", but in some cases it can be a header (like iostat)
- Attribute Column: very important! As you have several lines per measurement, you need to distinguish them by something (like the "diskname" column in iostat).
- Use In Multi-Host: more than a simple Yes/No - you should choose SUM and/or AVG for the collected values.
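The three points above can be illustrated with a tiny fake generator (all names and values invented): the repeated header line plays the role of the Line Separator pattern, and the first column ("disk") plays the role of the Attribute Column distinguishing the lines of one measurement:

```shell
# two fake measurements in MULTI-Line shape: each measurement starts
# with the same header line and contains one line per disk
for i in 1 2; do
  echo 'disk  reads  writes'
  echo "sd0   1$i     5"
  echo "sd1   3      1"
done
```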
REAL LIFE EXAMPLE... |
To get an even better feel for the Add-On integration process in dim_STAT, let me tell you a real-life story that happened this year with one of our customers.. So well, once they had understood with dim_STAT what was going on with the system and storage, the customer decided to finally bring more light on what was going wrong (or well) in their application too.. Initially they wrote a lot of debug messages into their log files, but nothing really useful for understanding what was going wrong.. Also, the more data they wrote to the log files, the slower the application worked :-) normal, no? So, as a first step they simplified the logging and got a single file: /var/tmp/appstats.log. Every N seconds a new line was appended to this file, containing just 3 numbers; the last one (the one we're interested in) is the avg TPS during the last time period of M seconds (bigger than N):

# tail -5 /var/tmp/appstats.log
10:17 5 20
10:20 7 30
10:23 2 50
10:26 8 30
10:30 1 10
#

The customer then created a simple monitoring script, AppStats.sh:

# AppStats.sh 5
10
50
40
20
30
^C
#

Within a few minutes the customer had integrated this new stat command as a dim_STAT Add-On, but... 15 minutes later it still had not collected any data... WHY?...
Common Error #1 |
The first problem: the output line is very short, and lines shorter than 4 characters are ignored by the anti-spam filter (as mentioned before)! All we need is to add 3 blank characters at the beginning of each line. Let's have a look at the script source:

#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   tail -1 /var/tmp/appstats.log
   sleep $1
done | awk '{ printf( "%d\n", $3 ) }'
#================================================

Just add 3 spaces before the %d in { printf( "%d\n", $3 ) } and it'll be OK:

#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   tail -1 /var/tmp/appstats.log
   sleep $1
done | awk '{ printf( "   %d\n", $3 ) }'
#================================================

The script output now is:

# AppStats.sh 5
   10
   50
   40
   20
   30
^C
#
Common Error #2 |
But that's not all! It still won't work!... Why?.. Because the output of this script is not regular yet!... To check it (as with any other script), just execute it the same way but piped to 'more':

# AppStats.sh 5 | more

...10 minutes later there will still be no output at all! And that's exactly what happens when the STAT-service tries to send data to the dim_STAT server via a process pipe... What is wrong here?.. The problem is inside the script: its output is piped into the 'awk' program, and 'awk' itself is not flushing its output - data stays buffered until the whole 'awk' buffer is filled, and only then is it flushed to the pipe... How to fix it?
- add an fflush() instruction into the awk script (depending on your 'awk' version), or
- change the script so that the 'awk' call is inside the loop.

Updated script:

#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   tail -1 /var/tmp/appstats.log | awk '{ printf( "   %d\n", $3 ) }'
   sleep $1
done
#================================================

As 'awk' finishes on each loop pass, the data is always flushed and enters the pipe on each iteration.
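If your awk supports fflush() (gawk and most modern awk implementations do), the original one-pipe layout can also be kept. A sketch replaying two fake log lines through the fixed filter:

```shell
# force awk to flush after every line so each value reaches the pipe
# immediately instead of sitting in awk's output buffer
printf '%s\n' '10:17 5 20' '10:20 7 30' |
awk '{ printf( "   %d\n", $3 ); fflush() }'
```

With real data the left side of the pipe would be the while/tail loop; the replay above only demonstrates that fflush() preserves the padded per-line output.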
Continue improvement... |
So, the customer copied the new script into /etc/STATsrv/bin on all the needed servers and added to the end of their /etc/STATsrv/access files:

# AppStats add-on command
AppStats /etc/STATsrv/bin/AppStats.sh
#

On the dim_STAT side the Add-On was integrated as:
- Single-Line
- name: AppStats
- 1 column
- shell command: "AppStats %i"
- value: integer, 1st position, name: TPS

And we started to collect the first data... Within the first 40 minutes, once the customer had fully enjoyed graphing their application TPS levels, one of the developers said it would be nice to also see the avg response time!.. Within an hour they extended their log file line with an additional value showing the avg RespTM. The new script, now showing one more value:

#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   tail -1 /var/tmp/appstats.log | awk '{ printf( "   %d %d\n", $3, $4 ) }'
   sleep $1
done
#================================================

Then we re-integrated the same script, but now describing 2 columns from the output. And it worked just fine!.. Should I say that during the next few hours they already wanted to add 3 more new columns! :-))
And finally... |
Finally, it was hard for the developers to decide how many stat values they would need on each server, because it depends on the application deployment as well as on the server role.. So, they understood how to extend their script with any other values, but preferred to avoid the Add-On integration step every time they added a new value to their log file.. Well.. nothing is impossible :-) The only way to have a "dynamic" stat list is to improve the AppStats script so that it works like a Multi-Line stat command (just as 'iostat' may show more or fewer disks according to your server configuration).. The idea is simple: turn this output:

# AppStats.sh 5
 TPS AvgTM Users Active
  30    20   200     40
  40    20   200     50
^C
#

into multi-line:

# AppStats.sh 5
 Name     Value
 TPS        30
 AvgTM      20
 Users     200
 Active     40

 Name     Value
 TPS        40
 AvgTM      20
 Users     200
 Active     50
^C
#

And according to the needs, the log file may contain at the same time the value names as well as the values themselves:

# tail -2 /var/tmp/appstats.log
11:12 33 TPS 30 AvgTM 20 Users 200 Active 40
11:22 33 TPS 40 AvgTM 20 Users 200 Active 50

The new script version:

#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   echo " Name     Value"
   tail -1 /var/tmp/appstats.log | awk '{ printf( " %-8s %3d\n %-8s %3d\n %-8s %3d\n %-8s %3d\n\n", $3, $4, $5, $6, $7, $8, $9, $10 ) }'
   sleep $1
done
#================================================

This script may now be integrated as a Multi-Line Add-On, having 2 columns on the output... And even if the script is extended again with other values, they will just extend the list of lines with names and values.
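You can replay a single log line through the name/value conversion to check the output shape before integrating the wrapper (here extended to all 4 name/value pairs, fields $3 through $10 of the sample log line above):

```shell
# replay one appstats.log line through the Multi-Line conversion:
# each name/value pair becomes one output line
echo '11:12 33 TPS 30 AvgTM 20 Users 200 Active 40' |
awk '{ printf( " %-8s %3d\n %-8s %3d\n %-8s %3d\n %-8s %3d\n\n", $3,$4,$5,$6,$7,$8,$9,$10 ) }'
```

Four data lines come out, one per stat name, which is exactly the shape the Multi-Line recognizer expects between two header separators.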
Pre-Integrated Add-Ons |
To make your life easier, several additional stat programs come pre-integrated (Oracle, Java, Linux, etc.).
They are all installed by default on your dim_STAT server, BUT not all of them are enabled in your STAT-service by default - only commands that need no additional checking are enabled!... As a rule, first check that the add-on works correctly by starting it directly from the STAT-service bin directory on the client side (/etc/STATsrv/bin), and only then enable it via the access file (usually by a simple uncomment in /etc/STATsrv/access)...
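Assuming the disabled entries are simply commented out (as suggested above), enabling one can be scripted. This sketch works on a temporary copy rather than on the real /etc/STATsrv/access, and uses the CacheHIT entry from the earlier example:

```shell
# make a throwaway copy of an access file containing a disabled entry
access=$(mktemp)
printf '%s\n' '#CacheHIT /etc/STATsrv/bin/CacheHIT.sh' > "$access"

# "uncomment" the entry, i.e. strip the leading '#' from its line
sed 's/^#CacheHIT/CacheHIT/' "$access"

rm -f "$access"
```

On a real host you would edit /etc/STATsrv/access in place (and only after verifying the wrapper runs correctly by hand).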
ProcLOAD / UserLOAD |
There are 2 additional psSTAT wrappers:
- ProcLOAD: all output information summarized on-the-fly by process name
- UserLOAD: all output information summarized on-the-fly by user name

These stats are very useful when you have hundreds or thousands of running processes and you want to study groups of processes or users instead of the activity of a single process. Example of output:
# /etc/STATsrv/bin/ProcLOAD.sh 5
PNAME NTOT NACT UsrTM SysTM %CPU VSZ SYSC NLWP VCTX ICTX SIGS InputBLK OutputBLK I/O_CHR
STATcmd 312 58 0.00 0.00 0.0 594112 1472 312 180 2 0 0 0 198874
WebX.mySQL 312 58 0.70 0.04 3.4 1142968 8307 312 1066 82 0 0 0 398649
fsflush 1 1 0.00 0.03 0.4 0 0 1 7 2 0 0 155 0
httpd 7 1 0.00 0.00 0.0 18008 10 7 14 0 0 0 0 0
in.rlogind 1 0 0.00 0.00 0.0 2240 0 1 0 0 0 0 0 0
inetd 1 1 0.00 0.00 0.0 5304 1 4 4 0 0 0 0 0
init 1 0 0.00 0.00 0.0 2400 0 1 0 0 0 0 0 0
java 2 2 0.00 0.00 0.1 455448 255 50 413 1 0 0 0 12
mysqld 1 1 0.24 0.12 2.0 62216 21258 315 1058 30 0 0 342 4448475
nfs4cbd 1 0 0.00 0.00 0.0 2360 0 2 0 0 0 0 0 0
picld 1 1 0.00 0.00 0.0 4632 33 6 3 0 0 0 0 0
psSTAT64 1 1 0.02 0.08 0.3 5856 5006 1 3 2 0 0 0 3146
rpcbind 1 0 0.00 0.00 0.0 2880 0 1 0 0 0 0 0 0
sendmail 2 1 0.00 0.00 0.0 15456 10 2 3 0 0 0 0 0
svc.startd 1 1 0.00 0.00 0.0 10200 9 13 4 0 0 0 0 672
syseventd 1 0 0.00 0.00 0.0 2552 0 14 0 0 0 0 0 0
ttymon 2 0 0.00 0.00 0.0 4648 0 2 0 0 0 0 0 0
utmpd 1 1 0.00 0.00 0.0 1280 0 1 1 0 0 0 0 0
vold 1 0 0.00 0.00 0.0 2912 0 6 0 0 0 0 0 0
wrapper-solari 1 1 0.00 0.00 0.1 3040 237 2 168 2 0 0 0 0
xntpd 1 1 0.00 0.00 0.0 2320 25 1 5 0 5 0 0 0
ypbind 1 0 0.00 0.00 0.0 2360 0 1 0 0 0 0 0 0
^C
Special Solaris 10: ZoneLOAD / PoolLOAD/ TaskLOAD/ ProjLOAD |
Four psSTAT_10 wrappers were added that are specific to Solaris 10 and later:
- ZoneLOAD : all output information grouped on-the-fly by zone id
- ProjLOAD : the same, but grouped by project id
- TaskLOAD : the same, but grouped by task id
- PoolLOAD : the same, but grouped by pool id

These stats give you more extended information compared to the standard 'prstat'. Below are some more details about the output columns (given for ZoneLOAD, but valid for the others too :-)). ZoneLOAD.sh is a shell script wrapper for the psSTAT command that collects all data pre-grouped per Solaris Zone (psSTAT option: -M zone). A note on the last 3 values: at the time I needed them, I did not find any document describing their meaning, so I based my naming on the descriptions given in the /proc structure header files. These values help in some cases to understand which process (or Zone, in the current case) is doing more I/O operations than others, without involving any DTrace script. Description of the values printed per zone (each value is printed per the given time period):
- N_total -- current number of all processes running within a zone
- N_activ -- current number of processes being *active* within a zone per a given time period
- UsrCPU -- total User CPU *time* consumed within a zone per a given time period
- SysCPU -- total System CPU *time* consumed within a zone per a given time period
- CPU% -- percent of CPU busy within a zone - this value depends on whether or not some CPUs are assigned to the zone, so it's still better to monitor CPU% usage within a zone via the "vmstat" command!
- VSize -- total "virtual memory size" in KB of all processes running within a zone (be aware that each process VSZ value may already include several shared libraries or shared memory segments (SHM), and these *same* shared objects may be accounted several times within the total VSize...)
Currently there is no "simple" way to tell how much memory is used by a group of processes (for ex. Oracle processes, etc.) - even though it is still possible to write a script that accounts for each shared object only once, such a script would use a significant amount of CPU time..

So, nobody is perfect, but there is room for improvement! :-))
- SysCalls -- total number of all system calls/sec within a zone
- N_lwp -- current number of LWP (kernel threads) running within a zone
- Vol_CTX -- total number of all voluntary context switches/sec within a zone
- InVol_CTX -- total number of all involuntary context switches/sec within a zone
- Sigs -- total number of all signals/sec within a zone
- I_Blks -- total number of all input I/O blocks/sec within a zone
- O_Blks -- total number of all output I/O blocks/sec within a zone
- IO_Chrs -- total number of all I/O character operations/sec within a zone
netLOAD |
The netLOAD wrapper monitors Solaris network activity. This tool has been included in dim_STAT's STAT-service for a long time. Since v.8.0, netLOAD monitors all network interfaces present in the system (including virtual and loopback). If some indicators are not populated by device drivers, a '-1' value is printed instead. Also, a new '-I' option was added: you may give a fixed list of network interfaces you want to monitor (run '/etc/STATsrv/bin/netLOAD' for more details). In the STAT-service, netLOAD is integrated via the 'netLOAD.sh' script, to provide an easy way to change options. Example of output:

# /etc/STATsrv/bin/netLOAD.sh 5
Name IBytes/s OBytes/s Ipack/s Opack/s Ierr/s Oerr/s Col/s Bytes/s Pack/s Nocanput
lo0 -1.0 -1.0 0.4 0.4 0.0 0.0 0.0 0.0 0.8 0
ce0 26300.6 3840.0 105.2 64.0 0.0 0.0 0.0 30140.6 169.2 0
ce1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0

Name IBytes/s OBytes/s Ipack/s Opack/s Ierr/s Oerr/s Col/s Bytes/s Pack/s Nocanput
lo0 -1.0 -1.0 0.8 0.8 0.0 0.0 0.0 0.0 1.6 0
ce0 27624.4 2688.0 77.2 44.8 0.0 0.0 0.0 30312.4 122.0 0
ce1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
UDPstat |
UDPstat is a wrapper around the "netstat -s" command on Solaris, made to monitor UDP traffic on the system. While it prints all the main counters (In/Out traffic, In/Out errors), it is particularly interesting for analyzing Input Overflows (and Input Checksum errors as well). Example of output:

# /etc/STATsrv/bin/UDPstat.sh 5
UDP-stat Tot# Delta Val/s
udpInDatagrams 65700 0 0.00
udpInErrors 0 0 0.00
udpOutDatagrams 68321 0 0.00
udpOutErrors 0 0 0.00
udpNoPorts 3514281 0 0.00
udpInCksumErrs 0 0 0.00
udpInOverflows 0 0 0.00
none 0 0 0

UDP-stat Tot# Delta Val/s
udpInDatagrams 65900 200 40.00
udpInErrors 0 0 0.00
udpOutDatagrams 68321 0 0.00
udpOutErrors 0 0 0.00
udpNoPorts 3514281 0 0.00
udpInCksumErrs 0 0 0.00
udpInOverflows 0 0 0.00
none 0 0 0
HAR |
HAR is the Hardware Activity Reporter tool for Solaris 8 and up. Starting with Solaris 8, Sun began to deliver public interfaces for the SPARC and x86 hardware performance counters: libcpc, to access CPU counters, and libpctx, to track a process. HAR differs from other tools in that it combines the low-level counts into higher-level metrics more useful to application programmers, who are typically interested in the following: CPI, FLOPS, MIPS, address bus percentage utilization, cache miss rates, branch and branch miss rates, and stall rates. These metrics help in assessing the fair usage of available processing units, locating bottlenecks and guiding tuning efforts, when needed... Check this valuable article to discover everything about this powerful tool!..
- NOTE: by default the HAR add-on is disabled within the Solaris STAT-service. Why? To get CPU counter data, the Solaris library functions require exclusive access to the chip - for a very short time, but exclusive anyway - so any other process running on the requesting CPU will be moved to another CPU, with some unwanted side effects.. That's why I don't suggest running HAR for a long period on your production system until you fully understand how it works..
Oracle Add-Ons |
NOTE: Originally all these scripts were made as examples to show how easily we may collect data even from Oracle. But with time people started to use them more and more (while I was still expecting that, inspired by the examples, they would add something more optimal :-)). For example, the current scripts connect to and disconnect from the database all the time; a collector keeping the connection open would be more optimal, etc... But well - it's still better than nothing! :-)) Anyway, all the following wrappers need a correctly set Oracle environment for the "Oracle" user. By default the user name is oracle, but it may be changed inside the scripts. It means that:

# su - oracle -c "sqlplus /nolog"

should work correctly and give you a SQL> prompt for the right database instance. Then you may check that:

# /etc/STATsrv/oraEXEC.sh 5

prints the current number of Oracle sessions and the current exec/commit activity. If it doesn't work - fix it before going further :-)) (BTW, there is a dim_STAT user group where you may always ask questions - http://groups.google.com/group/dimstat ) By default all these Add-Ons are already enabled within the dim_STAT database, and all you need is to uncomment them within the STAT-service access file (/etc/STATsrv/access) and start a new collect including Oracle stats :-)) And of course you may add any other one. Some people even collect statspack reports directly into dim_STAT! Oracle Add-Ons:
- oraIO : Oracle I/O stats for data/temp files
- oraEXEC : Oracle SQL QueryExecutions/sec, Commits/sec, Number of Sessions
- oraLATCH : Oracle latch stats
- oraSLEEP : Oracle latch sleeps stats
- oraENQ : Oracle enqueue stats
MySQL Add-Ons |
mysqlSTAT monitors the "show status" output. Each output variable is presented with 3 values:
- current value of the variable
- delta between the current and previous value
- value of delta/sec

And it's up to you to choose from the list of variables what kind of information you're interested in :-) To work properly, this add-on needs to be configured - edit your /etc/STATsrv/bin/mysqlSTAT.sh file to set up the user/password and host/port information.
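The relation between the 3 reported values can be sketched with two fake samples of one counter taken one interval apart (the variable name and numbers are invented for illustration):

```shell
# two readings of one "show status" counter, INTERVAL seconds apart
INTERVAL=5
prev=65700; curr=65900

delta=$((curr - prev))        # delta between current and previous value
rate=$((delta / INTERVAL))    # value of delta/sec

echo "$curr $delta $rate"     # prints: 65900 200 40
```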
mysqlLOAD is oriented toward multi-host monitoring, presenting a compact list of data from the "show status" output. This add-on also needs to be configured to work properly - edit your /etc/STATsrv/bin/mysqlSTAT.sh file to set up the user/password and host/port information. Reported values:
- On -- MySQL Server On-Line flag (0 or 1)
- Sessions -- number of currently connected user sessions (threads)
- InnDirty -- amount of dirty pages in InnoDB
- InnoFree -- amount of free pages in InnoDB
- KeyDirty -- amount of dirty pages in MyISAM Key buffer
- OpFiles -- number of currently open files
- OpTables -- number of currently open tables
- ByteRx/s -- received bytes/sec via network
- ByteTx/s -- sent bytes/sec via network
- Commit/s -- number of COMMIT requests/sec
- Delete/s -- number of DELETE requests/sec
- Insert/s -- number of INSERT requests/sec
- Select/s -- number of SELECT requests/sec
- Update/s -- number of UPDATE requests/sec
- InnDsy/s -- InnoDB Data Sync/sec
- InnDrd/s -- InnoDB Data Read/sec
- InnDwr/s -- InnoDB Data Write/sec
- InnLwr/s -- InnoDB Log Write/sec
- InnLsy/s -- InnoDB Log Sync/sec
- Key_Rd/s -- MyISAM Key Read/sec
- Key_Wr/s -- MyISAM Key Write/sec
- Query/s -- Query/sec execution
- AbrtClnt -- aborted clients (delta)
- AbrtConn -- aborted connections (delta)
- Connects -- number of recent connects (delta)
- SlowReqs -- number of slow requests (delta)
- TabLckWt -- table lock waits (delta)
- Rollback -- called rollbacks (delta)
innodbSTAT monitors the "show innodb status" output ("show engine innodb status" since MySQL 5.5). It works similarly to "mysqlSTAT", but the list of variables is based on InnoDB status only. To work properly, this add-on needs to be configured - edit your /etc/STATsrv/bin/innodbSTAT.sh file to set up the user/password and host/port information.
innodbMUTEX monitors the "show mutex status" output ("show engine innodb mutex" since MySQL 5.5), printing the InnoDB MUTEX-related stats. It is ready to print not only "waits" (the standard output) but also more detailed data (available by compiling InnoDB with debug options, or just by hacking it: counters, spins, real waited time on each mutex, etc.). To work properly, this add-on needs to be configured - edit your /etc/STATsrv/bin/innodbMUTEX.sh file to set up the user/password and host/port information. NOTE: -1 is printed if information is not available. Example of output:

# /etc/STATsrv/bin/innodbMUTEX.sh 5
MUTEX count count/s spin_waits spin_waits/s spin_rounds spin_rounds/s os_waits os_waits/s os_yields os_yields/s os_wait_times os_wait_times/s
db-server-online 1 1 1 1 1 1 1 1 1 1 1 1
buf/buf0buf.c:1122 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
fil/fil0fil.c:1535 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
srv/srv0srv.c:973 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
combined_buf/buf0buf.c:818 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
log/log0log.c:830 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
btr/btr0sea.c:181 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
combined_buf/buf0buf.c:820 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1

MUTEX count count/s spin_waits spin_waits/s spin_rounds spin_rounds/s os_waits os_waits/s os_yields os_yields/s os_wait_times os_wait_times/s
db-server-online 1 1 1 1 1 1 1 1 1 1 1 1
buf/buf0buf.c:1122 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
fil/fil0fil.c:1535 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
srv/srv0srv.c:973 -1 -1 -1 -1 -1 -1 2411 482.200012 -1 -1 -1 -1
combined_buf/buf0buf.c:818 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
log/log0log.c:830 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
btr/btr0sea.c:181 -1 -1 -1 -1 -1 -1 411 82.199997 -1 -1 -1 -1
combined_buf/buf0buf.c:820 -1 -1 -1 -1 -1 -1 0 0.000000 -1 -1 -1 -1
^C
innodbIOSTAT (deprecated, works only with old InnoDB) is an adaptation of a DTrace script published by Neel, with one additional feature: it automatically detects when mysqld is no longer running, or has been started/restarted. And of course you may run it only on a system supporting DTrace :-)
PostgreSQL Add-Ons |
pgsqlSTAT monitors the "pg_stat_bgwriter" and "pg_stat_database" output. Each output variable is presented with 3 values:
- current value of the variable
- delta between the current and previous value
- value of delta/sec
- some values are also presented per database name

And it's up to you to choose from the list of variables what kind of information you're interested in. To work properly, this add-on needs to be configured - edit the /etc/STATsrv/bin/pgsqlSTAT.sh file to set up the user/password and host/port information.

pgsqlLOAD is oriented toward multi-host monitoring, presenting a compact summary (a single line) from the "pg_stat_bgwriter" and "pg_stat_database" output. Please read the excellent howto written by Greg Smith to see how to analyze this data - http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm To work properly, this add-on also needs to be configured - edit the /etc/STATsrv/bin/pgsqlLOAD.sh file to set up the user/password and host/port information. Reported values:
- On -- Server On-Line flag (1/0)
- Sessions -- number of currently connected user sessions (backends)
- Commit/s -- number of executed COMMITs/sec
- Rollback -- number of executed rollbacks (delta)
- B_Read/s -- Block reads/sec
- B_hit/s -- Block read hit/sec
- RowSnd/s -- Rows sent/sec
- RowFch/s -- Rows fetched/sec
- RowIns/s -- Rows inserted/sec
- RowUpd/s -- Rows updated/sec
- RowDel/s -- Rows deleted/sec
- ChpTimed -- Checkpoints triggered by timeout (delta)
- ChptReqs -- Checkpoints triggered by request (delta) - probably ran out of checkpoint segments
- BuffChpt -- Buffers written by checkpoint (delta)
- BufClean -- Buffers cleaned by background writer (delta)
- MxWClean -- number of times Max Written level was reached by background writer (delta)
- BufBkend -- Buffers written by backends (delta)
- BufAlloc -- Allocated buffers (delta)
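As a small arithmetic sketch of why ChpTimed vs ChptReqs is worth watching (the deltas below are invented): a high share of requested checkpoints usually means checkpoints are being forced by segment exhaustion rather than by the timeout, which is a tuning signal discussed in the Greg Smith howto linked above.

```shell
# invented sample deltas, standing in for ChpTimed and ChptReqs
timed=12; reqs=4
total=$((timed + reqs))

# share of checkpoints forced by request rather than by timeout
echo "requested: $((100 * reqs / total))% of $total checkpoints"
```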
jvmSTAT |
This is a wrapper bringing in information from the "jvmstat" package. jvmstat is now officially integrated into the JVM 1.5 distribution and later (and is called "jstat" now). The jvmSTAT wrapper gives you a way to monitor ALL running JVMs on your server at the same time! To run jvmSTAT properly you first of all need jdk 1.5 (or later) installed on your host; check that it works correctly on your server:

# cd /usr/jdk15/bin
# jps
...
#

If you don't see your running JVM(s) in the "jps" output, try to fix that before continuing with the next steps :-) - normally it should work with any JVM since Java version 1.4.2. To get the 'jvmSTAT.sh' wrapper working:
- edit the /etc/STATsrv/bin/jvmSTAT.sh file (from the STAT-service) on each client machine, to set the right path environment with JAVA_HOME pointing to the jdk 1.5 home (ex: JAVA_HOME=/usr/jdk15)
- enable jvmSTAT in the STAT-service on each client (uncomment jvmSTAT in the /etc/STATsrv/access file)
- before starting any new collect including jvmSTAT, be sure that the jvmSTAT Add-On is already installed (Add-On interface from the Main Page)

Then start to collect jvmSTAT data :-)
jvmGC |
This one still exists, but I don't see any reason why anyone would still use it: jvmSTAT is the better solution for any kind of "GC" collection. This wrapper collects on-the-fly information about the GC (garbage collector) activity of any JVM running with the "-verbose:gc" option. Before JVM 1.4.2 the only possible way to get information on the GC activity of the standard JVM was a dump of the log output, so this wrapper is simply based on log file scanning. Usage, assuming you want to see the GC activity of one of your JVMs running on server "J":
0) Install "jvmGC" via the Add-Ons page.
1) jvmGC uses the $LOG file for data input (you may change the name and permissions according to your needs (default filename: /var/tmp/jvm.log); modify it if needed on the server "J" STAT-service side (/etc/STATsrv/bin)).
2) Use the web interface to start a collect including "jvmGC".
3) On server "J", add the "-verbose:gc" option to java in your application start script and redirect the output into the application log file (for ex. app.log).
4) Once you want to monitor your JVM:

$ tail -f app.log | /etc/STATsrv/bin/grepX GC >> /var/tmp/jvm.log

5) Observe jvmGC output data and have fun!
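To see what the pipeline above extracts, here is a stand-in sketch using plain grep in place of the STAT-service's grepX utility, on two fake log lines (one -verbose:gc line and one ordinary application line; the GC line format is only an approximation of real -verbose:gc output):

```shell
# only the GC line survives the filter; with real data this would be
# appended to /var/tmp/jvm.log for jvmGC to scan
printf '%s\n' '[GC 8704K->1752K(31744K), 0.0041 secs]' 'app: request served' |
grep GC
```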
LINUX specific STATs |
Linux Add-Ons:
- LvmSTAT (Linux vmstat)
- LcpuSTAT (Linux mpstat)
- LioSTAT (Linux iostat)
- LnetLOAD (Linux netLOAD)
- LpsSTAT (Linux psSTAT)
- LprcLOAD (Linux ProcLOAD)
- LusrLOAD (Linux UserLOAD)

For details, see the following special Linux note...
Administration tasks |
At any moment you can:
- Edit Add-On Description - in case you made a mistake in a value name, or in the shell command corresponding to your Add-On, you may quickly repair it via the Edit interface (however, you can no longer change MySQL table column names or datatypes - if the error was there, you'd better re-create this Add-On from scratch ;-))
- Save Add-On Description - this gives you an ASCII text file which may be reused for another database. This way you may share with others any new findings and any new tools you found useful!
- Restore Add-On Description - from the information in a given Description file, re-create all the database structures required by the Add-On and fill in all the information required for it to function correctly. WARNING: if you're already using the same Add-On in the current database, all previous data will be destroyed!
- Delete Add-On - removes the Add-On and all corresponding data from the current database...