by Dimitri dimitri.kravtchuk@france.sun.com |
Overview... |
dim_STAT is a tool for general and detailed performance analysis/monitoring of Solaris and Linux systems. Main features are:
- Web-based interface
By default dim_STAT interfaces Solaris stats (SPARC and x86):
- vmstat
- mpstat
- iostat
- vxstat
- netstat
- psSTAT, ProcLOAD, UserLOAD (processes and users)
- netLOAD (extended network stats)
as well as add-on extensions for both Solaris and Linux/x86:
- MEMSTAT (Solaris)
- har2, har3 (Solaris SPARC only)
- jvmSTAT (JVM GC Activity)
- oraEXEC, oraIO, oraSLEEP (Oracle activity)
- LvmSTAT (Linux vmstat)
- LcpuSTAT (Linux mpstat)
- LioSTAT (Linux iostat)
- LnetLOAD (Linux netLOAD)
- LpsSTAT (Linux psSTAT)
- LprcLOAD (Linux ProcLOAD)
- LusrLOAD (Linux UserLOAD)
- and any other program you want to add...
CPU usage of dim_STAT is very low, even lower than that of standard proctool, top, or perfbar, so the performance picture it gives is closer to reality...
General View |
Just to give an idea of how it works: each machine you want to monitor in real time should run a special STAT-service daemon (client). Via your preferred Web browser you start collectors that communicate with the client(s). All collected information is saved in a database and may be analyzed later or while the data is still arriving. Generally, all analyzing/ reporting/ administration/ etc. tasks are done via the Web browser. The whole Web interface is developed and running on WebX (my own tool)...
Freeware End User License |
LICENSE

This software is released as "freeware". You are encouraged to redistribute unmodified copies of this software, as long as no fee is charged for the software, directly or indirectly, separately or as part of ("bundled with") another software product, without the express permission of the author. You may not attempt to reverse compile, modify or disassemble the software in whole or in part.

SUPPORT, BUG REPORTS, SUGGESTIONS

You are encouraged to send bug reports and suggestions. This software is not supported. Hence, your technical questions may or may not be answered. Questions, bug reports, comments and suggestions should all be sent to: Dimitri KRAVTCHUK (dimitrik@free.fr or dimitrik@sun.com) or into the actual dedicated mail-list (dim_STAT@sun.com).

DISCLAIMER

ANY USE BY YOU OF THE SOFTWARE IS AT YOUR OWN RISK. THE SOFTWARE ARE PROVIDED FOR USE "AS IS" WITHOUT WARRANTY OF ANY KIND. TO THE MAXIMUM EXTENT PERMITTED BY LAW, THE AUTHOR (**) DISCLAIMS ALL WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. THE AUTHOR (**) IS NOT OBLIGATED TO PROVIDE ANY UPDATES TO THE SOFTWARE.

** Dimitri KRAVTCHUK
Installation |
The dim_STAT installation package is generally delivered as a TAR archive (dim_STAT.tar) or already untarred on a CD. Before installing: verify your disk space; you will need ~50MB for the initial install, mostly to host the Web Server and Database Server data. The database volume will grow according to the size of your future STAT collections, and the Web directory may grow with your reports, so be generous from the start and reserve enough space for your data...
During installation, a new user "dim" and group "dim" will be created. User "dim" is the owner of the dim_STAT database and Web server. In case you have special rules or restrictions on your system, you may create them yourself beforehand, as well as choose other user and group names according to your system policy.
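If you prefer to create them yourself, here is a minimal sketch with standard Solaris commands (the home directory path and shell below are just assumptions; adapt them to your policy):

# groupadd dim
# useradd -g dim -m -d /export/home/dim -s /bin/sh dim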
INSTALL.sh |
As the root user, extract the tar archive somewhere and start the installation script:

# cd /tmp
# tar xvf /path_to_tar/dim_STAT.tar
# cd dim_STAT-INSTALL
# ./INSTALL.sh

During installation you'll be asked to confirm your host IP address (found automatically) and host and domain names; it is verified that user "dim" already exists on the system (otherwise it will be created); you'll be asked about the WebX and home directories (Web Server, Database Server, Administration and Client scripts, etc.), port numbers...

NOTE: Since v.7.0 installation is simplified:
- no more /apps link created
- the dim_STAT owner user may be different from 'dim'
- only 3 application directories:
- WebX (def: /opt/WebX)
- Home (def: /apps)
- Temp (def: /tmp)
If you are not sure about the meaning of some values - leave the defaults; the main parameters during installation are the directory names.
NOTE: WebX is the main interpreter (or execution engine, etc.), so it interprets all application script files and _absolutely_ needs a fixed/trusted root directory. Otherwise, anyone might execute whatever they want on your machine (ex: read /etc/passwd to crack logins, etc.). So, the first level of protection is its root directory: you may choose one of 3 available names (hey, 3 choices anyway! better than one :)) Also, the WebX engine itself is very small (a few MB) and not growing, so I don't think it'll make any problem for you...
After (default) installation, dim_STAT software will be distributed in the following way on your system:
- /WebX, /opt/WebX or /etc/WebX - WebX main directory: you choose from 3 possible placements
- /apps - default dim_STAT home directory
   |
   +-- /ADMIN    - administration scripts (start/stop dim_STAT Server, BatchLOAD, etc.)
   |
   +-- /mysql    - MySQL database server main directory
   |
   +-- /httpd    - Apache Web server directory
   |
   +-- /client   - client collect script(s)
   |
   +-- /Java2GIF - Java applet graph to GIF convertor
   |
   +-- /htmldoc  - HTML to PDF converting tool
   |
   +-- ...       - there may be other directories depending on dim_STAT release :))
NOTE: in all following examples let's suppose your home directory is '/apps', to simplify things.
Starting Web and Database servers |
As you saw before, administration scripts are placed in /apps/ADMIN:

# cd /apps/ADMIN
# dim_STAT-Server start

To stop the servers:

# cd /apps/ADMIN
# dim_STAT-Server stop

NOTE: since v.7.0 the global dim_STAT-Server script replaced the separate httpd/mysql scripts. This global script now checks before any stop/start action whether any active collects are present and restarts them automatically during the next startup. Also, if the shutdown was not done properly, a datafile check will be executed on all databases automatically before any action... At any moment you may see the active connections in the database:
$ su - root
# /apps/mysql/bin/mysql -S/apps/mysql/data/mysql.sock.YOUR_PORT_NO
mysql> show processlist;
+------+------+-----------+------+---------+------+-------+------+
| Id   | User | Host      | db   | Command | Time | State | Info |
+------+------+-----------+------+---------+------+-------+------+
|    3 | dim  | localhost | Mind | Sleep   |   18 | NULL  | NULL |
|    4 | dim  | localhost | Mind | Sleep   |   17 | NULL  | NULL |
|    5 | dim  | localhost | Mind | Sleep   |    2 | NULL  | NULL |
|    6 | dim  | localhost | Mind | Sleep   |    1 | NULL  | NULL |
|    7 | dim  | localhost | Mind | Sleep   |    2 | NULL  | NULL |
|    8 | dim  | localhost | Mind | Sleep   |   16 | NULL  | NULL |
|    9 | dim  | localhost | Mind | Sleep   |  104 | NULL  | NULL |
|   10 | dim  | localhost | Mind | Sleep   |    1 | NULL  | NULL |
|   11 | dim  | localhost | Mind | Sleep   |    0 | NULL  | NULL |
|   53 | dim  | localhost | UPC  | Sleep   |  108 | NULL  | NULL |
|   54 | dim  | localhost | UPC  | Sleep   |  103 | NULL  | NULL |
|   56 | dim  | localhost | UPC  | Sleep   |  115 | NULL  | NULL |
|   57 | dim  | localhost | UPC  | Sleep   |  118 | NULL  | NULL |
|   58 | dim  | localhost | UPC  | Sleep   |  112 | NULL  | NULL |
|   59 | dim  | localhost | UPC  | Sleep   |  105 | NULL  | NULL |
...

...and you may even kill any of them (be very careful, however :))

mysql> kill 57;
mysql> quit
Bye
#
MySQL is very easy to administer. However, based on past users' experience, here are some tips...
Use separate databases whenever you can: it's much easier for administration, avoids possible future activity conflicts, etc.
Limitation on the number of connections: each MySQL connection uses 5 file descriptors. It means that with the common default limit of 1024 file descriptors per process, we can't create more than ~200 connections on the multi-threaded MySQL server. (Note: each STAT is a single connection.) In case you need more connections (several hosts, many stats, etc.), first check the values of the /etc/system parameters rlim_fd_cur and rlim_fd_max, then in the file /apps/mysql/mysql.server replace the default value of 200 with a new one.
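For example, a minimal sketch (the values are only an illustration; tune them to your expected number of connections): add to /etc/system and reboot:

set rlim_fd_cur=4096
set rlim_fd_max=4096

then replace the default 200 in /apps/mysql/mysql.server with your new connection limit.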
Accidental power-off of your machine: it's quite possible that some MySQL index files will be corrupted. Since v.7.0, the dim_STAT-Server script runs a data check automatically, but it's always useful to know how to do it by hand. Stop MySQL, recheck/rebuild your indexes first, and restart your database server next:
# cd /apps/ADMIN
# stop.MySQL
# cd /apps/mysql/data/Your_database_name
# /apps/mysql/bin/myisamchk -r *.MYI
# cd /apps/ADMIN
# start.MySQL

No more disk space: just add space if possible :)) The collect part of dim_STAT is done in a way that "keeps the flow" in any case, so nothing will be stopped in case of errors. Once you've added space, collects will continue (you'll probably just get some holes in the data for this period :))
Get a backup/copy of your collects in the fastest way: one of the great features of MySQL is its cross-platform data compatibility. It means the same database files may be moved from a Solaris machine and successfully reused on a Linux laptop (for example). And in any case, copying the whole database to another machine will be much faster than exporting and then importing collects via flat files (unless you want to move just a very small amount of data from a very big database).
Fine, but can we do it ONLINE? - Yes! :)) In case nobody uses your database, you may just stop and start the database server to be sure all data is flushed and files are closed properly (there are, of course, less radical solutions, but this one is the most evident for anybody :)). In case your database is under continuous usage - just stop the database server (as described above; no connection will be lost), finish your copy, and restart the server next:

# cd /apps/ADMIN
# stop.MySQL
# cp -rp /apps/mysql/data/Your_Database_Name /your_backup
# start.MySQL

Delete a database: there is no way to delete a database via the Web interface (I don't like deleting in general :)); deleting by mistake is such a common thing... So, if you really need to delete your database, the only way is:
# rm -rf /apps/mysql/data/Your_Database_Name

Several MySQL instances on the same host: this was one of the hot problems, as conflicting with already installed and running servers creates new troubles in an existing system. I've found a solution to completely isolate the dim_STAT database from existing instances, but the price is more complexity in simple things. The tool uses its own parameters for the TCP/IP port and UNIX socket, so, for example, to connect locally to your database server, instead of the usual:
# /apps/mysql/bin/mysql DatabaseName

you should use:

# /apps/mysql/bin/mysql -S/apps/mysql/data/mysql.sock.PortNO DatabaseName

where PortNO is the TCP/IP port number given to your database server during installation.
Migration from any old dim_STAT version to the new one |
Migration procedure is quite easy:
- Stop all activity on your current dim_STAT installation
- Stop HTTPD and MySQL servers
- Backup all your databases from '/apps/mysql/data/' (see the MySQL tips above and the sketch after this list), except: dim_00, mysql and dim
- mysql: system database, please, don't play with it :)
- dim_00: is a reference database, changing with every release
- dim: is the 'Default' database, and if you _really_ need it - rename it before backup!
- Install the new dim_STAT distribution
- Restore your backed-up data into '/apps/mysql/data'
- Start dim_STAT-Server
- Enjoy :))
NOTE: old databases should be seen as before and work correctly, but if you want to take advantage of all the new features coming with the new version - create a new database and start new collect(s).
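A minimal sketch of the backup/restore steps mentioned above (default paths assumed; the database names are examples only):

# cd /apps/mysql/data
# tar cf /backup/dim_databases.tar MyBase1 MyBase2
... install the new dim_STAT distribution ...
# cd /apps/mysql/data
# tar xf /backup/dim_databases.tar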
On-Line STAT Collecting Service |
The STAT-service was introduced in dim_STAT since v.3.0 and provides a simple, stable and secure way of on-line STAT collecting from Solaris/SPARC, Solaris/x86 and Linux/x86 servers. It's shipped within the dim_STAT distribution (dim_STAT-INSTALL/STAT-service directory) as normal Solaris packages or TAR archives for manual integration. STAT-service should be installed on every machine supposed to be under live monitoring (you should be the "root" user, of course :))

Package install (".pkg" file):

# pkgadd -d STATsrv.pkg

Manual install (".tar" file):

# cd /etc
# tar xvf /path_to/STATsrv.tar
# ln -s /etc/STATsrv/STAT-service /etc/rc2.d/S99STATsrv
# ln -s /etc/STATsrv/STAT-service /etc/rc1.d/K99STATsrv
# ln -s /etc/STATsrv/STAT-service /etc/rc0.d/K99STATsrv
# ln -s /etc/STATsrv/STAT-service /etc/rcS.d/K99STATsrv

The included software is installed into the special /etc/STATsrv directory (home directory of STAT-service). The contents of this directory:

/etc/STATsrv/
  STAT-service -- script to start/stop the service daemon; also defines the port number to listen on (def: 5000)
  access       -- access control file
  /bin         -- contains extended STAT programs/scripts
  /log         -- contains all logged information about service demands

Next step: start the service daemon:

# /etc/STATsrv/STAT-service start

The schema of communication with STAT-service is very simple:
1.) dim_STAT connects to the STAT-service of the machine under monitoring...
2.) if the service is not available: wait a time-out and go to 1.), or exit if the STAT collect is stopped during this time...
3.) dim_STAT asks for the stat command it needs...
4.) if there is no permission for this command or the command is not found: close the "command" connection with an error message...
5.) dim_STAT collects data, compensating for any time shifting due to previous time-outs...
6.) if the TCP connection is broken: go to 1.)
7.) if the STAT is stopped: close the connection and exit...
*.) if there was no activity during the "auto-eject" timeout: close the connection and go to 1.)
As you see, this schema is quite robust and will keep working after cluster switchover, network corruption, rebooting, etc. Collects may be started once and run for a long period. In case you need to collect only during specific time intervals, you may just start and stop STAT-service via "cron" or any other similar tool...
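For example, hedged crontab entries for root (the time window here is only an illustration) to collect only during working hours on weekdays:

0 8 * * 1-5 /etc/STATsrv/STAT-service start
0 18 * * 1-5 /etc/STATsrv/STAT-service stop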
Note: it seems that during a halt of the system (ex: power-off of a working machine) TCP/IP connections stay stuck and never receive an error code... In this case the collect will be broken via the "auto-eject" timeout. However, auto-eject may also happen due to a mini-hang of the system or simply of the stat program; in this case you'll see holes in your collects, so take care during interpretation :))
Here is an example of the STAT-service access control file. As you see, you may limit the set of stat commands accessible to each machine. This task may be done by the host administrator and may be completely independent. Note: the access file is checked by the STAT-service daemon all the time, so you never need to restart the service to activate your modifications.
#
# STAT-service access file
#
# Format:
#   ...
#   command name fullpath
#   ...
#   access IP-address
#   ...
#   command name fullpath
#   ...
#
# By default all machines in the network may access STAT-services
#
# Keyword "access" makes an access restriction by IP-address for all following
# commands till the next "access" section.
#
# For example:
#
# ====================================================================
#
# # Any host may access vmstat and mpstat collections
#
# command vmstat /usr/bin/vmstat
# command mpstat /usr/bin/mpstat
#
# # Only machines 129.157.1.[1-3] may access netLOAD collections
#
# access 129.157.1.1
# access 129.157.1.2
# access 129.157.1.3
# command netLOAD.sh /etc/STATsrv/bin/netLOAD.sh
#
# # Only machine 129.157.1.1 may access psSTAT collections
#
# access 129.157.1.1
# command psSTAT /etc/STATsrv/bin/psSTAT
#
# ====================================================================
#
command vmstat /usr/bin/vmstat
command mpstat /usr/bin/mpstat
command netstat /usr/bin/netstat
command vxstat /usr/sbin/vxstat
command memstat /etc/STATsrv/bin/memstat
command tailX /etc/STATsrv/bin/tailX
command ioSTAT.sh /etc/STATsrv/bin/ioSTAT.sh
command netLOAD.sh /etc/STATsrv/bin/netLOAD.sh
command psSTAT /etc/STATsrv/bin/psSTAT.sh
command bsdlink /etc/STATsrv/bin/bsdlink.sh
command bsdlink.sh /etc/STATsrv/bin/bsdlink.sh
command harSTAT.sh /etc/STATsrv/bin/harSTAT.sh
command harSTAT /etc/STATsrv/bin/harSTAT.sh
command harSTATus3 /etc/STATsrv/bin/harSTATus3.sh
command harSTATus3.sh /etc/STATsrv/bin/harSTATus3.sh
command T3stat /etc/STATsrv/bin/T3stat.sh
command T3stat.sh /etc/STATsrv/bin/T3stat.sh
command sysinfo /etc/STATsrv/bin/sysinfo.sh
command SysINFO /etc/STATsrv/bin/sysinfo.sh
First-Level Security |
Main point: ANY SECURE SYSTEM IS NEVER SECURE ENOUGH...
The question is only what you'll consider ENOUGH for you :))
Anyway, the security point came up so often during discussions with our engineers and customers that I cannot leave it without attention...
For paranoia-users: there is a Linux version of dim_STAT, and if you really need maximum protection you can always spend money on a small dedicated Linux PC, run dim_STAT on it, and protect any access with firewalls, etc. etc.
From my experience, I may just suggest protecting Web server access, in case somebody stops or suspends some of your active collects by mistake. For this kind of first-level access protection a good candidate is Apache's ".htaccess". For more detailed information, please refer to the Apache documentation. But in a few words, just to make it work for dim_STAT:
1.) via /apps/httpd/bin/htpasswd create the /apps/httpd/etc/.htpasswd file and add any user/password pairs you need
2.) create ".htaccess" file with context:
AuthName "Welcome to dim_STAT Host" AuthType Basic AuthUserFile /apps/httpd/etc/.htpasswd require valid-user3.) copy ".htaccess" file into /apps/httpd/home/docs and /apps/httpd/home/cgi-bin
4.) try to connect to your web server now and check the user/password access - that's all! ;-)
Example:
$ /apps/httpd/bin/htpasswd
Usage: htpasswd [-c] passwordfile username
The -c flag creates a new file.
$ /apps/httpd/bin/htpasswd -c /apps/httpd/etc/.htpasswd login1
Password: ...
$ vi /tmp/.htaccess
$ cat /tmp/.htaccess
AuthName "Welcome to dim_STAT Host"
AuthType Basic
AuthUserFile /apps/httpd/etc/.htpasswd
require valid-user
$
$ cp /tmp/.htaccess /apps/httpd/home/cgi-bin
$ cp /tmp/.htaccess /apps/httpd/home/docs
Main Page |
Now installation is finished, and the Database and Web servers are running. Be sure STAT-service is installed and running on all servers you want to monitor... (Note: you'll be surprised, but 90% of initial trouble cases are due to this silly thing - people just forget to start STAT-service :)) Once it's done, you are ready to open your preferred Web browser (Java-enabled or not - it's up to you) and connect to the dim_STAT Web server. The index page contains some links to documentation, presentation, tool history, etc., but the link you'll need to click is: "Main Page"... (would you believe some people have a problem finding it? :))
Here is a small snapshot of the screen you'll see. (Note: the blue/red histogram has nothing in common with current host activity; it's just an example of a working Java applet (since v.7.0 it's converted into an image to simplify and speed up the main interface; if you want to be sure the Java plug-in is working correctly within your browser, you may click on the 'Preferences' link and then on the Java Applet testing link).) Since v.7.0 you have a choice for graph generation: interactive Java applet or static PNG image. Until v.8.0, for Solaris/SPARC users an old Netscape Navigator 4.5 was shipped in the /apps/httpd directory - I don't ship it anymore, as at least Firefox and Opera are available as free downloads from the Internet. So well - 'Main Page' :))
As you already guessed, the Main Page re-groups all main actions... And you're right! :))
I'll not present it action by action, but rather functionality by functionality, in order of operation. However, the shortest working cycle should be composed of at least:
- Starting STAT collect
- Analyze/Monitor the collected data
- Stop STAT collect
A few words about the User Interface - don't be surprised if you don't find any "Back" button once you enter somewhere from the Main Page: there is none! You have to use your browser's navigation back button, and it's not because I'm just lazy :)) The reason is very simple: dim_STAT may use a Java applet to present data in graphical mode, but it seems that for every shown graph-applet the Web browser runs a dedicated JVM, and if you never go back in navigation - all the JVMs will stay suspended in the browser memory till it crashes with an out-of-memory error... To prevent crashing, I'm forcing you to use the browser's own navigation button.
Since v.7.0 you'll see a small toolbar at the top of your page presenting:
- Currently used Database Name
- Short links to Home/ Preferences/ Log Admin
Navigation becomes simpler, but be aware of running applets in case you're using them!
ERROR: No X_ROOT configuration for SERVER: ... |
Sometimes instead of the Main Page you may see this kind of error message... Don't worry, nothing is going wrong; it seems your DNS translation simply did not match the configuration settings. Just go into the WebX home directory (ex: /opt/WebX) and open the "x.config" file with your preferred text editor. Find the line containing your host name in the first column. Duplicate this line, and in one of them replace your hostname:port pair by the string given in the error message after "SERVER:". Save the file and try to connect again. It should work immediately!
Example: Error Message: "No X_ROOT configuration for SERVER: harms.France.Sun.COM:88"
- vi /opt/WebX/x.config
- duplicate the line with "harms:88"
- replace in one of the lines "harms:88" by "harms.France.Sun.COM:88"
- save the file
- reload the Main Page in your browser

Note: X_ROOT is one of the WebX configuration parameters. As WebX is an interpreter, there should be a way to protect it from "interpreting" something other than application pages (ex: /etc/passwd). X_ROOT points WebX to its main "root" directory, so only pages coming from this given directory tree may be executed, nothing else...
Web Browsers |
Since v.7.0 you may use any web browser having PNG image format support (true for mostly all available browsers). However, if you prefer interactive graphs with dim_STAT's Java Applet, you'll need a browser supporting Java (plug-in or integrated). Here are a few notes about programs already used/observed...
Firefox - the most stable web browser today; works perfectly with Java applets and is maybe the best choice actually!

Opera - seems to work perfectly since v.5 (and I'm using it a lot); there is even a working(!) and free(!) version for Solaris! In v.8.5 it sometimes loses mouse focus with the applet, but generally it's ok.

Konqueror - works out of the box (generally); maybe the best choice for KDE-lovers :))

Mozilla - you should upgrade at least to v.1.7 to get a fully functional tool (previous versions had a bug starting an applet before receiving all given parameters); also v.1.7 and later is MUCH faster compared to all previous versions!...

IE - never used it myself, but it seems to work for customers, etc., so it may also work for you...

There are some other browsers, but the general rule is: if instead of graphics you see the error message "Browser BUG" - upgrade/patch your browser or move to any working one...
Preferences |
The Preferences page contains a set of key options used by different parts of the application. The most critical of them are grouped here; all other options (if supported) "auto-keep" the last given value (if you already used dim_STAT before, you'll see there are no more graph settings here - all graph values are auto-saved every time you use a graph view)... Note: your browser must accept Netscape cookies to make these features work!
Also, there is no global settings button here, and I did not want to create too many links. So, each option has its own validation button - don't forget to click it to apply your modifications.
Database - Without any special setting, all collected data is stored in the "Default" database (real name: "dim"). However, to avoid possible contention and simplify further administration, it's highly recommended to use different databases for different projects/users/centers/etc. So, within the Database section you may choose the name of the database you want to use, or create+choose a new one. As a reminder, the current (working) database name is always present in the browser title and the toolbar of every dim_STAT window as [db-name]. Free and used disk space is shown for the currently used database. (Note: MySQL has a quite small storage footprint, so disk space usage will remain reasonable, but it's a good habit to check from time to time that none of your datafiles passes 2GB in size, as the shipped database server is a 32-bit program.)
Host Name List - Here you may give a pre-defined list of host names for the servers you usually monitor; in this case, instead of repetitive (and possibly wrong) typing of the same names, you'll be able to simply choose the right name from a drop-down list in your browser.
Bookmark Term - if you never used dim_STAT before, just leave it as it is for the moment. For others - this option was specially created to satisfy everyone who prefers a different name for the "Bookmark" functionality :)) (introduced in dim_STAT since version 4.0; after long discussions we still did not reach any agreement on the name :)) So, you're free now to name it as you like! :))
LOG Messages option - gives you a way to:
- enable/disable auto-generated time-slice messages for easier time interval selection
- set the message list size (in lines)
- set the max visible message length (in characters)
Page Colors - you're free to play with page colors if you're not happy with the default settings or simply prefer to change something from time to time :))
Check Java support - a simple way to check if the dim_STAT applet works in your browser...
Example |
Starting collecting |
Before starting any STAT collect, first check that STAT-service is up and running on every server you want to monitor!!! At least you will be sure you've avoided the most common error :)) Another point - if you want to monitor any Linux server: be sure you've installed the Linux STAT Add-Ons before starting any collect (see the special Linux section in this document).
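A quick sanity check, assuming STAT-service listens on its default port 5000 (see the STAT-service section above) - if the connection below is refused, the service is not running:

$ telnet your_host 5000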
Now from the dim_STAT Main Page you may just follow the Start New Collect link. (Note: since v.8.0 there is no more separation between single- and multi-host collects.)
IMPORTANT:
- Any STAT collect for any host is independent of all others, so it can be stopped and/or restarted at any moment independently of the others...
- Your collect options are always saved into special script(s) named after the "Collect Base Name", so by using customized names you may pre-load different sets of options according to your needs...
- You may start any collect via browser on-line, or just make a starting script (to run by hand, via cron, as batch, etc.)
Main Steps |
There are 4 main steps in starting a STAT collect:
- choose/give host name(s)
- set collect attributes (title, id, etc.)
- choose the statistics to collect
- start now, or prepare a script for manual/delayed execution
1.) Host name(s)
Since v.8.0 you choose host(s) first. You may easily set up a list of frequently used host names via the 'Preferences' interface (host list). This list, as well as all other used host names, is kept via browser cookies. Before you start any STAT collect, for each given host name the tool will indicate the host's STAT-service state by an LED color - I hope it'll help avoid potential misconfiguration issues for new and experienced users alike. For the moment there are 3 LED colors:
- Red: host is not running STAT-service on the default port, or host is inaccessible from the network, or host is down, etc.
- Orange: host is running STAT-service, but an old version of it
- Green: ok! STAT-service is running and has all needed features!
NOTE: since v.8.0 STAT-service has a new 'stat publish' feature, so the tool now knows exactly which kinds of STATs you may or may not collect from each given host and protects you from choosing wrong or unavailable data.
2.) Set Collect Attributes
Collect BaseName -- all selected options are saved in a special start script; the name of this script is composed of the BaseName + some context extensions; when you start a new collect next time, you may pre-load previously selected options by giving the previous BaseName + clicking on "Preload" (by default the last given BaseName is kept via cookies in your browser)
Stat ID -- all data in the database is referenced by this ID; the ID is not assigned automatically, to give you the choice of using personalized range numbers (your project id, etc.)
Stat Title -- the title description you give to the starting collect
Time Interval -- frequency interval (in sec.) you want to be used by the statistics programs (the default of 30 sec. is quite good in many cases)
Client Log File -- name of a file on the "hostname" side you want to watch: all text lines appearing in this file will be automatically copied into the STAT database and timestamped; during analysis of collected STATs you may at any time visualize all Log messages corresponding to the analyzed interval (may be very useful to trace auto-starting jobs, night batches, etc...). They also give you a simple and fast way to find the right time position during data analysis (ex: show N minutes before/after/around a selected message).
STAT-service Port -- "hostname" port number on which STAT-service is running (by default the tool will use the port number given during installation, and it's a good rule to use the same port on every host to avoid complications :))
3.) Choose Statistics
Simply select all the statistics you want (and may) collect (help bullets show a full description of each STAT, if you have JavaScript enabled in your web browser). Please think before selecting - you probably don't need everything! :))
A generally good rule is to start with:
- VMSTAT
- MPSTAT
- IOSTAT
- netLOAD (avoid using 'netstat')
- ProcLOAD
These STATs will already give you quite a useful general view of your system, and once you have collected and analyzed them - you may go deeper according to your needs.
4.) Start Mode
Make Start script only -- don't start the collect, just make a script
Start Now! -- start the new STAT collect right now
Show Debug output -- in case you want to see debug messages from the starting collect's output...
A few screenshots... |
You may see here several servers:
- neel, fourrier - Solaris hosts running an upgraded STAT-service
- localhost - Linux box, upgraded STAT-service
- sting - Solaris host, old STAT-service
- fudji - Solaris host, powered off
I select neel, fourrier, localhost and sting and click on the [Continue] button...
So well, hosts chosen, let's select STATs to collect now!
You may notice for the hosts:
- Linux stats are not proposed for any 'green' Solaris host
- Solaris stats are absent for the Linux 'green' host selection
- not configured or disabled stats are not present for any 'green' host
- the 'orange' host (sting) has all stats present, and it's up to you to keep in mind which commands will run or not on this host (as it was before v.8.0)
I hope this new feature will please everybody ;-))
Load collect from output files |
If for any reason you cannot collect data directly from your hosts and all you have is a set of statistics output files - you may still upload them via the Web interface as one STAT collect and analyze your data later. Just fill in the needed parameters and go! :)) However, if your output files represent a quite big volume - it'll be better to go for the "BatchLOAD" solution (next section), because big files take more time to load, so your browser may simply lose the connection on a timeout and you'll never see the final result...
BatchLOAD |
The idea of BatchLOAD came (like all other things) from day-to-day needs: sometimes you are facing customers/users who want to know what happens on their machines, but don't agree to install any additional software on them... (very constructive approach :)). So, all you can do is ask them to run some stat commands on their systems and send you the output files. And loading their files via the Web interface every day, you'll wonder harder and harder if there is any way to do it automatically... Are you ready for BatchLOAD? :))
Once I decided to add this new component to dim_STAT, I also kept in mind some other tools already existing/coming around that collect output from stat commands on the machine. All such tools keep data in their own format, so I've tried to design the input format for BatchLOAD to be easily adaptable. Of course, I did not aim to create something universal :)), but I hope it should not be too hard to write a script converting from an already existing format to BatchLOAD...
Some words about BatchLOAD internals: there is no dependency on the names of the loaded files. All needed information is given by command options and inside the loaded file. The loaded file must have special TAGs, at least two: to give the STAT name and to confirm the END. USAGE:
Usage: /apps/ADMIN/BatchLOAD -cmd NEW/ADD options

Options [NEW]:
  -base DBname     -- database name
  -ID id           -- Collect ID, if 0 use max+1 automatically
  -title Title     -- Collect Title
  -host Hostname   -- Collect Host Name
  -isec sec        -- Collect STATs Interval (sec)
  -start datetime  -- Collect Start DateTime in format YYYYMMDDHHMISS
  -skip1 yes/no    -- Yes/No skip first STAT measurement (often wrong values)
  -file Filename   -- Full path to file with STATs outputs
  -verbose on/off  -- verbose output on/off

Options [ADD]:
  -base DBname     -- database name
  -ID id           -- Collect ID, if 0 use max (last) automatically
  -skip1 yes/no    -- Yes/No skip first STAT measurement (often wrong values)
  -file Filename   -- File with STATs outputs
  -verbose on/off  -- verbose output on/off
Example:

$ /apps/ADMIN/BatchLOAD -cmd NEW -ID 0 -base ANT -file `pwd`/vmstat.out -skip1 no -title "Test BatchLOAD" -host V880 -isec 20 -start 20031024100000
$ /apps/ADMIN/BatchLOAD -cmd ADD -ID 0 -base ANT -file `pwd`/iostat.out -skip1 no
$ /apps/ADMIN/BatchLOAD -cmd ADD -ID 0 -base ANT -file `pwd`/mpstat.out -skip1 no -verbose on
In this example the first line creates a new STAT Collect, automatically using a new ID (max+1) with the title "Test BatchLOAD", and loads the first file: "vmstat.out". The second & third lines just load the next data into the newly created Collect: "iostat.out" and "mpstat.out". Once it's finished - we may connect to the dim_STAT web server and start analyzing. Note: several "-file" options may be used at the same time, for ex:
$ /apps/ADMIN/BatchLOAD -cmd NEW -ID 0 -base ANT -skip1 no -title "Test BatchLOAD" -host V880 -isec 20 -start 20031024100000 -file `pwd`/vmstat.out -file `pwd`/mpstat.out -file `pwd`/iostat.out

File Format of STAT output
==========================

The file format is designed to give more flexibility in data grouping + processing. Key TAGs:
==> STAT StatName   -- after this point all following data corresponds to the given STAT command (StatName)

Currently supported STAT names:
  VMSTAT
  MPSTAT
  IOSTAT (iostat -x)
  IOSTAT-xn (iostat -xn)
  VXSTAT (vxstat -v)
  psSTAT
and any other Add-On STAT you are able to create! :)) like some already shipped:
  netLOAD
  T3stat
  oraEXEC
  oraIO
  ...

==> END   -- end of STAT data

At any time the following TAGs may be present:
==> DTSET yyyy-mm-dd hh:mi:ss  -- set the date+time point for the next STAT data
==> LOGMSG message             -- add a log message into the database corresponding to the currently loading data

Outside of "STAT" - "END" blocks any other lines are ignored.
Note: TAGs are exactly as shown: "==> STAT", "==> END", "==> DTSET", "==> LOGMSG". Don't miss any characters, please :))
A small example: let's say you have 3 vmstat and 3 iostat files corresponding to, say, "morning", "day" and "night" activity during some special tasks. You can make 6 load files, each one containing its own "STAT", "DTSET", "END" TAGs, OR! put ALL in one:
...
==> DTSET 2004-01-19 10:30:00     -- set "morning" point
==> LOGMSG Morning workload
==> STAT VMSTAT                   -- load vmstat
... output of vmstat.out1
==> LOGMSG Strange CPU activity   -- mark a point to analyze (example)
... continue ...
==> END                           -- end of first vmstat
==> STAT IOSTAT-xn
... output of iostat.out1
==> END
==> DTSET 2004-01-19 14:30:00     -- set "day" point
==> LOGMSG Day workload
==> STAT VMSTAT
... output of vmstat.out2
==> END
==> STAT IOSTAT-xn
... output of iostat.out2
==> END
==> DTSET 2004-01-19 23:30:00     -- set "night" point
==> LOGMSG Night workload
==> STAT VMSTAT
... output of vmstat.out3
==> END
==> STAT IOSTAT-xn
... output of iostat.out3
==> END

So, ALL information is placed in one single file, ready to load:
$ /apps/ADMIN/BatchLOAD -cmd NEW -ID 0 -base ANT -skip1 no -title "Customer Workload" -host V880 -isec 20 -start 20040119100000 -file `pwd`/all_stat.out

In the same way you may group in a single file all data of the same STAT command, or all outputs corresponding to the same collecting time period...
NOTE: don't forget to create your database _before_ starting any load! (in this example the database name is 'ANT').
Please take care - there is no option to give the name of the loaded stat command! That's why the "STAT" and "END" tags are mandatory! Even if you want to load just one vmstat file, the tool has no idea about your file contents till it meets a "STAT" tag inside!
EasySTAT |
Since dim_STAT v.7.0, the EasySTAT script is a part of STAT-service for Solaris. EasySTAT is designed to simplify BatchLOAD interfacing with stat collecting from "very remote" or "highly secured" hosts. In a few words, all you need is:
- install STAT-service on the host
- run EasySTAT
- backup output directory
- restore output directory on your dim_STAT server
- execute the 'LoadDATA.sh' script from the output directory
EasySTAT Usage:
$ /etc/STATsrv/bin/EasySTAT.sh OutDIR Interval NbHours [ Title Host Base Batch ]
(an example invocation follows the argument list below)
- OutDIR - Output directory for stat collects (def: /var/tmp)
- Interval - measurement interval for stat commands in sec. (def: 30)
- NbHours - execution duration in hours (def: 8h)
- Title - title to use during BatchLOAD processing
- Host - hostname to use during BatchLOAD processing
- Base - database name to use during BatchLOAD processing
- Batch - full path to BatchLOAD binary on your server (def: /apps/ADMIN/BatchLOAD)
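A hypothetical full invocation (all values below are illustrative only - the host, database and title are not real):

$ nohup /etc/STATsrv/bin/EasySTAT.sh /var/tmp/Easy 30 12 "Prod DB host" dbsrv1 PROD /apps/ADMIN/BatchLOAD &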
EasySTAT Config: by default the script collects 5 main stats:
- VMSTAT (runqueue, memory, CPU)
- MPSTAT (per CPU usage, interrupts, mutex, etc.)
- IOSTAT-xn (per disk I/O stats)
- netLOAD (network per interface stats +nocanput)
- ProcLOAD (processes stats summarized by process name)
- you may add any other Add-On commands by editing the /etc/STATsrv/bin/EasySTAT.sh file
Example:
On the 'Very Remote' Host:

==> copy STATsrv.pkg somewhere (ex: /tmp)
# pkgadd -d /tmp/STATsrv.pkg
# mkdir /var/tmp/Easy
# cd /var/tmp/Easy
# nohup /etc/STATsrv/bin/EasySTAT.sh /var/tmp/Easy 30 24 &    <== collect data every 30 sec. for 24 hours
...
# cd /var/tmp
# tar cf - Easy | compress > /tmp/Easy.tar.Z
==> copy /tmp/Easy.tar.Z onto your laptop/flash/CD/etc.
# rm /tmp/Easy.tar.Z; rm -rf /var/tmp/Easy; pkgrm STATsrv    <== remove all stuff if no longer needed

On the dim_STAT server:

==> restore Easy.tar.Z somewhere (ex: /home/tmp)
# cd /home/tmp
# uncompress < Easy.tar.Z | tar xvf -
# cd Easy/*
# vi LoadDATA.sh    <== edit if you need to modify default settings
# sh LoadDATA.sh    <== load all data into your database (don't forget to create this database before!!!)
==> Analyze data via the web interface & enjoy :))
GUDs integration |
If you have already worked with Sun support or you're a Sun employee - you may know or have already used GUDs (a shell script collecting various system information + stats and saving them into a special case archive). GUDs was created by a Sun France engineer, and another French engineer made an integration script to load GUDs data into dim_STAT via BatchLOAD - 'guds2dim.sh'. This script is now shipped with dim_STAT and may be found in the /apps/ADMIN directory. To obtain the GUDs script - please contact Sun support directly.
Standalone configuration |
Before thinking about collecting stats via any kind of scripts, don't forget about the 'standalone' dim_STAT possibility: there is _no_ restriction to:
- install dim_STAT on host A
- start STAT-service on host A
- collect data from host A into host A
- be aware: on a 4-CPU machine (a very small Sun machine), a 20 sec. interval collect of [vmstat + mpstat + iostat + psSTAT + netLOAD] will generate only 0.2% CPU usage! (yes!)
The 'collecting' CPU usage of dim_STAT is very low. It uses the CPU heavily only during 'analyze' requests or global export/import/etc. actions. So, don't forget about this simple solution: install dim_STAT on the same host you want to collect from, collect locally all the data you need, then simply back up the whole collected database and restore it on another machine for further analysis!
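If you want to verify the overhead claim on your own host, here is a quick check with standard tools (POSIX ps; the first column is CPU%) - the stat commands and collectors should show up with near-zero percentages:

$ ps -eo pcpu,args | sort -rn | head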
Analyzing |
Analyzing is quite intuitive, but let's just give some snapshots and a few words about them...
So, once you click on Analyze link you have two choices:
- Single Host Analyze
- Multi Host Analyze
Let's take the Multi-Host option for the moment, as it's quite easy at first look.
Also, you can see some other additional options:
- Active ONLY - show only currently running collects
- STATs Status - in Single-Host mode shows how many stats were already collected (very important to see if something is really collecting)
- Title matching - filter collects by title pattern
- LOG matching - filter LOG messages by text pattern
Welcome Analyze! |
LOG Messages |
A few words about LOG Messages... As we already saw, when starting any new STAT collect you may use an optional parameter, Client Log File, to catch any new text messages from this file during the whole collect time. All messages are saved with a time-stamp in the same database as the running collect. However, at any moment you may add this kind of message manually via the Web interface: there is a special link for LOG Messages Admin, or a special input field under any graph view to add a new message.
But how is this helpful?...
First: it helps you to choose the right time intervals for analyzing without keeping in mind the time slices of one or another activity on the machine.
Second: at any time, while analyzing activity on the machine, you'll be able to get a list of all registered events corresponding to the same time interval.
Example 1
Let's say your DBA is on vacation and you're standing in for him for a few days :))
A user claims something happens on the machine from time to time and slows down his work. You start to monitor the system, and yes, from time to time you observe strange activity on the Oracle side. So, instead of noting on a post-it the time slices corresponding to the problem, you simply add two messages: "Something strange" and "Ok now" while you're analyzing the activity graphs. Once your DBA comes back, you may just point him to your messages. Also, if somebody else analyzes time slices entering the same perimeter, he will also be warned by your messages!

Example 2
Let's say every night you start some batches while nobody else works on the system. There are several important parts and you're trying to optimize them, or simply check nothing goes wrong...
Let's say your main batch script looks like:
#!/bin/sh
start_batch01
start_batch02
start_batch03
start_batch04
...
start_batch20
exit

Now, simply add log messages:
#!/bin/sh echo "Start Night Batch" >> /var/tmp/log echo "start batch01" >> /var/tmp/log start_batch01 echo "start batch02" >> /var/tmp/log start_batch02 echo "start batch03" >> /var/tmp/log start_batch03 echo "start batch04" >> /var/tmp/log start_batch04 ... echo "start batch20" >> /var/tmp/log start_batch20 echo "End Night Batch" >> /var/tmp/log exitAfter that, every time you'll start new stat collect to monitor this machine, just give "/var/tmp/log" as Client Log File name. In this case every time you'll start your main batch script, every messages sent into /var/tmp/log file will be saved on the same time within collect database. To select a right time interval to analyze workload during, say, batch04 you'll need simply click between right messages: "start batch04" and "start batch05"...
There are two special "Task" tags that may be used with log messages:
===> TASK_BEGIN: Unique_Task_Name  -- marks the beginning of task execution
===> TASK_END: Unique_Task_Name    -- marks the end

The Unique_Task_Name should be one word, up to 40 characters, and unique within the current collect. For example, for 4 batches started in parallel we can add into the script:
( echo "===> TASK_BEGIN: batch1" >> /tmp/log; batch1.sh; echo "===> TASK_END: batch1" >> /tmp/log ) &
( echo "===> TASK_BEGIN: batch2" >> /tmp/log; batch2.sh; echo "===> TASK_END: batch2" >> /tmp/log ) &
( echo "===> TASK_BEGIN: batch3" >> /tmp/log; batch3.sh; echo "===> TASK_END: batch3" >> /tmp/log ) &
( echo "===> TASK_BEGIN: batch4" >> /tmp/log; batch4.sh; echo "===> TASK_END: batch4" >> /tmp/log ) &Once you'll analyze activity graphs later you may with "Show Tasks" button get a short summary about all executed tasks during observed period with their total execution time (if they are finished). It may be very useful in case you're starting a big long jobs in parallel and they are all executed by the same processes, so there is no way to know which one running which job...
Multi-Host Analyzing |
Multi-Host analyzing is the simplest to understand and a good point to get started. Let's go! NOTE: some screenshots may be out of date and not match the newest dim_STAT version exactly. I hope it'll not make too much trouble for you (they are here only to give a general idea about the interface and the choice of actions)...
Main point: as we want to see several hosts at the same time and on the same graph, we cannot observe more than one single stat value per graph; however, several graphs may be viewed on the same page.
General steps:
- Choose STAT collects
- Choose the time interval you are interested in
- Choose Graph size/mode attributes
- Choose STAT data you want to analyze
- Go!
Select Multi-host |
Choose Collect(s) and Time interval |
Collects - There are three hosts I want to see together. (Sorry, these collects are used only as examples and are not given as demo data.) Time Interval - I described before the advantage of using LOG messages; here is one of the good examples. I've simply selected the begin and the end of the time slice of production workload that interests me...
NOTE: you may select several(!) (multi) intervals and compare them all together on the same graph! (ex: compare today's and last week's activity during a similar workload, etc.)
Choose STATs |
Graphics - quite an intuitive section, no? :)) You simply choose the style of the graphical presentation:
- Java Applet/ PNG Image -- graph output format
- Histogram -- no comments :)) (Limitation: histogram is not supported with PNG output!)
- Real Graph -- in case during some time period there was no data for some stat components, the graph line will be stopped for this period and will continue once the component comes back (for ex: during collecting, one user was disconnected for a while and re-connected again). So, the graph will represent something like the "real" activity, and "inactive" periods will be represented by holes. The only problem: if the observed component switches too often between "live"-"dead"-"live" states, instead of a graph you may obtain a set of dots, and it's much less fun :))
- Continuous Graph -- the line "holes" of the Real Graph case are replaced by zero, so there will never be "dots" on your graphics and each graph line will stay perfectly continuous. However, there is no more visual difference between an "inactive" and a "dead" component :))
Finally, each graph may be drawn with Normal or Bold lines, according to your preference...
Note: all Graphics parameters are saved and kept via cookies, to be reused and pre-selected the next time.
Next, you just choose STAT values you want to see on your graph (ex: CPU and Net packets/sec)...
Go! |
Once you set "content" and "presentation", you may optionally set some other parameters:Show LOG: in case you want to see LOG messages on the same time as graphs to better situate your analyzing in logic of events. There are also two Modes of log viewing: Static and Dynamic. In Static Mode all messages are presented inside of a simple HTML table. In Dynamic Mode they are all inserted into small scrollable window, and if you click on any message in window you will set/unset a red bar crossing all present graphs in place corresponding to message timestamp...
Show Tasks: print a table of all running/finished tasks corresponding to the current time period
Refresh: you may activate an auto-refresh of the result page every "given" seconds (very useful for on-line monitoring; you may also do the same via browser options in Opera or Firefox)
Let's START! :))
Result with Static Log |
(Sorry, there was no more place on the screen for the LOG :)))
Result with Dynamic Log |
If you use dynamic log + applet output (like in this example), a single click on any message line will toggle a vertical red bar on every presented graph. This bar points you exactly to the graph position corresponding to the message timestamp. Also, as you see, at any moment you may add another Log message.
Single-Host Analyzing |
Single-Host is quite similar to Multi-Host, but gives a wider variety of parameters, as it works with only one given STAT collect. Let's use as an example the Demo collect (given by default with the dim_STAT database) and analyze "IOSTAT" data... So, you may open your browser now and just follow me step by step, connecting to your dim_STAT server.
Choose Collect and STAT |
Example IOSTAT: Choose Disks criteria |
The choice of options is much wider in Single-Host mode; you may really go deep into the collected data and obtain more and more detailed views, adapting them to your needs... Disks - several possible combinations :) (but quite similar for all multi-line STATs)
- nothing selected (as it is) means just using all data without any selection refinement
- you may refine your criteria by selecting only needed disk(s)
- you may exclude selected disk(s) by selecting 'Inversed Selection' checkbox
- you may use value-oriented selection (ex. Top-10 Busy% disks)
- you may exclude disks with unwanted data values
- or simply give a select pattern (very useful if you want to avoid SDS metadevices, etc.)
Interval is similar to Multi-Host. To simplify, let's see the last 100 measured data points per disk (there are few)
Values Special Operations may be used on demand. You can see values on a per-disk basis, or SUM/AVG over all of them, or even group values by the first N characters of the disk name (very useful if you want to analyze I/O activity per controller: ex. grouping c0t0d0, c0t1d0 and c1t0d0 by the first 2 characters gives the per-controller groups 'c0' and 'c1'), as well as by the last N characters (a negative N value means 'last').
Example IOSTAT: Choose STAT Variables |
Data Presentation may take one of three forms:
- Graphics - graphical representation (as we already saw before)
- Table of Results - the collect's raw data presented as HTML or Text output (table) and printed on screen or into a temporary file
- Top-N values - in just a few clicks check the MAX/MIN values of any STAT variables during a given time period (for ex: if no disk was more than 30% busy - it's ok, and you don't even need to look at graphs; and if there are any - you know at once the time slices you need to analyze for a possible activity jump)...

So well, here I want to see:
- Graph
- with disk Busy%
- and Bookmark Links

Bookmark Links may be inserted at the bottom of every viewed graph. Clicking on one of these links will show you another statistics view for exactly the same time period!
Start!
Example IOSTAT: Result Graph |
Some new points here: under the graph you'll see a list of Bookmark links; if you click on "CPU" (for ex.), a new graph will appear with the CPU activity during the same time period you're observing (this is important, because even 3 days later it'll point to the same time slice). You'll also find the "Add LOG-Message" field, as in Multi-Host.
And a new one: Save Graph as Bookmark.
Save Graph as Bookmark... |
This is a really cool feature to save your time: right now you may simply give short and long names for the obtained graph view and save it as a new "Bookmark". Once it's done, all your option selections will be saved (booked) under the given name, and instead of clicking all the checkboxes to get similar data for another time period or another STAT collect - all you'll need is a single click on the button with your "Bookmark Name"! The next section is about Bookmarks...
Bookmarks |
Most bookmarks are pre-defined to save your time. Their number may vary from release to release, but never forget - you may always create your own, keep them as your specific kit, and carry them safely from one database to another. Also, people very quickly get the habit of using only bookmarks and are sometimes lost - oh, there is no way to see per-network-interface activity! or, no way to see a single process, only top-10!... Don't forget: all values are here, just go directly to the STAT interface and you'll find them :)) Then create new bookmarks covering your other needs and enjoy! :)) dim_STAT just gives you a way to easily play with any collected data, but the steering wheel is in your hands! :))
Choose Collect and click on Bookmarks... |
Choose Time interval and Graphics style |
Select all Data you want to see and GO! |
Result Page |
Note: There were a lot of discussions about "Bookmark" as the name for this feature. And I quite agree the term is not really right to express this functionality, but the problem is I never got any new name which pleases everybody :)) So, I've simply decided to put this term in the Preferences page; this way everybody is free to rename "Bookmark" to something else, even to "X-Files" :))
Administration actions |
From "Main Page" you may go directly to the "Bookmsrks" management page and:
- Rename
- Export
- Import
- Delete
any Bookmark, as well as Restore the "Standard Kit" if you lost them for any reason (the standard kit contains the most popular collections of data views)...
Administration |
Several administration points were already covered in previous sections. Let's speak about some others, more oriented to day-to-day management...
Active/Finished Collect |
Each STAT collect may be in only 2 states:
- Active
- Finished
The collector's "state value" is kept all the time in the database. So, any state-change action via the Web interface only updates the corresponding database record, that's all. Each collector itself checks its record from time to time for a state change, and if one happened, performs the corresponding action...
Since v.7.0, any finished collect may be restarted again at any time.
Active: the collector brings data from the demanded server via STAT-service, and while the service is up - continues inserting data into your database; once STAT-service is down, it tries to re-connect every 20 sec.
Finished: the collect is stopped, as well as all its stat commands (normally), and no more data is inserted into the database...
Delete/Recycle Collects |
Finished collects may be completely removed from the database, or recycled - you may remove, for example, all data older than the last N days. Currently only manual recycling is possible. Note: any delete operation frees space in the database index/data files, but does not reduce the file sizes themselves! Freed space will simply be reused for the next collects.
Database deletion was covered previously in "MySQL Admin Tips"...
Export/Import collects |
Collect Export and Import is an easy way to save/copy/restore a small amount of data in compressed form. In case you need to copy a big amount of data - copy the whole database and don't lose your time! (take a look at "MySQL Admin Tips")
Modify Collect parameters |
** You should be VERY CAREFUL with these actions! **

Changing Title and Hostname is just a matter of decoration :))
Changing Collect-ID, as a global operation, will lock all corresponding tables during the modifications!
Changing Time Interval makes sense only with wrongly loaded data from output files; otherwise be aware you're changing your time scale and totally losing synchronization with real-world events...
Changing Start Time makes sense when you want to compare similar workloads collected over different lapses of time. You may bring them to the same time scale and analyze via Multi-Host mode. However, if you have any LOG messages corresponding to the same collect - don't forget to move them in time also, to keep timestamp synchronization...
LOG Messages operations |
In case there are too many messages, or you want to share them with other collects, or you want to move them slightly in time, etc. - you can do all that and much more via the "LOG Messages Admin".
Add-On Statistics |
One of the most powerful features of dim_STAT is the ability to integrate your OWN statistics programs into the tool. Once added, they will be treated by the tool like all others from the standard set of STAT(s) and give you the same kind of service: Online Monitoring, Up-Loading, Analyzing, Reporting, etc.
However, the choice of external stat programs is too wide, and it's quite impossible to design a wrapper adapting to any format. So, I've decided to limit the input recognizer to just 2 formats (covering maybe 95% of needs) and leave you to write your own wrapper, if necessary, to bring the output to one of the supported formats.
Formats supported by dim_STAT:
- SINGLE-Line: with one output line per measurement (ex: vmstat)
- MULTI-Line: with several output lines per measurement (ex: iostat)
To be correctly interpreted, your stat program should produce a stable output (same format for data lines, at least one line per measurement in the MULTI case, keep the time-out interval correctly, etc.). Lines other than data lines have to be declared as ignored by dim_STAT.
NOTE: lines shorter than 4 characters are considered as spam! and ignored!
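For instance, a small shell wrapper is usually enough to reshape a noisy output into the SINGLE-Line format (a sketch only - 'mystat' and the column numbers are hypothetical):

#!/bin/sh
# Hypothetical wrapper: run 'mystat' with the given interval and keep
# only columns 2 and 5, one line per measurement, dropping the header line.
INTERVAL=$1
mystat $INTERVAL | awk 'NR > 1 { print $2, $5 }'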
Let's take some examples to make it more clear...
Example of SINGLE-Line command integration |
Let's say we want to monitor the read/write cache hit on the system. This information we may get via "sar" (for example):

$ sar -b 1 1000000000000000

SunOS sting 5.9 Generic_112233-05 sun4u    07/09/2004

18:10:13 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
18:10:14       0       1     100       0       0     100       0       0
18:10:15       0      14     100       0       0     100       0       0
18:10:16       0       7     100       0       0     100       0       0
18:10:17       0       0     100       0       0     100       0       0
18:10:18       0       0     100       0       0     100       0       0
18:10:19       0     135     100       0       0     100       0       0
18:10:20       0       0     100       0       0     100       0       0
18:10:21       0      69     100       0       2     100       0       0
18:10:22       0      86     100       0       2     100       0       0
18:10:23       0       0     100       0       0     100       0       0
18:10:24       0       0     100       0       0     100       0       0
18:10:25       0       0     100       0       0     100       0       0
...

What we are interested in are the 4th and 7th columns of the sar output ("%rcache" and "%wcache"), ignoring any lines containing "*SunOS*" or "*read*"...
Following the "Integrate New Add-On-STAT" link:
I give the name CacheHIT to the new Add-On.
We need only 2 columns from each line (4 and 7).
And this is a "Single-Line" output...Click on "New"...
In this step we need to explain what we want to run and what information we'll need:
Description: CacheHIT via SAR
Shell Command: sar -b %i 1000000000000000
- where %i will be replaced at execution by the time interval in sec.
- Note: the command name doesn't matter here because it is used only as an alias for the STAT-service (look at the "access" file section; e.g. it's possible to name the shell command "toto" and put /usr/bin/sar in the access file as the alias resolution)...
Ignore Lines: we should ignore any lines containing "*SunOS*" or "*read*"
DATA Descriptions:
- ColumnName - leave it as is unless you plan to access the database directly (Note: there are 2 reserved columns for Collect-ID and measurement No.)
- Data Type - set "Float" if you're not sure it'll always be "Int"
- Column# on input - in our case we need columns 4 and 7
- Short Name - one-word value descriptions, so %rcache and %wcache
- Full Name - description used wherever detailed information is needed
- Use in Multi-Host - if you choose "Yes", the corresponding value will be automatically enabled in Multi-Host mode for analyzing several hosts at the same time
Create! :))
What's Next? Will it work now?... Yes! IF YOU DID NOT FORGET to give access to this new command on your STAT-service! (Most common error...)
So, if you want to collect "CacheHIT" data from server "S", be sure the STAT-service on "S" gives execution permission for the command name "sar" (as we put in the shell command description). Add the following lines into your /etc/STATsrv/access file:
# CacheHIT Add-On
command sar /usr/sbin/sar
#

Now it'll work! :))
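A quick sanity check on the client side is to run the command exactly as declared (5 and 3 below are just an example interval and count):

$ /usr/sbin/sar -b 5 3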
MULTI-Line Add-On command integration |
Multi-Line integration is quite similar to Single-Line, except for a few additional things:
- Line Separator pattern: new-line by default, but in some cases it may be a header (like in iostat)
- Attribute Column: very important! As you have several lines per measurement, you need to distinguish them by something (like the "diskname" column in iostat)
- Use In Multi-Host: has more than simple Yes/No options; you should use SUM and/or AVG for collected values... (see the illustration below)
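To illustrate, this is the shape of a MULTI-Line output (a simplified iostat-like sample): the repeated "Device:" header acts as the Line Separator, and the disk name column is the Attribute Column distinguishing the lines of one measurement:

Device:    r/s    w/s
hda        0.84   0.37
hdb        0.21   0.10
Device:    r/s    w/s
hda        1.20   0.60
hdb        0.00   0.00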
Pre-Integrated Add-Ons |
To make your life easier, several additional stat programs are already pre-integrated (Oracle, Java, Linux). They are not installed by default because not everybody needs all features at the same time, and having too many buttons on the screen is quite stressful :))
So, to install any standard Add-On into your database, all you need is simply to click on the "Install" button. All these programs are normally already present in the STAT-service package for the corresponding platform (of course, don't look for Linux iostat inside of the Solaris package :)). However, some of them may need additional conditions/parameters for correct execution...
A good rule is to first check that the add-on works correctly when started directly from the STAT-service bin directory (/etc/STATsrv/bin) on the client side. Only after that should its output be involved in a collect.
There are 2 new psSTAT wrappers added:
- ProcLOAD: all output information summarized on the fly by process name
- UserLOAD: all output information summarized on the fly by user name
These stats become very useful when you have hundreds/thousands of running processes, and what matters is no longer the activity of one single process but rather of groups of processes or users.
netLOAD - Solaris network activity monitoring. This tool has been included in the dim_STAT STAT-service for a long time now. Since v.8.0 netLOAD monitors all network interfaces present in the system (including virtual and loopback). If some indicators are not populated by the device driver, a '-1' value is present instead. Also, a new option '-I' was added: you may give a fixed list of network interfaces you want to monitor (run '/etc/STATsrv/bin/netLOAD' for more details). In the STAT-service, netLOAD is integrated via the 'netLOAD.sh' script to give an easy way to change any option if needed.
The Oracle Add-Ons need a correct Oracle environment for the "oracle" user (default: oracle, but it may be changed inside the scripts); it means:

# su - oracle -c "sqlplus /nolog"

should work correctly and bring you to the "SQL>" prompt of the right database instance.
These Add-Ons are given just as examples; a better solution would be a compiled binary keeping a persistent database connection. But sometimes it's better than nothing :))
- oraIO: Oracle I/O stats for data/temp files
- oraEXEC: Oracle SQL Executions/sec, Commits/sec, Sessions
- oraLATCH: Oracle latch stats
- oraSLEEP: Oracle latch sleeps stats
You may also add any other new one (some customers are even collecting statspack reports directly into dim_STAT! :))
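To give an idea of what such a script-based wrapper looks like, here is a minimal sketch (illustration only - the real oraEXEC/oraIO scripts shipped with dim_STAT differ; a real wrapper would also compute per-interval deltas from these cumulative counters):

#!/bin/sh
# Hypothetical sketch of a script-based Oracle stat wrapper.
INTERVAL=$1
while true; do
    su - oracle -c "sqlplus -s /nolog" <<EOF
connect / as sysdba
set heading off feedback off
select name, value from v\$sysstat
 where name in ('execute count', 'user commits');
exit
EOF
    sleep $INTERVAL
done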
jvmSTAT is a wrapper bringing information from the "jvmstat" package developed by Sun engineers; jvmstat is now officially integrated within the JVM 1.5 distribution. The wrapper is able to monitor all running JVMs on your server.
To run jvmstat you first of all need JDK 1.5 installed on your host.
To get 'jvmSTAT.sh' wrapper working:
- edit "/etc/STATsrv/bin/jvmSTAT.sh" file (STAT-service) on each client machine to set right PATH environment for JAVA_HOME.
- enable jvmSTAT in the STAT-service (edit the /etc/STATsrv/access file)
- before starting any new collect including jvmSTAT, be sure the jvmSTAT Add-On is already installed (Add-On interface from the Main Page)
NOTE: the jvmGC wrapper is still here, but I don't see any reason why one should use an old JVM version today, no? jvmSTAT is now the preferable solution for any kind of GC monitoring.
jvmGC is a wrapper to collect on-the-fly information about the GC activity of any JVM running with the "-verbose:gc" option (before JVM 1.4.2 the only possible way to get GC activity info from a standard JVM was log output dumping, so this wrapper is simply based on log file scanning).
Usage: let's say you want to see the GC activity of one of the JVMs running on server "J".
0.) Install "jvmGC" via Add-Ons page.
1.) jvmGC using $LOG file for data input (you may change name & permissions according your needs (default filename: /var/tmp/jvm.log), modify if needs on the server "J" STAT-service side (/etc/STATsrv/bin).
2.) Start collect including "jvmGC" via Web interface
3.) on server "J" add "-verbose:gc" option to java in your starting application script and redirect output into application log file (for ex. app.log)
4.) once you want to monitor your JVM:
$ tail -f app.log | /etc/STATsrv/bin/grepX GC >> /var/tmp/jvm.log
5.) observe jvmGC output data & have fun! :))
- LvmSTAT (Linux vmstat)
- LcpuSTAT (Linux mpstat)
- LioSTAT (Linux iostat)
- LnetLOAD (Linux netLOAD)
- LpsSTAT (Linux psSTAT)
- LprcLOAD (Linux ProcLOAD)
- LusrLOAD (Linux UserLOAD)
For further details, see the following special Linux notes...
Administration tasks |
At any moment you can:
Save Add-On Description - this will give you an ASCII text file which may be reused for another database by you or other users. In this way you may share with others any new findings and any new tools you found useful!
Restore Add-On Description - from the information in the given Desc.File, re-creates all needed Add-On database structures and fills in all information needed for its correct functioning. WARNING: all previous data will be destroyed if you're already using the same Add-On in the current database!
Delete Add-On - removes the Add-On and all corresponding data from the current database...
Linux Special Notes |
I don't know if I'll surprise you by saying: all dim_STAT binaries for Solaris SPARC are still compiled on the old and legendary SPARCstation-5 under Solaris 2.6 and work on every following generation of Sun SPARC machines, the latest generation + Solaris 10 included! Some unchanged binaries are even 10 years old! This is called TRUE binary compatibility! :))
Now, may I say the same thing about Linux? :)) Sometimes even the SAME vendor breaks its binary compatibility between the previous and the next distribution version!...
At first, when the main problem was only the different implementations of shared libraries, I recompiled all main dim_STAT programs as static binaries to be sure they would run on every distribution. With time, things got even worse: a static binary may core dump on some distros... So, the current dim_STAT Linux version ships both dynamic and several static versions of the same binary, generated on different distros. The current v.8.0 is reported to work out-of-the-box on: MEPIS 3.3.1-1, MEPIS 6.0, RHEL 4.x, CentOS 4.x, SuSE 9/10, Fedora Core 3/4/5. Anyway, if you meet any problem during installation or execution of dim_STAT - please contact me directly and we'll try to fix the issue together...
NOTE: PC boxes are quite cheap today, so always ask yourself if simply buying a $200 PC, installing MEPIS 6.0 (10 minutes), installing dim_STAT (5 minutes) and STARTING to collect from all your servers will not be cheaper/easier/simpler than trying to fix issue after issue :))
Linux STAT-service |
If generally there is no problem with Solaris stat programs, people always have a lot of questions about Linux stats integration... Main point to keep in mind: the most important part of collecting stats from a Linux box is a working STAT-service! If it starts on your box, you may integrate _any_ existing or new stat commands (there are so many available via the internet)...
Pre-integrated stats already come with the STATsrv-Lux.tgz package. That doesn't mean they will work on your system at once (Linux distribution compatibility is another issue :)). Some of them I got from the 'sysstat' kit (http://perso.wanadoo.fr/sebastien.godard/) and recompiled on MEPIS 6.0 (so you may recompile them yourself if needed). And some I developed myself - as I was tired of seeing different output on different distros even with standard commands like 'vmstat'!... So, the STAT-service is now shipped with its own vmstat, netLOAD and psSTAT! :))
Wrappers may be needed for some stat commands to skip unused information or just transform the data into the expected form. The following commands already have wrappers (where needed) and are pre-integrated into the packaged STAT-service.
NOTE: sometimes the same command may give different output on a different Linux distribution! So, be ready to create new Add-Ons in this case, or create common wrappers to adapt the command output...
LvmSTAT |
Source: Linux "vmstat", shipped with STAT-service since v.8.0Output example:
dim$ /etc/STATsrv/bin/vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 434384 691948   9708 220592    3    4    32    28   36    47  3  1 95  1
 0  0 434384 691948   9708 220592    0    0     0     0  347   913  2  0 98  0
 0  0 434384 691948   9708 220592    0    0     0     0  396  1083  2  1 97  0
dim$

Wrapper: not needed anymore, the same output is now guaranteed on all systems (if it runs ;-))
LcpuSTAT |
Source: "mpstat" from SysstatOutput example:
dim$ /etc/STATsrv/bin/cpuSTAT.sh 1
Linux 2.6.15-26-386 (dimitri)   11/16/06

16:45:15     CPU   %user   %nice %system   %idle    intr/s
16:45:16     all    0.00    0.00    0.00  100.00    115.00
16:45:16       0    0.00    0.00    0.00  100.00    115.00
16:45:17     all    1.00    0.00    0.00   99.00    147.00
16:45:17       0    1.00    0.00    0.00   99.00    147.00
16:45:18     all    0.00    0.00    0.00  100.00    162.00
16:45:18       0    0.00    0.00    0.00  100.00    162.00
dim$

Wrapper: not really needed, but it simplifies usage: just ignore "*Linux*||*CPU*||" lines and use "*all*" as the separator.
LioSTAT |
Source: "iostat" from SysstatOutput example:
dim$ /etc/STATsrv/bin/ioSTAT.sh 1
Device:  rrqm/s wrqm/s  r/s  w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz  await svctm %util
hda       11.05   1.24 0.84 0.37  64.54  56.72 32.27 28.36   100.59     0.21 175.97  6.16  0.74

Device:  rrqm/s wrqm/s  r/s  w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz  await svctm %util
hda        0.00   0.00 0.00 0.00   0.00   0.00  0.00  0.00     0.00     0.00   0.00  0.00  0.00
^C
dim$

Wrapper: ioSTAT.sh - needed to ignore the CPU-related part; the devices/partitions list may vary from system to system.
psSTAT for Linux |
I was tired of the strange/wrong 'top' output, which in many cases just doesn't show, or ignores, low-loaded processes and finally gives you a wrong vision of your system. So I adapted my Solaris psSTAT idea to the Linux /proc structures...
So well, there are a few similar options:
psSTAT (dim) v.2.0 Nov.2006

Usage: psSTAT [options]
   -l                     Long output
   -O                     active Only processes/users
   -T sec                 Timeout sec seconds between outputs
   -N name[,name2[,...]]  only proc Name containing name, or name2, or ...
   -M mode                Use Special Mode output:
                            proc - output is grouped by process name
                            user - output is grouped by user name
                            ref  - reference: process name combined with pid
dim$

Output example:
dim$ /etc/STATsrv/bin/psSTAT -O -T 1
  PID PNAME        UsrTM SysTM CPU% MinF MajF PRI  NI Thr VmSIZE
    1 init          0.00  0.00  0.0    0    0  16   0   1   1568
 3153 dbus-daemon   0.02  0.00  2.0    0    0  17   0   1   2324
 3166 hald          0.01  0.00  1.0    0    0  16   0   1   6916
 3761 Xorg          0.01  0.00  1.0    0    0   5 -10   1 100680
 3879 konsole       0.02  0.00  2.0    2    0  16   0   1  29416
24904 kpowersave    0.01  0.00  1.0    0    0  16   0   1  32720
28035 psSTAT        0.02  0.00  2.0  336    0  16   0   1   1812
  PID PNAME        UsrTM SysTM CPU% MinF MajF PRI  NI Thr VmSIZE
    1 init          0.00  0.00  0.0    0    0  16   0   1   1568
28035 psSTAT        0.03  0.00  3.0  336    0  17   0   1   1812
  PID PNAME        UsrTM SysTM CPU% MinF MajF PRI  NI Thr VmSIZE
    1 init          0.00  0.00  0.0    0    0  16   0   1   1568
 3761 Xorg          0.03  0.00  3.0    0    0   5 -10   1 100680
22726 java_vm       0.01  0.00  1.0    0    0  16   0  21 231760
28035 psSTAT        0.03  0.00  3.0  336    0  17   0   1   1812
  PID PNAME        UsrTM SysTM CPU% MinF MajF PRI  NI Thr VmSIZE
    1 init          0.00  0.00  0.0    0    0  16   0   1   1568
 3761 Xorg          0.02  0.00  2.0    0    0   5 -10   1 100680
 3879 konsole       0.01  0.00  1.0    0    0  15   0   1  29416
28035 psSTAT        0.03  0.00  3.0  336    0  16   0   1   1812
^C
dim$
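For example, the ProcLOAD view described below is simply psSTAT running in 'proc' mode with active-only output; you may try it directly (5 sec is just an example interval):

$ /etc/STATsrv/bin/psSTAT -M proc -O -T 5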
There are 3 Linux add-ons based on psSTAT:
- LpsSTAT - process stat using 'ProcName-PID' pair as unique process reference (mode: ref)
- LPrcLOAD - grouped by process name activity stats (mode: proc)
- LUsrLOAD - grouped by user name activity stats (mode: user)
NOTE: data are collected live from '/proc' at the given time interval, so be aware: if during this interval some processes fork and die very quickly, they are simply not seen by the tool, as no trace of them remains in any '/proc' data...
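A quick way to convince yourself (illustration only): take two /proc snapshots - any process born and dead between them leaves no trace:

$ ls /proc | grep -c '^[0-9]'    # number of live processes now
$ sleep 5
$ ls /proc | grep -c '^[0-9]'    # 5 sec later; anything forked and finished in between was never visible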
LpsSTAT (psSTAT) |
Source: psSTAT for Linux, mode: ref
Output example:
dim$ /etc/STATsrv/bin/psSTAT.sh 1
PNAME-PID          UsrTM SysTM CPU% MinF MajF PRI  NI Thr VmSIZE VmLCK VmRSS VmData VmSTK VmEXE VmLIB VmPTE
init-00001          0.00  0.00  0.0    0    0  16   0   1   1568     0    84    160    88    28  1256    12
dbus-daemon-03153   0.03  0.00  3.0    0    0  17   0   1   2324     0   820    308    84   328  1540    12
hald-03166          0.01  0.00  1.0    0    0  16   0   1   6916     0  2016   3312   580   204  2732    12
Xorg-03761          0.02  0.00  2.0    0    0   5 -10   1 100680     0 29688  88740   276  1472  6200   248
konsole-03879       0.01  0.00  1.0    0    0  16   0   1  29416     0  6684   2980    88    40 24820    44
opera-13455         0.01  0.00  1.0    0    0  15   0   1  84380     0 52596  49804    84  9788 21844    92
java_vm-22726       0.01  0.00  1.0    0    0  16   0  21 231760     0 23960 182852   116    12 48192   108
psSTAT-27995        0.01  0.00  1.0  336    0  16   0   1   1816     0   836    420    88    16  1256    12
^C
dim$

This STAT should be used if you're looking at a single process's activity and need to go into detail per PID, etc.
LPrcLOAD (ProcLOAD) |
Source: psSTAT for Linux, mode: proc
Output example:
dim$ /etc/STATsrv/bin/ProcLOAD.sh 1
PNAME           UsrTM SysTM CPU% MinF MajF Nmb Act Thr VmSIZE VmLCK VmRSS VmData VmSTK VmEXE  VmLIB VmPTE
NetworkManager   0.00  0.00  0.0    0    0   1   0   1   3928     0  1048    324    88   264   3140    16
Xorg             0.01  0.00  1.0    0    0   1   1   1 100680     0 29688  88740   276  1472   6200   248
konsole          0.01  0.00  1.0    0    0   5   1   5 148032     0 30780  15852   440   200 124100   220
psSTAT           0.03  0.00  3.0  338    0   1   1   1   1816     0   836    420    88    16   1256    12
PNAME           UsrTM SysTM CPU% MinF MajF Nmb Act Thr VmSIZE VmLCK VmRSS VmData VmSTK VmEXE  VmLIB VmPTE
NetworkManager   0.00  0.00  0.0    0    0   1   0   1   3928     0  1048    324    88   264   3140    16
Xorg             0.01  0.00  1.0    0    0   1   1   1 100680     0 29688  88740   276  1472   6200   248
konsole          0.01  0.00  1.0    0    0   5   1   5 148032     0 30780  15852   440   200 124100   220
psSTAT           0.01  0.00  1.0  338    0   1   1   1   1816     0   836    420    88    16   1256    12
^C
dim$

This STAT should be used if you're looking at global per-'process name' activity and don't really need to go into detail - especially when you have a lot of processes running (!)
LUsrLOAD (UserLOAD) |
Source: psSTAT for Linux, mode: user
Output example:
dim$ /etc/STATsrv/bin/UserLOAD.sh 1
UNAME  UsrTM SysTM CPU% MinF MajF Nmb Act Thr  VmSIZE VmLCK  VmRSS VmData VmSTK VmEXE  VmLIB VmPTE
root    0.01  0.00  1.0  420    0  62   1  62  256312  3576  44224  33216  3208  5700 201456   616
dim     0.03  0.00  3.0   46    0  92   2 124 1774180     0 393556 795244  8176 60672 838516  2632
UNAME  UsrTM SysTM CPU% MinF MajF Nmb Act Thr  VmSIZE VmLCK  VmRSS VmData VmSTK VmEXE  VmLIB VmPTE
root    0.02  0.00  2.0  338    0  62   1  62  256312  3576  44224  33216  3208  5700 201456   616
dim     0.02  0.00  2.0   46    0  92   2 124 1774180     0 393556 795244  8176 60672 838516  2632
^C
dim$

This STAT should be used if you're looking at global per-'user' activity and don't really need to go into detail - especially when your tasks are grouped per user or you have a lot of users on the system (!)
LnetLOAD (netLOAD) |
Source: my netLOAD script for Linux
Output example:
/etc/STATsrv/bin/netLOAD.sh 1
Name  IBytes/s  OBytes/s IPack/s OPack/s IErr OErr IDrp ODrp   Bytes/s Pack/s
none         0         0       0       0    0    0    0    0         0      0
lo    66070356  66070356  130181  130181    0    0    0    0 132140712 260362
eth0  32074500  19059001  236433  218784    0    0    0    0  51133501 455217
eth1   3766140   1544506   93950   56325   60    0   60    0   5310646 150275
Name  IBytes/s  OBytes/s IPack/s OPack/s IErr OErr IDrp ODrp   Bytes/s Pack/s
none         0         0       0       0    0    0    0    0         0      0
lo           0         0       0       0    0    0    0    0         0      0
eth0         0         0       0       0    0    0    0    0         0      0
eth1         0         0       2       3    0    0    0    0         0      5
^C

STAT-service Wrapper: not needed, it should work as-is on any Linux system
Report Tool |
This User's Guide is completely written using Report Tool :))
As usual, this tool was created to cover my day-to-day needs :))
Quite often I have to write reports to explain performance findings, present observed system/application activity, etc. etc. etc. ... etc. Yes, etc., because sometimes we have to write too much to make things work, or simply to protect people from doing stupid things :))
So, once you've started to write your document for a French customer (so, in French), it appears the majority of the development team speaks English only (or not only English, but not French)... And you start to keep 2 parallel copies of the same document: FR/EN... Then you discover something very important that you cannot tell the customer (yet) but absolutely need to communicate internally! So you split once more: FR/EN x Customer/Internal = 4 different documents! The next split gives you 8 documents (still based on the same source of information!)... And more again: looking at people spending hours (or a whole day) doing copy+paste of activity graphs from browser/teamquest/best1/patrol/etc. into their word processor... makes me cry :))
So I was really tired of this situation and tried to imagine something different :))
Overview |
The first point was the format choice: everybody on any platform is at least able to read HTML! So that was easy :)) Also, you can easily convert HTML into any other format (like PDF, etc.) if you need.
The next point is harder: SO WHAT?... :))
SO, my idea was to find a solution to generate many different kinds of documents from the same main data source!
When you take a look at any document, how is its content organized? You'll see:
- Document = N x Chapters
- Chapter = M x Sections
- Section = P x Paragraphs
- and so on...
- Smallest part = Smallest part
All depends on what your smallest level is (if your smallest part = 'letter', you went too far and have a tendency to complicate things :))
So, I've named my smallest part a Note, and a Document/Report is presented simply as an ordered tree of Notes!
Main points:
- the position of each Note in the Report is decided by its parent-ID (level+1) and order number (same level)
- any Note may have none, one or several attributes for:
- Language (French, English, ...)
- Confidentiality (Personal, Customer, ...)
- ... (any other may be added into the system very easily)
- each Note has:
- its Data Type
- Title
- text comments
- attachment (if supported by the Data Type)
- the list of Data Types is fixed:
- Text
- Image
- HTML
- Binary
- dim_STAT Collect
- SysINFO
- HTML.tar.Z Archive
Any Note may be created/edited/deleted at any time. During Report generation you only need to choose the right criteria according to your requirements and obtain a valid document with the criteria-matching parts!
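For example (illustrative IDs only), a small Report tree might look like this:

Report                (Note 1, parent=0, order=1)
 +- Overview          (Note 2, parent=1, order=1)
 +- Results           (Note 3, parent=1, order=2)
 |   +- CPU graphs    (Note 4, parent=3, order=1)
 |   +- I/O graphs    (Note 5, parent=3, order=2)
 +- Conclusion        (Note 6, parent=1, order=3)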
Datatype: Text, HTML, Image, Binary |
These data types are quite similar - you may create a note with any text/ html/ image/ binary file as attachment, with or without your comments. Except for binary, any file may be presented In-Line or Linked.
In-Line means your file will be part of the main document page and part of the visible contents (ex: text directly included, image shown, etc.)
Linked means linked :)) - the main document page will only include a link to your attachment (however, this attachment will always be included with the document!)
Note: the same idea applies to other types of Notes too.
Datatype: SysINFO |
This is a special type to get on-line system information from any host in the network running a STAT-service (if you have permission to access this service + SysINFO).
Datatype: HTML.tar.Z |
A special type in case you want to integrate into your Report any other already-written documents (converted to HTML and archived into one single tar.Z file). As you may have several files in the archive, the tool will ask you for the name of the 'main' file (the file keeping references to all others).
Datatype: dim_STAT-Snapshot |
A special type in case you've saved any graph pages with Java applets while analyzing with dim_STAT. You may integrate them 'as is'; the tool will extract the applet data and insert it as Note contents.
This should probably be deprecated now, as any graph may be saved in PNG format very easily, or simply converted into PNG or GIF...
Datatype: dim_STAT-Collect |
This is a VERY special type - it generates all your STAT graphs automatically and saves your precious time!!! (Follow the example below.)
Preview / Generate / Publish |
At any moment you may 'Preview' your Report, or 'Generate' the current/final version to be accessed on-line, or saved and shared as a tar.Z archive or a single PDF file. Also, your document may be published on another site (this part is currently limited to the same physical host).
Export / Import |
These features explain why the Report Tool is called 'Mobile' :)) At any time you may export your Report and import it into any other (central?) dim_STAT server (meaning: you edit/prepare everything on your laptop and, from time to time, synchronize your work with the central repository). Also, it gives you a simple way to prepare your own templates! Instead of starting a new report every time, just import your template (an old report) and continue!
Let's try! New Report |
Now relax, take your coffee, be sure you have 20 minutes of free time (while nobody is stressing you), your GSM is off, you're ready to listen... So, go to the dim_STAT Main page and click on 'Report Tool'.
Click on Report Tool |
As you may have expected - nothing here for the moment :))
So, let's click on the New Report!
New Report |
All you need here is just to fill in the new report form:
- ID: unique numeric identifier
- Title: Main title
- Owner: owner information
- Chart: any additional comments to be present on the cover page
- Use: choose a pre-configured Report template
and click on 'Create' :))
Edit Report |
Wow! It works! :)) With the 'big' buttons you may:
- Hide/Show Note comments
- Preview your report
- Generate report
- go Home (back to the main Report page)
But if you walk your mouse over the currently pre-generated notes, you'll see explicit pop-up messages explaining each action...
Edit Actions |
As you may see:
- click on the 'down' icon -- create a new note 'after' the current one (same parent level)
- click on the 'right' icon -- create a new 'son' note 'under' the current one (parent level+1)
- click on the 'cut' icon -- cut, followed by a paste action (may go to the 'trash' if it needs to be deleted (end of screen))
- click on the 'data' icon -- edit/view the Note
So let's edit 'General Information' (click on the 'data' icon)...
Edit Note |
From here you may see the current Note preview and edit the Note comments or attributes. If you change only attributes - click on the corresponding button to apply the changes. If you want to modify the Note contents - click on 'Edit Note' (BTW, you may also do it with any external editor!)
Edit Note, continue... |
Add whatever you want in the text fields (you may use any HTML tags, etc.)
Edit Note, continue2... |
Note: if you choose the Text-format option, your text is auto-formatted:
- any empty line is seen as 'new paragraph'
- any 3 blanks at the beginning of a line are replaced by a 'tabulation' (like here :))
Save the Note! :))
Edit Note, continue3... |
You may re-edit again or open the door :))
Edit Report, continue... |
Let's fill in the other notes the same way...
Edit Report, continue2... |
Pretty good :))
Now, I want to add a SysINFO Note for both hosts: 'tahiti' and 'java'. (SysINFO data is received on-line at the moment you ask for it - an easy way to keep your document up to date at the moment of writing. BTW, look into the STAT-service package to see how it is configured on the host side; you may extend it with any other information you need!)
So, a new SysINFO note under 'Software Configuration'... (right icon)...
Add Note |
New Note -- SysINFO |
As the tool has no idea which kind of Note you want to add, it asks you here to choose one before continuing :)) (also, I did not want to bring too much complexity into the interface, no? :)) So, just click on 'SysINFO' here...
New Note -- SysINFO Form |
Here you need to fill in the SysINFO form: the usual data (title/comments/attributes) plus SysINFO specifics:
- host name
- host's STAT-service port
As SysINFO output is usually quite wide, it's preferable to keep it 'As External Link'.
Save the Note! If you gave the right hostname and port, and the STAT-service is up and running on this host - you'll receive your data in a few seconds, as I did from my 'tahiti' domain :))
New Note -- SysINFO Result |
As I asked for 'Linked' contents - there is only a link to the SysINFO data from 'tahiti'. Let's click on it to see if it's correct...
New Note -- SysINFO Link Contents |
Edit Report, continue3... |
As you see, I have my new SysINFO note under 'Software Configuration'. Let's get SysINFO from the 'java' host now and place it 'under' the current tahiti SysINFO...
Edit Report, continue4... |
OK! :)) Now, under 'Hardware Configuration' I want to add an image representing my platform diagram (a very simple image, just for ********* not able to imagine 2 hosts with one storage box :)) but what do you want: if people keep doing presentations, it's because we are still more receptive to images than to words :)))
So: 'Hardware Configuration' -> Image -> ...
Add New Note -- Image |
Once again, similar info to fill in, except you may give the name of your image file to upload [Browse]. Let's fill it in and save as an 'In-Line' attachment.
Add New Note -- Image Inline |
Wow, it's TOO BIG! And it's not because it's so big that you see it better! :)) So, I prefer to keep any big images 'Linked' (except if they are VERY important :)) So: [Edit Note] -> 'As External Link' (no need to give the image file again) -> [Save Note]
Add New Note -- Image Linked |
OK, that's better! :)) Now, let's do the most complex task here: we'll add a 'dim_STAT Collect' note!
Leave this page [Door], go to the end of the Report and click on the [Right] icon on the 'Report' note, then choose 'dim_STAT Collect'...
Add New Note -- dim_STAT Collect, Step1 |
A dim_STAT Collect Note takes several steps for its full creation:
1. - setup dim_STAT server database parameters, [Next]
2. - select the STAT collect you want to use, [Next]
3. - select the STATs you want to see and the time interval, [Next]
4. - [Finish], or select more STATs and time intervals, [Next] (goto 4)
5. - Check graph titles, choose graph parameters, [Save]
So, we are on Step-1 here, and if you don't have any data collected yet you may get it from the 'Default' demo collect:
- server: localhost
- port: Default
- database: Default
[Next]...
NOTE: the interface becomes more optimized and more extended with each new release, so the screenshots are probably not up to date every time :))
Add New Note -- dim_STAT Collect, Step2 |
Choose the STAT collect here and the Search mode (I already have log messages on the 'java' host; each message was added before any of my tests started, so it's now quite easy to find the time interval corresponding to each test). Otherwise we may always choose the 'Date and Time' search, but you'll quickly understand it's much more painful compared to LOG messages :))
NOTE: since v.8.0 more options were added to simplify reporting:
- replay the same time slices for N days (in Date and Time)
- auto include time/date into generated graph titles
- replace on the fly some parts (max 5) of the LOG message by something else
Add New Note -- dim_STAT Collect, Step3 |
So far so good :)) Now we need to choose the kind of graphs we want to see and the time interval...
NOTE:
- all proposed Per-Host STATs are Bookmarks! The more Bookmarks you created during analyzing - the more data you may generate in your report!
- as I selected 2 hosts, the tool also gives me Multi-Host STATs (depending on the stat commands present or not); each STAT (as in Multi-Host Analyze) will put all given hosts into a single graph.
Add New Note -- dim_STAT Collect, Step3 continue |
So, I'm choosing:
- per host: CPU busy%, Run queue, Mutex spin, System calls/s
- multi-host: CPU busy%, Network load bytes/s and packets/s
Time interval: as I know each of my tests ran for ~15 min, I choose the time interval '15 min. After' each LOG message...
[Next]...
Add New Note -- dim_STAT Collect, Step4 |
So, it's OK for me now! I've got my STATs selected with pre-populated graph titles (from the LOG message). BTW, you may see all your previously selected STATs are pre-selected here (the selection is saved via cookies and is specific to each database name)... [Finish]...
Add New Note -- dim_STAT Collect, Step5 |
Here you have to specify your future graph parameters:
- Main title
- per graph title
- generation order
- graph mode, style, size, etc.
- Auto-AVG: good to select if you have too-large time intervals and your graph becomes too dense
- Show LOG/TASK (as during analyze)
- Show processing: get the generation output in the browser (not all browsers work correctly with it; some wait for EOF before showing anything... If you don't choose this option, the processing output is always printed into the /tmp/.report.log file on the Report Tool server side).
[Save]...
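If you didn't enable 'Show processing', you may still follow the generation from a shell on the Report Tool server side:

$ tail -f /tmp/.report.log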
Finally, you're free now to do something else, because your machine is working for you and all you have to do is sit and wait... Once you get the habit and feel comfortable with the tool - you'll ask it to generate A LOT OF graphs at the same time and get much MORE TIME TO DO SOMETHING ELSE!!! :))
Add New Note -- dim_STAT Collect Result |
Here is the final result with all my graphs generated! Let's click on any link to see the graph results...
NOTE: If you remember, I selected the generation order 'by Collect', and what I see now is a list of collects first; each collect link will show me all selected STAT graphs for that given STAT collect. If I instead select the 'by STATs' generation order - I'll see a STAT list here, and each link will show me the same STAT metric for different collects on a single page...
To understand this difference:
- click on any collect link here and look at the graphs (same collect, different graphs)
- now just push [Back], [Back] (back to Step 5), re-select the order as 'by STATs', and [Save] again!
Add New Note -- dim_STAT Collect Contents, ordered by:Collect |
Add New Note -- dim_STAT Collect Result per STATs |
As you see, now a single STAT link contains all given collects inside, so if you want to compare network usage in different cases - just click on the bytes/sec or packets/sec link!
Add New Note -- dim_STAT Collect Contents, ordered by:STATS |
Edit Report, next... |
Edit Report -- Cut |
One last thing now: I don't want to see my 'per STAT' note first in the Report section; let's move it to the end... Click on the [Cut] icon, then [Paste] where you want ([Trash] icon does the delete operation!)
Edit Report -- Paste! |
Edit Report -- Pasted... |
Edit Report -- Preview |
Edit Report -- Preview Output |
Edit Report -- Preview Output2 |
Generate Report |
Generated Report documents |
Report Tool Home |
THAT'S ALL, folks! :)) The export file of this demonstration report may be found within the dim_STAT distribution as 'ExpReport_15.tar.Z'. You may import it and play with it as long as you want! :))
Also, as a good first exercise, you may try to generate your first graphs from the 'Demo collect' given by default in your dim_STAT database!...
Additional Tools |
Since v.5 additional tools have been shipped within the package, but it seems I forgot to present them explicitly, and a lot of users were not informed about them...
Java2GIF Tool |
A tool converting any HTML pages containing dim_STAT applets into HTML pages with GIF images instead - very useful for reporting, printing, etc.
Installed in: /apps/Java2GIF
Requirement:
- JRE or JDK installed on the system
- X11 DISPLAY positioned for image output
Configuration: edit the "j2gif.sh" script to point to the right PATH for the "java" binary
Usage:
$ j2gif.sh /full/path/to/dir/with/your/html/files

Example:
- Analyzing dim_STAT graphs, from time to time you "Save As" your pages into /Report/J
- Once finished, make a backup of your files first!
- $ /apps/Java2GIF/j2gif.sh /Report/J
- That's all!
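If you run it from a remote shell, don't forget the X11 DISPLAY requirement mentioned above; for example (the display value is just an example for a local X server):

$ DISPLAY=:0.0; export DISPLAY
$ /apps/Java2GIF/j2gif.sh /Report/J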
Java2PNG Tool |
Same as Java2GIF, but with a few important differences:
- doesn't need an X11 server for output
- processing is much faster than Java2GIF
- uses the PNG image format
- doesn't support histogram mode
Installed in: /apps/ADMIN
Requirement: -
Configuration: -
Usage:
$ cd /apps/ADMIN
$ Java2PNG /full/path/to/dir/with/your/html/files
HTMLDOC Tool |
Installed in: /apps/htmldoc
Usage: $ cat /apps/htmldoc/README
This is a short README about the "htmldoc" program. This program is free and I've found it very useful for making printable and well-presented HTML ==> PDF documents. Of course, HTML is great for screen viewing, but when you have to bring a printed version - it's not so simple to obtain something presentable the easy way... Also, I like to send PDF documents; they are small and very portable :)) The home page of the "htmldoc" tool is: http://www.easysw.com/htmldoc - you may download and compile the latest version from this site. But as people are lazy by definition :)) , I've pre-installed a not-latest, but well-working binary of this great tool... For a detailed description you may start to read the htmldoc manual, but if you are as lazy as me :)), you may just run:

$ /apps/htmldoc/bin/htmldoc --webpage --header t.D -f Report.pdf *.html Report/*.html

to get a PDF document (Report.pdf) from a collection of HTML files... That's all! :)) -Dimitri
FAQ |
Sizing of dim_STAT Instance... |
The problem is simple: there are no sizing rules at all...
Disk space: it depends only on the size of the collected information... On the Preferences page you can see the space used by the current database and the size of the biggest file: each file should be less than 2GB in size; if it's not so - you have to create a new database!
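You may also check it from a shell with standard tools (the path below is only an assumption - point it to the MySQL data directory of your own install):

$ du -sh /apps/mysql/data      # total database size (example path)
$ ls -lh /apps/mysql/data/*    # per-table files; watch anything approaching 2GB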
CPU: during a collect your CPU power is almost not used at all... However, once you start a query via the Web interface you may access a really big amount of data! Your query becomes DSS (decisional) and may load one CPU at 100% during execution... Normally the query execution time is quite short, but it directly depends on the demanded amount of data.
Separate databases are fine when you may need different administrative tasks for the collected data. For example, it may be annoying if somebody loads a big amount of data at the same time you're trying to analyze something... They will create additional locks and slow down work for others. MySQL (in the used version) uses "table locking", so only one single writer at a time, and a write operation is exclusive (no reads during it). If you use your own database you have fewer reasons to blame others :))
So, a desktop running the dim_STAT server may be heavily used or not used at all - it depends only on your activity...
I've started my collects but there is still nothing collected... |
First of all be sure:
- you've installed the STAT-service package on this host and _started_ it!
- if you're collecting from a Linux host: don't forget it generates different stat output, and you need to activate the Linux Add-Ons first!
If everything seems correct to you, check the output of the '/etc/STATsrv/log/access.log' file.
Full Working cycle Example |
TBD...