Created: 2002-01-21
Last modified: 2011-08-20




dim_STAT User's Guide






by Dimitri


dimitri.kravtchuk@free.fr









Overview...


dim_STAT is a tool for both high-level and detailed monitoring and performance analysis of Solaris, Linux, and other UNIX systems.

The main features of dim_STAT are:

All STAT data is collected from standard UNIX tools like vmstat, iostat, etc. (or some special ones, like psSTAT for monitoring user and process activity) and saved in a MySQL database. Collected data is accessed via a web interface and can be presented in several ways (interactive or static graphs, text, HTML tables). Since v.8.1 there is also a way to collect data from other UNIX systems (HP/UX, AIX, MacOSX, etc.).

dim_STAT can be used for the on-line monitoring of one or several hosts at the same time. Data can also be loaded afterwards from the output files of stat commands and analyzed in the same way. At any time, data collection from new stat commands can be added to the tool (via the Add-On interface) to extend your view of application workloads, RDBMS, your personal STAT program, etc.

By default, dim_STAT interfaces with the following Solaris stats (SPARC and x86):

as well as the following Add-On extensions for both Solaris SPARC/x86 and/or Linux/x86:

The CPU usage of dim_STAT during collection is very low, even lower than that of standard tools like top or perfbar.


LICENSE
Since v.8.3 dim_STAT is moving to the GPLv2 license!
However, all the old parts that I only have as binaries, and any other binaries shipped without sources, will stay under the freeware license.

Installation
The dim_STAT installation package is either delivered as a TAR archive (dim_STAT.tar) or, when on CDs, already "untarred".

Before install: Verify your available disk space - you will need ~60MB for the initial install, mostly to store Web Server and Database Server data. The database volume will grow according to the number of (future) STAT collections and the web directory may grow with your reports. So reserve enough space for your data ...

During installation: a new user "dim" and a group "dim" will be created. User "dim" is the owner of the dim_STAT database and the web server. In case your system has special rules or restrictions, you may create these manually beforehand, or you may choose other user and group names that follow your system policies. Please, after installation, don't forget to set a password for this user! (otherwise cron will not allow the execution of regular clean-up tasks via 'crontab')...
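
The two pre-install points above (free disk space and the "dim" account) can be checked with a few lines of shell. This is only a minimal sketch and not part of the dim_STAT package: the function names are made up for illustration, /tmp is a placeholder for your real install directory, and 60 MB is simply the figure quoted above.

```shell
#!/bin/sh
# Pre-install sanity checks (illustrative helper, not shipped with dim_STAT).

# Succeeds if the filesystem holding directory $1 has at least $2 MB free.
enough_space() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

# Prints "exists" or "missing" for the account name given in $1.
account_status() {
    if id "$1" >/dev/null 2>&1; then
        echo exists
    else
        echo missing
    fi
}

# /tmp is only a placeholder; use the directory you plan to install into.
target=/tmp
if enough_space "$target" 60; then
    echo "disk space: OK for $target"
else
    echo "disk space: less than 60 MB free on $target"
fi
echo "user dim: $(account_status dim)"
```

If the account is reported as missing, the installer will create it, unless your site policy requires you to create it by hand first.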



STAT-service
STAT-service was introduced in dim_STAT version 3.0 and provides a simple, stable and secure way to collect STAT data on-line from Solaris/SPARC, Solaris/x86 and Linux/x86 servers. Since v.8.1 it is distributed under the GPL with source code, so you can now compile it yourself to collect data from other UNIX platforms. As a pilot example, a package for HP/UX is provided. And any newly ported kits are of course welcome! Since June 2009 a version of the STAT-service daemon rewritten in Perl by Marc KODERER is also available: http://search.cpan.org/~mkoderer/stat_agent-0.09/stat_agent.pl - feel free to try this version too, and don't forget to send your comments and RFEs to Marc! :-)



Main Page
Now the installation is finished, and the database and the web servers are running. Be sure that the STAT-service is installed and running on all servers you want to monitor. You'll be surprised, but when people have trouble, in 90% of cases they have simply forgotten to start the STAT-service.

Once it's done, you are ready to open a web browser (doesn't matter if it is Java enabled or not) and connect to the dim_STAT web server. The first page contains some links to documentation, presentation, tool history, etc., but the link you'll need to click is "Main Page".

As you may have guessed, the Main Page groups all the main actions ... and you're right!

I will not present this action by action, but rather functionality by functionality, in order of operation. However, the shortest working cycle is probably still:

  • Start a STAT collect
  • Analyze/Monitor the data being collected
  • Stop the STAT collect

A few words about the User Interface. Don't be surprised if you do not find any "Back" button once you leave the Main Page. There isn't one! You have to use your browser's back button instead. And it's not just because I'm lazy :)) The reason is simple: dim_STAT uses Java applets to present data in graphical mode, but it seems that for every Java applet instance the web browser instantiates a dedicated JVM. And all these JVMs stay in the browser's memory until it crashes with an "out of memory" error. To prevent that, I unfortunately have to force you to use your browser's button.

Since version 7.0 you'll see a small toolbar at the top of your page representing:
- Currently used Database Name
- Shortcuts to Home / Preferences / Log Admin



Preferences
The preferences page contains a set of key options used by different parts of the application. The most critical of them are grouped here. All other options (if supported) are "auto-keeping" their last value. If you used dim_STAT before, you will notice that there are no graph settings anymore: all graph values are auto-saved each time you use the graph view.

Note: your browser must accept cookies for some of the following features to work!!

There isn't a global "settings" button, and I didn't want to create too many links. So each option has its own validation button; don't forget to click it to apply your modifications.

Database - Without any special settings, all collected data is stored in the "Default" database (the real MySQL name is "dim"). However, to avoid possible contention and simplify further administration, it's highly recommended to use different databases for different projects/ users/ centers/ etc. Within the Database section you can choose the name of the existing database you want to use, or you can create a new one and use it instead. Since v.8.3 it is possible to add an Admin password while creating a new database - all administration actions will then require this password (start/ stop/ restart of collects, data drop, etc.). As a reminder, the current database name is shown in the browser's title and the toolbar of every dim_STAT window.

Free and Used disk space - Shown for the current database. (Note: MySQL has a fairly small storage footprint, so disk space usage should stay reasonable, but it's a good habit to check from time to time that you still have free disk space! Since v.8.2 datafiles are configured to be able to reach 2TB in size (seems enough, no? ;-)).)

Host Name List - Here you can specify a pre-defined list of the servers you usually monitor. This list is saved within the database, so every person using the same database may reuse it; likewise, if you switch databases from time to time in your browser, your host list will change automatically! Since v.8.2 host "aliasing" has been added: the complete syntax for the host name is [alias/]hostname[:port]
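
To make the [alias/]hostname[:port] syntax concrete, here is a small sketch that splits such an entry into its parts. It only illustrates the syntax and is not how dim_STAT parses the list internally; the function name, the sample host names and the port number are all invented for the example.

```shell
#!/bin/sh
# Split a host entry "[alias/]hostname[:port]" into its three parts.
# Prints "alias=... host=... port=..."; omitted parts are left empty.
parse_host_entry() {
    entry=$1
    h_alias=""
    h_port=""
    case $entry in
        */*) h_alias=${entry%%/*}; entry=${entry#*/} ;;
    esac
    case $entry in
        *:*) h_port=${entry##*:}; entry=${entry%:*} ;;
    esac
    printf 'alias=%s host=%s port=%s\n' "$h_alias" "$entry" "$h_port"
}

# Sample entries (host names and port are placeholders).
parse_host_entry "db1/v880.example.com:5000"
parse_host_entry "v880"
```

The first call prints alias=db1 host=v880.example.com port=5000, the second leaves alias and port empty.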

Example:

Bookmark Term - If you have never used dim_STAT before, just leave it as it is. For others, this option was created to satisfy everyone who prefers a different name for "Bookmark" functionality. Bookmarks were introduced in version 4.0, but after long discussions we still have no agreement on the right name. So, now you're free to name it as you like! :))

LOG Messages option - Gives you a way to set:

Page Colors - You're free to play with page colors if you're not happy with the default settings or simply prefer to change them from time to time.

Check Java support - A simple way to check if the dim_STAT applet is working correctly with your browser.



Start On-Line Collecting
Before starting any STAT collect, first check if the STAT-service is running on every server you want to monitor. This is the most common error!!

Another point: if you want to monitor a Linux server, be sure you've installed the Linux STAT Add-Ons before starting any collect (see the special Linux section in this document).

Now, from the dim_STAT Main Page you may just follow the Start New Collect link. (Note: since version 8.0 there is no distinction anymore between single and multi host collect).

IMPORTANT:



EasySTAT
Since dim_STAT version 7.0, the EasySTAT script is part of the STAT-service for Solaris. EasySTAT is designed to simplify the combination of collecting STATs on "very remote" or "highly secured" hosts with BatchLOAD.

In a few words all you need to do is:

EasySTAT Usage (v.1.9)
   $ /etc/STATsrv/bin/EasySTAT.sh  OutDIR IntervalSec NbHours [Title [Hostname [DBname [Batch [Log]]]]]

options:
   OutDIR   - Output directory for stat collects (def: /var/tmp)
   Interval - measurement interval for stat commands in sec. (default: 15)
   NbHours  - execution duration in hours (default: 8 hours)
   Title    - title to use during BatchLOAD processing
   Hostname - hostname to use during BatchLOAD processing
   DBname   - database name to use during BatchLOAD processing
   Batch    - full path to BatchLOAD binary on your server (default: /apps/ADMIN/BatchLOAD)
   Log      - log file name (if given, all processing output is forwarded into this file)
              NOTE: may also be enabled via LOG environment variable (see EasySTAT.sh for details)
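
As a quick illustration of the three leading positional arguments, the sketch below only prints the command line it would run, filling in the documented defaults when an argument is left out. The helper name is made up, and the sample directory, interval and duration are arbitrary values chosen for the example.

```shell
#!/bin/sh
# Path taken from the usage line above.
EASYSTAT=/etc/STATsrv/bin/EasySTAT.sh

# Print (not run) the EasySTAT command line for OutDIR, Interval and NbHours,
# falling back to the documented defaults when an argument is omitted.
easystat_cmdline() {
    outdir=${1:-/var/tmp}
    interval=${2:-15}
    hours=${3:-8}
    printf '%s %s %s %s\n' "$EASYSTAT" "$outdir" "$interval" "$hours"
}

# Example: collect into /var/tmp/stats every 30 seconds for 24 hours.
easystat_cmdline /var/tmp/stats 30 24
```

Once the printed line looks right, you can run it as is on the monitored host.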



EasySTAT Config

By default, the script collects 5 main stats:

Additional Options

NOTE: since v.8.3-1 both the COMPRESS and TIMER options are enabled within the EasySTAT.sh script by default!!! It's preferable to have compression and timestamps out of the box, to avoid any space overflow and to make text file analysis faster. However, be aware that you have to edit the EasySTAT.sh file to disable them (but at least you know what you're doing :-))

BatchLOAD
The idea for BatchLOAD came (as many things) from day to day needs. Sometimes you are facing customers/users who want to know what happens on their machines, but then they don't allow the installation of any additional software (a very constructive approach :-)).

All you can do now is ask them to run some stat commands on their systems and send you the output files. While loading their files every day via the Web interface, you start to wonder more and more whether there is a way to do this automatically. Are you ready for BatchLOAD??

I decided to add a new component to dim_STAT, but I kept in mind that other tools already exist that are collecting output from stat commands. All these tools are keeping data in their own format, so I've tried to design the input format for BatchLOAD to be easily adaptable. Of course, I didn't think to create something universal :)), but I hope it shouldn't be too hard to write a script that can convert from an existing format into BatchLOAD.

Some words about the internals of BatchLOAD. There is no dependency on the names of the loaded files. All the needed information is given by command options and by the contents of the loaded file. The loaded file must contain special TAGs. At least two are required: one to give the STAT name and one to mark the END.

USAGE:
Usage: /apps/ADMIN/BatchLOAD -cmd NEW/ADD options 

Options [NEW]:          -- force new collect creation
   -base DBname         -- database name
   -ID id               -- Collect ID, if 0 use max+1 id automatically
   -title Title         -- Collect Title
   -host Hostname       -- Collect Host Name
   -isec sec            -- Collect STATs Interval (sec)
   -start datetime      -- Collect Start DateTime in format YYYYMMDDHHMISS
   -skip1 yes/no        -- Yes/No skip first STAT measurement (often wrong values)
   -file Filename       -- Full path to file with STATs outputs
   -verbose on/off      -- verbose output on/off

Options [ADD]:          -- add to existing collect whenever possible
   -base DBname         -- database name
   -host Hostname       -- Collect Host Name (optional)
   -ID id               -- Collect ID, if 0 :
                        --   if host is given - use max id used by host
                        --   otherwise, use max (last) id automatically
   -skip1 yes/no        -- Yes/No skip first STAT measurement (often wrong values)
   -file Filename       -- File with STATs outputs
   -verbose on/off      -- verbose output on/off

Example :
$ /apps/ADMIN/BatchLOAD -cmd NEW -ID 0 -base ANT -file `pwd`/vmstat.out -skip1 no -title "Test BatchLOAD" -host V880 -isec 20 -start 20031024100000
$ /apps/ADMIN/BatchLOAD -cmd ADD -ID 0 -base ANT -file `pwd`/iostat.out -skip1 no
$ /apps/ADMIN/BatchLOAD -cmd ADD -ID 0 -base ANT -file `pwd`/mpstat.out -skip1 no -verbose on

In this example the first line creates a new STAT Collect using an automatic new ID (max+1), with the title "Test BatchLOAD", and loads the first file, "vmstat.out". The second and third lines load the next data, "iostat.out" and "mpstat.out", into the new Collect. Once it is finished, we can connect to the dim_STAT web server and start to analyze.

Note: multiple "-file" options can be used at the same time. For example:
$ /apps/ADMIN/BatchLOAD -cmd NEW -ID 0 -base ANT -skip1 no -title "Test BatchLOAD" \
        -host V880 -isec 20 -start 20031024100000 -file `pwd`/vmstat.out \
        -file `pwd`/mpstat.out -file `pwd`/iostat.out
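
When the list of files varies from day to day, the command line can be assembled by a small helper. The sketch below only prints the command rather than running it; the function name is made up, the option values are placeholders, and the files are assumed to have no spaces in their paths. It also shows that date(1) can produce the YYYYMMDDHHMISS layout expected by -start.

```shell
#!/bin/sh
BATCHLOAD=/apps/ADMIN/BatchLOAD

# Print (not run) one BatchLOAD NEW command that loads every file given.
# $1 = database, $2 = host, $3 = interval (sec), $4 = start (YYYYMMDDHHMISS),
# $5 = title, remaining arguments = stat output files.
batchload_new_cmd() {
    base=$1 host=$2 isec=$3 start_ts=$4 title=$5
    shift 5
    cmd="$BATCHLOAD -cmd NEW -ID 0 -base $base -skip1 no -title \"$title\""
    cmd="$cmd -host $host -isec $isec -start $start_ts"
    for f in "$@"; do
        cmd="$cmd -file $f"
    done
    printf '%s\n' "$cmd"
}

# Current time in the layout expected by -start.
now=$(date +%Y%m%d%H%M%S)
batchload_new_cmd ANT V880 20 "$now" "Test BatchLOAD" /tmp/vmstat.out /tmp/iostat.out
```

Review the printed line before running it, especially the database name and the start time.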


File Format of STAT output

The file format is designed in such a way as to give maximum flexibility on data grouping and processing.

The main TAGs are STAT and END:

==> STAT StatName -- after this point all following data corresponds to given STAT command (StatName)
    Supported STAT names:
       VMSTAT
       MPSTAT
       IOSTAT (iostat -x)
       IOSTAT-xn (iostat -xn)
       VXSTAT (vxstat -v)
       psSTAT
    And all other Add-On STAT you are able to create, like some already shipped:
       netLOAD T3stat oraEXEC oraIO ...

==> END -- end of STAT data

At any time the following TAGs may also be inserted:
==> DTSET yyyy-mm-dd hh:mi:ss      -- set date+time point for next STAT data

==> LOGMSG message -- add log message into database corresponding to the currently loading data

Outside of the "STAT" - "END" blocks, any other lines are ignored.

Note: TAGs must be written exactly as shown: "==> STAT", "==> END", "==> DTSET", "==> LOGMSG". Don't miss any characters!
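
If you need to convert output from another tool, the TAG format can be produced with a few lines of shell. The following is only a sketch under stated assumptions: the helper names are invented, the data lines are placeholders rather than real vmstat output, and placing DTSET directly after the STAT tag is my assumption based on "at any time" above. The second helper checks that every STAT block in a file is closed by an END.

```shell
#!/bin/sh
# Wrap raw stat output read from stdin in the BatchLOAD tags.
# $1 = STAT name (for example VMSTAT)
# $2 = optional timestamp "yyyy-mm-dd hh:mi:ss", emitted as a DTSET tag
wrap_stat() {
    printf '==> STAT %s\n' "$1"
    if [ -n "$2" ]; then
        # Assumption: DTSET is accepted directly after the STAT tag.
        printf '==> DTSET %s\n' "$2"
    fi
    cat
    printf '==> END\n'
}

# Succeeds only if file $1 has at least one STAT block and every
# "==> STAT" is closed by an "==> END" before the next one starts.
check_tags() {
    awk '
        /^==> STAT / { if (inblk) bad = 1; inblk = 1; blocks++; next }
        /^==> END/   { if (!inblk) bad = 1; inblk = 0; next }
        END          { exit (bad || inblk || blocks == 0) ? 1 : 0 }
    ' "$1"
}

# Placeholder data lines stand in for real stat output.
out=${TMPDIR:-/tmp}/vmstat.batch.$$
printf 'data line 1\ndata line 2\n' | wrap_stat VMSTAT "2003-10-24 10:00:00" > "$out"
cat "$out"
check_tags "$out" && echo "tags OK"
rm -f "$out"
```

In real use you would pipe the saved output of the stat command into wrap_stat and pass the resulting file to BatchLOAD with -file.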

Analyzing
Analyzing your STAT data is quite intuitive, but let's just give some screenshots and a few words of comment.

Once you click on the "Analyze" link you have 3 options:

Let's take for now the Multi-Host option, as it's the easier one :-)

There are some other additional options:

Multi-Host Analyzing
Multi-Host analyzing is simpler than Single-Host analyzing and a good place to start.

NOTE: some screenshots may not be 100% up to date and may not exactly match the latest dim_STAT version.

Main point: since we want to see several hosts at the same time and on the same graph, we cannot show more than one stat value per graph. However, several graphs can be viewed on the same page.

In general:

Single-Host Analyzing
Single-Host Analyzing is very similar to Multi-Host, but gives a wider variety of parameters as it works with only one particular STAT collect. Let's use as an example the Demo collect, which is provided with the dim_STAT database, and let's analyze IOSTAT data.

Open your browser and follow step by step how we're connecting to the dim_STAT server.



Bookmarks
Most of the bookmarks are pre-defined to save your time. Their number may vary from release to release, but never forget, you can always create your own and keep them as your specific kit. And you can easily move them from one base to another.

People very quickly start to use only bookmarks, and then sometimes they get lost: "Oh, there is no way to see per network interface activity!" or, "no way to see a single process, only top-10!" But don't forget, all the data is there: just go directly to the STAT interface and you'll find it. Then create new bookmarks covering your other needs and you're all set.

Multi-Host Extended Analyze
The Extended Multi-Host Analyze was introduced in v.8.5 - it combines the traditional Multi-Host options with per-host Bookmarks. It is probably the most sophisticated way to analyze server performance right now :-) but it gives you all the needed information grouped on one single page :-) The Bookmark links are also available on demand, so at any time you can get more detailed graphs while analyzing in Multi-Host mode :-)

dim_STAT CLI
I was really surprised by the strong demand from users for a dim_STAT CLI solution! It seems a Web interface doesn't make everybody happy :))

And here we are: with version 8.1 there is a CLI module in dim_STAT :)

# /apps/ADMIN/dim_STAT-CLI
  
  dim_STAT CLI v.1.7
  Usage: dim_STAT-CLI  [options] 
    Options:
       -Base DBname
       -ID CollectID               (if empty: prints available Collect list)
       -Stat Name                  (if empty: prints available Stat list)
       -Begin YYYYMMDDhhmiss 
       -End YYYYMMDDhhmiss 
       -Out fname

    optional:
       -Title graphtitle           (if empty: uses Collect title)
       -Width size                 (if empty: uses default graph width)
       -Height size                (if empty: uses default graph height)
       -AVG number                 (use average for too wide graphs)
       -Data filename              save also raw stat data into file

For the moment it gives you a way to generate a single graph in PNG format for a given Database, CollectID and Time interval. Stat names correspond directly to the Bookmarks in your Database, so the more Bookmarks you have, the more graphs you can generate.

Since v.9.0, if you give several Collect IDs at the same time (ID1,ID2,ID3,..), dim_STAT-CLI will offer to use Multi-Host stats and draw Multi-Host graphs! ;-))
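
Since each call produces one graph, a small loop helps when you want a graph per Bookmark. The sketch below only prints the commands; the helper names are made up, the Stat names in the example are hypothetical (use the names the CLI lists for your database), and naming the output file after the Stat with a .png suffix is my own convention.

```shell
#!/bin/sh
CLI=/apps/ADMIN/dim_STAT-CLI

# Convert "yyyy-mm-dd hh:mi:ss" into the YYYYMMDDhhmiss form used by -Begin/-End.
to_cli_time() {
    printf '%s\n' "$1" | tr -d -- '- :'
}

# Print (not run) one CLI command per Stat name.
# $1 = database, $2 = Collect ID, $3 = begin, $4 = end, rest = Stat names
cli_graph_cmds() {
    base=$1 id=$2
    begin=$(to_cli_time "$3")
    end=$(to_cli_time "$4")
    shift 4
    for stat in "$@"; do
        printf '%s -Base %s -ID %s -Stat %s -Begin %s -End %s -Out %s.png\n' \
            "$CLI" "$base" "$id" "$stat" "$begin" "$end" "$stat"
    done
}

# Hypothetical Stat names; replace them with Bookmarks from your database.
cli_graph_cmds ANT 5 "2003-10-24 10:00:00" "2003-10-24 11:00:00" CPU_Usage IO_Busy
```

Stat names containing spaces would need extra quoting, which this sketch does not handle.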

Administration
Several administration points were already covered in previous sections. Let's talk about some others, more oriented toward day-to-day management...



Add-On Statistics

One of the most powerful features of dim_STAT is the ability to integrate your own statistic programs with the tool. Once added, they will be considered by dim_STAT as being the same as the standard set of STAT(s) and give you the same kind of service: Online Monitoring, Up-Loading, Analyzing, Reporting, etc.

However, the choice of external stat programs is so wide that it's quite impossible to design a wrapper for each and every format. Therefore, I've decided to limit the input recognizer to just 2 formats (which cover maybe 95% of needs) and leave it to you to write, if necessary, your own wrapper and convert the output to one of the supported formats.

Formats supported by dim_STAT:

     - SINGLE-Line: with one output line per measurement (ex: vmstat)

     - MULTI-Line: with several output lines per measurement (ex: iostat)

To be correctly interpreted, your stat program should produce stable output. This means keeping the same format for data lines, printing at least one line per measurement in the MULTI case, keeping the time-out interval constant, etc. Lines not containing data have to be declared, so that they can be ignored by dim_STAT.

NOTE: lines shorter than 4 characters are considered as "spam" and will be ignored!
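
To give an idea of what a well-behaved SINGLE-Line program looks like, here is a minimal sketch of a hypothetical stat script (it is not one of the shipped Add-Ons). It prints one line per interval with two fixed-width columns: the number of entries in a directory and the directory size in KB. The fixed-width format keeps the output stable and every line well above the 4-character minimum.

```shell
#!/bin/sh
# Hypothetical SINGLE-Line stat: one output line per measurement.
# $1 = directory to watch, $2 = interval in seconds, $3 = number of samples
dirstat() {
    dir=$1
    interval=${2:-15}
    count=${3:-1}
    n=0
    while [ "$n" -lt "$count" ]; do
        entries=$(( $(ls -A "$dir" | wc -l) ))
        size_kb=$(( $(du -sk "$dir" 2>/dev/null | awk '{ print $1 }') ))
        # Same two columns, same widths, on every line.
        printf '%10d %10d\n' "$entries" "$size_kb"
        n=$(( n + 1 ))
        [ "$n" -lt "$count" ] && sleep "$interval"
    done
    return 0
}

# One sample of the temporary directory.
dirstat "${TMPDIR:-/tmp}" 1 1
```

Note that sleeping after each measurement lets the interval drift by the time the measurement itself takes; for a cheap measurement like this one the drift is small.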

Let's look at some examples...



Linux Special Notes
I don't know if it will surprise you, but until now all dim_STAT binaries for Solaris SPARC were compiled on the same old and legendary SPARCstation-5 running Solaris 2.6, and they still work on every later generation of Sun SPARC machines. This includes the latest generation, and Solaris 10. Some unchanged binaries are still here and are even 10 years old! This is what I call TRUE binary compatibility! :))

Now, can I say the same thing about Linux??? Sometimes, even the same vendor breaks binary compatibility between previous and next distributions!

Because the main problem lies with the different implementations of shared libraries, I've recompiled all main dim_STAT programs as static binaries to be sure they will run on every distribution. Over time, things got worse: static binaries are core dumping on some distros. Therefore, the current dim_STAT Linux version ships with both dynamic and several static versions of the same binary generated on the different distros.

dim_STAT is reported to work out-of-the-box on MEPIS 3.3.1-1, MEPIS 6.0/7.0, Debian 3/4, RHEL 4.x/5.x, CentOS 4.x/5.x, OEL 5.x/6.x, SuSE 9/10/11/12, Fedora Core. Anyway, if you encounter any problems during installation or execution of dim_STAT, please contact me directly and we'll try to fix the issue together. In recent years many Linux vendors have even stopped shipping the system libraries needed to run 32bit programs on their 64bit distributions. Keep this in mind if you're planning to install dim_STAT on a 64bit Linux: you may then need to add 32bit packages like glibc.i686 / libc6-i386, libzip.i686/ lib32z1, libX11, libssl, libcrypto, libpng12, libjpeg, .. (check for some discussions on the dim_STAT Users Group @Google: http://groups.google.com/group/dimstat )
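
As a rough starting point for the 32bit packages, the sketch below prints an install command for the package manager it finds. It uses only the first two package pairs named above, with the .i686 names taken as the RPM-style ones and libc6-i386 / lib32z1 as the Debian-style ones; exact package names and availability vary by distro and release, so treat the output as a suggestion to check, not as a recipe. The function name is made up.

```shell
#!/bin/sh
# Print (not run) a command installing 32-bit runtime libraries.
# $1 = package manager: "yum" or "apt-get".
suggest_32bit_install() {
    case $1 in
        yum)     echo "yum install glibc.i686 libzip.i686" ;;
        apt-get) echo "apt-get install libc6-i386 lib32z1" ;;
        *)       echo "unknown package manager: $1" >&2; return 1 ;;
    esac
}

# Only a 64-bit x86 system needs the extra 32-bit packages.
if [ "$(uname -m)" = "x86_64" ]; then
    for pm in yum apt-get; do
        if command -v "$pm" >/dev/null 2>&1; then
            suggest_32bit_install "$pm"
        fi
    done
fi
```

The remaining libraries from the list (libX11, libssl, and so on) have to be added in their 32bit variants in the same way.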

NOTE: PC boxes are quite cheap nowadays. So rather than trying to fix issue after issue, ask yourself whether buying a $300 PC, installing MEPIS-6.0 or openSUSE-11.2 32bit on it (10 minutes), installing dim_STAT (5 minutes) and starting the collection of stats from all your servers would not be a cheaper, easier and simpler solution.

And again: why don't you simply use Solaris/OpenSolaris and avoid all these kinds of problems?... :-) There is even Pocket Solaris available (http://milax.org) - 300MB full install + 60MB dim_STAT = all other disk space to use securely with ZFS and collect data from your servers!... Seriously...

Report Tool
This User's Guide is completely written using Report Tool!! And as so often, this tool was mainly created to cover my own day to day needs.

Quite often I have to write reports to show performance findings, to present the observed system / application activity, etc., etc. Yes, etc., because sometimes we have to write too much to make things work or simply to protect people from doing stupid things. :))

OK, you've started to write your document for a French customer, so you write it in French, and then it appears that the majority of the development team only speaks English. You start to keep two copies of the same document in parallel: FR/EN. Then you discover something very important that you cannot tell your customer yet, but you absolutely need to communicate it internally. So you split the document once again: FR/EN and Customer/Internal, which means four different documents. The next split will give you eight versions of the document. But it is all still based on the same source of information. The result is a lot of hours spent copy-pasting activity graphs from the browser, teamquest, best1, patrol, etc. into your word processor. It makes me cry... :))

I was really tired of this situation and tried to imagine something different.



Additional Tools
Since version 5, additional tools have been shipped with the package, but it seems I forgot to mention them explicitly and a lot of users didn't know about them.



FAQ

Full Working cycle Example
TBD...