orca - Make HTML & PNG plots of daily, weekly, monthly & yearly data
orca [-gifs] [-o] [-r] [-v [-v [-v]]] configuration_file
Orca is a tool useful for plotting arbitrary data from text files onto a directory on a Web server. It has the following features:
* Configuration file based.
* Reads white space separated data files.
* Watches data files for updates and sleeps between reads.
* Finds new files at specified times.
* Remembers the last modification times for files so they do not have to be reread continuously.
* Can plot the same type of data from different files into different or the same PNGs.
* Different plots can be created based on the filename.
* Parses the date from the text files.
* Creates arbitrary plots of data from different columns.
* Ignores columns or uses the same column in many plots.
* Adds or removes columns from plots without having to delete RRDs.
* Plots the results of arbitrary Perl expressions, including mathematical ones, using one or more columns.
* Groups multiple columns into a single plot using regular expressions on the column titles.
* Creates an HTML tree of HTML files and PNG plots.
* Creates an index of URL links listing all available targets.
* Creates an index of URL links listing all different plot types.
* No separate CGI setup required.
* Can be run under cron or it can sleep itself waiting for file updates based on when the file was last updated.
Orca is similar to but substantially different from other tools that record and display hourly, daily, monthly, and yearly data, such as MRTG and Cricket. To see these other tools, examine
http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/mrtg.html
and
http://www.munitions.com/~jra/cricket/
A static example of Orca is at
http://www.orcaware.com/orca/orca-example/
Please inform me of any other sites using Orca and I will include them here.
Orca has only four command line options. They are:
-gifs: Generate GIFs instead of PNGs. You may not want to use this option, since PNGs are 1/3 the size of GIFs and take less time to generate.
-o: Once. This tells Orca to go through the steps of finding files, updating the RRDs, updating the PNGs, and creating the HTML files once. Normally, Orca loops continuously looking for new and updated files.
-r: RRD only. Have Orca only update its RRD files. Do not generate any HTML or PNG files. This is useful if you are loading in a large amount of data in several invocations of Orca and do not want to create the HTML and PNG files in each run since it is time consuming.
-v: Verbose. Have Orca print more verbose messages. Each additional -v on the command line produces more messages; Orca ignores any -v's beyond three.
After the command line options are listed, Orca takes one more argument which is the name of the configuration file to use. Sample configuration files can be found in the sample_configs directory with the distribution of this tool.
When Orca receives a HUP signal, it will look for new source data files the next time it runs through the main loop. If you have a constantly running Orca, this is a simpler and faster solution than restarting Orca, which takes time to reread all the source files.
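A common way to get this behavior is for the HUP handler to only set a flag, which the main loop checks at the start of its next pass. The Perl fragment below is a minimal sketch of that pattern, not Orca's own code; the 'find' and 'update' log entries are hypothetical placeholders for searching for new files and for updating the RRD, PNG and HTML files.

```perl
use strict;
use warnings;

# Set asynchronously by the HUP handler and checked once per pass of the
# main loop; the handler itself does no file searching.
my $find_new_files = 0;
$SIG{HUP} = sub { $find_new_files = 1 };

my @log;    # records what each pass did, in order

for my $pass (1 .. 3) {
    if ($find_new_files) {
        $find_new_files = 0;
        push @log, 'find';      # placeholder: search for new source files
    }
    push @log, 'update';        # placeholder: update RRDs, PNGs and HTML
    # Simulate "kill -HUP <orca pid>" from a shell during the first pass.
    kill 'HUP', $$ if $pass == 1;
}

print "@log\n";
```

Because the handler only sets a flag, a HUP that arrives in the middle of a pass never interrupts the file updates; the search simply happens at the start of the following pass.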
Because Orca is extremely IO intensive, I recommend that the RRD data files be stored on a disk local to the machine that runs Orca. The HTML and image files that Orca creates also require a good amount of IO, but for performance it is more important that rrd_dir be stored locally than html_dir. The two options html_dir and rrd_dir are described in more detail below.
The first step in using Orca is to set up a configuration file that instructs Orca on what to do. The configuration file is based on a key/value pair structure. The key name must start at the beginning of a line. Lines that begin with whitespace are concatenated onto the last key's value.
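The key/value layout with continuation lines can be illustrated with a small parser. This is a hypothetical sketch, not Orca's parser: parse_config is an invented name, treating blank lines as ignorable is an assumption, and the group { ... } and plot { ... } blocks are not handled.

```perl
use strict;
use warnings;

# Hypothetical parser for the key/value layout.  Pairs are kept in order
# because keys such as data may appear more than once.
sub parse_config {
    my ($text) = @_;
    my @pairs;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;    # assumption: blank lines are ignored
        if ($line =~ /^\s+(.*)$/) {
            # A line starting with white space continues the last value.
            die "continuation line before any key: $line\n" unless @pairs;
            $pairs[-1][1] .= " $1";
        }
        else {
            # The key starts at the beginning of the line; the rest is its value.
            my ($key, $value) = split ' ', $line, 2;
            push @pairs, [ $key, defined $value ? $value : '' ];
        }
    }
    return @pairs;
}

my @config = parse_config(<<'END');
column_description  hme0Ipkt/s hme0Opkt/s
                    hme1Ipkt/s hme1Opkt/s
interval            300
END

print "$_->[0] = $_->[1]\n" for @config;
```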
There are three main groups of options in an Orca configuration file: general options, file specific options, and plot specific options. General options may be used by the file and plot specific options. If an option is required, then it is placed only once in the configuration file.
General options break down into two main groups, required and optional. These are the required options:
Each entry for a data input file is roughly 100 bytes, so for small sites, this file will not be large.
If directory does not begin with a / and the base_dir option was set, then the base_dir directory will be prepended to directory.
If rrd_dir is not defined, then base_dir will be used as rrd_dir. Orca will quit with an error if neither rrd_dir nor base_dir is set.
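The base_dir and rrd_dir rules above can be sketched in a few lines of Perl. The helper names resolve_path and rrd_dir_for are hypothetical, and joining base_dir and the relative path with a single "/" is an assumption of this sketch.

```perl
use strict;
use warnings;

# A path that does not begin with "/" gets base_dir prepended when
# base_dir is set.  Joining with "/" is an assumption.
sub resolve_path {
    my ($path, $base_dir) = @_;
    return $path if $path =~ m{^/} || !defined $base_dir;
    return "$base_dir/$path";
}

# rrd_dir falls back to base_dir; with neither set, stop with an error.
sub rrd_dir_for {
    my (%opt) = @_;
    return resolve_path($opt{rrd_dir}, $opt{base_dir}) if defined $opt{rrd_dir};
    return $opt{base_dir} if defined $opt{base_dir};
    die "neither rrd_dir nor base_dir is set\n";
}

print rrd_dir_for(base_dir => '/home/orca', rrd_dir => 'rrd'), "\n";
```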
This is useful for allowing the data files to update somewhat later than they would in an ideal world. For example, to add a 10% overhead to the sampling interval before an input file is considered late, use:
late_interval 1.1 * interval
By default, the input file's sampling interval is used as the late_interval.
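A hedged sketch of how a late_interval expression could be evaluated: it assumes the expression is ordinary Perl in which the word interval stands for the group's sampling interval in seconds, and that a file is late when it has not been modified within late_interval seconds. The helper names late_interval and is_late are hypothetical.

```perl
use strict;
use warnings;

# Assumption: "interval" in the expression is replaced by the group's
# sampling interval (seconds) and the result is evaluated as Perl.
sub late_interval {
    my ($expr, $interval) = @_;
    (my $code = $expr) =~ s/\binterval\b/$interval/g;
    my $value = eval $code;
    die "bad late_interval '$expr': $@" unless defined $value;
    return $value;
}

# Assumption: a file is late when its last modification is older than
# late_interval seconds.
sub is_late {
    my ($mtime, $now, $late_interval) = @_;
    return $now - $mtime > $late_interval;
}

my $late = late_interval('1.1 * interval', 300);
printf "late after %.0f seconds\n", $late;
```

With the default of plain interval, late_interval('interval', 300) is simply 300.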
1) When a file did exist and now is gone.
2) When a file was being updated regularly and then is no longer updated.
By default, nobody is emailed.
<
< #MetaDir .web
---
>
> MetaFiles on
> MetaDir .
< #MetaSuffix .meta
---
> MetaSuffix .meta
By default, expiration of images is not enabled.
find_times 0:10
would work.
By default, files are only searched for when Orca starts up.
sub_dir 1
The next step in configuring Orca is telling it where to find the files to use as input, a description of the columns of data comprising each file, the interval at which the file is updated, and where the measurement time is stored in the file. This information is stored in a group.
A generic example of a group and its options is:
group GROUP_NAME1 {
find_files          filename1 filename2 ...
column_description  column1_name column2_name ...
date_source         file_mtime
interval            300
.
.
.
}

group GROUP_NAME2 {
.
.
}
The key for a group, in this example GROUP_NAME1 and GROUP_NAME2, is a descriptive name that is unique for all files and is used later when the plots to create are defined. Files that share the same general format of column data may be grouped together. The options for a particular group must be enclosed in the curly brackets {}'s. An unlimited number of groups may be listed.
find_files /data/source1 /data/source2
will have Orca use /data/source1 and /data/source2 as the inputs to Orca. This could have also been written as
find_files /data/source\d
and both data files will be used.
In the two above examples, Orca will assume that both data files represent data from the same source. If this is not the case, such as when source1 is data from one place and source2 is data from another place, then Orca needs to be told to treat the data from each file as distinct data sources. This can be accomplished in two ways. The first is by creating another group { ... } set. However, this requires copying all of the text and makes maintenance of the configuration file complex. The second and recommended approach is to place ()'s around parts of the regular expression to tell Orca how to distinguish the two data files:
find_files /data/(source\d)
This creates two groups, one named source1 and the other named source2 which will be plotted separately. One more example:
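The effect of the ()'s can be sketched as sorting matched files into subgroups keyed by the captured text. This is an illustration under stated assumptions, not Orca's code: assign_subgroups is a hypothetical name, the whole path is assumed to have to match (the ^ and $ anchors), only the first capture is used, and the name 'default' for the no-capture case is invented.

```perl
use strict;
use warnings;

# Files matching the find_files regexp are grouped by the text captured
# by its first (); without a capture every file shares one subgroup.
sub assign_subgroups {
    my ($regexp, @files) = @_;
    my %subgroup;
    for my $file (@files) {
        next unless $file =~ /^$regexp$/;   # anchoring is an assumption
        my $name = defined $1 ? $1 : 'default';
        push @{ $subgroup{$name} }, $file;
    }
    return \%subgroup;
}

my $groups = assign_subgroups('/data/(source\d)', '/data/source1', '/data/source2');
print "$_: @{ $groups->{$_} }\n" for sort keys %$groups;
```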
find_files /data/solaris.*/(.*)/percol-\d{4}-\d{2}-\d{2}(?:\.(?:Z|gz|bz2))?
will use files of the form
/data/solaris-2.6/olympia/percol-1998-12-01
/data/solaris-2.6/olympia/percol-1998-12-02.Z
/data/solaris-2.5.1/sunridge/percol-1998-12-01.gz
/data/solaris-2.5.1/sunridge/percol-1998-12-02
and treat the files in the olympia and sunridge directories as distinct, but the files within each directory as from the same data source.
You'll notice that all but the first () have the form (?:...). This tells Perl to match the expression but not save the matched text in the $1, $2, etc. variables. Orca uses the matched text to generate a subgroup name, which is used to place files into different subgroups. Here, only the hostname should be used to generate a subgroup name, hence all the (?:...) for matching anything else.
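The following Perl check runs the find_files regular expression from the example against the four listed paths and shows that only the hostname is captured, while the (?:...) groups match the date and compression suffix without capturing. Anchoring the match with ^ and $ is an assumption of this sketch.

```perl
use strict;
use warnings;

# The find_files regexp from the example; only (.*) captures.
my $find_files = qr!/data/solaris.*/(.*)/percol-\d{4}-\d{2}-\d{2}(?:\.(?:Z|gz|bz2))?!;

my @paths = qw(
    /data/solaris-2.6/olympia/percol-1998-12-01
    /data/solaris-2.6/olympia/percol-1998-12-02.Z
    /data/solaris-2.5.1/sunridge/percol-1998-12-01.gz
    /data/solaris-2.5.1/sunridge/percol-1998-12-02
);

my %files_by_host;
for my $path (@paths) {
    push @{ $files_by_host{$1} }, $path if $path =~ /^$find_files$/;
}

print "$_: ", scalar @{ $files_by_host{$_} }, " files\n" for sort keys %files_by_host;
```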
If any of the paths or regular expressions given to find_files do not begin with a / and the base_dir option was set, then the base_dir directory will be prepended to the path or regular expression.
column_description date in_packets/s out_packets/s
Files that have a column description as the first line of the file may use the argument ``first_line'' to column_description:
column_description first_line
This informs Orca that it should read the first line of all the input data files for the column description. Orca can handle different files in the same group that have different numbers of columns and column descriptions. The only limitation here is that column descriptions are white space separated, so column names cannot contain spaces.
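A minimal Perl sketch of reading a first_line column description: the first line is split on white space, which is why a column name cannot contain spaces. first_line_columns is a hypothetical helper, and the temporary file only stands in for a real input data file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read the first line of a data file and split it on white space.
sub first_line_columns {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "cannot open $filename: $!\n";
    my $header = <$fh>;
    close $fh;
    die "$filename is empty\n" unless defined $header;
    return split ' ', $header;
}

# A throwaway data file whose first line holds the column description.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "timestamp hme0Ipkt/s hme0Opkt/s\n1000 12.5 9.0\n";
close $tmp_fh;

my @columns = first_line_columns($tmp_name);
print join(',', @columns), "\n";
```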
sort() function, which takes the two items to compare in the package global $a and $b variables instead of the @_ array.
Use of this option has an additional effect on letting Orca know when it can flush data to the RRD files. It determines this when it compares the previously loaded filename to the filename about to be loaded using the filename_compare function. If the result of the comparison is greater than 1, then the data is flushed. If the comparison is equal to or less than 1, then the data is not flushed. Orca uses a value of 1 instead of 0 since there are cases when the filenames should still be ordered but not flushed.
For example, the orcallator.cfg file uses the following subroutine for filenames of the form ``orcallator-2000-02-14'':
sub {
  my ($ay, $am, $ad) = $a =~ /-(\d{4})-(\d\d)-(\d\d)/;
  my ($by, $bm, $bd) = $b =~ /-(\d{4})-(\d\d)-(\d\d)/;
  if (my $c = (( $ay <=> $by) ||
               ( $am <=> $bm) ||
               (($ad >> 3) <=> ($bd >> 3)))) {
    return 2*$c;
  }
  $ad <=> $bd;
}
When Orca is about to load a new data file it compares the new filename with the previous name. Using this function, if the year or month is different, then data gets flushed. If these two are equal but the day divided by 8 (rounded down, which is what >> 3 computes) is different, then the data gets flushed. So loading orcallator-2000-02-14 followed by orcallator-2000-02-15 will not cause a flush but when orcallator-2000-02-16 is about to be loaded, previously loaded data will be flushed.
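The orcallator.cfg subroutine can be exercised directly to confirm the flush points described above. The harness is a sketch: flushes_before is a hypothetical helper, and it assumes $a holds the file about to be loaded and $b the previously loaded file, which is the assignment that reproduces the 14, 15, 16 example.

```perl
use strict;
use warnings;

# The orcallator.cfg filename_compare subroutine.
my $filename_compare = sub {
    my ($ay, $am, $ad) = $a =~ /-(\d{4})-(\d\d)-(\d\d)/;
    my ($by, $bm, $bd) = $b =~ /-(\d{4})-(\d\d)-(\d\d)/;
    if (my $c = (( $ay <=> $by) ||
                 ( $am <=> $bm) ||
                 (($ad >> 3) <=> ($bd >> 3)))) {
        return 2*$c;
    }
    $ad <=> $bd;
};

# Assumption: $a is the next file and $b the previous one; a result
# greater than 1 means previously loaded data is flushed first.
sub flushes_before {
    my ($previous, $next) = @_;
    local ($a, $b) = ($next, $previous);
    return $filename_compare->() > 1 ? 1 : 0;
}

# The same subroutine also orders the filenames, as with sort().
my @files = sort { $filename_compare->() }
    qw(orcallator-2000-02-16 orcallator-2000-02-14 orcallator-2000-02-15);

for my $i (1 .. $#files) {
    printf "%s -> %s: %s\n", $files[$i - 1], $files[$i],
        flushes_before($files[$i - 1], $files[$i]) ? 'flush' : 'no flush';
}
```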
If the filename_compare option is not used, then the filenames are sorted using the Perl cmp operator and data is not flushed until all of it is loaded.
The final step is to tell Orca what plots to create and how to create them. The general format for creating a plot is:
plot {
title               Plot title
source              GROUP_NAME1
data                column_name1
data                1024 * column_name2 + column_name3
legend              First column
legend              Some math
y_legend            Counts/sec
data_min            0
data_max            100
.
.
}
Unlike the group, there is no key for generating a plot. An unlimited number of plots can be created.
Some plot options support substitution: if they contain the two characters %g or %G, Orca replaces that substring with the group name from the find_files ()'s matching. %g is replaced with the exact match from the () and %G with the same text with its first character capitalized. For example, if
find_files /(olympia)/data
was used to locate a file, then %g will be replaced with olympia and %G replaced with Olympia. This substitution is performed on the title and legend plot options.
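The substitution can be sketched in one Perl regular expression. substitute_group is a hypothetical helper; doing both replacements in a single pass is a choice of this sketch so that a group name that itself contains "%g" is not substituted a second time.

```perl
use strict;
use warnings;

# %g -> group name as matched; %G -> same name with first character
# capitalized.  One pass over the text handles both.
sub substitute_group {
    my ($text, $group) = @_;
    $text =~ s/%([gG])/$1 eq 'g' ? $group : ucfirst $group/ge;
    return $text;
}

print substitute_group('Input packets on %G (%g)', 'olympia'), "\n";
```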
Two forms of arguments to data are allowed. The first form allows arbitrary Perl expressions, including mathematical expressions, that evaluate to a number to use as a data source to plot. The expression may contain the names of the columns as found in the group given to the source option. The column names must be separated with white space from any other characters in the expression. For example, if you have the number of bytes per second input and output and you want to plot the total number of bits per second, you could do this:
plot {
source              bytes_per_second
data                8 * ( in_bytes_per_second + out_bytes_per_second )
}
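One way to see why column names must be separated by white space: if the expression is split on white space, each token that names a column can be replaced by that column's value before the result is evaluated as Perl. The sketch below, with the hypothetical helper eval_data_expression, illustrates that idea; it is not Orca's implementation.

```perl
use strict;
use warnings;

# Replace every white space separated token that names a column with its
# value, then evaluate the resulting Perl expression.
sub eval_data_expression {
    my ($expr, $row) = @_;    # $row: hash ref of column name => value
    my $code = join ' ',
        map { exists $row->{$_} ? $row->{$_} : $_ } split ' ', $expr;
    my $value = eval $code;
    die "cannot evaluate '$expr': $@" if $@;
    return $value;
}

my %row = (in_bytes_per_second => 1500, out_bytes_per_second => 500);
my $bits = eval_data_expression(
    '8 * ( in_bytes_per_second + out_bytes_per_second )', \%row);
print "$bits\n";    # 16000
```

Writing 8*(in_bytes_per_second+out_bytes_per_second) without spaces would leave the column names buried inside a single token, so they would not be recognized.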
The second form matches column names against a regular expression and plots all of the matching columns in a single plot. To tell Orca that a regular expression is being used, give data a single argument that contains no white space. In addition, the argument must contain at least one set of parentheses ()'s. When a regular expression matches a column name, the portion of the match in the ()'s is placed into the normal Perl $1, $2, etc. variables. Take the following configuration for example:
group throughput {
find_files          /data/solaris.*/(.*)/percol-\d{4}-\d{2}-\d{2}
column_description  hme0Ipkt/s hme0Opkt/s
                    hme1Ipkt/s hme1Opkt/s
                    hme0InKB/s hme0OuKB/s
                    hme1InKB/s hme1OuKB/s
                    hme0IErr/s hme0OErr/s
                    hme1IErr/s hme1OErr/s
.
.
}

plot {
source              throughput
data                (.*\d)Ipkt/s
data                $1Opkt/s
.
.
}

plot {
source              throughput
data                (.*\d)InKB/s
data                $1OuKB/s
.
.
}

plot {
source              throughput
data                (.*\d)IErr/s
data                $1OErr/s
.
.
}
If the following data files are found by Orca
/data/solaris-2.6/olympia/percol-1998-12-01
/data/solaris-2.6/olympia/percol-1998-12-02
/data/solaris-2.5.1/sunridge/percol-1998-12-01
/data/solaris-2.5.1/sunridge/percol-1998-12-02
then separate plots will be created for olympia and sunridge, with each plot containing the input and output number of packets per second.
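The way a regular expression data pair expands into one plot per matching column can be sketched as follows. expand_regexp_plots is a hypothetical helper; anchoring the match and skipping a plot whose partner column does not exist are assumptions of this sketch.

```perl
use strict;
use warnings;

# Each column matching the first data argument yields one plot; "$1" in
# the second argument is replaced by the text the ()'s matched.
sub expand_regexp_plots {
    my ($pattern, $partner_template, @columns) = @_;
    my %have = map { $_ => 1 } @columns;
    my @plots;
    for my $column (@columns) {
        next unless $column =~ /^$pattern$/;
        my $matched = $1;
        (my $partner = $partner_template) =~ s/\$1/$matched/g;
        # Assumption: skip the plot if the partner column does not exist.
        push @plots, [ $column, $partner ] if $have{$partner};
    }
    return @plots;
}

my @columns = qw(hme0Ipkt/s hme0Opkt/s hme1Ipkt/s hme1Opkt/s
                 hme0InKB/s hme0OuKB/s hme1InKB/s hme1OuKB/s);
print "@$_\n" for expand_regexp_plots('(.*\d)Ipkt/s', '$1Opkt/s', @columns);
```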
By default, when Orca finds a plot set with a regular expression match, it will only find one match, and then go on to the next plot set. After it reaches the last plot set, it will go back to the first plot set with a regular expression match and look for the next data that matches the regular expression. The net result of this is that the generated HTML files using the above configuration will have links in this order:
hme0 Input & Output Packets per Second
hme0 Input & Output Kilobytes per Second
hme0 Input & Output Errors per Second
hme1 Input & Output Packets per Second
hme1 Input & Output Kilobytes per Second
hme1 Input & Output Errors per Second
If you wanted to have the links listed in order of hme0 and hme1, then you would add the flush_regexps option to tell Orca to find all regular expression matches for a particular plot set and all plot sets before the plot set containing flush_regexps before continuing on to the next plot set. For example, if
flush_regexps 1
were added to the plot set for InKB/s and OuKB/s, then the order would be
hme0 Input & Output Packets per Second
hme0 Input & Output Kilobytes per Second
hme1 Input & Output Packets per Second
hme1 Input & Output Kilobytes per Second
hme0 Input & Output Errors per Second
hme1 Input & Output Errors per Second
If you wanted to have all of the plots be listed in order of the type of data being plotted, then you would add ``flush_regexps 1'' to all the plot sets and the order would be
hme0 Input & Output Packets per Second
hme1 Input & Output Packets per Second
hme0 Input & Output Kilobytes per Second
hme1 Input & Output Kilobytes per Second
hme0 Input & Output Errors per Second
hme1 Input & Output Errors per Second
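The three orderings above can be reproduced by a simple model: the plot sets are split into segments that end at each plot set with flush_regexps (and at the last plot set), and within a segment the plot sets take turns contributing one regular expression match. This is a hypothetical model that matches the documented examples, not Orca's actual code; link_order and example_sets are invented names.

```perl
use strict;
use warnings;

# Segments end at each flush_regexps plot set and at the last one; inside
# a segment, plot sets contribute one match per turn until all are used.
sub link_order {
    my @plot_sets = @_;
    my (@order, @segment);
    for my $i (0 .. $#plot_sets) {
        push @segment, $plot_sets[$i];
        next unless $plot_sets[$i]{flush} || $i == $#plot_sets;
        my @queues = map { [ @{ $_->{matches} } ] } @segment;
        while (grep { @$_ } @queues) {
            for my $j (0 .. $#segment) {
                my $match = shift @{ $queues[$j] };
                push @order, "$match $segment[$j]{label}" if defined $match;
            }
        }
        @segment = ();
    }
    return @order;
}

# The three plot sets of the example, with flush_regexps set on the plot
# sets named in the arguments.
sub example_sets {
    my %flushed = map { $_ => 1 } @_;
    return map { +{ label => $_, matches => [qw(hme0 hme1)], flush => $flushed{$_} } }
        qw(Packets Kilobytes Errors);
}

print "$_\n" for link_order(example_sets('Kilobytes'));
```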
The following plot options are optional. Like the data option, multiple copies of these may be specified. The first option of a particular type applies to the first data option, the second to the second data option, and so on.
If the data_type is not specified for a data option, it defaults to GAUGE.
If you want to specify the second data source's minimum and maximum but do not want to limit the first data source, then set the first data source's values to U. For example:
plot {
data                column1
data                column2
data_min            U
data_max            U
data_min            0
data_max            100
}
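The positional pairing of data_min and data_max with data options can be sketched like this. limit_value and data_limits are hypothetical helpers; representing "no limit" as undef, and treating a missing entry the same as U, are assumptions of this sketch.

```perl
use strict;
use warnings;

# "U" (or no entry at all) means the data source is not limited.
sub limit_value {
    my ($v) = @_;
    return (defined $v && $v ne 'U') ? $v : undef;
}

# The nth data_min/data_max applies to the nth data option.
sub data_limits {
    my ($data, $mins, $maxs) = @_;
    return map {
        +{ data => $data->[$_],
           min  => limit_value($mins->[$_]),
           max  => limit_value($maxs->[$_]) }
    } 0 .. $#$data;
}

my @limits = data_limits([qw(column1 column2)], [qw(U 0)], [qw(U 100)]);
for my $l (@limits) {
    printf "%s: min %s, max %s\n", $l->{data},
        map { defined $_ ? $_ : 'none' } $l->{min}, $l->{max};
}
```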
required 1
in the options for a particular plot. In this case, Orca will record a *UNKNOWN* value for all invalid data.
The following options should be specified multiple times for each data source in the plot.
Orca makes very heavy use of references to hashes and arrays to store all of the different data it uses.
The Digest::MD5 module is used to cache the result of some expensive calculations that would otherwise commonly be performed more than once. In particular, this arises when the same code is used to pull data from many different input data files into the same type of data structures. In this case, the code to be evaluated is run through MD5, and the resulting binary digest is used as a key in a hash, with the value being the compiled anonymous subroutine. This saves both memory and processing time.
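A minimal sketch of this caching technique using the md5 function from Digest::MD5: identical code strings produce the same digest, so they are compiled only once and share one code reference. compile_cached, the counter, and the $_[0]/$_[1] calling convention are illustrative assumptions, not Orca's internals.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

my %sub_cache;           # binary MD5 digest => compiled code reference
my $compile_count = 0;   # how many times code was actually compiled

sub compile_cached {
    my ($code) = @_;
    my $key = md5($code);
    return $sub_cache{$key} //= do {
        $compile_count++;
        my $sub = eval "sub { $code }";
        die "cannot compile '$code': $@" unless $sub;
        $sub;
    };
}

# The same expression requested twice, e.g. for two input files.
my $first  = compile_cached('8 * ($_[0] + $_[1])');
my $second = compile_cached('8 * ($_[0] + $_[1])');
print $first->(1500, 500), "\n";   # 16000
```

Keying on the 16 byte binary digest rather than the full source text keeps the hash keys small even when the generated code is long.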
Four mailing lists exist for Orca. To subscribe to any of them, please visit the URLs below. You can choose to receive a digest form of a mailing list when you subscribe or at any time thereafter. To send email to any of these lists, you must first subscribe to it.
orca-announce
Subscribe http://www.onelist.com/subscribe/orca-announce
Archive http://www.onelist.com/archive/orca-announce
The orca-announce@onelist.com mailing list is a LOW volume moderated mailing list for announcing stable releases of Orca.
orca-users
Subscribe http://www.onelist.com/subscribe/orca-users
Archive http://www.onelist.com/archive/orca-users
The orca-users@onelist.com mailing list is the first stop for getting help setting up and running Orca. Problems relating to downloading, configuring, compiling the necessary Perl modules, and installing Orca belong here. People interested in anything more than this, such as developing data gathering modules or active Perl development, should be on one or both of the orca-discuss@onelist.com or orca-developers@onelist.com mailing lists. Once you get Orca running to your satisfaction, you may want to remove yourself from this list.
orca-discuss
Subscribe http://www.onelist.com/subscribe/orca-discuss
Archive http://www.onelist.com/archive/orca-discuss
The orca-discuss@onelist.com mailing list is for active users of Orca who are doing new interesting things with Orca and want to discuss Orca but are not interested in actively developing Orca source code. These people are also not interested in helping people get Orca running on their systems.
orca-developers
Subscribe http://www.onelist.com/subscribe/orca-developers
Archive http://www.onelist.com/archive/orca-developers
The orca-developers@onelist.com mailing list is for developers who actively hack on and improve Orca.
Please direct all Orca comments and bugs to one of the above mailing lists.
If you wish to contact the author of Orca, Blair Zajac, please email me at the Orca Users mailing list.