[Orca-users] Re: Digest Number 174

Steve Gilbert gilbert at 8020softwaretools.com
Tue Nov 13 16:45:31 PST 2001


On Thu, 18 Oct 2001, Blair Zajac wrote:

> Can you send me a copy of the scripts you wrote for monitoring your linux
> cluster for inclusion into Orca?  This work looks really interesting and
> useful.

Hi Blair,
	Sorry for the horribly late reply...I've changed jobs
and was out of the Orca loop for a few months.  The script I
wrote is attached below.  I haven't looked at it in a while,
but I think it should be mostly self-explanatory.  As it stands,
the script takes the name of a Linux farm as its sole argument.
It requires a DB file that is simply a list of all the Linux
machines.  The site I wrote this for had over 20 different
Linux farms, with each hostname being a cryptic number somehow
relating to IP addresses.  So I had to do some funky hostname
comparisons to build the list of hosts in a given farm.  I
would imagine most people could cut all that out and do it a
lot more directly.   The script polls each machine via SNMP
for the desired stats, generates an average of the returned
data, and builds a percol-* file based on that average.  As
I said back in my original post, this is nothing really
profound, but it worked great for me.  There's definitely
lots of room to improve on this and make it less kludgy.
I'll be glad to answer any specific questions.  Also, if
anyone has any suggestions to make this better, I'd love to
hear them...although I doubt I'll get to act on them anytime
soon.
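
To make the idea above concrete, here is a minimal sketch of the
select-then-average step on its own, with no SNMP or DB file involved.
It is not the attached script: the sample readings, the field names
(usr, sys, runq1) and the helper names hosts_in_farm, mean and
farm_average are all made up for illustration, and a site with saner
hostnames could presumably replace the prefix match with a plain list.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical host names and readings standing in for SNMP results.
my %samples = (
    'l98-2-1' => { usr => 10.0, sys => 2.0, runq1 => 0.50 },
    'l98-2-2' => { usr => 30.0, sys => 4.0, runq1 => 1.50 },
    'l98-3-1' => { usr => 90.0, sys => 9.0, runq1 => 8.00 },
);

# Return the hosts whose names start with the given literal prefix.
sub hosts_in_farm {
    my ($prefix, @hosts) = @_;
    return grep { index($_, $prefix) == 0 } @hosts;
}

# Arithmetic mean of a list; returns undef (in scalar context) if empty.
sub mean {
    return unless @_;
    return sum(@_) / scalar(@_);
}

# Collapse the per-host readings into one averaged record for the farm.
sub farm_average {
    my ($samples, @hosts) = @_;
    my %avg = (alive => scalar(@hosts));
    foreach my $field (qw(usr sys runq1)) {
        $avg{$field} = mean(map { $samples->{$_}{$field} } @hosts);
    }
    return \%avg;
}

my @farm1 = hosts_in_farm('l98-2-', sort keys %samples);
my $avg   = farm_average(\%samples, @farm1);
printf("%5.1f %5.1f %5.2f %3d\n", @{$avg}{qw(usr sys runq1 alive)});
```

The single averaged record is what lets Orca treat the whole farm as
one machine; the "alive" count simply records how many hosts
contributed to the average.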


Steve Gilbert             gilbert at 8020softwaretools.com
80/20 Software Tools      650-557-0969
**     Consulting and support for open source        **
** network management and performance analysis tools **

-------------- next part --------------
#!/usr/local/bin/perl -w

# farmallator
# Written by Steve Gilbert <gilbert at 8020softwaretools.com>
#
# Emulates the behavior of the Solaris-specific orcallator.se
# performance monitoring tool for Linux computing farms.  This
# takes the name of a farm as its argument.  It gathers the
# specified performance data from the machines in that farm via
# SNMP and generates an average of that data.  In this way, Orca
# will treat an entire Linux farm as a single machine.

use strict;
use DB_File;
use Time::localtime;
use SNMP;
use Statistics::Descriptive;

# BEGIN user-configurable vars
# Location of system database...a simple DB file containing a
# list of all the Linux hostnames
my($all_systems) = "/usr/local/linuxfarms/var/snmp-systems.db";
# SNMP community name
my($comm) = "changeme";
# This farm list will have to be maintained by hand.  It would be
# nicer to read this from a config file, but we need to optimize
# for speed.  In this example, each farm is identified by the prefix
# that all of its hostnames begin with (e.g. "l98-2-" for farm1)
my(%farm_hash) = (farm1      => "l98-2-",
                  farm2      => "l98-3-",
                  bigfarm    => "l98-4-",
                  littlefarm => "l35-4-");
# END user-configurable vars

my(%systems, $machine, @this_farm);
my($host);
my(@f_1runq, @f_5runq, @f_10runq, @f_usr, @f_sys, @f_freemem, @f_swap_avail);
my(@f_uptime);
# Set to '1' for more detailed error log
$SNMP::verbose = 0;

# Make sure we were given a farm name and that we know about it
my($farm_name) = shift(@ARGV);
unless (defined($farm_name)) {
    die("Usage: $0 farm_name\n");
}
unless (exists($farm_hash{$farm_name})) {
    die("No such farm as $farm_name\n");
}
# Output file
my($tm) = localtime;
my($file) = sprintf("percol-%04d-%02d-%02d",
                    $tm->year+1900, ($tm->mon)+1, $tm->mday);
my($percol) ="/usr/local/orca/var/orca/orcallator/linux/$farm_name/$file";
unless (-e $percol) {
    open(PERCOL, ">$percol") || die("Unable to create $percol: $!\n");
    print(PERCOL " timestamp locltime   uptime  usr%  sys%  1runq  5runq 15runq  swap_avail  freememK alive\n");
    close(PERCOL);
}
# Get the list of machines that belong to the farm in question
dbmopen(%systems, $all_systems, 0644) || die("Could not open $all_systems\n");
foreach $machine (keys(%systems)) {
    # \Q...\E keeps the prefix literal in case it contains regex metacharacters
    if ($machine =~ /^\Q$farm_hash{$farm_name}\E/) {
        push(@this_farm, $machine);
    }
}
dbmclose(%systems);
# Make sure that we actually got some machines from the DB file...
# otherwise, we will do a lot of work for nothing and slow things down
unless ($#this_farm >= 0) {
    die("Could not find any machines that belong to the $farm_name farm\n");
}
# Now we have the machine list, let's do the actual work
foreach $host (@this_farm) {
    my($session) = new SNMP::Session(DestHost => $host, Community => $comm,
                                     UseSprintValue => 1);
    unless (defined $session) {
        print(STDERR "$host: SNMP session creation error\n");
        next;
    }
    # .1.3.6.1.4.1.2021.10.1.3.1 = laTable.laLoad.1 ( 1 min. average)
    # .1.3.6.1.4.1.2021.10.1.3.2 = laTable.laLoad.2 ( 5 min. average)
    # .1.3.6.1.4.1.2021.10.1.3.3 = laTable.laLoad.3 (15 min. average; kept
    #                              in the *10runq variables, printed as 15runq)
    # .1.3.6.1.4.1.2021.11.9.0   = systemStats.ssCpuUser
    # .1.3.6.1.4.1.2021.11.10.0  = systemStats.ssCpuSystem
    # .1.3.6.1.4.1.2021.4.6.0    = memory.memAvailReal (free physical mem.)
    # .1.3.6.1.4.1.2021.4.4.0    = memory.memAvailSwap (free swap space)
    # .1.3.6.1.2.1.1.3.0         = system.sysUpTime
    my($vars) = new SNMP::VarList(['.1.3.6.1.4.1.2021.10.1.3.1'],
                                  ['.1.3.6.1.4.1.2021.10.1.3.2'],
                                  ['.1.3.6.1.4.1.2021.10.1.3.3'],
                                  ['.1.3.6.1.4.1.2021.11.9.0'],
                                  ['.1.3.6.1.4.1.2021.11.10.0'],
                                  ['.1.3.6.1.4.1.2021.4.6.0'],
                                  ['.1.3.6.1.4.1.2021.4.4.0'],
                                  ['.1.3.6.1.2.1.1.3.0']);
    $session->get($vars);
    if ($session->{ErrorNum} != 0) {
        print(STDERR "$host: SNMP GET error: $session->{ErrorStr}\n");
        next;
    }
    push(@f_1runq, $vars->[0]->val);
    push(@f_5runq, $vars->[1]->val);
    push(@f_10runq, $vars->[2]->val);
    push(@f_usr, $vars->[3]->val);
    push(@f_sys, $vars->[4]->val);
    push(@f_freemem, $vars->[5]->val);
    push(@f_swap_avail, $vars->[6]->val);
    my($d, $h, $m, $s) = split(/:/, $vars->[7]->val);
    $d *= 86400;
    $h *= 3600;
    $m *= 60;
    ($s) = split(/\./, $s);
    $s++;
    my($this_uptime) = ($d + $h + $m + $s);
    push(@f_uptime, $this_uptime);
}
# Now that that's done, let's check one of the arrays to make sure that
# we actually got some data back.  Otherwise, the statistics module will
# crap all over itself if you try to pass it an empty array as data
unless ($#f_uptime >= 0) {
    die("Unable to get results from any machines in the $farm_name farm.\n");
}
my($alive) = ($#f_uptime + 1);
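my($timestamp) = time();
# Polling can take a while, so refresh $tm to keep locltime in step
# with the timestamp column
$tm = localtime($timestamp);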
my($uptime) = comp_mean(@f_uptime);
my($one_runq) = comp_mean(@f_1runq);
my($five_runq) = comp_mean(@f_5runq);
my($ten_runq) = comp_mean(@f_10runq);
my($usr) = comp_mean(@f_usr);
my($sys) = comp_mean(@f_sys);
my($freemem) = comp_mean(@f_freemem);
my($swap_avail) = comp_mean(@f_swap_avail);
open(PERCOL, ">>$percol") || die("Could not open $percol: $!\n");
printf(PERCOL "%10d %02d:%02d:%02d %8d %3.1f %3.1f %3.2f %3.2f %3.2f %11d %9d %3d\n",
       $timestamp, $tm->hour, $tm->min, $tm->sec, $uptime, $usr, $sys,
       $one_runq, $five_runq, $ten_runq, $swap_avail, $freemem, $alive);
close(PERCOL);

sub comp_mean {
    my($stat) = Statistics::Descriptive::Sparse->new();
    if (defined($stat)) {
        $stat->add_data(@_);
        return $stat->mean();
    } else {
        die("Something is wrong with the statistics module.\n");
    }
}
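
The uptime handling in the polling loop can also be pulled out into a
small helper. The following is only a sketch under an assumption: it
expects the "days:hours:minutes:seconds.hundredths" layout that the
script's split(/:/, ...) implies, and the name uptime_to_seconds is
hypothetical. Unlike the script, it drops the fractional part rather
than rounding up by one second, and it returns undef for a string it
does not recognize so the caller can skip that host.

```perl
use strict;
use warnings;

# Convert "days:hours:minutes:seconds.hundredths" to whole seconds.
# Returns undef (in scalar context) if the string has another layout.
sub uptime_to_seconds {
    my ($str) = @_;
    return unless defined($str)
        && $str =~ /^(\d+):(\d+):(\d+):(\d+)(?:\.\d+)?$/;
    return $1 * 86400 + $2 * 3600 + $3 * 60 + $4;
}

# Example: 2 days, 3 hours, 4 minutes, 5.67 seconds
print uptime_to_seconds('2:03:04:05.67'), "\n";
```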

