[Orca-users] Re: Orcallator dumping core
Blair Zajac
blair at akamai.com
Fri Feb 9 14:28:55 PST 2001
Patrick,
Can you send the last 50 to 100 lines of the truss output and of the
se -d output, plus the /etc/mnttab file exactly as it is when se
crashes, so we can check that the description below fits?
I think the problem is with the last line shown here in
include/mnt_class.se:
    if (initial == 1) {
        input = fopen("/etc/mnttab", "r");
        if (input == 0) {
            number$ = -1;
            return;
        }
        initial = 0;
        return;
    }
    if (number$ == last) {
        return;
    }
    last = number$;
    if (last == 0) {
        fseek(input, 0, SEEK_SET);
    }
    if (fgets(buf, sizeof(buf), input) == nil) {
        fseek(input, 0, SEEK_SET);
        number$ = -1;
        return;
    }
    strcpy(strchr(buf, '\n'), "");
According to the SE users manual, this code should never be run unless
the user is entirely sure that buf will contain a '\n':
    while (fgets(buf, sizeof(buf), stdin) != nil) {
        strcpy(strchr(buf, '\n'), "");
        puts(buf);
    }
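In this case, the result of the "strchr" call is never assigned to a
variable or checked before being passed to the "strcpy" function. When
buf does contain a new-line, strcpy copies the string "" onto it and so
turns it into the null character. When it does not, strchr should
return nil and strcpy then writes to address 0, which matches the
SIGSEGV at addr=0x00000000 in your truss output.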
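To make the failure condition concrete, here is a minimal sketch in plain C rather than SE. It assumes SE's strchr behaves like the libc call of the same name, and the helper name has_newline is hypothetical, not something in mnt_class.se. It only reports whether a buffer holds a '\n'; when it does not, strchr returns a null pointer, and that is the pointer the unguarded strcpy would write through.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of mnt_class.se: report whether buf
 * contains a newline.  strchr returns NULL when the character is
 * absent, and that NULL is what an unguarded strcpy would write to. */
int has_newline(const char *buf)
{
    assert(buf != NULL);
    return strchr(buf, '\n') != NULL;
}
```

Called on a complete mnttab-style line ending in '\n' this returns 1; called on a partial line like the buf contents in the se -d output it returns 0.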
There are several things to do here.
1) Change start_orcallator.se to keep se in the foreground. When it
exits, make a copy of /etc/mnttab or email it somewhere to look at.
Please email me a copy of it and the se -d output.
    # Now start the logging.
    echo "Starting logging"
    $SE $LE_PATCH -DWATCH_OS $WATCH_WEB $libdir/orcallator.se
    mailx -s "Bad /etc/mnttab" YOUR at EMAIL_ADDRESS.COM < /etc/mnttab
It would be interesting to see the problem with the file.
2) Another is to edit include/mnt_class.se and change
    mnt$() {
        char buf[BUFSIZ];
        string p;
to
    mnt$() {
        char buf[BUFSIZ<<4];
        string p;
in the hope that the line does have a \n but is simply too long for
the current buffer.
3) Change the last line from
    strcpy(strchr(buf, '\n'), "");
to
    if (strchr(buf, '\n') != nil) {
        strcpy(strchr(buf, '\n'), "");
    }
This isn't optimal since it will search for '\n' twice, but it
should work.
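For comparison, a single-search version of the same guard can be sketched in plain C. This is only an illustration under the assumption that the SE calls mirror libc; the helper name strip_newline is hypothetical, and I have not tried the equivalent in SE itself. It keeps the strchr result in a pointer, so the buffer is scanned once and nothing is written when no '\n' is present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strip the first '\n' from buf, searching once.
 * Returns 1 if a newline was found and removed, 0 if buf was left
 * untouched because it held no newline. */
int strip_newline(char *buf)
{
    char *nl;

    assert(buf != NULL);
    nl = strchr(buf, '\n');   /* one search, result kept in nl */
    if (nl == NULL) {
        return 0;             /* no newline: do not write anywhere */
    }
    *nl = '\0';               /* same effect as strcpy(nl, "") */
    return 1;
}
```

The return value also tells the caller whether the line was incomplete, which could be logged while tracking down what /etc/mnttab looks like at the time of the crash.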
Regards,
Blair
Patrick Aland wrote:
>
> Ok, I'm running
> se - Version 3.1 (pre-fcs) (10:39 AM 03/31/99) for sparcv9 SunOS 5.7
> I'm using the orcallator.se that came with the .26 tar
>
> After running se in debug mode twice, it appears to be dying during one of the filesystem checks:
> Run 1:
> if (last<120> == <0>)
> if (fgets(buf<tophat:/export/home1/kharman>, sizeof(buf<tophat:/export/home1/kharman>), input<4296373264>) == <(nil)>)
> strcpy(strchr(buf<tophat:/export/home1/abaker\t/home/abaker\tnfs\t>, <10>), <>)
>
> Run 2:
> if (last<120> == <0>)
> if (fgets(buf<tophat:/export/home2/jpim>, sizeof(buf<tophat:/export/home2/jpim>), input<4296373264>) == <(nil)>)
> strcpy(strchr(buf<tophat:/export/home1/mdemurga\t/home/mdemurga\tnfs\tdev=313c5>, <10>), <>)
>
> Those are the last 3 lines of output from the two runs before it segfaulted.
>
> Running it through truss we get:
> statvfs("/var/log", 0x10019E300) = 0
> statvfs("/var/news", 0x10019E300) = 0
> statvfs("/var/mail", 0x10019E300) = 0
> statvfs("/export", 0x10019E300) = 0
> read(9, 0x1007595C4, 8192) = 0
> Incurred fault #6, FLTBOUNDS %pc = 0xFFFFFFFF7EB40070
> siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
> Received signal #11, SIGSEGV [default]
> siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
> *** process killed ***
>
> As I said in my original post, the timings were completely different for all three.
> Run 1 went almost an hour, Run 2 about 3 minutes, Run 3 about 25.
>
> As far as PCP goes, I don't believe it has the ability to monitor Solaris, simply because it uses the /proc filesystem. It does, however, have a plugin architecture for writing new namespaces (some have been written to monitor Cisco equipment, Oracle databases, etc.), so it might be possible to extend it. I have found that I can't get all the stats from it that se can get, and unfortunately a kernel patch is required to get stats on a per-partition basis.
>
> --Patrick
>
> On Wed, Jan 24, 2001 at 02:55:41PM -0800, Blair Zajac wrote:
> > First, which version of SE and orcallator.se are you using? To track
> > this down, I'd do two things. Run it with the -d SE flag and then
> > under truss (probably separately since there will be a lot of output):
> >
> > se -d -DWATCH_OS orcallator.se 5
> > truss se -DWATCH_OS orcallator.se 5
> >
> > Let's see what is causing the failure.
> >
> > I haven't heard of PCP but it looks interesting. Could it be
> > packaged so that Orca would have an OS independent data collector?
> > This may be similar to libgtop, which I was hoping would provide
> > an OS independent data collector but which has seen no work.
> >
> > Regards,
> > Blair
> >
>
> --
> ------------------------------------------------------------
> Patrick Aland paland at stetson.edu
> Network Administrator Voice: 904.822.7217
> Stetson University Fax: 904.822.7367
> ------------------------------------------------------------