[Orca-users] Sun Sparc - Replaced disk and Orca fails
David Devault
DDevault at Elance.com
Thu Aug 11 10:18:08 PDT 2005
Adrian,
Thanks for the information. I am scheduling time to run some additional
tests and to reboot. I should have the iostat information within a
week.
David
-----Original Message-----
From: Cockcroft, Adrian [mailto:acockcroft at ebay.com]
Sent: Thursday, August 11, 2005 8:11 AM
To: David Devault; orca-users at orcaware.com
Subject: RE: [Orca-users] Sun Sparc - Replaced disk and Orca fails
The orcallator code does not use the iostat class code that you
attached. It has its own embedded version of the code, based on an older
fixed-size algorithm, which is why it breaks. The regular
iostat_class-based tools should never have this problem.
The problem you are reporting is one we haven't seen before. There seems
to be a runaway loop going on where the system keeps finding an endless
supply of disks to monitor. What do you get with this command?
# truss -c iostat -n | wc -l
There is an ioctl call to devinfo when iostat -n starts that takes a
long time to run on some systems, and wc will count how many disks you
seem to have. You can use iostat -En to see all the info on the disks
and see if it makes sense.
I think the system is in a strange state, since we haven't seen this
happen before. I would reboot, or if you can get iostat to record the
strange state, call in a bug with Sun.
HTH Adrian
-----Original Message-----
From: David Devault [mailto:DDevault at Elance.com]
Sent: Wed 8/10/2005 7:56 PM
To: Cockcroft, Adrian; orca-users at orcaware.com
Cc:
Subject: RE: [Orca-users] Sun Sparc - Replaced disk and Orca fails
I've added the -DMAX_DISK=<some large number> and this worked for a few
minutes.
The numbers I started with were too low: 100, 500, 1000, 5000, 9999.
The only number that works is 99999. That number did not work very well
either. Orca consumed all of the memory on my system and I needed to
kill it.
I previously ran the devfsadm command; apparently that one command is
supposed to do what drvconfig;disks;tapes did in the past. Is
drvconfig;disks;tapes a better approach?
Maybe the 3.4 SE toolkit has better examples than 3.3.1. I attached
the code from version 3.3.1 of the SE toolkit; let me know if this is what
you're referring to. If not, I should have SE 3.4 tomorrow.
Thanks for the help.
David
_____
From: Cockcroft, Adrian [mailto:acockcroft at ebay.com]
Sent: Wednesday, August 10, 2005 1:24 PM
To: David Devault; orca-users at orcaware.com
Subject: RE: [Orca-users] Sun Sparc - Replaced disk and Orca fails
This is a pretty regular FAQ. The orcallator code makes an array that
isn't always big enough and doesn't resize it.
I posted a suggested fix for the code a few weeks ago.
When Solaris reboots it can rebuild the drive and path_to_inst information.
Sometimes just running
# drvconfig; disks; tapes
will clean it up the same way.
The workaround is to start orcallator with se -DMAX_DISK=<some large
number> ....
and increase the number until it works.
You can also put the MAX_DISK entry in /opt/RICHPse/etc/se-defines and
it will apply to all SE commands.
Adrian
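
The failure mode described above (a table sized once and never resized) can be sketched in a few lines. This is an illustration only, written in Python rather than the SE language, and it is not the orcallator code: the class names, the fill helper and the disk names are all hypothetical. It contrasts a fixed-capacity table, which fails as soon as one more disk shows up than it has slots for, with a table that grows on demand.

```python
class FixedDiskTable:
    """Hypothetical fixed-size table: capacity is chosen once, never resized."""

    def __init__(self, max_disk):
        self.max_disk = max_disk
        self.slots = [None] * max_disk  # every slot is allocated up front
        self.count = 0

    def add(self, name):
        # The next free subscript equals the capacity once the table is full.
        if self.count >= self.max_disk:
            raise IndexError(
                f"subscript: {self.count} out of range for: "
                f"disk_info[{self.max_disk}]"
            )
        self.slots[self.count] = name
        self.count += 1


class GrowingDiskTable:
    """Hypothetical alternative: the table grows as disks are discovered."""

    def __init__(self):
        self.slots = []

    def add(self, name):
        self.slots.append(name)

    @property
    def count(self):
        return len(self.slots)


def fill(table, names):
    """Register every disk name and return how many the table now holds."""
    for name in names:
        table.add(name)
    return table.count


if __name__ == "__main__":
    disks = [f"disk{i}" for i in range(35)]  # made-up names, one more than 34
    print(fill(GrowingDiskTable(), disks))
    try:
        fill(FixedDiskTable(34), disks)
    except IndexError as err:
        print("Fatal:", err)
```

Raising the capacity only moves the limit, and because the fixed table allocates every slot up front, a very large capacity costs memory whether or not the disks exist.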
_____
From: orca-users-bounces+acockcroft=ebay.com at orcaware.com
[mailto:orca-users-bounces+acockcroft=ebay.com at orcaware.com] On Behalf
Of David Devault
Sent: Wednesday, August 10, 2005 11:02 AM
To: orca-users at orcaware.com
Subject: [Orca-users] Sun Sparc - Replaced disk and Orca fails
Hello.
We have orca running on many systems with no problems, until yesterday.
I had a disk fail in a 280R (internal Fibre Channel drive 0, with Solaris
8). This disk has been replaced and the system is fine, however, orca
died when the disk failed and after the replacement orca will not run.
I think this is more of a SE toolkit / Solaris problem than an orca
problem.
I get this error from nohup.out:
Fatal: subscript: 34 out of range for: GLOBAL_disk_info[34]
I suppose this has something to do with the device names and path
instances. When we reboot this problem goes away and orca works fine.
I'd like to fix this without rebooting.
I've traced this back to the path_to_inst file where the old disk is
still configured. That's not very odd to me because I have seen systems
with three or four old disks still in the path_to_inst; one big
difference is that orca was not on the other systems. I've run the devfsadm
command and the system seems fine. Disk Suite mirrored the new disk and
the system is humming along. Another item to note is that the format
output only contains two internal disks, this is the expected output. I
was thinking that if orca had a problem with detecting the disks then
format might show the old failed disk as well. This is not the case.
The system looks fine. Other than the old failed disk instance showing up
in the path_to_inst, I can't see a reference to the old disk anywhere
else. There must be a SCSI cache that the kernel uses with the old disk
device path still in there or something. Does anyone have any suggestions?
Thanks,
David
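
The comparison described above, between what path_to_inst still lists and what format reports, could be scripted roughly as follows. This is a sketch under stated assumptions: it assumes the usual path_to_inst layout of a quoted physical path, an instance number and a quoted driver name, and it assumes the disks bind to the sd or ssd driver. The sample entries, including the device paths and WWNs, are made up for illustration and are not taken from the system in question.

```python
import re

# Assumed entry layout: "physical path" instance-number "driver"
ENTRY = re.compile(r'^"([^"]+)"\s+(\d+)\s+"([^"]+)"\s*$')


def parse_path_to_inst(text):
    """Return (path, instance, driver) tuples, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ENTRY.match(line)
        if match:
            entries.append((match.group(1), int(match.group(2)), match.group(3)))
    return entries


def disk_instances(entries, drivers=("sd", "ssd")):
    """Keep only entries bound to a disk driver (driver names are assumed)."""
    return [entry for entry in entries if entry[2] in drivers]


# Hypothetical sample: three disk instances recorded, one of them stale.
SAMPLE = '''\
#
# Caution! This file contains critical kernel state
#
"/pci@8,600000" 0 "pcisch"
"/pci@8,600000/SUNW,qlc@4" 0 "qlc"
"/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000000000001,0" 0 "ssd"
"/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000000000002,0" 1 "ssd"
"/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000000000003,0" 2 "ssd"
'''

if __name__ == "__main__":
    for path, instance, driver in disk_instances(parse_path_to_inst(SAMPLE)):
        print(f"{driver}{instance}  {path}")
```

If the list is longer than the set of disks that format shows, the extra entries are candidates for stale instances like the one described above.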