[Orca-users] Orcallator - Segmentation Fault
Brian Poole
pooleb at gmail.com
Thu Sep 7 10:38:31 PDT 2006
Here is all of the information I've been able to gather on the crash
(SE Toolkit 3.4 on Solaris 10). I compiled it fresh using Forte with
debugging enabled. I took a quick look to try to find where the
problem actually lies but was unable to come up with anything useful.
Here is the output of running disks.se with debugging enabled:
# /opt/RICHPse/bin/se.sparcv9 -d /opt/RICHPse/examples/disks.se
if (count<31> == GLOBAL_diskinfo_size<101>)
dp = *((dirent_t *) ld<4281687704>)
if (dp.d_name<c3t8d24s3> == <.> || dp.d_name<c3t8d24s3> == <..>)
if (!(dp.d_name<c3t8d24s3> =~ <s0$>))
ld = readdir(dirp<4281664128>)
if (count<31> == GLOBAL_diskinfo_size<101>)
dp = *((dirent_t *) ld<4281687736>)
if (dp.d_name<c3t8d24s4> == <.> || dp.d_name<c3t8d24s4> == <..>)
if (!(dp.d_name<c3t8d24s4> =~ <s0$>))
ld = readdir(dirp<4281664128>)
if (count<31> == GLOBAL_diskinfo_size<101>)
dp = *((dirent_t *) ld<4281687768>)
if (dp.d_name<c3t8d24s5> == <.> || dp.d_name<c3t8d24s5> == <..>)
if (!(dp.d_name<c3t8d24s5> =~ <s0$>))
ld = readdir(dirp<4281664128>)
if (count<31> == GLOBAL_diskinfo_size<101>)
dp = *((dirent_t *) ld<4281687800>)
Segmentation Fault (core dumped)
So tracking that back shows the segfault occurs on line 215 of
include/diskinfo.se:
for (ld = readdir(dirp); ld != 0; ld = readdir(dirp)) {
  // grow the array if needed
  if (count == GLOBAL_diskinfo_size) {
    GLOBAL_diskinfo_size += 4;
    GLOBAL_disk_info = renew GLOBAL_disk_info[GLOBAL_diskinfo_size];
  }
  dp = *((dirent_t *) ld);   <---------
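For comparison, here is a rough C sketch of the same kind of loop. It is not SE code and not taken from diskinfo.se: name_list_t, name_list_free and collect_names are made-up names, and the "s0$" filter from the script is left out. It keeps the grow-by-4 step from the excerpt, but reads each name through the pointer that readdir() returns and copies only as many bytes as the NUL-terminated name needs, rather than assigning a whole fixed-size structure.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container; the SE script uses GLOBAL_disk_info instead. */
typedef struct {
    char **names;   /* heap copies of the entry names */
    size_t count;   /* slots in use */
    size_t size;    /* slots allocated */
} name_list_t;

/* Release every stored name and reset the list to empty. */
void name_list_free(name_list_t *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->names[i]);
    free(list->names);
    list->names = NULL;
    list->count = 0;
    list->size = 0;
}

/* Collect the entry names in `path`, skipping "." and "..".
 * Returns 0 on success, -1 on failure (the list is emptied on failure). */
int collect_names(const char *path, name_list_t *list)
{
    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    struct dirent *ld;
    while ((ld = readdir(dirp)) != NULL) {
        if (strcmp(ld->d_name, ".") == 0 || strcmp(ld->d_name, "..") == 0)
            continue;

        /* grow the array if needed, 4 slots at a time as in the excerpt */
        if (list->count == list->size) {
            char **grown = realloc(list->names,
                                   (list->size + 4) * sizeof *grown);
            if (grown == NULL)
                goto fail;
            list->names = grown;
            list->size += 4;
        }
        assert(list->count < list->size);

        /* copy only the NUL-terminated name, not a fixed-size struct */
        size_t len = strlen(ld->d_name) + 1;
        char *copy = malloc(len);
        if (copy == NULL)
            goto fail;
        memcpy(copy, ld->d_name, len);
        list->names[list->count++] = copy;
    }
    closedir(dirp);
    return 0;

fail:
    closedir(dirp);
    name_list_free(list);
    return -1;
}
```

A caller would start from an empty list (name_list_t list = {0};), call collect_names("/dev/dsk", &list), and finish with name_list_free(&list). Whether the SE interpreter can be made to behave this way from a script, or only by patching SE itself, is not something this sketch answers.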
Also the truss output:
# truss -fo /tmp/truss.log /opt/RICHPse/bin/se.sparcv9 /opt/RICHPse/examples/disks.se
# tail -15 /tmp/truss.log
5967: ioctl(4, KSTAT_IOC_READ, "sd3547,err") = 701015
5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
5967: ioctl(4, KSTAT_IOC_READ, "sd2146,err") = 701015
5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
5967: ioctl(4, KSTAT_IOC_READ, "sd2177,err") = 701015
5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
5967: ioctl(4, KSTAT_IOC_READ, "sd3935,err") = 701015
5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
5967: ioctl(4, KSTAT_IOC_READ, "sd1971,err") = 701015
5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
5967: ioctl(4, KSTAT_IOC_READ, "sd1972,err") = 701015
5967: Incurred fault #6, FLTBOUNDS %pc = 0xFF2E08EC
5967: siginfo: SIGSEGV SEGV_MAPERR addr=0xFF356000
5967: Received signal #11, SIGSEGV [default]
5967: siginfo: SIGSEGV SEGV_MAPERR addr=0xFF356000
And perhaps more indicative, the trace:
# /opt/SUNWspro/bin/dbx /opt/RICHPse/bin/se.sparcv9 core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading se.sparcv9
core file header read successfully
Reading ld.so.1
Reading libkvm.so.1
Reading libkstat.so.1
Reading libdl.so.1
Reading libelf.so.1
Reading libgen.so.1
Reading libm.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libc.so.1
Reading libc_psr.so.1
Reading libmp.so.2
Reading libmd5.so.1
Reading libscf.so.1
Reading libdoor.so.1
Reading libuutil.so.1
Reading librt.so.1
Reading libaio.so.1
program terminated by signal SEGV (no mapping at the fault address)
0xff2e08ec: _memcpy+0x042c: ldd [%o1], %c2
Current function is member_fill
dbx: warning: can't find file "/tmp/se-src/run.c"
dbx: warning: see `help finding-files'
(dbx) where
[1] _memcpy(0x129938, 0xff356000, 0x8, 0xfffffffa, 0x4, 0x1), at 0xff2e08ec
=>[2] member_fill(vp = 0x1297f0, area = 0xff355ef8 "", bias = 0), line 994 in "run.c"
[3] struct_fill(vp = 0x1296b0, area = 0xff355ef8 "", bias = 0), line 1043 in "run.c"
[4] run_indirection(sp = 0xffbfc4b8), line 1308 in "run.c"
[5] run_call(sp = 0xffbfc4b8), line 1608 in "run.c"
[6] resolve_expression(vp = 0xffbfcae0, ep = 0x129620, runit = 1), line 2892 in "run.c"
[7] run_assign(sp = 0x127530), line 1675 in "run.c"
[8] run_statement_list(lp = 0x127510), line 513 in "run.c"
[9] run_for(sp = 0x12c078), line 2538 in "run.c"
[10] run_statement_list(lp = 0x127330), line 513 in "run.c"
[11] run_for(sp = 0x12c0b8), line 2538 in "run.c"
[12] run_statement_list(lp = 0x121208), line 513 in "run.c"
[13] run_block(bp = 0x133288), line 402 in "run.c"
[14] run_call(sp = 0xffbfcec8), line 1625 in "run.c"
[15] resolve_expression(vp = 0xffbfd450, ep = 0x13cd80, runit = 1), line 2892 in "run.c"
[16] resolve_l_expression(ep = 0x13ae18), line 2659 in "run.c"
[17] run_if(sp = 0x13cf88), line 523 in "run.c"
[18] run_statement_list(lp = 0x13cf88), line 513 in "run.c"
[19] run_block(bp = 0x1426f8), line 402 in "run.c"
[20] se_run(argc = 1, argv = 0x74b88), line 366 in "run.c"
[21] main(argc = 2, argv = 0xffbffcc4), line 542 in "main.c"
*vp = {
var_flags = VF_MEMBER
var_special = 0
var_type = VAR_CHAR
var_struct = (nil)
var_name = 0xc44f0 "d_name"
var_qname = (nil)
var_attach_lib = (nil)
var_address = (nil)
var_initial = (nil)
var_un = {
var_string = 0x129840 "c3t8d24s6"
var_digit = 1218624
var_udigit = 1218624U
var_ldigit = 5233950226120704LL
var_uldigit = 5233950226120704ULL
var_rdigit = 2.5859149987693e-308
var_user = 0x129840
var_array = 0x129840
}
var_dimension = 256
var_subscript = (nil)
var_instances = (nil)
var_offset = 10
var_parent = 0xffbfd588
var_next = (nil)
}
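One possible reading of these numbers, offered as an assumption and not a confirmed diagnosis: if var_offset is the byte offset of d_name within the structure and var_dimension is the number of bytes member_fill copies (both meanings are guessed from the field names), then the copy starts at area + 10 and runs for 256 bytes. The small C sketch below only does that arithmetic with the addresses from the dbx and truss output; copy_end and bytes_past are made-up helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the dbx and truss output in this message. */
#define AREA_ADDR   0xff355ef8u  /* "area" argument to member_fill() */
#define FAULT_ADDR  0xff356000u  /* SEGV_MAPERR address from truss */

/* Assumed meaning of the fields in *vp (guessed from their names):
 * var_offset is the member's byte offset, var_dimension its byte length. */
#define D_NAME_OFFSET  10u
#define D_NAME_LENGTH  256u

/* First address past a copy of `length` bytes starting at base + offset. */
uint32_t copy_end(uint32_t base, uint32_t offset, uint32_t length)
{
    return base + offset + length;
}

/* Number of bytes of that copy lying at or beyond `limit`; 0 if none. */
uint32_t bytes_past(uint32_t base, uint32_t offset, uint32_t length,
                    uint32_t limit)
{
    uint32_t start = base + offset;
    uint32_t end = copy_end(base, offset, length);
    assert(end >= start);        /* no address wraparound in this example */
    if (end <= limit)
        return 0;
    return end - (start > limit ? start : limit);
}
```

With the values above, copy_end(AREA_ADDR, D_NAME_OFFSET, D_NAME_LENGTH) is 0xff356002, so bytes_past(..., FAULT_ADDR) is 2: under these assumptions the last two bytes of a 256-byte copy would be read from the unmapped address that truss reports. The name "c3t8d24s6" plus its terminating NUL needs only 10 bytes, which would end at 0xff355f0c, well short of that address.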
I would be more than happy to provide any additional information on
the problem you might need. Feel free to contact me directly on this
issue.
Thank you,
Brian
On 9/7/06, Cockcroft, Adrian <acockcroft at ebay.com> wrote:
> It should still be possible to avoid the crash by checking for a null at
> the right point.
>
> Is it crashing in kstat read of the iostats, or the devinfo name mapping
> at startup?
>
> Adrian
>
> -----Original Message-----
> From: Dmitry Berezin [mailto:dberezin at surfside.rutgers.edu]
> Sent: Thursday, September 07, 2006 8:43 AM
> To: Cockcroft, Adrian; 'Biju Joseph'; orca-users at orcaware.com
> Subject: RE: [Orca-users] Orcallator - Segmentation Fault
>
> Adrian,
>
> I believe that the actual problem is not with the array sizes, but has
> to do with the "stale" disk devices. SE "segfaults" when it tries to
> access a device that is not currently present on the system. That is why
> the problem is usually seen on the clustered systems with shared storage
> or systems with BCV devices that frequently change their state to
> offline. A number of people had previously reported that rebuilding
> device tree fixed the problem.
>
> I have not had time to look at the code, so I do not know if this could
> be solved by changing scripts or SE itself has to be patched.
>
> -Dmitry.
>
>
> > -----Original Message-----
> > From: orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com
> > [mailto:orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com] On
> > Behalf Of Cockcroft, Adrian
> > Sent: Thursday, September 07, 2006 11:13 AM
> > To: Biju Joseph; orca-users at orcaware.com
> > Subject: Re: [Orca-users] Orcallator - Segmentation Fault
> >
> > Years ago I fixed the code that looks at disks to resize the array
> > dynamically, I guess that this code got overwritten at some point, but
> > it's a simple fix, just doesn't look much like C code...
> >
> > You can use the "renew" keyword to make a new array that is bigger and
> > contains the same items, so figure out where it's indexing into the disk
> > array, check the index and renew the array to be size+10 or something.
> > There's example code in the generic SE disk class, which for some reason
> > orcallator doesn't seem to use?
> >
> > I'm not currently working on a Solaris box, so it will take me a while
> > to get a setup I could test this fix on, probably a few weeks when I get
> > back from a business trip.
> >
> > Adrian
> >
> > -----Original Message-----
> > From: orca-users-bounces+acockcroft=ebay.com at orcaware.com on behalf of
> > Biju Joseph
> > Sent: Thu 9/7/2006 7:28 AM
> > To: orca-users at orcaware.com
> > Subject: [Orca-users] Orcallator - Segmentation Fault
> >
> > Hello All,
> >
> > I am trying to start orcallator on two nodes of a VCS cluster (4.1)
> > with VxVM 4.1. The database is on EMC disks. Orcallator is giving a
> > segmentation fault.
> >
> > RICHPse version is 3.4 (03:59 PM 01/05/05). I tried using orcallator.se
> > 1.36 and 1.37. Both give the same problem.
> >
> > The same combination is working on non-clustered systems. All systems
> > are Solaris 10.
> >
> > Can any of you help?
> >
> > Appreciate your help.
> >
> > Regards
> > Biju K Joseph
> > +91-9866116298
> >
> > _______________________________________________
> > Orca-users mailing list
> > Orca-users at orcaware.com
> > http://www.orcaware.com/mailman/listinfo/orca-users
>