[Orca-users] Orcallator - Segmentation Fault

Biju Joseph biju.joseph at gmail.com
Fri Sep 8 06:49:26 PDT 2006


Today I tried to run orcallator on a different machine which has no cluster
software, but has EMC disks attached and VxVM installed. I am getting the
same problem (Segmentation Fault).

But as mentioned earlier, it installed successfully on machines that do not
have EMC SAN disks.

Any solution is highly appreciated.

Thanks
Biju..


On 9/8/06, Dmitry Berezin <dberezin at acs.rutgers.edu> wrote:
>
> It is failing while dereferencing, but the pointer is not null -
>
> dp = *((dirent_t *) ld<4281687800>)
> > Segmentation Fault (core dumped)
>
> -Dmitry.
>
>
> > -----Original Message-----
> > From: orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com
> > [mailto:orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com] On
> > Behalf Of Cockcroft, Adrian
> > Sent: Thursday, September 07, 2006 2:32 PM
> > To: Brian Poole
> > Cc: Dmitry Berezin; orca-users at orcaware.com; Biju Joseph
> > Subject: Re: [Orca-users] Orcallator - Segmentation Fault
> >
> > OK, so it's failing while walking the directory tree; I can see that
> > the renew is already in place a line or so earlier.
> >
> > It's dereferencing a directory structure that isn't there, so a test
> > needs to be added to skip this entry if readdir returns something bad.
> > It's already testing for null, so something bad is happening between
> > the null test and the actual use of the dirp.
> >
> > http://docs.sun.com/app/docs/doc/819-2243/6n4i099g0?q=readdir&a=view
> >
> > I'm not sure how to fix this; maybe a second test for null immediately
> > before it's dereferenced?
> >
> > Adrian
> >
> > -----Original Message-----
> > From: Brian Poole [mailto:pooleb at gmail.com]
> > Sent: Thursday, September 07, 2006 10:39 AM
> > To: Cockcroft, Adrian
> > Cc: Dmitry Berezin; Biju Joseph; orca-users at orcaware.com
> > Subject: Re: [Orca-users] Orcallator - Segmentation Fault
> >
> > Here is all of the information I've been able to gather on the crash
> > (SE Toolkit 3.4 on Solaris 10). I compiled it fresh using Forte with
> > debugging enabled. I took a quick look at trying to find where the
> > problem actually lies but was unable to come up with anything useful.
> >
> > Here is the output from running disks.se with debug:
> >
> > # /opt/RICHPse/bin/se.sparcv9 -d /opt/RICHPse/examples/disks.se
> > if (count<31> == GLOBAL_diskinfo_size<101>)
> > dp = *((dirent_t *) ld<4281687704>)
> > if (dp.d_name<c3t8d24s3> == <.> || dp.d_name<c3t8d24s3> == <..>)
> > if (!(dp.d_name<c3t8d24s3> =~ <s0$>))
> > ld = readdir(dirp<4281664128>)
> > if (count<31> == GLOBAL_diskinfo_size<101>)
> > dp = *((dirent_t *) ld<4281687736>)
> > if (dp.d_name<c3t8d24s4> == <.> || dp.d_name<c3t8d24s4> == <..>)
> > if (!(dp.d_name<c3t8d24s4> =~ <s0$>))
> > ld = readdir(dirp<4281664128>)
> > if (count<31> == GLOBAL_diskinfo_size<101>)
> > dp = *((dirent_t *) ld<4281687768>)
> > if (dp.d_name<c3t8d24s5> == <.> || dp.d_name<c3t8d24s5> == <..>)
> > if (!(dp.d_name<c3t8d24s5> =~ <s0$>))
> > ld = readdir(dirp<4281664128>)
> > if (count<31> == GLOBAL_diskinfo_size<101>)
> > dp = *((dirent_t *) ld<4281687800>)
> > Segmentation Fault (core dumped)
> >
> > So tracking that back shows the segfault occurs on line 215 of
> > include/diskinfo.se:
> >
> >     for (ld = readdir(dirp); ld != 0; ld = readdir(dirp)) {
> >       // grow the array if needed
> >       if (count == GLOBAL_diskinfo_size) {
> >         GLOBAL_diskinfo_size += 4;
> >         GLOBAL_disk_info = renew GLOBAL_disk_info[GLOBAL_diskinfo_size];
> >       }
> >       dp = *((dirent_t *) ld);     <---------
> >
> > Also the truss output:
> >
> > # truss -fo /tmp/truss.log /opt/RICHPse/bin/se.sparcv9
> > /opt/RICHPse/examples/disks.se
> > # tail -15 /tmp/truss.log
> > 5967:   ioctl(4, KSTAT_IOC_READ, "sd3547,err")          = 701015
> > 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015
> > 5967:   ioctl(4, KSTAT_IOC_READ, "sd2146,err")          = 701015
> > 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015
> > 5967:   ioctl(4, KSTAT_IOC_READ, "sd2177,err")          = 701015
> > 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015
> > 5967:   ioctl(4, KSTAT_IOC_READ, "sd3935,err")          = 701015
> > 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015
> > 5967:   ioctl(4, KSTAT_IOC_READ, "sd1971,err")          = 701015
> > 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015
> > 5967:   ioctl(4, KSTAT_IOC_READ, "sd1972,err")          = 701015
> > 5967:       Incurred fault #6, FLTBOUNDS  %pc = 0xFF2E08EC
> > 5967:         siginfo: SIGSEGV SEGV_MAPERR addr=0xFF356000
> > 5967:       Received signal #11, SIGSEGV [default]
> > 5967:         siginfo: SIGSEGV SEGV_MAPERR addr=0xFF356000
> >
> > And perhaps more indicative, the trace:
> >
> > # /opt/SUNWspro/bin/dbx /opt/RICHPse/bin/se.sparcv9 core
> > For information about new features see `help changes'
> > To remove this message, put `dbxenv suppress_startup_message 7.5' in
> > your .dbxrc
> > Reading se.sparcv9
> > core file header read successfully
> > Reading ld.so.1
> > Reading libkvm.so.1
> > Reading libkstat.so.1
> > Reading libdl.so.1
> > Reading libelf.so.1
> > Reading libgen.so.1
> > Reading libm.so.2
> > Reading libsocket.so.1
> > Reading libnsl.so.1
> > Reading libc.so.1
> > Reading libc_psr.so.1
> > Reading libmp.so.2
> > Reading libmd5.so.1
> > Reading libscf.so.1
> > Reading libdoor.so.1
> > Reading libuutil.so.1
> > Reading librt.so.1
> > Reading libaio.so.1
> > program terminated by signal SEGV (no mapping at the fault address)
> > 0xff2e08ec: _memcpy+0x042c:     ldd      [%o1], %c2
> > Current function is member_fill
> > dbx: warning: can't find file "/tmp/se-src/run.c"
> > dbx: warning: see `help finding-files'
> > (dbx) where
> >   [1] _memcpy(0x129938, 0xff356000, 0x8, 0xfffffffa, 0x4, 0x1), at
> > 0xff2e08ec
> > =>[2] member_fill(vp = 0x1297f0, area = 0xff355ef8 "", bias = 0), line
> > 994 in "run.c"
> >   [3] struct_fill(vp = 0x1296b0, area = 0xff355ef8 "", bias = 0), line
> > 1043 in "run.c"
> >   [4] run_indirection(sp = 0xffbfc4b8), line 1308 in "run.c"
> >   [5] run_call(sp = 0xffbfc4b8), line 1608 in "run.c"
> >   [6] resolve_expression(vp = 0xffbfcae0, ep = 0x129620, runit = 1),
> > line 2892 in "run.c"
> >   [7] run_assign(sp = 0x127530), line 1675 in "run.c"
> >   [8] run_statement_list(lp = 0x127510), line 513 in "run.c"
> >   [9] run_for(sp = 0x12c078), line 2538 in "run.c"
> >   [10] run_statement_list(lp = 0x127330), line 513 in "run.c"
> >   [11] run_for(sp = 0x12c0b8), line 2538 in "run.c"
> >   [12] run_statement_list(lp = 0x121208), line 513 in "run.c"
> >   [13] run_block(bp = 0x133288), line 402 in "run.c"
> >   [14] run_call(sp = 0xffbfcec8), line 1625 in "run.c"
> >   [15] resolve_expression(vp = 0xffbfd450, ep = 0x13cd80, runit = 1),
> > line 2892 in "run.c"
> >   [16] resolve_l_expression(ep = 0x13ae18), line 2659 in "run.c"
> >   [17] run_if(sp = 0x13cf88), line 523 in "run.c"
> >   [18] run_statement_list(lp = 0x13cf88), line 513 in "run.c"
> >   [19] run_block(bp = 0x1426f8), line 402 in "run.c"
> >   [20] se_run(argc = 1, argv = 0x74b88), line 366 in "run.c"
> >   [21] main(argc = 2, argv = 0xffbffcc4), line 542 in "main.c"
> > *vp = {
> >     var_flags      = VF_MEMBER
> >     var_special    = 0
> >     var_type       = VAR_CHAR
> >     var_struct     = (nil)
> >     var_name       = 0xc44f0 "d_name"
> >     var_qname      = (nil)
> >     var_attach_lib = (nil)
> >     var_address    = (nil)
> >     var_initial    = (nil)
> >     var_un         = {
> >         var_string  = 0x129840 "c3t8d24s6"
> >         var_digit   = 1218624
> >         var_udigit  = 1218624U
> >         var_ldigit  = 5233950226120704LL
> >         var_uldigit = 5233950226120704ULL
> >         var_rdigit  = 2.5859149987693e-308
> >         var_user    = 0x129840
> >         var_array   = 0x129840
> >     }
> >     var_dimension  = 256
> >     var_subscript  = (nil)
> >     var_instances  = (nil)
> >     var_offset     = 10
> >     var_parent     = 0xffbfd588
> >     var_next       = (nil)
> > }
> >
> > I would be more than happy to provide any additional information on
> > the problem you might need. Feel free to contact me directly on this
> > issue.
> >
> > Thank you,
> >
> > Brian
> >
> > On 9/7/06, Cockcroft, Adrian <acockcroft at ebay.com> wrote:
> > > It should still be possible to avoid the crash by checking for a null
> > > at the right point.
> > >
> > > Is it crashing in the kstat read of the iostats, or the devinfo name
> > > mapping at startup?
> > >
> > > Adrian
> > >
> > > -----Original Message-----
> > > From: Dmitry Berezin [mailto:dberezin at surfside.rutgers.edu]
> > > Sent: Thursday, September 07, 2006 8:43 AM
> > > To: Cockcroft, Adrian; 'Biju Joseph'; orca-users at orcaware.com
> > > Subject: RE: [Orca-users] Orcallator - Segmentation Fault
> > >
> > > Adrian,
> > >
> > > I believe that the actual problem is not with the array sizes, but
> > > has to do with "stale" disk devices. SE segfaults when it tries to
> > > access a device that is not currently present on the system. That is
> > > why the problem is usually seen on clustered systems with shared
> > > storage or systems with BCV devices that frequently change their
> > > state to offline. A number of people had previously reported that
> > > rebuilding the device tree fixed the problem.
> > >
> > > I have not had time to look at the code, so I do not know if this
> > > could be solved by changing the scripts or whether SE itself has to
> > > be patched.
> > >
> > >   -Dmitry.
> > >
> > >
> > > > -----Original Message-----
> > > > From: orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com
> > > > [mailto:orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com] On
> > > > Behalf Of Cockcroft, Adrian
> > > > Sent: Thursday, September 07, 2006 11:13 AM
> > > > To: Biju Joseph; orca-users at orcaware.com
> > > > Subject: Re: [Orca-users] Orcallator - Segmentation Fault
> > > >
> > > > Years ago I fixed the code that looks at disks to resize the array
> > > > dynamically. I guess that this code got overwritten at some point,
> > > > but it's a simple fix; it just doesn't look much like C code...
> > > >
> > > > You can use the "renew" keyword to make a new array that is bigger
> > > > and contains the same items, so figure out where it's indexing into
> > > > the disk array, check the index, and renew the array to be size+10
> > > > or something. There's example code in the generic SE disk class,
> > > > which for some reason orcallator doesn't seem to use?
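
[Editor's note: in plain C terms, SE's "renew" growth pattern described above is analogous to realloc. A hedged sketch; `disk_info_t` and `grow_disk_array` are invented names for illustration, not part of SE or orcallator.]

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative C analogue of SE's "renew": grow an array in place,
   preserving the existing elements and zeroing the new slots. */
typedef struct { char name[32]; } disk_info_t;

disk_info_t *grow_disk_array(disk_info_t *arr, int *size, int extra)
{
    int new_size = *size + extra;
    disk_info_t *bigger = realloc(arr, new_size * sizeof(disk_info_t));
    if (bigger == NULL)
        return arr;            /* growth failed; keep the old array */
    /* zero only the newly added slots */
    memset(bigger + *size, 0, extra * sizeof(disk_info_t));
    *size = new_size;
    return bigger;
}
```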
> > > >
> > > > I'm not currently working on a Solaris box, so it will take me a
> > > > while to get a setup I could test this fix on, probably a few weeks
> > > > when I get back from a business trip.
> > > >
> > > > Adrian
> > > >
> > > > -----Original Message-----
> > > > From: orca-users-bounces+acockcroft=ebay.com at orcaware.com on behalf
> > of
> > > > Biju Joseph
> > > > Sent: Thu 9/7/2006 7:28 AM
> > > > To: orca-users at orcaware.com
> > > > Subject: [Orca-users] Orcallator - Segmentation Fault
> > > >
> > > > Hello All,
> > > >
> > > > I am trying to start orcallator on two nodes of a VCS cluster (4.1)
> > > > with VxVM 4.1. The database is on EMC disks. Orcallator is giving a
> > > > segmentation fault.
> > > >
> > > > RICHPse version is 3.4 (03:59 PM 01/05/05). I tried using
> > > > orcallator.se 1.36 and 1.37. Both give the same problem.
> > > >
> > > > The same combination works on non-clustered systems. All systems
> > > > are Solaris 10.
> > > >
> > > > Can any of you help?
> > > >
> > > > Appreciate your help.
> > > >
> > > > Regards
> > > > Biju K Joseph
> > > > +91-9866116298
> > > >
> > > > _______________________________________________
> > > > Orca-users mailing list
> > > > Orca-users at orcaware.com
> > > > http://www.orcaware.com/mailman/listinfo/orca-users
> > >
> > >
> >
>
>
