[Orca-users] Orcallator - Segmentation Fault
Dmitry Berezin
dberezin at surfside.rutgers.edu
Fri Sep 8 07:05:56 PDT 2006
Can you rebuild the device tree on this server and try to run Orcallator again?
-Dmitry.
-----Original Message-----
From: Biju Joseph [mailto:biju.joseph at gmail.com]
Sent: Friday, September 08, 2006 9:49 AM
To: Dmitry Berezin
Cc: Cockcroft, Adrian; Brian Poole; orca-users at orcaware.com
Subject: Re: [Orca-users] Orcallator - Segmentation Fault
Today I tried to run orcallator on a different machine that has no cluster
software but has EMC disks attached and VxVM installed. I get the same problem
(Segmentation Fault).
But as mentioned earlier, it installed successfully on machines
that do not have EMC SAN disks.
Any solution would be highly appreciated.
Thanks
Biju..
On 9/8/06, Dmitry Berezin <dberezin at acs.rutgers.edu> wrote:
It is failing while dereferencing, but the pointer is not null -
dp = *((dirent_t *) ld<4281687800>)
> Segmentation Fault (core dumped)
-Dmitry.
> -----Original Message-----
> From: orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com
> [mailto:orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com] On
> Behalf Of Cockcroft, Adrian
> Sent: Thursday, September 07, 2006 2:32 PM
> To: Brian Poole
> Cc: Dmitry Berezin; orca-users at orcaware.com; Biju Joseph
> Subject: Re: [Orca-users] Orcallator - Segmentation Fault
>
> OK, so it's failing while walking the directory tree, I can see that the
> renew is already in place a line or so earlier.
>
> It's dereferencing a directory structure that isn't there, so a test
> needs to be added to skip this if readdir returns something bad. It's
> already testing for null, so there is something bad happening between
> the null test and the actual usage of the dirp.
>
> http://docs.sun.com/app/docs/doc/819-2243/6n4i099g0?q=readdir&a=view
>
> I'm not sure how to fix this, maybe a second test for null immediately
> before it's de-referenced?
>
> Adrian
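
For comparison with the SE loop under discussion, here is a minimal C sketch of the same directory walk. It is not the SE Toolkit code. The helper names `ends_with` and `list_matching` and the 256-byte name buffer are assumptions made for illustration. It tests the `readdir` result for NULL and copies only the NUL-terminated name, so it never reads a fixed-size block past the end of an entry.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define NAME_BUF 256   /* assumed buffer size, chosen for illustration */

/* Return 1 if s ends with suffix, else 0 (stands in for SE's =~ <s0$>). */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/*
 * Collect up to max entry names from dir whose names end with suffix.
 * Returns the number collected, or -1 if the directory cannot be opened.
 */
static int list_matching(const char *dir, const char *suffix,
                         char names[][NAME_BUF], int max)
{
    DIR *dirp = opendir(dir);
    struct dirent *ld;
    int count = 0;

    if (dirp == NULL)
        return -1;

    while (count < max && (ld = readdir(dirp)) != NULL) {
        /* Skip the "." and ".." entries, as the SE script does. */
        if (strcmp(ld->d_name, ".") == 0 || strcmp(ld->d_name, "..") == 0)
            continue;
        if (!ends_with(ld->d_name, suffix))
            continue;
        /* Copy only the NUL-terminated name, never a fixed 256 bytes. */
        snprintf(names[count], NAME_BUF, "%s", ld->d_name);
        count++;
    }
    closedir(dirp);
    return count;
}
```

A caller would pass something like `list_matching("/dev/dsk", "s0", names, 100)` and treat a return of -1 as "directory not readable".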
>
> -----Original Message-----
> From: Brian Poole [mailto:pooleb at gmail.com]
> Sent: Thursday, September 07, 2006 10:39 AM
> To: Cockcroft, Adrian
> Cc: Dmitry Berezin; Biju Joseph; orca-users at orcaware.com
> Subject: Re: [Orca-users] Orcallator - Segmentation Fault
>
> Here is all of the information I've been able to gather on the crash
> (SE Toolkit 3.4 on Solaris 10). I compiled it fresh using Forte with
> debugging enabled. I took a quick look at trying to find where the
> problem actually lies but was unable to come up with anything useful.
>
> Here is running the disks.se with debug:
>
> # /opt/RICHPse/bin/se.sparcv9 -d /opt/RICHPse/examples/disks.se
> if (count<31> == GLOBAL_diskinfo_size<101>)
> dp = *((dirent_t *) ld<4281687704>)
> if (dp.d_name<c3t8d24s3> == <.> || dp.d_name<c3t8d24s3> == <..>)
> if (!(dp.d_name<c3t8d24s3> =~ <s0$>))
> ld = readdir(dirp<4281664128>)
> if (count<31> == GLOBAL_diskinfo_size<101>)
> dp = *((dirent_t *) ld<4281687736>)
> if (dp.d_name<c3t8d24s4> == <.> || dp.d_name<c3t8d24s4> == <..>)
> if (!(dp.d_name<c3t8d24s4> =~ <s0$>))
> ld = readdir(dirp<4281664128>)
> if (count<31> == GLOBAL_diskinfo_size<101>)
> dp = *((dirent_t *) ld<4281687768>)
> if (dp.d_name<c3t8d24s5> == <.> || dp.d_name<c3t8d24s5> == <..>)
> if (!(dp.d_name<c3t8d24s5> =~ <s0$>))
> ld = readdir(dirp<4281664128>)
> if (count<31> == GLOBAL_diskinfo_size<101>)
> dp = *((dirent_t *) ld<4281687800>)
> Segmentation Fault (core dumped)
>
> So tracking that back shows the segfault occurs on line 215 of
> include/diskinfo.se:
>
> for (ld = readdir(dirp); ld != 0; ld = readdir(dirp)) {
>   // grow the array if needed
>   if (count == GLOBAL_diskinfo_size) {
>     GLOBAL_diskinfo_size += 4;
>     GLOBAL_disk_info = renew GLOBAL_disk_info[GLOBAL_diskinfo_size];
>   }
>   dp = *((dirent_t *) ld); <---------
>
> Also the truss output:
>
> # truss -fo /tmp/truss.log /opt/RICHPse/bin/se.sparcv9 /opt/RICHPse/examples/disks.se
> # tail -15 /tmp/truss.log
> 5967: ioctl(4, KSTAT_IOC_READ, "sd3547,err") = 701015
> 5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
> 5967: ioctl(4, KSTAT_IOC_READ, "sd2146,err") = 701015
> 5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
> 5967: ioctl(4, KSTAT_IOC_READ, "sd2177,err") = 701015
> 5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
> 5967: ioctl(4, KSTAT_IOC_READ, "sd3935,err") = 701015
> 5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
> 5967: ioctl(4, KSTAT_IOC_READ, "sd1971,err") = 701015
> 5967: ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000) = 701015
> 5967: ioctl(4, KSTAT_IOC_READ, "sd1972,err") = 701015
> 5967: Incurred fault #6, FLTBOUNDS %pc = 0xFF2E08EC
> 5967: siginfo: SIGSEGV SEGV_MAPERR addr=0xFF356000
> 5967: Received signal #11, SIGSEGV [default]
> 5967: siginfo: SIGSEGV SEGV_MAPERR addr=0xFF356000
>
> And perhaps more indicative, the trace:
>
> # /opt/SUNWspro/bin/dbx /opt/RICHPse/bin/se.sparcv9 core
> For information about new features see `help changes'
> To remove this message, put `dbxenv suppress_startup_message 7.5' in
> your .dbxrc
> Reading se.sparcv9
> core file header read successfully
> Reading ld.so.1
> Reading libkvm.so.1
> Reading libkstat.so.1
> Reading libdl.so.1
> Reading libelf.so.1
> Reading libgen.so.1
> Reading libm.so.2
> Reading libsocket.so.1
> Reading libnsl.so.1
> Reading libc.so.1
> Reading libc_psr.so.1
> Reading libmp.so.2
> Reading libmd5.so.1
> Reading libscf.so.1
> Reading libdoor.so.1
> Reading libuutil.so.1
> Reading librt.so.1
> Reading libaio.so.1
> program terminated by signal SEGV (no mapping at the fault address)
> 0xff2e08ec: _memcpy+0x042c: ldd [%o1], %c2
> Current function is member_fill
> dbx: warning: can't find file "/tmp/se-src/run.c"
> dbx: warning: see `help finding-files'
> (dbx) where
> [1] _memcpy(0x129938, 0xff356000, 0x8, 0xfffffffa, 0x4, 0x1), at 0xff2e08ec
> =>[2] member_fill(vp = 0x1297f0, area = 0xff355ef8 "", bias = 0), line 994 in "run.c"
> [3] struct_fill(vp = 0x1296b0, area = 0xff355ef8 "", bias = 0), line 1043 in "run.c"
> [4] run_indirection(sp = 0xffbfc4b8), line 1308 in "run.c"
> [5] run_call(sp = 0xffbfc4b8), line 1608 in "run.c"
> [6] resolve_expression(vp = 0xffbfcae0, ep = 0x129620, runit = 1), line 2892 in "run.c"
> [7] run_assign(sp = 0x127530), line 1675 in "run.c"
> [8] run_statement_list(lp = 0x127510), line 513 in "run.c"
> [9] run_for(sp = 0x12c078), line 2538 in "run.c"
> [10] run_statement_list(lp = 0x127330), line 513 in "run.c"
> [11] run_for(sp = 0x12c0b8), line 2538 in "run.c"
> [12] run_statement_list(lp = 0x121208), line 513 in "run.c"
> [13] run_block(bp = 0x133288), line 402 in "run.c"
> [14] run_call(sp = 0xffbfcec8), line 1625 in "run.c"
> [15] resolve_expression(vp = 0xffbfd450, ep = 0x13cd80, runit = 1), line 2892 in "run.c"
> [16] resolve_l_expression(ep = 0x13ae18), line 2659 in "run.c"
> [17] run_if(sp = 0x13cf88), line 523 in "run.c"
> [18] run_statement_list(lp = 0x13cf88), line 513 in "run.c"
> [19] run_block(bp = 0x1426f8), line 402 in "run.c"
> [20] se_run(argc = 1, argv = 0x74b88), line 366 in "run.c"
> [21] main(argc = 2, argv = 0xffbffcc4), line 542 in "main.c"
> *vp = {
> var_flags = VF_MEMBER
> var_special = 0
> var_type = VAR_CHAR
> var_struct = (nil)
> var_name = 0xc44f0 "d_name"
> var_qname = (nil)
> var_attach_lib = (nil)
> var_address = (nil)
> var_initial = (nil)
> var_un = {
> var_string = 0x129840 "c3t8d24s6"
> var_digit = 1218624
> var_udigit = 1218624U
> var_ldigit = 5233950226120704LL
> var_uldigit = 5233950226120704ULL
> var_rdigit = 2.5859149987693e-308
> var_user = 0x129840
> var_array = 0x129840
> }
> var_dimension = 256
> var_subscript = (nil)
> var_instances = (nil)
> var_offset = 10
> var_parent = 0xffbfd588
> var_next = (nil)
> }
>
> I would be more than happy to provide any additional information on
> the problem you might need. Feel free to contact me directly on this
> issue.
>
> Thank you,
>
> Brian
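
The numbers in the debug, truss and dbx output above can be checked against each other with a little arithmetic. The sketch below is an inference from those numbers, not a confirmed diagnosis. It assumes that the struct copy reads a fixed 256 bytes for `d_name` (from `var_dimension = 256`) starting 10 bytes into the entry (from `var_offset = 10`), and that 0xFF356000, the SEGV_MAPERR address, is the first unmapped byte. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the truss and dbx output in this thread. */
#define FAULT_ADDR  UINT64_C(0xFF356000)  /* SEGV_MAPERR addr from truss */
#define D_NAME_OFF  10                    /* var_offset of d_name in dbx */
#define D_NAME_LEN  256                   /* var_dimension of d_name in dbx */

/* First address past a fixed-size copy of d_name for the entry at ld. */
static uint64_t copy_end(uint64_t ld)
{
    return ld + D_NAME_OFF + D_NAME_LEN;
}

/* 1 if that copy would read at or beyond the faulting address, else 0. */
static int copy_crosses_fault(uint64_t ld)
{
    return copy_end(ld) > FAULT_ADDR;
}
```

Under those assumptions, the entry at ld = 4281687768 (the last one that worked) ends its copy 30 bytes short of 0xFF356000, while the entry at ld = 4281687800 (equal to `area = 0xff355ef8` in the dbx trace) would read 2 bytes past it. That would be consistent with a non-null, valid pointer still producing a fault during the copy.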
>
> On 9/7/06, Cockcroft, Adrian <acockcroft at ebay.com> wrote:
> > It should still be possible to avoid the crash by checking for a null at
> > the right point.
> >
> > Is it crashing in kstat read of the iostats, or the devinfo name mapping
> > at startup?
> >
> > Adrian
> >
> > -----Original Message-----
> > From: Dmitry Berezin [mailto:dberezin at surfside.rutgers.edu]
> > Sent: Thursday, September 07, 2006 8:43 AM
> > To: Cockcroft, Adrian; 'Biju Joseph'; orca-users at orcaware.com
> > Subject: RE: [Orca-users] Orcallator - Segmentation Fault
> >
> > Adrian,
> >
> > I believe that the actual problem is not with the array sizes, but has to do
> > with the "stale" disk devices. SE "segfaults" when it tries to access a
> > device that is not currently present on the system. That is why the problem
> > is usually seen on clustered systems with shared storage or systems with
> > BCV devices that frequently change their state to offline. A number of
> > people had previously reported that rebuilding the device tree fixed the
> > problem.
> >
> > I have not had time to look at the code, so I do not know if this could be
> > solved by changing scripts or whether SE itself has to be patched.
> >
> > -Dmitry.
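
One way to look for stale device entries before running SE is to check for links whose targets no longer resolve. The C sketch below assumes that stale devices show up as dangling symbolic links under a directory such as /dev/dsk. That is an assumption, not something established in this thread, and the function name `is_dangling_link` is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose lstat() and S_ISLNK under strict modes */
#include <assert.h>
#include <sys/stat.h>

/*
 * Return 1 if path is a symbolic link whose target does not resolve,
 * 0 if it is not a symlink or its target exists,
 * -1 if path itself does not exist.
 */
static int is_dangling_link(const char *path)
{
    struct stat lst, st;

    if (lstat(path, &lst) != 0)
        return -1;                /* no such entry at all */
    if (!S_ISLNK(lst.st_mode))
        return 0;                 /* regular file, directory, device node */
    return stat(path, &st) != 0;  /* link exists but target does not resolve */
}
```

Calling this on each entry of the device directory and reporting the ones that return 1 would give a list of candidates to clean up when the device tree is rebuilt.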
> >
> >
> > > -----Original Message-----
> > > From: orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com
> > > [mailto:orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com] On
> > > Behalf Of Cockcroft, Adrian
> > > Sent: Thursday, September 07, 2006 11:13 AM
> > > To: Biju Joseph; orca-users at orcaware.com
> > > Subject: Re: [Orca-users] Orcallator - Segmentation Fault
> > >
> > > Years ago I fixed the code that looks at disks to resize the array
> > > dynamically, I guess that this code got overwritten at some point, but it's
> > > a simple fix, just doesn't look much like C code...
> > >
> > > You can use the "renew" keyword to make a new array that is bigger and
> > > contains the same items, so figure out where it's indexing into the disk
> > > array, check the index and renew the array to be size+10 or something.
> > > There's example code in the generic SE disk class, which for some reason
> > > orcallator doesn't seem to use?
> > >
> > > I'm not currently working on a Solaris box, so it will take me a while to
> > > get a setup I could test this fix on, probably a few weeks when I get back
> > > from a business trip.
> > >
> > > Adrian
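
For readers more used to C than SE, the "check the index, then renew" pattern described above corresponds roughly to growing an array with `realloc`. The sketch below is a C analogue under that assumption, not SE code. The `disk_table` struct, the `int` payload and the function name are hypothetical. The step of 10 follows the "size+10 or something" suggestion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the script's global disk table. */
struct disk_table {
    int *items;   /* one slot per disk; int is a placeholder payload */
    int  size;    /* allocated slots */
    int  count;   /* slots in use */
};

#define GROW_BY 10   /* growth step, as suggested in the message above */

/*
 * Append value, growing the table first when it is full.
 * Returns 0 on success, -1 on allocation failure.
 */
static int disk_table_add(struct disk_table *t, int value)
{
    if (t->count == t->size) {
        int *bigger = realloc(t->items,
                              (size_t)(t->size + GROW_BY) * sizeof *bigger);
        if (bigger == NULL)
            return -1;            /* the old array is still valid */
        t->items = bigger;        /* existing items are preserved */
        t->size += GROW_BY;
    }
    t->items[t->count++] = value;
    return 0;
}
```

The check happens before every store, so the index can never run past the allocated size regardless of how many disks are found.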
> > >
> > > -----Original Message-----
> > > From: orca-users-bounces+acockcroft=ebay.com at orcaware.com on behalf of
> > > Biju Joseph
> > > Sent: Thu 9/7/2006 7:28 AM
> > > To: orca-users at orcaware.com
> > > Subject: [Orca-users] Orcallator - Segmentation Fault
> > >
> > > Hello All,
> > >
> > > I am trying to start orcallator on two nodes of a VCS cluster (4.1) with
> > > VxVM 4.1. The database is on EMC disks. Orcallator is giving a segmentation
> > > fault.
> > >
> > > RICHPse version is 3.4 (03:59 PM 01/05/05). I tried using orcallator.se
> > > 1.36 and 1.37. Both give the same problem.
> > >
> > > The same combination is working on non-clustered systems. All systems are
> > > running Solaris 10.
> > >
> > > Can any of you help?
> > >
> > > Appreciate your help.
> > >
> > > Regards
> > > Biju K Joseph
> > > +91-9866116298
> > >
> > > _______________________________________________
> > > Orca-users mailing list
> > > Orca-users at orcaware.com
> > > http://www.orcaware.com/mailman/listinfo/orca-users