[Orca-users] Re: Not reporting I/O wait under 2.6?

Nick Steel. (lists) mrlist at noid.org
Tue Nov 27 01:19:42 PST 2001


You're welcome :)

I have one box that I've found on my network so far that's running 2.6 with
KU < -21; however, it's not generating any kind of I/O wait, and it's only a
dual-processor U60. I'll have a bit more of a poke around tomorrow to see if
there are any other boxes, but I'm not holding out much hope. We've been
pretty good about keeping up on patches, and the chances of running into a
box that hasn't been patched in 18 months are pretty low.

Patching orcallator.se would work, though since the problem is in
vmglobal_refresh(), which is part of the SE Toolkit, wouldn't it make more
sense to just ask to have SE patched? I suppose the alternative would be to
move the functionality of vmglobal_refresh() into orcallator.se and fix it
there. Either way, for now I'll tweak the function in live_rules.se, along
the lines of the sketch below.
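
Roughly, the change I have in mind is just to stop the second division of
pvm.wait_time on boxes that already have the per-CPU wait fix. This is
untested so far; the comments are mine, and the rest is essentially the
chunk quoted further down:

if (GLOBAL_pvm_ncpus > 1) {
    /* average over cpu count */
    pvm.user_time        /= GLOBAL_pvm_ncpus;
    pvm.system_time      /= GLOBAL_pvm_ncpus;
    pvm.wait_time        /= GLOBAL_pvm_ncpus;
    pvm.idle_time        /= GLOBAL_pvm_ncpus;
#if MINOR_VERSION < 70
    /* only needed on 2.6 boxes without the 105181-21 fix, where wait is
       reported identically on every CPU - on a patched box, skip the
       second division */
    /* pvm.wait_time     /= GLOBAL_pvm_ncpus; */
#endif
}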

- Nick


> Thanks for tracking this down.
>
> Are you running the older versions of the kernel patches?  Do the patches
> help?
>
> We can always patch orcallator.se to ignore these measurements on systems
> that have the old patch.
>
> Blair
>
> "Nick Steel. (lists)" wrote:
> >
> > Ok, I'll reply to my own post again (the first sign of madness, right?)
> >
> > I had a dig through Sunsolve and came up with two bug IDs, 4139268 and
> > 4116873, that appear to be relevant to this. Both of these bugs were
> > fixed in kernel patch 105181-21, which was released 5/31/00. 105181-29
> > is the latest KU for 2.6. Both bugs address the fact that I/O wait
> > wasn't calculated on a per-CPU basis under 2.6. It seems to me that
> > this would explain the behavior I've been seeing.
> >
> > - Nick
> >
> > ----- Original Message -----
> > From: "Nick Steel (Lists)" <mrlist at noid.org>
> > To: <orca-users at yahoogroups.com>
> > Sent: Monday, November 26, 2001 5:47 PM
> > Subject: Re: [orca-users] Not reporting I/O wait under 2.6?
> >
> > > Blair (et al)  -
> > >
> > > Here's what I got back from Rich. I'll write a couple of test scripts
> > > tonight to test that theory out. I'll have a poke through Sunsolve and
> > > see if I can find anything of note in the 2.6 kernel patches as well.
> > >
> > > Is anyone else running 2.6 or 2.7 running into this, or am I the only
> > > one who's noticed something wacky about I/O wait?
> > >
> > > Thanks!
> > >
> > > - Nick
> > >
> > >
> > > Nick Steel wrote:
> > >
> > > > Hi Rich -
> > > >
> > > > I've started working with Orca and its orcallator.se script, and
> > > > I've been seeing some odd results from it.
> > > >
> > > > The environment is Solaris 2.6 with kernel patch -26 and Solaris 8
> > > > with kernel patch -10, running SE Toolkit 3.2.1 and orcallator
> > > > v1.28 and 1.32.
> > > >
> > > > What I'm seeing is that the orcallator script doesn't appear to be
> > > > reporting iowait properly on the 2.6 system, but it does appear to
> > > > report it properly on the 2.8 box. On the 2.6 system, wait_time is
> > > > reported in the 0.4 - 1.0 range; however, sar is reporting wio%
> > > > around 20 - 25%, which is what I expect on this system. On the 2.8
> > > > box, the numbers reported by both the se script and sar are roughly
> > > > in sync in the 20 - 25% range. Orcallator.se is using
> > > > vmglobal_total() to populate the p_vmstat structure. In
> > > > live_rules.se, there's this chunk of code:
> > > >
> > > > if (GLOBAL_pvm_ncpus > 1) {
> > > >     /* average over cpu count */
> > > >     pvm.user_time        /= GLOBAL_pvm_ncpus;
> > > >     pvm.system_time      /= GLOBAL_pvm_ncpus;
> > > >     pvm.wait_time        /= GLOBAL_pvm_ncpus;
> > > >     pvm.idle_time        /= GLOBAL_pvm_ncpus;
> > > > #if MINOR_VERSION < 70
> > > >     /* reduce wait time - only one CPU can ever be waiting - others
> > > >        are idle */
> > > >     /* system counts all idle CPUs as waiting if any I/O is
> > > >        outstanding */
> > > >     pvm.wait_time        /= GLOBAL_pvm_ncpus;
> > > > #endif
> > > >
> > > > As a wild stab in the dark, I stopped it from averaging pvm.wait_time
> > > > on the 2.6 box, and I found that the values being reported are more
> > > > in line with what iostat and sar report. What was different in 2.6
> > > > that meant wait_time needed to be averaged across all CPUs, but not
> > > > so with 2.7 and up? Which value can I trust on a 2.6 system?
> > > >
> > > > Thanks!
> > >
> > >
> > > Wait time, prior to Solaris 7, was only computed once and reported
> > > identically across all CPUs. So, if you had a 4 CPU machine and one
> > > of the CPUs had 40% wait time but the other 3 were totally idle, then
> > > all four processors reported 40% wait time. This code attempts to
> > > remedy this by dividing the total by 4, bringing the wait time down
> > > from 160% to 40%. This, of course, is still wrong, which is why they
> > > fixed it. At one time, this code divided the wait time AGAIN by ncpus,
> > > bringing the wait time down to 10%, which kinda makes sense (40% on
> > > one of 4 CPUs is 10%), but is still wrong.
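> > >
> > > Spelling the same hypothetical 4-CPU example out as arithmetic:
> > >
> > >     reported on each CPU (pre-Solaris 7):  40% + 40% + 40% + 40% = 160%
> > >     after one divide by ncpus (the code):  160% / 4 = 40%
> > >     after the old second divide:           40% / 4 = 10%
> > >     true per-CPU accounting:               40%, 0%, 0%, 0%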
> > >
> > > Reminds me of the joke I heard on the West Wing last night: 3
> > > statisticians go hunting. A deer jumps out in front of them, and the
> > > first man fires and hits 10 feet in front of the deer. The second man
> > > fires and hits 10 feet behind the deer. The third man starts jumping
> > > up and down yelling "I got him! I got him!".
> > >
> > > Anyway, perhaps the issue was addressed by a kernel patch that I'm
> > > unaware of. If you feel the numbers are accurate without the ifdef,
> > > then remove the ifdef.
> > >
> > > Rich
> > > ---- Richard Pettit                           richp at setoolkit.com
> > > ---- Author, SE Toolkit
> > > ---- SE Toolkit.com                        http://www.setoolkit.com
> > >
>
> --
> Blair Zajac <blair at orcaware.com> - Perl & sysadmin services for hire
> Web and OS performance plots - http://www.orcaware.com/orca/
>


