[Orca-users] Re: Not reporting I/O wait under 2.6?
Blair Zajac
blair at orcaware.com
Mon Nov 26 22:56:32 PST 2001
Thanks for tracking this down.
Are you running the older versions of the kernel patches? Do the patches
help?
We can always patch orcallator.se to ignore these measurements on systems
that have the old patch.
Blair
"Nick Steel. (lists)" wrote:
>
> OK, I'll reply to my own post again (the first sign of madness, right?)
>
> I had a dig through SunSolve and came up with two bug IDs, 4139268 and
> 4116873, that appear to be relevant to this. Both bugs were fixed in
> kernel patch 105181-21, which was released 5/31/00; 105181-29 is the latest
> KU for 2.6. Both bugs address the fact that I/O wait wasn't calculated on a
> per-CPU basis under 2.6. It seems to me that this would explain the behavior
> I've been seeing.
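A check along these lines could decide whether a 2.6 box still needs its wait time corrected. This is only a sketch in Python, not part of orcallator.se: the function names and the sample patch listings are hypothetical, and the cutoff revision (105181-21) is taken from the message above rather than verified independently.

```python
import re

# Assumption: 105181 is the Solaris 2.6 kernel update patch and revision 21
# is the first one with per-CPU I/O wait, as reported in the message above.
KU_PATCH_ID = "105181"
FIRST_FIXED_REVISION = 21


def highest_revision(patch_listing, patch_id=KU_PATCH_ID):
    """Return the highest installed revision of patch_id, or None if absent."""
    pattern = re.compile(r"\b" + re.escape(patch_id) + r"-(\d+)\b")
    revisions = [int(m.group(1)) for m in pattern.finditer(patch_listing)]
    return max(revisions) if revisions else None


def needs_extra_division(patch_listing):
    """True if wait time is presumably still duplicated across CPUs."""
    revision = highest_revision(patch_listing)
    return revision is None or revision < FIRST_FIXED_REVISION


# Hypothetical patch listings, one patch per line (999999 is a made-up ID).
old_box = "Patch: 105181-17\nPatch: 999999-01\n"
new_box = "Patch: 105181-21\nPatch: 105181-26\n"

print(needs_extra_division(old_box))  # revision 17 predates the fix
print(needs_extra_division(new_box))  # revision 26 includes the fix
```

The listing is passed in as a plain string so that the sketch does not depend on how the patch list is actually obtained on a given system.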
>
> - Nick
>
> ----- Original Message -----
> From: "Nick Steel (Lists)" <mrlist at noid.org>
> To: <orca-users at yahoogroups.com>
> Sent: Monday, November 26, 2001 5:47 PM
> Subject: Re: [orca-users] Not reporting I/O wait under 2.6?
>
> > Blair (et al) -
> >
> > Here's what I got back from Rich. I'll write a couple of test scripts
> > tonight to test that theory out. I'll also have a poke through SunSolve
> > and see if I can find anything of note in the 2.6 kernel patches.
> >
> > Is anyone else on 2.6 or 2.7 running into this, or am I the only one
> > who's noticed something wacky about I/O wait?
> >
> > Thanks!
> >
> > - Nick
> >
> >
> > Nick Steel wrote:
> >
> > > Hi Rich -
> > >
> > > I've started working with Orca and its orcallator.se script and I've
> > > been seeing some odd results from the orcallator.se script.
> > >
> > > The environment is Solaris 2.6 with kernel patch -26 and Solaris 8 with
> > > kernel patch -10, with SE Toolkit 3.2.1 and orcallator v1.28 and 1.32.
> > >
> > > What I'm seeing is that the orcallator script doesn't appear to report
> > > iowait properly on a 2.6 system, but it does appear to report it
> > > properly on a Solaris 8 box. On the 2.6 system, wait_time is reported in
> > > the 0.4 - 1.0 range, whereas sar reports wio% around 20 - 25%, which is
> > > what I expect it to be on this system. The numbers reported by both the
> > > se script and sar are roughly in sync in the 20 - 25% range.
> > > Orcallator.se is using vmglobal_total(); to populate the p_vmstat
> > > structure. In live_rules.se, there's this chunk of code:
> > >
> > > if (GLOBAL_pvm_ncpus > 1) {
> > >   /* average over cpu count */
> > >   pvm.user_time /= GLOBAL_pvm_ncpus;
> > >   pvm.system_time /= GLOBAL_pvm_ncpus;
> > >   pvm.wait_time /= GLOBAL_pvm_ncpus;
> > >   pvm.idle_time /= GLOBAL_pvm_ncpus;
> > > #if MINOR_VERSION < 70
> > >   /* reduce wait time - only one CPU can ever be waiting - others are idle */
> > >   /* system counts all idle CPUs as waiting if any I/O is outstanding */
> > >   pvm.wait_time /= GLOBAL_pvm_ncpus;
> > > #endif
> > >
> > > As a wild stab in the dark, I stopped it from averaging pvm.wait_time on
> > > the 2.6 system, and I found that the values being reported are more in
> > > line with what iostat and sar report. What was different in 2.6 that
> > > meant wait_time needed to be averaged across all CPUs, but not so with
> > > 2.7 and up? Which value can I trust on a 2.6 system?
> > >
> > > Thanks!
> >
> >
> > Wait time, prior to Solaris 7, was only computed once and reported
> > identically across all CPUs. So, if you had a 4 CPU machine and one
> > of the CPUs had 40% wait time but the other 3 were totally idle, then
> > all four processors reported 40% wait time. This code attempts to remedy
> > this by dividing the total by 4, bringing the wait time down from 160% to
> > 40%. This, of course, is still wrong, which is why they fixed it. At one
> > time, this code divided the wait time AGAIN by ncpus, bringing the wait
> > time down to 10%, which kinda makes sense (40% on one of 4 CPUs is 10%),
> > but is still wrong.
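The arithmetic in this explanation can be sketched as follows. This is a minimal illustration in Python, not code from SE Toolkit: the function name and the sample values are assumptions chosen to match the 4-CPU example, and the two divisions mirror the quoted chunk from live_rules.se.

```python
def averaged_wait(per_cpu_wait, divide_twice=False):
    """Average per-CPU wait percentages the way the quoted chunk does.

    per_cpu_wait: wait percentage reported by each CPU.
    divide_twice: mimic the extra division guarded by MINOR_VERSION < 70.
    """
    ncpus = len(per_cpu_wait)
    total = sum(per_cpu_wait)        # wait time summed over all CPUs
    wait = total / ncpus             # "average over cpu count"
    if divide_twice and ncpus > 1:
        wait /= ncpus                # "reduce wait time"
    return wait


# Old accounting: one CPU waits 40%, but every CPU reports the same 40%.
duplicated = [40.0, 40.0, 40.0, 40.0]
# Per-CPU accounting: only the CPU that is really waiting reports it.
per_cpu = [40.0, 0.0, 0.0, 0.0]

print(averaged_wait(duplicated))                     # 160 / 4
print(averaged_wait(duplicated, divide_twice=True))  # 160 / 4 / 4
print(averaged_wait(per_cpu))                        # 40 / 4
print(averaged_wait(per_cpu, divide_twice=True))     # 40 / 4 / 4
```

With duplicated reporting, the second division lands on the same figure that per-CPU accounting gives after a single division. If the kernel already accounts wait time per CPU, the second division shrinks the figure by a further factor of ncpus, which would be consistent with the low wait_time values described earlier in the thread.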
> >
> > Reminds me of the joke I heard on the West Wing last night: 3
> > statisticians go hunting. A deer jumps out in front of them and the first
> > man fires and hits 10 feet in front of the deer. The second man fires and
> > hits 10 feet behind the deer. The third man starts jumping up and down
> > yelling "I got him! I got him!".
> >
> > Anyway, perhaps the issue was addressed by a kernel patch that I'm unaware
> > of. If you feel the numbers are accurate without the ifdef, then remove
> > the ifdef.
> >
> > Rich
> > ---- Richard Pettit richp at setoolkit.com
> > ---- Author, SE Toolkit
> > ---- SE Toolkit.com http://www.setoolkit.com
--
Blair Zajac <blair at orcaware.com> - Perl & sysadmin services for hire
Web and OS performance plots - http://www.orcaware.com/orca/