[Orca-checkins] r398 - trunk/orca/data_gatherers/orcallator
dmberezin at hotmail.com
Mon Oct 11 12:12:35 PDT 2004
Author: dmberezin at hotmail.com
Date: Mon Oct 11 12:10:53 2004
New Revision: 398
Modified:
trunk/orca/data_gatherers/orcallator/orcallator.se
Log:
Fix for kio.nread bug in SE
* data_gatherers/orcallator/orcallator.se
(get_new_kstat_data): new function
(orca_io_info_update): add code to re-read kstat data if kio.nread appears to
be corrupt.
SE appears to have a bug: occasionally kio.nread is erroneously set to 0. It
looks like a memory management problem somewhere deep in SE's code, since
the problem correlates with memory allocation calls elsewhere in the
script. For example, a call to "renew" inside the kstat traversal loop will
cause nread to be 0 in the next iteration of the loop. The "data fixing" code
that dealt with this issue was removed in revision 392. This patch introduces a
new function that re-reads the affected kstat instead of ignoring bad data.
I can add a few more "if" statements to cover the case where the re-read data
is still bad, but I think that would be overkill, since we should trust SE to
some degree :-).
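
For readers without an SE environment, the idea behind the update-path check
in the patch (re-read a kstat once when a cumulative counter is lower than the
previously stored value) can be sketched in Python. This is only an
illustration under stated assumptions, not the orcallator.se code: IoCounters,
looks_corrupt and sample_with_reread are hypothetical names, and the reread
callable stands in for get_new_kstat_data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IoCounters:
    """Cumulative I/O counters, named after the kstat_io_t fields the patch checks."""
    reads: int
    nread: int
    writes: int
    nwritten: int


def looks_corrupt(current: IoCounters, previous: IoCounters) -> bool:
    """Cumulative counters should never go backwards between two samples."""
    return (current.reads < previous.reads
            or current.nread < previous.nread
            or current.writes < previous.writes
            or current.nwritten < previous.nwritten)


def sample_with_reread(current: IoCounters,
                       previous: IoCounters,
                       reread: Callable[[], IoCounters]) -> IoCounters:
    """Return `current`, or one fresh re-read if `current` looks corrupt.

    Only a single retry is made, mirroring the patch: the re-read result
    is trusted even if it is still bad.
    """
    if looks_corrupt(current, previous):
        return reread()
    return current


# Hypothetical sample values: nread has been zeroed in the glitched reading.
previous = IoCounters(reads=10, nread=4096, writes=5, nwritten=2048)
glitched = IoCounters(reads=12, nread=0, writes=6, nwritten=3072)
fresh = IoCounters(reads=12, nread=8192, writes=6, nwritten=3072)

result = sample_with_reread(glitched, previous, lambda: fresh)
```

The single retry matches the trade-off described above: a second bad reading
is accepted as-is rather than guarded by further checks.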
Modified: trunk/orca/data_gatherers/orcallator/orcallator.se
==============================================================================
--- trunk/orca/data_gatherers/orcallator/orcallator.se (original)
+++ trunk/orca/data_gatherers/orcallator/orcallator.se Mon Oct 11 12:10:53 2004
@@ -339,6 +339,37 @@
uint _rcnt; // Count of elements in run state
};
+// SE appears to have a bug - occasionally kio.nread is erroneously set to 0.
+// This function is used to re-read data for a given kstat.
+ulong get_new_kstat_data(kstat_t okp[1]) {
+ ulong ul;
+ kstat_ctl_t kc[1];
+ kstat_t nkp[1];
+ kstat_t rkp[1];
+
+ // Return old data if no match found
+ rkp = okp;
+ // Initialize kstat control structure
+ kc[0] = kstat_open();
+ // Traverse the chain looking for matching kstat
+ for (ul=kc[0].kc_chain; ul!=0; ul=nkp[0].ks_next) {
+ nkp[0] = *((kstat_t *) ul);
+ if (nkp[0].ks_type == okp[0].ks_type &&
+ nkp[0].ks_class == okp[0].ks_class &&
+ nkp[0].ks_name == okp[0].ks_name &&
+ nkp[0].ks_instance == okp[0].ks_instance ) {
+ if (kstat_read(kc, nkp, 0) == -1) {
+ perror("get_new_kstat_data:kstat_read error");
+ exit(1);
+ }
+ rkp = nkp;
+ break;
+ }
+ }
+ kstat_close(kc);
+ return rkp[0].ks_data;
+}
+
// Define globals for tracking kstat io data.
io_dev_info_t ORCA_io_dev_info[];
int ORCA_io_dev_count=0;
@@ -408,6 +439,17 @@
}
ORCA_io_dev_info[iodev].short_name = nkp[0].ks_name;
ORCA_io_dev_info[iodev].dev_class = nkp[0].ks_class;
+
+ // Check if kio data is valid, and re-read kstat if it is not.
+ // At this time, only kio.nread appears to have occasional problems,
+ // but we check the other three just in case.
+ // It is possible for these statistics to be 0, in such case
+ // kio data will remain the same.
+ if (kio.writes == 0 || kio.nwritten == 0 ||
+ kio.reads == 0 || kio.nread == 0) {
+ kio = *((kstat_io_t *) get_new_kstat_data(nkp));
+ }
+
ORCA_io_dev_info[iodev]._writes = kio.writes;
ORCA_io_dev_info[iodev]._nwritten = kio.nwritten;
ORCA_io_dev_info[iodev]._wlastupdate = kio.wlastupdate;
@@ -422,6 +464,15 @@
ORCA_io_dev_info[iodev]._rcnt = kio.rcnt;
ORCA_io_dev_count++;
}
+ // Check if kio data is valid, and re-read kstat if it is not.
+ // At this time, only kio.nread appears to have occasional problems,
+ // but we check the other three just in case.
+ if (kio.writes < ORCA_io_dev_info[iodev]._writes ||
+ kio.nwritten < ORCA_io_dev_info[iodev]._nwritten ||
+ kio.reads < ORCA_io_dev_info[iodev]._reads ||
+ kio.nread < ORCA_io_dev_info[iodev]._nread) {
+ kio = *((kstat_io_t *) get_new_kstat_data(nkp));
+ }
elapsed_etime = (kio.wlastupdate-ORCA_io_dev_info[iodev]._wlastupdate);
if (elapsed_etime == 0) {