[Orca-dev] orcallator.se patches

Dmitry Berezin dberezin at surfside.rutgers.edu
Fri May 28 09:43:37 PDT 2004


> Dmitry,
> 
> I'm not comfortable yet with the timestamp modification patch.  Can you
> explain what exactly you're seeing that makes you want to change this?

The timestamps do not line up exactly with integer multiples of the
interval time, so when orca extracts data from the RRDs it gets "corrected"
results (data averaged over the requested time). For integer data, such as
process counts or the number of tape drives in use (this came up some time
ago), the result is 5.97 tape drives or 32.93 Oracle connections.
This is of course only relevant for the daily graphs, since all other data
is averaged anyway.
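
To illustrate the effect, here is a rough Python sketch of time-weighted
averaging onto interval boundaries. It is a simplified model and not
rrdtool's actual code; the function name normalize and the sample values are
made up for the illustration.

```python
def normalize(samples, step):
    """Time-weighted average of samples onto step-aligned buckets.

    samples is a time-sorted list of (timestamp, value); each value is
    assumed to hold over the span (previous_timestamp, timestamp].
    Returns {bucket_end: average} for buckets that are fully covered.
    This is a simplified model, not rrdtool's implementation.
    """
    totals = {}   # bucket_end -> sum of value * seconds
    covered = {}  # bucket_end -> seconds of data seen
    for (prev_ts, _), (ts, value) in zip(samples, samples[1:]):
        t = prev_ts
        while t < ts:
            bucket_end = (t // step + 1) * step
            seg_end = min(ts, bucket_end)
            totals[bucket_end] = totals.get(bucket_end, 0.0) + value * (seg_end - t)
            covered[bucket_end] = covered.get(bucket_end, 0) + (seg_end - t)
            t = seg_end
    return {end: totals[end] / step
            for end in sorted(totals) if covered[end] == step}

# Samples exactly on the 300-second boundaries stay integers.
aligned = normalize([(0, 6), (300, 6), (600, 5), (900, 5)], 300)

# The same samples taken one second late pick up a fractional part
# in the bucket where the value changes from 6 to 5.
skewed = normalize([(1, 6), (301, 6), (601, 5), (901, 5)], 300)

print(aligned)
print(skewed)
```

With aligned timestamps every bucket holds exactly one sample, so the
integer survives. With a one-second skew, the bucket ending at 600 mixes one
second of the old value with 299 seconds of the new one.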

> How much off is the timestamp?  Just a couple of seconds I would guess?

Yes. Here is the log for today from one of my systems:

timestamp locltime
1085716800 00:00:00
1085717100 00:05:00
1085717400 00:10:00
1085717701 00:15:01
1085718001 00:20:01
1085718301 00:25:01
1085718601 00:30:01
1085718901 00:35:01
1085719201 00:40:01
1085719501 00:45:01
1085719801 00:50:01
1085720101 00:55:01
1085720403 01:00:03
1085720704 01:05:04
1085721004 01:10:04
1085721304 01:15:04
1085721604 01:20:04
1085721904 01:25:04
1085722204 01:30:04
1085722504 01:35:04
1085722804 01:40:04
1085723104 01:45:04
1085723404 01:50:04
1085723700 01:55:00
1085724000 02:00:00
1085724300 02:05:00
1085724600 02:10:00
1085724900 02:15:00
1085725200 02:20:00
1085725500 02:25:00
1085725800 02:30:00
1085726100 02:35:00
1085726400 02:40:00
1085726701 02:45:01
1085727001 02:50:01
1085727301 02:55:01
1085727601 03:00:01
1085727901 03:05:01
1085728201 03:10:01
1085728501 03:15:01
1085728801 03:20:01
1085729101 03:25:01
1085729401 03:30:01
1085729702 03:35:02
1085730002 03:40:02
1085730302 03:45:02
1085730602 03:50:02
1085730902 03:55:02
1085731202 04:00:02
1085731502 04:05:02
1085731802 04:10:02
1085732102 04:15:02
1085732402 04:20:02
1085732702 04:25:02
1085733003 04:30:03
1085733303 04:35:03
1085733603 04:40:03
1085733903 04:45:03
1085734203 04:50:03
1085734503 04:55:03
1085734803 05:00:03
1085735103 05:05:03
1085735403 05:10:03
1085735703 05:15:03
1085736003 05:20:03
1085736304 05:25:04
1085736604 05:30:04
1085736904 05:35:04
1085737204 05:40:04
1085737502 05:45:02
1085737802 05:50:02
1085738102 05:55:02
1085738402 06:00:02
1085738702 06:05:02
1085739002 06:10:02
1085739302 06:15:02
1085739602 06:20:02
1085739902 06:25:02
1085740203 06:30:03
1085740503 06:35:03
1085740803 06:40:03
1085741103 06:45:03
1085741403 06:50:03
1085741703 06:55:03
1085742003 07:00:03
1085742303 07:05:03
1085742603 07:10:03
1085742903 07:15:03
1085743203 07:20:03
1085743504 07:25:04
1085743804 07:30:04
1085744104 07:35:04
1085744404 07:40:04
1085744704 07:45:04
1085745004 07:50:04
1085745304 07:55:04
1085745604 08:00:04
1085745904 08:05:04
1085746204 08:10:04
1085746500 08:15:00
1085746800 08:20:00
1085747100 08:25:00
1085747400 08:30:00
1085747700 08:35:00
1085748000 08:40:00
1085748300 08:45:00
1085748600 08:50:00
1085748900 08:55:00
1085749200 09:00:00
1085749501 09:05:01
1085749801 09:10:01
1085750101 09:15:01
1085750401 09:20:01
1085750701 09:25:01
1085751001 09:30:01
1085751301 09:35:01
1085751601 09:40:01
1085751901 09:45:01
1085752202 09:50:02
1085752502 09:55:02
1085752802 10:00:02
1085753102 10:05:02
1085753402 10:10:02
1085753702 10:15:02
1085754004 10:20:04
1085754304 10:25:04
1085754604 10:30:04
1085754900 10:35:00
1085755200 10:40:00
1085755500 10:45:00
1085755800 10:50:00
1085756100 10:55:00
1085756400 11:00:00
1085756700 11:05:00
1085757000 11:10:00
1085757300 11:15:00
1085757601 11:20:01
1085757901 11:25:01
1085758201 11:30:01
1085758501 11:35:01
1085758801 11:40:01
1085759101 11:45:01
1085759401 11:50:01
1085759702 11:55:02
1085760002 12:00:02
1085760302 12:05:02
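
The drift is easier to see as an offset from the previous 300-second
boundary. A small Python sketch over a few timestamps copied from the log
above (the helper name boundary_offset is just for illustration):

```python
INTERVAL = 300  # seconds; the 5-minute interval visible in the log

# A few timestamps copied from the log above.
timestamps = [1085716800, 1085717701, 1085720403, 1085720704, 1085723700]

def boundary_offset(ts, interval=INTERVAL):
    """Seconds by which ts overshoots the previous multiple of interval."""
    return ts % interval

offsets = [boundary_offset(ts) for ts in timestamps]
print(offsets)  # [0, 1, 3, 4, 0]
```

In this log the offset climbs from 0 to 4 seconds and then drops back to 0.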

> Looking at this code:
> 
>      // Calculate the next time to sleep to that is an integer multiple of
>      // the interval time.  Make sure that at least half of the interval
>      // passes before waking up.
>      now        = time(0);
>      sleep_till = (now/interval)*interval;
>      while (sleep_till < now + interval*0.5) {
>        sleep_till += interval;
>      }
> 
> #ifdef WATCH_WEB
>      measure_web(sleep_till);
> #else
>      sleep_till_and_count_new_processes(sleep_till);
> #endif
> 
> and down further in sleep_till_and_count_new_processes() and
> measure_web(), the code is designed to sleep to the next integer
> multiple of the interval.  So this should work fine.

No. The problem is inside both measure_web and
sleep_till_and_count_new_processes. Look at my explanation of it here:
http://www.orcaware.com/pipermail/orca-dev/2004-May/000465.html

Basically, orcallator.se sleeps for a full 5 seconds even when the next
integer multiple of the interval is less than 5 seconds away:

#ifdef WATCH_CPU
    if (can_read_kernel != 0) {
      // Sleep at least 5 seconds to make a measurement.
      sleep_till1 = now + 5;
      while (now < sleep_till1) {
        sleep(sleep_till1 - now);
        now = time(0);
      }
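
A rough Python model of the overshoot, for illustration only. The
next_boundary function mirrors the sleep_till calculation quoted above. The
wake_time function is an assumption: it supposes the measurement loop simply
repeats fixed 5-second sleeps until sleep_till has passed, which is a
simplification and not the actual orcallator.se loop. Both names are
hypothetical.

```python
def next_boundary(now, interval):
    """Next multiple of interval that is at least half an interval away.

    Mirrors the sleep_till calculation quoted above.
    """
    sleep_till = (now // interval) * interval
    while sleep_till < now + interval * 0.5:
        sleep_till += interval
    return sleep_till

def wake_time(now, sleep_till, chunk=5):
    """Time at which a loop of fixed chunk-second sleeps passes sleep_till.

    ASSUMPTION: the loop never shortens its last sleep, so it can end up
    to chunk - 1 seconds past the boundary.
    """
    while now < sleep_till:
        now += chunk
    return now

interval = 300
# Assume one second has passed since the 1085717400 sample in the log.
start = 1085717401
target = next_boundary(start, interval)
woke = wake_time(start, target)
print(target, woke, woke - target)
```

Under these assumptions the loop wakes one second past the boundary, and
the overshoot can never exceed 4 seconds, which matches the 0 to 4 second
offsets in the log.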

> Are you using WATCH_HTTPD or WATCH_WEB?

On some systems yes, on some no. The log included above is from a system
that is not "watching web".

> So something is happening that the process isn't waking up at the right
> time.

Exactly. It sleeps for 5 seconds even if that goes past the sleep_till
boundary.

> Are you on a slow system?

No. Again, the log above is from a V1280 with 4 CPUs that is 75% idle.

I agree with your concerns about "fixing" timestamps; this patch is just a
quick fix. Perhaps we need to make more fundamental changes to try to
actually fix the problem.

  -Dmitry.




