CPU load
--------

Linux exports various bits of information via `/proc/stat' and
`/proc/uptime' that userland tools, such as top(1), use to calculate
the average time the system spent in a particular state, for example:

    $ iostat
    Linux 2.6.18.3-exp (linmac)     02/20/2007

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              10.01    0.00    2.92    5.44    0.00   81.63

    ...

Here the system reports that, over the default sampling period, it
spent 10.01% of the time doing work in user space, 2.92% in the
kernel, and was idle 81.63% of the time.
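
As a rough illustration of how a tool might turn the exported counters
into such percentages, the sketch below parses an aggregate `cpu' line
in the `/proc/stat' format and computes the share of one state between
two samples.  The helper names (parse_cpu_line, state_percent) are
hypothetical, and the field order (user, nice, system, idle, iowait,
irq, softirq, steal) is an assumption taken from proc(5); this is not
how any particular tool is implemented.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Assumed field order: user nice system idle iowait irq softirq steal */
#define NFIELDS 8
#define F_IDLE  3

/* Parse an aggregate "cpu ..." line (not "cpu0", "cpu1", ...).
   Returns the number of counters read, or 0 if the line does not match. */
static int parse_cpu_line (const char *line, unsigned long long f[NFIELDS])
{
     int n;

     if (strncmp (line, "cpu ", 4) != 0)
          return 0;
     n = sscanf (line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                 &f[0], &f[1], &f[2], &f[3], &f[4], &f[5], &f[6], &f[7]);
     return n < 0 ? 0 : n;
}

/* Percentage of all ticks counted between two samples that were
   credited to the state with index idx. */
static double state_percent (const unsigned long long prev[NFIELDS],
                             const unsigned long long cur[NFIELDS],
                             size_t idx)
{
     unsigned long long total = 0;
     size_t i;

     for (i = 0; i < NFIELDS; ++i)
          total += cur[i] - prev[i];
     if (total == 0)
          return 0.0;
     return 100.0 * (double) (cur[idx] - prev[idx]) / (double) total;
}
```

A caller would read the first line of `/proc/stat' twice, some time
apart, parse both lines, and pass the two arrays to state_percent()
with F_IDLE to obtain a figure comparable to %idle.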

In most cases the `/proc/stat' information reflects reality quite
closely; however, because of how and when the kernel collects this
data, it sometimes cannot be trusted at all.

So how is this information collected?  Whenever a timer interrupt is
signalled, the kernel looks at what kind of task was running at that
moment and increments the counter that corresponds to this task's
kind/state.  The problem is that the system could have switched
between various states multiple times between two timer interrupts,
yet the counter is incremented only for the last state.


Example
-------

If we imagine a system with one task that periodically burns cycles
in the following manner:

 time line between two timer interrupts
|--------------------------------------|
 ^                                    ^
 |_ something begins working          |
                                      |_ something goes to sleep
                                     (only to be awakened quite soon)

In the above situation the system will be 0% loaded according to
`/proc/stat' (since the timer interrupt will always happen when the
system is executing the idle handler), but in reality the load is
closer to 99%.

One can imagine many more situations where this behavior of the kernel
will lead to quite erratic information inside `/proc/stat'.
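
The effect can be reproduced with a toy model.  The sketch below is
not kernel code: the struct, the function name and the abstract time
units are all hypothetical.  It assumes a task that is busy during
[start, end) of every `cycle' time units, and a timer tick every
`tick' units that credits the whole tick to whatever state it
observes at that instant.

```c
/* Hypothetical result pair: load as seen by tick sampling versus
   the load actually generated by the task, both in percent. */
struct load {
     double sampled;
     double actual;
};

/* Simulate nticks timer ticks.  The task repeats every `cycle' units
   and is busy while its phase is in [start, end); a tick fires every
   `tick' units and is counted as busy only if the task happens to be
   running at the moment the tick fires. */
static struct load simulate (unsigned long tick, unsigned long cycle,
                             unsigned long start, unsigned long end,
                             unsigned long nticks)
{
     struct load r = { 0.0, 0.0 };
     unsigned long k, busy = 0;

     if (tick == 0 || cycle == 0 || nticks == 0
         || start > end || end > cycle)
          return r;

     for (k = 0; k < nticks; ++k) {
          unsigned long phase = (k * tick) % cycle;

          if (phase >= start && phase < end)
               ++busy;
     }

     r.sampled = 100.0 * (double) busy / (double) nticks;
     r.actual = 100.0 * (double) (end - start) / (double) cycle;
     return r;
}
```

With simulate (100, 100, 1, 100, 1000) the task is synchronized with
the tick and always starts just after it, so every tick observes the
idle state even though the task is busy for 99 of every 100 units.
Choosing a cycle that is not a multiple of the tick, such as 101,
lets the ticks land on different phases of the task, and the sampled
figure moves toward the actual one.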


/* gcc -o hog smallhog.c */
#include <time.h>
#include <limits.h>
#include <signal.h>
#include <sys/time.h>
#define HIST 10

/* Set by the SIGALRM handler to end the current busy loop. */
static volatile sig_atomic_t stop;

static void sighandler (int signr)
{
     (void) signr;
     stop = 1;
}

/* Spin for at most niters iterations or until SIGALRM arrives;
   return the number of iterations that were left. */
static unsigned long hog (unsigned long niters)
{
     stop = 0;
     while (!stop && --niters);
     return niters;
}

int main (void)
{
     int i;
     /* Periodic real-time timer with a nominal period of 1 usec. */
     struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
                             .it_value = { .tv_sec = 0, .tv_usec = 1 } };
     sigset_t set;
     unsigned long v[HIST];
     double tmp = 0.0;
     unsigned long n;
     signal (SIGALRM, &sighandler);
     setitimer (ITIMER_REAL, &it, NULL);

     /* Calibrate: discard the first run, then average over HIST runs
        the number of iterations that fit between two SIGALRMs. */
     hog (ULONG_MAX);
     for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
     for (i = 0; i < HIST; ++i) tmp += v[i];
     tmp /= HIST;
     /* Burn about two thirds of one timer period per iteration. */
     n = tmp - (tmp / 3.0);

     sigemptyset (&set);
     sigaddset (&set, SIGALRM);

     /* Burn cycles, then sleep until the next SIGALRM. */
     for (;;) {
         hog (n);
         sigwait (&set, &i);
     }
     return 0;
}


References
----------

http://lkml.org/lkml/2007/2/12/6
Documentation/filesystems/proc.txt (1.8)


Thanks
------

Con Kolivas, Pavel Machek