========
CPU load
========

Linux exports various bits of information via ``/proc/stat`` and
``/proc/uptime`` that userland tools, such as top(1), use to calculate
the average time the system spent in a particular state, for example::

    $ iostat
    Linux 2.6.18.3-exp (linmac)     02/20/2007

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              10.01    0.00    2.92    5.44    0.00   81.63

    ...

Here the system reports that, over the default sampling period, it
spent 10.01% of the time doing work in user space and 2.92% in the
kernel, and was idle 81.63% of the time overall.

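As a rough illustration of how a tool might arrive at such percentages,
the sketch below parses the aggregate ``cpu`` line of ``/proc/stat`` and
turns the difference between two samples into a percentage. The names
``cpu_times``, ``parse_cpu_line``, ``total_ticks`` and ``pct`` are
hypothetical helpers, not part of any existing tool, and only the first
eight counters of the line are read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Cumulative counters from the aggregate "cpu" line of /proc/stat. */
struct cpu_times {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse the aggregate line only; returns 0 on success, -1 otherwise. */
static int parse_cpu_line(const char *line, struct cpu_times *t)
{
    assert(line != NULL && t != NULL);

    /* Reject per-CPU lines such as "cpu0 ...". */
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;

    return sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                  &t->user, &t->nice, &t->system, &t->idle,
                  &t->iowait, &t->irq, &t->softirq, &t->steal) == 8 ? 0 : -1;
}

/* Sum of all counters read above. */
static unsigned long long total_ticks(const struct cpu_times *t)
{
    return t->user + t->nice + t->system + t->idle +
           t->iowait + t->irq + t->softirq + t->steal;
}

/* Share of the sampling interval spent in one state, in percent. */
static double pct(unsigned long long state_delta,
                  unsigned long long total_delta)
{
    return total_delta ? 100.0 * state_delta / total_delta : 0.0;
}
```

A caller would read the line twice, some interval apart, and pass the
per-state difference and the ``total_ticks()`` difference to ``pct()``.
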
In most cases the ``/proc/stat`` information reflects reality quite
closely. However, because of how and when the kernel collects this
data, it sometimes cannot be trusted at all.

So how is this information collected? Whenever a timer interrupt is
signalled, the kernel looks at what kind of task was running at that
moment and increments the counter that corresponds to this task's
kind/state. The problem is that the system could have switched between
various states multiple times between two timer interrupts, yet the
counter is incremented only for the last state.


Example
-------

If we imagine a system with one task that periodically burns cycles
in the following manner::

     time line between two timer interrupts
    |--------------------------------------|
     ^                                    ^
     |_ something begins working          |
                                          |_ something goes to sleep
                                         (only to be awakened quite soon)

In the above situation the system will be 0% loaded according to
``/proc/stat`` (since the timer interrupt will always happen when the
system is executing the idle handler), but in reality the load is
closer to 99%.

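The gap between the two figures can be reproduced with a small model of
the time line above. This is only a sketch under simplifying
assumptions: the task repeats the same pattern in every tick period,
its busy window never crosses a tick, and ``tick_model``, ``busy_at``,
``sampled_load_pct`` and ``real_load_pct`` are hypothetical names used
for illustration, not kernel interfaces.

```c
#include <assert.h>

/* Hypothetical model; all times are in microseconds. */
struct tick_model {
    unsigned int tick_us;   /* distance between two timer interrupts */
    unsigned int start_us;  /* offset after a tick at which work begins */
    unsigned int run_us;    /* how long the task works before sleeping */
};

/* Nonzero if the task is running at absolute time t_us. */
static int busy_at(const struct tick_model *m, unsigned long long t_us)
{
    unsigned int off = (unsigned int)(t_us % m->tick_us);

    return off >= m->start_us && off < m->start_us + m->run_us;
}

/* Load as tick-based accounting sees it: one sample per interrupt. */
static double sampled_load_pct(const struct tick_model *m,
                               unsigned int nticks)
{
    unsigned int k, busy = 0;

    assert(m->tick_us > 0 && nticks > 0);
    /* The busy window must fit inside one tick period. */
    assert(m->start_us + m->run_us <= m->tick_us);

    for (k = 0; k < nticks; k++)
        busy += busy_at(m, (unsigned long long)k * m->tick_us);

    return 100.0 * busy / nticks;
}

/* Load as it really is: the busy share of every tick period. */
static double real_load_pct(const struct tick_model *m)
{
    return 100.0 * m->run_us / m->tick_us;
}
```

With an assumed 1000 us tick, ``start_us = 5`` and ``run_us = 990``,
``sampled_load_pct()`` yields 0 while ``real_load_pct()`` yields 99.
Moving the same kind of window to ``start_us = 0`` with ``run_us = 500``
makes the sampled figure 100 although the real load is 50.
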
One can imagine many more situations where this behavior of the kernel
will lead to quite erratic information inside ``/proc/stat``. The
program below generates such a load::


    /* gcc -o hog smallhog.c */
    #include <time.h>
    #include <limits.h>
    #include <signal.h>
    #include <sys/time.h>
    #define HIST 10

    /* Set by the SIGALRM handler to end the current busy loop. */
    static volatile sig_atomic_t stop;

    static void sighandler(int signr)
    {
        (void) signr;
        stop = 1;
    }

    /* Spin until niters runs out or SIGALRM fires; return what is left. */
    static unsigned long hog(unsigned long niters)
    {
        stop = 0;
        while (!stop && --niters);
        return niters;
    }

    int main(void)
    {
        int i;
        /* Periodic SIGALRM with the smallest non-zero interval. */
        struct itimerval it = {
            .it_interval = { .tv_sec = 0, .tv_usec = 1 },
            .it_value    = { .tv_sec = 0, .tv_usec = 1 } };
        sigset_t set;
        unsigned long v[HIST];
        double tmp = 0.0;
        unsigned long n;

        signal(SIGALRM, &sighandler);
        setitimer(ITIMER_REAL, &it, NULL);

        /* The first call only synchronizes with the timer. */
        hog(ULONG_MAX);
        /* Calibrate: average the iterations that fit between two signals. */
        for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog(ULONG_MAX);
        for (i = 0; i < HIST; ++i) tmp += v[i];
        tmp /= HIST;
        /* Burn about two thirds of each interval. */
        n = tmp - (tmp / 3.0);

        sigemptyset(&set);
        sigaddset(&set, SIGALRM);

        /* Work, then sleep until the next SIGALRM. */
        for (;;) {
            hog(n);
            sigwait(&set, &i);
        }
        return 0;
    }


References
----------

- https://lore.kernel.org/r/loom.20070212T063225-663@post.gmane.org
- Documentation/filesystems/proc.rst (1.8)


Thanks
------

Con Kolivas, Pavel Machek