On 27 Aug 2007, Mike Fleetwood wrote:
> On 27 Aug 2007, Mike Fleetwood wrote:
> > Hi,
> >
> > Since I upgraded my FC6 box from kernel-2.6.20-1.2962.fc6 to
> > kernel-2.6.22.1-32.fc6, and now 2.6.22.2-42.fc6, I am getting pauses
> > from the whole OS.  They last ~1 second and occur every few minutes.
> > Every application becomes unresponsive for the duration.  The 1 second
> > scheduler latency this causes is long enough for my music player to be
> > affected and the audio track to be interrupted.  This makes the fault
> > very easy to hear.  At the same time kernel thread events/0 seems to
> > use all the CPU time.  Here are the first few lines of top's output
> > when a pause happens:
> >
> > top - 21:39:41 up 1:17, 2 users, load average: 1.05, 1.20, 1.24
> > Tasks: 138 total, 4 running, 134 sleeping, 0 stopped, 0 zombie
> > Cpu(s): 2.3%us, 29.3%sy, 68.4%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Mem:  2074880k total, 1355244k used,  719636k free,  62144k buffers
> > Swap: 1004052k total,       0k used, 1004052k free, 882684k cached
> >
> >   PID USER  PR  NI  VIRT   RES   SHR S %CPU %MEM    TIME+  COMMAND
> >  5525 mike  39  19  202m   81m  4016 R 68.4  4.0 13:07.46 hadcm3transum_5
> >     6 root  15  -5     0     0     0 S 28.5  0.0  0:53.40 events/0
> >  3619 mike  20   0 45620  8264  5856 S  0.9  0.4  0:18.58 xmms
> >  3318 root  20   0  328m   46m  8132 S  0.6  2.3  1:39.60 Xorg
> >  3524 mike  20   0 76728   23m  9640 S  0.6  1.2  1:13.15 bittorrent
> >  3557 mike  20   0  211m  115m   30m S  0.6  5.7  3:54.24 firefox-bin
> >  3489 mike  20   0 57240   23m   15m S  0.3  1.1  0:04.02 gnome-terminal
> >  5583 root  20   0  2204  1100   832 R  0.3  0.1  0:01.14 top
> >
> > 28.5% CPU time used by events/0 during a single 3 second top refresh
> > is 0.85 seconds of CPU time.  Rebooting back to kernel 2.6.20
> > completely fixes it.
> >
> > Has anyone else seen this issue?
> > Does anyone know what kernel thread events/0 does?
> > Could this be related to CFS newly introduced into Fedora's kernel 2.6.22?
> > Can anybody suggest how to fix this issue?
>
> Thanks for the replies so far.
>
> 1) Seems no one else has seen this issue.
> 2) Kernel has one events thread per CPU, numbered 0 upwards.  They are
> used to get kernel work done a little later, for example a device
> driver interrupt handler might use one.  Ref:
> http://docs.blackfin.uclinux.org/doku.php?id=kernel_events
> 3) I have also just compiled Linus' kernel 2.6.22.2 and get the same
> OS wide 1 second pauses.  Therefore it's not a Fedora specific patch,
> such as CFS, causing the fault.
> 4) Now off to try a binary search of Linus' kernel releases between
> 2.6.20 and 2.6.22.2 to see when it got introduced.
>
> Still could be a kernel software fault, but as it seems no one else
> has this issue perhaps it is buggy hardware the kernel no longer
> handles as well.  (I don't think that I run any unusual software or
> uncommon hardware).  Could be a while before I finish binary searching
> kernel releases.

After lots of kernel compiling and testing I finally tracked down the
causes ...

1) Between Fedora kernels 2.6.20-1.2962.fc6 and 2.6.22.1-32.fc6 this
configuration change was the one which triggered the ~1 second OS
pauses to appear:

diff -y /boot/config-2.6.20-1.2962.fc6 /boot/config-2.6.22.1-22.fc6
...
CONFIG_RTC=y                    | # CONFIG_RTC is not set
                                > CONFIG_GEN_RTC=y
                                > CONFIG_GEN_RTC_X=y
...

2) I also run chrony (http://chrony.sunsite.dk/) rather than ntp to
time synchronise my machine.  (It monitors and adjusts the hardware
RTC as well as the kernel software clock, hence accesses /dev/rtc).
Running chrony is probably very rare, which would explain why no one
else has seen this issue.

I still want to understand what the above kernel configuration changes
actually do and which RTC related OS calls chrony is making to cause
the events thread to hog the CPU solidly for ~1 second.

Mike
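
For anyone wanting to repeat the config comparison in point 1 without
reading through the full diff -y output, the following is a minimal
sketch in Python.  It is not something used in this thread: the
function names parse_config and changed_options, and the default
filter string "RTC", are made up for illustration.  It assumes the
usual kernel config layout of "CONFIG_X=value" lines and
"# CONFIG_X is not set" comments, and treats the latter as the value
"n".

```python
import re


def parse_config(text):
    """Parse kernel config text into {option: value}.

    Lines of the form '# CONFIG_X is not set' are recorded as 'n'.
    Anything else that is not 'CONFIG_X=value' is ignored.
    """
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"# (CONFIG_\w+) is not set$", line)
        if m:
            opts[m.group(1)] = "n"
            continue
        m = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def changed_options(old_text, new_text, pattern="RTC"):
    """Return {option: (old_value, new_value)} for options whose name
    contains `pattern` and whose value differs between the two configs.
    An option missing from one side is reported as None on that side.
    """
    old = parse_config(old_text)
    new = parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if pattern in name and old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes
```

Fed the contents of the two files under /boot, this should report
CONFIG_RTC going from "y" to "n" and the two CONFIG_GEN_RTC options
appearing, which is the same change the diff -y excerpt shows.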