Robert wrote: ....
+time.uswo.net   198.82.1.201     3 u   22   64   77   60.548  -1778.9 316.669
*0x50a13f43.boan 192.36.134.25    2 u   26   64   77  140.391  -1947.8 394.214
+blade.avnf.com  212.82.32.15     2 u   25   64   77   55.640  -1768.1 297.106
 LOCAL(0)        LOCAL(0)        10 l   26   64   77    0.000    0.000   0.002
ntpq>
[root@clem ~]# uname -a
Linux clem 2.6.9-1.667 #1 Tue Nov 2 14:41:31 EST 2004 i586 i586 i386 GNU/Linux
[root@clem ~]# ntpq
ntpq> pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+zoiedog.com     131.107.1.10     2 u  391 1024  377  114.447  -24.679   1.572
*171.Red-80-36-1 130.206.3.166    2 u  425 1024  377  211.861  -33.906   1.004
+ns1.pulsation.f 194.2.0.28       3 u  399 1024  377  145.030  -36.732   3.050
 LOCAL(0)        LOCAL(0)        10 l   48   64  377    0.000    0.000   0.002
ntpq>
The first puzzling thing is the numbers under "reach". This is an 8-bit field, printed in octal (base 8), showing whether or not a response was received to each of the last 8 polls. Now if you left for coffee, and then came back, we have:
clem: 377 = 11111111, all of the last 8 polls were responded to
mavis: 077 = 00111111, only the most recent 6 polls received an answer (or only 6 polls were issued)
With the shorter polling interval on mavis, it should have attempted at least as many polls as clem, unless something is drastically wrong (interrupts stuck, CPU maxed out, etc.).
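To make the octal-to-bits reading above concrete, here is a minimal Python sketch. The helper names `decode_reach` and `answered_polls` are made up for illustration and are not part of ntpq; the only inputs are the reach values shown in the two peer listings.

```python
def decode_reach(reach_octal: str) -> str:
    """Expand the octal reach value into 8 bits; the rightmost bit is the newest poll."""
    value = int(reach_octal, 8)          # ntpq prints the register in base 8
    return format(value & 0xFF, "08b")   # keep 8 bits, pad with leading zeros


def answered_polls(reach_octal: str) -> int:
    """Count how many of the last 8 polls received a reply."""
    return decode_reach(reach_octal).count("1")


# Values taken from the "reach" column of the two listings.
for host, reach in [("clem", "377"), ("mavis", "77")]:
    print(host, reach, decode_reach(reach), answered_polls(reach))
```

A zero bit anywhere to the right of a one bit would mean a poll went unanswered after the association was already reachable, which is different from the leading zeros of a register that is still filling up.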
It would be instructive on mavis to run ntpq and issue the "assoc" command. This lists the servers in the same order as the "peer" command. Then use "pstat associd", replacing "associd" with the particular number shown under "assocID" in the output of assoc.
Most instructive would be the last three lines, which include the delay and offset for the most recent 8 polls for the particular server being queried.
My first guesses would be:
-- a connectivity problem to the servers, causing either widely varying delay or an asymmetric delay (consistently different delay for the query and the response)
-- a badly implemented clock, e.g. one driven by an interrupt that is getting disabled for so long that interrupts are missed, or a processor that is being "speed adjusted" based on load, or some such.
Thanks for your reply!
The numbers in the "reach" column can be explained. On mavis, ntpd apparently throws up its hands, showing jitter of 4000 for all servers for one 64-second cycle, then starts over with reach=1...3...7...17, etc.
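The 1, 3, 7, 17 sequence is what an 8-bit shift register printed in octal produces when every poll is answered. A small Python sketch, assuming the register is cleared to 0 when ntpd starts over (`reach_after` is a hypothetical name, not anything in ntpd):

```python
def reach_after(replies):
    """Yield the octal reach value after each poll; True means a reply arrived."""
    reach = 0  # assumption: the register starts empty after ntpd starts over
    for got_reply in replies:
        # Shift in the newest result on the right and keep only 8 bits.
        reach = ((reach << 1) | int(got_reply)) & 0xFF
        yield format(reach, "o")


# Eight consecutive answered polls.
print(list(reach_after([True] * 8)))
```

Under that assumption, a reach of 77 simply means six polls have been answered since the last restart of the cycle, and 177 means seven.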
I changed the conditions this morning but did not spoil the bug. On mavis, I removed the ntp-4.2.0.a.20040617-4 that came with FC3 and installed ntp-4.1.2-5.i386.rpm from my FC1 CDs. Then I copied the ntp.conf from my FC1 backup to the running system. The problem still exists.
This pstat was just taken:
ntpq> pstat 32409
status=9484 reach, conf, sel_candidat, 8 events, event_reach,
srcadr=now.cis.okstate.edu, srcport=123, dstadr=192.168.1.8, dstport=123,
leap=00, stratum=1, precision=-18, rootdelay=0.000,
rootdispersion=0.427, refid=PSC, reach=177, unreach=0, hmode=3, pmode=4,
hpoll=6, ppoll=6, flash=00 ok, keyid=0, offset=-633.919, delay=22.937,
dispersion=63.426, jitter=142.579,
reftime=c54337ef.1e45a1ca Mon, Nov 15 2004 8:13:03.118,
org=c54337fc.97024f65 Mon, Nov 15 2004 8:13:16.589,
rec=c54337fd.3c3a647f Mon, Nov 15 2004 8:13:17.235,
xmt=c54337fd.364d6e47 Mon, Nov 15 2004 8:13:17.212,
filtdelay= 22.94 22.98 22.11 22.50 22.85 22.81 23.81 0.00,
filtoffset= -633.92 -491.34 -390.64 -306.91 -195.18 -65.21 -16.16 0.00,
filtdisp= 0.01 1.00 1.96 2.92 3.88 4.86 5.82 16000.0
ntpq>
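The filtoffset line in the pstat output gives a rough idea of how fast the clock is running away. The Python sketch below rests on two assumptions: the samples are listed newest first, and they are spaced one poll interval apart (hpoll=6, i.e. 2^6 = 64 seconds). The eighth slot (0.00 with filtdisp 16000.0) is treated as empty and left out. `drift_ppm` is a hypothetical helper name.

```python
# filtoffset samples in milliseconds, copied from the pstat output (empty 8th slot dropped).
filtoffset_ms = [-633.92, -491.34, -390.64, -306.91, -195.18, -65.21, -16.16]
POLL_SECONDS = 2 ** 6  # hpoll=6


def drift_ppm(offsets_newest_first, poll_seconds):
    """Average rate of change of the offset between the oldest and newest sample."""
    newest = offsets_newest_first[0]
    oldest = offsets_newest_first[-1]
    elapsed_s = (len(offsets_newest_first) - 1) * poll_seconds
    # Offsets are in ms, so ms per second times 1000 gives parts per million.
    return (newest - oldest) / elapsed_s * 1000.0


print(round(drift_ppm(filtoffset_ms, POLL_SECONDS)), "ppm")
```

With these numbers the estimate is on the order of -1600 ppm, far beyond the 500 ppm that ntpd can correct by adjusting the clock frequency, so stepping the clock over and over is about all it can do.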
This is what the silly thing has done since I installed the older version of ntp this morning, which pretty much tells me that the second of your first guesses is homing in on the problem and that I should quit beating up on ntpd and start looking for another kernel.
[root@mavis ~]# grep ntpd /var/log/messages | tail -24
Nov 15 06:25:28 mavis ntpd[28082]: ntpd exiting on signal 15
Nov 15 06:25:29 mavis ntpd: ntpd shutdown succeeded
Nov 15 06:28:04 mavis ntpd: ntpd shutdown failed
Nov 15 06:31:40 mavis ntpd[18971]: ntpd 4.1.2@xxxxx Wed Oct 29 06:06:59 EST 2003 (1)
Nov 15 06:31:41 mavis ntpd: ntpd startup succeeded
Nov 15 06:31:41 mavis ntpd[18971]: precision = 9 usec
Nov 15 06:31:41 mavis ntpd[18971]: kernel time discipline status 0040
Nov 15 06:31:41 mavis ntpd[18971]: frequency initialized 0.000 from /var/lib/ntp/drift
Nov 15 06:35:55 mavis ntpd[18971]: time reset -4.487942 s
Nov 15 06:35:55 mavis ntpd[18971]: kernel time discipline status change 41
Nov 15 06:35:55 mavis ntpd[18971]: synchronisation lost
Nov 15 06:51:06 mavis ntpd[18971]: time reset -2.007400 s
Nov 15 06:51:06 mavis ntpd[18971]: kernel time discipline status change 1
Nov 15 06:51:06 mavis ntpd[18971]: synchronisation lost
Nov 15 07:06:22 mavis ntpd[18971]: time reset -1.899484 s
Nov 15 07:06:22 mavis ntpd[18971]: synchronisation lost
Nov 15 07:21:23 mavis ntpd[18971]: time reset -1.683434 s
Nov 15 07:21:23 mavis ntpd[18971]: synchronisation lost
Nov 15 07:36:27 mavis ntpd[18971]: time reset -1.750547 s
Nov 15 07:36:27 mavis ntpd[18971]: synchronisation lost
Nov 15 07:51:36 mavis ntpd[18971]: time reset -1.632804 s
Nov 15 07:51:36 mavis ntpd[18971]: synchronisation lost
Nov 15 08:06:46 mavis ntpd[18971]: time reset -2.003586 s
Nov 15 08:06:46 mavis ntpd[18971]: synchronisation lost
[root@mavis ~]#
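The same kind of estimate can be pulled out of the log itself. The Python sketch below takes the "time reset" lines above and divides each step by the time since the previous reset. It is only a rough figure: the timestamps come from the drifting clock itself, and the year is an assumption taken from the pstat timestamps. The names `parse_resets` and `drift_between_resets` are hypothetical.

```python
import re
from datetime import datetime

# "time reset" lines copied from the log above.
LOG = """\
Nov 15 06:35:55 mavis ntpd[18971]: time reset -4.487942 s
Nov 15 06:51:06 mavis ntpd[18971]: time reset -2.007400 s
Nov 15 07:06:22 mavis ntpd[18971]: time reset -1.899484 s
Nov 15 07:21:23 mavis ntpd[18971]: time reset -1.683434 s
Nov 15 07:36:27 mavis ntpd[18971]: time reset -1.750547 s
Nov 15 07:51:36 mavis ntpd[18971]: time reset -1.632804 s
Nov 15 08:06:46 mavis ntpd[18971]: time reset -2.003586 s
"""

RESET = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*time reset ([-+]?\d+\.\d+) s")


def parse_resets(text, year=2004):
    """Return (timestamp, step_in_seconds) for every 'time reset' line."""
    resets = []
    for line in text.splitlines():
        m = RESET.match(line)
        if m:
            # syslog omits the year, so it is supplied by the caller.
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            resets.append((when, float(m.group(2))))
    return resets


def drift_between_resets(resets):
    """Step size divided by the interval since the previous reset, in ppm."""
    rates = []
    for (t_prev, _), (t_next, step_s) in zip(resets, resets[1:]):
        interval_s = (t_next - t_prev).total_seconds()
        rates.append(step_s / interval_s * 1e6)
    return rates


rates = drift_between_resets(parse_resets(LOG))
for ppm in rates:
    print(f"{ppm:8.0f} ppm")
```

Every interval works out to roughly -1800 to -2200 ppm, which agrees with the filtoffset trend and points at the system clock rather than at ntpd or the servers.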