Fedora Users — Re: NTPD and FC3

John DeDourek wrote:

Robert wrote:
....
+time.uswo.net   198.82.1.201     3 u   22   64   77   60.548  -1778.9 316.669
*0x50a13f43.boan 192.36.134.25    2 u   26   64   77  140.391  -1947.8 394.214
+blade.avnf.com  212.82.32.15     2 u   25   64   77   55.640  -1768.1 297.106
 LOCAL(0)        LOCAL(0)        10 l   26   64   77    0.000    0.000   0.002
ntpq>

[root@clem ~]# uname -a
Linux clem 2.6.9-1.667 #1 Tue Nov 2 14:41:31 EST 2004 i586 i586 i386 GNU/Linux
[root@clem ~]# ntpq
ntpq> pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+zoiedog.com     131.107.1.10     2 u  391 1024  377  114.447  -24.679   1.572
*171.Red-80-36-1 130.206.3.166    2 u  425 1024  377  211.861  -33.906   1.004
+ns1.pulsation.f 194.2.0.28       3 u  399 1024  377  145.030  -36.732   3.050
 LOCAL(0)        LOCAL(0)        10 l   48   64  377    0.000    0.000   0.002
ntpq>
The first puzzling thing is the numbers under "reach".  This is an
8-bit field (displayed in base 8) showing whether or not a
response was received to each of the last 8 polls.  Now if you left for
coffee, and then came back, we have
   clem:  377 = 11111111  all of the last 8 polls responded to
   mavis: 077 = 00111111  only the most recent 6 polls received
                          an answer (or only 6 polls were issued)
With the shorter polling interval on mavis, it should have attempted
at least as many polls as clem, unless something is drastically
wrong (interrupts stuck, CPU maxed out, etc.)
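The reach arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name decode_reach is made up for illustration, and it assumes the usual reading of the register, with the most recent poll in the rightmost bit.

```python
def decode_reach(reach_octal: str) -> str:
    """Return the 8-bit reach register as a binary string.

    ntpq prints reach in octal. The rightmost bit is taken to be the
    most recent poll, the leftmost bit the oldest of the last eight.
    """
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError(f"reach must fit in 8 bits, got {reach_octal!r}")
    return format(value, "08b")


# Values taken from the two peer listings in this thread.
for host, reach in [("clem", "377"), ("mavis", "77")]:
    bits = decode_reach(reach)
    answered = bits.count("1")
    print(f"{host}: {reach:>3} = {bits}  ({answered} of the last 8 polls answered)")
```

A register that has just been cleared and is refilling would step through 1, 3, 7, 17, 37, 77, 177, 377 on successive answered polls, so a value like 77 can also mean "only six polls so far" rather than "two polls lost".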
It would be instructive on mavis to run ntpq and issue the "assoc"
command.  This shows servers in the same order as the "peers" command.
Then use "pstat associd", replacing "associd" with the peculiar number
under "assocID" in the output of assoc.
Most instructive would be the last three lines, which include the
delay and offset for the most recent 8 polls for the particular
server being queried.
My first guesses would be:
-- connectivity problem to the servers causing either widely varying
   delay, or an asymmetric delay (consistently different delay of
   query and response)
-- badly implemented clock, e.g. on an interrupt that is getting
   disabled sufficiently long that interrupts are missed, or a
   processor that is being "speed adjusted" based on load, or some
   such.


Thanks for your reply!

The numbers in the "reach" column can be explained. On mavis, ntpd apparently throws up its hands, showing jitter of 4000 for all servers for one 64-second cycle, then starts over with reach=1...3...7...17, etc.

I changed the conditions this morning but did not get rid of the bug. On mavis, I removed ntp-4.2.0.a.20040617-4 that came with FC3 and installed ntp-4.1.2-5.i386.rpm from my FC1 CDs. Then I copied the ntp.conf from my FC1 backup to the running system. The problem still exists.

This pstat was just taken:

ntpq> pstat 32409
status=9484 reach, conf, sel_candidat, 8 events, event_reach,
srcadr=now.cis.okstate.edu, srcport=123, dstadr=192.168.1.8,
dstport=123, leap=00, stratum=1, precision=-18, rootdelay=0.000,
rootdispersion=0.427, refid=PSC, reach=177, unreach=0, hmode=3, pmode=4,
hpoll=6, ppoll=6, flash=00 ok, keyid=0, offset=-633.919, delay=22.937,
dispersion=63.426, jitter=142.579,
reftime=c54337ef.1e45a1ca  Mon, Nov 15 2004  8:13:03.118,
org=c54337fc.97024f65  Mon, Nov 15 2004  8:13:16.589,
rec=c54337fd.3c3a647f  Mon, Nov 15 2004  8:13:17.235,
xmt=c54337fd.364d6e47  Mon, Nov 15 2004  8:13:17.212,
filtdelay=    22.94   22.98   22.11   22.50   22.85   22.81   23.81    0.00,
filtoffset= -633.92 -491.34 -390.64 -306.91 -195.18  -65.21  -16.16    0.00,
filtdisp=      0.01    1.00    1.96    2.92    3.88    4.86    5.82 16000.0
ntpq>
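As a rough cross-check of the filtoffset line, the sketch below estimates how fast the offset is moving. It rests on assumptions: the samples are read as newest first (the first entry matches the offset= value above), they are taken to be evenly spaced at 2**hpoll = 64 seconds, and the trailing 0.00 entry is dropped as an empty filter slot (its filtdisp is 16000.0). The helper name drift_ppm is hypothetical.

```python
# filtoffset values from the pstat output, in milliseconds, newest first.
# The final 0.00 slot is left out because it looks like an unfilled entry.
filtoffset_ms = [-633.92, -491.34, -390.64, -306.91, -195.18, -65.21, -16.16]

POLL_SECONDS = 2 ** 6  # hpoll=6 in the pstat output; even spacing is assumed


def drift_ppm(offsets_newest_first, poll_seconds):
    """Average rate of change of the offset, in parts per million."""
    newest = offsets_newest_first[0]
    oldest = offsets_newest_first[-1]
    elapsed_s = (len(offsets_newest_first) - 1) * poll_seconds
    change_s = (newest - oldest) / 1000.0  # milliseconds to seconds
    return change_s / elapsed_s * 1e6


rate = drift_ppm(filtoffset_ms, POLL_SECONDS)
print(f"offset is changing at roughly {rate:.0f} ppm")
print("within ntpd's 500 ppm frequency correction limit:", abs(rate) <= 500)
```

Under those assumptions the offset moves by about 100 ms per poll, which is well beyond what ntpd can correct by adjusting the clock frequency and would fit the clock or kernel explanation better than a network one.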

This is what the silly thing has done since I installed the older version of ntp this morning, which pretty much tells me that the second of your first guesses is homing in on the problem and that I should quit beating up on ntpd and start looking for another kernel.

[root@mavis ~]# grep ntpd /var/log/messages | tail -24
Nov 15 06:25:28 mavis ntpd[28082]: ntpd exiting on signal 15
Nov 15 06:25:29 mavis ntpd: ntpd shutdown succeeded
Nov 15 06:28:04 mavis ntpd: ntpd shutdown failed
Nov 15 06:31:40 mavis ntpd[18971]: ntpd 4.1.2@xxxxx Wed Oct 29 06:06:59 EST 2003 (1)
Nov 15 06:31:41 mavis ntpd: ntpd startup succeeded
Nov 15 06:31:41 mavis ntpd[18971]: precision = 9 usec
Nov 15 06:31:41 mavis ntpd[18971]: kernel time discipline status 0040
Nov 15 06:31:41 mavis ntpd[18971]: frequency initialized 0.000 from /var/lib/ntp/drift
Nov 15 06:35:55 mavis ntpd[18971]: time reset -4.487942 s
Nov 15 06:35:55 mavis ntpd[18971]: kernel time discipline status change 41
Nov 15 06:35:55 mavis ntpd[18971]: synchronisation lost
Nov 15 06:51:06 mavis ntpd[18971]: time reset -2.007400 s
Nov 15 06:51:06 mavis ntpd[18971]: kernel time discipline status change 1
Nov 15 06:51:06 mavis ntpd[18971]: synchronisation lost
Nov 15 07:06:22 mavis ntpd[18971]: time reset -1.899484 s
Nov 15 07:06:22 mavis ntpd[18971]: synchronisation lost
Nov 15 07:21:23 mavis ntpd[18971]: time reset -1.683434 s
Nov 15 07:21:23 mavis ntpd[18971]: synchronisation lost
Nov 15 07:36:27 mavis ntpd[18971]: time reset -1.750547 s
Nov 15 07:36:27 mavis ntpd[18971]: synchronisation lost
Nov 15 07:51:36 mavis ntpd[18971]: time reset -1.632804 s
Nov 15 07:51:36 mavis ntpd[18971]: synchronisation lost
Nov 15 08:06:46 mavis ntpd[18971]: time reset -2.003586 s
Nov 15 08:06:46 mavis ntpd[18971]: synchronisation lost
[root@mavis ~]#
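For anyone who wants to put a number on those resets, here is a small sketch that pulls the "time reset" lines out of the log and divides each step by the time since the previous one. The year is assumed to be 2004 (syslog lines do not carry it), the month names are assumed to parse in an English locale, and the result ignores whatever slewing ntpd did between steps, so treat it as a rough figure only. The names LOG, PATTERN and parse_resets are illustrative.

```python
import re
from datetime import datetime

# The "time reset" lines from the log above, copied verbatim.
LOG = """\
Nov 15 06:35:55 mavis ntpd[18971]: time reset -4.487942 s
Nov 15 06:51:06 mavis ntpd[18971]: time reset -2.007400 s
Nov 15 07:06:22 mavis ntpd[18971]: time reset -1.899484 s
Nov 15 07:21:23 mavis ntpd[18971]: time reset -1.683434 s
Nov 15 07:36:27 mavis ntpd[18971]: time reset -1.750547 s
Nov 15 07:51:36 mavis ntpd[18971]: time reset -1.632804 s
Nov 15 08:06:46 mavis ntpd[18971]: time reset -2.003586 s
"""

PATTERN = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*time reset (-?\d+\.\d+) s$")


def parse_resets(text, year=2004):
    """Return a list of (timestamp, step_seconds) for each reset line."""
    resets = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            resets.append((stamp, float(match.group(2))))
    return resets


resets = parse_resets(LOG)
for (prev_time, _), (when, step) in zip(resets, resets[1:]):
    gap_s = (when - prev_time).total_seconds()
    rate_ppm = step / gap_s * 1e6
    print(f"{when:%H:%M:%S}  step {step:+.3f} s after {gap_s:.0f} s  ~ {rate_ppm:+.0f} ppm")
```

The steps arrive about every 15 minutes and each one is between roughly 1.6 and 2.0 seconds, so the implied rate stays far outside ntpd's 500 ppm correction limit for the whole morning.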