Re: NTP broken with 2.6.14

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



john stultz a écrit :

talla:~# uname -a
Linux talla 2.6.14-1 #2 PREEMPT Thu Nov 3 00:54:44 CET 2005 i686 GNU/Linux
talla:~# ntpdate -uq 10.0.0.1
server 10.0.0.1, stratum 3, offset -14.893095, delay 0.02644
3 Nov 01:31:59 ntpdate[8186]: step time server 10.0.0.1 offset -14.893095 sec


Hmm. Ok, so network wise you seem to be communicating with the server
without an issue. The other reasons for a reject condition are sync-loop
(your NTP server isn't synced to your box I'd assume), or your host is
drifting too severely from the NTP server for ntpd to compensate.

Attached is a cruddy python script I wrote that should provide you with
your ppm drift from your server.
To run:
o Disable ntpd
o Run "./drift-test.py <server>"
o Let it run for 10 minutes to get a decent drift value.

Ok. First I purged (remove the config files and binary) the adjtimex installation. Then I rebooted with the old 2.6.8 kernel and watch the first 5 polls of ntpd:

ntpq> pe
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.0.0.1 129.132.2.21 3 u 2 64 37 1.082 -45.761 24.670
ntpq> rv 54820
assID=54820 status=9614 reach, conf, sel_sys.peer, 1 event, event_reach,
srcadr=10.0.0.1, srcport=123, dstadr=10.0.33.10, dstport=123, leap=00,
stratum=3, precision=-17, rootdelay=35.065, rootdispersion=46.066,
refid=129.132.2.21, reach=037, unreach=0, hmode=3, pmode=4, hpoll=6,
ppoll=6, flash=00 ok, keyid=0, ttl=0, offset=-45.761, delay=1.082,
dispersion=438.994, jitter=24.670,
reftime=c713e731.c00053e2  Thu, Nov  3 2005  2:32:33.750,
org=c713e7b0.6cbe0ded  Thu, Nov  3 2005  2:34:40.424,
rec=c713e7b0.7bdb5d89  Thu, Nov  3 2005  2:34:40.483,
xmt=c713e7b0.7b72606f  Thu, Nov  3 2005  2:34:40.482,
filtdelay=     1.13    1.08    1.17    1.13    1.14    0.00    0.00    0.00,
filtoffset=  -58.48  -45.76  -32.77  -20.31   -7.60    0.00    0.00    0.00,
filtdisp=      0.01    0.96    1.92    2.86    3.82 16000.0 16000.0 16000.0

So with 2.6.8 this machine have a working ntpd. Now I stopped ntpd and used your script with the server 10.0.0.1:

03 Nov 02:36:32         offset: -6.9e-05        drift: -77.0 ppm
03 Nov 02:37:32         offset: -0.005162       drift: -84.7540983607 ppm
03 Nov 02:38:32         offset: -0.011573       drift: -95.7107438017 ppm
03 Nov 02:39:32         offset: -0.019045       drift: -105.26519337 ppm
03 Nov 02:40:32         offset: -0.02732        drift: -113.394190871 ppm
03 Nov 02:41:32         offset: -0.036287       drift: -120.581395349 ppm
03 Nov 02:42:32         offset: -0.045824       drift: -126.958448753 ppm
03 Nov 02:43:32         offset: -0.055755       drift: -132.45368171 ppm
03 Nov 02:44:33         offset: -0.065992       drift: -136.929460581 ppm
03 Nov 02:45:33         offset: -0.076472       drift: -141.10701107 ppm
03 Nov 02:46:33         offset: -0.087156       drift: -144.790697674 ppm

After, I rebooted the machine with the 2.6.14 kernel and watched the first 5 polls of ntpd:

ntpq> pe
remote refid st t when poll reach delay offset jitter
==============================================================================
10.0.0.1 129.132.2.21 3 u 2 64 37 1.106 -6989.1 3351.11
ntpq> rv 51860
assID=51860 status=9014 reach, conf, 1 event, event_reach,
srcadr=10.0.0.1, srcport=123, dstadr=10.0.33.10, dstport=123, leap=00,
stratum=3, precision=-17, rootdelay=36.804, rootdispersion=52.307,
refid=129.132.2.21, reach=037, unreach=0, hmode=3, pmode=4, hpoll=6,
ppoll=6, flash=00 ok, keyid=0, ttl=0, offset=-6989.157, delay=1.106,
dispersion=438.355, jitter=3351.111,
reftime=c713eb31.d4be1eb4  Thu, Nov  3 2005  2:49:37.831,
org=c713ec29.241b64e0  Thu, Nov  3 2005  2:53:45.141,
rec=c713ec30.21790752  Thu, Nov  3 2005  2:53:52.130,
xmt=c713ec30.21121251  Thu, Nov  3 2005  2:53:52.129,
filtdelay=     1.11    1.14    1.16    1.16    1.22    0.00    0.00    0.00,
filtoffset= -6989.1 -6239.9 -5482.4 -4133.2 -1164.0    0.00    0.00    0.00,
filtdisp=      0.01    0.97    1.95    2.91    3.88 16000.0 16000.0 16000.0

As before, the ntpd is not working properly with the 2.6.14 kernel. Now I stopped ntpd and used your script with the 10.0.0.1 server:

03 Nov 02:54:56         offset: -0.008247       drift: -4236.0 ppm
03 Nov 02:55:56         offset: -0.828716       drift: -13519.7540984 ppm
03 Nov 02:56:57         offset: -1.593172       drift: -13025.9098361 ppm
03 Nov 02:57:57         offset: -2.817531       drift: -15458.9010989 ppm
03 Nov 02:58:57         offset: -3.442019       drift: -14206.6446281 ppm
03 Nov 02:59:57         offset: -4.070492       drift: -13465.1688742 ppm
03 Nov 03:00:57         offset: -4.658962       drift: -12858.980663 ppm
03 Nov 03:01:57         offset: -5.267374       drift: -12472.4241706 ppm
03 Nov 03:02:57         offset: -5.843858       drift: -12115.8651452 ppm
03 Nov 03:03:57         offset: -7.052287       drift: -13004.199262 ppm
03 Nov 03:04:58         offset: -7.564786       drift: -12538.5986733 ppm

Interresting! So if I understand correctly the ntpd problem is because the kernel 2.6.14 kernel show a drift about 100 time bigger than with the kernel 2.6.8 on the same hardware. For information the mainboard is the MSI K7N2 (nForce 2). Here is the cpuinfo in case that matter:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm)
stepping        : 0
cpu MHz         : 2004.860
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 4013.69


citron:~# uname -a
Linux citron 2.6.12-nfs-1 #1 Fri Jun 24 18:23:39 CEST 2005 i686 GNU/Linux
citron:~# ntpdate -uq 10.0.0.1
server 10.0.0.1, stratum 3, offset 0.003676, delay 0.02647
3 Nov 01:32:06 ntpdate[13476]: adjust time server 10.0.0.1 offset 0.003676 sec
citron:~# ntpdate -uq 129.132.2.21
server 129.132.2.21, stratum 2, offset -0.010485, delay 0.04341
3 Nov 01:32:11 ntpdate[13477]: adjust time server 129.132.2.21 offset -0.010485 sec

So this could to be something after the 2.6.12. All machines run the same version of ntpd and use the same configuration file.


Would you mind confirming 2.6.12 does not have the issue on the same
hardware?

The kernel 2.6.12 run on a different hardware and is not configured to work on the hardware that have the problem with 2.6.14, so I can't confime exactly your question yet. If you don't have any better idea, I can try several kernel version to find when the problem start. But I will make that after I sleep because I you look at the time field into the test result this is very late now for me...

Thanks for you support, I hope we will quicky find to solution.
--
Jean-Christian de Rivaz
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux