First, apologies for the long post! On 12/27/07, Dean S. Messing <deanm@xxxxxxxxxxxxx> wrote: > I'd be interested to see if anyone else on the list running > > F7_x86_64 and using the 2.6.23.8-34.fc7 kernel Thanks you Aaron Konstam and Jon Stanley for confirming that `hwclock --show' is working properly on your systems. Somehow, my Real Time Clock got mangled. I'm not sure how. I had been running some measurements with `adjtimex' and the output was going totally bonkers. That's when I noticed my `hwclock' problem. Upon shutting down, bringing up the BIOS, I saw that the RTC was nearly 10 hours off of UTC. So I reset it, rebooted, and now `hwclock --show' works correctly. I don't understand why I (and you Jon) get the "/dev/rtc does not have interrupt functions" but it _is_ getting the clock tick after all. Still, there is some real strangeness going on which I explain below. Let me preface this by saying that I am not a n00bie at NTP, adjusting kernel parameters related to time-keeping via `adjtimex', &c. I've worked with this stuff for several years. Jon, Aaron, I'd like to ask you both to run another test. (Thanks in advance.) As root, what does adjtimex --utc --compare=20 --interval=10 > /tmp/adj_file.txt give you? (It will take about 3.5 minutes for this to run) For the F7 machine on which I'm seeing the weird behaviour, I get: [root@medulla ~]# adjtimex --utc --compare=20 --interval=10 --- current --- -- suggested -- cmos time system-cmos error_ppm tick freq tick freq 1198880050 0.798082 1198880059 0.900194 10211.2 10000 10426576 1198880069 0.002171 -89802.3 10000 10426576 10899 4022989 1198880078 0.104114 10194.3 10000 10426576 9899 4246426 1198880087 0.206065 10195.1 10000 10426576 9899 4194864 1198880096 0.308030 10196.5 10000 10426576 9899 4102676 1198880105 0.409981 10195.1 10000 10426576 9899 4193301 1198880114 0.511942 10196.1 10000 10426576 9899 4129239 1198880123 0.612893 10095.1 10000 10426576 9900 4192826 1198880132 0.714856 10196.3 10000 10426576 9899 4116739 1198880141 0.816836 10198.0 10000 10426576 9899 4002676 1198880150 0.918770 10193.4 10000 10426576 9899 4305801 1198880160 0.020729 -89804.1 10000 10426576 10899 4141739 1198880169 0.122682 10195.3 10000 10426576 9899 4180801 1198880178 0.224641 10195.9 10000 10426576 9899 4141739 1198880187 0.326589 10194.8 10000 10426576 9899 4213614 1198880196 0.428548 10195.9 10000 10426576 9899 4141739 1198880205 0.530505 10195.7 10000 10426576 9899 4155801 1198880214 0.632459 10195.4 10000 10426576 9899 4174551 1198880223 0.734426 10196.7 10000 10426576 9899 4088614 The second column is in seconds and is the delta between the system clock and the RTC. On a well tuned machine with an accurate RTC and a system clock adjusted to run in sync with UTC (say, via NTP) this number should be small (less than 1 second) and nearly constant. The third column is the change in Parts per Million of the number in Column 2 from one line to the next. It tells you how much the RTC is deviating relative to the system clock. (100 ppm is 8.64 seconds per day.) As you can see, the system time appears to be advancing by almost exactly 0.1 seconds every 10 seconds relative to the RTC (cmos clock). Then, just as they are about to get out of phase by 1 second, something causes either the system clock to "jump back" by ~1 second or the RTC to jump forward. It is certainly _not_ the system time doing the jumping. In fact: [root@medulla ~]# ntpdate -q montpelier.ilan.caltech.edu server 192.12.19.20, stratum 1, offset -0.001485, delay 0.05589 28 Dec 14:23:03 ntpdate[29445]: adjust time server 192.12.19.20 offset -0.001485 sec which says that my system is about 1 ms ahead of "montpelier.ilan.caltech.edu" which is a highly accurate Statum 1 time server. This delta is nearly constant so my system clock is not jumping around. That leaves the RTC doing the jumping. But having an RTC that is runing nearly 10000 ppm slower than my system clock and which "jumps ahead" every 10 seconds or so seems absurd. Furthermore, when I halt the system, boot into the bios, and watch the RTC time evolve I see no jumps and it is certainly not running 1/10 second slower than wall time. That leaves two other possibilities: a bug in adjtimex or a bug in the kernel. That's where I am right now. Couple of other items: If I increase the --interval, above, to 20 seconds, I get (in part) [root@medulla ~]# adjtimex --utc --compare=20 --interval=20 --- current --- -- suggested -- cmos time system-cmos error_ppm tick freq tick freq 1198882092 0.313780 1198882111 0.415686 5095.3 10000 10426576 1198882130 0.516580 5044.7 10000 10426576 9951 942820 1198882149 0.617472 5044.6 10000 10426576 9951 950633 1198882168 0.718367 5044.8 10000 10426576 9951 939695 1198882187 0.820270 5095.1 10000 10426576 9950 4190951 1198882206 0.921165 5044.7 10000 10426576 9951 940476 1198882226 0.022069 -44954.8 10000 10426576 10451 910789 1198882245 0.122960 5044.6 10000 10426576 9951 952976 1198882264 0.224865 5095.2 10000 10426576 9950 4184701 The really odd thing is that the change in "system-cmos" from line to line is still about 0.1 seconds! If the RTC were really running slowly, it should be 0.2 seconds. In fact, no matter what I make the interval, the 0.1 seconds remains the same. So I vote for this being a bug in adjtimex (or the kernel). Just as a reference here's the output of the same commond on my FC6 laptop (whose sytem clock is also well sychronised with UTC): [root@axon ~]# adjtimex --utc --compare=20 --interval=20 --- current --- -- suggested -- cmos time system-cmos error_ppm tick freq tick freq 1198882771 0.000563 1198882791 0.001026 23.1 10000 -580512 1198882811 0.000995 -1.5 10000 -580512 9999 6072345 1198882831 0.001093 4.9 10000 -580512 9999 5652814 1198882851 0.000830 -13.2 10000 -580512 10000 282026 1198882871 0.001060 11.5 10000 -580512 9999 5220782 1198882891 0.000850 -10.5 10000 -580512 10000 104683 1198882911 0.001099 12.4 10000 -580512 9999 5158282 1198882931 0.001014 -4.3 10000 -580512 9999 6252033 1198882951 0.001111 4.8 10000 -580512 9999 5656720 As you can see, the drift between the RTC and the system is quite low, less than plus or minus 25ppm, and there's no wierd "jumping" going on. (Well there _is_ the "11 minute kernel correction" but that's another bedtime story.) I have no idea what's happening on my F7 system to cause the strange behaviour. Does anyone else? Dean