Re: [PATCH] i386: fix TSC clock source calibration error

* Dave Johnson <[email protected]> wrote:

> I ran into this problem on a system that was unable to obtain NTP sync 
> because the clock was running very slow (over 10000ppm slow). ntpd had 
> declared all of its peers 'reject' with 'peer_dist' reason.
> 
> On investigation, the tsc_khz variable was significantly incorrect,
> causing xtime to run slow.  After a reboot tsc_khz was correct, so I
> ran a reboot test to see how often the problem occurred:
> 
> Test was done on a 2000 MHz Xeon system.  Of 689 reboots, 8 of them
> had unacceptable tsc_khz values (>500ppm):
> 
>  range of tsc_khz  # of boots  % of boots
> -----------------  ----------  ----------
>         < 1999750           0      0.000%
> 1999750 - 1999800          21      3.048%
> 1999800 - 1999850         166     24.128%
> 1999850 - 1999900         241     35.029%
> 1999900 - 1999950         211     30.669%
> 1999950 - 2000000          42      6.105%
> 2000000 - 2000050           0      0.000%
> 2000050 - 2000100           0      0.000%
>                    [...]
> 2000100 - 2015000           1      0.145%  << BAD
> 2015000 - 2030000           6      0.872%  << BAD
> 2030000 - 2045000           1      0.145%  << BAD
> 2045000 <                   0      0.000%
> 
> The worst boot was 2032.577 MHz, over 1.5% off!

you are plain crazy, 689 reboots! :-)

> It appears that on rare occasions, mach_countup() is taking longer to 
> complete than necessary.
> 
> I suspect that this is caused by the CPU taking a periodic SMI
> right at the end of the 30ms calibration loop.  This would cause
> the loop to be delayed while the BIOS SMI handler runs.  The resulting
> TSC delta is larger than it should be, resulting in a higher
> tsc_khz.
> 
> The patch below makes native_calculate_cpu_khz() take the best
> (shortest duration, lowest khz) run of its 3 calibration loops.  If an
> SMI goes off and causes a bad result (long duration, higher khz), that
> run will be discarded.
> 
> With the patch applied, 300 boots of the same system produce good
> results:
> 
>  range of tsc_khz  # of boots  % of boots
> -----------------  ----------  ----------
>         < 1999750           0      0.000%
> 1999750 - 1999800          30     10.000%
> 1999800 - 1999850         166     55.333%
> 1999850 - 1999900          89     29.667%
> 1999900 - 1999950          15      5.000%
> 1999950 <                   0      0.000%
> 
> Problem was found and tested against 2.6.18.  Patch is against 2.6.22.

very cool problem description and debugging, and a very nice patch! 
We've added your fix to the x86 tree, will go to Linus in the next batch 
of fixes. This patch is a stable kernel candidate as well.

	Ingo