False "lost ticks" on dual-Opteron system (=> timer twice as fast)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I've recently set up a dual Opteron RAID server (AMD-8000-based Tyan 
Thunder K8S Pro SCSI board, 2 246 Opterons, stepping 10). Kernel is a 
modified 2.6.11.4-20a from SuSE 9.3 (SMP version, sure). The Opterons 
are capable of changing the CPU frequency (between 1GHz and 2GHz).

The system clock runs (on average) about twice as fast as it should be. 
A closer observation revealed that the clock jumps forward by about 
10-30 seconds every 10-30 seconds (plus other oddities, including 
backward clock jumps). The timer interrupts are distributed roughly 
evenly among the two CPUs, but looking at the timer interrupt number 
(grep timer /proc/interrupts) revealed that for about 10-30 seconds, 
one CPU gets the interrupt, and then the other CPU gets them; the 
transition causes the system clock to advance.

A quick look at timer_interrupt shows what I suspect is the culprit: 
Each CPU keeps track of the last TSC at a timer interrupt, and adds the 
"lost" ticks to jiffies when perceived necessary. If there's only a 
single jiffies, but two vxtime.last_tsc, it can't work.

A quick workaround would be to ditch the handling of the "lost" jiffies. 
I still anticipate to have annoying time skews by do_gettimeoffset() 
(that's what explains the other oddities - if I do gettimeofday() on 
the CPU that isn't getting interrupts, I'll going to add the "lost" 
jiffies, too). A proposed fix would be to *also* store the last jiffies 
value in the vxtime variable, and verify if it's really *this* CPU that 
did miss the timer interrupts. This local "last-stored-jiffies" can 
help do_gettimeoffset() to calculate the local time good enough on both 
CPUs.

What I can't believe is that I'm the only one who has this problem.

<rant>I know the timer system on an Intel or AMD system is broken by 
design, because there should be a single constant-clocked atomically 
read-only system-wide timer. But this is no excuse for that ;-).</rant>

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Attachment: pgp01yNFVepad.pgp
Description: PGP signature


[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux