Re: AMD64 X2 lost ticks on PM timer

On Wed, Mar 01, 2006 at 04:47:33PM +0100, Andi Kleen wrote:
> > My guess is the sata_nv driver, as it happens during heavy local file read.
> > The machines all have 2-4 SATA WD Raptors connected to the mobo.
> 
> Are you accessing all these disks in parallel with that cpio? If 
> yes could you try it with only a single disk? 
 
Yes, all of the hosts are LVM2 over MD RAID1.  The PostgreSQL 
LV has striping over the two MD RAID1 PVs.

> My box only has a single SATA disk. Maybe there is some 
> corner case in that SATA driver that leaks interrupt state
> and it's only turned on later by idle or softirq then.

Good call!  Stressing one disk results in no lost ticks.

I stuck a spare disk in one of the workstations that has its system
partitions on Ext3/LVM2/MD-RAID1, and then copied the 9GB /usr to
a raw Ext3 partition on the new disk:

    find usr | cpio -pdum /opt

That resulted in:

Mar  1 11:39:27 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:39:41 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:40:37 ti94 kernel: time.c: Lost 3 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:40:40 ti94 kernel: time.c: Lost 6 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:40:41 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:40:42 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:40:50 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:40:54 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:40:57 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:01 ti94 kernel: time.c: Lost 2 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:06 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:12 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:17 ti94 kernel: time.c: Lost 2 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:21 ti94 kernel: time.c: Lost 3 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:27 ti94 kernel: time.c: Lost 4 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:27 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:29 ti94 kernel: time.c: Lost 3 timer tick(s)! rip _spin_unlock_irqrestore+0xb/0xd)
Mar  1 11:41:42 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:43 ti94 kernel: time.c: Lost 2 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:43 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:46 ti94 kernel: time.c: Lost 2 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:55 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar  1 11:41:55 ti94 kernel: time.c: Lost 2 timer tick(s)! rip default_idle+0x37/0x7a)
Mar  1 11:41:57 ti94 kernel: time.c: Lost 1 timer tick(s)! rip default_idle+0x37/0x7a)
Mar  1 11:41:57 ti94 kernel: time.c: Lost 1 timer tick(s)! rip default_idle+0x37/0x7a)
Mar  1 11:42:00 ti94 kernel: time.c: Lost 2 timer tick(s)! rip default_idle+0x37/0x7a)
Mar  1 11:42:00 ti94 kernel: time.c: Lost 1 timer tick(s)! rip default_idle+0x37/0x7a)
Mar  1 11:42:00 ti94 kernel: time.c: Lost 2 timer tick(s)! rip default_idle+0x37/0x7a)
...
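
To see at a glance where the lost ticks cluster, the log lines above can be tallied per rip symbol. The following is only a rough sketch, not something that was run for this report: the regex is inferred from the message format shown above, and the function name and the three sample lines (copied from the excerpt) are there purely for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the "time.c: Lost N timer tick(s)! rip sym+off/len)"
# lines above (an assumption about the format, not taken from kernel source).
LOST_RE = re.compile(
    r"Lost (\d+) timer tick\(s\)! rip (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def tally_lost_ticks(lines):
    """Return a Counter mapping rip symbol -> total number of lost ticks."""
    totals = Counter()
    for line in lines:
        m = LOST_RE.search(line)
        if m:
            totals[m.group(2)] += int(m.group(1))
    return totals

# Three lines copied from the excerpt above, used only as sample input.
sample = [
    "Mar  1 11:40:37 ti94 kernel: time.c: Lost 3 timer tick(s)! rip __do_softirq+0x55/0xd4)",
    "Mar  1 11:41:29 ti94 kernel: time.c: Lost 3 timer tick(s)! rip _spin_unlock_irqrestore+0xb/0xd)",
    "Mar  1 11:41:55 ti94 kernel: time.c: Lost 2 timer tick(s)! rip default_idle+0x37/0x7a)",
]

for symbol, ticks in tally_lost_ticks(sample).most_common():
    print(symbol, ticks)
```

Feeding the function the lines of the full syslog file instead of the sample would give totals for the whole run; lines that do not match the pattern are simply skipped.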

After a umount/mount on /opt, I did

   find /opt | cpio -o > /dev/null

and got no lost ticks in the log.  My "nice --10 ./trtc" gave me:

rugolsky@ti94: tail +10 one-disk | grep -v '=125'
1141232738:578610: rtc 448 int 124 0 (=124)
1141232807:67036: rtc 464 int 0 124 (=124)
1141232875:557669: rtc 448 int 0 124 (=124)

I converted the raw Ext3 partition to a degraded MD RAID1, and again
got no lost ticks.  Then I created a PV/VG/LV/Ext3 stack on top of the
degraded MD RAID1, populated it, and re-read it; once again, there were
no lost ticks on the single-disk read.

Time to instrument sata_nv, I suppose.  Many thanks for helping to narrow this
down.

	-Bill