On Wed, Mar 01, 2006 at 04:47:33PM +0100, Andi Kleen wrote:
> > My guess is the sata_nv driver, as it happens during heavy local file read.
> > The machines all have 2-4 SATA WD Raptors connected to the mobo.
>
> Are you accessing all these disks in parallel with that cpio? If
> yes could you try it with only a single disk?

Yes, all of the hosts are LVM2 over MD RAID1. The PostgreSQL
LV has striping over the two MD RAID1 PVs.

> My box only has a single SATA disk. Maybe there is some
> corner case in that SATA driver that leaks interrupt state
> and it's only turned on later by idle or softirq then.

Good call! Stressing one disk results in no lost ticks.

I stuck a spare disk in one of the workstations that has its system
partitions on Ext3/LVM2/MD-RAID1, and then copied the 9 GB /usr to
a raw Ext3 partition on the new disk:

    find usr | cpio -pdum /opt

That copy, which reads from the MD RAID1 and writes to the new disk,
resulted in:
Mar 1 11:39:27 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:39:41 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:40:37 ti94 kernel: time.c: Lost 3 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:40:40 ti94 kernel: time.c: Lost 6 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:40:41 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:40:42 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:40:50 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:40:54 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:40:57 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:01 ti94 kernel: time.c: Lost 2 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:06 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:12 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:17 ti94 kernel: time.c: Lost 2 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:21 ti94 kernel: time.c: Lost 3 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:27 ti94 kernel: time.c: Lost 4 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:27 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:29 ti94 kernel: time.c: Lost 3 timer tick(s)! rip _spin_unlock_irqrestore+0xb/0xd)
Mar 1 11:41:42 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:43 ti94 kernel: time.c: Lost 2 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:43 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:46 ti94 kernel: time.c: Lost 2 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:55 ti94 kernel: time.c: Lost 1 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:55 ti94 kernel: time.c: Lost 2 timer tick(s)! rip default_idle+0x37/0x7a)
Mar 1 11:41:57 ti94 kernel: time.c: Lost 1 timer tick(s)! rip default_idle+0x37/0x7a)
Mar 1 11:41:57 ti94 kernel: time.c: Lost 1 timer tick(s)! rip default_idle+0x37/0x7a)
Mar 1 11:42:00 ti94 kernel: time.c: Lost 2 timer tick(s)! rip default_idle+0x37/0x7a)
Mar 1 11:42:00 ti94 kernel: time.c: Lost 1 timer tick(s)! rip default_idle+0x37/0x7a)
Mar 1 11:42:00 ti94 kernel: time.c: Lost 2 timer tick(s)! rip default_idle+0x37/0x7a)
...
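
For anyone wanting to compare runs, a log like the one above can be tallied per rip symbol. This is only a rough sketch: the regular expression is written against the message format shown in this log, and the function name tally_lost_ticks and the SAMPLE lines (copied from the excerpt above) are just for illustration.

```python
import re
from collections import Counter

# Matches the "lost ticks" warning in the form it takes in the log above.
LOST_RE = re.compile(r"time\.c: Lost (\d+) timer tick\(s\)! rip ([\w.]+)\+0x")


def tally_lost_ticks(lines):
    """Return (ticks lost per rip symbol, warnings seen per rip symbol)."""
    ticks = Counter()
    events = Counter()
    for line in lines:
        m = LOST_RE.search(line)
        if m is None:
            continue  # not a lost-tick warning
        count, symbol = int(m.group(1)), m.group(2)
        ticks[symbol] += count
        events[symbol] += 1
    return ticks, events


# A few lines taken from the excerpt above, used as sample input.
SAMPLE = """\
Mar 1 11:41:27 ti94 kernel: time.c: Lost 4 timer tick(s)! rip __do_softirq+0x55/0xd4)
Mar 1 11:41:29 ti94 kernel: time.c: Lost 3 timer tick(s)! rip _spin_unlock_irqrestore+0xb/0xd)
Mar 1 11:41:55 ti94 kernel: time.c: Lost 2 timer tick(s)! rip default_idle+0x37/0x7a)
Mar 1 11:41:57 ti94 kernel: time.c: Lost 1 timer tick(s)! rip default_idle+0x37/0x7a)
""".splitlines()

if __name__ == "__main__":
    ticks, events = tally_lost_ticks(SAMPLE)
    for symbol, total in ticks.most_common():
        print(f"{symbol}: {total} ticks in {events[symbol]} warnings")
```

Feeding it the full syslog for each test would make it easier to see whether the multi-disk runs differ from the single-disk runs only in volume or also in where the ticks are reported.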
After a umount/mount on /opt, I did
find /opt | cpio -o > /dev/null
and got no lost ticks in the log. My "nice --10 ./trtc" gave me:
rugolsky@ti94: tail +10 one-disk | grep -v '=125'
1141232738:578610: rtc 448 int 124 0 (=124)
1141232807:67036: rtc 464 int 0 124 (=124)
1141232875:557669: rtc 448 int 0 124 (=124)
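
The filter used above (skip the first nine lines, then drop every line containing "=125") could also be sketched in Python, which is handy on systems whose tail no longer accepts the old "+10" form. This is an illustration only: the helper name unexpected_samples is made up, and treating "=125" as the normal per-interval count is my reading of the grep, not something the trtc output itself states.

```python
from itertools import islice


def unexpected_samples(lines, skip=9, expected="=125"):
    """Mimic `tail +10 FILE | grep -v '=125'`.

    Drops the first `skip` lines, then keeps only the lines that do not
    contain the `expected` marker.
    """
    return [ln for ln in islice(lines, skip, None) if expected not in ln]


if __name__ == "__main__":
    # "one-disk" is the capture file named in the command above.
    with open("one-disk") as fh:
        for line in unexpected_samples(fh):
            print(line, end="")
```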
I converted the raw Ext3 partition to a degraded MD RAID1, and again
got no lost ticks. Then I created a PV/VG/LV/Ext3 on top of the degraded MD RAID1,
populated it, and re-read it; once again, there were no lost ticks on the
single-disk read.
Time to instrument sata_nv, I suppose. Many thanks for helping to narrow this
down.
-Bill