Moin.
The machine:
amd64, 2G mem, 4 SATA disks on SiI 3114 [SATALink/SATARaid] Serial ATA
Controller, configured as md raid5/raid1, using [cfq-scheduler], running
2.6.12-rc6 (application postgresql)
The story:
This morning the machine was very slow, first check shows that the machine
swaps and all disk i/o are very slow.
Looking further slabtop shows
Active / Total Objects (% used) : 19821561 / 19828316 (100.0%)
Active / Total Slabs (% used) : 369737 / 369739 (100.0%)
Active / Total Caches (% used) : 80 / 120 (66.7%)
Active / Total Size (% used) : 1415795.50K / 1416586.19K (99.9%)
Minimum / Average / Maximum Object : 0.02K / 0.07K / 128.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
9865575 9864690 99% 0.02K 43847 225 175388K biovec-1
9865254 9864654 99% 0.12K 318234 31 1272936K bio
28755 28755 100% 0.09K 639 45 2556K buffer_head
16856 16856 100% 0.52K 2408 7 9632K radix_tree_node
11286 10852 96% 0.21K 627 18 2508K dentry_cache
5955 5953 99% 0.73K 1191 5 4764K shmem_inode_cache
5795 3461 59% 0.06K 95 61 380K size-64
5566 5506 98% 0.17K 253 22 1012K vm_area_struct
3294 3294 100% 0.66K 549 6 2196K reiser_inode_cache
3240 3212 99% 0.07K 60 54 240K sysfs_dir_cache
2046 2046 100% 0.12K 66 31 264K size-128
2023 1657 81% 0.03K 17 119 68K size-32
*ouch*
I decided to first compile 2.6.12 (final) and reboot the machine to reproduce
the problem.
During reboot md shows that /dev/sdd are failed. Log:
[...]
Jun 23 11:23:45 c64 ata4: command 0x35 timeout, stat 0xd8 host_stat 0x0
Jun 23 11:23:45 c64 ata4: status=0xd8 { Busy }
Jun 23 11:23:45 c64 SCSI error : <3 0 0 0> return code = 0x8000002
Jun 23 11:23:45 c64 sdd: Current: sense key: Aborted Command
Jun 23 11:23:45 c64 Additional sense: Scsi parity error
Jun 23 11:23:45 c64 end_request: I/O error, dev sdd, sector 234436299
Jun 23 11:23:45 c64 raid5: Disk failure on sdd3, disabling device. Operation
continuing on 3 devices
Jun 23 11:23:45 c64 RAID5 conf printout:
Jun 23 11:23:45 c64 --- rd:4 wd:3 fd:1
Jun 23 11:23:45 c64 disk 0, o:1, dev:sda3
Jun 23 11:23:45 c64 disk 1, o:1, dev:sdb3
Jun 23 11:23:45 c64 disk 2, o:1, dev:sdc3
Jun 23 11:23:45 c64 disk 3, o:0, dev:sdd3
Jun 23 11:23:45 c64 RAID5 conf printout:
Jun 23 11:23:45 c64 --- rd:4 wd:3 fd:1
Jun 23 11:23:45 c64 disk 0, o:1, dev:sda3
Jun 23 11:23:45 c64 disk 1, o:1, dev:sdb3
Jun 23 11:23:45 c64 disk 2, o:1, dev:sdc3
Jun 23 11:24:39 c64 ATA: abnormal status 0xD8 on port 0xFFFFC200000066C7
Jun 23 11:24:39 c64 ATA: abnormal status 0xD8 on port 0xFFFFC200000066C7
Jun 23 11:24:39 c64 ATA: abnormal status 0xD8 on port 0xFFFFC200000066C7
Jun 23 11:25:09 c64 ata4: command 0x25 timeout, stat 0xd8 host_stat 0x1
Jun 23 11:25:09 c64 ata4: status=0xd8 { Busy }
Jun 23 11:25:09 c64 SCSI error : <3 0 0 0> return code = 0x8000002
Jun 23 11:25:09 c64 sdd: Current: sense key: Aborted Command
Jun 23 11:25:09 c64 Additional sense: Scsi parity error
Jun 23 11:25:09 c64 end_request: I/O error, dev sdd, sector 2640176
Jun 23 11:25:09 c64 raid5: Disk failure on sdd2, disabling device. Operation
continuing on 3 devices
Jun 23 11:25:09 c64 RAID5 conf printout:
Jun 23 11:25:09 c64 --- rd:4 wd:3 fd:1
Jun 23 11:25:09 c64 disk 0, o:1, dev:sda2
Jun 23 11:25:09 c64 disk 1, o:1, dev:sdb2
Jun 23 11:25:09 c64 disk 2, o:1, dev:sdc2
Jun 23 11:25:09 c64 disk 3, o:0, dev:sdd2
Jun 23 11:25:09 c64 RAID5 conf printout:
Jun 23 11:25:09 c64 --- rd:4 wd:3 fd:1
Jun 23 11:25:09 c64 disk 0, o:1, dev:sda2
Jun 23 11:25:09 c64 disk 1, o:1, dev:sdb2
Jun 23 11:25:09 c64 disk 2, o:1, dev:sdc2
[...]
(next i should look at the machines even i am at linuxtag:-)
Now i readded /dev/sdd[2,3] and everthing worked again (after checking all
cables).
Looks like a BUG, slab should not be filled up if a disk fails.
Ok, this is a 'needed' testingmachine, but i'm willing to try reproducing it,
if nobody else are able to do it ;-)
<earny>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]