Grant Ozolins wrote:
Hi all,
We encountered an unusual condition with the LSI SCSI drivers last night
- we got an "attempted task abort", followed by about 10 minutes of no
messages (I wasn't logged in at the time but believe the system was
unresponsive), followed by several minutes more of similar messages,
repeated:
Nov 27 22:08:48 hostname kernel: mptscsih: ioc0: attempting task abort!
(sc=ffff810078cb49c0)
Nov 27 22:08:48 hostname kernel: sd 0:0:0:0:
Nov 27 22:08:48 hostname kernel: command: Read(10): 28 00 01 fb
b7 4d 00 00 08 00
Nov 27 22:08:48 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
Task Terminated
Nov 27 22:08:48 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
(sc=ffff810078cb49c0)
Nov 27 22:08:58 hostname kernel: mptscsih: ioc0: attempting task abort!
(sc=ffff810078cb49c0)
Nov 27 22:08:58 hostname kernel: sd 0:0:0:0:
Nov 27 22:08:58 hostname kernel: command: Test Unit Ready: 00 00
00 00 00 00
and then...
Nov 27 22:19:38 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
Task Terminated
Nov 27 22:19:40 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
(sc=ffff810078cb49c0)
Nov 27 22:19:41 hostname kernel: mptscsih: ioc0: attempting task abort!
(sc=ffff81005b759e00)
Nov 27 22:19:42 hostname kernel: sd 0:0:0:0:
Nov 27 22:19:42 hostname kernel: command: Read(10): 28 00 01 fb
b7 fd 00 00 08 00
Nov 27 22:19:42 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
Task Terminated
Nov 27 22:19:42 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
(sc=ffff81005b759e00)
Nov 27 22:19:42 hostname kernel: mptscsih: ioc0: attempting task abort!
(sc=ffff81005b759e00)
Nov 27 22:19:42 hostname kernel: sd 0:0:0:0:
Nov 27 22:19:42 hostname kernel: command: Test Unit Ready: 00 00
00 00 00 00
Nov 27 22:19:43 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
Task Terminated
Nov 27 22:19:43 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
(sc=ffff81005b759e00)
Nov 27 22:19:43 hostname kernel: mptscsih: ioc0: attempting task abort!
(sc=ffff810091491e00)
Nov 27 22:19:43 hostname kernel: sd 0:0:0:0:
Nov 27 22:19:43 hostname kernel: command: Read(10): 28 00 01 fb
b8 75 00 00 08 00
Nov 27 22:19:43 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
Task Terminated
Nov 27 22:19:43 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
(sc=ffff810091491e00)
(... etc)
This is on a dual Opteron 1.8GHz Tyan system, running a fairly minimal
FC5 install.
The problem resolved itself after a little while, but the web server
became unresponsive during this time. It looks like some kind of loop in
the LSI kernel module. Has anyone seen anything like this before? It
seems like something to report to the linux-kernel or linux-scsi mailing
lists, but as I'm using FC5 I thought I'd ask here first.
Thanks in advance,
It looks to me like whatever device the LSI driver was attempting to talk to is
failing. The apparent sequence is that a SCSI command (a read, by the looks of
it) failed to complete, the command was aborted, and the device driver then
issued a Test Unit Ready. Either the TUR took about 10 minutes or the device
driver timed out after about 10 minutes. Another read command then failed in a
similar way, resulting in the same sequence of events. At some point, presumably,
whatever was attempting to read the device gave up and all went back to normal.
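To put a number on that stall, here is a minimal Python sketch that pairs each
"attempting task abort!" line with the next "task abort: SUCCESS" line carrying
the same sc pointer. It assumes the log lines have been unwrapped (the mail
client split them), and that the SUCCESS at 22:19:40 answers the attempt at
22:08:58, which the shared sc value suggests but does not prove. The function
name and the placeholder year are my own, not anything from the driver.

```python
import re
from datetime import datetime

# A subset of lines from the log above, with the mail client's wrapping undone.
LOG = """\
Nov 27 22:08:48 hostname kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810078cb49c0)
Nov 27 22:08:48 hostname kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810078cb49c0)
Nov 27 22:08:58 hostname kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810078cb49c0)
Nov 27 22:19:40 hostname kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810078cb49c0)
"""

PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) .*mptscsih: ioc\d+: "
    r"(?P<event>attempting task abort!|task abort: SUCCESS) "
    r"\(sc=(?P<sc>[0-9a-f]+)\)"
)

# syslog timestamps omit the year; this placeholder is an assumption and only
# matters if the log crosses a year boundary.
PLACEHOLDER_YEAR = 2000


def abort_durations(text):
    """Return (sc, seconds) for each attempt/SUCCESS pair, in log order."""
    pending = {}   # sc pointer -> time of the outstanding abort attempt
    results = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if not m:
            continue
        ts = datetime.strptime(
            f"{PLACEHOLDER_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S"
        )
        if m["event"].startswith("attempting"):
            pending[m["sc"]] = ts
        elif m["sc"] in pending:
            start = pending.pop(m["sc"])
            results.append((m["sc"], int((ts - start).total_seconds())))
    return results


for sc, seconds in abort_durations(LOG):
    print(f"sc={sc}: abort completed after {seconds} s")
```

On these four lines the first abort completes within the same second and the
second takes 642 seconds, i.e. the roughly 10 minute gap.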
Does that device normally work OK? It might be going faulty, or there might be a
cabling or termination issue. I doubt it's a device driver fault; it looks to me
like a hardware read error or timeout, followed by retries and/or additional
failed attempts.
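One way to check whether the failing reads cluster in one area of the disk is to
decode the command bytes in the log. In a READ(10) CDB, bytes 2-5 hold the
big-endian logical block address and bytes 7-8 the transfer length in blocks.
The sketch below (the function name is just for illustration) decodes the three
reads quoted above:

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


# The three Read(10) commands from the log, with the line wrapping undone.
CDBS = [
    "28 00 01 fb b7 4d 00 00 08 00",
    "28 00 01 fb b7 fd 00 00 08 00",
    "28 00 01 fb b8 75 00 00 08 00",
]

for cdb in CDBS:
    lba, blocks = decode_read10(cdb)
    print(f"{cdb} -> LBA {lba}, {blocks} blocks")
```

All three are 8-block reads and their addresses fall within about 300 blocks of
each other. That would be consistent with a localized bad area on the disk,
though the log alone can't confirm it.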
If daytime TV made shows about computers, this one would be "When SCSI devices
go bad."
--
Nigel Wade, System Administrator, Space Plasma Physics Group,
University of Leicester, Leicester, LE1 7RH, UK
E-mail : nmw@xxxxxxxxxxxx
Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555