Gregory Machin wrote:
Hi my server hung, and when I checked the logs there's lots of nasty looking entries ... Are these hardware failure and if so what hardware ?
Oct 31 15:44:47 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 31 15:44:47 server kernel: ata1.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
Oct 31 15:44:47 server kernel: res 51/04:00:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Oct 31 15:44:47 server kernel: ata1.00: configured for UDMA/133
Oct 31 15:44:47 server kernel: ata1: EH complete
Oct 31 15:44:47 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 31 15:44:47 server kernel: ata1.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
Oct 31 15:44:47 server kernel: res 51/04:00:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
As a layman, I think it doesn't look good. I do not like those words, "device error."
I had what looked like terminal errors on my laptop yesterday (could not read _any_ files), but it seems okay after cycling the power. On that basis, I suggest you shut down and turn the power off at the wall.
After a minute - no more - restart the thing and run smartctl against all the ATA/SATA drives.
And make sure of your backups; you may need a really good one RSN.
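If there are several drives, the smartctl pass could be scripted. This is only a rough sketch: it assumes smartmontools is installed and that you run it as root, and the device names in the comment at the bottom are placeholders, not the poster's actual drives. The helper names are my own invention. It asks each drive for its overall health verdict (-H) and the full SMART report (-a).

```python
import shutil
import subprocess


def build_smartctl_commands(devices):
    """Return the smartctl argument lists to run for each device."""
    commands = []
    for dev in devices:
        commands.append(["smartctl", "-H", dev])  # overall health self-assessment
        commands.append(["smartctl", "-a", dev])  # all SMART information for the drive
    return commands


def run_checks(devices):
    """Run every command and print its output for a human to read."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl not found; install smartmontools first")
    for cmd in build_smartctl_commands(devices):
        # smartctl's exit status is a bit mask, so a non-zero value is not
        # necessarily fatal; show the output rather than raising on it.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print("$", " ".join(cmd))
        print(result.stdout)


# Example call, with placeholder device names:
# run_checks(["/dev/sda", "/dev/sdb"])
```

Keep the output somewhere other than the suspect disk.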
Oct 31 15:44:47 server kernel: ata1.00: configured for UDMA/133
Oct 31 15:44:47 server kernel: ata1: EH complete
Oct 31 15:44:48 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 31 15:44:48 server kernel: ata1.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
Oct 31 15:44:48 server kernel: res 51/04:00:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Oct 31 15:44:48 server kernel: ata1.00: configured for UDMA/133
Oct 31 15:44:48 server kernel: ata1: EH complete
Oct 31 15:44:48 server kernel: ata1.00: exception Emask 0x0 SAct 0x0
I don't know what that word "exception" means in this context. I'm familiar with it on IBM mainframes, where "unit exception" means "end of file": it's what tapes report when they read a tape mark, what disk drives say when they read a zero-length block (IBM drives historically are not sectored at all), and what card readers say when they reach the end of the deck and the operator's pressed the appropriate button. In that context, a device error might be reported as "unit check."
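For what it's worth, the cmd and res fields in the quoted lines can be pulled apart mechanically. The sketch below rests on an assumption about the kernel's layout: that the first two hex bytes after "cmd" are the ATA command and feature registers, and the first two after "res" are the status and error registers. The lookup tables are deliberately tiny and cover only what appears in these logs plus a few common bits; the function names are made up for illustration.

```python
import re

# Small, incomplete lookup tables (assumed from the ATA command set).
COMMANDS = {0xB0: "SMART"}
SMART_FEATURES = {0xD0: "READ DATA", 0xDA: "RETURN STATUS"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

# Matches "cmd b0/da:" or "res 51/04:" anywhere in a log line.
FIELD_RE = re.compile(r"\b(cmd|res) ([0-9a-f]{2})/([0-9a-f]{2}):")


def flag_names(value, table):
    """List the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


def decode(line):
    """Decode the first two register bytes of a cmd or res line, else None."""
    m = FIELD_RE.search(line)
    if m is None:
        return None
    kind = m.group(1)
    first = int(m.group(2), 16)
    second = int(m.group(3), 16)
    if kind == "cmd":
        command = COMMANDS.get(first, hex(first))
        # Feature codes are only named here for the SMART command.
        feature = SMART_FEATURES.get(second, hex(second)) if first == 0xB0 else hex(second)
        return {"kind": "cmd", "command": command, "feature": feature}
    return {
        "kind": "res",
        "status": flag_names(first, STATUS_BITS),
        "error": flag_names(second, ERROR_BITS),
    }
```

Fed the lines above, this reports the command as SMART / RETURN STATUS and the result as status DRDY, DSC, ERR with error ABRT. If the assumed layout is right, that would suggest the drive is rejecting a SMART status query rather than failing a read, though I would not lean on that without the smartctl output.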
--
Cheers
John