Hey All,
I'm having an interesting problem with an FC6 server regarding host bus errors. Every now and again the following shows up in the messages log:
Apr 20 04:58:05 lftvm01 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 20 04:58:05 lftvm01 kernel: ata2.00: (BMDMA stat 0x6)
Apr 20 04:58:05 lftvm01 kernel: ata2.00: cmd 25/00:f8:d7:ef:61/00:00:1c:00:00/e0 tag 0 cdb 0x0 data 126976 in
Apr 20 04:58:05 lftvm01 kernel: res 51/84:a7:28:f0:61/84:00:1c:00:00/e0 Emask 0x20 (host bus error)
Apr 20 04:58:05 lftvm01 kernel: ata2.00: configured for UDMA/133
Apr 20 04:58:05 lftvm01 kernel: ata2.01: configured for UDMA/133
Apr 20 04:58:05 lftvm01 kernel: ata2: EH complete
Apr 20 04:58:05 lftvm01 kernel: SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
Apr 20 04:58:05 lftvm01 kernel: sdc: Write Protect is off
Apr 20 04:58:05 lftvm01 kernel: SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 20 04:58:05 lftvm01 kernel: SCSI device sdd: 625142448 512-byte hdwr sectors (320073 MB)
Apr 20 04:58:05 lftvm01 kernel: sdd: Write Protect is off
Apr 20 04:58:05 lftvm01 kernel: SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
and then, every now and again:
Apr 20 04:54:32 lftvm01 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Apr 20 04:54:32 lftvm01 kernel: ata2.00: cmd 25/00:08:ff:3d:8c/00:01:1d:00:00/e0 tag 0 cdb 0x0 data 135168 in
Apr 20 04:54:32 lftvm01 kernel: res 40/00:c7:d0:61:84/84:00:1d:00:00/e0 Emask 0x4 (timeout)
Apr 20 04:54:39 lftvm01 kernel: ata2: port is slow to respond, please be patient (Status 0xd0)
Apr 20 04:55:02 lftvm01 kernel: ata2: port failed to respond (30 secs, Status 0xd0)
Apr 20 04:55:02 lftvm01 kernel: ata2: soft resetting port
Apr 20 04:55:02 lftvm01 kernel: ata2.00: configured for UDMA/133
Apr 20 04:55:02 lftvm01 kernel: ata2.01: configured for UDMA/133
Apr 20 04:55:02 lftvm01 kernel: ata2: EH complete
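To make the "res" lines a bit easier to read, here is a rough sketch that decodes the first two bytes of a libata "res" line, which as far as I understand are the ATA status and error registers. The bit names are the standard ATA ones as I know them (0x80 in the error register being ICRC on UDMA transfers); the helper names decode_bits and decode_res are made up for this example, and this only decodes the bytes, it is not a diagnosis.

```python
import re

# ATA status register bits (standard ATA names; an assumption that they
# apply unchanged to the bytes libata prints here).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
# ATA error register bits; 0x80 is the interface CRC bit on UDMA transfers.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}

# First two hex bytes after "res": status, then error.
RES_RE = re.compile(r"res ([0-9a-f]{2})/([0-9a-f]{2}):")


def decode_bits(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


def decode_res(line):
    """Decode status/error from a 'res' log line (hypothetical helper)."""
    match = RES_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


host_bus = "res 51/84:a7:28:f0:61/84:00:1c:00:00/e0 Emask 0x20 (host bus error)"
print(decode_res(host_bus))
# (['DRDY', 'DSC', 'ERR'], ['ICRC', 'ABRT'])

# The "Status 0xd0" reported while the port was slow to respond:
print(decode_bits(0xd0, STATUS_BITS))
# ['BSY', 'DRDY', 'DSC']
```

So the host bus error result reads as status 0x51 with error 0x84, and the timeout case shows the port still flagged busy (0xd0) when it failed to respond.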
It doesn't always cause a crash, but it looks like sometimes it might. I've had a look around at various places and haven't found a definitive cause, let alone an answer.
If anyone has any ideas or theories, please throw 'em out there. Curiously, these errors only occur on ata2; we have 2 drives sitting on ata1 without an issue. The ata1 drives are the "os" and the ata2 drives are the "data".
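For anyone wanting to check the "only on ata2" observation against their own logs, here is a minimal sketch that tallies libata errors per device and reason. It assumes the log lines look like the excerpts above (an "ataN.NN: exception" line followed by a "res ... Emask 0x.. (reason)" line); the function name tally_errors and the regexes are my own and not part of any tool.

```python
import re
from collections import Counter

# Device named on the "ataN.NN: exception ..." line that opens each report.
EXCEPTION_RE = re.compile(r"kernel: (ata\d+\.\d+): exception ")
# Trailing "Emask 0x.. (reason)" on the "res" line, e.g. "(host bus error)".
REASON_RE = re.compile(r"Emask 0x[0-9a-f]+ \(([^)]+)\)")


def tally_errors(lines):
    """Count (device, reason) pairs; each reason is attributed to the
    most recent exception line seen before it."""
    counts = Counter()
    current = None
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            current = match.group(1)
            continue
        match = REASON_RE.search(line)
        if match and current is not None:
            counts[(current, match.group(1))] += 1
            current = None  # wait for the next exception line
    return counts


sample = [
    "Apr 20 04:58:05 lftvm01 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0",
    "Apr 20 04:58:05 lftvm01 kernel: ata2.00: (BMDMA stat 0x6)",
    "Apr 20 04:58:05 lftvm01 kernel: res 51/84:a7:28:f0:61/84:00:1c:00:00/e0 Emask 0x20 (host bus error)",
    "Apr 20 04:54:32 lftvm01 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen",
    "Apr 20 04:54:32 lftvm01 kernel: res 40/00:c7:d0:61:84/84:00:1d:00:00/e0 Emask 0x4 (timeout)",
]
counts = tally_errors(sample)
for (device, reason), n in sorted(counts.items()):
    print(device, reason, n)
```

To run it over the real log, pass an open file instead of the sample list, e.g. tally_errors(open("/var/log/messages")).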
Some more detail about the machine / OS:
Kernel: 2.6.20-1.2933.fc6PAE
Drives: 4x WD 320GB SATA drives
ATA module: ata_piix
CPU: Dual-Core Intel Xeon 3.0GHz
Mem: 6GB
Not exactly sure what the motherboard is - it's a "whitebox" server, not Tier 1 (IBM, HP, Dell).
Cheers,
Dave Brown
IT Consultant
RHCE, MCP, CCA
175 Fullarton Rd
Dulwich SA 5065
Ph: (08) 8304 8888
Fax: (08) 8364 2910
Mob: 0414 494 802