Re: Major SATA / EXT3 Issue?

On Fri, Oct 26, 2007 at 09:21:52PM -0500, Chris Holvenstot wrote:
> In each case the failure mode appears to have been the same ??? the system
> appears to lock up. When rebooted I get a long string of messages like:
> 
> 
> Oct 26 20:07:37 localhost kernel: [ 101.581091] ata2: timeout waiting
> for ADMA IDLE, stat=0x440
> 
> Oct 26 20:07:37 localhost kernel: [ 101.581096] sd 1:0:0:0: [sda] Write
> Protect is off
> 
> Oct 26 20:07:37 localhost kernel: [ 101.581174] res
> 71/04:08:00:00:00/04:00:1d:00:00/e0 Emask 0x1 (device error)
> 
> Oct 26 20:07:37 localhost kernel: [ 101.644992] ata2.00: configured for
> UDMA/33
> 
> Oct 26 20:07:37 localhost kernel: [ 101.644994] ata2: EH complete
> 
> Oct 26 20:07:37 localhost kernel: [ 101.645006] sd 1:0:0:0: [sda] Write
> cache: disabled, read cache: enabled, doesn't support DPO or FUA

As it turns out, our organizations version control server just blew up 
last night. The filesystem is rather corrupt at the moment, fortunately 
we have backups.

We are running 2.6.22.5 (vanilla) on Debian 4.0. It has two SATA devices 
with Linux software raid mirror (mdadm) on Intel ata_piix chipset.
The filesystem is EXT3.

1. This happened during nightly backup:

init_special_inode: bogus i_mode (177777)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:68:6f:3a:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 53248 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: device not ready (errno=-16), forcing hardreset
ata1: soft resetting port
ata1.00: revalidation failed (errno=-2)
ata1: failed to recover some devices, retrying in 5 secs
ata1: soft resetting port
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
init_special_inode: bogus i_mode (177777)
init_special_inode: bogus i_mode (177777)
init_special_inode: bogus i_mode (177777)
journal_bmap: journal block not found at offset 5133 on md2
Aborting journal on device md2.
ext3_abort called.
EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__journal_remove_journal_head: freeing b_committed_data

2. I rebooted

3. fsck -p on boot failed

(it is very probable not many files were corrupted at this stage)

4. I ran fsck.ext3 -y

=> that corrupted lots and lots of files. This went 
into a loop, the fsck.ext3 restarted checking over and over again.

Heikki Orsila
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: Major SATA / EXT3 Issue?
  - From: Theodore Tso <[email protected]>

References:
- Major SATA / EXT3 Issue?
  - From: Chris Holvenstot <[email protected]>

Prev by Date: [PATCH 1/1] at32ap700x_wdt: add support for boot status and add fix for silicon errata
Next by Date: Re: [REGRESSION] 2.6.24-rc1 fails to boot on a 486
Previous by thread: Re: Major SATA / EXT3 Issue?
Next by thread: Re: Major SATA / EXT3 Issue?
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]