Re: Major SATA / EXT3 Issue?

Chris Holvenstot wrote:

I am curious if anyone else has had major problems with SATA drives on
the current series of kernels.  I have (or rather had) two SATA drives
on my system - the first was a Maxtor MaxLine 500 and the second was a
Maxtor MaxLine 250.

Both of these drives were plugged to the 1.5 Gigabyte / second mode.

My SATA controller is integrated on my MSI motherboard and sports four
ports.  It is implemented using the Nvidia CK804 chipset.  My processor

is an AMD64 X2 4600+ running the 32 bit version of Linux.

I have had these drives up and running for about six months.

The first drive "failed" about 10 days ago - and unfortunately I focused
on hardware error and after several attempts to get the drive back
online I physically pulled it from the system.  This drive was used for

backups and thus was not critical to day-to-day operations.

However, tonight I "lost" a second SATA drive, this one I use on a daily
basis for my kernel build and test processes.  It failed in the same
manner as the first, which makes me a little suspicious.


The first drive “failed” while I was running a modified Ubuntu 7.04
system. Because I focused on hardware as the reason for the failure I
did not collect specific information about the version of the kernel
being used, but it was likely to be 2.6.24-git8.


The second drive “failed” tonight on what is, except for the kernel, a
fairly standard Ubuntu 7.10 system (the same hardware - I upgraded my OS
this past week) – the kernel in use tonight at the time of the second
failure was 2.6.24-rc1-git1


In each case the failure mode appears to have been the same – the system
appears to lock up. When rebooted I get a long string of messages like:


Oct 26 20:07:37 localhost kernel: [ 101.581091] ata2: timeout waiting
for ADMA IDLE, stat=0x440

Oct 26 20:07:37 localhost kernel: [ 101.581096] sd 1:0:0:0: [sda] Write
Protect is off

Oct 26 20:07:37 localhost kernel: [ 101.581174] res
71/04:08:00:00:00/04:00:1d:00:00/e0 Emask 0x1 (device error)

Oct 26 20:07:37 localhost kernel: [ 101.644992] ata2.00: configured for
UDMA/33

Oct 26 20:07:37 localhost kernel: [ 101.644994] ata2: EH complete

Oct 26 20:07:37 localhost kernel: [ 101.645006] sd 1:0:0:0: [sda] Write
cache: disabled, read cache: enabled, doesn't support DPO or FUA

You should try and get some output from dmesg and not from the messageslog, as the log daemon seems to have a nasty habit of discardingcritical output from these errors. In this case the failing command ismissing and the message ordering even seems off.



The hardware appears to be correctly identified by the BIOS during the

power up sequence.


Not much is seen in the dmesg log excpet for:


[ 43.649673] scsi0 : sata_nv

[ 43.649722] scsi1 : sata_nv

[ 43.649776] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xcc00
irq 19

[ 43.649778] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xcc08
irq 19

There should be more than this at the very least.. As above, please tryto get output from dmesg itself.



When I try to run a file system check on these devices I get:




e2fsck 1.40.2 (12-Jul-2007)

fsck.ext2: No such file or directory while trying to open /dev/sdb1

The superblock could not be read or does not describe a correct ext2

filesystem. If the device is valid and it really contains an ext2

filesystem (and not swap or ufs or something else), then the superblock

is corrupt, and you might try running e2fsck with an alternate
superblock:

e2fsck -b 8193 <device>


I have a gut feeling that when the system appears to lock up what is
really going on is that the contents of the drive are being trashed. But
I have no proof of that.

I don't think that is the case, more like the drives have not beendetected at all. If this happens after a reboot when they were workingbefore, that sounds like some kind of a hardware issue most likely..



When I try to do a parted to see what the system thinks is on the drive
I get the error message:

Error: Error opening /dev/sdb: No medium found


I am not having any problems with my EXT3 file systems located on
“standard” IDE / PATA drives.


My config file, which has not changed in months beyond taking the
defaults during make oldconfig looks like:


--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Prev by Date: Re: build #286 failed for 2.6.24-rc1-gea45d15 in linux/arch/x86/kernel/setup_32.c
Next by Date: Re: [RFC, PATCH] locks: remove posix deadlock detection
Previous by thread: Re: Major SATA / EXT3 Issue?
Next by thread: Re: Major SATA / EXT3 Issue?
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]