On Fri, Feb 8, 2008 at 8:34 PM, Reid Rivenburgh <reidr@xxxxxxxxx> wrote:
>
> On Feb 5, 2008 9:43 PM, Reid Rivenburgh <reidr@xxxxxxxxx> wrote:
> > On Feb 5, 2008 5:07 PM, Reid Rivenburgh <reidr@xxxxxxxxx> wrote:
> > > True enough! Another person privately mailed me, suggesting I run
> > > sdparm periodically to prevent the drive from going asleep. Before
> > > trying that, though, I'm just using the new USB cable. It's been okay
> > > for several hours now, so maybe that was indeed the problem. I will
> > > keep an eye on things and followup here with what I think is the final
> > > verdict in a day or two.
> >
> > For those following my saga: Something just went wrong again. I got a
> > bunch of "rejecting I/O to dead device" messages, as well as a lot of
> > "new high speed USB device using ehci_hcd and address 65" (the number
> > varies). It's unusual this time in that it's not just reconnecting
> > immediately, instead showing that message. Does that mean anything to
> > anyone? It looks like the ehci_hcd messages will go on forever, so I
> > guess I will unplug it or power cycle it and hope it comes back. I
> > guess the next thing to try is calling sdparm in a cron job, as one
> > person suggested.
>
> Here is (hopefully) my last followup for those interested. As I
> mentioned, someone suggested off-list that I run sdparm in a cron job
> like this:
>
> * * * * * /usr/bin/sdparm /dev/sdd > /dev/null
>
> That could very well be overkill, but it seems like a very
> light-weight process. In any case, my USB drive has been up without a
> single disconnect for several days now. Looks like the person was
> right: The drive was going to sleep on me.
>
> I hope this information helps someone else.

Resurrecting this dead horse.... Since that last report, my USB hard
drive still would occasionally disconnect and reconnect, even with the
sdparm cron job.
(I actually switched to touching a dummy file, because I could then
hard-code the mounted path rather than using the correct dev file. See
below....) When it happens, I manually unmount it, run e2fsck (just in
case), and remount it. Kind of a hassle.

That made me rethink the idea of using autofs to handle mounting. There
were two reasons I wasn't using it:

1. I'm not sure how autofs, with its potential for frequent
mounts/unmounts, works with e2fsck. If the USB drive (ext3) is
configured to run e2fsck every N mounts, will that be done under autofs?
If so, would autofs just wait until the e2fsck is finished before
mounting? I guess the best thing to do in that case would be to
configure the filesystem to be checked after some period of time, not
after some number of mounts. I'm still not clear on this.

2. The drive would usually be found as /dev/sdd or /dev/sde, but
especially with the disconnecting/reconnecting issue where it'd bounce
between the two, it wasn't predictable. I thought you had to enter one
of them in the autofs config file, but it turns out you can enter the
drive's UUID or filesystem label, neither of which changes. Cool, that
fixes it! Here's my auto.misc entry:

disk2 -fstype=ext3 :/dev/disk/by-uuid/a95....

You could also use /dev/disk/by-label, but my labels have slashes in
them, and the dev paths had strange \\x2 characters in them, so I went
with the UUID.

I store music files on this external hard drive. This morning, I tried
playing a few files, and it would consistently lose the drive within a
minute. Here's a typical but probably unhelpful messages entry:

Feb 15 08:20:50 pigpen kernel: scsi 29:0:0:0: rejecting I/O to dead device

and I would get an error on every console about the journal having
problems. Autofs would eventually time out, unmounting and remounting
the drive, so that sort of worked at least. Given that I was accessing
the drive at the time, it seems unlikely that the problem is the drive
going to sleep.
I assume autofs isn't unmounting the drive while it's in use. So I'm
back to thinking there's some sort of hardware problem or a bug in the
kernel's USB drivers.

I'm not sure if anyone still has suggestions, but I thought I'd at least
get this info out there in case other people are looking for similar
problems/solutions.

Thanks,
reid
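
P.S. For anyone who wants to try the "touch a dummy file" keep-alive
mentioned above, here is a minimal sketch of how it could look. The
mount point /misc/disk2, the file name .keepalive, and the script path
in the usage note are all assumptions for illustration, not taken from
the actual setup, so adjust them to your own system.

```shell
#!/bin/sh
# Keep-alive sketch: touch a dummy file on the mounted drive so the
# drive sees regular activity. The file name ".keepalive" is an
# arbitrary choice for this example.

keepalive() {
    dir=$1
    # Only touch the file if the mount point is actually there, so a
    # disconnected drive produces a clear message instead of stray files.
    if [ -d "$dir" ]; then
        touch "$dir/.keepalive"
    else
        echo "keepalive: $dir is not available" >&2
        return 1
    fi
}

# To use this from cron, call the function at the end of the script
# with your own mount point (assumed path shown here):
#   keepalive /misc/disk2
```

Saved as, say, /usr/local/bin/keepalive.sh (a hypothetical path) with
the last line uncommented, it could be run every minute with a crontab
entry in the same style as the sdparm one:

* * * * * /usr/local/bin/keepalive.sh > /dev/null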