On Tue, 29 May 2007, Dan Aloni wrote:
> Hello,
>
> We have a system where the rootfs is a partition on a USB device,
> and I've noticed that in a few rare cases the USB controller
> loses the connection to the USB device after some uptime (days,
> weeks...), and the USB device reappears a very short time later.

That failure mode is pretty uncommon. More often what happens is the
connection remains intact but communication/protocol/firmware/???
errors cause the device to stop working. It never disappears but it
can't be used again without unplugging or power-cycling.

> It doesn't really matter why, I guess that USB controllers and
> storage devices aren't always of the best quality - let's assume
> that. The issue is that Linux currently makes it problematic
> to recover the rootfs for several reasons.
>
> If /dev/sda1 is mounted as root and the SCSI device goes into
> offline mode, it is quite impossible to get out of this situation.
> At first, I had to disable the usermode helper that sends events
> to udev since it blocks on the dysfunctional /dev/sda1. Afterwards,
> I noticed that the reappearing device is being assigned
> /dev/sdb along with a new SCSI host. I guess the VFS and
> block layer still keep a reference to the old scsi_disk and as
> a result sd doesn't free it.

Yes.

> So, I gave this some thought, and I came up with two
> main solutions:
>
> 1) Improve the USB storage error handling - bind the already
> existing SCSI host to the USB port that has the device, e.g.,
> if host2 got created for usb 5-3 then keep it that way for the
> sake of EH. /dev/sda1 should come to life when the USB device
> recovers, unless a few seconds have passed or some attributes
> (such as manufacturer ID or serial) have changed.

The difficulty here is that this "error" is indistinguishable from
normal activity -- someone simply unplugs the device and then later on
another connection is made. It might be the same device as before or
it might be a different one. In other words, it isn't really an error.
You would solve this by relying on a timeout of "a few seconds".

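To make the proposed check concrete, here is a minimal sketch of the
decision it implies: same port, same identifying attributes, and a
reconnect within the timeout. This is not kernel code or an existing
API. The names DeviceIdentity, is_same_device and RECONNECT_WINDOW_S,
and the 5-second value, are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical value standing in for "a few seconds".
RECONNECT_WINDOW_S = 5.0


@dataclass(frozen=True)
class DeviceIdentity:
    """Attributes used to guess whether a reconnect is the same device."""
    port: str        # USB port path, e.g. "5-3"
    vendor_id: int
    product_id: int
    serial: str


def is_same_device(old, new, gone_at, back_at, window=RECONNECT_WINDOW_S):
    """Return True if the new connection may be rebound to the old host.

    gone_at and back_at are timestamps in seconds. The check is only a
    heuristic: it cannot tell a glitch from a quick unplug and replug.
    """
    elapsed = back_at - gone_at
    if elapsed < 0 or elapsed > window:
        return False
    # Dataclass equality compares port, vendor, product and serial.
    return old == new
```

Note that a different device reporting the same attributes on the same
port inside the window would pass this check, as would the same device
with different media in it.
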
> 2) Block layer hack - Write a special layering block device
> driver that acts as a proxy to the currently functioning
> /dev/sd* which gets auto-detected by this block layer driver.
> md multipath for the 'poor', you might say..
>
> I chose '2' for the time being. It was much easier than hacking
> around the USB subsystem... Yes, I know rootfs on USB is an
> uncommon use-case at the moment but I would like to know if there
> are some improvements planned in this area.

Interesting. I've been working on something very much like your 1) for
some time. A version of it is now in the USB development tree. It's
not meant for situations like the one you describe; instead it is
intended for suspend-to-RAM or hibernation. Generally these processes
involve loss of power on the USB bus, causing it to appear as though
all the devices have been unplugged.

In theory the same approach could be used to recover connections even
in the absence of an intervening suspend. It would be clumsy in some
respects, though. For instance, when a device actually was
disconnected, the USB core wouldn't be able to notify the device's
driver for a few seconds. In that time lots of unwanted activity and
errors could mount up.

It also goes against the USB specification. And it is potentially
unsafe, in that it is possible for users to change media or make other
alterations that the kernel cannot detect. The same would be true of
your proposal, assuming that somebody was quick enough to unplug one
device and plug in another (or swap memory cards) in the span of a few
seconds.

Alan Stern