Re: [Ext2-devel] Re: [RFD] FS behavior (I/O failure) in kernel summit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Theodore Ts'o wrote:

>On Tue, Jun 14, 2005 at 11:46:36AM +0900, Kenichi Okuyama wrote:
>  
>
>>I agree that kernel can not directly influence user.
>>But, application may have better chance.
>>
>>Think about case of editor (vi, emacs, almost any text editors are ok ).
>>
>>If you try to save file, and recieve no error, user will believe they 
>>have been written on disk they believe to be existing.
>>Even log yells for error, user will not notice.
>>
>>If editor recieve error, then user can know something is wrong. Though 
>>he is still wondering, if he recieve the message
>>like "Input Output Error: may be HW error?", he definitely will start 
>>from looking at cable.
>>    
>>
>
>Kenichi-San,
>
>Part of the problem is that we are limited by the constraints of the
>POSIX specification for error handling. 
>
Ted, if I understand you correctly, I agree with you.  ;-)

What users need is for a window to pop up saying "the usb drive is
turned off" or "we are getting checksum errors from XXX, this may
indicate hardware problems that require your attention".

Now that GUIs exist, and now that more errors are possible because the
kernel is more complex, perhaps kernel error handling should be
reconsidered.  I don't have the feeling that anyone has felt themselves
authorized to take a deep look at how this ought to be designed.  I mean
sure, there are sometimes console windows that things get printed into,
but unsophisticated users basically want to be prompted if something is
wrong that needs their attention and to not have their experience
cluttered by a console window otherwise.  Also, it has long been
irritating having to make error codes conform to one of the existing
error codes when there is often no good connection between the name of
an existing error code and the new error condition one has just coded,
and there is no space left for new error codes.

Ted, what do you think?

> For example, we don't have a
>way of telling the application, "the reason why you the filesystem was
>remounted-read-only was in reaction to an I/O error that appears to be
>caused by the multiple CRC checksum errors reported by the SCSI
>controller".  We can only return EIO or EROFS.  And while the write()
>which causes an I/O error that remounts the filesystem read/only can
>(and probably does) return EIO, any subsequent writes will return
>EROFS, and changing this would be hard, hackish, and probably wouldn't
>be accepted.
>
>Also, there is not neccesarily one right answer to how to respond to a
>underlying I/O error in the filesystem.  So for ext2/3 filesystem, it
>is configurable.  In case of an underlying error detected in the
>filesystem metadata, the filesystem can be set to either (a) panic and
>force a reboot, so that hopefully fsck can resolve the issue, (b)
>remount the filesystem read/only, to prevent further damage, or (c)
>continue and do nothing (the don't worry, be happy approach).
>Different users will want different approaches, and so trying to
>standardize what applications will see at the user level doesn't seem
>like the right approach, since we want to allow system administrators
>some flexibility about how they wish to configure their systems.
>  
>
Perhaps these policy choices should be mount options, what do you think?

>(For example, an embedded system or a system where there is higher
>levels of redundancy, the right answer might be to panic and either
>reboot or halt --- continuing and possibly returning wrong answers
>might be completely unacceptable, and it may be that the once the
>system goes down hard, the adjacent backup blade can pick up
>operations.)
>
>So instead of trying to standardize the existing error returns, which
>are they way they are and for which trying to standardize them would
>probably be not worth the effort, since they don't return enough
>context to the application anyway ---- I would suggest the better
>thing to do is to design a new mechanism for returning block device
>errors via either some kind of notifcation mechanism (pick your choice
>of hotplug, dbus, or netlink --- dbus may make the most amount of
>sense, since multiple applications may want to subscribe to such
>notifications) of problems at the filesystem level, so that
>applications can take corrective action as necessary.  
>
>This is a better approach, since it far more flexible and returns much
>more information to the user.  For example, in a desktop environment,
>the desktop can pop up a warning dialog to the user of a failure of a
>block device or filesystem corruption, without having to modify every
>single application.  In the case of an embedded system, the
>notification can trigger an appropriate failover or recovery process.  
>
>Regards,
>
>						- Ted
>
>
>  
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux