Ric Wheeler wrote:
Mark Lord wrote:
Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will
generate 3-4 retries.
(note: libata does *not* generate retries for medium errors;
the looping is driven by the SCSI mid-layer code).
It really beats the alternative of a forced reboot
due to, say, superblock I/O failing because it happened
to get merged with an unrelated I/O which then failed..
Etc..
Definitely an improvement.
The number of retries is an entirely separate issue.
If we really care about it, then we should fix SD_MAX_RETRIES.
The current value of 5 is *way* too high. It should be zero or one.
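For the curious, here is a rough user-space model of why 5 mid-layer retries hurt so much on top of the drive's own internal retries -- the per-attempt delay below is an assumed round number for illustration, not a measurement:

/* Model of retry amplification on an unreadable sector.
 * DRIVE_INTERNAL_SECONDS is an assumption: real drives can spend
 * several seconds of internal retries before reporting the error.
 */
#include <stdio.h>

#define DRIVE_INTERNAL_SECONDS 5

static int stall_seconds(int midlayer_retries)
{
        /* one initial attempt plus each mid-layer retry re-runs the
         * drive's full internal retry sequence */
        return (1 + midlayer_retries) * DRIVE_INTERNAL_SECONDS;
}

int main(void)
{
        int settings[] = { 5, 1, 0 };   /* SD_MAX_RETRIES today, then the proposed values */
        int i;

        for (i = 0; i < 3; i++)
                printf("retries=%d -> ~%d seconds stuck on one bad sector\n",
                       settings[i], stall_seconds(settings[i]));
        return 0;
}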
..
I think that drives retry enough internally; we should leave retries at zero for
normal (non-removable) drives. Should this be a policy we can set, like
we do with NCQ queue depth via /sys?
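Something along the lines of the existing queue_depth attribute would do. Just to illustrate, tuning such a knob from user space might look like this -- queue_depth is real, but the max_retries attribute here is purely hypothetical and does not exist today:

/* Sketch: per-device policy via sysfs, the same way NCQ queue depth
 * is set through /sys/block/<dev>/device/queue_depth.
 * NOTE: the "max_retries" attribute below is hypothetical.
 */
#include <stdio.h>

static int write_sysfs(const char *path, const char *value)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return -1;
        }
        fprintf(f, "%s\n", value);
        return fclose(f);
}

int main(void)
{
        write_sysfs("/sys/block/sda/device/queue_depth", "31");  /* existing knob */
        write_sysfs("/sys/block/sda/device/max_retries", "0");   /* hypothetical knob */
        return 0;
}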
Or perhaps we could have the mid-layer always "early-exit"
without retries for "MEDIUM_ERROR", and still do retries for the rest.
When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
as the drive itself has already done internal retries (libata uses the
"with retry" ATA opcodes for this).
But meanwhile, we still have the original issue too, where a single stray
bad sector can blow a system out of the water, because the mid-layer
currently aborts everything after it from a large merged request.
Thus the original patch from this thread. :)
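For anyone who missed the start of the thread, a toy illustration of what that patch is after -- sector numbers are made up, and this models the intent only, not the patch itself:

/* Merged request spanning many sectors, with one unreadable sector
 * in the middle.  Numbers are invented for illustration.
 */
#include <stdio.h>

static void complete_request(int start, int nr_sectors, int bad_sector, int patched)
{
        int end = start + nr_sectors - 1;

        printf("sectors %d-%d: completed OK\n", start, bad_sector - 1);
        if (!patched) {
                /* current behaviour: the bad sector and everything
                 * merged after it come back as errors */
                printf("sectors %d-%d: all failed\n", bad_sector, end);
                return;
        }
        /* behaviour the patch aims for: only the unreadable sector fails,
         * the rest of the merged request still gets done */
        printf("sector  %d: medium error reported\n", bad_sector);
        printf("sectors %d-%d: completed OK\n", bad_sector + 1, end);
}

int main(void)
{
        complete_request(1000, 64, 1010, 0);    /* current mid-layer behaviour */
        complete_request(1000, 64, 1010, 1);    /* with the patch from this thread */
        return 0;
}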
Cheers