Re: [RFC][WIP] DIO simplification and AIO-DIO stability

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2006-02-24 at 16:42 +0530, Suparna Bhattacharya wrote:
> On Thu, Feb 23, 2006 at 05:21:08PM -0800, Zach Brown wrote:
> > Suparna Bhattacharya wrote:
> > 
> > > A recent AIO-DIO bug reported by Kenneth Chen, came very close
> > > to being the proverbial last straw for me.
> > 
> > Me too, though I found out about it from a different path.  Our QA guys
> > were pulling drives under load and it got stuck.  Trying to fix that bug
> > (io error setting dio->result to -EIO stops finished_one_bio() from
> > calling aio_complete()) without introducing other regressions involved
> > an incredible amount of squinting and head scratching.  In wandering
> > around I found what seem to be other additional bugs:
> > 
> > - errors that hit after dio->result is sampled in the buffered fallback
> > case are lost.  dio->result should be checked again after waiting.
> > 
> > - a few paths try to do arithmetic with dio->result assuming it's the
> > number of bytes transferred when it could be -EIO.
> 
> Yes there is a race in the way dio->result is used both by completion
> path and the post submission path.
> 
> > 
> > - the AIO path seems to forget to check dio->page_errors, but I didn't
> > look very hard to see what that means.
> > 
> > - the AIO bio completion paths don't populate dio->bio_list so reaping
> > doesn't happen in the AIO issuing case.. maybe that's intentional?
> 
> It is intentional. The async case operates differently in that it
> doesn't need/use the reaping logic at all. It just submits the entire
> IO outright, without the pipelining sophistication of the original
> synchronous DIO code. That's yet another point of divergence between
> AIO and synchronous path, perhaps it would have been simpler if both
> followed the same logic.
> 
> > 
> > > It would be quite pointless (and painful!), if the rewrite ends up becoming
> > > just as tricky and error prone as before. Such a patch will need a very
> > > close critical review by many sharp eyes, to avoid disrupting the current
> > > state of stability.
> > 
> > So, I'm all for wringing the current bugs and confusion out of the
> > current code.  But the words "a patch" and "rewrite" terrify me.  It
> 
> Perhaps I shouldn't have used the term rewrite. The proposal retains
> much of the current core logic, but mainly alters the way we
> serialise vs concurrent buffered IO, and other pain points. But it
> would certainly be more than incremental patches to fix individual
> problems.

Yes. locking and error handling desperately needs a re-write, especially
keeping AIO in mind. I would love to see "kicking back to buffered mode"
completely go away. If Ken and Zach are willing to provide help on
looking over & testing error handling cases (with pulling drives :)), 
I have no problem with re-write :)

Thanks,
Badari

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux