Re: [PATCH, RFC/T] Fix handling of write failures to swap devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Richard Purdie wrote:
On Sat, 2006-10-28 at 14:55 +1000, Nick Piggin wrote:

Richard Purdie wrote:

On Fri, 2006-10-27 at 18:22 +1000, Nick Piggin wrote:

This is the right approach to handling swap write errors. However, you need
to cut down on the amount of code duplication.

The code is subtly different to the swapoff code but I'll take another
look and see if I can refactor it now I have it all working.

Subtly different code is the worst kind of code to be duplicating. It
really needs improving, I think.


I'm open to suggestions as to how to do it. The main problem was that in
in order to mark the page bad you needed to hold an extra reference on
the page so that the "mark bad" code would be the last code to touch the
page. The swapoff code doesn't need this and I don't like the idea of
passing some count value to a common function as that would be
confusing. I guess swapoff could start taking an extra reference but I
can see people objecting to that too as it doesn't need it.

If you have the page locked (which you can't from where you're trying,
but we'll tackle that next), and the page is swapcache, then doesn't
that pin you a reference to the swap entry as well? So I still don't
see why you need that extra reference.... I know there is a
try_to_unuse_page/entry hiding in try_to_unuse somewhere ;)

It's just the wrong thing to do if the page has been set writeback with
a valid mapping. Presently we don't do any mapping specific accounting
in that path, but we could.


Ok, I couldn't find a specific problem with calling that code with a
page set as writeback and I don't know enough about that code to realise
its the "wrong thing to do". I didn't really want to duplicate the set
of delete_from_swap_cache functions.

That's just how the pagecache works in general. You might be right
that there isn't anything wrong in this case, but hopefully we can
avoid it.

But now that I look at your patch, I don't think it is going to work.
end_swap_bio_write can be called from interrupt context, so you can't
lock the page, and you can't take any of those swap specific spinlocks
either.


This would seem to be a major problem. My tests were done with a driver
that wouldn't use interrupt context there. Is the writeback lock enough
or is the page lock really needed? I suspect some of the functions don't
like being called without the page lock.

The writeback bit isn't a lock, and yes the page lock is needed when
dealing with swapcache.

That isn't your only problem though, and we simply don't want to do
this (potentially expensive) unusing from interrupt context. Noting
the error and dealing with it in process context I think is the best
way to do it.

BTW. what's the driver you're using? It might be useful to have an
option to schedule a timer for the completions (at least the ones
which generate errors).

You say that SetPageError makes the processes die unexpectedly? How and
where? We use SetPageError for IO errors, and it doesn't mean the page
has errors AFAIK.


Most of the testing I did was on ARM. If you mark a page with
SetPageError, the next time userspace tries to access it, you get a data
abort and the process dies with a SIGBUS or other faults (even if the
page is marked as up to date). The effect was very clear and I just
assumed it was behaving as intended. It was this problem I started to
address as I didn't like processes randomly dieing...

I can't for the life of me see how or why this is happening. I don't
doubt it is a problem, but it smells like another bug.

Ignoring the filesystem calls to PageError(), the remaining two are
wait_on_page_writeback_range() and write_one_page() which something
like:

if (PageError(page))
	return -EIO;

I'd guess something somewhere acts on the EIO and ignores the fact the
page is uptodate. I'm struggling a bit to work out the exact codepaths
though.

I don't think we have to worry about any of the filesystem code, even
if we are talking about a swap file. It should all go straight through
page_io.c. And wait_on_page_writeback doesn't have its return value
checked in any of the swap code AFAIK...

Still, something must be triggering it somewhere.

The best policy would probably be to keep the end_page_writeback path as
it is, and then detect the PageError in the swap out path somewhere.


I wanted to do it this way but couldn't as you end up with processes
dieing if you don't correct the PageError before the page is accessed.
Can someone comment on exactly what PageError should/shouldn't meant? Is
this an ARM specific issue or does it happen on other architectures too?

The possible solutions I can see are:

1. ARM has the PageError behaviour wrong somewhere and if so, there
might be different ways to handle this. 2. The PageError behaviour is correct so we need to find some other way
to mark a page as needing marking as bad and do so out of interrupt
context.

PageError means that the page could not be written / read from disk.
This is true in the pagecache / filesystem code too, and it is
PageUptodate that tells whether the page itself is valid or not.

3. Any other ideas?

Can you give me a hint as to where in the swap out path you might want
to detect the error? The only way I can see is to have the end of
swap_writepage() wait_on_page_writeback() but that would appear to have
significant performance implications. I'm struggling to see a way you
can do this which is always outside interrupt context.

swap_writepage sounds better than . You've even got the page
already locked for you, so the job's half done ;)

What performance implications did you imagine? The fastpath will
just be a single PageError test, and error case slowpath doesn't
matter too much beyond having something that actually works.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux