Re: [RFC] splice() and readahead interaction

Fengguang Wu wrote:
2007/5/2, Eric Dumazet <[email protected]>:

    Since you work on readahead, could you please find out why the
    following program triggers a problem in the splice() syscall?

    Description :

    I tried to use splice(SPLICE_F_NONBLOCK) in a non-blocking
    environment, in an attempt to implement cheap AIO and to use the
    zero-copy splice() feature.

    I quickly found that readahead in splice() is not really working.

    To demonstrate the problem, just compile the attached program, and
    use it to pipe a big file (not yet in cache) to /dev/null :

    $ gcc -o spliceout spliceout.c
    $ spliceout -d BIGFILE | cat >/dev/null
    offset=49152 ret=49152
    offset=65536 ret=16384
    offset=131072 ret=65536
    ...no more progress...   (splice() returns -1 and EAGAIN)

    Reading the splice(SPLICE_F_NONBLOCK) syscall implementation, I expected
    to exploit its ability to call readahead() and to make some progress
    when pages are ready in the cache.

    But apparently, even on an idle machine, it is not working as expected.
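
The attached spliceout.c is not reproduced in this archive. As a rough illustration of the kind of call described above, here is a minimal sketch of one non-blocking splice step from a file to a pipe. The helper name splice_step() and its parameters are hypothetical and are not taken from the original program; only splice(), SPLICE_F_NONBLOCK, EAGAIN and EINTR are real.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from spliceout.c): try to move up to `len`
 * bytes from a regular file into the write end of a pipe without blocking.
 *
 * Returns the number of bytes moved, 0 at end of file, or -1 with errno
 * set. errno == EAGAIN means nothing could be moved right now and the
 * caller is expected to retry later. On success the kernel advances
 * *offset by the number of bytes moved.
 */
static ssize_t splice_step(int file_fd, loff_t *offset, int pipe_wr, size_t len)
{
    ssize_t ret;

    assert(offset != NULL);
    do {
        ret = splice(file_fd, offset, pipe_wr, NULL, len, SPLICE_F_NONBLOCK);
    } while (ret == -1 && errno == EINTR); /* restart if interrupted by a signal */

    return ret;
}
```

A caller would typically loop on splice_step(), printing the offset and return value after each call, and treat a persistent EAGAIN as the stall reported in this message.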



Eric Dumazet, thank you for disclosing this bug.

Readahead logic somehow fails to populate the page range with data.
This can happen because
1) the readahead routine is not always called in the following lines of fs/splice.c:
        if (!loff || nr_pages > 1)
                page_cache_readahead(mapping, &in->f_ra, in, index, nr_pages);
2) even when called, page_cache_readahead() won't guarantee the pages are there.
   It won't submit readahead I/O for pages already in the radix tree, or when (ra_pages == 0), or after 256 cache hits.

In your case, the cause is most likely the retried reads, which lead to excessive cache hits and eventually disable readahead.

And that _one_ failure of readahead blocks the whole read process.
The application receives EAGAIN and retries the read, but __generic_file_splice_read() refuses to make progress:
- in the previous invocation, it has allocated a blank page and inserted it into the radix tree, but never had the chance to start I/O for it: the test of SPLICE_F_NONBLOCK comes before that.
- in the retried invocation, the readahead code will neither get out of the cache hit mode, nor will it submit I/O for an already existing page.
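
The stall described above can be illustrated with a toy user-space model. It is a sketch under simplifying assumptions, not kernel code: struct page_model, the enum values and all function names are hypothetical, and the "patched" variant only mimics the idea of trying to take the page lock instead of bailing out unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page that is already in the radix tree. */
struct page_model {
    bool locked;     /* true while someone holds the page lock */
    bool uptodate;   /* true once the data has been read in */
    bool io_started; /* true once a read has been submitted */
};

enum outcome { WOULD_BLOCK, STARTED_IO, DATA_READY };

/* Old logic: in non-blocking mode, give up before any I/O is submitted. */
static enum outcome attempt_old(struct page_model *pg)
{
    assert(pg != NULL);
    if (pg->uptodate)
        return DATA_READY;
    return WOULD_BLOCK; /* the SPLICE_F_NONBLOCK test comes first */
}

/* Patched logic: try the lock; only give up if someone else holds it. */
static enum outcome attempt_patched(struct page_model *pg)
{
    assert(pg != NULL);
    if (pg->uptodate)
        return DATA_READY;
    if (pg->locked)
        return WOULD_BLOCK; /* lock is held, so I/O is presumably in flight */
    pg->locked = true;      /* we got the lock without sleeping */
    pg->io_started = true;  /* so the read can be submitted now */
    return STARTED_IO;
}

/* Retry against a blank, unlocked page; return the attempt that got past
 * WOULD_BLOCK, or -1 if none did within max_attempts. */
static int attempts_until_progress(enum outcome (*attempt)(struct page_model *),
                                   int max_attempts)
{
    struct page_model pg = { false, false, false };

    for (int i = 1; i <= max_attempts; i++) {
        if (attempt(&pg) != WOULD_BLOCK)
            return i;
    }
    return -1;
}
```

With the old logic the blank page never gets its I/O started, so every retry in the model reports WOULD_BLOCK; with the trylock variant the first retry already submits the read.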

The attached patch should fix the critical splice bug. Sorry for not being able to test it locally for now - I'm at home and running Knoppix. The readahead bug will be fixed by the upcoming on-demand readahead patch. I should be back to submit it in about a week.

Thank you,
Fengguang Wu


------------------------------------------------------------------------

--- linux-2.6.21.1/fs/splice.c.old	2007-05-05 04:40:38.000000000 -0400
+++ linux-2.6.21.1/fs/splice.c	2007-05-05 04:41:59.000000000 -0400
@@ -378,10 +378,11 @@
 			 * If in nonblock mode then dont block on waiting
 			 * for an in-flight io page
 			 */
-			if (flags & SPLICE_F_NONBLOCK)
-				break;
-
-			lock_page(page);
+			if (flags & SPLICE_F_NONBLOCK) {
+				if (TestSetPageLocked(page))
+					break;
+			} else
+				lock_page(page);
 			/*
 			 * page was truncated, stop here. if this isn't the

Sorry for the delay.

This patch solves the problem, thank you!

