Re: [PATCH] Prevent large file writeback starvation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



David Chinner <[email protected]> wrote:
>
>     306 static void
>     307 sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
>     308 {
>     309         const unsigned long start = jiffies;    /* livelock avoidance */
>     310
>     311         if (!wbc->for_kupdate || list_empty(&sb->s_io))
>     312                 list_splice_init(&sb->s_dirty, &sb->s_io);
>     313
>     314         while (!list_empty(&sb->s_io)) {
> 
> Correct me if I'm wrong, but my reading of this is that for
> wb_kupdate, we only ever move s_dirty to s_io when s_io is empty.
> then we iterate over s_io until all inodes are moved off this list
> or we hit someother termination criteria. This is why i left the
> large inode on the head of the s_io list until congestion was
> encountered - so that wb_kupdate returned to it first in it's next
> pass.
> 
> So when we get to a young inode on the s_io list, we abort the
> writeback loop for that filesystem with wbc->nr_to_write > 0 and
> return to wb_kupdate....
> 
> However, we still have an inode with lots of dirty data on the head of
> s_dirty, which we can do nothing with until s_io is emptied by
> wb_kupdate.

That sounds right.  I guess what is happening is that we get into the state
where your big-dirty-file is on s_dirty and there are a few
small-dirty-files on s_io.  sync_sb_inodes() writes the small files,
returns "small number of pages written" and that causes wb_kupdate() to
terminate.

I think the problem here is that the wb_kupdate() termination test is wrong
- it should just keep going.

We have another bug due to this: if you create a large number of
zero-length files on a traditional filesystem (ext2, minix, ...), the
writeout of these inodes doesn't work correctly - it takes ages.  Because
the wb_kupdate logic is driven by "number of dirty pages", and all those
dirty inodes have zero dirty pages associated with them.  wb_kupdate says
"oh, nothing to do" and gives up.

So to fix both these problems we need to be smarter about terminating the
wb_kupdate() loop.  Something like "loop until no expired inodes have been
written".

Wildly untested patch:


diff -puN include/linux/writeback.h~wb_kupdate-fix-termination-condition include/linux/writeback.h
--- 25/include/linux/writeback.h~wb_kupdate-fix-termination-condition	Mon Feb  6 15:09:32 2006
+++ 25-akpm/include/linux/writeback.h	Mon Feb  6 15:10:33 2006
@@ -58,6 +58,7 @@ struct writeback_control {
 	unsigned for_kupdate:1;		/* A kupdate writeback */
 	unsigned for_reclaim:1;		/* Invoked from the page allocator */
 	unsigned for_writepages:1;	/* This is a writepages() call */
+	unsigned wrote_expired_inode:1;	/* 1 or more expired inodes written */
 };
 
 /*
diff -puN fs/fs-writeback.c~wb_kupdate-fix-termination-condition fs/fs-writeback.c
--- 25/fs/fs-writeback.c~wb_kupdate-fix-termination-condition	Mon Feb  6 15:09:32 2006
+++ 25-akpm/fs/fs-writeback.c	Mon Feb  6 15:11:58 2006
@@ -367,6 +367,8 @@ sync_sb_inodes(struct super_block *sb, s
 		__iget(inode);
 		pages_skipped = wbc->pages_skipped;
 		__writeback_single_inode(inode, wbc);
+		if (unlikely(wbc->wrote_expired_inode == 0))
+			wbc->wrote_expired_inode = 1;
 		if (wbc->sync_mode == WB_SYNC_HOLD) {
 			inode->dirtied_when = jiffies;
 			list_move(&inode->i_list, &sb->s_dirty);
diff -puN mm/page-writeback.c~wb_kupdate-fix-termination-condition mm/page-writeback.c
--- 25/mm/page-writeback.c~wb_kupdate-fix-termination-condition	Mon Feb  6 15:09:32 2006
+++ 25-akpm/mm/page-writeback.c	Mon Feb  6 15:12:43 2006
@@ -414,8 +414,9 @@ static void wb_kupdate(unsigned long arg
 	while (nr_to_write > 0) {
 		wbc.encountered_congestion = 0;
 		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
+		wbc.wrote_expired_inode = 0;
 		writeback_inodes(&wbc);
-		if (wbc.nr_to_write > 0) {
+		if (wbc.wrote_expired_inode == 0) {
 			if (wbc.encountered_congestion)
 				blk_congestion_wait(WRITE, HZ/10);
 			else
_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux