David Chinner <[email protected]> wrote:
>
> 306 static void
> 307 sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
> 308 {
> 309 const unsigned long start = jiffies; /* livelock avoidance */
> 310
> 311 if (!wbc->for_kupdate || list_empty(&sb->s_io))
> 312 list_splice_init(&sb->s_dirty, &sb->s_io);
> 313
> 314 while (!list_empty(&sb->s_io)) {
>
> Correct me if I'm wrong, but my reading of this is that for
> wb_kupdate, we only ever move s_dirty to s_io when s_io is empty.
> then we iterate over s_io until all inodes are moved off this list
> or we hit someother termination criteria. This is why i left the
> large inode on the head of the s_io list until congestion was
> encountered - so that wb_kupdate returned to it first in it's next
> pass.
>
> So when we get to a young inode on the s_io list, we abort the
> writeback loop for that filesystem with wbc->nr_to_write > 0 and
> return to wb_kupdate....
>
> However, we still have an inode with lots of dirty data on the head of
> s_dirty, which we can do nothing with until s_io is emptied by
> wb_kupdate.
That sounds right. I guess what is happening is that we get into the state
where your big-dirty-file is on s_dirty and there are a few
small-dirty-files on s_io. sync_sb_inodes() writes the small files,
returns "small number of pages written" and that causes wb_kupdate() to
terminate.
I think the problem here is that the wb_kupdate() termination test is wrong
- it should just keep going.
We have another bug due to this: if you create a large number of
zero-length files on a traditional filesystem (ext2, minix, ...), the
writeout of these inodes doesn't work correctly - it takes ages. Because
the wb_kupdate logic is driven by "number of dirty pages", and all those
dirty inodes have zero dirty pages associated with them. wb_kupdate says
"oh, nothing to do" and gives up.
So to fix both these problems we need to be smarter about terminating the
wb_kupdate() loop. Something like "loop until no expired inodes have been
written".
Wildly untested patch:
diff -puN include/linux/writeback.h~wb_kupdate-fix-termination-condition include/linux/writeback.h
--- 25/include/linux/writeback.h~wb_kupdate-fix-termination-condition Mon Feb 6 15:09:32 2006
+++ 25-akpm/include/linux/writeback.h Mon Feb 6 15:10:33 2006
@@ -58,6 +58,7 @@ struct writeback_control {
unsigned for_kupdate:1; /* A kupdate writeback */
unsigned for_reclaim:1; /* Invoked from the page allocator */
unsigned for_writepages:1; /* This is a writepages() call */
+ unsigned wrote_expired_inode:1; /* 1 or more expired inodes written */
};
/*
diff -puN fs/fs-writeback.c~wb_kupdate-fix-termination-condition fs/fs-writeback.c
--- 25/fs/fs-writeback.c~wb_kupdate-fix-termination-condition Mon Feb 6 15:09:32 2006
+++ 25-akpm/fs/fs-writeback.c Mon Feb 6 15:11:58 2006
@@ -367,6 +367,8 @@ sync_sb_inodes(struct super_block *sb, s
__iget(inode);
pages_skipped = wbc->pages_skipped;
__writeback_single_inode(inode, wbc);
+ if (unlikely(wbc->wrote_expired_inode == 0))
+ wbc->wrote_expired_inode = 1;
if (wbc->sync_mode == WB_SYNC_HOLD) {
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &sb->s_dirty);
diff -puN mm/page-writeback.c~wb_kupdate-fix-termination-condition mm/page-writeback.c
--- 25/mm/page-writeback.c~wb_kupdate-fix-termination-condition Mon Feb 6 15:09:32 2006
+++ 25-akpm/mm/page-writeback.c Mon Feb 6 15:12:43 2006
@@ -414,8 +414,9 @@ static void wb_kupdate(unsigned long arg
while (nr_to_write > 0) {
wbc.encountered_congestion = 0;
wbc.nr_to_write = MAX_WRITEBACK_PAGES;
+ wbc.wrote_expired_inode = 0;
writeback_inodes(&wbc);
- if (wbc.nr_to_write > 0) {
+ if (wbc.wrote_expired_inode == 0) {
if (wbc.encountered_congestion)
blk_congestion_wait(WRITE, HZ/10);
else
_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]