> None of the above.
>
> [PATCH] vm: pageout throttling
>
> With silly pageout testcases it is possible to place huge amounts of memory
> under I/O. With a large request queue (CFQ uses 8192 requests) it is
> possible to place _all_ memory under I/O at the same time.
>
> This means that all memory is pinned and unreclaimable and the VM gets
> upset and goes oom.
>
> The patch limits the amount of memory which is under pageout writeout to be
> a little more than the amount of memory at which balance_dirty_pages()
> callers will synchronously throttle.
>
> This means that heavy pageout activity can starve heavy writeback activity
> completely, but heavy writeback activity will not cause starvation of
> pageout, because we don't want a simple `dd' to cause excessive
> latencies in page reclaim.
>
> afaict that problem is still there. It is possible to get all of
> ZONE_NORMAL dirty on a highmem machine. With a large queue (or lots of
> queues), vmscan can then place all of ZONE_NORMAL under IO.
>
> It could be that we've fixed this problem via other means in the interim,
> but from a quick peek it seems to me that the scanner will still do a 100%
> CPU burn when all of a zone's pages are under writeback.
Ah, OK.
I did read the changelog, but you added quite a bit of translation ;)
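For reference, my mental model of the throttle is roughly this (a
simplified user-space sketch; the counter, the threshold and the sleep
are stand-ins, not the actual kernel interfaces):

/*
 * Sketch of the writeout throttle described in the changelog above.
 * dirty_thresh is the balance_dirty_pages() threshold; nr_writeback
 * is assumed to be updated elsewhere as writeback completes.
 */
#include <unistd.h>

static void throttle_writeout_sketch(long dirty_thresh,
				     const volatile long *nr_writeback)
{
	/* Allow a little more than the dirty threshold, so that heavy
	 * writeback cannot completely starve page reclaim. */
	long limit = dirty_thresh + dirty_thresh / 10;

	while (*nr_writeback > limit) {
		/* The kernel waits for request-queue congestion to
		 * ease; a short sleep stands in for that here. */
		usleep(100 * 1000);
	}
}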
> throttle_vm_writeout() should be a per-zone thing, I guess. Perhaps fixing
> that would fix your deadlock. That's doubtful, but I don't know anything
> about your deadlock so I cannot say.
No, doing the throttling per-zone won't in itself fix the deadlock.
Here's a deadlock example:
Total memory = 32M
/proc/sys/vm/dirty_ratio = 10
dirty_threshold = 3M
ratelimit_pages = 1M
Some program dirties 4M (dirty_threshold + ratelimit_pages) of mmapped
memory on a fuse fs. Page balancing (balance_dirty_pages()) is called,
which turns all of these dirty pages into writeback pages.
Then the userspace filesystem (the fuse server) gets a write request
and tries to allocate the memory needed to complete the writeout.
That will possibly trigger direct reclaim, and throttle_vm_writeout()
will be called. That will block until nr_writeback goes below 3.3M
(dirty_threshold + 10%). But all 4M of writeback belongs to the fuse
fs, and the only task that could complete it is the fuse server, which
is itself now blocked, so that will never happen.
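To make the arithmetic concrete (purely illustrative, using the rounded
figures from the example above):

#include <stdio.h>

int main(void)
{
	double dirty_threshold = 3.0;		/* MB, ~10% of 32M        */
	double throttle_limit  = dirty_threshold * 1.1; /* +10% = ~3.3M  */
	double fuse_writeback  = 4.0;		/* MB, all on the fuse fs */

	printf("throttle limit    = %.1f MB\n", throttle_limit);
	printf("writeback on fuse = %.1f MB\n", fuse_writeback);

	/*
	 * throttle_vm_writeout() waits for writeback to drop below the
	 * limit, but the fuse server that would have to complete that
	 * writeback is itself the task stuck waiting here.
	 */
	if (fuse_writeback > throttle_limit)
		printf("allocation blocks forever (%.1f > %.1f)\n",
		       fuse_writeback, throttle_limit);
	return 0;
}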
Does that explain it better?
Miklos