On Tue, 2007-09-11 at 22:36 -0400, John Stoffel wrote:

> Peter> Scale writeback cache per backing device, proportional to its
> Peter> writeout speed. By decoupling the BDI dirty thresholds a
> Peter> number of problems we currently have will go away, namely:
>
> Ah, this clarifies my questions! Thanks!
>
> Peter> - mutual interference starvation (for any number of BDIs);
> Peter> - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts).
>
> Peter> It might be that all dirty pages are for a single BDI while
> Peter> other BDIs are idling. By giving each BDI a 'fair' share of the
> Peter> dirty limit, each one can have dirty pages outstanding and make
> Peter> progress.
>
> Question, can you change (shrink) the limit on a BDI while it has IO
> in flight? And what will that do to the system? I.e. if you have one
> device doing IO, so that it has a majority of the dirty limit, and
> then another device starts IO, and it's a *faster* device, how
> quickly/slowly do the BDI dirty limits change for both the old and
> new device?

Yes, it can change while in use. A measure of how quickly it can
change is roughly: it can halve in a dirty_limit worth of writeout.

What will happen is that the processes doing heavy IO on the slower
device will get throttled more aggressively until it is below its new
threshold again. However, all the while it will keep writing at (full)
speed, because it has this backlog to rid itself of; and by doing
that it completes writeouts, which ensures it keeps part of the dirty
limit for itself and can thus always make progress.

You can monitor this by looking at /sys/block/sd*/queue/cache_size
while doing such a thing. It should stabilise quite 'quickly'.

> Peter> A global threshold also creates a deadlock for stacked BDIs;
> Peter> when A writes to B, and A generates enough dirty pages to get
> Peter> throttled, B will never start writeback until the dirty pages
> Peter> go away. Again, by giving each BDI its own 'independent' dirty
> Peter> limit, this problem is avoided.
>
> Peter> So the problem is to determine how to distribute the total
> Peter> dirty limit across the BDIs fairly and efficiently. A DBI that
>
> You mean BDI here, not DBI.

Uhh, yeah, obviously :-)

> Peter> has a large dirty limit but does not have any dirty pages
> Peter> outstanding is a waste.
>
> Peter> What is done is to keep a floating proportion between the BDIs
> Peter> based on writeback completions. This way faster/more active
> Peter> devices get a larger share than slower/idle devices.
>
> Does a slower device get a BDI which is calculated to keep its limit
> under a certain number of seconds of outstanding IO? This way no
> device can build up more than, say, 15 seconds of outstanding IO to
> flush at any one time.

Perhaps already answered above: as long as there is dirty stuff to
write out, it will keep completing writes and thus retain a stable
share of the dirty limit.
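
Since the floating proportion is the crux here, below is a minimal
userspace sketch of the idea - not the actual patch code; the names
bdi_writeout_inc(), bdi_dirty_limit() and the period size are made up
for illustration. Every writeout completion bumps a per-BDI counter,
and all counters are halved once per fixed number of global events, so
an idle device's share (and with it its slice of the total dirty
limit) decays away while active devices gain:

/*
 * Userspace sketch of per-BDI floating proportions; all names are
 * illustrative, not the patch's API.  Each BDI counts writeout
 * completions; every "period" all counts are halved, so a device
 * that stops completing writes sees its share of the global dirty
 * limit shrink, while a busy device's share grows.
 */
#include <stdio.h>

#define NR_BDIS        2
#define PERIOD_EVENTS  1024  /* decay all counts after this many global events */

struct bdi {
	const char *name;
	unsigned long completions;  /* decaying writeout-completion count */
};

static struct bdi bdis[NR_BDIS] = { { "fast", 0 }, { "slow", 0 } };
static unsigned long global_events;

/* Called once per completed writeout on @b. */
static void bdi_writeout_inc(struct bdi *b)
{
	b->completions++;
	if (++global_events >= PERIOD_EVENTS) {
		/* New period: halve every BDI's count (exponential decay). */
		for (int i = 0; i < NR_BDIS; i++)
			bdis[i].completions /= 2;
		global_events = 0;
	}
}

/* This BDI's slice of the total dirty limit, proportional to its share. */
static unsigned long bdi_dirty_limit(struct bdi *b, unsigned long dirty_limit)
{
	unsigned long total = 0;

	for (int i = 0; i < NR_BDIS; i++)
		total += bdis[i].completions;
	if (!total)
		return dirty_limit / NR_BDIS;  /* no history yet: split evenly */

	return dirty_limit * b->completions / total;
}

int main(void)
{
	/* fast device completes 3 writeouts for every 1 on the slow one */
	for (int i = 0; i < 40000; i++) {
		bdi_writeout_inc(&bdis[0]);
		bdi_writeout_inc(&bdis[0]);
		bdi_writeout_inc(&bdis[0]);
		bdi_writeout_inc(&bdis[1]);
	}
	for (int i = 0; i < NR_BDIS; i++)
		printf("%s: limit %lu of 1000\n",
		       bdis[i].name, bdi_dirty_limit(&bdis[i], 1000));
	return 0;
}

The shares converge to the devices' relative completion rates (here
roughly 750/250 of a limit of 1000), and the periodic halving is also
what bounds how fast a limit can shrink - which matches the "it can
halve in a dirty_limit worth of writeout" estimate above.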