Re: [RFC] [PATCH] A clean approach to writeout throttling

Daniel Phillips wrote:

On Wednesday 05 December 2007 17:24, Andrew Morton wrote:
On Wed, 5 Dec 2007 16:03:01 -0800 Daniel Phillips <[email protected]> wrote:
...a block device these days may not be just a single device, but may be a stack of devices connected together by a generic mechanism such as device mapper, or a hardcoded stack such as multi-disk or network block device. It is necessary to consider the resource requirements of the stack as a whole _before_ letting a transfer proceed into any layer of the stack; otherwise deadlock on many partially completed transfers becomes a possibility. For this reason, the bio throttling is only implemented at the initial, highest level submission of the bio to the block layer and not for any recursive submission of the same bio to a lower level block device in a stack.
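A minimal user-space sketch of throttling only the outermost submission, assuming a three-layer stack that shares a single reserve. The names `submit`, `complete`, `struct throttle` and the `submit_depth` counter are hypothetical; the counter merely stands in for whatever per-task state marks a nested submission, and none of this is the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_DEPTH 3   /* hypothetical: three devices stacked */

struct throttle {
    int capacity;    /* total reserve for the whole stack */
    int available;   /* reserve not currently charged to a request */
};

struct request {
    int cost;   /* units needed by the whole stack, charged once */
    int hops;   /* layers that have handled this request */
};

/* Stand-in for per-task state: > 0 while inside a stack submission. */
static int submit_depth;

/* Returns false if the request must wait for reserve to be returned.
 * Only the outermost call (submit_depth == 0) is throttled; recursive
 * submissions to lower layers pass straight through. */
static bool submit(struct throttle *t, struct request *r, int level)
{
    if (submit_depth == 0) {
        if (t->available < r->cost)
            return false;           /* not enough reserve for the stack */
        t->available -= r->cost;    /* charge the whole stack up front */
    }
    submit_depth++;
    r->hops++;                      /* this layer handles the request */
    if (level + 1 < STACK_DEPTH)
        (void)submit(t, r, level + 1);  /* nested: never throttled */
    submit_depth--;
    return true;
}

/* Completion at the bottom of the stack returns the whole charge. */
static void complete(struct throttle *t, struct request *r)
{
    t->available += r->cost;
    assert(t->available <= t->capacity);
}
```

Because the charge is taken once at the top, a request that has been admitted can never stall halfway down the stack waiting for reserve held by its own earlier layers.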
This in turn has rather far-reaching implications: the top-level device in a stack must take care of inspecting the entire stack in order to determine how to calculate its resource requirements, thus becoming the boss device for the entire stack. Though this intriguing idea could easily become the cause of endless design work and many thousands of lines of fancy code, today I sidestep the question entirely using the "just provide lots of reserve" strategy. Horrifying as it may seem to some, this is precisely the strategy that Linux has used in the context of resource management in general, from the very beginning and likely continuing for quite some time into the future. My strongly held opinion in this matter is that we need to solve the real, underlying problems definitively with nice code before declaring the opening of fancy patch season. So I am leaving further discussion of automatic resource discovery algorithms and the like out of this post.
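To make the "boss device" idea concrete, here is a hypothetical sketch of a top-level device walking its stack once and sizing a single, deliberately generous reserve. `struct layer`, `stack_reserve_pages`, the per-layer worst-case figures, the in-flight limit and the safety factor are all invented for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one layer in a block device stack. */
struct layer {
    unsigned worst_case_pages;   /* guessed per-request requirement */
    const struct layer *below;   /* next device down, or NULL */
};

/* The top-level ("boss") device walks the whole stack once and sizes a
 * single reserve: per-request need, times the number of requests allowed
 * in flight, times a safety factor to cover whatever the guesses missed. */
static unsigned stack_reserve_pages(const struct layer *top,
                                    unsigned max_in_flight,
                                    unsigned safety_factor)
{
    unsigned per_request = 0;

    assert(safety_factor >= 1);
    for (const struct layer *l = top; l != NULL; l = l->below)
        per_request += l->worst_case_pages;
    return per_request * max_in_flight * safety_factor;
}
```

The safety factor is the "lots of reserve" part: it trades memory for not having to know each layer's requirement exactly.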
Rather than asking the stack "how much memory will this request consume"
you could instead ask "how much memory are you currently using".
ie: on entry to the stack, do
	current->account_block_allocations = 1;
	make_request(...);
	rq->used_memory += current->pages_used_for_block_allocations;

and in the page allocator do

	if (!in_interrupt() && current->account_block_allocations)
		current->pages_used_for_block_allocations++;

and then somehow handle deallocation too ;)
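A user-space sketch of the accounting idea above, assuming a single task and `malloc` in place of the page allocator. The two task fields follow the message; `alloc_page_sketch`, `make_request`, `submit_accounted`, `complete_request` and `SKETCH_PAGE_SIZE` are hypothetical, and the `in_interrupt()` check is omitted because user space has no equivalent. Deallocation is handled in just one possible way: pages are uncharged from the request when it completes.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096
#define MAX_PAGES 8

/* User-space stand-in for the task fields named in the message. */
struct task {
    int account_block_allocations;
    long pages_used_for_block_allocations;
};

struct request {
    void *pages[MAX_PAGES];
    int npages;
    long used_memory;   /* pages charged to this request */
};

static struct task task0;
static struct task *current = &task0;

/* Allocator hook: count pages only while the accounting flag is set. */
static void *alloc_page_sketch(void)
{
    void *p = malloc(SKETCH_PAGE_SIZE);
    if (p && current->account_block_allocations)
        current->pages_used_for_block_allocations++;
    return p;
}

/* Stand-in for the stack's make_request: allocates what it needs. */
static void make_request(struct request *rq, int pages_needed)
{
    for (int i = 0; i < pages_needed && rq->npages < MAX_PAGES; i++) {
        void *p = alloc_page_sketch();
        if (p)
            rq->pages[rq->npages++] = p;
    }
}

/* Entry to the stack: reset the counter, set the flag, submit, clear
 * the flag, then charge the request with what was allocated meanwhile. */
static void submit_accounted(struct request *rq, int pages_needed)
{
    current->pages_used_for_block_allocations = 0;
    current->account_block_allocations = 1;
    make_request(rq, pages_needed);
    current->account_block_allocations = 0;
    rq->used_memory += current->pages_used_for_block_allocations;
}

/* On completion: free the request's pages and uncharge them. */
static void complete_request(struct request *rq)
{
    while (rq->npages > 0) {
        free(rq->pages[--rq->npages]);
        rq->used_memory--;
    }
    assert(rq->used_memory == 0);
}
```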
Ah, and how do you ensure that you do not deadlock while making this
inquiry?  Perhaps send a dummy transaction down the pipe?  Even so,
deadlock is possible, quite evidently so in the real life example I have
at hand.

Yours is essentially one of the strategies I had in mind, the other major
one being simply to examine the whole stack, which presupposes some
as-yet-nonexistent kernel wide method of representing block device
stacks in all their glorious possible topology variations.
The basic idea being to know in real time how much memory a particular
block stack is presently using.  Then, on entry to that stack, if the
stack's current usage is too high, wait for it to subside.
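A sketch of the wait-on-entry idea, assuming POSIX threads and a hypothetical per-stack `struct stack_usage` with `stack_enter`, `stack_charge` and `stack_uncharge` hooks. It only observes usage: a caller admitted while usage is below the limit can still push it past the limit, since nothing is reserved for the request itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-stack usage tracker. */
struct stack_usage {
    pthread_mutex_t lock;
    pthread_cond_t subsided;   /* signalled when usage drops below limit */
    long in_use;               /* pages currently held by the stack */
    long limit;                /* usage considered "too high" */
};

static void stack_usage_init(struct stack_usage *s, long limit)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->subsided, NULL);
    s->in_use = 0;
    s->limit = limit;
}

/* On entry to the stack: sleep while current usage is too high. */
static void stack_enter(struct stack_usage *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->in_use >= s->limit)
        pthread_cond_wait(&s->subsided, &s->lock);
    pthread_mutex_unlock(&s->lock);
}

/* Hooks the stack would call as it allocates and frees memory. */
static void stack_charge(struct stack_usage *s, long pages)
{
    pthread_mutex_lock(&s->lock);
    s->in_use += pages;
    pthread_mutex_unlock(&s->lock);
}

static void stack_uncharge(struct stack_usage *s, long pages)
{
    pthread_mutex_lock(&s->lock);
    s->in_use -= pages;
    assert(s->in_use >= 0);
    if (s->in_use < s->limit)
        pthread_cond_broadcast(&s->subsided);  /* wake waiting submitters */
    pthread_mutex_unlock(&s->lock);
}
```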
We do not wait for high block device resource usage to subside before
submitting more requests.  The improvement you suggest is aimed at
automatically determining resource requirements by sampling a
running system, rather than requiring a programmer to determine them
arduously by hand.  Something like automatically determining a
workable locking strategy by analyzing running code, wouldn't that be
a treat?  I will hope for one of those under my tree at Christmas.

The problem is that you (a) may or may not know just how bad a worst case can be, and (b) may block unnecessarily by being pessimistic.

The dummy transaction would be nice, but it would be perfect if you could send the real transaction down with a max memory limit and a flag, have each level check and decrement the max by what's actually needed, and then return some pass/fail status for that particular transaction. Clearly every level in the stack would have to know how to do that. It would seem that once excess memory use was detected the transaction could be failed without deadlock.
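A sketch of the budget-passing proposal, assuming each level knows its own need when the transaction reaches it. `struct level` and `submit_with_budget` are hypothetical, and the flag mentioned above is left out because its exact meaning is not specified. A real implementation would also have to give back whatever the upper levels had already taken when a lower level fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical level in a stack: what it actually needs for this request. */
struct level {
    long pages_needed;
    const struct level *below;   /* next level down, or NULL */
};

/* Each level checks its need against the remaining budget, decrements
 * the budget, and passes the transaction down. The first level that
 * cannot fit fails the whole transaction instead of waiting for memory. */
static bool submit_with_budget(const struct level *lvl, long *budget)
{
    assert(budget != NULL);
    if (lvl == NULL)
        return true;                      /* reached the bottom: pass */
    if (lvl->pages_needed > *budget)
        return false;                     /* over budget: fail, no wait */
    *budget -= lvl->pages_needed;
    return submit_with_budget(lvl->below, budget);
}
```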


--
Bill Davidsen <[email protected]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot