Re: [patch] drivers: wait for threaded probes between initcall levels

On Oct 28, 2006, at 19:55:42, Linus Torvalds wrote:

> On Sun, 29 Oct 2006, Adam J. Richter wrote:
>> If only calls to execute_in_parallel nest, your original implementation would always deadlock when the nesting depth exceeds the allowed number of threads, and also potentially in some shallower nesting depths given a very unlucky order of execution. In your original message, you mentioned allowing the parallelism limit to be set as low as 1.
> No, I'm saying that nesting simply shouldn't be _done_. There's no real reason. Any user would be already either parallel or doesn't need to be parallel at all. Why would something that already _is_ parallel start another parallel task?

Well, I would argue that there actually _is_ a reason; the same reason that GNU make communicates between recursive invocations to control the maximum number of in-progress execution threads ("-j4" will have 4 make targets running at once, _even_ in the presence of recursive make invocations and nested directories). The same applies in the context of recursively nested busses and devices: multiple PCI domains, USB, Firewire, etc.
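
To make the make analogy concrete, here is a rough userspace sketch (pthreads, not kernel code) of a single token pool shared by every nesting level, loosely modeled on the way recursive make invocations share one job limit. All names here (struct job, job_start(), job_wait(), scan_domain(), scan_all()) are hypothetical and are not part of any proposed kernel API. The one assumption worth noting is that a caller who finds no free token runs the work itself instead of blocking, so a nested caller cannot deadlock even with a very small pool.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical job handle; "tokens" is the one pool shared by every level. */
struct job {
	atomic_int *tokens;
	void (*fn)(void *);
	void *arg;
	pthread_t thread;
	int threaded;		/* nonzero if fn runs in its own thread */
};

static void *job_thread(void *data)
{
	struct job *j = data;

	j->fn(j->arg);
	atomic_fetch_add(j->tokens, 1);		/* return the token */
	return NULL;
}

/*
 * Start fn(arg) in a new thread if a token is free, otherwise run it right
 * here.  This never blocks waiting for a token, so nested callers cannot
 * deadlock however small the pool is.
 */
static void job_start(struct job *j, atomic_int *tokens,
		      void (*fn)(void *), void *arg)
{
	j->tokens = tokens;
	j->fn = fn;
	j->arg = arg;
	j->threaded = atomic_fetch_sub(tokens, 1) > 0 &&
		      pthread_create(&j->thread, NULL, job_thread, j) == 0;
	if (!j->threaded) {
		atomic_fetch_add(tokens, 1);	/* no thread started: undo */
		fn(arg);
	}
}

static void job_wait(struct job *j)
{
	if (j->threaded)
		pthread_join(j->thread, NULL);
}

static atomic_int tokens = 2;	/* at most two extra threads in total */
static atomic_int probed;

static void probe_device(void *arg)
{
	(void)arg;
	atomic_fetch_add(&probed, 1);
}

/* One "domain": probes three devices, possibly in parallel. */
static void scan_domain(void *arg)
{
	struct job dev[3];
	int i;

	for (i = 0; i < 3; i++)
		job_start(&dev[i], &tokens, probe_device, arg);
	for (i = 0; i < 3; i++)
		job_wait(&dev[i]);
}

/* Scans two domains, each of which nests its own parallel probes. */
int scan_all(void)
{
	struct job dom[2];
	int i;

	for (i = 0; i < 2; i++)
		job_start(&dom[i], &tokens, scan_domain, NULL);
	for (i = 0; i < 2; i++)
		job_wait(&dom[i]);
	return atomic_load(&probed);
}
```

Calling scan_all() probes all six devices no matter how the two tokens happen to be distributed between the domain level and the device level, and every token is back in the pool once it returns.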

> IOW, what I was trying to say (perhaps badly) is that "nesting" really isn't a sensible operation - you'd never do it. You'd do the "startup" and "shutdown" things at the very highest level, and then in between those calls you can start a parallel activity at any depth of the call stack, but at no point does it really make sense to start it from within something that is already parallel.

Well, perhaps it does. If I have (hypothetically) a 64-way system with several PCI domains, I should be able to not only start scanning each PCI domain individually, but once each domain has been scanned it should be able to launch multiple probing threads, one for each device on the PCI bus. That is, assuming that I have properly set up my udev to statically name devices.

Perhaps it would make more sense for the allow_parallel() call to specify instead a number of *additional* threads to spawn, such that allow_parallel(0) on the top level would force the normal serial boot order, allow_parallel(1) would allow one probing thread and the init thread to both probe hardware at once, etc.
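
The "additional threads" accounting can be sketched in isolation. The following is a userspace model under stated assumptions, not the real implementation: struct par_budget, budget_try_get() and budget_put() are hypothetical names, and the sketch uses a non-blocking try (falling back to "run it yourself") where a real version might block on a semaphore. A reservation has to succeed at every level up the tree, otherwise nothing is reserved.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-context budget; parent links model nested contexts. */
struct par_budget {
	struct par_budget *parent;	/* enclosing context, NULL at the top */
	atomic_int extra;		/* additional threads still allowed */
};

/* Give one slot back to each level from b up to, but excluding, stop. */
static void budget_put(struct par_budget *b, struct par_budget *stop)
{
	for (; b != stop; b = b->parent)
		atomic_fetch_add(&b->extra, 1);
}

/*
 * Try to reserve one extra thread at every level up the tree.
 * Returns 1 on success, or 0 (with nothing reserved) if any level is
 * exhausted, in which case the caller should simply do the work itself.
 */
static int budget_try_get(struct par_budget *b)
{
	struct par_budget *lvl;

	for (lvl = b; lvl; lvl = lvl->parent) {
		if (atomic_fetch_sub(&lvl->extra, 1) <= 0) {
			budget_put(b, lvl->parent);	/* undo, including lvl */
			return 0;
		}
	}
	return 1;
}
```

With a top-level budget of 0, budget_try_get() always fails and everything runs serially in the calling thread. With a top-level budget of 1 and a nested bus budget of 3, only one extra thread can exist at a time, because the tighter limit at the top wins.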

With a little per-thread context on the stack, you could fairly easily keep track of the number of allowed sub-threads on a per-allow_parallel() basis. Before you spawn each new thread, create its new per-thread state for it and pass its pointer to the child thread. With each new do_in_parallel() call it would down the semaphores for each "context" up the tree until it hit the top, and then it would allocate a new context and fork off a new thread for the _previous_ call to do_in_parallel(). The last call would remain unforked, and so finalize_parallel() would first execute that call in the current thread and then reap all of the children by waiting on their completions then freeing their contexts.
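
A minimal userspace sketch of the "fork the previous call, keep the last one" scheme described above might look like the following. It is only a model built on pthreads: par_allow(), par_do() and par_finalize() are hypothetical stand-ins for allow_parallel(), do_in_parallel() and finalize_parallel(), the probe_one() helper is invented for illustration, and for brevity it tracks a single context (no parent chain). It also assumes that when no slot is free the previous call is run inline instead of sleeping on a semaphore.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct child {				/* one forked call, freed when reaped */
	struct child *next;
	struct parallel_state *st;
	pthread_t thread;
	int (*fn)(void *);
	void *arg;
	int result;
};

struct parallel_state {
	atomic_int slots;		/* additional threads still allowed */
	int (*pending_fn)(void *);	/* latest call, not yet started */
	void *pending_arg;
	struct child *children;		/* forked calls awaiting reaping */
	int error;			/* first nonzero result seen */
};

static void note_result(struct parallel_state *st, int ret)
{
	if (ret && !st->error)
		st->error = ret;
}

static void *child_main(void *data)
{
	struct child *c = data;

	c->result = c->fn(c->arg);
	atomic_fetch_add(&c->st->slots, 1);	/* hand the slot back */
	return NULL;
}

/* Run fn in a new thread if a slot is free, otherwise in this thread. */
static void dispatch(struct parallel_state *st, int (*fn)(void *), void *arg)
{
	struct child *c;

	if (atomic_fetch_sub(&st->slots, 1) > 0) {
		c = malloc(sizeof(*c));
		if (c) {
			*c = (struct child){ .st = st, .fn = fn, .arg = arg };
			if (!pthread_create(&c->thread, NULL, child_main, c)) {
				c->next = st->children;
				st->children = c;
				return;
			}
			free(c);
		}
	}
	atomic_fetch_add(&st->slots, 1);	/* no thread started: undo */
	note_result(st, fn(arg));
}

void par_allow(struct parallel_state *st, int extra)
{
	*st = (struct parallel_state){ .slots = extra };
}

void par_do(struct parallel_state *st, int (*fn)(void *), void *arg)
{
	if (st->pending_fn)		/* fork the _previous_ call, if any */
		dispatch(st, st->pending_fn, st->pending_arg);
	st->pending_fn = fn;
	st->pending_arg = arg;
}

int par_finalize(struct parallel_state *st)
{
	struct child *c;

	if (st->pending_fn)		/* the last call stays in this thread */
		note_result(st, st->pending_fn(st->pending_arg));
	while ((c = st->children)) {
		st->children = c->next;
		pthread_join(c->thread, NULL);
		note_result(st, c->result);
		free(c);
	}
	return st->error;
}

/* Hypothetical probe: marks the device probed, fails on a negative marker. */
int probe_one(void *arg)
{
	int *dev = arg;

	if (*dev < 0)
		return -1;
	*dev = 1;
	return 0;
}
```

A caller would invoke par_allow(&st, 2), then par_do(&st, probe_one, &dev) once per device, then par_finalize(&st) to run the last probe itself and collect the first error, if any. With par_allow(&st, 0) no thread is ever created and the probes run in the usual serial order.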

I admit the complexity is a bit high, but since the maximum nesting is bounded by the complexity of the hardware and the number of busses, and the maximum memory allocation is strictly limited in the single-threaded case, this could allow 64-way systems to probe all their hardware an order of magnitude faster than today without noticeably impacting an embedded system even in the absolute worst case.

I _believe_ that this should also be coupled with a bit of cleanup of probe-order dependencies. If a subsystem depends on another being initialized, the depended-on one could very easily export a wait_for_foo_init() function:


DECLARE_COMPLETION(foo_init_completion);
static int foo_init_result;

int wait_for_foo_init(void)
{
	wait_for_completion(&foo_init_completion);
	return foo_init_result;
}

int foo_init(struct parallel_state *state)
{
	struct foo_device *dev;
	
	allow_parallel(state, 3);

#if 1
	/* Assumes: int foo_probe_device(void *dev); */
	for_each_foo_device(dev)
		do_in_parallel(state, foo_probe_device, dev);
#else
	/* Assumes: int foo_probe_device(struct parallel_state *state,
			void *dev); */
	for_each_foo_device(dev)
		do_in_parallel_nested(state, foo_probe_device, dev);
#endif

	foo_init_result = finalize_parallel(state);
	/* complete_all(), so that every caller of wait_for_foo_init() wakes */
	complete_all(&foo_init_completion);
	return foo_init_result;
}

And of course if you wanted to init both the foo and bar busses in parallel you could implement a virtually identical function using the do_in_parallel_nested() variant on top of the foo_init() function.
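
To illustrate the wait_for_foo_init() dependency pattern from the dependent side, here is a small userspace model. The struct completion, complete_all() and wait_for_completion() below are pthread-based stand-ins written for this sketch, not the kernel implementations, and bar_init() and init_foo_and_bar() are hypothetical. The stand-in uses a broadcast so that more than one dependent subsystem could wait on foo.

```c
#include <pthread.h>

/* Userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

#define COMPLETION_INITIALIZER \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);	/* release every waiter */
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static struct completion foo_init_completion = COMPLETION_INITIALIZER;
static int foo_init_result;
static int bar_init_result = -1;

int wait_for_foo_init(void)
{
	wait_for_completion(&foo_init_completion);
	return foo_init_result;
}

static void *foo_init(void *unused)
{
	(void)unused;
	foo_init_result = 0;		/* pretend every foo probe succeeded */
	complete_all(&foo_init_completion);
	return NULL;
}

/* Hypothetical bar subsystem that must not start before foo is ready. */
static void *bar_init(void *unused)
{
	(void)unused;
	bar_init_result = wait_for_foo_init();
	return NULL;
}

/* Start bar before foo to show that launch order no longer matters. */
int init_foo_and_bar(void)
{
	pthread_t foo, bar;

	if (pthread_create(&bar, NULL, bar_init, NULL))
		return -1;
	if (pthread_create(&foo, NULL, foo_init, NULL))
		foo_init(NULL);		/* fall back to running it here */
	else
		pthread_join(foo, NULL);
	pthread_join(bar, NULL);
	return bar_init_result;
}
```

Because bar blocks in wait_for_foo_init() until foo has finished, the two init functions can be started in either order, which is the property that would let initcall ordering be relaxed.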

I'm working on a sample implementation of the allow_parallel(), do_in_parallel(), and finalize_parallel() functions, but I'm going to take the time to make sure it's right. In the meantime, I'm interested in any comments.


Cheers,
Kyle Moffett