On Oct 28, 2006, at 19:55:42, Linus Torvalds wrote:
> On Sun, 29 Oct 2006, Adam J. Richter wrote:
>> If only calls to execute_in_parallel nest, your original
>> implementation would always deadlock when the nesting depth
>> exceeds the allowed number of threads, and also potentially in
>> some shallower nesting depths given a very unlucky order of
>> execution. In your original message, you mentioned allowing the
>> parallelism limit to be set as low as 1.
>
> No, I'm saying that nesting simply shouldn't be _done_. There's no
> real reason. Any user would be already either parallel or doesn't
> need to be parallel at all. Why would something that already _is_
> parallel start another parallel task?

Well, I would argue that there actually _is_ a reason; the same
reason that GNU make communicates between recursive invocations to
control the maximum number of in-progress execution threads ("-j4"
will have 4 make targets running at once, _even_ in the presence of
recursive make invocations and nested directories). The same applies
to recursively nested busses and devices: multiple PCI domains, USB,
FireWire, etc.
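
To make the make analogy concrete: GNU make's jobserver shares one pool of job tokens between all recursive invocations, classically by preloading a pipe with one byte per extra job. The userspace sketch below mimics that idea under stated assumptions. The names token_pool, token_try_get(), token_put() and count_extra_jobs() are invented for illustration and are not part of make or of the kernel proposal; only pipe(), fcntl(), read() and write() are real calls.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical token pool: one byte in the pipe per additional job allowed. */
struct token_pool {
	int rfd;	/* read end: take a token */
	int wfd;	/* write end: return a token */
};

static int token_pool_init(struct token_pool *p, int extra)
{
	int fds[2], flags;

	if (pipe(fds) < 0)
		return -1;
	/* Non-blocking reads turn "pool empty" into an immediate failure. */
	flags = fcntl(fds[0], F_GETFL);
	if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0)
		goto fail;
	while (extra-- > 0)
		if (write(fds[1], "+", 1) != 1)
			goto fail;
	p->rfd = fds[0];
	p->wfd = fds[1];
	return 0;
fail:
	close(fds[0]);
	close(fds[1]);
	return -1;
}

/* Returns 1 if a token was taken, 0 if none is available right now. */
static int token_try_get(struct token_pool *p)
{
	char c;

	return read(p->rfd, &c, 1) == 1;
}

static int token_put(struct token_pool *p)
{
	return write(p->wfd, "+", 1) == 1 ? 0 : -1;
}

/*
 * Every nesting level asks the same pool for per_level tokens and keeps
 * what it gets, as if those jobs were still running.  The return value is
 * the number of extra jobs that could run at once across all levels.
 */
static int count_extra_jobs(struct token_pool *p, int depth, int per_level)
{
	int started = 0, i;

	for (i = 0; i < per_level; i++)
		started += token_try_get(p);
	if (depth > 1)
		started += count_extra_jobs(p, depth - 1, per_level);
	return started;
}
```

Because every level draws from the same pool, the limit holds globally no matter how deep the nesting goes. A level that gets no token does the work itself instead of waiting, which is what keeps a small limit from deadlocking.
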
> IOW, what I was trying to say (perhaps badly) is that "nesting"
> really isn't a sensible operation - you'd never do it. You'd do the
> "startup" and "shutdown" things at the very highest level, and then
> in between those calls you can start a parallel activity at any
> depth of the call stack, but at no point does it really make sense
> to start it from within something that is already parallel.

Well, perhaps it does. If I have (hypothetically) a 64-way system
with several PCI domains, I should be able to not only start scanning
each PCI domain individually, but once each domain has been scanned
it should be able to launch multiple probing threads, one for each
device on the PCI bus. That is, assuming that I have properly set up
my udev to statically name devices.
Perhaps it would make more sense for the allow_parallel() call to
specify instead a number of *additional* threads to spawn, such that
allow_parallel(0) on the top level would force the normal serial boot
order, allow_parallel(1) would allow one probing thread and the init
thread to both probe hardware at once, etc.
With a little per-thread context on the stack, you could fairly
easily keep track of the number of allowed sub-threads on a
per-allow_parallel() basis. Before you spawn each new thread, create
its per-thread state and pass a pointer to it to the child thread.
Each new do_in_parallel() call would down the semaphore of each
"context" up the tree until it hit the top, and then it would
allocate a new context and fork off a new thread for the _previous_
call to do_in_parallel(). The last call would remain unforked, so
finalize_parallel() would first execute that call in the current
thread and then reap all of the children by waiting on their
completions and then freeing their contexts.
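
The scheme above can be sketched in userspace with POSIX threads and semaphores. This is only an illustration under several assumptions, not the implementation being worked on: struct par_ctx, struct par_child, the extra "up" (parent) argument to allow_parallel(), and the helpers take_slots(), put_slots(), run_pending() and probe_stub() are all invented here. It also departs from the description in three ways: it uses sem_trywait() and runs the call in the current thread when no slot is free (a blocking down could hang once the limit is reached), it reaps children with pthread_join() rather than completions, and it ignores the probe functions' return values to stay short.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

struct par_child {			/* one forked call */
	pthread_t tid;
	void (*fn)(void *);
	void *arg;
	struct par_ctx *owner;
	struct par_child *next;
};

struct par_ctx {			/* one allow_parallel() context */
	struct par_ctx *parent;		/* NULL at the top level */
	sem_t slots;			/* additional threads still allowed */
	void (*pending_fn)(void *);	/* latest call, not yet started */
	void *pending_arg;
	struct par_child *children;	/* forked calls awaiting reaping */
};

static void allow_parallel(struct par_ctx *ctx, struct par_ctx *up, int extra)
{
	*ctx = (struct par_ctx){ .parent = up };
	sem_init(&ctx->slots, 0, extra);
}

static int take_slots(struct par_ctx *ctx)	/* all levels or none */
{
	if (!ctx)
		return 1;
	if (sem_trywait(&ctx->slots) != 0)
		return 0;
	if (take_slots(ctx->parent))
		return 1;
	sem_post(&ctx->slots);		/* an ancestor is full: undo */
	return 0;
}

static void put_slots(struct par_ctx *ctx)
{
	for (; ctx; ctx = ctx->parent)
		sem_post(&ctx->slots);
}

static void *child_main(void *p)
{
	struct par_child *c = p;

	c->fn(c->arg);
	put_slots(c->owner);
	return NULL;
}

/* Start the deferred call: in a new thread if allowed, else in this one. */
static void run_pending(struct par_ctx *ctx, int may_fork)
{
	struct par_child *c;

	if (!ctx->pending_fn)
		return;
	c = may_fork ? malloc(sizeof(*c)) : NULL;
	if (c && take_slots(ctx)) {
		*c = (struct par_child){ .fn = ctx->pending_fn,
			.arg = ctx->pending_arg, .owner = ctx, .next = ctx->children };
		if (pthread_create(&c->tid, NULL, child_main, c) == 0) {
			ctx->children = c;
			ctx->pending_fn = NULL;
			return;
		}
		put_slots(ctx);
	}
	free(c);
	ctx->pending_fn(ctx->pending_arg);	/* run inline instead */
	ctx->pending_fn = NULL;
}

static void do_in_parallel(struct par_ctx *ctx, void (*fn)(void *), void *arg)
{
	run_pending(ctx, 1);		/* fork the _previous_ call, if any */
	ctx->pending_fn = fn;
	ctx->pending_arg = arg;
}

static void finalize_parallel(struct par_ctx *ctx)
{
	struct par_child *c;

	run_pending(ctx, 0);		/* last call stays in this thread */
	while ((c = ctx->children) != NULL) {
		ctx->children = c->next;
		pthread_join(c->tid, NULL);
		free(c);
	}
	sem_destroy(&ctx->slots);
}

static void probe_stub(void *done) { *(int *)done = 1; }	/* fake probe */
```

With allow_parallel(&ctx, NULL, 0) every call ends up running in the calling thread, which gives the plain serial order. A nested context created with a non-NULL parent can only fork while every ancestor still has a free slot, so the top-level limit bounds the whole tree.
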
I admit the complexity is a bit high. But the maximum nesting depth
is bounded by the complexity of the hardware and the number of
busses, and the maximum memory allocation is strictly limited in the
single-threaded case, so this could allow 64-way systems to probe all
their hardware an order of magnitude faster than today without
noticeably impacting an embedded system, even in the absolute worst case.
I _believe_ that this should also be coupled with a bit of cleanup of
probe-order dependencies. If a subsystem depends on another being
initialized, the depended-on one could very easily export a
wait_for_foo_init() function:

DECLARE_COMPLETION(foo_init_completion);
static int foo_init_result;

int wait_for_foo_init(void)
{
	wait_for_completion(&foo_init_completion);
	return foo_init_result;
}

int foo_init(struct parallel_state *state)
{
	struct foo_device *dev;

	allow_parallel(state, 3);

#if 1
	/* Assumes: int foo_probe_device(void *dev); */
	for_each_foo_device(dev)
		do_in_parallel(state, foo_probe_device, dev);
#else
	/* Assumes: int foo_probe_device(struct parallel_state *state,
	 *                               void *dev); */
	for_each_foo_device(dev)
		do_in_parallel_nested(state, foo_probe_device, dev);
#endif

	foo_init_result = finalize_parallel(state);
	/* complete_all() rather than complete(): wake every waiter, not one */
	complete_all(&foo_init_completion);
	return foo_init_result;
}

And of course if you wanted to init both the foo and bar busses in
parallel you could implement a virtually identical function using the
do_in_parallel_nested() variant on top of the foo_init() function.
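
As a rough userspace illustration of how the wait_for_foo_init() dependency would behave when foo and bar are initialized in parallel, the sketch below replaces the kernel completion with a mutex and condition variable. It is a stand-in under stated assumptions: struct completion, complete_all() and wait_for_completion() are reimplemented here with latching semantics (once completed, every current and future waiter proceeds), and bar_init() and init_foo_and_bar() are hypothetical names. The parallel-probing calls are left out so that only the ordering is shown.

```c
#include <pthread.h>

/* Userspace stand-in for the kernel's struct completion (latching). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

#define COMPLETION_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static struct completion foo_init_completion = COMPLETION_INIT;
static int foo_init_result = -1;	/* overwritten by foo_init() */

static int wait_for_foo_init(void)
{
	wait_for_completion(&foo_init_completion);
	return foo_init_result;
}

static int foo_init(void)
{
	foo_init_result = 0;		/* pretend every probe succeeded */
	complete_all(&foo_init_completion);
	return foo_init_result;
}

/* bar depends on foo, so it blocks until foo has published its result. */
static void *bar_init(void *result)
{
	*(int *)result = wait_for_foo_init();
	return NULL;
}

/* bar is started first on purpose; the completion still orders it after foo. */
static int init_foo_and_bar(void)
{
	pthread_t bar;
	int bar_result = -1;

	if (pthread_create(&bar, NULL, bar_init, &bar_result))
		return -1;
	foo_init();			/* runs here while bar waits */
	pthread_join(bar, NULL);
	return bar_result;
}
```

The latching behaviour matters as soon as more than one subsystem depends on foo: a completion that releases only a single waiter per signal would leave the second dependent blocked.
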
I'm working on a sample implementation of the allow_parallel(),
do_in_parallel(), and finalize_parallel() functions, but I'm going to
take the time to make sure it's right. In the meantime, I'm
interested in any comments.
Cheers,
Kyle Moffett