Yuly Finkelberg wrote:
Hi,
I sent a message regarding this issue earlier, but after re-reading
it, I realized that it wasn't very clear. Hopefully, this will
clarify things a little bit:
I have a strange scheduling issue: a bunch of worker tasks are all waiting
on a wait queue. Each task is woken up by the preceeding, does some work,
wakes up the next one, and then sends a SIGSTOP to itself. The last task however
does not stop itself, but instead yield()s until all tasks have reached state
TASK_STOPPED.
The code looks like this (irrelevant parts cut out):
...
ret = wait_event_interruptible(waitq, next_in_line == myself);
...
(some work)
...
next_in_line = next;
ret = wakeup_next_one();
if (!last_one)
send_sig(SIGSTOP, current, 1);
else
spin_until_all_stopped()
When run with 50 tasks, normally this works well. However sometimes one of the
tasks (never the last one) gets stuck between calling wakeup_next_one() and
between sending the signal. It accumulates system time, and its stack looks
like (no pending signals, ti_flags is clear):
c55e7ad0 00000086 c55e6000 c55e7a94 00000046 c55e6000 c55e7ad0 c0109c2d
00000000 c0497800 00000001 d38da344 0013bc9c c5632840 00071931 d3d93161
0013bc9c c55d546c c05d3960 0000270f c05d3960 c55e6000 c0106f25 c05d3960
Call Trace:
[<c0106f25>] need_resched+0x27/0x32
(yes, this is not a mistake: this is ALL the stack reported by show_stack())
Normally the spinning task will magically get released after "a while", where
few seconds < "a while" < 10 minutes and sometimes even longer.
So the mystery is -
1. Why does the task spin for so long ?
2. Where does it spin ? (the kernel stack doesn't hint on anything...)
3. How can I find out #2 ?
4. How to fix it ?
5. Is there a better way to make sure a specific task is STOPPED ?
Currently running 2.6.8.1 and 2.6.9 (UP, PREEMPT). I'd appreciate any
help here...
You're doing this in the *kernel*? It sounds like it should be done
in userspace or done a different way (ie. not with 50 tasks).
And using signals and spinning on yield for synchronisation and
process control in the kernel like this is fairly crazy.
Can't you use a semaphore or something?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]