Re: [PATCH] Fix deadlock in RPC scheduling code.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Trond Myklebust wrote:

The real fix is the one I posted in response to this thread last week.

Oops, I missed it.

Ok for the patch, the list iteration will be better, but I don't
understand how this will prevent the race condition?

I do not think it is not a good idea to keep this lock order in
rpc_wake_up_task() anyway. I must be missing something but I
think this function should be modified in order to be in accordance with
the lock hierarchy in rpc code. It seems to me that the potential race is still there.

Even if we cannot certify task->u.tk_wait.rpc_waitq is valid, the
current kernel code cannot either (err... I think it can't). So let's try at least to improve it, even if we cannot set it totally harmless.
Warn me if I'm wrong :
When rpc_wake_up_task() is called, the calling context is helpless. So we have absolutely no information on the task queue. We must atomically check the "queued-ness" of the task and grab the queue lock to prevent any error? Hmmm... So the matter is : the queue mustn't be modified between the test and the lock? Have we some "magical" lock somewhere which could help up? I didn't find it.
  With the patch I propose (quite the same than Simon Derr's last week
proposal), the insertion of the RPC task is not a problem since the
QUEUED flag is set when the task is totally enqueued (in __rpc_add_wait_queue(): task->u.tk_wait.list is modified and task->u.tk_wait.rpc_waitq set), so if the test is
true, the task does have a queue. The task removal is more problematic.
The task could be executed and removed from the queue between the bit
test and the lock grabbing. I've checked the code responsible for this,
and it seems that the task->u.tk_wait.rpc_waitq pointer is still valid,
the task is just removed from wait queue list. So, we could still take
the queue lock and, with the lock taken, we just need to do one more
test to verify the task was not woken up and removed from the queue
since the test.

I do not really like this solution, that's not really clean, but, at
least, I hope my analysis is not too far from correct?
Please let me now where I wrong.

Cordially

--
Aurelien Degremont

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux