RE: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update

>-----Original Message-----
>From: [email protected] 
>[mailto:[email protected]] On Behalf Of Oleg Nesterov
>Sent: Monday, January 08, 2007 9:07 AM
>To: Srivatsa Vaddagiri
>Cc: Andrew Morton; David Howells; Christoph Hellwig; Ingo 
>Molnar; Linus Torvalds; [email protected]; Gautham shenoy
>Subject: Re: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update
>
>On 01/08, Srivatsa Vaddagiri wrote:
>>
>> On Mon, Jan 08, 2007 at 06:56:38PM +0300, Oleg Nesterov wrote:
>> > > 2.
>> > >
>> > > CPU_DEAD->cleanup_workqueue_thread->(cwq->thread = NULL)->kthread_stop() ..
>> > > 				    ^^^^^^^^^^^^^^^^^^^^
>> > > 						|___ Problematic
>> > 
>> > Hmm... This should not be possible? cwq->thread != NULL on CPU_DEAD event.
>> 
>> sure, cwq->thread != NULL at CPU_DEAD event. However
>> cleanup_workqueue_thread() will set it to NULL and block in
>> kthread_stop(), waiting for the kthread to finish run_workqueue and
>> exit.
>
>Ah, missed your point, thanks. Yet another old problem which was not
>introduced by recent changes. And yet another indication we should avoid
>kthread_stop() on CPU_DEAD event :) I believe this is easy to fix, but
>need to think more.

The current code in the workqueue-hotplug path is full of races. I
stumbled upon at least a couple of the different deadlock situations
being discussed here, with the ondemand governor using a workqueue and
trying to flush it during CPU hot-remove.

Specifically, a three-way deadlock involving kthread_stop() called with
workqueue_mutex held, while the work item itself is blocked on some other
mutex held by another task that is trying to flush the workqueue.
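
To make the cycle explicit, here is a small user-space model of it as a
wait-for graph. It is only a sketch, not kernel code: the task labels and
helper names are made up for illustration, and it assumes the flushing
task needs workqueue_mutex before it can make progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three parties; none of these are kernel names. */
enum task {
	HOTPLUG,	/* CPU_DEAD path: holds workqueue_mutex, sleeps in kthread_stop() */
	WORKER,		/* workqueue thread: its work item sleeps on some other mutex */
	FLUSHER,	/* holds that other mutex, wants to flush the workqueue */
	NR_TASKS,
	NOBODY = -1
};

/* waits_on[t] is the task that t cannot proceed without, or NOBODY. */
static int waits_on[NR_TASKS];

void reset_graph(void)
{
	for (int t = 0; t < NR_TASKS; t++)
		waits_on[t] = NOBODY;
}

void set_wait(int waiter, int holder)
{
	assert(waiter >= 0 && waiter < NR_TASKS);
	assert(holder >= NOBODY && holder < NR_TASKS);
	waits_on[waiter] = holder;
}

/* Follow the wait-for chain from start; coming back to start is a deadlock. */
bool is_deadlocked(int start)
{
	int t = waits_on[start];

	for (int steps = 0; steps < NR_TASKS && t != NOBODY; steps++) {
		if (t == start)
			return true;
		t = waits_on[t];
	}
	return false;
}

/* The three edges described above. */
void build_three_way_deadlock(void)
{
	reset_graph();
	set_wait(HOTPLUG, WORKER);	/* kthread_stop() waits for the thread to exit */
	set_wait(WORKER, FLUSHER);	/* work item waits for the other mutex */
	set_wait(FLUSHER, HOTPLUG);	/* flush waits for workqueue_mutex (assumed) */
}
```

After build_three_way_deadlock(), is_deadlocked(HOTPLUG) reports a cycle.
Dropping any one edge, for example set_wait(FLUSHER, NOBODY), breaks it.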

One other approach I was thinking about was to do all the hard work in
the workqueue CPU_DOWN_PREPARE callback rather than in CPU_DEAD.
We can call cleanup_workqueue_thread and take_over_work in DOWN_PREPARE.
With that, I don't think we need to hold the workqueue_mutex across
these two callbacks, which would eliminate the deadlocks related to
flush_workqueue.
Do you think this approach would simplify things around here?
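
Roughly what I have in mind, as a user-space toy model rather than a
patch. Everything here is a stand-in: struct wq_model, the EV_* events
and the model_* helpers are hypothetical and only mimic the roles of
cleanup_workqueue_thread and take_over_work. Whether the mutex needs to
be taken at all inside DOWN_PREPARE is also an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two hotplug events discussed here. */
enum hotplug_event { EV_DOWN_PREPARE, EV_DEAD };

/* Toy state; this is not the kernel's workqueue structure. */
struct wq_model {
	bool mutex_held;	/* models workqueue_mutex */
	bool thread_alive;	/* models cwq->thread != NULL */
	int pending_on_cpu;	/* work queued on the outgoing CPU */
	int pending_elsewhere;	/* work queued on a surviving CPU */
};

/* Models stopping the per-CPU worker thread. */
void model_cleanup_workqueue_thread(struct wq_model *wq)
{
	wq->thread_alive = false;
}

/* Models moving leftover work to a CPU that stays online. */
void model_take_over_work(struct wq_model *wq)
{
	wq->pending_elsewhere += wq->pending_on_cpu;
	wq->pending_on_cpu = 0;
}

void model_cpu_callback(struct wq_model *wq, enum hotplug_event ev)
{
	switch (ev) {
	case EV_DOWN_PREPARE:
		/* All the hard work happens here; lock and unlock pair up
		 * inside this one callback instead of spanning two. */
		wq->mutex_held = true;
		model_cleanup_workqueue_thread(wq);
		model_take_over_work(wq);
		wq->mutex_held = false;
		break;
	case EV_DEAD:
		/* Nothing left to do, and nothing left to unlock. */
		break;
	}
}
```

The point of the model is only that the mutex is never held when the
callback returns, so nothing stays locked between DOWN_PREPARE and DEAD.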

Thanks,
Venki 