On Mon, 2007-10-29 at 12:29 -0400, Jeff Garzik wrote:
> James Bottomley wrote:
> > On Mon, 2007-10-29 at 12:07 -0400, Jeff Garzik wrote:
> >> James Bottomley wrote:
> >>> This still doesn't solve the fundamental corruption problem:
> >>> sdev->event_work has to contain the work entry until the workqueue has
> >>> finished executing it (which is some unspecified time in the future).
> >>> As soon as you drop the sdev->list_lock, the system thinks
> >>> sdev->event_work is available for reuse.  If we fire another event
> >>> before the work queue finished processing the prior event, the queue
> >>> will be corrupted.
> >> I think you're misunderstanding the workqueue code?  You can call 
> >> schedule_work(&sdev->event_work) from anywhere, any time you like, as 
> >> many times as you like.
> > 
> > OK, take me through it slowly then ... I think schedule_work(work)
> > inserts work->entry onto the workqueue list (in
> > workqueue.c:insert_work()).  If the event hasn't fired, it will already
> > be on the list, so adding the same entry to a list twice causes a list
> > corruption problem.
> It does a test_and_set_bit() first thing in queue_work().  Similar 
> exclusivity logic is found in net device land.  Ah, the fun of locking 
> without locks that benh grumbles about :)

Ah, OK, sorry ... I was actually looking at __queue_work().
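
For reference, the guard Jeff describes can be sketched in userspace
roughly as follows.  This is only an illustration, not the kernel code:
every name ending in _sketch is hypothetical, a C11 atomic_flag stands in
for the pending bit, and a plain singly linked list stands in for the
workqueue.  It assumes the pending flag is cleared when the worker takes
the item off the list, so a later event can queue it again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct work_sketch {
    atomic_flag pending;        /* plays the role of the "pending" bit */
    struct work_sketch *next;   /* list linkage, analogous to work->entry */
};

struct queue_sketch {
    struct work_sketch *head;
    size_t len;
};

/*
 * Queue the item only if it is not already pending.
 * Returns true if it was inserted, false if it was already on the list.
 */
static bool queue_work_sketch(struct queue_sketch *q, struct work_sketch *w)
{
    /* Atomically set the flag and learn whether it was already set. */
    if (atomic_flag_test_and_set(&w->pending))
        return false;           /* already queued: do not touch the list */

    w->next = q->head;
    q->head = w;
    q->len++;
    return true;
}

/* Worker side: take one item off the list and allow it to be requeued. */
static struct work_sketch *dequeue_sketch(struct queue_sketch *q)
{
    struct work_sketch *w = q->head;

    if (!w)
        return NULL;
    q->head = w->next;
    q->len--;
    atomic_flag_clear(&w->pending);
    return w;
}
```

With this shape, a second call for an item that is still queued returns
false and leaves the list alone, so the same entry is never linked twice.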

> > Plus, unfortunately, the CC/UA events are going to have to carry extra
> > sense data; they're not simply going to be triggers saying something
> > happened.
> OK this is a fair criticism.
> If additional data must be carried, then I must ditch the beloved bitmap 
> implementation and go back to a list (with associated GFP_ATOMIC alloc).
> I will fix this, unless I receive email to the contrary...

Yes, unfortunately; thanks.  If every event were just a simple number, it
would be easy, but the CC/UA events carry data as well.
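
To make the difference concrete, here is a rough userspace sketch of a
per-event list in which each entry carries its own payload.  Everything
in it is an assumption for illustration: the _sketch names, the payload
size, and malloc() standing in for an atomic-context allocation.  It is
not the proposed patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SENSE_LEN_SKETCH 18     /* hypothetical payload size */

/* One queued event: a type plus the data that came with it. */
struct event_sketch {
    int type;
    unsigned char sense[SENSE_LEN_SKETCH];
    struct event_sketch *next;
};

struct event_list_sketch {
    struct event_sketch *head, *tail;
};

/* Append an event; returns 0 on success, -1 if allocation fails. */
static int post_event_sketch(struct event_list_sketch *l, int type,
                             const unsigned char *sense, size_t len)
{
    /* malloc() stands in for an allocation that must not sleep. */
    struct event_sketch *e = malloc(sizeof(*e));

    if (!e)
        return -1;              /* the event is lost if allocation fails */
    if (len > sizeof(e->sense))
        len = sizeof(e->sense);
    e->type = type;
    memset(e->sense, 0, sizeof(e->sense));
    if (len)
        memcpy(e->sense, sense, len);
    e->next = NULL;

    if (l->tail)
        l->tail->next = e;
    else
        l->head = e;
    l->tail = e;
    return 0;
}

/* Remove the oldest event; the caller owns it and must free() it. */
static struct event_sketch *pop_event_sketch(struct event_list_sketch *l)
{
    struct event_sketch *e = l->head;

    if (!e)
        return NULL;
    l->head = e->next;
    if (!l->head)
        l->tail = NULL;
    return e;
}
```

Two events of the same type stay distinct here, each with its own data,
which a one-bit-per-event-type bitmap cannot represent.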


