Re: [PATCH] Export current_is_keventd() for libphy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Dec 5, 2006, at 14:39, Andrew Morton wrote:

On Tue, 5 Dec 2006 17:48:05 +0000 (GMT)
"Maciej W. Rozycki" <[email protected]> wrote:

 Essentially there is a race when disconnecting from a PHY, because
interrupt delivery uses the event queue for processing. The function to handle interrupts that is called from the event queue is phy_change (). It takes a pointer to a structure that is associated with the PHY. At the time phy_stop_interrupts() is called there may be one or more calls to phy_change() still pending on the event queue. They may not be able to be processed until the structure passed to phy_change() have been freed, at
which point calling the function is wrong.

 One way of avoiding it is calling flush_scheduled_work() from
phy_stop_interrupts().  This is fine as long as a caller of
phy_stop_interrupts() (not necessarily the immediate one calling into
libphy) does not hold the netlink lock.

So let me try to rephrase...

- phy_change() is the workqueue callback function.  It is executed by
  keventd.

- Something under phy_change() takes rtnl_lock() (but what??)


I don't think it's phy_change(). It's something else that may be scheduled.



- phy_stop_interrupts() does flush_scheduled_work().  This has to
  following logic:

  - if I am kevetnd, run phy_change() directly.

  - If I am not keventd, wait for keventd() to run phy_change()

- So if the caller of phy_stop_interrupt() already holds rtnl_lock(),
and if that caller is keventd then it will recur onto rntl_lock() and
  will deadlock.

Problem is, if the caller of phy_stop_interrupt() is *not* keventd, that caller will still deadlock, because that caller is waiting for keventd to
run phy_change(), and keventd cannot do that, because the not-keventd
process already holds rtnl_lock.


Now, afaict, there are only two callers of phy_stop_interrupts(): the
close() handlers of gianfar.c and fs_enet-main.c (confusingly held in
netdevice.stop (confusingly called by dev_close())). Via phy_disconnect.
Did I miss anything?


Right now, that's probably about right.



And the dev_close() caller holds rtnl_lock.



Ok, I think this is the summary:

- phy_change() is the work queue callback function (scheduled when a PHY interrupt occurs)

- dev_close() invokes the controller's stop/close/whatever function, and it calls phy_disconnect()

- phy_disconnect() calls phy_stop_interrupts(). To prevent any pending phy_change() calls from getting confused, phy_stop_interrupts () needs to flush the queue. Otherwise, subsequent memory freeings will leave phy_change() hanging.

- If phy_stop_interrupts() calls flush_scheduled_work(), keventd will execute its queues while rtnl_lock is held, providing opportunity for other callbacks to deadlock.

- innocent puppies are slaughtered, and the world mourns.


Maciej's solution is to schedule phy_disconnect() to be called from a work queue. That solution should work, but it sounds like it doesn't require the check for if keventd is running.

Of course, my objection to it is that it now requires the ethernet controller to be excessively aware of the details of how the PHY Lib is handling the PHY interrupts (by scheduling them on a work queue).

Andy


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux