Re: Fw: Re: + add-locking-to-evdev.patch added to -mm tree

Hi Paul,

On 4/8/07, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

On Fri, Mar 30, 2007 at 02:06:05PM -0700, akpm@linux-foundation.org wrote:
>
> Input: evdev - implement proper locking

OK, so I have to ask -- this is protecting multiple clients of a given
mouse or keyboard, right?  Doesn't look like it has much to do with
connecting multiple mice/keyboards/joysticks/whatever to a given system,
but thought I should ask.

Yes, the task of evdev (and joydev, tsdecv, mousedev - input handlers
- which should have been named input interfaces) is to provide access
to an input device (keyboard, mouse, joystick, button, etc) for
multiple clients (user processes).

> diff -puN drivers/input/evdev.c~add-locking-to-evdev drivers/input/evdev.c
> --- a/drivers/input/evdev.c~add-locking-to-evdev
> +++ a/drivers/input/evdev.c
> @@ -31,6 +31,8 @@ struct evdev {
>       wait_queue_head_t wait;
>       struct evdev_client *grab;
>       struct list_head client_list;
> +     spinlock_t client_lock;

OK, what does this one protect?

Client list of course! Are you trying to coax me into adding comments? ;)

o       ev_attach_client(): client_list field (permitting RCU readers).
       Adds element.

o       evdev_detach_client(): ditto, but deletes element.

o       evdev_hangup(): scans the list hanging off of the client_list
       field, invoking kill_fasync() on each.  Looks to be delivering
       a POLL_HUP to all parties receiving events.

       Apparently the lock is preventing an entry from being
       deleted out from under evdev_hangup().  Need to check races
       with close(), I guess...  (For example, it would be bad
       to have the process torn down to the point that it could
       not tolerate receiving (or ignoring) a signal before
       removing itself from the list.)

Also it makes sure that evdev_attach_client and evdev_detail_client do
not race with each other. One is called from evdev_open and another
from evdev_release which can ran simultaneously (for different clients
- user processes).

o       Readers of the evdev->client_list can use RCU.

> +     struct mutex mutex;

And what does this one protect?

o       evdev_flush(): evdev->exist flag (which handles race with RCU removal?)
       Also invokes input_flush_device(), which invokes some flush-handler
       function.  There may be more issues here, but they would be with
       users of evdev rather than with evdev itself, I am guessing.

o       evdev_release(): invokes evdev_ungrab().  This NULLs the
       evdev->grab field using rcu_assign_pointer().

o       evdev_write(): invokes evdev_event_from_user() and
       input_inject_event().  The former copies from user space, so
       ->mutex indeed cannot be a spinlock.  Not sure what we are
       protecting here -- perhaps event traffic?  @@@


We are trying to make sure that we delivering events to a live device.

o       evdev_ioctl_handler(): protecting ioctl.  Consistent with
       the thought of protecting event traffic.

o       evdev_mark_dead(): protect setting evdev->exist to zero, adding
       weight to the speculation under evdev_flush() above that
       ->exist handles the race with RCU removal.

o       Readers of evdev->grab can use RCU.  RCU readers caring about
       concurrent deletion should check for evdev->exist under evdev->mutex.

Lock order:

o       evdev->client_lock => fown_struct->lock

o       fown_struct->lock => tasklist_lock

o       tasklist_lock => sighand_struct->siglock

o       evdev_table_mutex => evdev->client_lock.

>       struct device dev;
>  };
>
> @@ -38,39 +40,48 @@ struct evdev_client {
>       struct input_event buffer[EVDEV_BUFFER_SIZE];
>       int head;
>       int tail;
> +     spinlock_t buffer_lock;

And what does this one protect?  Presumably a buffer!  ;-)

o       evdev_pass_event(): adding an event to evdev_client->buffer.
       This includes the evdev_client->head field.

       !!!  Why doesn't this function need to check the
       evdev_client->tail field???  How do we know we won't overflow
       the buffer???

We can't do anything if we overflow so if user process can't fetch
events fast enough oldest ones will simply be lost.

o       evdev_new_client() [was evdev_open()]: evdev_client->client
       field (attaching the evdev to its client, apparently).
       Invokes evdev_attach_client() to do the list manipulation
       (protected in turn by evdev->client_lock).

       Argh...  Strike that -- spin_lock_init() rather than spin_lock().

o       evdev_fetch_next_event(): removing an event from
       evdev_client->buffer.  This includes evdev_client->head and
       evdev_client->tail.

>       struct fasync_struct *fasync;
>       struct evdev *evdev;
>       struct list_head node;
>  };
>
>  static struct evdev *evdev_table[EVDEV_MINORS];
> +static DEFINE_MUTEX(evdev_table_mutex);

One would guess that this one protects evdev_table[], but why guess?

o       evdev_open(): adding a new client to the evdev_table[].

o       evdev_install_chrdev() [was evdev_connect()]: Adding a
       character device to the list.  Invokes evdev_attach_client(),
       which acquires evdev->client_lock.

o       evdev_remove_chrdev(): remove a client from the evdev_table[].


Summary of RCU protection:

o       Readers of the evdev->client_list can use RCU.

o       Readers of evdev->grab can use RCU.  RCU readers caring about
       concurrent deletion should check for evdev->exist under evdev->mutex.

> +
> +static void evdev_pass_event(struct evdev_client *client, struct input_event *event)
> +{
> +     unsigned long flags;
> +
> +     spin_lock_irqsave(&client->buffer_lock, flags);
> +     client->buffer[client->head++] = *event;
> +     client->head &= EVDEV_BUFFER_SIZE - 1;

!!! Why don't we need to check for overflow here???

We can't do anything if we overflow so if user process can't fetch
events fast enough solest one will simply be lost.

> +     spin_unlock_irqrestore(&client->buffer_lock, flags);
> +
> +     kill_fasync(&client->fasync, SIGIO, POLL_IN);
> +}
>
>  static void evdev_event(struct input_handle *handle, unsigned int type, unsigned int code, int value)
>  {
>       struct evdev *evdev = handle->private;
>       struct evdev_client *client;
> +     struct input_event event;
>
> -     if (evdev->grab) {
> -             client = evdev->grab;
> -
> -             do_gettimeofday(&client->buffer[client->head].time);
> -             client->buffer[client->head].type = type;
> -             client->buffer[client->head].code = code;
> -             client->buffer[client->head].value = value;
> -             client->head = (client->head + 1) & (EVDEV_BUFFER_SIZE - 1);
> -
> -             kill_fasync(&client->fasync, SIGIO, POLL_IN);
> -     } else
> -             list_for_each_entry(client, &evdev->client_list, node) {
> -
> -                     do_gettimeofday(&client->buffer[client->head].time);
> -                     client->buffer[client->head].type = type;
> -                     client->buffer[client->head].code = code;
> -                     client->buffer[client->head].value = value;
> -                     client->head = (client->head + 1) & (EVDEV_BUFFER_SIZE - 1);
> +     do_gettimeofday(&event.time);
> +     event.type = type;
> +     event.code = code;
> +     event.value = value;
> +
> +     rcu_read_lock();
> +
> +     client = rcu_dereference(evdev->grab);

OK: protecting evdev->grab.  Not clear why we don't need to check
evdev->exist!!!

input_unregister_handle will stop delivery of events to evdev even
before we get to evdev_ark_dead where we set dev->exist = 0;

> +     if (client)
> +             evdev_pass_event(client, &event);
> +     else
> +             list_for_each_entry_rcu(client, &evdev->client_list, node)

evdev->client_list is under rcu_read_lock(), so OK.

> +                     evdev_pass_event(client, &event);
>
> -                     kill_fasync(&client->fasync, SIGIO, POLL_IN);
> -             }
> +     rcu_read_unlock();
>
>       wake_up_interruptible(&evdev->wait);
>  }
> @@ -89,33 +100,98 @@ static int evdev_flush(struct file *file
>  {
>       struct evdev_client *client = file->private_data;
>       struct evdev *evdev = client->evdev;
> +     int retval;
> +
> +
> +     retval = mutex_lock_interruptible(&evdev->mutex);
> +     if (retval)
> +             return retval;
>
>       if (!evdev->exist)

OK, holding ->mutex, so ->exist is stable.

> -             return -ENODEV;
> +             retval = -ENODEV;
> +     else
> +             retval = input_flush_device(&evdev->handle, file);
>
> -     return input_flush_device(&evdev->handle, file);
> +     mutex_unlock(&evdev->mutex);
> +     return retval;
>  }
>
>  static void evdev_free(struct device *dev)
>  {
>       struct evdev *evdev = container_of(dev, struct evdev, dev);
>
> -     evdev_table[evdev->minor] = NULL;
>       kfree(evdev);
>  }
>
> +static int evdev_grab(struct evdev *evdev, struct evdev_client *client)
> +{
> +     int error;
> +
> +     if (evdev->grab)

Need either rcu_read_lock() or to be holding evdev->mutex.  All callers
do in fact hold evdev_mutex, see below.

Yep.

> +             return -EBUSY;
> +
> +     error = input_grab_device(&evdev->handle);
> +     if (error)
> +             return error;
> +
> +     rcu_assign_pointer(evdev->grab, client);

OK, but does every caller hold evdev->mutex?  Yes, as follows:

o       evdev_do_ioctl(): yes, via the only caller, evdev_ioctl_handler().

o       evdev_ioctl_handler(): yes.

> +     synchronize_rcu();
> +
> +     return 0;
> +}
> +
> +static int evdev_ungrab(struct evdev *evdev, struct evdev_client *client)
> +{
> +     if (evdev->grab != client)

Need either rcu_read_lock() or to be holding evdev->mutex.  All callers
do in fact hold evdev_mutex, see below.

Yep.

> +             return  -EINVAL;
> +
> +     rcu_assign_pointer(evdev->grab, NULL);

Do all our callers hold evdev->mutex?

o       evdev_release(): yes.

o       evdev_do_ioctl(): yes, via the only caller, evdev_ioctl_handler().

o       evdev_ioctl_handler(): yes.

> +     synchronize_rcu();
> +     input_release_device(&evdev->handle);

OK -- we apparently tolerate manipulation of evdev->grab for up to a grace
period after NULLing the pointer.  People picking up evdev->grab therefore
should not be picking it up twice -- at least not without holding the
->mutex or without tolerating the second grab coming up NULL.

> +
> +     return 0;
> +}
> +
> +static void evdev_attach_client(struct evdev *evdev, struct evdev_client *client)
> +{
> +     spin_lock(&evdev->client_lock);
> +     list_add_tail_rcu(&client->node, &evdev->client_list);

OK, ->client_list modified while holding ->client_lock.

> +     spin_unlock(&evdev->client_lock);
> +     synchronize_rcu();
> +}
> +
> +static void evdev_detach_client(struct evdev *evdev, struct evdev_client *client)
> +{
> +     spin_lock(&evdev->client_lock);
> +     list_del_rcu(&client->node);

OK, assuming that clients->node is only ever added to the evdev->client_list.
Which does appear to be the case.

> +     spin_unlock(&evdev->client_lock);
> +     synchronize_rcu();
> +}
> +
> +static void evdev_hangup(struct evdev *evdev)
> +{
> +     struct evdev_client *client;
> +
> +     wake_up_interruptible(&evdev->wait);
> +
> +     spin_lock(&evdev->client_lock);
> +     list_for_each_entry(client, &evdev->client_list, node)

OK, traversing ->client_list while holding ->client_lock.

> +             kill_fasync(&client->fasync, SIGIO, POLL_HUP);

The kill_fasync() function in turn calls __kill_fasync(), but while
read-holding fasync_lock.  But bails if the file pointer is already NULL.

Oddly enough, __kill_fasync() has a debug check on a magic-number field
-- not sure why this isn't conditioned on a debug build or some such.
Maybe someone is chasing problems with this function?  And maybe that
is why we have a patch to add locking?  ;-)

The __kill_fasync() function in turn calls send_sigio(), which
read-acquires both the fown_struct lock and tasklist_lock, then does
send_sigio_to_task() for each thread in the task.

The send_sigio_to_task() function invokes group_send_sig_info(), which
calls lock_task_sighand(), which expects one of its callers to have
done an rcu_read_lock().  But I believe that read-holding tasklist_lock
also suffices.  Oleg, could you please either confirm or educate?  @@@

(I think this is OK, just been awhile since I dug through the signal
code.)

> +     spin_unlock(&evdev->client_lock);
> +}
> +
>  static int evdev_release(struct inode *inode, struct file *file)
>  {
>       struct evdev_client *client = file->private_data;
>       struct evdev *evdev = client->evdev;
>
> -     if (evdev->grab == client) {
> -             input_release_device(&evdev->handle);
> -             evdev->grab = NULL;
> -     }
> +     mutex_lock(&evdev->mutex);
> +     if (evdev->grab == client)

OK, ->grab stable because we hold ->mutex.

> +             evdev_ungrab(evdev, client);
> +     mutex_unlock(&evdev->mutex);
>
>       evdev_fasync(-1, file, 0);
> -     list_del(&client->node);
> +     evdev_detach_client(evdev, client);
>       kfree(client);
>
>       if (!--evdev->open && evdev->exist)

!!!  We don't hold ->mutex, so ->exist can change without notice.
Is this really safe???  Do we need to capture the value of ->exist
in a local variable while we hold the lock above?

No, this is not safe. It is pending changes to input_open_device and
input_close_device to perform all serialization and counting there. I
did not want to introduce locking that will be removed in the next
patch.

> @@ -126,47 +202,53 @@ static int evdev_release(struct inode *i
>       return 0;
>  }
>
> -static int evdev_open(struct inode *inode, struct file *file)
> +static int evdev_new_client(struct evdev *evdev, struct file *file)
>  {
>       struct evdev_client *client;
> -     struct evdev *evdev;
> -     int i = iminor(inode) - EVDEV_MINOR_BASE;
>       int error;
>
> -     if (i >= EVDEV_MINORS)
> -             return -ENODEV;
> -
> -     evdev = evdev_table[i];
> -
>       if (!evdev || !evdev->exist)

!!!  We don't hold ->mutex, so this is racy.  The value of ->exist might
well go to zero immediately after we check it.  We don't appear to
be in an RCU read-side critical section.  Why is this safe?

I simply need to remove || !evdev->exist. We are holding
evdev_table_mutex and therefore evdev_remove_chrdev can't be running
and evdev_mark_dead() comes after evdev_remove_chrdev() in
evdev_disconnect().

>               return -ENODEV;
>
> -     get_device(&evdev->dev);
> -
>       client = kzalloc(sizeof(struct evdev_client), GFP_KERNEL);
> -     if (!client) {
> -             error = -ENOMEM;
> -             goto err_put_evdev;
> -     }
> +     if (!client)
> +             return -ENOMEM;
>
> +     spin_lock_init(&client->buffer_lock);
>       client->evdev = evdev;
> -     list_add_tail(&client->node, &evdev->client_list);
> +     evdev_attach_client(evdev, client);
>
>       if (!evdev->open++ && evdev->exist) {

!!!  We don't hold ->mutex, so this is racy.  The value of ->exist might
well go to zero immediately after we check it.  We don't appear to
be in an RCU read-side critical section.  Why is this safe?

The same as for evdev_release - not safe, pending further changes in input core.

>               error = input_open_device(&evdev->handle);
> -             if (error)
> -                     goto err_free_client;

...

> @@ -272,26 +354,56 @@ static ssize_t evdev_write(struct file *
>       struct evdev_client *client = file->private_data;
>       struct evdev *evdev = client->evdev;
>       struct input_event event;
> -     int retval = 0;
> +     int retval;
>
> -     if (!evdev->exist)
> -             return -ENODEV;
> +     retval = mutex_lock_interruptible(&evdev->mutex);
> +     if (retval)
> +             return retval;
> +
> +     if (!evdev->exist) {

We hold ->mutex, so this check is stable.  OK!

> +             retval = -ENODEV;
> +             goto out;
> +     }
>
>       while (retval < count) {
>
> -             if (evdev_event_from_user(buffer + retval, &event))
> -                     return -EFAULT;
> +             if (evdev_event_from_user(buffer + retval, &event)) {
> +                     retval = -EFAULT;
> +                     goto out;
> +             }
> +
>               input_inject_event(&evdev->handle, event.type, event.code, event.value);
>               retval += evdev_event_size();
>       }
>
> + out:
> +     mutex_unlock(&evdev->mutex);
>       return retval;
>  }
>
> +static int evdev_fetch_next_event(struct evdev_client *client,
> +                               struct input_event *event)
> +{
> +     int have_event;
> +
> +     spin_lock_irq(&client->buffer_lock);
> +
> +     have_event = client->head != client->tail;
> +     if (have_event) {
> +             *event = client->buffer[client->tail++];
> +             client->tail &= EVDEV_BUFFER_SIZE - 1;
> +     }
> +
> +     spin_unlock_irq(&client->buffer_lock);
> +
> +     return have_event;
> +}
> +
>  static ssize_t evdev_read(struct file *file, char __user *buffer, size_t count, loff_t *ppos)
>  {
>       struct evdev_client *client = file->private_data;
>       struct evdev *evdev = client->evdev;
> +     struct input_event event;
>       int retval;
>
>       if (count < evdev_event_size())
> @@ -308,14 +420,12 @@ static ssize_t evdev_read(struct file *f
>       if (!evdev->exist)

!!!  We don't hold ->mutex, so this is racy.  The value of ->exist might
well go to zero immediately after we check it.  We don't appear to
be in an RCU read-side critical section.  Why is this safe?

Nowhere in evdev_read we are trying to access underlying input device.
We just delivering events we already have in the buffer. Therefore it
will not hurt if we complete delivering staged events even if device
goes away in the middle of this process but it saves us mutex lock.

We do lock mutex in evdev_write because there events flow from
userspace into the kernel and potentially into the device and we don't
want to deliver events to a dead device.

I guess I'll have to add some comments ;)

...

>
> -     evdev_table[minor] = evdev;
> +     error = evdev_install_chrdev(evdev);

From here on, we are globally accessible!!!

!!!  Is it going to be OK if a user accesses the character device before
the following initialization completes?

Yes, device_add just completes instantiation of device into sysfs
driver structure and delivers uevents to userspace. Evdev is fully
functional at this point.

Input_register_handle installs the list between a device and evdev so
that events can flow through it. Before handle is registered input
events from the device will simply not reach evdev.

> +     if (error)
> +             goto err_free_evdev;
>
>       error = device_add(&evdev->dev);
>       if (error)
> -             goto err_free_evdev;
> +             goto err_remove_chrdev;
>
>       error = input_register_handle(&evdev->handle);
>       if (error)

Thank you for reviewing this!

--
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

References:
- Fw: Re: + add-locking-to-evdev.patch added to -mm tree
  - From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Prev by Date: Re: Add a norecovery option to ext3/4?
Next by Date: Re: Ten percent test
Previous by thread: Re: Fw: Re: + add-locking-to-evdev.patch added to -mm tree
Next by thread: Re: Kernel 2.6.20.4: Software RAID 5: ata13.00: (irq_stat 0x00020002, failed to transmit command FIS)
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]