On Sun, Nov 19, 2006 at 11:55:16PM +0300, Oleg Nesterov wrote:
> On 11/19, Alan Stern wrote:
> >
> > On Sun, 19 Nov 2006, Oleg Nesterov wrote:
> >
> > > int xxx_read_lock(struct xxx_struct *sp)
> > > {
> > > 	int idx;
> > >
> > > 	idx = sp->completed & 0x1;
> > > 	atomic_inc(sp->ctr + idx);
> > > 	smp_mb__after_atomic_inc();
> > >
> > > 	return idx;
> > > }
> > >
> > > void xxx_read_unlock(struct xxx_struct *sp, int idx)
> > > {
> > > 	if (atomic_dec_and_test(sp->ctr + idx))
> > > 		wake_up(&sp->wq);
> > > }
> > >
> > > void synchronize_xxx(struct xxx_struct *sp)
> > > {
> > > 	wait_queue_t wait;
> > > 	int idx;
> > >
> > > 	init_wait(&wait);
> > > 	mutex_lock(&sp->mutex);
> > >
> > > 	idx = sp->completed++ & 0x1;
> > >
> > > 	for (;;) {
> > > 		prepare_to_wait(&sp->wq, &wait, TASK_UNINTERRUPTIBLE);
> > >
> > > 		if (!atomic_add_unless(sp->ctr + idx, -1, 1))
> > > 			break;
> > >
> > > 		schedule();
> > > 		atomic_inc(sp->ctr + idx);
> > > 	}
> > > 	finish_wait(&sp->wq, &wait);
> > >
> > > 	mutex_unlock(&sp->mutex);
> > > }
> > >
> > > Very simple. Note that synchronize_xxx() is O(1), doesn't poll, and could
> > > be optimized further.
> >
> > What happens if synchronize_xxx manages to execute in between
> > xxx_read_lock's
> >
> > 	idx = sp->completed & 0x1;
> > 	atomic_inc(sp->ctr + idx);
> >
> > statements?
>
> Oops. I forgot about explicit mb() before sp->completed++ in synchronize_xxx().
>
> So synchronize_xxx() should do
>
> 	smp_mb();
> 	idx = sp->completed++ & 0x1;
>
> 	for (;;) { ... }
>
> > You see, there's no way around using synchronize_sched().
>
> With this change I think we are safe.
>
> If synchronize_xxx() increments ->completed in between, the caller of
> xxx_read_lock() will see all memory ops (started before synchronize_xxx())
> completed. It is ok that synchronize_xxx() returns immediately.

Let me take Alan's example one step further:

o	CPU 0 starts executing xxx_read_lock(), but is interrupted
	(or whatever) just before the atomic_inc().

o	CPU 1 executes synchronize_xxx() to completion, which it
	can because CPU 0 has not yet incremented the counter.

o	CPU 0 returns from interrupt and completes xxx_read_lock(),
	but has incremented the wrong counter.

o	CPU 0 continues into its critical section, picking up a
	pointer to an xxx-protected data structure (or, in Jens's
	case, starting an xxx-protected I/O).

o	CPU 1 executes another synchronize_xxx().  This completes
	immediately because CPU 0 incremented the wrong counter.

o	CPU 1 continues, either freeing a data structure while
	CPU 0 is still referencing it, or, in Jens's case, completing
	an I/O barrier while there is still outstanding I/O.
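
The interleaving above can be replayed deterministically in userspace. What follows is only a minimal sketch under stated assumptions: struct xxx_model and the helper names are hypothetical, plain ints stand in for atomic_t, each counter is assumed to start at 1 (the bias that the atomic_add_unless(..., -1, 1) test in synchronize_xxx() appears to rely on), and the wait loop is reduced to a "would this call have to wait?" check.

```c
#include <assert.h>

/* Hypothetical single-threaded model of xxx_struct; not kernel code. */
struct xxx_model {
	int completed;
	int ctr[2];	/* assumed to start at 1, i.e. "no readers" */
};

/* First half of xxx_read_lock(): sample the current index. */
static int reader_sample(const struct xxx_model *m)
{
	return m->completed & 0x1;
}

/* Second half of xxx_read_lock(): increment the sampled counter. */
static void reader_inc(struct xxx_model *m, int idx)
{
	m->ctr[idx]++;
}

/*
 * Models synchronize_xxx(): flip ->completed, then report whether the
 * old counter shows readers (nonzero return means "would have to wait").
 */
static int sync_would_wait(struct xxx_model *m)
{
	int idx = m->completed++ & 0x1;

	return m->ctr[idx] != 1;
}

/* Returns the number of readers still active when the 2nd sync returns. */
int replay_race(void)
{
	struct xxx_model m = { 0, { 1, 1 } };
	int idx, active_readers;

	idx = reader_sample(&m);	/* CPU 0 samples idx, then is interrupted */
	assert(!sync_would_wait(&m));	/* CPU 1: first synchronize_xxx() returns */
	reader_inc(&m, idx);		/* CPU 0 resumes, bumps the stale counter */
	active_readers = 1;		/* CPU 0 is now inside its critical section */
	assert(idx != reader_sample(&m));	/* new readers use the other index */
	assert(!sync_would_wait(&m));	/* CPU 1: second call does not wait either */

	return active_readers;
}
```

In this model, replay_race() returns 1: the second synchronize call completes while one reader is still inside its critical section, which is the failure described in the last step above.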

I agree with Alan -- unless I am missing something, we need a
synchronize_sched() in synchronize_xxx().  One thing that might be
missing in the I/O-barrier case is the set of possible restrictions
I called out in my earlier email.
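
To illustrate why a synchronize_sched() would close the window, here is a rough userspace sketch, not kernel code. It assumes the reader's sample-and-increment sequence runs with preemption disabled (as I understand SRCU's read side to do), so that a sched grace period cannot end until an in-flight xxx_read_lock() has finished its increment. All names are hypothetical, plain ints replace atomic_t, and the counters are assumed to start at 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of xxx_struct; not kernel code. */
struct xxx_model {
	int completed;
	int ctr[2];	/* assumed to start at 1, i.e. "no readers" */
};

/* A reader caught between sampling idx and incrementing its counter. */
struct pending_reader {
	bool in_flight;
	int idx;
};

/*
 * Stand-in for synchronize_sched(): assumed to return only after every
 * non-preemptible region already in progress has completed, so an
 * in-flight reader finishes its increment before we continue.
 */
static void model_synchronize_sched(struct xxx_model *m,
				    struct pending_reader *r)
{
	if (r->in_flight) {
		m->ctr[r->idx]++;
		r->in_flight = false;
	}
}

/* Flip ->completed, wait a grace period, then check the old counter. */
static bool sync_would_wait(struct xxx_model *m, struct pending_reader *r)
{
	int idx = m->completed++ & 0x1;

	model_synchronize_sched(m, r);

	return m->ctr[idx] != 1;
}

/* Returns 1 if the first synchronize call notices the in-flight reader. */
int replay_with_grace_period(void)
{
	struct xxx_model m = { 0, { 1, 1 } };
	/* CPU 0 has sampled idx but has not yet incremented the counter. */
	struct pending_reader r = { true, m.completed & 0x1 };

	return sync_would_wait(&m, &r);
}
```

Under these assumptions, replay_with_grace_period() returns 1: by the time the updater inspects the old counter, the interrupted reader's increment is visible, so the updater waits instead of returning early.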
Thanx, Paul