On Tue, Jul 11, 2006 at 10:56:05AM -0400, Alan Stern wrote:
> On Tue, 11 Jul 2006, Oleg Nesterov wrote:
>
> > Let's look at the 3-rd synchronize_sched() call.
> >
> > Suppose that the reader writes to the rcu-protected memory and then
> > does srcu_read_unlock(). It is possible that the writer sees the result
> > of srcu_read_unlock() (so that srcu_readers_active_idx() returns 0)
> > but does not see other writes yet. The writer does synchronize_sched(),
> > the reader (according to 2) above) does mb(). But this doesn't guarantee
> > that all preceding mem ops were finished, so it is possible that
> > synchronize_srcu() returns and probably frees that memory too early.
> >
> > If it were possible to "insert" mb() into srcu_read_unlock() before ->c[]--,
> > we are ok, but synchronize_sched() is a "shot in the dark".
> >
> > In a similar manner, the 2-nd synchronize_sched() requires (I think) that
> > all changes which were done by the current CPU should be visible to other
> > CPUs before return.
> >
> > Does it make sense?
>
> Yes, it's a valid concern.
>
> Paul will correct me if this is wrong... As I understand it, the code
> which responds to a synchronize_sched() call (that is, the code which runs
> on other CPUs) does lots of stuff, involving context changes and
> who-knows-what else. There are lots of memory barriers in there, buried
> in the middle, along with plenty of other memory reads and writes.
>
> In outline, the reader's CPU responds to synchronize_sched() something
> like this:
>
> At the next safe time (context switch):
> mb();
> Tell the writer's CPU that the preempt count on
> the reader's CPU is 0
>
> Since that "tell the writer's CPU" step won't complete until after the
> mb() has forced all preceding memory operations to complete, we are safe.
>
> So perhaps the documentation for synchronize_sched() and synchronize_rcu()
> should say that when the call returns, all CPUs (including the caller's)
> will have executed a memory barrier and so will have completed all memory
> operations that started before the synchronize_xxx() call was issued.
I would instead think of this in the context of memory-barrier pairings
as documented in Documentation/memory-barriers.txt.
As Oleg notes above, synchronize_sched() is a "shot in the dark", since
there is nothing that lines up the synchronize_srcu() with any concurrent
srcu_read_lock() or srcu_read_unlock(). The trick here is that the
synchronize_sched() eliminates one of the possibilities (taking this
from the viewpoint of the last synchronize_sched() in synchronize_srcu()
on CPU 0 interacting with a concurrent srcu_read_unlock() on CPU 1):
1. srcu_read_unlock()'s counter decrement is perceived by CPU 1
as happening after all of the read-side critical-section
accesses. This is OK, because this means that the critical
section completes before synchronize_srcu()'s checks, and thus
before any destructive operations following the synchronize_srcu().
2. srcu_read_unlock()'s counter decrement is perceived by CPU 1
as happening -before- some of the read-side critical-section
accesses. This could be bad, but... The synchronize_srcu()
forces CPU 0 to execute a memory barrier on each active CPU,
including CPU 1. CPU 0 must subsequently execute a memory
barrier to become aware of the grace period ending, so that
it sees -all- CPU 1's earlier memory accesses (required, because
CPU 0 had to have seen, perhaps indirectly, CPU 1's indication
that it had gone through a quiescent state).
Therefore, by the time CPU 0 is done with the synchronize_sched(),
the memory accesses that CPU 1 executed as part of the SRCU
read-side critical section must all be visible.
Any sequence of events that has the srcu_read_unlock() happening after
the synchronize_sched() starts is prohibited, since the synchronize_sched()
won't start until the counters all hit zero.
In case people were wondering, the ornateness of the above reasoning is
one reason I have been resisting any attempt to further complicate
the SRCU stuff. ;-)
Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]