On Mon, Nov 20, 2006 at 03:22:58PM -0500, Alan Stern wrote:
> On Mon, 20 Nov 2006, Paul E. McKenney wrote:
>
> > On Mon, Nov 20, 2006 at 12:19:19PM -0500, Alan Stern wrote:
> > > Paul:
> > >
> > > Here's my version of your patch from yesterday. It's basically the same,
> > > but I cleaned up the code in a few places and fixed a bug (the sign of idx
> > > in srcu_read_unlock). Also I changed the init routine back to void, since
> > > it's no longer an error if the per-cpu allocation fails.
> >
> > I don't see any changes in the sign of idx.
>
> Your earlier patch had srcu_read_unlock doing:
>
> +	if (likely(idx <= 0)) {
> +		preempt_disable();
> +		smp_mb();
> +		per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> +			smp_processor_id())->c[idx]--;
> +		preempt_enable();
> +		return;
> +	}
>
> Obviously you meant the test to be "idx >= 0".
That would explain the oops upon kicking off rcutorture. ;-) Good catch --
must be time for me to visit the optometrist or something...
> > > -int init_srcu_struct(struct srcu_struct *sp)
> > > +void init_srcu_struct(struct srcu_struct *sp)
> > >  {
> > >  	sp->completed = 0;
> > >  	mutex_init(&sp->mutex);
> > > -	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
> > > -	return (sp->per_cpu_ref ? 0 : -ENOMEM);
> > > +	sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > > +	atomic_set(&sp->hardluckref[0], 0);
> > > +	atomic_set(&sp->hardluckref[1], 0);
> > >  }
> >
> > Nack -- the caller is free to ignore the error return, but should be
> > able to detect it in case the caller is unable to tolerate the overhead
> > of running in hardluckref mode, perhaps instead choosing to fail a
> > user-level request in order to try to reduce memory pressure.
>
> Okay. Remember to change back the declaration in srcu.h as well.
I did manage to get this one right. ;-)
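For reference, what I currently have is roughly the following (untested
sketch) -- the caller can see the -ENOMEM, but the srcu_struct still
works in hardluckref mode even if the allocation fails:

	int init_srcu_struct(struct srcu_struct *sp)
	{
		sp->completed = 0;
		mutex_init(&sp->mutex);
		sp->per_cpu_ref = alloc_srcu_struct_percpu();
		atomic_set(&sp->hardluckref[0], 0);
		atomic_set(&sp->hardluckref[1], 0);
		return (sp->per_cpu_ref ? 0 : -ENOMEM);
	}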
> > I did update the comment to say that you can use SRCU_INITIALIZER()
> > instead.
>
> Good.
>
> > >  int srcu_read_lock(struct srcu_struct *sp)
> > >  {
> > >  	int idx;
> > > +	struct srcu_struct_array *sap;
> > > +
> > > +	if (unlikely(sp->per_cpu_ref == NULL &&
> > > +		     mutex_trylock(&sp->mutex))) {
> > > +		if (sp->per_cpu_ref == NULL)
> > > +			sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > > +		mutex_unlock(&sp->mutex);
> > > +	}
> > >
> > >  	preempt_disable();
> > >  	idx = sp->completed & 0x1;
> > >  	barrier(); /* ensure compiler looks -once- at sp->completed. */
> > > -	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > > -	srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > > +	sap = rcu_dereference(sp->per_cpu_ref);
> > > +	if (likely(sap != NULL)) {
> > > +		per_cpu_ptr(sap, smp_processor_id())->c[idx]++;
> > > +		smp_mb();
> > > +	} else {
> > > +		atomic_inc(&sp->hardluckref[idx]);
> > > +		smp_mb__after_atomic_inc();
> > > +		idx = -1 - idx;
> > > +	}
> > >  	preempt_enable();
> > >  	return idx;
> > >  }
> >
> > Good restructuring -- took this, though I restricted the unlikely() to
> > cover only the comparison to NULL, since the mutex_trylock() has a
> > reasonable chance of success assuming that the read path has substantial
> > overhead from other sources.
>
> I vacillated over that. It's not clear what difference it will make to
> the compiler, but I guess it does make it a little clearer to a reader
> that the unlikely part is the test for NULL.
Agreed, probably mostly for the benefit of the human reader.
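In other words, something like this (sketch of what is now in my tree),
with unlikely() covering only the NULL check and the trylock left to
fend for itself:

	if (unlikely(sp->per_cpu_ref == NULL) &&
	    mutex_trylock(&sp->mutex)) {
		if (sp->per_cpu_ref == NULL)
			sp->per_cpu_ref = alloc_srcu_struct_percpu();
		mutex_unlock(&sp->mutex);
	}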
> > > @@ -131,10 +166,16 @@ int srcu_read_lock(struct srcu_struct *s
> > > */
> > >  void srcu_read_unlock(struct srcu_struct *sp, int idx)
> > >  {
> > > -	preempt_disable();
> > > -	srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > > -	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> > > -	preempt_enable();
> > > +	if (likely(idx >= 0)) {
> > > +		smp_mb();
> > > +		preempt_disable();
> > > +		per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> > > +			smp_processor_id())->c[idx]--;
> > > +		preempt_enable();
> > > +	} else {
> > > +		smp_mb__before_atomic_dec();
> > > +		atomic_dec(&sp->hardluckref[-1 - idx]);
> > > +	}
> > >  }
> >
> > I took the change of moving smp_mb() out from under preempt_disable().
> > I left the "return" -- same number of lines either way.
> > I don't see any changes in sign compared to what I had, FWIW.
>
> Compare your "if" statement to mine.
<red face>
> > > @@ -168,71 +214,35 @@ void synchronize_srcu(struct srcu_struct
> > >  	 * either (1) wait for two or (2) supply the second ourselves.
> > >  	 */
> > >
> > > -	if ((sp->completed - idx) >= 2) {
> > > -		mutex_unlock(&sp->mutex);
> > > -		return;
> > > -	}
> > > +	if ((sp->completed - idx) >= 2)
> > > +		goto done;
> >
> > I don't see the benefit of gathering all the mutex_unlock()s together.
> > If the unwinding was more complicated, I would take this change, however.
>
> It's not a big deal. A couple of extra function calls won't add much
> code.
And some compilers do this kind of gathering automatically anyway.
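Just to make sure we are talking about the same thing, the two shapes
being compared are roughly (sketch):

	/* Unwind at each early exit: */
	if ((sp->completed - idx) >= 2) {
		mutex_unlock(&sp->mutex);
		return;
	}

versus:

	/* Single exit point: */
	if ((sp->completed - idx) >= 2)
		goto done;
	...
done:
	mutex_unlock(&sp->mutex);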
> > > +#ifdef SMP__STORE_MB_LOAD_WORKS /* The fast path */
> > > +	if (srcu_readers_active_idx(sp, idx) == 0)
> > > +		goto done;
> > > +#endif
> >
> > I still don't trust this one. I would trust it a bit more if it were
> > srcu_readers_active() rather than srcu_readers_active_idx(), but even
> > then I suspect that external synchronization is required.
>
> There should be some sort of assertion at the start of synchronize_srcu
> (just after the mutex is acquired) that srcu_readers_active_idx() yields 0
> for the inactive idx, and a similar assertion at the end (just before the
> mutex is released). Hence there's no need to check both indices.
OK, good point -- the non-active set is indeed supposed to sum to zero
unless someone is in synchronize_srcu().
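Something like the following (untested sketch) just after the
mutex_lock() and again just before the mutex_unlock() would express
that invariant:

	WARN_ON(srcu_readers_active_idx(sp, (sp->completed & 0x1) ^ 1) != 0);

since, with the mutex held, nobody else can be flipping sp->completed
out from under us.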
> > Or is there
> > something that I am missing that makes it safe in face of the sequence
> > of events that you, Oleg, and I were discussing?
>
> Consider the sequence of events we were discussing.
>
> srcu_read_lock and synchronize_srcu are invoked at the same
> time. The writer sees that the count is 0 and takes the fast
> path, leaving sp->completed unchanged. The reader sees the
> new value of the data pointer and starts using it.
>
> While the reader is still running, synchronize_srcu is called
> again. This time it sees that the count is > 0, so it doesn't
> take the fast path. Everything follows according to your
> original code.
>
> The underlying problem with Oleg's code was that the reader could end up
> incrementing the wrong counter. That can't happen with the fast path,
> because sp->completed doesn't change.
OK, then the remaining area of my paranoia centers around unfortunate
sequences of preemptions, increments, and decrements of the counters
while synchronize_srcu() is reading them out (and also being preempted).
Intuitively, it seems like this should always be safe, but I am worried
about it nonetheless.
> > For the moment, this optimization should be done by the caller, and
> > should be prominently commented.
>
> Prominent commenting is certainly a good idea. However I don't see how
> the caller can make this optimization without getting into the guts of the
> SRCU implementation. After all, a major part of the optimization is
> to avoid calling synchronize_sched().
In general, I agree. There may be special cases where the caller knows
it is safe due to things unknown to synchronize_srcu() -- Jens's example
might be a case in point.
Thanx, Paul