Re: [PATCH RFC 3/9] RCU: Preemptible RCU

On Fri, Sep 21, 2007 at 09:15:03PM -0400, Steven Rostedt wrote:
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > On Fri, Sep 21, 2007 at 10:40:03AM -0400, Steven Rostedt wrote:
> > > On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:

[ . . . ]

> > > > +	/*
> > > > +	 * Take the next transition(s) through the RCU grace-period
> > > > +	 * flip-counter state machine.
> > > > +	 */
> > > > +
> > > > +	switch (rcu_try_flip_state) {
> > > > +	case rcu_try_flip_idle_state:
> > > > +		if (rcu_try_flip_idle())
> > > > +			rcu_try_flip_state = rcu_try_flip_waitack_state;
> > >
> > > Just trying to understand all this. Here at flip_idle, only a CPU with
> > > no pending RCU calls will flip it. Then all the CPUs' flags will be set
> > > to rcu_flipped, and the ctrl.completed counter is incremented.
> >
> > s/no pending RCU calls/at least one pending RCU call/, but otherwise
> > spot on.
> >
> > So if the RCU grace-period machinery is idle, the first CPU to take
> > a scheduling-clock interrupt after having posted an RCU callback will
> > get things going.
> 
> I said 'no' because of this:
> 
> +rcu_try_flip_idle(void)
> +{
> +       int cpu;
> +
> +       RCU_TRACE_ME(rcupreempt_trace_try_flip_i1);
> +       if (!rcu_pending(smp_processor_id())) {
> +               RCU_TRACE_ME(rcupreempt_trace_try_flip_ie1);
> +               return 0;
> +       }
> 
> But now I'm a bit more confused. :-/
> 
> Looking at the caller in kernel/timer.c I see
> 
> 	if (rcu_pending(cpu))
> 		rcu_check_callbacks(cpu, user_tick);
> 
> And rcu_check_callbacks is the caller of rcu_try_flip. The confusion is
> that we call this when we have a pending rcu, but if we have a pending
> rcu, we won't flip the counter ??

We don't enter unless there is something for RCU to do (might be a
pending callback, for example, but might also be needing to acknowledge
a counter flip).  If, by the time we get to rcu_try_flip_idle(), there
is no longer anything to do (!rcu_pending()), we bail.

So a given CPU kicks the state machine out of idle only if it -still-
has something to do once it gets to rcu_try_flip_idle(), right?
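
For what it is worth, here is a minimal user-space sketch of that
double check.  It is a model only, not the code from the patch: the
struct, the model_* names, and the idea that flip-acknowledgment
bookkeeping gets cleared on the way in are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU model; none of these names come from the patch. */
struct cpu_model {
	bool has_callback;	/* an RCU callback is queued on this CPU */
	bool owes_ack;		/* this CPU still has to acknowledge a flip */
};

enum model_state { MODEL_IDLE, MODEL_WAITACK };

/* Stand-in for rcu_pending(): is there anything at all for RCU to do? */
static bool model_pending(const struct cpu_model *c)
{
	return c->has_callback || c->owes_ack;
}

/* Stand-in for rcu_try_flip_idle(): recheck, and bail if nothing is left. */
static bool model_try_flip_idle(const struct cpu_model *c)
{
	if (!model_pending(c))
		return false;
	return true;	/* the real function goes on to flip the counter */
}

/* Stand-in for the scheduling-clock path that ends in rcu_try_flip(). */
static enum model_state model_tick(struct cpu_model *c, enum model_state s)
{
	if (!model_pending(c))
		return s;		/* first check: nothing to do, do not enter */
	c->owes_ack = false;		/* assumed: bookkeeping done on the way in */
	if (s == MODEL_IDLE && model_try_flip_idle(c))
		return MODEL_WAITACK;	/* second check passed: leave idle */
	return s;
}
```

In this model, a CPU that entered only because it owed an
acknowledgment has nothing left by the time of the second check and
leaves the state machine idle, while a CPU with a queued callback
kicks it into the wait-for-acknowledgment state.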

[ . . . ]

> > > Is there a chance that overflow of a counter (although probably very
> > > very unlikely) would cause any problems?
> >
> > The only way it could cause a problem would be if there was ever
> > more than 4,294,967,296 outstanding rcu_read_lock() calls.  I believe
> > that lockdep screams if it sees more than 30 nested locks within a
> > single task, so for systems that support no more than 100M tasks, we
> > should be OK.  It might sometime be necessary to make this be a long
> > rather than an int.  Should we just do that now and be done with it?
> 
> Sure, why not. More and more and more overkill!!!
> 
> (rostedt hears in his head the Monty Python "Spam" song).

;-)  OK!
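
The arithmetic behind the overflow estimate quoted above can be
checked with a short sketch.  The two constants are the rough figures
from the discussion (about 30 nested locks per task, no more than
100M tasks), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the discussion above; both are rough bounds. */
#define MAX_NESTING_PER_TASK	30ULL		/* approximate lockdep nesting limit */
#define MAX_TASKS		100000000ULL	/* "no more than 100M tasks" */

/* Worst-case number of simultaneously outstanding rcu_read_lock() calls. */
static uint64_t worst_case_outstanding(void)
{
	return MAX_NESTING_PER_TASK * MAX_TASKS;
}

/* Nonzero if an unsigned counter of the given bit width holds the worst case. */
static int counter_is_wide_enough(unsigned int bits)
{
	uint64_t limit = (bits >= 64) ? UINT64_MAX : ((1ULL << bits) - 1);

	return worst_case_outstanding() <= limit;
}
```

Under these assumptions the worst case is 3,000,000,000 outstanding
calls, which fits in 32 bits with some room to spare and fits in 64
bits with a great deal of room to spare.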

> > > Also, all the CPUs have their "check_mb" set.
> > >
> > > > +			rcu_try_flip_state = rcu_try_flip_waitmb_state;
> > > > +		break;
> > > > +	case rcu_try_flip_waitmb_state:
> > > > +		if (rcu_try_flip_waitmb())
> > >
> > > I have to admit that this seems a bit of an overkill, but I guess you
> > > know what you are doing.  After going through three states, we still
> > > need to do a memory barrier on each CPU?
> >
> > Yep.  Because there are no memory barriers in rcu_read_unlock(), the
> > CPU is free to reorder the contents of the RCU read-side critical section
> > to follow the counter decrement.  This means that this CPU would still
> > be referencing RCU-protected data after it had told the world that it
> > was no longer doing so.  Forcing a memory barrier on each CPU guarantees
> > that if we see the memory-barrier acknowledge, we also see any prior
> > RCU read-side critical section.
> 
> And it seems reasonable to me that this would be enough to satisfy a
> grace period. But then there is the CPU moving the rcu_read_(un)lock's
> around.
> 
> Are we sure that adding all these grace-period stages is better than just
> biting the bullet and putting in a memory barrier?

Good question.  I believe so, because the extra stages don't require
much additional processing, and because the ratio of rcu_read_lock()
calls to the number of grace periods is extremely high.  But, if I
can prove it is safe, I will certainly decrease GP_STAGES or otherwise
optimize the state machine.
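
To make the stages concrete, here is a rough user-space model of the
flip state machine.  The idle, waitack, and waitmb states correspond
to the ones quoted in this thread; the state between waitack and
waitmb is elided from the quoted code, so the WAITZERO label below,
like the struct layout and every other name, is an assumption made
for illustration rather than the patch's implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4

enum flip_state { IDLE, WAITACK, WAITZERO, WAITMB };

struct model {
	enum flip_state state;
	bool callback_pending;		/* some CPU still has RCU work to do */
	bool flip_acked[NCPUS];		/* per-CPU acknowledgment of the flip */
	long old_readers;		/* readers still on the old counter */
	bool mb_done[NCPUS];		/* per-CPU memory-barrier acknowledgment */
	unsigned long completed;	/* number of counter flips so far */
};

static bool all_set(const bool *flags)
{
	for (int i = 0; i < NCPUS; i++)
		if (!flags[i])
			return false;
	return true;
}

static void clear_all(bool *flags)
{
	for (int i = 0; i < NCPUS; i++)
		flags[i] = false;
}

/* Take at most one transition per call, as one clock tick would. */
static void model_try_flip(struct model *m)
{
	switch (m->state) {
	case IDLE:
		if (m->callback_pending) {
			m->completed++;			/* flip the counter */
			clear_all(m->flip_acked);	/* every CPU must now ack */
			m->state = WAITACK;
		}
		break;
	case WAITACK:
		if (all_set(m->flip_acked))
			m->state = WAITZERO;
		break;
	case WAITZERO:
		if (m->old_readers == 0) {
			clear_all(m->mb_done);		/* request a barrier per CPU */
			m->state = WAITMB;
		}
		break;
	case WAITMB:
		if (all_set(m->mb_done))
			m->state = IDLE;
		break;
	}
}
```

Each call does only a few flag checks, which is the sense in which
the extra stages are cheap: the cost lands on the grace-period side,
and the far more frequent rcu_read_lock() and rcu_read_unlock() paths
stay free of memory barriers.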

[ . . . ]

> > > OK, that's all I have on this patch (will take a bit of a break before
> > > reviewing your other patches).  But I will say that RCU has grown quite
> > > a bit, and is looking very good.
> >
> > Glad you like it, and thank you again for the careful and thorough review.
> 
> I'm scared to do the preempt portion %^O

Ummm...  This -was- the preempt portion.  ;-)

> > > Basically, what I'm saying is "Great work, Paul!".  This is looking
> > > good. Seems that we just need a little bit better explanation for those
> > > that are not up at the IQ level of you.  I can write something up after
> > > this all gets finalized. Sort of a rcu-design.txt, that really tries to
> > > explain it to the simpletons like me ;-)
> >
> > I do greatly appreciate the compliments, especially coming from someone
> > like yourself, but it is also true that I have been implementing and
> > using RCU in various forms for longer than some Linux-community members
> > (not many, but a few) have been alive, and programming since 1972 or so.
> > Lots and lots of practice!
> 
> '72, I was 4.

What, and you weren't programming yet???  ;-)

> > Hmmm...  I started programming about the same time that I started
> > jogging consistently.  Never realized that before.
> 
> Well, I hope you keep doing both for a long time to come.

Me too!  ;-)

> > I am thinking in terms of getting an improved discussion of RCU design and
> > use out there -- after all, the fifth anniversary of RCU's addition to
> > the kernel is coming right up.  This does deserve better documentation,
> > especially given that for several depressing weeks near the beginning
> > of 2005 I believed that a realtime-friendly RCU might not be possible.
> 
> That is definitely an accomplishment. And I know as well as you do that it
> happened because of a lot of people sharing ideas. Some good, some bad,
> but all helpful for healthy development!

Indeed!  The current version is quite a bit different than my early-2005
posting (which relied on locks!), and a -lot- of people had a hand in
making it what it is today.

							Thanx, Paul
