Re: [PATCH RFC 3/9] RCU: Preemptible RCU

[ sneaks away from the family for a bit to answer emails ]

--
On Fri, 21 Sep 2007, Paul E. McKenney wrote:

> On Fri, Sep 21, 2007 at 09:19:22PM -0400, Steven Rostedt wrote:
> >
> > --
> > On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> > > >
> > > > In any case, I will be looking at the scenarios more carefully.  If
> > > > it turns out that GP_STAGES can indeed be cranked down a bit, well,
> > > > that is an easy change!  I just fired off a POWER run with GP_STAGES
> > > > set to 3, will let you know how it goes.
> > >
> > > The first attempt blew up during boot badly enough that ABAT was unable
> > > to recover the machine (sorry, grahal!!!).  Just for grins, I am trying
> > > it again on a machine that ABAT has had a better record of reviving...
> >
> > This still frightens the hell out of me. Going through 15 states and
> > failing. Seems the CPU is holding off writes for a long long time. That
> > means we flipped the counter 4 times, and that still wasn't good enough?
>
> Might be that the other machine has its 2.6.22 version of .config messed
> up.  I will try booting it on a stock 2.6.22 kernel when it comes back
> to life -- not sure I ever did that before.  Besides, the other similar
> machine seems to have gone down for the count, but without me torturing
> it...
>
> Also, keep in mind that various stages can "record" a memory misordering,
> for example, by incrementing the wrong counter.
>
> > Maybe I'll boot up my powerbook to see if it has the same issues.
> >
> > Well, I'm still finishing up on moving into my new house, so I won't be
> > available this weekend.
>
> The other machine not only booted, but has survived several minutes of
> rcutorture thus far.  I am trying a POWER5 machine as well, as the
> one currently running is a POWER4, which is a bit less aggressive about
> memory reordering than is the POWER5.
>
> Even if they pass, I refuse to reduce GP_STAGES until proven safe.
> Trust me, you -don't- want to be unwittingly making use of a subtly
> busted RCU implementation!!!

I totally agree. This is the same reason I want to understand -why- it
fails with 3 stages. To make sure that adding a 4th stage really does fix
it, and doesn't just make the chances for the bug smaller.

I just have that funny feeling that we are band-aiding this for POWER with
extra stages and not really solving the bug.

I could be totally out in left field on this. But the more people have a
good understanding of what is happening (this includes why things fail)
the more people in general can trust this code.  Right now I'm thinking
you may be the only one that understands this code enough to trust it. I'm
just wanting you to help people like me to trust the code by understanding
and not just having faith in others.

If you ever decide to give up jogging, we need to make sure that there are
people here that can still fill those running shoes (-:


 -- Steve
