Re: [patch] increase spinlock-debug looping timeouts from 1 sec to 1 min

On Mon, 19 Jun 2006 13:39:44 +0200
Ingo Molnar <[email protected]> wrote:

> 
> * Andrew Morton <[email protected]> wrote:
> 
> > > The write_trylock + __delay in the loop is not a problem or a bug, as 
> > > the trylock will at most _increase_ the delay - and our goal is to not 
> > > have a false positive, not to be absolutely accurate about the 
> > > measurement here.
> > 
> > Precisely.  We have delays of over a second (but we don't know how 
> > much more than a second).  Let's say two seconds.  The NMI watchdog 
> > timeout is, what?  Five seconds?
> 
> i dont see the problem.

It's taking over a second to acquire a write_lock.  A lock which is
unlikely to be held for more than a microsecond anywhere.  That's really
bad, isn't it?  Being on the edge of an NMI watchdog induced system crash
is bad, too.

> We'll have tried that lock hundreds of thousands 
> of times before this happens. The NMI watchdog will only trigger if we 
> do this with IRQs disabled.

tree_lock uses write_lock_irq().

> And it's not like the normal 
> __write_lock_failed codepath would be any different: for heavily 
> contended workloads the overhead is likely in the cacheline bouncing, 
> not in the __delay().

Yes, it might also happen with !CONFIG_DEBUG_SPINLOCK.  We need to find out
if that's so and if so, why.

> > That's getting too close.  The result will be a total system crash.  
> > And RH are shipping this.
> 
> I dont see a connection. Pretty much the only thing the loop condition 
> impacts is the condition under which we print out a 'i think we 
> deadlocked' message.

I'm assuming that the additional delay in the debug code has worsened the
situation.
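
To make that assumption concrete, here is a rough user-space model of the
trylock-and-delay loop under discussion.  It is a sketch only, not the kernel
code: struct model_lock, model_trylock() and model_write_lock_debug() are
made-up names, the lock is "contended" simply by refusing a fixed number of
attempts, and the per-attempt delay is left as a comment rather than
actually executed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a contended rwlock: it refuses the first
 * `free_after` attempts and grants the next one. */
struct model_lock {
	uint64_t attempts;	/* trylock calls made so far */
	uint64_t free_after;	/* number of attempts that will fail */
};

/* Models a write_trylock(): returns true once the lock is obtained. */
static bool model_trylock(struct model_lock *l)
{
	return ++l->attempts > l->free_after;
}

/*
 * Models the debug acquire loop: try the lock `loops_per_timeout` times,
 * and if that budget runs out, count one "lockup suspected" event and
 * keep trying.  Returns the number of times the budget expired before
 * the lock was finally taken.
 */
static unsigned int model_write_lock_debug(struct model_lock *l,
					   uint64_t loops_per_timeout)
{
	unsigned int expired = 0;

	for (;;) {
		for (uint64_t i = 0; i < loops_per_timeout; i++) {
			if (model_trylock(l))
				return expired;
			/* the debug code would delay briefly here; every
			 * failed attempt adds that delay to the wait */
		}
		expired++;	/* where the warning would be printed */
	}
}
```

In this model the budget only decides when the warning fires; it does not
bound how long the caller waits.  Any delay per failed attempt is added on
top of whatever the contention itself costs, which is the effect I am
assuming above.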

> Have i missed your point perhaps?

I get that impression ;) If it takes 1-2 seconds to get this lock then it
can take five seconds.  That is a) just gross and b) long enough for the NMI
watchdog to nuke the box.

Why is it taking so long to get the lock?

Does it happen in non-debug mode?

What do we do about it?