[patch] fix spinlock-debug looping

* Andrew Morton <[email protected]> wrote:

> > > > The write_trylock + __delay in the loop is not a problem or a bug, as 
> > > > the trylock will at most _increase_ the delay - and our goal is to not 
> > > > have a false positive, not to be absolutely accurate about the 
> > > > measurement here.
> > > 
> > > Precisely.  We have delays of over a second (but we don't know how 
> > > much more than a second).  Let's say two seconds.  The NMI watchdog 
> > > timeout is, what?  Five seconds?
> > 
> > i dont see the problem.
> 
> It's taking over a second to acquire a write_lock.  A lock which is 
> unlikely to be held for more than a microsecond anywhere.  That's 
> really bad, isn't it?  Being on the edge of an NMI watchdog induced 
> system crash is bad, too.

i obviously agree that any such crash is a serious problem, but is it 
caused by the spinlock-debugging code? I doubt it is, unless __delay() 
is seriously buggered.

in any case, to move this problem forward i'd suggest we go with the 
patch below in -mm and wait for feedback. It fixes a potential overflow 
in the comparison (if HZ*lpj overflows 32 bits). That needs a really fast 
box running the 32-bit kernel though, so i doubt this is the cause of the 
problems. In any case, this change makes it easier to increase the 
looping timeout from 1 second to 10 seconds or so later on - at which 
point the overflow can happen for real and must be handled.
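
To illustrate the overflow described above, here is a minimal userspace sketch, not kernel code. DEMO_HZ, the helper names and the loops_per_jiffy values used with them are assumptions chosen for illustration. uint32_t stands in for a 32-bit kernel's unsigned long. With HZ = 1000 the 32-bit product wraps once loops_per_jiffy exceeds about 4.29 million, and a 10 second timeout wraps at a tenth of that.

```c
#include <stdint.h>

/* Assumed tick rate for illustration; the real HZ is a kernel config choice. */
#define DEMO_HZ ((uint32_t)1000)

/*
 * Loop bound as computed before the patch on a 32-bit kernel:
 * the multiplication is done in 32 bits and wraps modulo 2^32
 * before the result is compared against the u64 counter.
 */
static uint32_t bound_before(uint32_t lpj, uint32_t secs)
{
	return (uint32_t)(lpj * DEMO_HZ * secs);
}

/*
 * Loop bound as computed after the patch: widening the first operand
 * to 64 bits makes the whole product 64-bit, so it cannot wrap here.
 */
static uint64_t bound_after(uint32_t lpj, uint32_t secs)
{
	return (uint64_t)lpj * DEMO_HZ * secs;
}
```

For a hypothetical loops_per_jiffy of 5000000, bound_before(5000000, 1) yields 705032704 instead of 5000000000, so the inner loop would give up after roughly a seventh of the intended second.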

	Ingo

------------
Subject: fix spinlock-debug looping
From: Ingo Molnar <[email protected]>

make sure the right hand side of the comparison does not overflow
on 32-bit. Also print out more info when detecting a lockup, so
that we see how many times the code tried (and failed) to get the
lock.

Signed-off-by: Ingo Molnar <[email protected]>
---
 lib/spinlock_debug.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Index: linux/lib/spinlock_debug.c
===================================================================
--- linux.orig/lib/spinlock_debug.c
+++ linux/lib/spinlock_debug.c
@@ -104,7 +104,7 @@ static void __spin_lock_debug(spinlock_t
 	u64 i;
 
 	for (;;) {
-		for (i = 0; i < loops_per_jiffy * HZ; i++) {
+		for (i = 0; i < (u64)loops_per_jiffy * HZ; i++) {
 			if (__raw_spin_trylock(&lock->raw_lock))
 				return;
 			__delay(1);
@@ -112,10 +112,10 @@ static void __spin_lock_debug(spinlock_t
 		/* lockup suspected: */
 		if (print_once) {
 			print_once = 0;
-			printk(KERN_EMERG "BUG: spinlock lockup on CPU#%d, "
-					"%s/%d, %p\n",
+			printk(KERN_EMERG "BUG: possible spinlock lockup on CPU#%d, "
+					"%s/%d, %p [%Ld/%ld]\n",
 				raw_smp_processor_id(), current->comm,
-				current->pid, lock);
+				current->pid, lock, i, loops_per_jiffy);
 			dump_stack();
 		}
 	}
@@ -169,7 +169,7 @@ static void __read_lock_debug(rwlock_t *
 	u64 i;
 
 	for (;;) {
-		for (i = 0; i < loops_per_jiffy * HZ; i++) {
+		for (i = 0; i < (u64)loops_per_jiffy * HZ; i++) {
 			if (__raw_read_trylock(&lock->raw_lock))
 				return;
 			__delay(1);
@@ -177,10 +177,10 @@ static void __read_lock_debug(rwlock_t *
 		/* lockup suspected: */
 		if (print_once) {
 			print_once = 0;
-			printk(KERN_EMERG "BUG: read-lock lockup on CPU#%d, "
-					"%s/%d, %p\n",
+			printk(KERN_EMERG "BUG: possible read-lock lockup on CPU#%d, "
+					"%s/%d, %p [%Ld/%ld]\n",
 				raw_smp_processor_id(), current->comm,
-				current->pid, lock);
+				current->pid, lock, i, loops_per_jiffy);
 			dump_stack();
 		}
 	}
@@ -242,7 +242,7 @@ static void __write_lock_debug(rwlock_t 
 	u64 i;
 
 	for (;;) {
-		for (i = 0; i < loops_per_jiffy * HZ; i++) {
+		for (i = 0; i < (u64)loops_per_jiffy * HZ; i++) {
 			if (__raw_write_trylock(&lock->raw_lock))
 				return;
 			__delay(1);
@@ -250,10 +250,10 @@ static void __write_lock_debug(rwlock_t 
 		/* lockup suspected: */
 		if (print_once) {
 			print_once = 0;
-			printk(KERN_EMERG "BUG: write-lock lockup on CPU#%d, "
-					"%s/%d, %p\n",
+			printk(KERN_EMERG "BUG: possible write-lock lockup on CPU#%d, "
+					"%s/%d, %p [%Ld/%ld]\n",
 				raw_smp_processor_id(), current->comm,
-				current->pid, lock);
+				current->pid, lock, i, loops_per_jiffy);
 			dump_stack();
 		}
 	}
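
The bounded trylock loop that the three hunks above share can be sketched in userspace with C11 atomics. This is only an approximation under stated assumptions: struct demo_lock and the demo_* helpers are hypothetical names, not the kernel's spinlock_t or __raw_spin_trylock(), and the __delay(1) call, the printk and the outer retry loop are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace lock, standing in for the kernel's raw lock. */
struct demo_lock {
	atomic_flag held;
};

/* Returns true if the lock was free and is now held by the caller. */
static bool demo_trylock(struct demo_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void demo_unlock(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Try to take the lock at most max_tries times, using a 64-bit counter
 * as in the patch. Returns the number of failed attempts before the
 * lock was acquired, or max_tries if it was never acquired (the point
 * at which the kernel code reports a possible lockup).
 */
static uint64_t demo_lock_bounded(struct demo_lock *l, uint64_t max_tries)
{
	uint64_t i;

	for (i = 0; i < max_tries; i++) {
		if (demo_trylock(l))
			return i;
	}
	return max_tries;
}
```

A caller would treat a return value equal to max_tries as "lockup suspected" and, like the patched code, could log the attempt count next to the calibration value that produced the bound.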