Re: BUG: assertion failure in fs/jbd/checkpoint.c persists in 2.6.11.12

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* David Wilk ([email protected]) wrote:
> We've been plagued buy this ext3 bug since 2.6.10, and it only happens
> on heavily loaded postgres systems.  We run our postgres DB on ext3
> data=journal on a dmcrypt partition.  Our kernel is also patched with
> grsec, but that doesn't appear to play any role.

Can you recreate w/out grsec, and w/out dm-crypt?  IOW, just ext3 plus
your load?

> After upgrading to 2.6.11.12 (specifically for the ext3 checkpoint.c
> fix) we noticed two things.  The assertion failure persists, and now
> we get a condition where a postgres process will spin in state 'D'
> forever and hog 100% of a CPU (in system, not user).
> 
> I've attached the trace in plain text so the formatting doesn't get screwed.
> 
> Let me know if anyone would like more information.  I'm no programmer,
> but I'd like to help in any way that I can.

Would you mind trying this patch:

From: Jan Kara <[email protected]>

On one path, cond_resched_lock() fails to return true if it dropped the lock. 
We think this might be causing the crashes in JBD's log_do_checkpoint().

(chrisw: backport to 2.6.11.12)
---

 kernel/sched.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

Index: release-2.6.11/kernel/sched.c
===================================================================
--- release-2.6.11.orig/kernel/sched.c
+++ release-2.6.11/kernel/sched.c
@@ -3788,11 +3788,14 @@ EXPORT_SYMBOL(cond_resched);
  */
 int cond_resched_lock(spinlock_t * lock)
 {
+	int ret = 0;
+
 #if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT)
 	if (lock->break_lock) {
 		lock->break_lock = 0;
 		spin_unlock(lock);
 		cpu_relax();
+		ret = 1;
 		spin_lock(lock);
 	}
 #endif
@@ -3800,10 +3803,10 @@ int cond_resched_lock(spinlock_t * lock)
 		_raw_spin_unlock(lock);
 		preempt_enable_no_resched();
 		__cond_resched();
+		ret = 1;
 		spin_lock(lock);
-		return 1;
 	}
-	return 0;
+	return ret;
 }
 
 EXPORT_SYMBOL(cond_resched_lock);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux