Re: Assertion failure in log_do_checkpoint(): "drop_count != 0 || cleanup_ret != 0"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> ons 2005-05-04 klockan 21:43 +0400 skrev cron:
> > Got a 2.6.12-rc3 spontaneous reboot on i386 SMP after about 20h of 
> > moderate web serving. 2.6.11-rc1-bk6 works well for a week or more but
> > leaks memory at about 16 Mb/day and needs reboot thereafter. :(
> > 
> > Can you please give advice how to interpret this log:
> 
> I'm quite certain I've hit this twice today. Can't be 100% sure but the
> bottom of the traces looked just like this and RIP was at
> log_do_checkpoint(). Once sporadically and another time trying to hit
> it. The workload both times was having two separate kernel trees, both
> were 2.6.12-rc2-mm something and then in both of the trees I did ketchup
> 2.6.12-rc3-mm3 in both trees more or less at the same time.
  Could you please try the attached patch? It should fix the bug.
Stephen has thought off a better long-term fix but was busy with other
things last weeks so I didn't get to coding it yet...

> So then I ran two concurrent while loops ketchuping trees. Something
> like:
> #!/bin/sh
> 
> while true
> do
>         ~/bin/ketchup 2.6.11-rc3-mm2
>         ~/bin/ketchup 2.6-mm
>         ~/bin/ketchup 2.6.11
>         ~/bin/ketchup 2.6.11-rc1
> done
> 
> 
> It took less than an hour to purposefully hit the bug with this here. No
> guarantees it will happen consistently however...
> 
> This is a dual cpu x64 with preempt running 2.6.12-rc3. I'm
> attaching .config if it is of interest to anyone.

									Honza
-- 
Jan Kara <[email protected]>
SuSE CR Labs
Fix possible false assertion failure in JBD checkpointing code.

Signed-off-by: Jan Kara <[email protected]>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-rc2/fs/jbd/checkpoint.c linux-2.6.12-rc2-checkpoint/fs/jbd/checkpoint.c
--- linux-2.6.12-rc2/fs/jbd/checkpoint.c	2005-03-03 18:58:29.000000000 +0100
+++ linux-2.6.12-rc2-checkpoint/fs/jbd/checkpoint.c	2005-04-05 13:26:42.000000000 +0200
@@ -339,8 +339,10 @@ int log_do_checkpoint(journal_t *journal
 			}
 		} while (jh != last_jh && !retry);
 
-		if (batch_count)
+		if (batch_count) {
 			__flush_batch(journal, bhs, &batch_count);
+			retry = 1;
+		}
 
 		/*
 		 * If someone cleaned up this transaction while we slept, we're

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux