Re: .17rc5 cfq slab corruption.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 30 2006, Dave Jones wrote:
> On Tue, May 30, 2006 at 03:17:28PM +0200, Jens Axboe wrote:
>  > On Sat, May 27 2006, Dave Jones wrote:
>  > > On Sat, May 27, 2006 at 09:07:24AM +0200, Jens Axboe wrote:
>  > >  > On Fri, May 26 2006, Andrew Morton wrote:
>  > >  > > Dave Jones <[email protected]> wrote:
>  > >  > > >
>  > >  > > > Was playing with googles new picasa toy, which hammered the disks
>  > >  > > > hunting out every image file it could find, when this popped out:
>  > >  > > > 
>  > >  > > > Slab corruption: (Not tainted) start=ffff810012b998c8, len=168
>  > >  > > > Redzone: 0x5a2cf071/0x5a2cf071.
>  > >  > > > Last user: [<ffffffff8032c319>](cfq_free_io_context+0x2f/0x74)
>  > >  > > > 090: 10 bd 28 1b 00 81 ff ff 6b 6b 6b 6b 6b 6b 6b 6b
>  > >  > > > Prev obj: start=ffff810012b99808, len=168
>  > >  > > > Redzone: 0x5a2cf071/0x5a2cf071.
>  > >  > > > Last user: [<ffffffff8032c319>](cfq_free_io_context+0x2f/0x74)
>  > >  > > > 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>  > >  > > > 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>  > >  > > > Next obj: start=ffff810012b99988, len=168
>  > >  > > > Redzone: 0x5a2cf071/0x5a2cf071.
>  > >  > > > Last user: [<ffffffff8032c319>](cfq_free_io_context+0x2f/0x74)
>  > >  > > > 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>  > >  > > > 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>  > >  > 
>  > >  > Pretty baffling... cfq has been hammered pretty thoroughly over the
>  > >  > last months and _nothing_ has shown up except some performance anomalies
>  > >  > that are now fixed. Since daves case (at least) seems to be
>  > >  > use-after-free, I'll see if I can reproduce with some contrived case.
>  > >  > I'm asuming that picasa forks and exits a lot with submitted io in
>  > >  > between than may not have finished at exit.
>  > > 
>  > > The second time I hit it, was actually during boot up.
>  > 
>  > Dave, do you have any io scheduler switching going on?
> 
> Here's something interesting (possibly unrelated).
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=193534
> 
> I added this patch to our devel kernel (based on 17rc5-git5 right now)
> 
> It's similar to the list_head debugging patch from -mm
> 
> --- linux-2.6.12/include/linux/list.h~	2005-08-08 15:34:50.000000000 -0400
> +++ linux-2.6.12/include/linux/list.h	2005-08-08 15:35:22.000000000 -0400
> @@ -5,7 +5,9 @@
>  
>  #include <linux/stddef.h>
>  #include <linux/prefetch.h>
> +#include <linux/kernel.h>
>  #include <asm/system.h>
> +#include <asm/bug.h>
>  
>  /*
>   * These are non-NULL pointers that will result in page faults
> @@ -52,6 +52,16 @@ static inline void __list_add(struct lis
>  			      struct list_head *prev,
>  			      struct list_head *next)
>  {
> +	if (next->prev != prev) {
> +		printk("List corruption. next->prev should be %p, but was %p\n",
> +				prev, next->prev);
> +		BUG();
> +	}
> +	if (prev->next != next) {
> +		printk("List corruption. prev->next should be %p, but was %p\n",
> +				next, prev->next);
> +		BUG();
> +	}
>  	next->prev = new;
>  	new->next = next;
>  	new->prev = prev;
> @@ -162,6 +162,16 @@ static inline void __list_del(struct lis
>   */
>  static inline void list_del(struct list_head *entry)
>  {
> +	if (entry->prev->next != entry) {
> +		printk("List corruption. prev->next should be %p, but was %p\n",
> +				entry, entry->prev->next);
> +		BUG();
> +	}
> +	if (entry->next->prev != entry) {
> +		printk("List corruption. next->prev should be %p, but was %p\n",
> +				entry, entry->next->prev);
> +		BUG();
> +	}
>  	__list_del(entry->prev, entry->next);
>  	entry->next = LIST_POISON1;
>  	entry->prev = LIST_POISON2;
> 
> 
> And then it turned up this:
> 
> List corruption. next->prev should be f74a5e2c, but was ea7ed31c
> Pointing at cfq_set_request.

I think I'm missing a piece of this - what list was corrupted, in what
function did it trigger?


-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux