Re: discriminate single bit error hardware failure from slab corruption.

On Thu, 2 Feb 2006, Dave Jones wrote:

> In the case where we detect a single bit has been flipped, we spew
> the usual slab corruption message, which users instantly think
> is a kernel bug.  In a lot of cases, single bit errors are
> down to bad memory, or other hardware failure.
>
> This patch adds an extra line to the slab debug messages in those
> cases, in the hope that users will try memtest before they report a bug.
>
> 000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Single bit error detected. Possibly bad RAM. Please run memtest86.

Does memtest86 run on all $ARCHes?
Or is it that this is good for <large percentage>, so it's Good.  :)
Just checking; it is a good idea.

> Signed-off-by: Dave Jones <[email protected]>
>
> --- linux-2.6.15/mm/slab.c~	2006-01-09 13:25:17.000000000 -0500
> +++ linux-2.6.15/mm/slab.c	2006-01-09 13:26:01.000000000 -0500
> @@ -1313,8 +1313,11 @@ static void poison_obj(kmem_cache_t *cac
>  static void dump_line(char *data, int offset, int limit)
>  {
>  	int i;
> +	unsigned char total=0;
>  	printk(KERN_ERR "%03x:", offset);
>  	for (i = 0; i < limit; i++) {
> +		if (data[offset+i] != POISON_FREE)
> +			total += data[offset+i];
>  		printk(" %02x", (unsigned char)data[offset + i]);
>  	}
>  	printk("\n");
> @@ -1019,6 +1023,18 @@ static void dump_line(char *data, int of
>  		}
>  	}
>  	printk("\n");
> +	switch (total) {
> +		case 0x36:
> +		case 0x6a:
> +		case 0x6f:
> +		case 0x81:
> +		case 0xac:
> +		case 0xd3:
> +		case 0xd5:
> +		case 0xea:
> +			printk (KERN_ERR "Single bit error detected. Possibly bad RAM. Please run memtest86.\n");
> +			return;
> +	}
>  }
>  #endif
>
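For illustration only: the patch above sums the non-poison bytes in a line and matches the sum against a fixed table. Another way to express "this byte differs from the poison value by exactly one bit" is to XOR the byte with the poison and check that the result is a power of two. The sketch below is standalone userspace C, not kernel code and not what the patch does. It assumes POISON_FREE is 0x6b (as the dump above suggests), and both function names are hypothetical.

```c
#include <stddef.h>

/* Assumption: the slab poison byte for freed objects, as seen in the dump. */
#define POISON_FREE 0x6b

/* Nonzero if `byte` differs from POISON_FREE in exactly one bit. */
static int is_single_bit_flip(unsigned char byte)
{
	unsigned char diff = byte ^ POISON_FREE;

	/* A value with exactly one bit set satisfies diff & (diff - 1) == 0. */
	return diff != 0 && (diff & (diff - 1)) == 0;
}

/*
 * Hypothetical helper: returns 1 only if exactly one byte in the buffer
 * is not the poison value and that byte is a single-bit flip of it.
 */
static int line_has_single_bit_error(const unsigned char *data, size_t len)
{
	size_t i, bad = 0;
	unsigned char last_bad = POISON_FREE;

	for (i = 0; i < len; i++) {
		if (data[i] != POISON_FREE) {
			bad++;
			last_bad = data[i];
		}
	}
	return bad == 1 && is_single_bit_flip(last_bad);
}
```

With the example line from the patch description (a single 0x6a among 0x6b bytes), line_has_single_bit_error() would report a single bit error. A line with two damaged bytes, or one byte off by more than one bit, would not.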
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

-- 
~Randy
