Re: Slab corruption in 2.6.16-rc5-mm2

On Mon, 6 Mar 2006, Jesper Juhl wrote:
>
> Not a git user (I need to become one but haven't found the time to read up 
> on it yet), but no problem, I'll dig out the patch and try reverting it.

It's attached here.

NOTE! I'm not at all sure it's the re-try logic. It could be something 
else. Anything that completes the request before it's actually totally 
done, or possibly re-uses the sense data for something else, would be 
wrong and buggy.

> Btw, the messages turn out slightly different on each boot, here are the 
> ones from this current boot of my box:
> 
> Slab corruption: start=f72b6b98, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Ok, same deal. "Medium not present - tray closed" sense data.

> Slab corruption: start=f72b6b98, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Hmm. Totally empty sense data? Strange.

> Slab corruption: start=f72b6b98, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c01d3769>](ext3_clear_inode+0x29/0x40)
> 000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
> 010: 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

This is different. But it looks similar. It looks like the thing was 
actually re-allocated for something else (posix acl data?) but then 
overwritten. However, the overwritten data does look like SCSI sense 
information again ("Invalid field in cdb"), so I think it's the same 
thing despite the fact that it had gotten re-allocated for something else.
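
For anyone following along, the sense bytes in these dumps can be decoded by
hand. Below is a minimal userspace sketch, assuming fixed-format sense data
(response code 0x70 or 0x71): the sense key is the low nibble of byte 2, and
the additional sense code and qualifier are bytes 12 and 13. The struct and
function names are made up for illustration and are not kernel interfaces,
and the lookup table covers only the two codes seen in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of interest from fixed-format SCSI sense data. */
struct sense_info {
	uint8_t response_code;	/* byte 0, low 7 bits (0x70 or 0x71) */
	uint8_t sense_key;	/* byte 2, low 4 bits */
	uint8_t asc;		/* byte 12, additional sense code */
	uint8_t ascq;		/* byte 13, additional sense code qualifier */
};

/*
 * Decode the first 14 bytes of a sense buffer.
 * Returns 0 on success, -1 if the buffer is too short or is not
 * fixed-format sense data (e.g. the all-zero dump above).
 */
static int decode_fixed_sense(const uint8_t *buf, size_t len,
			      struct sense_info *out)
{
	assert(buf != NULL && out != NULL);

	if (len < 14)
		return -1;

	out->response_code = buf[0] & 0x7f;
	if (out->response_code != 0x70 && out->response_code != 0x71)
		return -1;

	out->sense_key = buf[2] & 0x0f;
	out->asc = buf[12];
	out->ascq = buf[13];
	return 0;
}

/* Hypothetical lookup: only the two ASC/ASCQ pairs from this thread. */
static const char *describe_asc(uint8_t asc, uint8_t ascq)
{
	if (asc == 0x3a && ascq == 0x01)
		return "Medium not present - tray closed";
	if (asc == 0x24 && ascq == 0x00)
		return "Invalid field in CDB";
	return "unknown";
}
```

Fed the first dump (70 00 02 ... 3a 01), this gives sense key 0x02 (NOT
READY) with ASC/ASCQ 3a/01; the third dump (70 00 05 ... 24 00) gives sense
key 0x05 (ILLEGAL REQUEST) with 24/00. The all-zero dump is rejected because
byte 0 is not a valid response code.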

> Would gathering more of these help you out?

It's always interesting when trying to find the pattern, but I think the 
pattern is already pretty clear. sr_do_ioctl() seems to be the thing, and 
sense data is written too late.
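
To illustrate what "written too late" looks like to the slab debugging code,
here is a rough userspace simulation, not the actual mm/slab.c check. It
assumes the slab poison values 0x6b (POISON_FREE) for a freed object and 0xa5
(POISON_END) for its last byte; the helper names are invented for this
sketch. An object is poisoned as if freed, a stale pointer is then used to
store 18 bytes of sense data, and a later check finds the bytes that no
longer match the poison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE	0x6b	/* fill pattern for a freed object */
#define POISON_END	0xa5	/* last byte of a freed object */

/* Fill an object with the free-poison pattern, as if it had been freed. */
static void poison_object(uint8_t *obj, size_t size)
{
	assert(obj != NULL && size > 0);
	memset(obj, POISON_FREE, size - 1);
	obj[size - 1] = POISON_END;
}

/*
 * Count bytes that no longer match the poison pattern.
 * *first is set to the offset of the first bad byte, or to size if the
 * object is intact.
 */
static size_t check_poison(const uint8_t *obj, size_t size, size_t *first)
{
	size_t i, bad = 0;

	assert(obj != NULL && first != NULL && size > 0);
	*first = size;
	for (i = 0; i < size; i++) {
		uint8_t want = (i == size - 1) ? POISON_END : POISON_FREE;

		if (obj[i] != want) {
			if (bad == 0)
				*first = i;
			bad++;
		}
	}
	return bad;
}

/*
 * Stand-in for a command completion that arrives after the sense buffer
 * was freed: it stores 18 bytes of sense data through the stale pointer.
 * The bytes are the ones from the third dump above.
 */
static void late_sense_write(uint8_t *stale_sense)
{
	static const uint8_t sense[18] = {
		0x70, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x0a,
		0x00, 0x00, 0x00, 0x00, 0x24, 0x00, 0x00, 0x00,
		0x00, 0x00,
	};

	memcpy(stale_sense, sense, sizeof(sense));
}
```

Poisoning a 64-byte object, calling late_sense_write() on it and then running
check_poison() reports 18 bad bytes starting at offset 0, with 0x6b from
offset 18 on, which is the same shape as the third dump.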

> I have no USB, SATA or similar devices in the box, only a floppy drive, a 
> SCSI harddisk, a SCSI CD writer and a SCSI DVD-ROM.

Well, the fact that you have a SCSI CD-writer and a SCSI DVD-ROM explains 
why sr_do_ioctl() shows up, so that's all good.

> scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
>         <Adaptec 29160N Ultra160 SCSI adapter>
>         aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

So it's either an aic7xxx bug, or it's generic SCSI.

Considering that we've had other slab corruption issues (the reason I was 
looking closely at yours), generic SCSI isn't out of the question.

If you were a git user, doing a bisection run would be useful since you 
seem to be able to recreate it at will. Oh, well. Testing that one patch 
would still help.

		Linus
diff-tree 17e01f216b611fc46956dcd9063aec4de75991e3 (from 6e68af666f5336254b5715dca591026b7324499a)
Author: Mike Christie <[email protected]>
Date:   Fri Nov 11 05:31:37 2005 -0600

    [SCSI] add retries field to request for REQ_BLOCK_PC use
    
    For tape we need to control the retries. This patch adds a retries
    counter on the request for REQ_BLOCK_PC commands originating from
    scsi_execute* to use. REQ_BLOCK_PC commands coming from the block
    layer SG_IO path continue to use the retries set in the ULD init_command.
    (scsi_execute* does not set the gendisk so we do not execute
    the init_command in that path).
    
    Signed-off-by: Mike Christie <[email protected]>
    Signed-off-by: James Bottomley <[email protected]>

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index eb0cfbf..365843a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -259,6 +259,7 @@ int scsi_execute(struct scsi_device *sde
 	memcpy(req->cmd, cmd, req->cmd_len);
 	req->sense = sense;
 	req->sense_len = 0;
+	req->retries = retries;
 	req->timeout = timeout;
 	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;
 
@@ -472,6 +473,7 @@ int scsi_execute_async(struct scsi_devic
 	req->sense = sioc->sense;
 	req->sense_len = 0;
 	req->timeout = timeout;
+	req->retries = retries;
 	req->flags |= REQ_BLOCK_PC | REQ_QUIET;
 	req->end_io_data = sioc;
 
@@ -1393,7 +1395,7 @@ static int scsi_prep_fn(struct request_q
 				cmd->sc_data_direction = DMA_NONE;
 			
 			cmd->transfersize = req->data_len;
-			cmd->allowed = 3;
+			cmd->allowed = req->retries;
 			cmd->timeout_per_command = req->timeout;
 			cmd->done = scsi_generic_done;
 		}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9a68716..509e9a0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -184,6 +184,7 @@ struct request {
 	void *sense;
 
 	unsigned int timeout;
+	int retries;
 
 	/*
 	 * For Power Management requests
