Re: [bug] ata subsystem related crash with latest -git

On Wed, Oct 17 2007, Jens Axboe wrote:
> On Wed, Oct 17 2007, Ingo Molnar wrote:
> > 
> > ok, here's a different but similar crash that triggers on the testbox:
> > 
> > [  233.438890] BUG: unable to handle kernel paging request at virtual address 7d93e000
> > [  233.446390] printing eip: 784e9480 *pde = 01000067 *pte = 0593e000 
> > [  233.452630] Oops: 0000 [#1] DEBUG_PAGEALLOC
> > [  233.456790] 
> > [  233.458264] Pid: 0, comm: swapper Not tainted (2.6.23 #5)
> > [  233.463637] EIP: 0060:[<784e9480>] EFLAGS: 00010087 CPU: 0
> > [  233.469101] EIP is at ata_qc_issue+0x90/0x380
> > [  233.473429] EAX: 7d93dff0 EBX: 0000001f ECX: 7d93dff0 EDX: 798daf80
> > [  233.479668] ESI: 00000020 EDI: 7d93de00 EBP: 7b54007c ESP: 78a13e14
> > [  233.485908]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > [  233.491282] Process swapper (pid: 0, ti=78a12000 task=789753e0 task.ti=78a12000)
> > [  233.498473] Stack: 7d93de00 7b540000 7b540000 00000000 7d93dfe0 7b54007c 7d93db00 7b5417a4 
> > [  233.506793]        784c2490 784ef69e 784f21f3 7b52de98 7d93db00 7b540000 7b5417a4 7d93db00 
> > [  233.515112]        7b540000 7b524004 784f22e0 784ef380 784c2490 7d93db00 00000202 7b524004 
> > [  233.523432] Call Trace:
> > [  233.526033]  [<784c2490>] scsi_done+0x0/0x20
> > [  233.530279]  [<784ef69e>] ata_scsi_translate+0xbe/0x140
> > [  233.535478]  [<784f21f3>] ata_scsi_queuecmd+0x33/0x200
> > [  233.540591]  [<784f22e0>] ata_scsi_queuecmd+0x120/0x200
> > [  233.545791]  [<784ef380>] ata_scsi_rw_xlat+0x0/0x220
> > [  233.550730]  [<784c2490>] scsi_done+0x0/0x20
> > [  233.554976]  [<784c2d12>] scsi_dispatch_cmd+0x152/0x290
> > [  233.560177]  [<78135c67>] trace_hardirqs_on+0x67/0xb0
> > [  233.565202]  [<784c8c7e>] scsi_request_fn+0x1be/0x370
> > [  233.570229]  [<78408086>] blk_run_queue+0x36/0x80
> > [  233.574909]  [<784c7520>] scsi_next_command+0x30/0x50
> > [  233.579935]  [<784c76ab>] scsi_end_request+0xab/0xe0
> > [  233.584875]  [<784c83f9>] scsi_io_completion+0xa9/0x3d0
> > [  233.590075]  [<78135c67>] trace_hardirqs_on+0x67/0xb0
> > [  233.595100]  [<78405125>] blk_done_softirq+0x45/0x80
> > [  233.600040]  [<78405153>] blk_done_softirq+0x73/0x80
> > [  233.604981]  [<7811d4c3>] __do_softirq+0x53/0xb0
> > [  233.609573]  [<7811d588>] do_softirq+0x68/0x70
> > [  233.613993]  [<78105351>] do_IRQ+0x51/0x90
> > [  233.618066]  [<78135c9c>] trace_hardirqs_on+0x9c/0xb0
> > [  233.623092]  [<7810f2d0>] pgd_dtor+0x0/0x50
> > [  233.627252]  [<7810388e>] common_interrupt+0x2e/0x40
> > [  233.632192]  [<7810f2d0>] pgd_dtor+0x0/0x50
> > [  233.636352]  [<7815f3be>] quicklist_trim+0x5e/0x90
> > [  233.641118]  [<7810f2cb>] check_pgt_cache+0x1b/0x20
> > [  233.645971]  [<78100c52>] cpu_idle+0x32/0x60
> > [  233.650217]  [<78a14b35>] start_kernel+0x265/0x300
> > [  233.654983]  [<78a14380>] unknown_bootoption+0x0/0x1e0
> > [  233.660097]  =======================
> > [  233.663649] Code: 00 00 00 8b 45 34 a8 02 0f 84 ed 00 00 00 8b bd 88 00 00 00 31 db 89 3c 24 8b 75 3c 89 f8 c7 44 24 10 00 00 00 00 eb 1b 8d 76 00 <8b> 50 10 8d 48 10 f6 c2 01 0f 85 be 02 00 00 89 44 24 10 83 c3 
> > [  233.682455] EIP: [<784e9480>] ata_qc_issue+0x90/0x380 SS:ESP 0068:78a13e14
> > [  233.689302] Kernel panic - not syncing: Fatal exception in interrupt
> > 
> > (gdb) list *0x784e9480
> > 0x784e9480 is in ata_qc_issue (include/linux/scatterlist.h:48).
> > 43       */
> > 44      static inline struct scatterlist *sg_next(struct scatterlist *sg)
> > 45      {
> > 46              sg++;
> > 47
> > 48              if (unlikely(sg_is_chain(sg)))
> > 49                      sg = sg_chain_ptr(sg);
> > 50
> > 51              return sg;
> > 52      }
> > (gdb)
> > 
> > so there's sg_next() involvement too. Below is the disassembly.
> 
> You must have a magic test box :-)
> 
> Will investigate... libata doesn't actually enable chaining, but since
> i386 supports it, it ends up using the chain helpers anyway.
> 
> There seems to be some automatic inlining involved here; it must be
> dying inside ata_sg_setup().
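
The sg_next() listing above is the whole chaining mechanism. Walking the table is a pointer increment, followed by a check of one flag bit in the entry just reached. Below is a rough userspace C model of that logic. `struct model_sg`, `MODEL_SG_CHAIN` and the `model_*` helpers are hypothetical stand-ins, not the kernel's `struct scatterlist` layout or API. On a table that was never chained the flag bit is never set, so the same helpers reduce to plain array stepping. That is consistent with libata going through the chain helpers on i386 even without chaining enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of a scatterlist entry. Only page_link
 * matters here: bit 0 set marks a chain link whose remaining bits point
 * at the next sg array, mirroring the sg_is_chain()/sg_chain_ptr() idea. */
struct model_sg {
	uintptr_t page_link;
	unsigned int length;
};

#define MODEL_SG_CHAIN 0x01UL

static int model_sg_is_chain(const struct model_sg *sg)
{
	return (sg->page_link & MODEL_SG_CHAIN) != 0;
}

static struct model_sg *model_sg_chain_ptr(const struct model_sg *sg)
{
	return (struct model_sg *)(sg->page_link & ~(uintptr_t)MODEL_SG_CHAIN);
}

/* Same shape as the sg_next() listing: step to the adjacent entry, then
 * follow it if it is a chain link. It always reads the entry after sg,
 * so callers must never step past the last valid entry. */
static struct model_sg *model_sg_next(struct model_sg *sg)
{
	sg++;
	if (model_sg_is_chain(sg))
		sg = model_sg_chain_ptr(sg);
	return sg;
}

/* Turn the last slot of prev (prev_nents entries) into a link to next. */
static void model_sg_chain(struct model_sg *prev, unsigned int prev_nents,
			   struct model_sg *next)
{
	prev[prev_nents - 1].page_link = (uintptr_t)next | MODEL_SG_CHAIN;
	prev[prev_nents - 1].length = 0;
}

/* Build a two-array chained table holding four data entries and sum
 * their lengths by walking it with model_sg_next(). */
static unsigned int model_walk_total(void)
{
	struct model_sg first[3] = { { 0, 512 }, { 0, 1024 }, { 0, 0 } };
	struct model_sg second[2] = { { 0, 2048 }, { 0, 4096 } };
	struct model_sg *sg = first;
	unsigned int total = 0;

	model_sg_chain(first, 3, second);	/* first[2] becomes the link */

	for (int i = 0; i < 4; i++) {
		total += sg->length;
		if (i < 3)			/* don't step past the last entry */
			sg = model_sg_next(sg);
	}
	return total;
}
```

The loop guard matters: because the helper always reads the following entry, stepping one past the last element of an unchained table reads whatever memory happens to lie beyond it.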

OK, something to try out: libata doesn't enable sg chaining, since that
support isn't complete yet. Here's a dumb check just to verify that
scsi_alloc_sgtable() will NEVER return a chained sg table for a host that
doesn't have chaining enabled. If the BUG_ON() triggers for you, that could
explain your problem.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index aac8a02..cc89c86 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -777,8 +777,12 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 		 * sglist must be the biggest one, or we would not have
 		 * ended up doing another loop.
 		 */
-		if (prev)
+		if (prev) {
+			struct Scsi_Host *shost = cmd->device->host;
+
+			BUG_ON(!shost->use_sg_chaining);
 			sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
+		}
 
 		/*
 		 * don't allow subsequent mempool allocs to sleep, it would

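The patch asserts a single invariant: scsi_alloc_sgtable() only links sg arrays together with sg_chain() when the host has set use_sg_chaining. A simplified sketch of that rule follows, assuming a table is split into arrays of at most SCSI_MAX_SG_SEGMENTS entries, each full array giving up its last slot to the chain link as `sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl)` does. `MODEL_MAX_SG_SEGMENTS` and `model_sg_arrays_needed()` are hypothetical names, the value 128 is illustrative only, and this is not the actual allocation loop in scsi_lib.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SCSI_MAX_SG_SEGMENTS; illustrative value. */
#define MODEL_MAX_SG_SEGMENTS 128u

/* Number of separately allocated sg arrays needed for nents entries when
 * each full array gives up its last slot to a chain link. Returns 0 when
 * the request only fits by chaining but the host has not enabled it,
 * which is the case the BUG_ON() in the patch is meant to trap. */
static unsigned int model_sg_arrays_needed(unsigned int nents,
					   bool use_sg_chaining)
{
	unsigned int arrays = 0;

	if (nents <= MODEL_MAX_SG_SEGMENTS)
		return 1;			/* one array, no chain entry */
	if (!use_sg_chaining)
		return 0;			/* would need a chain entry */

	while (nents > MODEL_MAX_SG_SEGMENTS) {
		/* a full array holds MAX - 1 data entries plus the link */
		nents -= MODEL_MAX_SG_SEGMENTS - 1;
		arrays++;
	}
	return arrays + 1;			/* plus the final, unchained array */
}
```

For example, 129 entries need two arrays (127 data entries plus a link, then 2 entries), but only if chaining is allowed; without it the model reports 0.
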
-- 
Jens Axboe

