Re: ATA scsi driver misbehavior under kdump capture kernel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello, Cliff.

Cliff Wickman wrote:
> I've run into a problem with the ATA SCSI disk driver when running in a
> kdump dump-capture kernel.
> 
> I'm running on 2-processor x86_64 box.  It has 2 scsi disks, /dev/sda and
> /dev/sdb
> 
> My kernel is 2.6.22, and built to be a dump capturing kernel loaded by kexec.
> When I boot this kernel by itself, it finds both sda and sdb.
> 
> But when it is loaded by kexec and booted on a panic it only finds sda.
> 
> Any ideas from those familiar with the ATA driver?
> 
> 
> -Cliff Wickman
>  SGI
> 
> 
> I put some printk's into it and get this:
> 
> Standalone:
> 
>                                                    [nv_adma_error_handler]
> cpw: ata_host_register probe port 1 (error_handler:ffffffff81348625)
> cpw: ata_host_register call ata_port_probe
> cpw: ata_host_register call ata_port_schedule
> cpw: ata_host_register call ata_port_wait_eh
> cpw: ata_port_wait_eh entered
> cpw: ata_port_wait_eh, preparing to wait
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> cpw: ata_dev_configure entered
> cpw: ata_dev_configure testing class
> cpw: ata_dev_configure class is ATA_DEV_ATA
> ata2.00: ATA-6: ST3200822AS, 3.01, max UDMA/133
> ata2.00: 390721968 sectors, multi 16: LBA48
> cpw: ata_dev_configure exiting
> cpw: ata_dev_configure entered
> cpw: ata_dev_configure testing class
> cpw: ata_dev_configure class is ATA_DEV_ATA
> cpw: ata_dev_configure exiting
> cpw: ata_dev_set_mode printing:
> ata2.00: configured for UDMA/133
> cpw: ata_port_wait_eh, finished wait
> cpw: ata_port_wait_eh exiting
> cpw: ata_host_register done with probe port 1
> 
> 
> When loaded with kexec and booted on a panic:
> 
> cpw: ata_host_register probe port 1 (error_handler:ffffffff81348625)
> cpw: ata_host_register call ata_port_probe
> cpw: ata_host_register call ata_port_schedule
> cpw: ata_host_register call ata_port_wait_eh
> cpw: ata_port_wait_eh entered
> cpw: ata_port_wait_eh, preparing to wait
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> cpw: ata_port_wait_eh, finished wait
> cpw: ata_port_wait_eh exiting
> cpw: ata_host_register done with probe port 1

Hmmm.. PHY is up but for some reason libata thought there was no device
there.  Can you try the attached patch and report the result?

-- 
tejun
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6001aae..4bc042d 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -732,6 +732,9 @@ ata_dev_try_classify(struct ata_port *ap, unsigned int device, u8 *r_err)
 	if (r_err)
 		*r_err = err;
 
+	ata_port_printk(ap, KERN_ERR, "XXX: try_classif dev%d e=%02x %02x:%02x:%02x\n",
+			device, err, tf.lbal, tf.lbam, tf.lbah);
+
 	/* see if device passed diags: if master then continue and warn later */
 	if (err == 0 && device == 0)
 		/* diagnostic fail : do nothing _YET_ */
@@ -3006,8 +3009,10 @@ int ata_wait_ready(struct ata_port *ap, unsigned long deadline)
 
 		if (!(status & ATA_BUSY))
 			return 0;
-		if (!ata_port_online(ap) && status == 0xff)
+		if (!ata_port_online(ap) && status == 0xff) {
+			ata_port_printk(ap, KERN_ERR, "XXX: wait_ready status=0xff\n");
 			return -ENODEV;
+		}
 		if (time_after(now, deadline))
 			return -EBUSY;
 
@@ -3037,8 +3042,10 @@ static int ata_bus_post_reset(struct ata_port *ap, unsigned int devmask,
 	if (dev0) {
 		rc = ata_wait_ready(ap, deadline);
 		if (rc) {
-			if (rc != -ENODEV)
+			if (rc != -ENODEV) {
+				ata_port_printk(ap, KERN_ERR, "XXX: post_reset dev0 rc=%d\n", rc);
 				return rc;
+			}
 			ret = rc;
 		}
 	}
@@ -3067,8 +3074,10 @@ static int ata_bus_post_reset(struct ata_port *ap, unsigned int devmask,
 
 		rc = ata_wait_ready(ap, deadline);
 		if (rc) {
-			if (rc != -ENODEV)
+			if (rc != -ENODEV) {
+				printk("XXX: post_reset dev1 rc=%d\n", rc);
 				return rc;
+			}
 			ret = rc;
 		}
 	}
@@ -3113,8 +3122,11 @@ static int ata_bus_softreset(struct ata_port *ap, unsigned int devmask,
 	 * the bus shows 0xFF because the odd clown forgets the D7
 	 * pulldown resistor.
 	 */
-	if (ata_check_status(ap) == 0xFF)
+	if (ata_check_status(ap) == 0xFF) {
+		ata_port_printk(ap, KERN_ERR, "XXX: status=0xFF altstatus=0x%x\n",
+				ata_altstatus(ap));
 		return -ENODEV;
+	}
 
 	return ata_bus_post_reset(ap, devmask, deadline);
 }
@@ -3402,6 +3414,8 @@ int ata_std_softreset(struct ata_port *ap, unsigned int *classes,
 	if (slave_possible && ata_devchk(ap, 1))
 		devmask |= (1 << 1);
 
+	ata_port_printk(ap, KERN_ERR, "XXX: devmask=0x%x\n", devmask);
+
 	/* select device 0 again */
 	ap->ops->dev_select(ap, 0);
 

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux