Re: [PATCH]: PCI Error Recovery: Symbios SCSI device driver

On Fri, Oct 20, 2006 at 01:05:10PM -0500, Linas Vepstas wrote:
> Index: linux-2.6.19-rc1-git11/drivers/scsi/sym53c8xx_2/sym_glue.c
> ===================================================================
> --- linux-2.6.19-rc1-git11.orig/drivers/scsi/sym53c8xx_2/sym_glue.c	2006-10-20 12:25:11.000000000 -0500
> +++ linux-2.6.19-rc1-git11/drivers/scsi/sym53c8xx_2/sym_glue.c	2006-10-20 12:41:15.000000000 -0500
> @@ -659,6 +659,11 @@ static irqreturn_t sym53c8xx_intr(int ir
>  
>  	if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("[");
>  
> +	/* Avoid spinloop trying to handle interrupts on frozen device */
> +	if ((np->s.device->error_state != pci_channel_io_normal) &&
> +	    (np->s.device->error_state != 0))
> +		return IRQ_HANDLED;
> +

This needs to be before the printf_debug call.

> @@ -726,6 +731,19 @@ static int sym_eh_handler(int op, char *
>  
>  	dev_warn(&cmd->device->sdev_gendev, "%s operation started.\n", opname);
>  
> +	/* We may be in an error condition because the PCI bus
> +	 * went down. In this case, we need to wait until the
> +	 * PCI bus is reset, the card is reset, and only then
> +	 * proceed with the scsi error recovery.  There's no
> +	 * point in hurrying; take a leisurely wait.
> +	 */
> +#define WAIT_FOR_PCI_RECOVERY	35
> +	if ((np->s.device->error_state != pci_channel_io_normal) &&
> +	    (np->s.device->error_state != 0) &&
> +	    (wait_for_completion_timeout(&np->s.io_reset_wait,
> +		                         WAIT_FOR_PCI_RECOVERY*HZ) == 0))
> +			return SCSI_FAILED;
> +

Is it safe / reasonable / a good idea to sleep for 35 seconds in the EH
handler?  I'm not that familiar with how the EH code works.  It has its
own thread, so I suppose that's OK.

Are the driver's data structures still intact after a reset?

I generally prefer not to be so perlish in conditionals, i.e.:

	if ((np->s.device->error_state != pci_channel_io_normal) &&
	    (np->s.device->error_state != 0)) {
		/* returns 0 on timeout, otherwise the jiffies remaining */
		unsigned long remaining = wait_for_completion_timeout(
			&np->s.io_reset_wait, WAIT_FOR_PCI_RECOVERY*HZ);
		if (!remaining)
			return SCSI_FAILED;
	}

Why is the condition so complicated though?  What does 0 mean if it's
not io_normal?  At least let's hide that behind a convenience macro:

	if (abnormal_error_state(np->s.device->error_state)) {
		...
	}
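
To make the suggestion concrete, here is a minimal sketch of such a helper,
written so it compiles outside the kernel. The name abnormal_error_state is
only the one proposed above, not an existing kernel API. The enum is a
stand-in for pci_channel_state_t with values assumed to match linux/pci.h,
and treating 0 as "field never initialised" is an assumption about what the
patch intends.

```c
#include <stdbool.h>

/* Stand-in for the kernel's pci_channel_state_t (assumed values). */
enum pci_channel_state {
	pci_channel_io_normal = 1,	/* I/O channel is working */
	pci_channel_io_frozen = 2,	/* I/O to the device is blocked */
	pci_channel_io_perm_failure = 3,	/* device is permanently dead */
};

/*
 * Hypothetical helper: true when the channel is in an error state.
 * 0 is assumed to mean "error_state was never set" and is treated
 * the same as normal, mirroring the two-part test in the patch.
 */
static inline bool abnormal_error_state(unsigned int state)
{
	return state != 0 && state != pci_channel_io_normal;
}
```

All three call sites in the patch would then collapse to a single
abnormal_error_state(np->s.device->error_state) test.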

> Index: linux-2.6.19-rc1-git11/drivers/scsi/sym53c8xx_2/sym_hipd.c
> ===================================================================
> --- linux-2.6.19-rc1-git11.orig/drivers/scsi/sym53c8xx_2/sym_hipd.c	2006-10-20 12:25:11.000000000 -0500
> +++ linux-2.6.19-rc1-git11/drivers/scsi/sym53c8xx_2/sym_hipd.c	2006-10-20 12:41:16.000000000 -0500
> @@ -2761,6 +2761,7 @@ void sym_interrupt (struct sym_hcb *np)
>  	u_char	istat, istatc;
>  	u_char	dstat;
>  	u_short	sist;
> +	u_int    icnt;

The cryptic names in this routine are actually register names, so calling
a counter 'icnt' doesn't fit the style; it just looks like another register.
Just 'i' will do.

>  	/*
>  	 *  interrupt on the fly ?
> @@ -2802,6 +2803,7 @@ void sym_interrupt (struct sym_hcb *np)
>  	sist	= 0;
>  	dstat	= 0;
>  	istatc	= istat;
> +	icnt = 0;
>  	do {
>  		if (istatc & SIP)
>  			sist  |= INW(np, nc_sist);
> @@ -2809,6 +2811,14 @@ void sym_interrupt (struct sym_hcb *np)
>  			dstat |= INB(np, nc_dstat);
>  		istatc = INB(np, nc_istat);
>  		istat |= istatc;
> +
> +		/* Prevent deadlock waiting on a condition that may never clear. */
> +		icnt ++;
> +		if (icnt > 100) {
> +			if ((np->s.device->error_state != pci_channel_io_normal)
> +			   && (np->s.device->error_state != 0))
> +				return;
> +		}
>  	} while (istatc & (SIP|DIP));

Though, since INB and INW will return 0xff and 0xffff when the device is
frozen, why not use that as our test rather than using a counter?

		if (sist == 0xffff && dstat == 0xff) {
			if (abnormal_error_state(np->s.device->error_state))
				return;
		}
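
As a rough illustration of why the all-ones test terminates the loop, here
is a user-space sketch of the drain loop against a mock device. Everything
in it is hypothetical scaffolding: struct mock_hcb, mock_inb, mock_inw and
drain_interrupts stand in for struct sym_hcb, INB, INW and the loop in
sym_interrupt, and the SIP/DIP bit values are assumed rather than taken
from the driver headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIP 0x02	/* SCSI interrupt pending (assumed bit value) */
#define DIP 0x01	/* DMA interrupt pending (assumed bit value) */

/* Mock host adapter: a frozen device answers every read with all ones. */
struct mock_hcb {
	bool frozen;
	unsigned int reads;
};

static uint8_t mock_inb(struct mock_hcb *np)
{
	np->reads++;
	return np->frozen ? 0xff : 0x00;
}

static uint16_t mock_inw(struct mock_hcb *np)
{
	np->reads++;
	return np->frozen ? 0xffff : 0x0000;
}

/*
 * Drain pending interrupt status. Returns false if the loop bailed out
 * because every status register read back as all ones on a frozen device.
 */
static bool drain_interrupts(struct mock_hcb *np, uint8_t istat)
{
	uint8_t istatc = istat, dstat = 0;
	uint16_t sist = 0;

	do {
		if (istatc & SIP)
			sist |= mock_inw(np);
		if (istatc & DIP)
			dstat |= mock_inb(np);
		istatc = mock_inb(np);
		istat |= istatc;

		/* All ones everywhere: ask whether the channel is down. */
		if (sist == 0xffff && dstat == 0xff) {
			if (np->frozen)
				return false;
		}
	} while (istatc & (SIP|DIP));

	return true;
}
```

On the mock, a frozen device exits within two passes, while a healthy one
leaves the loop as soon as istat reads back clear.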