[PATCH scsi-misc-2.6 00/08] scsi: small fixes & cleanups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



 Hello, James.
 Hello, Jens.

 These are series of small fixes & cleanups.  The last two patches
deal with reference counting and hot unplugging oops.  Patches are
against scsi-misc-2.6 tree (this is the devel tree, right?).

 Jens, please try #08 and tell me if you still get oops.  AFAICT,
reference counting isn't the issue here.  We're over-counting not
under-counting (see description of #07).

 All compile cleanly and I haven't found any problem yet.  USB
hot-unplugging doesn't create oops or offline device anymore for me.

[ Start of patch descriptions ]

01_scsi_remove_scsi_release_buffers.patch
	: remove unused bounce-buffer release path

	Buffer bouncing hasn't been done inside the scsi midlayer for
	quite sometime now, but bounce-buffer release paths are still
	around.  This patch removes these unused paths.	

02_scsi_no_special_on_requeue.patch
	: don't use blk_insert_request() for requeueing

	blk_insert_request() has 'reinsert' argument, which, when set,
	turns on REQ_SPECIAL and REQ_SOFTBARRIER and requeues the
	request.  SCSI midlayer was the only user of this feature and
	all requeued requests become special requests defeating
	quiesce state.  This patch makes scsi midlayer use
	blk_requeue_request() for requeueing and removes 'reinsert'
	feature from blk_insert_request().

	Note: In drivers/scsi/scsi_lib.c, scsi_single_lun_run() and
	scsi_run_queue() are moved upward unchanged.

03_scsi_remove_internal_timeout.patch
	: remove unused scsi_cmnd->internal_timeout field

	scsi_cmnd->internal_timeout field doesn't have any meaning
	anymore.  Kill the field.

04_scsi_remove_volatile.patch
	: remove meaningless volatile qualifiers from structure definitions

	scsi_device->device_busy, Scsi_Host->host_busy and
	->host_failed have volatile qualifiers, but the qualifiers
	don't serve any purpose.  Kill them.  While at it, protect
	->host_failed update in scsi_error for consistency and clarity.

05_scsi_timer_cleanup.patch
	: remove a timer race from scsi_queue_insert() and cleanup timer

	scsi_queue_insert() has four callers.  Three callers call with
	timer disabled and one (the second invocation in
	scsi_dispatch_cmd()) calls with timer activated.
	scsi_queue_insert() used to always call scsi_delete_timer()
	and ignore the return value.  This results in race with timer
	expiration.  Remove scsi_delete_timer() call from
	scsi_queue_insert() and make the caller delete timer and check
	the return value.

	While at it, as, once a scsi timer is added, it should expire
	or be deleted before reused, make scsi_add_timer() strict
	about timer reuses.  Now timer expiration function clears
	->function and all timer deletion should go through
	scsi_delete_timer().  Also, remove bogus ->eh_action tests
	from scsi_eh_{done|times_out} functions.  The condition is
	always true and the test is somewhat misleading.

06_scsi_remove_serial_number_at_timeout.patch
	: remove meaningless scsi_cmnd->serial_number_at_timeout field

	scsi_cmnd->serial_number_at_timeout doesn't serve any purpose
	anymore.  All serial_number == serial_number_at_timeout tests
	are always true in abort callbacks.  Kill the field.  Also, as
	->pid always equals ->serial_number and ->serial_number
	doesn't have any special meaning anymore, update comments
	above ->serial_number accordingly.  Once we remove all uses of
	this field from all lldd's, this field should go.

07_scsi_refcnt_cleanup.patch
	: remove bogus {get|put}_device() calls

	SCSI request submission paths can be categorized like the
	following.

	* through high-level driver (sd, st, sg...)
		+ requests (fs / pc)
		+ ioctls
		+ flushes (issue_flush / barrier rqs)
		+ backing dev (unplug fn / field referencing)
		+ high-level specific (init / revalidation...)
	* through scsi-midlayer
		+ midlevel specific (init...)
		+ transport specific (domain validations...)

	All accesses either

	* open high-level driver before submitting requests and
	  closes with no request left.
	* get_device() before submitting requests and put_device()
          with no request left.

	So, basically, SCSI high-level object (scsi_disk) and
	mid-level object (scsi_device) are reference counted by users,
	not the requests they submit.  Reference count cannot go zero
	with active users and users cannot access the object once the
	reference count reaches zero.

	So, the {get/put}_device() calls in scsi_get_command() and
	scsi_request_fn() are bogus and misleading.  In addition,
	get_device() cannot synchronize 1->0 and 0->1 transitions and
	always returns the device pointer given as the argument.  The
	== NULL tests are just misleading.

08_scsi_hot_unplug_fix.patch
	: fix hot unplug sequence

	When hot-unplugging using scsi_remove_host() function (as usb
	does), scsi_forget_host() used to be called before
	scsi_host_cancel().  So, the device gets removed first without
	request cleanup and scsi_host_cancel() never gets to call
	scsi_device_cancel() on the removed devices.  This results in
	premature completion of hot-unplugging process with active
	requests left in queue, eventually leading to hang/offlined
	device or oops when the active command times out.

	This patch makes scsi_remove_host() call scsi_host_cancel()
	first such that the host is first transited into cancel state
	and all requests of all devices are killed, and then, the
	devices are removed.  This patch fixes the oops in eh after
	hot-unplugging bug.

[ End of patch descriptions ]

 Thanks a lot.

--
tejun

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux