Re: OOPS during FC-aware-driver module reload...

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 10, 2006 at 11:31:44AM -0700, Andrew Vasquez wrote:
> Recent linux-2.6.git trees introduced some oddities while performing a
> simple load/unload/load of the qla2xxx (a FC transport aware driver)
> driver.
> 
> Basically on module reload, I'd consistently hit the following
> 
> 	# insmod qla2xxx
> 	# rmmod qla2xxx
> 	# insmod qla2xxx
> 
> 	BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
> 	 printing eip: 00000000
> 	*pde = 00000000
> 	Oops: 0000 [#1]
> 	SMP
> 	Modules linked in: qla2xxx scsi_transport_fc
> 	CPU:    1
> 	EIP:    0060:[<00000000>]    Not tainted VLI
> 	EFLAGS: 00010246   (2.6.17-rc3 #32)
> 	EIP is at _stext+0x3feffd68/0x23
> 	eax: 00000000   ebx: f7c6c448   ecx: f3e92128   edx: 00000000
> 	esi: f3e92128   edi: c02fe420   ebp: 6b6b6b6b   esp: ea290dcc
> 	ds: 007b   es: 007b   ss: 0068
> 	Process insmod (pid: 10001, threadinfo=ea290000 task=f40ada90)
> 	Stack: <0>c0226794 6b6b6b6b f3e92128 f8b216e0 f3e92128 f3e92000 00000000 000000d0 
> 	       c0226cc3 f3e92128 c0226c90 c02598e4 f3e92128 f3e920f1 f3e92000 c02520c7 
> 	       f3e92000 ffff5e05 f3e92358 f3e93b14 f3e92000 f8b04d29 f3e92000 c1b2eb5c 
> 	Call Trace:
> 	 <c0226794> attribute_container_add_device+0x4b/0x135  <c0226cc3> transport_setup_device+0xe/0x11
> 	 <c0226c90> transport_setup_classdev+0x0/0x25   <c02598e4> scsi_sysfs_add_host+0x9e/0xac
> 	 <c02520c7> scsi_add_host+0x129/0x179   <f8b04d29> qla2x00_probe_one+0xa33/0xb34 [qla2xxx]
> 	 <c0125eb3> call_usermodehelper_keys+0xf7/0x104   <c0125d78> __call_usermodehelper+0x0/0x44
> 	 <f8b05e1b> qla2xxx_probe_one+0xe/0x11 [qla2xxx]   <c01e4ae6> pci_call_probe+0xf/0x12
> 	 <c01e4b1c> __pci_device_probe+0x33/0x47   <c01e4b4f> pci_device_probe+0x1f/0x34
> 	 <c0223ec4> driver_probe_device+0x43/0xa4   <c0223f95> __driver_attach+0x0/0x84
> 	 <c0223fee> __driver_attach+0x59/0x84   <c02235aa> bus_for_each_dev+0x47/0x6d
> 	 <c01dc91e> kobject_add+0xae/0xf7   <c022402d> driver_attach+0x14/0x18
> 	 <c0223f95> __driver_attach+0x0/0x84   <c02239d1> bus_add_driver+0x57/0x8d
> 	 <c02244b6> driver_register+0xb9/0xbe   <c01e4d4e> __pci_register_driver+0x85/0x96
> 	 <f883c07b> qla2x00_module_init+0x7b/0xa2 [qla2xxx]   <c01308c1> sys_init_module+0x8d/0x171
> 	 <c010266f> sysenter_past_esp+0x54/0x75  
> 	Code:  Bad EIP value.
> 	EIP: [<00000000>] _stext+0x3feffd68/0x23 SS:ESP 0068:ea290dcc
> 
> after some churning, git-bisect pointed to commit:
> 
> 	commit 56cf6504fc1c0c221b82cebc16a444b684140fb7
> 	Author: Russell King <[email protected]>
> 	Date:   Fri May 5 17:57:52 2006 +0100
> 
> 	    [BLOCK] Fix oops on removal of SD/MMC card
> 
> 	    The block layer keeps a reference (driverfs_dev) to the struct
> 	    device associated with the block device, and uses it internally
> 	    for generating uevents in block_uevent.
> 
> 	    Block device uevents include umounting the partition, which can
> 	    occur after the backing device has been removed.
> 
> 	    Unfortunately, this reference is not counted.  This means that
> 	    if the struct device is removed from the device tree, the block
> 	    layers reference will become stale.
> 
> 	    Guard against this by holding a reference to the struct device
> 	    in add_disk(), and only drop the reference when we're releasing
> 	    the gendisk kobject - in other words when we can be sure that no
> 	    further uevents will be generated for this block device.
> 
> 	    Signed-off-by: Russell King <[email protected]>
> 	    Acked-by: Jens Axboe <[email protected]>
> 
> after reverting the commit, all is fine again (no more panics during
> reload).  My fear though is that we're simply masking an underlying
> problem which the commit attempted to correct.

Not sure why that would be, but I think your assumption is correct (that
this fix has uncovered something else which is wrong.)

I'm not sure what because I don't do well when trying to decode x86
oopsen.  In fact, I'm not sure how it's getting from
attribute_container_add_device() to a NULL pointer.

The obvious thing is this:

                if (!cont->match(cont, dev))
                        continue;

but that means that "cont" must be invalid.  I don't know this code.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux