On Mon, Dec 26 2005, Pete Zaitcev wrote:
> On Fri, 9 Dec 2005 13:55:21 -0500, Dave Jones <[email protected]> wrote:
>
> > > > [311578.273186] BUG: spinlock bad magic on CPU#1, pdflush/30788 (Not tainted)
> >[...]
> > > > [311578.499972] RIP: 0010:[<ffffffff8021f8bd>] <ffffffff8021f8bd>{spin_bug+138}
> > > > [311578.798449] Call Trace:<ffffffff8021fbdb>{_raw_spin_lock+25} <ffffffff802174a4>{cfq_exit_single_io_context+85}
> > > > [311578.828782] <ffffffff80217527>{cfq_exit_io_context+33} <ffffffff8020d07d>{exit_io_context+137}
> > > > [311578.856762] <ffffffff8013f937>{do_exit+183} <ffffffff80152c90>{keventd_create_kthread+0}
> > > > [311578.883192] <ffffffff80110c25>{child_rip+15} <ffffffff80152c90>{keventd_create_kthread+0}
> > > > [311578.909852] <ffffffff80152d86>{kthread+0} <ffffffff80110c16>{child_rip+0}
>
> > Hmm, I may have also been experimenting at the time with Pete Zaitcev's
> > ub driver. Pete, could ub have been doing something bad here?
>
> Yes, this is ub's fault. I thought that blk_cleanup_queue frees the
> queue, but this is not the case. In recent kernels, it only decrements
> its refcount. If CFQ is around, it keeps the queue pinned and uses
> the queue's spinlock. But when ub calls blk_init_queue(), it passes a
> spinlock located in its data structure (ub_dev), which corresponds to
> a device. The ub_dev is refcounted and freed when the device is
> disconnected or closed. As you can see, this leaves the queue's
> spinlock pointer dangling.
That indeed looks like to be it. We've had the reference counted queue
for a long time. CFQ is the only io scheduler that is affected by such a
bug, although the core block layer coultd trigger it as well but the
window is much smaller (with CFQ the window is very large).
> The code was taken from Carmel, and it used to work fine for a long time.
> I suspect now that Carmel is vulnerable, if it's hot-removed while open.
> Maybe Jeff wants to look into it.
Indeed.
> The usb-storage is immune to this problem, because SCSI passes NULL to
> blk_init_queue.
SCSI did have the same problem recently, it just does the assignment
differently.
> Schedulers other than CFQ use their own spinlocks, so they do not hit
> this problem.
>
> The attached patch works around this issue by using spinlocks which are
> static to the ub module. Thus, it places ub into the same group as floppy.
> This is not ideal, in case someone manages to remove the module yet have
> queues remaining... But I am reluctant to copy what scsi_request_fn is
> doing. After all, ub is supposed to be simpler.
>
> Any comments before I send this to Greg?
It's probably the easiest way to fix it in a straight forward manner.
I'll see if I can add some debug code to detect this issue...
--
Jens Axboe
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]