On Tue, Jun 06 2006, Jens Axboe wrote:
> On Mon, Jun 05 2006, [email protected] wrote:
> > I've been hitting this about once every two weeks for a while now,
> > probably back to a 2.6.16-rc or so. It always bites at the same point,
> > very late in my laptop's bootup. I finally caught one when I had pen,
> > paper, *and* time to chase it a bit rather than reboot. Sorry for the
> > very partial traceback; it's not a good CTS day and I didn't have a
> > digital camera handy.
> >
> > BUG: Unable to handle kernel NULL pointer dereference at 0x0000005c
> > EIP at cfq_queue_empty+0x9/0x15
> > call trace:
> > elv_queue_empty+0x20/0x22
> > ide_do_request+0xa4/0x788
> > ide_intr+0x1ec/0x236
> > handle_IRQ_event+0x27/0x52
> > handle_level_IRQ+0xb6
> > do_IRQ+0x5d/0x78
> > common_interrupt+0x1a/0x20
> >
> > In my .config:
> >
> > CONFIG_IOSCHED_NOOP=y
> > CONFIG_IOSCHED_AS=y
> > CONFIG_IOSCHED_DEADLINE=y
> > CONFIG_IOSCHED_CFQ=y
> > CONFIG_DEFAULT_IOSCHED="anticipatory"
> >
> > This happened very soon (within a few milliseconds) after my
> > /etc/rc.local did:
> >
> > echo cfq >| /sys/block/hda/queue/scheduler
> >
> > (The next executable statement in /etc/rc.local is
> > echo noop >| /sys/block/hdb/queue/scheduler, and 'last sysfs file'
> > still pointed at /dev/hda.)
> >
> > It *looks* like the problem is in elevator_switch() in block/elevator.c:
> >
> >         while (q->rq.elvpriv) {
> >                 blk_remove_plug(q);
> >                 q->request_fn(q);
> >                 spin_unlock_irq(q->queue_lock);
> >                 msleep(10);
> >                 spin_lock_irq(q->queue_lock);
> >                 elv_drain_elevator(q);
> >         }
> >
> > this--> spin_unlock_irq(q->queue_lock);
> >
> >         /*
> >          * unregister old elevator data
> >          */
> >         elv_unregister_queue(q);
> >         old_elevator = q->elevator;
> >
> >         /*
> >          * attach and start new elevator
> >          */
> >         if (elevator_attach(q, e))
> >                 goto fail;
> >
> > should be down here someplace, after elevator_attach(), I suspect?
> > Looks like the disk popped an IRQ after we had installed the
> > iosched_cfq ops but q->elevator->elevator_data hadn't been
> > initialized yet...
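> >
> > cfq_queue_empty(), like the other cfq methods, starts with
> >
> >         struct cfq_data *cfqd = q->elevator->elevator_data;
> >
> > and then pokes at cfqd, so with elevator_data still NULL the first
> > field access faults at a small offset into struct cfq_data, which
> > would fit the 0x0000005c address in the oops above.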
> >
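> > Roughly what I have in mind is not dropping the queue lock until the
> > new elevator is fully attached, i.e. something like the below. It is
> > untested, and I'm not even sure elv_unregister_queue() and
> > elevator_attach() can safely be called with the queue lock held, so
> > treat it as a sketch of the idea rather than a patch:
> >
> >         /* ... drain loop as above, queue lock still held ... */
> >
> >         /*
> >          * unregister old elevator data
> >          */
> >         elv_unregister_queue(q);
> >         old_elevator = q->elevator;
> >
> >         /*
> >          * attach and start new elevator
> >          */
> >         if (elevator_attach(q, e))
> >                 goto fail;
> >
> >         /*
> >          * drop the queue lock only now, with ->elevator_data fully
> >          * set up, so an IDE interrupt can't call into a half-built
> >          * cfq elevator
> >          */
> >         spin_unlock_irq(q->queue_lock);
> >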
> > (I'd attach a patch, except I'm not positive I have the diagnosis
> > right?)
>
> I think your analysis is pretty good; there's definitely a period there
> where we don't want the queueing invoked. Does this help?
Tested here, switched 50 times between the various io schedulers while
the queue was fully loaded. JFYI.
--
Jens Axboe