Re: Fw: Re: oops in choose_configuration()

On Sun, 5 Mar 2006, Andrew Morton wrote:
> 
> For several days I've been getting repeatable oopses in the -mm kernel. 
> They occur once per ~30 boots, during initscripts.

Actually, having thought about this some more, I wonder if the bug isn't a 
hell of a lot simpler than we've given it credit for.

I think you're running with CONFIG_PREEMPT_VOLUNTARY, right?

And looking more closely, that thing is BROKEN. DaveJ - do Fedora kernels 
also enable that thing?

Ingo: as far as I can see, CONFIG_PREEMPT_VOLUNTARY is totally and utterly 
broken during bootup. It does:

	# define might_resched() cond_resched()

and then we have

	# define might_sleep() do { might_resched(); } while (0)

but the fact is, we _know_ that "might_sleep()" is broken during early 
bootup. We know this, because when we have __might_sleep() enabled to 
warn about cases where we must not sleep, we've had those tests disabled 
during early boot for a long time, in order to avoid irritating and nasty 
known "sleeping function called from invalid context" messages:

	...
	if ((in_atomic() || irqs_disabled()) &&
	    system_state == SYSTEM_RUNNING && !oops_in_progress) {
		if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
	...

Note in particular the "system_state == SYSTEM_RUNNING". It's there for a 
reason. Namely that we know that we do things that aren't valid during 
early bootup, and that we call functions that might sleep while we have 
interrupts disabled, for example.
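
To make the effect of that test concrete, here is a rough user-space sketch 
of the outer condition only. Everything prefixed "model_" is a made-up 
stand-in for illustration, not a kernel symbol, and the jiffies rate 
limiting is left out on purpose.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real definitions. */
enum model_system_state { MODEL_SYSTEM_BOOTING, MODEL_SYSTEM_RUNNING };

struct model_ctx {
	bool in_atomic;			/* models in_atomic() */
	bool irqs_disabled;		/* models irqs_disabled() */
	bool oops_in_progress;		/* models oops_in_progress */
	enum model_system_state system_state;
};

/*
 * Returns true when the modelled check gets past the outer condition
 * quoted above, i.e. when a warning could be printed at all.
 */
static bool model_might_sleep_would_warn(const struct model_ctx *c)
{
	return (c->in_atomic || c->irqs_disabled) &&
	       c->system_state == MODEL_SYSTEM_RUNNING &&
	       !c->oops_in_progress;
}
```

With interrupts disabled and the state still at MODEL_SYSTEM_BOOTING the 
function returns false, so the offending call goes unreported. The same 
context with MODEL_SYSTEM_RUNNING returns true.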

HOWEVER, the "cond_resched()" does not take that into account at all, and 
will happily conditionally reschedule things at early bootup before we 
have set system_state to SYSTEM_RUNNING.

In other words, unless I've totally lost it, I think that 
CONFIG_PREEMPT_VOLUNTARY currently makes us re-schedule at points in the 
early boot that we _know_ are unsafe. We happen to not hit it very often, 
because (a) some of the time it doesn't matter and (b) when it matters, we 
seldom have "need_resched()" returning true, but I would not be at all 
surprised if Andrew's problems are because the scheduler heuristics make 
it happen when it shouldn't.

And the end result? I don't know. But we've traditionally run _all_ of the 
early boot ignoring the "might_sleep()" warnings, up until the point where 
we unlock the kernel lock, long after things like kmem_cache_init().

So I would not be surprised, for example, if we had kmem_cache_init() 
doing bad things because it got interrupts enabled at a point where it 
shouldn't, because it went through the scheduler. 

I dunno. I can't actually see what would corrupt anything, but the point 
is that we definitely do scheduling in places that have gotten absolutely 
_zero_ coverage, because we turned off the checks on purpose during early 
boot because the checks gave false positives.

And CONFIG_PREEMPT_VOLUNTARY turns those false positives into potential 
rescheduling events.
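
A minimal user-space model of that chain, in case it helps to see it 
spelled out. The "sim_" names are invented for this sketch, and having the 
modelled scheduler return with interrupts enabled is an assumption of the 
model that mirrors the concern above, not a statement about any particular 
code path.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of this is kernel code. */
static bool sim_need_resched;		/* models need_resched() */
static bool sim_system_running;		/* models system_state == SYSTEM_RUNNING */
static bool sim_irqs_disabled;		/* models irqs_disabled() */
static int sim_schedule_calls;		/* how often the modelled scheduler ran */

static void sim_schedule(void)
{
	sim_schedule_calls++;
	sim_need_resched = false;
	/* Assumption of the model: we come back with interrupts enabled. */
	sim_irqs_disabled = false;
}

/* Models cond_resched() as described above: system_state is never read. */
static int sim_cond_resched(void)
{
	if (sim_need_resched) {
		sim_schedule();
		return 1;
	}
	return 0;
}

/* The same macro chain as quoted earlier, pointed at the model. */
#define sim_might_resched()	sim_cond_resched()
#define sim_might_sleep()	do { sim_might_resched(); } while (0)

/* Early boot: interrupts off, system not yet running, resched pending. */
static void sim_early_boot_call(void)
{
	sim_system_running = false;
	sim_irqs_disabled = true;
	sim_need_resched = true;
	sim_might_sleep();
}
```

After sim_early_boot_call() the modelled scheduler has run once and 
sim_irqs_disabled is false, even though sim_system_running never became 
true. That is exactly the combination the warning check would have stayed 
silent about.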

Maybe I'm crazy. But it looks really really broken to me.

Andrew, if I'm right, then this ugly patch should make a difference.

Is there something else I've missed?

			Linus

----
diff --git a/kernel/sched.c b/kernel/sched.c
index 12d291b..3454bb8 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4028,6 +4028,8 @@ static inline void __cond_resched(void)
 	 */
 	if (unlikely(preempt_count()))
 		return;
+	if (unlikely(system_state != SYSTEM_RUNNING))
+		return;
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
 		schedule();
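
For anyone who wants to poke at the logic outside the kernel, the guard can 
be sketched in user space like this. The "demo_" types and names are 
hypothetical, a counter stands in for schedule(), and the real loop 
(which repeats while need_resched() stays set) is reduced to a single pass.

```c
#include <stdbool.h>

/* Hypothetical model types; the names are illustrative, not kernel symbols. */
enum demo_state { DEMO_BOOTING, DEMO_RUNNING };

struct demo_cpu {
	int preempt_count;		/* models preempt_count() */
	enum demo_state system_state;	/* models system_state */
	bool need_resched;		/* models need_resched() */
	int schedule_calls;		/* counts modelled schedule() calls */
};

/* Models __cond_resched() with the two-line guard from the patch. */
static void demo_cond_resched_inner(struct demo_cpu *cpu)
{
	if (cpu->preempt_count)
		return;
	if (cpu->system_state != DEMO_RUNNING)	/* the added check */
		return;
	cpu->schedule_calls++;			/* stands in for schedule() */
	cpu->need_resched = false;
}
```

While the state is DEMO_BOOTING the call is a no-op and need_resched stays 
set. Once the state is DEMO_RUNNING the modelled schedule() runs as before.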