Re: 2.6.24-rc1-gb4f5550 oops

On Mon, 12 Nov 2007 19:05:49 +0100
Peter Zijlstra <[email protected]> wrote:

> 
> On Thu, 2007-11-08 at 23:49 +0100, Rafael J. Wysocki wrote:
> > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > On Thu, 8 Nov 2007 22:42:21 +0100
> > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > 
> > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > On Thu, 8 Nov 2007 16:53:10 +0100
> > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > 
> > > > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > > > 
> > > > > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > > > > Hi,
> > > > > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > > > > 
> > > > > > > > (1) Is this reproducible?
> > > > > > > > (2) Did it happen previously on your system?
> > > > > > > >
> > > > > > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 0000000000000120 RIP: 
> > > > > > > > [18073.371134]  [<ffffffff8023572e>] check_preempt_wakeup+0x6e/0x110
> > > > > > > 
> > > > > > > This has now happened twice - the second time was last night when
> > > > > > > running 2.6.24-rc2.
> > > > > > > 
> > > > > > > Here's that second occurrence:
> > > > > > > 
> > > > > [snip]
> > > > > > 
> > > > > > Hmm.
> > > > > > 
> > > > > > Please run "gdb vmlinux" and see what code corresponds to
> > > > > > check_preempt_wakeup+0x6e in your kernel.
> > > > >
> > > > > 
> > > > > Dump of assembler code for function check_preempt_wakeup:
> > > > 
> > > > Well, thanks, but I meant the source code.  Please do "gdb vmlinux" and then
> > > > "l *check_preempt_wakeup+0x6e" in gdb.
> > > 
> > > Here's the requested output:
> > > 
> > > (gdb) l *check_preempt_wakeup+0x6e
> > > 0xffffffff802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
> > > 663
> > > 664     /* Do the two (enqueued) entities belong to the same group ? */
> > > 665     static inline int
> > > 666     is_same_group(struct sched_entity *se, struct sched_entity *pse)
> > > 667     {
> > > 668             if (se->cfs_rq == pse->cfs_rq)
> > > 669                     return 1;
> > > 670
> > > 671             return 0;
> > > 672     }
> > 
> > Well, it looks like either se or pse is NULL.
> > 
> > Ingo, can you please have a look?
> 
> Most puzzling this, it should be guaranteed that the top sched_entities
> are of the same group, therefore avoiding this loop into NULL. Obviously
> something has gone wrong.
> 
> Grant, is there anything specific you can tell us about how to reproduce
> this?

I'm afraid not.  It has only happened twice and both times I was away
from the box in question at the time of failure, so it wasn't doing a
great deal.

I'm running 2.6.24-rc2 on two boxes and both times it happened on the
box running a quad core processor.

Grant

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

References:
- 2.6.24-rc1-gb4f5550 oops
  - From: Grant Wilson <[email protected]>
- Re: 2.6.24-rc1-gb4f5550 oops
  - From: "Rafael J. Wysocki" <[email protected]>
- Re: 2.6.24-rc1-gb4f5550 oops
  - From: Grant Wilson <[email protected]>
- Re: 2.6.24-rc1-gb4f5550 oops
  - From: "Rafael J. Wysocki" <[email protected]>
- Re: 2.6.24-rc1-gb4f5550 oops
  - From: Peter Zijlstra <[email protected]>

Prev by Date: Major mke2fs slowdown (reproducable, bisected)
Next by Date: Re: Major mke2fs slowdown (reproducable, bisected)
Previous by thread: Re: 2.6.24-rc1-gb4f5550 oops
Next by thread: Re: 2.6.24-rc1-gb4f5550 oops
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]