Re: 2.6.17-rc2-mm1 — Linux Kernel

Andy Whitcroft wrote:
> Andi Kleen wrote:
> 
>>On Wednesday 03 May 2006 08:47, Jan Beulich wrote:
>>
>>
>>>>>>Andi Kleen <[email protected]> 02.05.06 22:09 >>>
>>>>
>>>>On Tuesday 02 May 2006 22:00, Martin Bligh wrote:
>>>>
>>>>
>>>>
>>>>>>Index: linux/arch/x86_64/kernel/traps.c
>>>>>>===================================================================
>>>>>>--- linux.orig/arch/x86_64/kernel/traps.c
>>>>>>+++ linux/arch/x86_64/kernel/traps.c
>>>>>>@@ -238,6 +238,7 @@ void show_trace(unsigned long *stack)
>>>>>>			HANDLE_STACK (stack < estack_end);
>>>>>>			i += printk(" <EOE>");
>>>>>>			stack = (unsigned long *) estack_end[-2];
>>>>>>+			printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack, estack_end[0], estack_end[-1], estack_end[-2], estack_end[-3], estack_end[-4]);
>>>
>>>
>>>>>>			continue;
>>>>>>		}
>>>>>>		if (irqstack_end) {
>>>>>
>>>>>Thanks for running this Andy:
>>>>>
>>>>>http://test.kernel.org/abat/30183/debug/console.log 
>>>>
>>>>
>>>><EOE>new stack 0 (0 0 0 10082 10)
>>>
>>>Looks like <rubbish> <SS> <RSP> <RFLAGS> <CS> to me, ...
>>
>>
>>Hmm, right.
>> 
>>
>>
>>>>Hmm weird. There isn't anything resembling an exception frame at the top of the
>>>>stack.  No idea how this could happen.
>>>
>>>... which is a valid frame where the stack pointer was corrupted before the exception occurred. One more printed item
>>>(or rather, starting items at estack_end[-1]) would allow at least seeing what RIP this came from.
>>
>>
>>Andy, can you add that please and check?
> 
> 
> Ok.  Just got some results (in full at the end of the message).  Seems
> that this is indeed a stack frame:
> 
> 	new stack 0 (0 0 10046 10 ffffffff8047c8e8)
> 
> And if my reading of the System.map is right, this is _just_ in schedule.
> 
> ffffffff8047c17e T sha_init
> ffffffff8047c1a8 T __sched_text_start
> ffffffff8047c1a8 T schedule
> ffffffff8047c8ed T thread_return
> ffffffff8047c9be T wait_for_completion
> ffffffff8047caa8 T wait_for_completion_timeout
> 
> By the looks of it, that would put it here, at the call to __switch_to,
> which of course makes loads of sense _if_ the loaded stack pointer was
> crap, say 0.
> 
> #define switch_to(prev,next,last) \
>         asm volatile(SAVE_CONTEXT     \
>                      "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */   \
>                      "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */   \
>                      "call __switch_to\n\t"   \
>                      ".globl thread_return\n" \
>                      "thread_return:\n\t"
> 
> I'll go shove some debug in there and see what pops out.


Ok.  I've been playing with this some.  Basically, when we pick up the
new process to schedule, it has a 0 rsp.  Dumping out its comm and flags
shows both are 0 throughout.  I tried another run poisoning the flags
field when freeing a task, but the flags remain 0.

Anyone got any good ideas for patches to blame?

-apw
