Re: 2.6.17-rc5-mm2 — Linux Kernel

* Jan Beulich <[email protected]> wrote:

> >- Make the code robust and able to detect "unexpected" states at all
> >  points through the process.  If at the end of the process we see that we
> >  have encountered an unexpected state,
> 
> The problem is that the unwind is expected to end with an odd state 
> (i.e. fail), at least until all possible root points of execution 
> (i.e. bottoms of call stacks) have a proper annotation forcing their 
> parent program counter to zero (which I don't expect to happen soon, 
> if ever, because I think this is something difficult to prove). Thus 
> the only reasonable thing to do is to check whether the first level of 
> unwinding failed.

firstly, i'd suggest using a different magic value for 'bottom of call 
stacks' - jumping to or calling a NULL pointer is far too common, so a 
zero PC is ambiguous. Something like 0xfedcba9876543210 would be better.
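
A minimal userspace sketch of that distinction, not the unwinder's actual 
code. The names STACK_BOTTOM_MAGIC, unwind_end and classify_parent_pc are 
hypothetical, and the constant assumes a 64-bit program counter:

```c
#include <stdint.h>

/* Hypothetical sentinel written as the "parent PC" of every call-stack
 * root. The value is the one suggested above; it assumes a 64-bit PC. */
#define STACK_BOTTOM_MAGIC UINT64_C(0xfedcba9876543210)

enum unwind_end {
	UNWIND_END_CLEAN,	/* sentinel reached: the trace ended where expected */
	UNWIND_END_NULL_PC,	/* PC is 0: plausibly a jump/call through NULL */
	UNWIND_CONTINUE,	/* ordinary frame: keep unwinding */
};

/* Classify the parent PC recovered for one frame. */
static enum unwind_end classify_parent_pc(uint64_t pc)
{
	if (pc == STACK_BOTTOM_MAGIC)
		return UNWIND_END_CLEAN;
	if (pc == 0)
		return UNWIND_END_NULL_PC;
	return UNWIND_CONTINUE;
}
```

With zero as the terminator the first two cases collapse into one, and a 
crash through a NULL function pointer looks like a clean end of stack.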

> >  - emit a diagnostic so Jan can work out if there's a way to improve
> >    the unwinder in this situation
> 
> >  - do a traditional backtrace as well.
> 
> This might be a config or boot option (and might be forced on for a 
> short while), but I generally don't think this is helpful, given that 
> the entire point of the added logic is to remove (useless) information 
> (even more that if you have to rely on the screen alone, you have to 
> live with its limited size, and pushing out an old-style stack trace 
> after the unwound one would likely make part or all of it as well as 
> the register information disappear).

for the RIP/EIP to get corrupted is a common occurrence. So is stack 
corruption. So the fallback mechanism shouldn't be a 'short while' 
afterthought; it must be part of the design.

If we use a much better magic value to delimit the stack then our 
confidence that the stacktrace is correct will be much higher.

In all other cases (if we go outside of the stack page(s)) we _must_ 
fall back to the dumb 'scan the stack pages for interesting entries' 
method, to get the information out! "Uh oh the unwind info somehow got 
corrupted, sorry" is not enough to debug a kernel bug.
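
The policy above can be sketched in userspace C. This is an illustration 
under stated assumptions, not kernel code: the stack is modelled as an 
array of words, each toy frame is two words (index of the parent frame, 
then the return address), and all names (text_range, unwind_frames, 
scan_stack, backtrace, STACK_BOTTOM_MAGIC) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_BOTTOM_MAGIC UINT64_C(0xfedcba9876543210)	/* assumed sentinel */

struct text_range { uint64_t start, end; };	/* kernel text, [start, end) */

static bool is_text_addr(const struct text_range *t, uint64_t v)
{
	return v >= t->start && v < t->end;
}

/* Toy frame layout: stack[fp] = index of parent frame,
 * stack[fp + 1] = return address. Returns true only if the chain ends
 * at the sentinel; any parsing error returns false. */
static bool unwind_frames(const uint64_t *stack, size_t words, size_t fp,
			  const struct text_range *text,
			  uint64_t *out, size_t max, size_t *n)
{
	*n = 0;
	for (;;) {
		if (words < 2 || fp > words - 2)
			return false;		/* frame outside the stack */
		uint64_t ret = stack[fp + 1];
		if (ret == STACK_BOTTOM_MAGIC)
			return true;		/* clean end of stack */
		if (!is_text_addr(text, ret))
			return false;		/* not a code address */
		if (*n < max)
			out[(*n)++] = ret;
		uint64_t next = stack[fp];
		if (next <= fp || next >= words)
			return false;		/* parent must lie further up */
		fp = (size_t)next;
	}
}

/* Dumb fallback: report every stack word that looks like a text address. */
static size_t scan_stack(const uint64_t *stack, size_t words,
			 const struct text_range *text,
			 uint64_t *out, size_t max)
{
	size_t n = 0;
	for (size_t i = 0; i < words && n < max; i++)
		if (is_text_addr(text, stack[i]))
			out[n++] = stack[i];
	return n;
}

/* Returns true if the entries came from the unwinder, false if the
 * unwind hit a parsing error and the stack scan was used instead. */
static bool backtrace(const uint64_t *stack, size_t words, size_t fp,
		      const struct text_range *text,
		      uint64_t *out, size_t max, size_t *n)
{
	if (unwind_frames(stack, words, fp, text, out, max, n))
		return true;
	*n = scan_stack(stack, words, text, out, max);
	return false;
}
```

A caller would print the entries either way and use the return value to 
mark the trace as unwound or as a raw scan, so a corrupted frame link 
still yields the candidate return addresses left on the stack.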

What we _can_ ignore are stack/register corruptions that accidentally 
cause a valid unwind call chain to be printed. That is impossible to 
detect, and it's much less likely than, say, ordinary stack or register 
corruption. But we _must not_ ignore clear parsing errors. This is a 
really basic thing ...

	Ingo