Re: Simple script that locks up my box with recent kernels

Ok, some preliminary results on this before I go get some sleep + a
working day tomorrow...


On 07/10/06, Linus Torvalds <[email protected]> wrote:
> On Sat, 7 Oct 2006, Jesper Juhl wrote:
> >
> > > Can I bother you to just bisect it?
> >
> > Sure, but it will take a little while since building + booting +
> > starting the test + waiting for the lockup takes a fair bit of time
> > for each kernel
>
> Sure. That said, we've tried to narrow down things that took hours or days
> (under real loads, not some nice test-script) to reproduce, and while it
> doesn't always work, the real problem tends to be if the problem case
> isn't really reproducible. It sounds like yours is pretty clear-cut, and
> that will make things much easier.


Yeah, it seems pretty clear-cut, but I'm a bit nervous that it may
sometimes take longer than my observed 60min to reproduce, rendering
my git-bisection less than perfect (more on that below).


> > and also due to the fact that my git skills are pretty
> > limited, but I'll figure it out (need to improve those git skills
> > anyway) :-)
>
> "git bisect" in particular isn't that hard to use, and it will really do
> a lot of heavy lifting for you.

(...)
Thanks a lot for the tutorial, that really helped.

For some reason I couldn't get git to accept 2.6.17.13 as a "good"
starting point, so I used 2.6.17 instead, and the sha1 you gave me for
2.6.18-git15 as the "bad" starting point.

Here's where I am right now (a log of what I've done):

[bisection start]

Bisecting: 5188 revisions left to test after this
[92164c5dd1ade33f4e90b72e407910de6694de49] USB: OHCI hub code unaligned access

[git bisect good]

Bisecting: 2567 revisions left to test after this
[e41542f5167d6b506607f8dd111fa0a3e468ccb8] [DCCP]: Introduce dccp_probe

[git bisect good]

Bisecting: 1351 revisions left to test after this
[b98adfccdf5f8dd34ae56a2d5adbe2c030bd4674] Merge
master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6

[git bisect good]

Bisecting: 635 revisions left to test after this
[538d9d532b0e0320c9dd326a560b5a72d73f910d] irq: remove a extra line

[git bisect good]

Bisecting: 292 revisions left to test after this
[db1a19b38f3a85f475b4ad716c71be133d8ca48e] Merge branch
'intelfb-patches' of
master.kernel.org:/pub/scm/linux/kernel/git/airlied/intelfb-2.6

[git bisect bad]

Bisecting: 146 revisions left to test after this
[1db27c11e9a0c6d659040ac0b7c64a339e248fa1] istallion: Remove private
baud rate decoding, which is also broken in this case on some
platforms

[git bisect bad]

Bisecting: 73 revisions left to test after this
[3171a0305d62e6627a24bff35af4f997e4988a80] simplify update_times
(avoid jiffies/jiffies_64 aliasing problem)

[git bisect good]

Bisecting: 37 revisions left to test after this
[29b884921634e1e01cbd276e1c9b8fc07a7e4a90] set EXIT_DEAD state in
do_exit(), not in schedule()

[currently testing this kernel]


Looking at "git bisect visualize", the current status is this:

bisect/good: 3171a0305d62e6627a24bff35af4f997e4988a80
bisect/bad: 1db27c11e9a0c6d659040ac0b7c64a339e248fa1
Current bisect marker at: 29b884921634e1e01cbd276e1c9b8fc07a7e4a90
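
As a rough guide to how much testing remains from this point, the
sketch below assumes each tested kernel halves the set of candidate
revisions, and that every remaining kernel has to run the full 80min
test. The steps_left name is a hypothetical helper, and the figure is
only an estimate; git may need a step more or less, and build + boot
time is not counted.

```shell
# Rough estimate: count how many times N can be halved before reaching zero.
steps_left() {
    n="$1"
    steps=0
    while [ "$n" -gt 0 ]; do
        n=$((n / 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# 37 is the revision count git reported above; 80 is the per-kernel
# test time in minutes.
remaining=$(steps_left 37)
echo "about $remaining more kernels, roughly $((remaining * 80)) minutes of test time"
```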


I'm a little worried though that my results may not be completely reliable.

There's no doubt that you can trust the kernels that I told git were
"bad", since those resulted in a hang and there's just no getting
around that. So we know for a fact that the bad commit is somewhere
between my last found bad kernel and 2.6.17. What we don't know with
the same amount of certainty is whether the bad commit is between my
last found good kernel and the last found bad one.

What I'm worried about is the kernels I've marked as "good".  Before
starting this run I had never experienced a hang if the kernel
survived past the one hour mark, so I concluded that testing each
kernel for 80min would be enough to prove it good or bad. This now
seems not to be completely reliable, since my second bad kernel
only hung after ~2hrs. I noticed that only because I forgot to check
my computer after 80min and came back to it some 3hrs later (I know
the time it hung since I had an xterm running
  while true; do sleep 10; uptime; done
so I could check).
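
A variant of that uptime loop which writes to a file instead of an
xterm might make the hang time recoverable after a reboot. This is
only a sketch: the heartbeat function name, its arguments and the log
path in the usage note are hypothetical, and on a box that is about to
lock up the sync call may itself stall.

```shell
# Hypothetical helper: append a timestamped uptime line to a log file and
# flush it to disk, so the last entry written before a hard lockup survives.
# Usage: heartbeat LOGFILE INTERVAL_SECONDS [COUNT]
# COUNT omitted (or 0) means "run until the machine dies".
heartbeat() {
    log="$1"
    interval="$2"
    count="${3:-0}"
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(uptime)" >> "$log"
        sync
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Started in the background before the test script, e.g. as
heartbeat /var/tmp/heartbeat.log 10 &, the last line of the log after
the next boot shows when the machine was last alive.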

This all means that my conclusion that a kernel is "good" after 80min
of test runtime may not be 100% reliable.

Is it useful for me to continue bisecting from the point I'm at, or
should I restart with good==2.6.17 and bad==the_last_bad_commit_I_found?
Or do you have a likely culprit I should try reverting?
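
For reference, the restart option could look roughly like the sketch
below. The restart_bisect helper and the GIT override variable are
hypothetical names introduced here, and v2.6.17 is assumed to be the
tag name for the 2.6.17 release in the tree being bisected.

```shell
# GIT is a hypothetical override hook (e.g. GIT=echo for a dry run); it
# defaults to the real git binary.
GIT="${GIT:-git}"

# Hypothetical helper: abandon the current bisection and start a new one
# between a known-bad and a known-good revision.
restart_bisect() {
    bad="$1"
    good="$2"
    "$GIT" bisect reset &&
    "$GIT" bisect start &&
    "$GIT" bisect bad "$bad" &&
    "$GIT" bisect good "$good"
}
```

Run from the top of the kernel tree, with the last commit known to
hang as the bad end, that would be
  restart_bisect 1db27c11e9a0c6d659040ac0b7c64a339e248fa1 v2.6.17
which discards only the "good" marks and keeps the one result that is
certain.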

Whatever your answer it'll have to wait until tomorrow evening since
I'm going to go get some sleep now, but please let me know what you'd
like me to do ...


--
Jesper Juhl <[email protected]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html