Re: 2.6.14-rc4-rt7

* Fernando Lopez-Lezcano <[email protected]> wrote:

> Here's one with rc5-rt3:
> 
> Oct 21 15:01:46 cmn3 kernel: BUG: ktimer expired short without user
> signal! (hald-addon-stor:4309)

and no "BUG: foo:1234 waking up bar:4321, expiring ktimer short" message 
prior to that? Very weird, this line:

> Oct 21 15:01:46 cmn3 kernel: .. expires:   1012/751245500
> Oct 21 15:01:46 cmn3 kernel: .. expired:   1012/750908115
> Oct 21 15:01:46 cmn3 kernel: .. at line:   942

suggests that the ktimer was expired by ktimer_try_to_cancel() / 
ktimer_cancel(), in ktimer_schedule(). I.e. something must have woken 
the task early. Probably this theory of mine is incorrect then. I'll try 
to extend the debug info a bit: it would be interesting to see a 'timer 
inserted at' timestamp as well (was it shortly before the problem 
happened?), and a 'which PID cancelled the timer' info.
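
To make the numbers above concrete, here is a rough userspace sketch of what such an extended debug record could carry, and how the "expired short" gap falls out of it. It assumes the logged values are seconds/nanoseconds pairs; every name in it (ts_sketch, ktimer_debug_sketch, shortfall_ns, lifetime_ns) is hypothetical and not taken from kernel/ktimers.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model only: these types do not exist in kernel/ktimers.c. */
struct ts_sketch {
	int64_t sec;	/* assumed: the part before '/' in the log line */
	int64_t nsec;	/* assumed: the part after '/' in the log line */
};

struct ktimer_debug_sketch {
	struct ts_sketch inserted;	/* proposed 'timer inserted at' stamp */
	struct ts_sketch expires;	/* when the timer was due */
	struct ts_sketch expired;	/* when it was actually expired */
	int cancelled_by;		/* proposed 'which PID cancelled', -1 if unknown */
};

/* Flatten a sec/nsec pair into nanoseconds. */
static int64_t ts_to_ns(struct ts_sketch t)
{
	assert(t.nsec >= 0 && t.nsec < 1000000000LL);
	return t.sec * 1000000000LL + t.nsec;
}

/* Positive result: the timer was expired this many ns before it was due. */
static int64_t shortfall_ns(const struct ktimer_debug_sketch *d)
{
	return ts_to_ns(d->expires) - ts_to_ns(d->expired);
}

/* How long the timer had been queued when it was expired. */
static int64_t lifetime_ns(const struct ktimer_debug_sketch *d)
{
	return ts_to_ns(d->expired) - ts_to_ns(d->inserted);
}
```

With expires = 1012/751245500 and expired = 1012/750908115, shortfall_ns() gives 337385, i.e. the timer went off roughly 0.34 msec early (again assuming the second field is nanoseconds). A small lifetime_ns() would answer the "was it inserted shortly before the problem happened?" question.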

a heavy-hitting but complex-to-set-up solution would be to add a serial 
console, and to enable WAKEUP_TIMING+LATENCY_TRACING in the .config, and 
to edit kernel/latency.c to initialize the default values of the 
following variables:

int wakeup_timing = 0;
int trace_all_cpus = 1;
int trace_freerunning = 1;
int trace_print_at_crash = 1;
int trace_user_triggered = 1;

these variables are in the top portion of latency.c. Important: if you 
try this then you should probably also enable IGNORE_PRINTK_LOGLEVEL, 
which will improve mass-output to the serial console. Another important 
thing is to add a stop_trace() call to kernel/ktimers.c's 
check_ktimer_signal() function:

        unlock_ktimer_base(timer, &flags);

        stop_trace();
        printk("BUG: ktimer expired short without user signal! (%s:%d)\n",
                current->comm, current->pid);

(otherwise all the trace output you'd be getting would be boring printk 
related trace entries.)

this will cause the dump_stack() to also output thousands of trace 
entries - all the kernel activity (from all CPUs) that preceded the 
ktimer problem. Hopefully this pinpoints the bug.
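
Why the stop_trace() call has to come before the printk can be shown with a toy userspace model of a fixed-size trace buffer. This is only a sketch under simplifying assumptions: the real latency tracer is organized differently, and all sketch_* names below are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_TRACE_SLOTS 4
#define SKETCH_EVENT_LEN 32

/* Toy ring buffer standing in for the trace buffer (hypothetical). */
static char sketch_trace[SKETCH_TRACE_SLOTS][SKETCH_EVENT_LEN];
static unsigned int sketch_next;
static int sketch_tracing = 1;

/* Record one event, overwriting the oldest slot once the ring is full. */
static void sketch_record(const char *event)
{
	char *slot;

	assert(event != NULL);
	if (!sketch_tracing)
		return;		/* tracing frozen: keep what we have */
	slot = sketch_trace[sketch_next % SKETCH_TRACE_SLOTS];
	strncpy(slot, event, SKETCH_EVENT_LEN - 1);
	slot[SKETCH_EVENT_LEN - 1] = '\0';
	sketch_next++;
}

/* Stand-in for stop_trace(): freeze the buffer contents. */
static void sketch_stop_trace(void)
{
	sketch_tracing = 0;
}

/* Stand-in for printk(): printing generates trace events of its own. */
static void sketch_printk(const char *msg)
{
	int i;

	for (i = 0; i < SKETCH_TRACE_SLOTS; i++)
		sketch_record("printk internals");
	fputs(msg, stderr);
}

/* Return 1 if the given event is still present in the buffer. */
static int sketch_trace_contains(const char *event)
{
	int i;

	for (i = 0; i < SKETCH_TRACE_SLOTS; i++)
		if (strcmp(sketch_trace[i], event) == 0)
			return 1;
	return 0;
}
```

If sketch_record("early wakeup") is followed by sketch_stop_trace() and then sketch_printk(), the "early wakeup" entry survives. Without the sketch_stop_trace() call, the printing fills all slots with "printk internals" entries and the interesting event is gone.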

> In both cases the machine goes catatonic, I don't know if right after 
> this or not. It responds to the SysRQ key but that's pretty much it, I 
> should probably try to get a serial console going somehow.

would it be easy for you to try the UP kernel? One possibility is that 
this is some sort of SMP/APIC-timer related problem.

	Ingo
