* William Weston <[email protected]> wrote: > OK. Running on -50-25 now. The burnP6 starvation doesn't seem to > affect the whole system, but comes close enough to require the reset > button every time. I usually, but not always, lose network, X, the > keyboard, mouse, and serial console. I'm still unable to get any sort > of a trace from these lockups, since it's looking more like a bunch of > processes starving than a kernel crash or a full lockup. > > Once, with VLC (viewing a 5mbit/s mcast/udp stream) and two burnP6 > instances running, I was able to fire up top on the serial console and > found out that the IRQ thread for the ns83820 nic was using 99% of one > CPU. aha! that's an important clue. It seems you've got a screaming interrupt or some other loop in ns83820 irq handling. Could you do this: chrt -o 0 -p `pidof 'IRQ 18'` (assuming your ns83820 device is still on IRQ18) To check the command was effective, enter the following command: chrt -o -p `pidof 'IRQ 18'` and you should see output like: pid 748's current scheduling policy: SCHED_OTHER pid 748's current scheduling priority: 0 i.e. the thread is not SCHED_FIFO anymore. this will not fix the ns83820 problem for you, but will make it more debuggable - you will still probably lose networking, but keyboard and the local console should work fine. You should see a 99% CPU-time looping ns83820 IRQ thread when the condition triggers. To debug the condition further, could you do something like: vmstat 1 what kind of interrupt rate ('in' field) does it report? If it's very high then it's a screaming interrupt, if it's low then the IRQ thread is looping for some other reason. (but both would be bugs of the -RT kernel.) also, could you try to get a trace of what the IRQ thread is doing. I've attached trace-it.c, just run it as root (on a LATENCY_TRACING-enabled -RT kernel) to get a finegrained trace of what's going on in the system. Whenever the thread-spinning occurs, just run this utility: trace-it > trace.txt and you should get a couple of milliseconds worth of system activity. The trace output should look like a really long latency-trace. (The latency traces usually compress really well with bzip2 -9, so you can attach it to public replies too, if compressed - that way others can have a look too.) > Once, with a normal desktop load and a yum update, this came across on > the serial console: > > cat/2100[CPU#1]: BUG in update_out_trace at kernel/latency.c:587 on SMP this could occur if the TSCs of different CPUs are too apart from each other. I'll probably put an automatic check for this into the /proc/latency_trace code. Ingo
/* * Copyright (C) 1999, Ingo Molnar <[email protected]> */ #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <signal.h> #include <sys/wait.h> #include <linux/unistd.h> int main (int argc, char **argv) { int ret; if (getuid() != 0) { fprintf(stderr, "needs to run as root.\n"); exit(1); } ret = system("cat /proc/sys/kernel/mcount_enabled >/dev/null 2>/dev/null"); if (ret) { fprintf(stderr, "CONFIG_LATENCY_TRACING not enabled?\n"); exit(1); } system("echo 1 > /proc/sys/kernel/trace_all_cpus"); system("echo 1 > /proc/sys/kernel/trace_enabled"); system("echo 0 > /proc/sys/kernel/trace_freerunning"); system("echo 0 > /proc/sys/kernel/trace_print_at_crash"); system("echo 1 > /proc/sys/kernel/trace_user_triggered"); system("echo 0 > /proc/sys/kernel/trace_verbose"); system("echo 0 > /proc/sys/kernel/preempt_max_latency"); system("echo 0 > /proc/sys/kernel/preempt_thresh"); system("[ -e /proc/sys/kernel/wakeup_timing ] && echo 0 > /proc/sys/kernel/wakeup_timing"); system("echo 1 > /proc/sys/kernel/mcount_enabled"); gettimeofday(0, 1); usleep(100000); gettimeofday(0, 0); system("cat /proc/latency_trace"); return 0; }
Attachment:
trace-it
Description: Binary data
- Follow-Ups:
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: Ingo Molnar <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- References:
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: William Weston <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: Ingo Molnar <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: William Weston <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: William Weston <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: Ingo Molnar <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: William Weston <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: Ingo Molnar <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: William Weston <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: Ingo Molnar <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- From: William Weston <[email protected]>
- Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- Prev by Date: Re: [PATCH] ppc/ppc64: Fix pci mmap via sysfs
- Next by Date: Re: reiser4 merging action list
- Previous by thread: Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- Next by thread: Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
- Index(es):