Re: rt20 patch question

On Thu, 11 May 2006, Mark Hounschell wrote:

>
> Here is a detailed list of the RT tasks running with prios, cpu masks
> etc. There are 3 nics. eth1 is the nic being used by the emulation. eth2
> is currently unused.

>
> pid      SCHED        PRIO      CPUM TASK
> ---      ----         ----      ---- ----

This being an SMP machine, pid 2 and 3 must be the migration threads.

> 2        FIFO         99           1 (unknown)
> 3        FIFO         99           1 (unknown)

> 4        FIFO         1            1 (unknown)
> 5        FIFO         1            1 (unknown)
> 6        FIFO         1            1 (unknown)
> 7        FIFO         1            1 (unknown)
> 8        FIFO         1            1 (unknown)
> 9        FIFO         1            1 (unknown)
> 10       FIFO         1            1 (unknown)

Do you know what these processes are (12 and 13)?

> 12       FIFO         99           2 (unknown)
> 13       FIFO         99           2 (unknown)
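
Since the listing only shows "(unknown)" for these, here is a minimal
sketch of one way to look up a task's name by pid. It assumes a Linux
/proc filesystem, where the second field of /proc/<pid>/stat is the
command name in parentheses. The helper names parse_comm and task_name
are made up for illustration and are not part of any existing tool.

```python
def parse_comm(stat_line: str) -> str:
    """Extract the command name from the contents of /proc/<pid>/stat.

    The name sits between the first '(' and the last ')'; using the last
    ')' keeps names that themselves contain parentheses intact.
    """
    start = stat_line.index("(") + 1
    end = stat_line.rindex(")")
    return stat_line[start:end]


def task_name(pid: int) -> str:
    """Read /proc/<pid>/stat (Linux only) and return the task's name."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_comm(f.read())


# Hypothetical usage for the two pids in question:
#   for pid in (12, 13):
#       print(pid, task_name(pid))
```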

[...]

> 39       FIFO  acpi   49 [IRQ 9]   1 (unknown)
> 1129     FIFO  rtc    48 [IRQ 8]   1 (unknown)
> 1135     FIFO  i8042  47 [IRQ 12]  1 (unknown)
> 1145     FIFO  floppy 46 [IRQ 6]   1 (unknown)
> 1178     FIFO  i8042  45 [IRQ 1]   1 (unknown)
> 1268     FIFO  ide0   44 [IRQ 14]  1 (unknown)
> 1313     FIFO  ide1   43 [IRQ 15]  1 (unknown)
>

FYI, the above are all of higher priority than the below.

> 1362     FIFO         42 [IRQ 169] 1 (unknown)
>      ide2, aic7xxx, aic7xxx, eth1, eth2,
>      gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm

Wow! That's a lot on a shared IRQ.  Is ide2 actually being used?  If
one of these were to spin for a while, then all the below would freeze.
Having them preempted would also be a problem.  Perhaps you want to
raise the priority of this interrupt thread.
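
As the listing shows, the interrupt threads here are ordinary
SCHED_FIFO tasks, so their priority can be changed from user space.
A minimal sketch, assuming Linux, Python 3.3 or later, and root
privileges; the function name set_fifo_priority and the value 50 in
the usage comment are illustrative assumptions, not a recommendation:

```python
import os


def set_fifo_priority(pid: int, priority: int) -> None:
    """Put task `pid` on SCHED_FIFO at the given real-time priority."""
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lo <= priority <= hi:
        raise ValueError(f"priority must be in {lo}..{hi}, got {priority}")
    # Needs root (or CAP_SYS_NICE); raises PermissionError otherwise.
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))


# Hypothetical usage: pid 1362 is the IRQ 169 thread from the listing,
# and 50 is an assumed value just above the other IRQ threads.
#   set_fifo_priority(1362, 50)
```

The chrt utility does the same from a shell: chrt -f -p <prio> <pid>.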

>
> 2663     FIFO ???     41 [IRQ 4]   1 (unknown)
> 2667     FIFO ???     40 [IRQ 3]   1 (unknown)
> 3420     FIFO 82801BA 39 [IRQ 177] 1 (unknown)
> 5788     FIFO eth0    38 [IRQ 185] 1 (unknown)
> 8036     FIFO rtom    37 [IRQ 193] 2 (unknown)
> 10338    FIFO EMU-CPU 33           2 ./vrsx

[...]

> > What seems to be happening is that the vortex_timer is going off while the
> > interrupt is running.  Hence the disable_irq fails and schedules.
> >
> > Perhaps the interrupt thread has been preempted by some high priority task
> > and causes it to lose a connection.
> >
> > Yeah that task output would be helpful to see if you can get it to work.
>
> Ok I have this but it is 2000+ lines. I probably don't want to put it on
> the list. Should I send it to you directly?

Yes please (compress it as well).  With so much shared on one IRQ, and
with you disabling it, it might cause some large timeouts.  With hardirqs
as threads, disable_irq sleeps (that's where you hit the bug), whereas
otherwise it just spins and waits.  So it can be a timing issue.

Could you also try running the RT kernel without hardirqs as threads to
see if it works fine then?

>
> > Also can you show us the output of /proc/interrupts so we know which
> > threads are associated to the network card interrupt, and see where they
> > are.
> >
>
> harley:/home/markh/work/lcrs-linux # cat /proc/interrupts
>            CPU0       CPU1
>   0:     450333          0  IO-APIC-edge   [........N/  0]  pit
>   1:       4288          0  IO-APIC-edge   [........./  1]  i8042
>   8:          2          0  IO-APIC-edge   [........./  0]  rtc
>   9:          0          0  IO-APIC-level  [........./  0]  acpi
>  12:      66129          0  IO-APIC-edge   [........./  1]  i8042
>  14:       3523          0  IO-APIC-edge   [........./  0]  ide0
>  15:      65675          0  IO-APIC-edge   [........./  0]  ide1
> 169:     219209          0  IO-APIC-level  [........./  0]  ide2, aic7xxx, aic7xxx, eth1, eth2, gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm
> 177:       1821          0  IO-APIC-level  [........./  0]  Intel 82801BA-ICH2
> 185:     185550          0  IO-APIC-level  [........./  0]  eth0
> 193:          0      76740  IO-APIC-level  [........./  0]  rtom
> NMI:          0          0
> LOC:    2657906     587751
> ERR:          0
> MIS:          0

I see you are pinning all the IRQs to CPU0 (except rtom, which is on CPU1).
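
For reference, the per-CPU counts can be pulled out of /proc/interrupts
mechanically. The following is only a sketch: it assumes the layout
shown above (a header row of CPU names, then one row per IRQ with one
count per CPU), and the function name parse_interrupts is made up for
illustration.

```python
def parse_interrupts(text: str) -> dict:
    """Map each IRQ label to its list of per-CPU interrupt counts."""
    lines = text.splitlines()
    cpus = lines[0].split()  # e.g. ['CPU0', 'CPU1']
    table = {}
    for line in lines[1:]:
        if ":" not in line:
            continue  # skip anything that is not an "<irq>: ..." row
        label, rest = line.split(":", 1)
        counts = []
        for field in rest.split()[:len(cpus)]:
            if not field.isdigit():
                break  # rows such as ERR carry fewer counts than CPUs
            counts.append(int(field))
        table[label.strip()] = counts
    return table


# Hypothetical usage on a live system:
#   with open("/proc/interrupts") as f:
#       for irq, counts in parse_interrupts(f.read()).items():
#           print(irq, counts)
```

A row whose counts are all zero except in one column has only been
serviced on that CPU so far.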

>
> The aic7xxx controllers are both connected to external legacy scsi
> racks. eth1, eth2, and the aic7xxx cards are in an SBS pci expansion
> chassis. The 3 gpiohsd and the 1 eprm cards are also in the expansion
> rack but are not being used at all in this.

So all but the 3 gpiohsd and eprm are being used?  Still that seems to be
a lot.  But anyway, send me the compressed task dump, and I'll take a
look.  Maybe it will shed some light.

-- Steve
