Re: rt20 patch question

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Steven Rostedt wrote:
> On Thu, 11 May 2006, Mark Hounschell wrote:
> 
>> Here is a detailed list of the RT tasks running with prios, cpu masks
>> etc. There are 3 nics. eth1 is the nic being used by the emulation. eth2
>> is currently unused.
> 
>> pid      SCHED        PRIO      CPUM TASK
>> ---      ----         ----      ---- ----
> 
> This being a SMP machine, pid 2 and 3 must be the migration threads.
> 
>> 2        FIFO         99           1 (unknown)
>> 3        FIFO         99           1 (unknown)
> 
>> 4        FIFO         1            1 (unknown)
>> 5        FIFO         1            1 (unknown)
>> 6        FIFO         1            1 (unknown)
>> 7        FIFO         1            1 (unknown)
>> 8        FIFO         1            1 (unknown)
>> 9        FIFO         1            1 (unknown)
>> 10       FIFO         1            1 (unknown)
> 
> Do you know what these processes are (12 and 13)?
> 

    2 ?        S      0:00 [migration/0]
    3 ?        S      0:00 [posix_cpu_timer]
.
.
   14 ?        S      0:00 [migration/1]
   15 ?        S      0:00 [posix_cpu_timer]

>> 12       FIFO         99           2 (unknown)
>> 13       FIFO         99           2 (unknown)
> 
> [...]
> 
>> 39       FIFO  acpi   49 [IRQ 9]   1 (unknown)
>> 1129     FIFO  rtc    48 [IRQ 8]   1 (unknown)
>> 1135     FIFO  i8042  47 [IRQ 12]  1 (unknown)
>> 1145     FIFO  floppy 46 [IRQ 6]   1 (unknown)
>> 1178     FIFO  i8042  45 [IRQ 1]   1 (unknown)
>> 1268     FIFO  ide0   44 [IRQ 14]  1 (unknown)
>> 1313     FIFO  ide1   43 [IRQ 15]  1 (unknown)
>>
> 
> FYI, The above are all of higher priority than the below.
> 
>> 1362     FIFO         42 [IRQ 169] 1 (unknown)
>>      ide2, aic7xxx, aic7xxx, eth1, eth2,
>>      gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm
> 
> Wow! that's a lot on a shared IRQ.  Do you have the ide2 being used. If
> one of these where to spin for a while, then all the below would freeze.
> Also them being preempted will also have a problem.  Perhaps you want to
> raise the priority of this interrupt thread.
> 
I'll try that when I get to work this morning.

>> 2663     FIFO ???     41 [IRQ 4]   1 (unknown)
>> 2667     FIFO ???     40 [IRQ 3]   1 (unknown)
>> 3420     FIFO 82801BA 39 [IRQ 177] 1 (unknown)
>> 5788     FIFO eth0    38 [IRQ 185] 1 (unknown)
>> 8036     FIFO rtom    37 [IRQ 193] 2 (unknown)
>> 10338    FIFO EMU-CPU 33           2 ./vrsx
> 
> [...]
> 
>>> What seems to be happening is that the vortex_timer is going off while the
>>> interrupt is running.  Hence the disable_irq fails and schedules.
>>>
>>> Perhaps the interrupt thread has been preempted by some high priority task
>>> and causes it to lose a connection.
>>>
>>> Yeah that task output would be helpful to see if you can get it to work.
>> Ok I have this but it is 2000+ lines. I probably don't want to put it on
>> the list. Should I send it to you directly?
> 

Done. Keep in mind it was taken only after one of those BUGs that seemed
to cause a network connection loss into the emulation. It was not taken
after one of those "stops" in 'complete preempt' mode. Did the logdev
output show anything of interest concerning the "stops"?

> Yes please (compress it as well).  With so much shared on an IRQ and you
> are disabling it, it might cause some large timeouts. The disable irq with
> the hardirqs as threads is a sleep (that's where you hit the bug) where as
> otherwise it just spins and waits.  So it can be a timing issue.
> 
> Could also you try running the RT kernel without hardirqs as threads to
> see if it works fine then?
> 

I assume you mean in preemptable kernel mode. Will do asap. I have 4
machines I'm attempting to use the rt20 kernel on. All 4 have the
"stop/pause" problem in complete preempt mode. This is the only one of
the 4 that I have seen the BUGs message on so I suspect you are correct
that it may be a result of hardware/configuration. I guess if raising
that irq prio or eliminating the irq threads fixes it, then.....

>>> Also can you show us the output of /proc/interrupts so we know which
>>> threads are associated to the network card interrupt, and see where they
>>> are.
>>>
>> harley:/home/markh/work/lcrs-linux # cat /proc/interrupts
>>            CPU0       CPU1
>>   0:     450333          0  IO-APIC-edge   [........N/  0]  pit
>>   1:       4288          0  IO-APIC-edge   [........./  1]  i8042
>>   8:          2          0  IO-APIC-edge   [........./  0]  rtc
>>   9:          0          0  IO-APIC-level  [........./  0]  acpi
>>  12:      66129          0  IO-APIC-edge   [........./  1]  i8042
>>  14:       3523          0  IO-APIC-edge   [........./  0]  ide0
>>  15:      65675          0  IO-APIC-edge   [........./  0]  ide1
>> 169:     219209          0  IO-APIC-level  [........./  0]  ide2,
>> aic7xxx, aic7xxx, eth1, eth2, gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm
>> 177:       1821          0  IO-APIC-level  [........./  0]  Intel
>> 82801BA-ICH2
>> 185:     185550          0  IO-APIC-level  [........./  0]  eth0
>> 193:          0      76740  IO-APIC-level  [........./  0]  rtom
>> NMI:          0          0
>> LOC:    2657906     587751
>> ERR:          0
>> MIS:          0
> 
> I see you are pinning all the irqs to CPU0
> 

All except the rtom irq. It is on the same processor as the emulations
CPU thread. Even with the rt20 patch, this is still the only way to
insure deterministic delivery of signals and such from the rtom driver
to the emulations CPU thread. What I find with the rt20 patch (so far)
is that now there seems to be an acceptable(sort of) "max" latency
(-200usec) that allows me to use the machine for things other than just
the emulation.

>> The aic7xxx controllers are both connected to external legacy scsi
>> racks. eth1, eth2, and the aix7xxx cards are in an SBS pci expansion
>> chassis. The 3 gpiohsd and the 1 eprm cards are also in the expansion
>> rack but are not being used at all in this.
> 
> So all but the 3 gpiohsd and eprm are being used?  Still that seems to be
> a lot.  But anyway, send me the compressed task dump, and I'll take a
> look.  Maybe it will shed some light.

Actually eth2 is not being used either. Only one of the scsi controllers
is being used and it only when I boot the emulation from one of the
legacy scsi drives. One scsi card has a bus of tapes the other disks.
When I boot the emulation from a virtual disk file it's not used at all.
IDE2 is my 'linux' boot drive however.

So do you think this BUG reported in 'preempt kernel' mode is related to
the "stops" I am having in 'complete preempt mode?

Mark
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux