Re: process starvation with 2.6 scheduler

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 06 Jun 2006 10:01:58 +0200
Mike Galbraith <[email protected]> wrote:

> (please line wrap)
> 
> On Mon, 2006-06-05 at 12:48 -0700, Kallol Biswas wrote:
> > Hello,
> >        We have a process starvation problem with our 2.6.11 kernel running on a ppc-440 based system.
> > 
> > We have a storage SOC based on PPC-440. The SOC is emulated on a system emulator called Palladium. It is from Cadence. The system runs at 400KHz speed. It has three Ethernet ports; they are connected to outside lab network with a speed bridge.
> > 
> > The netperf server netserver runs on the emulated system (2.6.11 kernel on Palladium). There are netperf linux clients running on a x86 box.
> > 
> > If netperf request response (TCP_RR) traffic is run on all three ports; after sometime only one port remains active, the application (netperf client) on other two ports wait for a long time and eventually time out.
> > 
> > The netserver code has been instrumented. For one of the starved netserver processes it has been found that the TCP_RR request from the netperf client on linux x86 box has been received by the server, it has issued send() call to send back reply but send() never returns.
> > 
> > With an ICE connected to the Palladium (emulator) I have dumped the kernel data structures of the starved process and the active process. 
> > 
> > 
> > For Active  Process:
> >   Time_slice 84
> >   Policy : SCHED_NORMAL
> >   Dynamic priority: 118
> >   Static priority: 120
> >   Preempt_count: 0x20100
> >   Flags = 0
> >   State = 0 (TASK_RUNNING)
> > 
> > For Starved Process:
> >   Time slice: 77
> >   Policy: SCHED_NORMAL
> >   Dynamic priority: 120
> >   Static priority: 120
> >   Preempt_count: 0x10000000 (PREEMPT_ACTIVE is set)
> >   Flags = 0 
> >   State = 0 (TASK_RUNNING)
> > 
> > Any help to debug the problem is welcome. 
> 
> I'm having difficulty understanding.  Are you saying that the "starved"
> tasks are runnable, but receiving _zero_ cpu?  That's impossible with
> only one other SCHED_NORMAL task afaik, which makes me think you may
> mean they're not receiving cpu frequently enough to keep clients from
> timing out?  One task which has slept enough to acquire interactive
> status (as above) can hold others off the cpu for quite a while if it
> starts a burst of heavy cpu burning.  If your netperf clients are
> choking on this latency, running the servers at nice 19 should prevent
> the problem.
> 


Is the processor getting consumed by network traffic in soft irq?
If you are using non NAPI device driver, then it is easy to get soft irq
overwhelmed with packets.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux