RE: process starvation with 2.6 scheduler

More information:
After turning on CONFIG_SCHEDSTATS I have more information (below). Next I will try lowering the nice value of the servers.

Starved Process:
  sched_info->pcnt          33
  sched_info->cpu_time      64
  sched_info->run_delay     113
  sched_info->last_arrival  0xffc4a89

Active Process:
  sched_info->pcnt          238
  sched_info->cpu_time      2852
  sched_info->run_delay     190
  sched_info->last_arrival  0xfffc4aa5
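
To make the two sched_info dumps easier to compare, here is a minimal sketch that divides the totals by pcnt to get per-dispatch averages. The SchedInfo class and its method names are hypothetical helpers for this note, and the unit is assumed to be jiffies, as in the 2.6-era schedstats code. The last_arrival values are left out because they are raw timestamps as posted.

```python
from dataclasses import dataclass


@dataclass
class SchedInfo:
    """Counters copied from the dump above (hypothetical helper class)."""
    pcnt: int       # number of times the task was put on the CPU
    cpu_time: int   # total time spent running (assumed jiffies)
    run_delay: int  # total time spent waiting on the runqueue (assumed jiffies)

    def avg_run(self) -> float:
        # Average time on the CPU per dispatch.
        return self.cpu_time / self.pcnt

    def avg_wait(self) -> float:
        # Average runqueue wait per dispatch.
        return self.run_delay / self.pcnt


starved = SchedInfo(pcnt=33, cpu_time=64, run_delay=113)
active = SchedInfo(pcnt=238, cpu_time=2852, run_delay=190)

for name, info in (("starved", starved), ("active", active)):
    print(f"{name}: avg run {info.avg_run():.2f}, avg wait {info.avg_wait():.2f}")
```

As far as I understand the schedstats code, run_delay is only updated when a task actually arrives on the CPU, so a task that is still waiting would not yet show its current wait in this counter.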

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Kallol Biswas
Sent: Tuesday, June 06, 2006 10:56 AM
To: Stephen Hemminger; [email protected]
Cc: Mike Galbraith
Subject: RE: process starvation with 2.6 scheduler


I have verified that the starved tasks are in the runqueue (prio_array_t
array[0], active points to array[0]). Their timestamp and last_ran
fields indicate that they have not run for a while.
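
To illustrate why tasks sitting in the active array can still wait, here is a toy model of priority-array selection. It is a simplified sketch and not kernel code: the PrioArray class, the simulate function, and the task names are hypothetical, and it assumes the higher-priority task stays runnable and is always requeued into the active array. The expired array, starvation checks, and dynamic priority recalculation are all omitted.

```python
MAX_PRIO = 140  # number of priority levels in the 2.6 O(1) scheduler


class PrioArray:
    """Toy stand-in for one priority array: one FIFO queue per priority."""

    def __init__(self):
        self.queues = [[] for _ in range(MAX_PRIO)]

    def enqueue(self, task, prio):
        self.queues[prio].append(task)

    def pick_next(self):
        # Lowest numeric priority wins, like a find-first-bit over the bitmap.
        for prio, queue in enumerate(self.queues):
            if queue:
                return prio, queue[0]
        return None


def simulate(rounds):
    """Return the task picked in each round under the stated assumptions."""
    active = PrioArray()
    active.enqueue("active_server", 118)     # dynamic priority from the dump
    active.enqueue("starved_server_1", 120)
    active.enqueue("starved_server_2", 120)
    picked = []
    for _ in range(rounds):
        prio, task = active.pick_next()
        picked.append(task)
        # Assumption: the task remains runnable and goes back into the
        # active array at the same priority.
        active.queues[prio].remove(task)
        active.enqueue(task, prio)
    return picked


print(simulate(5))
```

Under these assumptions the priority-118 task is picked every round, and the two priority-120 tasks never run even though they are queued in the active array.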

The network traffic is of request response type.

Client (on an external box)3 ports ---- 3 cables ----3 ports Emulated Host
 
The netperf clients run on an external box; the emulated host (ppc440)
runs the servers. A client sends a request to a server, the server
returns the reply, and then the next request from the client goes to the
server. There are 3 clients and 3 servers, one client-server pair for
each of the 3 connections (3 ports on the external box to 3 ports on
the emulated host).

Since the traffic is request/response in nature and the packets reach
user space (netserver) before turning around, I do not think the slow
CPU is an issue.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stephen Hemminger
Sent: Tuesday, June 06, 2006 9:56 AM
To: [email protected]
Subject: Re: process starvation with 2.6 scheduler

On Tue, 06 Jun 2006 10:01:58 +0200
Mike Galbraith <[email protected]> wrote:

> (please line wrap)
> 
> On Mon, 2006-06-05 at 12:48 -0700, Kallol Biswas wrote:
> > Hello,
> >        We have a process starvation problem with our 2.6.11 kernel
> > running on a ppc-440 based system.
> > 
> > We have a storage SOC based on the PPC-440. The SOC is emulated on a
> > Cadence system emulator called Palladium. The system runs at 400 kHz.
> > It has three Ethernet ports, which are connected to the outside lab
> > network through a speed bridge.
> > 
> > The netperf server, netserver, runs on the emulated system (2.6.11
> > kernel on Palladium). The netperf clients run on a Linux x86 box.
> > 
> > If netperf request/response (TCP_RR) traffic is run on all three
> > ports, after some time only one port remains active. The netperf
> > clients on the other two ports wait for a long time and eventually
> > time out.
> > 
> > The netserver code has been instrumented. For one of the starved
> > netserver processes, we found that the server received the TCP_RR
> > request from the netperf client on the Linux x86 box and issued a
> > send() call to return the reply, but send() never returns.
> > 
> > With an ICE connected to the Palladium emulator, I have dumped the
> > kernel data structures of the starved process and the active process.
> > 
> > 
> > For Active Process:
> >   Time slice: 84
> >   Policy: SCHED_NORMAL
> >   Dynamic priority: 118
> >   Static priority: 120
> >   Preempt_count: 0x20100
> >   Flags = 0
> >   State = 0 (TASK_RUNNING)
> > 
> > For Starved Process:
> >   Time slice: 77
> >   Policy: SCHED_NORMAL
> >   Dynamic priority: 120
> >   Static priority: 120
> >   Preempt_count: 0x10000000 (PREEMPT_ACTIVE is set)
> >   Flags = 0
> >   State = 0 (TASK_RUNNING)
> > 
> > Any help to debug the problem is welcome. 
> 
> I'm having difficulty understanding.  Are you saying that the "starved"
> tasks are runnable, but receiving _zero_ cpu?  That's impossible with
> only one other SCHED_NORMAL task afaik, which makes me think you may
> mean they're not receiving cpu frequently enough to keep clients from
> timing out?  One task which has slept enough to acquire interactive
> status (as above) can hold others off the cpu for quite a while if it
> starts a burst of heavy cpu burning.  If your netperf clients are
> choking on this latency, running the servers at nice 19 should prevent
> the problem.
> 


Is the processor being consumed by network traffic in softirq?
If you are using a non-NAPI device driver, it is easy for softirq
processing to be overwhelmed with packets.
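
One way to look at the softirq question from the dump quoted above is to split the two preempt_count values into their fields. The sketch below assumes the generic 2.6-era layout (8 preempt bits, 8 softirq bits, 12 hardirq bits); the ppc tree used here may differ. The PREEMPT_ACTIVE value is taken from the original post, not from a header, and the decode function is a hypothetical helper.

```python
# Assumed field widths (generic 2.6-era layout; not verified for this ppc tree).
PREEMPT_BITS, SOFTIRQ_BITS, HARDIRQ_BITS = 8, 8, 12

PREEMPT_SHIFT = 0
SOFTIRQ_SHIFT = PREEMPT_SHIFT + PREEMPT_BITS
HARDIRQ_SHIFT = SOFTIRQ_SHIFT + SOFTIRQ_BITS

# Value as reported in the original post for this kernel.
PREEMPT_ACTIVE = 0x10000000


def field(count, shift, bits):
    """Extract a bit field of the given width starting at shift."""
    return (count >> shift) & ((1 << bits) - 1)


def decode(count):
    """Split a preempt_count value into its assumed fields."""
    return {
        "preempt": field(count, PREEMPT_SHIFT, PREEMPT_BITS),
        "softirq": field(count, SOFTIRQ_SHIFT, SOFTIRQ_BITS),
        "hardirq": field(count, HARDIRQ_SHIFT, HARDIRQ_BITS),
        "preempt_active": bool(count & PREEMPT_ACTIVE),
    }


print("active :", decode(0x20100))
print("starved:", decode(0x10000000))
```

If the assumed layout holds, the active task's value has nonzero softirq and hardirq fields, which would mean the ICE snapshot caught it in interrupt context. The starved task's value has only PREEMPT_ACTIVE set.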
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
