Thanks for the help. We do not see the issue if every netserver's nice value is set to 19 with a setpriority() call.
-----Original Message-----
From: Kallol Biswas
Sent: Tuesday, June 06, 2006 10:56 AM
To: 'Stephen Hemminger'; [email protected]
Cc: 'Mike Galbraith'
Subject: RE: process starvation with 2.6 scheduler
I have verified that the starved tasks are in the runqueue (prio_array_t
array[0]; active points to array[0]). The timestamp and last_ran fields
indicate that they have not run for a while.
The network traffic is of request response type.
Client (on an external box) 3 ports ---- 3 cables ---- 3 ports Emulated Host
The netperf clients run on an external box; the emulated host (ppc440) runs
the servers. A client sends a request to a server, the server returns the
reply, and then the client sends its next request. There are 3
clients and 3 servers, one client-server pair for each connection
(3 connections: 3 ports on external box -- 3 connections
-- 3 ports on emulated host).
Since the traffic is request/response in nature and the packets reach
user space (netserver) before turning around, I do not think a slow CPU is the issue.
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stephen Hemminger
Sent: Tuesday, June 06, 2006 9:56 AM
To: [email protected]
Subject: Re: process starvation with 2.6 scheduler
On Tue, 06 Jun 2006 10:01:58 +0200
Mike Galbraith <[email protected]> wrote:
> (please line wrap)
>
> On Mon, 2006-06-05 at 12:48 -0700, Kallol Biswas wrote:
> > Hello,
> > We have a process starvation problem with our 2.6.11 kernel running on a ppc-440 based system.
> >
> > We have a storage SOC based on PPC-440. The SOC is emulated on a system emulator called Palladium, from Cadence. The system runs at 400 kHz. It has three Ethernet ports, which are connected to the outside lab network through a speed bridge.
> >
> > The netperf server, netserver, runs on the emulated system (2.6.11 kernel on Palladium). The netperf clients run on a Linux x86 box.
> >
> > If netperf request/response (TCP_RR) traffic is run on all three ports, after some time only one port remains active; the applications (netperf clients) on the other two ports wait for a long time and eventually time out.
> >
> > The netserver code has been instrumented. For one of the starved netserver processes, we found that the server received the TCP_RR request from the netperf client on the Linux x86 box and issued a send() call to send back the reply, but send() never returns.
> >
> > With an ICE connected to the Palladium (emulator) I have dumped the kernel data structures of the starved process and the active process.
> >
> >
> > For Active Process:
> > Time slice: 84
> > Policy: SCHED_NORMAL
> > Dynamic priority: 118
> > Static priority: 120
> > Preempt_count: 0x20100
> > Flags: 0
> > State: 0 (TASK_RUNNING)
> >
> > For Starved Process:
> > Time slice: 77
> > Policy: SCHED_NORMAL
> > Dynamic priority: 120
> > Static priority: 120
> > Preempt_count: 0x10000000 (PREEMPT_ACTIVE is set)
> > Flags: 0
> > State: 0 (TASK_RUNNING)
> >
> > Any help to debug the problem is welcome.
>
> I'm having difficulty understanding. Are you saying that the "starved"
> tasks are runnable, but receiving _zero_ cpu? That's impossible with
> only one other SCHED_NORMAL task afaik, which makes me think you may
> mean they're not receiving cpu frequently enough to keep clients from
> timing out? One task which has slept enough to acquire interactive
> status (as above) can hold others off the cpu for quite a while if it
> starts a burst of heavy cpu burning. If your netperf clients are
> choking on this latency, running the servers at nice 19 should prevent
> the problem.
>
Is the processor getting consumed by network traffic in softirq?
If you are using a non-NAPI device driver, then it is easy for softirq
processing to be overwhelmed with packets.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/