It seems that with the priority set to 19 the netserver processes no longer starve, but we still have an unfair scheduling issue. The netperf clients no longer time out, but one of the servers runs much less than the others. A thorough understanding of the scheduling algorithm seems essential at this point.
-----Original Message-----
From: Kallol Biswas
Sent: Tuesday, June 06, 2006 2:58 PM
To: Kallol Biswas; 'Stephen Hemminger'; '[email protected]'
Cc: 'Mike Galbraith'
Subject: RE: process starvation with 2.6 scheduler
Thanks for the help. We do not see the issue if every netserver's priority is set to 19 with a setpriority() call.
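
For reference, a minimal sketch of what such a call might look like. The helper names renice_self() and current_nice() are hypothetical and are not taken from the netserver source; only setpriority() and getpriority() are real library calls.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: move the calling process to the given nice value
 * (19 is the weakest priority). Returns 0 on success, -1 on failure. */
int renice_self(int nice_value)
{
    /* With PRIO_PROCESS, a "who" of 0 means the calling process. */
    if (setpriority(PRIO_PROCESS, 0, nice_value) != 0) {
        perror("setpriority");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: read the nice value back. getpriority() can
 * legitimately return -1, so errno is cleared first and checked after. */
int current_nice(int *out)
{
    errno = 0;
    int value = getpriority(PRIO_PROCESS, 0);
    if (value == -1 && errno != 0)
        return -1;
    *out = value;
    return 0;
}
```

Calling renice_self(19) early in each server process, before it starts servicing requests, would be one way to apply this. Raising the nice value does not need special privileges.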
-----Original Message-----
From: Kallol Biswas
Sent: Tuesday, June 06, 2006 10:56 AM
To: 'Stephen Hemminger'; [email protected]
Cc: 'Mike Galbraith'
Subject: RE: process starvation with 2.6 scheduler
I have verified that the starved tasks are in the runqueue (prio_array_t
array[0], active points to array[0]); the timestamp and last_ran fields
indicate that they have not run for a while.
The network traffic is of request response type.
Client (on an external box) 3 ports ---- 3 cables ---- 3 ports Emulated Host
The netperf clients run on an external box; the emulated host (ppc440) runs
the servers. A client sends a request to a server, the server returns the
reply, and then the next request from the client goes to the server. There
are 3 clients and 3 servers, one client-server pair for each connection
(3 connections: 3 ports on the external box -- 3 connections
-- 3 ports on the emulated host).
Since the traffic is request/response in nature and the packets reach
user space (netserver) before turning around, I do not think a slow CPU is the issue.
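
The preempt_count values in the task dump quoted further down (0x20100 and 0x10000000) can be split into their fields with a small decoder. This is only a sketch: the field widths below (8 preempt bits, 8 softirq bits, 12 hardirq bits) are an assumption based on the generic layout described in include/linux/hardirq.h and may not match this 2.6.11 ppc build. The only thing the dump itself confirms is that 0x10000000 is treated as PREEMPT_ACTIVE. All DEMO_ names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED field layout, not verified against the 2.6.11 ppc headers. */
#define DEMO_PREEMPT_BITS   8
#define DEMO_SOFTIRQ_BITS   8
#define DEMO_HARDIRQ_BITS   12
#define DEMO_SOFTIRQ_SHIFT  DEMO_PREEMPT_BITS
#define DEMO_HARDIRQ_SHIFT  (DEMO_SOFTIRQ_SHIFT + DEMO_SOFTIRQ_BITS)
#define DEMO_PREEMPT_ACTIVE 0x10000000u  /* value reported in the dump */

struct demo_preempt_fields {
    unsigned preempt_depth;   /* nested preempt_disable() calls */
    unsigned softirq_count;   /* softirq nesting */
    unsigned hardirq_count;   /* hardirq nesting */
    int      preempt_active;  /* 1 if the PREEMPT_ACTIVE bit is set */
};

/* Split a raw preempt_count value into its assumed fields. */
struct demo_preempt_fields demo_decode_preempt_count(uint32_t raw)
{
    struct demo_preempt_fields f;

    f.preempt_depth  = raw & ((1u << DEMO_PREEMPT_BITS) - 1);
    f.softirq_count  = (raw >> DEMO_SOFTIRQ_SHIFT) & ((1u << DEMO_SOFTIRQ_BITS) - 1);
    f.hardirq_count  = (raw >> DEMO_HARDIRQ_SHIFT) & ((1u << DEMO_HARDIRQ_BITS) - 1);
    f.preempt_active = (raw & DEMO_PREEMPT_ACTIVE) != 0;
    return f;
}
```

Under that assumed layout, 0x20100 (the active process) has nonzero softirq and hardirq fields, which would suggest it was sampled in interrupt context, while 0x10000000 (the starved process) has only PREEMPT_ACTIVE set, which is what a task preempted involuntarily would show.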
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stephen Hemminger
Sent: Tuesday, June 06, 2006 9:56 AM
To: [email protected]
Subject: Re: process starvation with 2.6 scheduler
On Tue, 06 Jun 2006 10:01:58 +0200
Mike Galbraith <[email protected]> wrote:
> (please line wrap)
>
> On Mon, 2006-06-05 at 12:48 -0700, Kallol Biswas wrote:
> > Hello,
> > We have a process starvation problem with our 2.6.11 kernel running on a ppc-440 based system.
> >
> > We have a storage SOC based on the PPC-440. The SOC is emulated on a Cadence system emulator called Palladium. The system runs at 400 kHz. It has three Ethernet ports, which are connected to the outside lab network through a speed bridge.
> >
> > The netperf server, netserver, runs on the emulated system (2.6.11 kernel on Palladium). The netperf Linux clients run on an x86 box.
> >
> > If netperf request/response (TCP_RR) traffic is run on all three ports, after some time only one port remains active; the applications (netperf clients) on the other two ports wait for a long time and eventually time out.
> >
> > The netserver code has been instrumented. For one of the starved netserver processes, we found that the server received the TCP_RR request from the netperf client on the Linux x86 box and issued a send() call to return the reply, but send() never returns.
> >
> > With an ICE connected to the Palladium (emulator) I have dumped the kernel data structures of the starved process and the active process.
> >
> >
> > For Active Process:
> > Time slice: 84
> > Policy: SCHED_NORMAL
> > Dynamic priority: 118
> > Static priority: 120
> > Preempt_count: 0x20100
> > Flags = 0
> > State = 0 (TASK_RUNNING)
> >
> > For Starved Process:
> > Time slice: 77
> > Policy: SCHED_NORMAL
> > Dynamic priority: 120
> > Static priority: 120
> > Preempt_count: 0x10000000 (PREEMPT_ACTIVE is set)
> > Flags = 0
> > State = 0 (TASK_RUNNING)
> >
> > Any help to debug the problem is welcome.
>
> I'm having difficulty understanding. Are you saying that the "starved"
> tasks are runnable, but receiving _zero_ cpu? That's impossible with
> only one other SCHED_NORMAL task afaik, which makes me think you may
> mean they're not receiving cpu frequently enough to keep clients from
> timing out? One task which has slept enough to acquire interactive
> status (as above) can hold others off the cpu for quite a while if it
> starts a burst of heavy cpu burning. If your netperf clients are
> choking on this latency, running the servers at nice 19 should prevent
> the problem.
>
Is the processor getting consumed by network traffic in soft irq?
If you are using a non-NAPI device driver, it is easy for soft irq
processing to get overwhelmed with packets.
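
Mike's remark about "interactive status" and the nice 19 suggestion can be illustrated with a little arithmetic. The sketch below is modelled on my reading of the TASK_INTERACTIVE() and DELTA() macros in the 2.6 O(1) scheduler (kernel/sched.c). The constants and formulas are assumptions that should be checked against the 2.6.11 source, and all demo_/DEMO_ names are hypothetical. It handles only nice values >= 0.

```c
/* ASSUMED constants, modelled on the 2.6 O(1) scheduler. */
#define DEMO_MAX_RT_PRIO       100
#define DEMO_MAX_BONUS         10  /* dynamic prio moves at most +/-5 */
#define DEMO_INTERACTIVE_DELTA 2

/* Static priority: nice 0 -> 120, nice 19 -> 139. */
int demo_static_prio(int nice)
{
    return DEMO_MAX_RT_PRIO + nice + 20;
}

/* How far below static_prio the dynamic priority must drop before the
 * task counts as interactive (integer division, nice >= 0 only). */
int demo_interactive_delta(int nice)
{
    return nice * DEMO_MAX_BONUS / 40 + DEMO_INTERACTIVE_DELTA;
}

/* Best (numerically lowest) dynamic priority: the full sleep bonus. */
int demo_best_dynamic_prio(int nice)
{
    int prio = demo_static_prio(nice) - DEMO_MAX_BONUS / 2;
    return prio < DEMO_MAX_RT_PRIO ? DEMO_MAX_RT_PRIO : prio;
}

/* 1 if a task with this dynamic priority is treated as interactive. */
int demo_is_interactive(int dynamic_prio, int nice)
{
    return dynamic_prio <= demo_static_prio(nice) - demo_interactive_delta(nice);
}
```

With these assumptions, the active task in the dump (static 120, dynamic 118) meets the interactive threshold of 120 - 2 = 118, while the starved task (dynamic 120) does not. At nice 19 the threshold is 139 - 6 = 133, but the best reachable dynamic priority is 139 - 5 = 134, so no task can become interactive. That would be consistent with the starvation going away once every netserver runs at nice 19.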
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/