At Mon, 16 Apr 2007 11:16:24 +0900,
Satoru Takeuchi wrote:
>
> At Sat, 14 Apr 2007 01:31:12 +1000,
> Con Kolivas wrote:
> >
> > On Monday 09 April 2007 16:09, Andrew Morton wrote:
> > > On Sat, 07 Apr 2007 16:31:39 +0900 Satoru Takeuchi
> > <[email protected]> wrote:
> > > > When I was examining the following program ...
> > > >
> > > > 1. There are a large amount of small jobs takes several msecs,
> > > > and the number of job increases constantly.
> > > > 2. The process creates a thread or a process per job (I examined both
> > > > the thread model and the process model).
> > > > 3. Each child process/thread does the assigned job and exit
> > > > immediately.
> > > >
> > > > ... I found that the thread model's latency is longer than proess
> > > > model's one against my expectation. It's because of the current
> > > > sched_fork()/sched_exit() implementation as follows:
> > > >
> > > > a) On sched_fork, the creator share its timeslice with new process.
> > > > b) On sched_exit, if the exiting process didn't exhaust its first
> > > > timeslice yet, it gives its timeslice to the parent.
> > > >
> > > > It has no problem on the process model since the creator is the parent.
> > > > However, on the thread model, the creator is not the parent, it is same
> > > > as the creator's parent. Hence, on this kind of program, the creator
> > > > can't retrieve shared timeslice and exausts its timeslice at a rate of
> > > > knots. In addition, somehow, the parent (typically shell?) gets extra
> > > > timeslice.
> > > >
> > > > I believe it's a bug and the exiting process should give its timeslice
> > > > to the creator. Now I have some patch plan to fix this problem as follow:
> > > >
> > > > a) Add the field for the creator to task_struct. It needs extra memory.
> > > > b) Doesn't add extra field and have thread's parent the creater, which
> > > > is same as process creation. However it has many side effects, for
> > > > example, we also need to change sys_getppid() implementation.
> > > >
> > > > What do you think? Any comments are welcome.
> > >
> > > This comes at an awkward time, because we might well merge the
> > > staircase/deadline work into 2.6.22, and I think it rewrites the part of
> > > the scheduler which is causing the problems you're observing.
> > >
> > > Has anyone verified that SD fixes this problem and the one at
> > > http://lkml.org/lkml/2007/4/7/21 ?
> >
> > No, SD does nothing different in this regard. There is no notion of who made
> > the thread and who the remaining timeslice should go to. As you say, some
> > decision on sched_exit should probably be used to decide where to send it. In
> > the absence of yet more fields in task_struct, the easiest thing to do would
> > be to check if the the thread id is equal to the pid and if not, send it to
> > the pid (or any parent of that if it no longer exists). Whether it's worth
> > adding yet another field to task_struct to track this or not I do not have
> > enough information to make an informed choice in this regard. Either way I'm
> > low on patch-creation cycles so please feel free to provide your solution.
>
> I wrote the patch fixing this problem. It's for 2.6.21-rc6 and works well.
> I also examined the following load tests:
>
> - Compile kernel with -j4 option continuously for 8 hours on i386 UP box.
> - Compile kernel with -j48 option continuously for 2 hours on ia64 12 CPU
> box.
>
> Now I'm testing its 2.6.21-rc6-mm1(with SD) version and would submit it soon.
Here is the -mm version. I examined the same test as -rc6.
Thanks,
Satoru
---
Fix the return of the first time_slice
Currently exiting process takes back its first_time_slice to its
parent. If the task is created with CLONE_PARENT or CLONE_THREAD
flag, the parent is not the creator. Hence the creator can't
collect the time_slice and time_slice leak occurs.
To fix this problem, remove first_time_slice field of task_struct
and introduce time_slice_reaper field instead. The new field means
the task which exiting task should return its first_time_slice to,
and is NULL after exhausting its first time_slice.
Signed-off-by: Satoru Takeuchi <[email protected]>
Index: linux-2.6.21-rc6-mm1.tsfix/include/linux/sched.h
===================================================================
--- linux-2.6.21-rc6-mm1.tsfix.orig/include/linux/sched.h 2007-04-15 16:56:12.000000000 +0900
+++ linux-2.6.21-rc6-mm1.tsfix/include/linux/sched.h 2007-04-15 16:56:17.000000000 +0900
@@ -864,8 +864,8 @@
* before being requeued at a lower priority.
*/
int time_slice;
- /* Is this the very first time_slice this task has ever run. */
- unsigned int first_time_slice;
+ /* The task which this task should bring back its first time_slice. */
+ struct task_struct *time_slice_reaper;
/* How much this task receives at each priority level */
int quota;
Index: linux-2.6.21-rc6-mm1.tsfix/kernel/exit.c
===================================================================
--- linux-2.6.21-rc6-mm1.tsfix.orig/kernel/exit.c 2007-04-15 16:56:12.000000000 +0900
+++ linux-2.6.21-rc6-mm1.tsfix/kernel/exit.c 2007-04-15 16:56:17.000000000 +0900
@@ -680,10 +680,14 @@
reaper = child_reaper(father);
break;
}
+ if (reaper->time_slice_reaper == father)
+ reaper->time_slice_reaper = NULL;
} while (reaper->exit_state);
list_for_each_safe(_p, _n, &father->children) {
p = list_entry(_p, struct task_struct, sibling);
+ if (p->time_slice_reaper == father)
+ p->time_slice_reaper = NULL;
choose_new_parent(p, reaper);
reparent_thread(p, father);
}
Index: linux-2.6.21-rc6-mm1.tsfix/kernel/sched.c
===================================================================
--- linux-2.6.21-rc6-mm1.tsfix.orig/kernel/sched.c 2007-04-15 16:56:12.000000000 +0900
+++ linux-2.6.21-rc6-mm1.tsfix/kernel/sched.c 2007-04-15 16:56:17.000000000 +0900
@@ -1693,7 +1693,7 @@
* The remainder of the first timeslice might be recovered by
* the parent if the child exits early enough.
*/
- p->first_time_slice = 1;
+ p->time_slice_reaper = current;
p->timestamp = sched_clock();
local_irq_enable();
out:
@@ -1768,16 +1768,18 @@
*/
void fastcall sched_exit(struct task_struct *p)
{
- struct task_struct *parent;
+ struct task_struct *reaper = p->time_slice_reaper;
unsigned long flags;
struct rq *rq;
- parent = p->parent;
- rq = task_rq_lock(parent, &flags);
- if (p->first_time_slice && task_cpu(p) == task_cpu(parent)) {
- parent->time_slice += p->time_slice;
- if (unlikely(parent->time_slice > parent->quota))
- parent->time_slice = parent->quota;
+ if (!reaper)
+ return;
+
+ rq = task_rq_lock(reaper, &flags);
+ if (p->time_slice_reaper && task_cpu(p) == task_cpu(reaper)) {
+ reaper->time_slice += p->time_slice;
+ if (unlikely(reaper->time_slice > reaper->quota))
+ reaper->time_slice = reaper->quota;
}
task_rq_unlock(rq, &flags);
}
@@ -3352,8 +3354,8 @@
int overrun;
int old_prio;
- if (unlikely(p->first_time_slice))
- p->first_time_slice = 0;
+ if (unlikely(p->time_slice_reaper))
+ p->time_slice_reaper = NULL;
if (rt_task(p)) {
p->time_slice = p->quota;
list_move_tail(&p->run_list, p->array->queue + p->prio);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]