On Saturday 31 December 2005 00:52, Paolo Ornati wrote: > WAS: [SCHED] Totally WRONG prority calculation with specific test-case > (since 2.6.10-bk12) > http://lkml.org/lkml/2005/12/27/114/index.html > > On Wed, 28 Dec 2005 10:26:58 +1100 > > Con Kolivas <[email protected]> wrote: > > The issue is that the scheduler interactivity estimator is a state > > machine and can be fooled to some degree, and a cpu intensive task that > > just happens to sleep a little bit gets significantly better priority > > than one that is fully cpu bound all the time. Reverting that change is > > not a solution because it can still be fooled by the same process > > sleeping lots for a few seconds or so at startup and then changing to the > > cpu mostly-sleeping slightly behaviour. This "fluctuating" behaviour is > > in my opinion worse which is why I removed it. > > Trying to find a "as simple as possible" test case for this problem > (that I consider a BUG in priority calculation) I've come up with this > very simple program: Hi Paolo. Can you try the following patch on 2.6.15 please? I'm interested in how adversely this affects interactive performance as well as whether it helps your test case. Thanks, Con --- include/linux/sched.h | 9 +++++- kernel/sched.c | 72 ++++++++++++++++++++++---------------------------- 2 files changed, 41 insertions(+), 40 deletions(-) Index: linux-2.6.15/include/linux/sched.h =================================================================== --- linux-2.6.15.orig/include/linux/sched.h +++ linux-2.6.15/include/linux/sched.h @@ -683,6 +683,13 @@ static inline void prefetch_stack(struct struct audit_context; /* See audit.c */ struct mempolicy; +enum sleep_type { + SLEEP_NORMAL, + SLEEP_NONINTERACTIVE, + SLEEP_INTERACTIVE, + SLEEP_INTERRUPTED, +}; + struct task_struct { volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */ struct thread_info *thread_info; @@ -704,7 +711,7 @@ struct task_struct { unsigned long sleep_avg; unsigned long long timestamp, last_ran; unsigned long long sched_time; /* sched_clock time spent running */ - int activated; + enum sleep_type sleep_type; unsigned long policy; cpumask_t cpus_allowed; Index: linux-2.6.15/kernel/sched.c =================================================================== --- linux-2.6.15.orig/kernel/sched.c +++ linux-2.6.15/kernel/sched.c @@ -751,31 +751,22 @@ static int recalc_task_prio(task_t *p, u * prevent them suddenly becoming cpu hogs and starving * other processes. */ - if (p->mm && p->activated != -1 && + if (p->mm && p->sleep_type != SLEEP_NONINTERACTIVE && sleep_time > INTERACTIVE_SLEEP(p)) { p->sleep_avg = JIFFIES_TO_NS(MAX_SLEEP_AVG - DEF_TIMESLICE); } else { + /* * The lower the sleep avg a task has the more - * rapidly it will rise with sleep time. + * rapidly it will rise with sleep time. This enables + * tasks to rapidly recover to a low latency priority. + * If a task was sleeping with the noninteractive + * label do not apply this non-linear boost */ - sleep_time *= (MAX_BONUS - CURRENT_BONUS(p)) ? : 1; - - /* - * Tasks waking from uninterruptible sleep are - * limited in their sleep_avg rise as they - * are likely to be waiting on I/O - */ - if (p->activated == -1 && p->mm) { - if (p->sleep_avg >= INTERACTIVE_SLEEP(p)) - sleep_time = 0; - else if (p->sleep_avg + sleep_time >= - INTERACTIVE_SLEEP(p)) { - p->sleep_avg = INTERACTIVE_SLEEP(p); - sleep_time = 0; - } - } + if (p->sleep_type != SLEEP_NONINTERACTIVE || p->mm) + sleep_time *= + (MAX_BONUS - CURRENT_BONUS(p)) ? : 1; /* * This code gives a bonus to interactive tasks. @@ -818,11 +809,7 @@ static void activate_task(task_t *p, run if (!rt_task(p)) p->prio = recalc_task_prio(p, now); - /* - * This checks to make sure it's not an uninterruptible task - * that is now waking up. - */ - if (!p->activated) { + if (p->sleep_type != SLEEP_NONINTERACTIVE) { /* * Tasks which were woken up by interrupts (ie. hw events) * are most likely of interactive nature. So we give them @@ -831,13 +818,13 @@ static void activate_task(task_t *p, run * on a CPU, first time around: */ if (in_interrupt()) - p->activated = 2; + p->sleep_type = SLEEP_INTERRUPTED; else { /* * Normal first-time wakeups get a credit too for * on-runqueue time, but it will be weighted down: */ - p->activated = 1; + p->sleep_type = SLEEP_INTERACTIVE; } } p->timestamp = now; @@ -1356,22 +1343,23 @@ out_activate: if (old_state == TASK_UNINTERRUPTIBLE) { rq->nr_uninterruptible--; /* - * Tasks on involuntary sleep don't earn - * sleep_avg beyond just interactive state. + * Tasks waking from uninterruptible sleep are likely + * to be sleeping involuntarily on I/O and are otherwise + * cpu bound so label them as noninteractive. */ - p->activated = -1; - } + p->sleep_type = SLEEP_NONINTERACTIVE; + } else /* * Tasks that have marked their sleep as noninteractive get - * woken up without updating their sleep average. (i.e. their - * sleep is handled in a priority-neutral manner, no priority - * boost and no penalty.) + * woken up with their sleep average not weighted in an + * interactive way. */ - if (old_state & TASK_NONINTERACTIVE) - __activate_task(p, rq); - else - activate_task(p, rq, cpu == this_cpu); + if (old_state & TASK_NONINTERACTIVE) + p->sleep_type = SLEEP_NONINTERACTIVE; + + + activate_task(p, rq, cpu == this_cpu); /* * Sync wakeups (i.e. those types of wakeups where the waker * has indicated that it will leave the CPU in short order) @@ -2938,6 +2926,12 @@ EXPORT_SYMBOL(sub_preempt_count); #endif +static inline int interactive_sleep(enum sleep_type sleep_type) +{ + return (sleep_type == SLEEP_INTERACTIVE || + sleep_type == SLEEP_INTERRUPTED); +} + /* * schedule() is the main scheduler function. */ @@ -3063,12 +3057,12 @@ go_idle: queue = array->queue + idx; next = list_entry(queue->next, task_t, run_list); - if (!rt_task(next) && next->activated > 0) { + if (!rt_task(next) && interactive_sleep(next->sleep_type)) { unsigned long long delta = now - next->timestamp; if (unlikely((long long)(now - next->timestamp) < 0)) delta = 0; - if (next->activated == 1) + if (next->sleep_type == SLEEP_INTERACTIVE) delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128; array = next->array; @@ -3081,7 +3075,7 @@ go_idle: } else requeue_task(next, array); } - next->activated = 0; + next->sleep_type = SLEEP_NORMAL; switch_tasks: if (next == rq->idle) schedstat_inc(rq, sched_goidle);
Attachment:
pgpBkLo0VUIx5.pgp
Description: PGP signature
- Follow-Ups:
- Re: [SCHED] wrong priority calc - SIMPLE test case
- From: Paolo Ornati <[email protected]>
- Re: [SCHED] wrong priority calc - SIMPLE test case
- From: Con Kolivas <[email protected]>
- Re: [SCHED] wrong priority calc - SIMPLE test case
- Prev by Date: [patch] kobject: don't oops on null kobject.name
- Next by Date: Re: Oops in ufs_fill_super at mount time
- Previous by thread: Re: [SCHED] wrong priority calc - SIMPLE test case
- Next by thread: Re: [SCHED] wrong priority calc - SIMPLE test case
- Index(es):