sched_domains SD_BALANCE_FORK and sched_balance_self

First off, apologies for not reviewing this code at 2.6.12-mm2, I wastied up with other things. I have some concerns as to the intent vs.actual implementation of SD_BALANCE_FORK and the sched_balance_fork()routine.

ARCHS=i386,x86_64,ia64

First, iirc SD_NODE_INIT initializes the sched_domain that contains allthe cpus in the system in node groups (with the exception of ia64 andthe allnodes domain). Correct?

SD_NODE_INIT for $ARCHS contains SD_BALANCE_FORK, and no other SD_*_INITroutines do. This seems strange to me as it would seem more appropriateto balance within a node on fork as to not have to access the duplicatedmm across nodes. If we are going to use SD_BALANCE_FORK, wouldn't itmake sense to push it down the sched_domain hierarchy to the SD_CPU_INITlevel?

And now to sched_balance_self(). As I understand it the purpose of thismethod is to choose the "best" cpu to start a task on. It seems to methat the best CPU for a forked process would be an idle CPU on the samenode as the parent in order to stay close to it's memory. Failing this,we may need to move to other nodes if they are idle enough to warrantthe move across node boundaries. Thoughts?

For the comments below, flag = SD_BALANCE_FORK.

static int sched_balance_self(int cpu, int flag)
{
        struct task_struct *t = current;
        struct sched_domain *tmp, *sd = NULL;

        for_each_domain(cpu, tmp)
                if (tmp->flags & flag)
                        sd = tmp;

This jumps to the highest level domain that supports SD_BALANCE_FORKwhich are the domains created with SD_NODE_INIT, so we look at all theCPUs on the system first rather than those local to the parent's node.



        while (sd) {
                cpumask_t span;
                struct sched_group *group;
                int new_cpu;
                int weight;

                span = sd->span;
                group = find_idlest_group(sd, t, cpu);
                if (!group)
                        goto nextlevel;

                new_cpu = find_idlest_cpu(group, cpu);
                if (new_cpu == -1 || new_cpu == cpu)
                        goto nextlevel;

                /* Now try balancing at a lower domain level */
                cpu = new_cpu;
nextlevel:
                sd = NULL;
                weight = cpus_weight(span);
                for_each_domain(cpu, tmp) {
                        if (weight <= cpus_weight(tmp->span))
                                break;
                        if (tmp->flags & flag)
                                sd = tmp;
                }

If I am reading it right, this for_each_domain will exit immediately ifjumped to via nextlevel and will only do any work if a new cpu is foundto run on (which is fair sense there is no need to keep looking if thewhole system doesn't have a better place for us to go). If a new cpu_is_ assigned though, for_each_domain will start with the lowest leveldomain - which always has the smallest cpus_weight doesn't it? If so,won't the (weight <= cpu...) condition always equate to true, ending theloop at the first domain? If so, then that last loop doesn't doanything at all, ever. If I am misreading this fragment, could someoneplease correct my thinking?


                /* while loop will break here if sd == NULL */
        }

        return cpu;
}

So it seems to me that we should first look at the cpus on the localdomain for SD_BALANCE_FORK. SD_BALANCE_EXEC tasks however have aminimal mm and could probably be put on whichever cpu/group is the mostidle, regardless of node. Thoughts?


Thanks,


--
Darren Hart
IBM Linux Technology Center
Linux Kernel Team
Phone: 503 578 3185
  T/L: 775 3185
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: sched_domains SD_BALANCE_FORK and sched_balance_self
  - From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>

Prev by Date: Re: lockups with netconsole on e1000 on media insertion
Next by Date: Re: [PATCH] Export handle_mm_fault to modules.
Previous by thread: x86_64 frame pointer via thread context
Next by thread: Re: sched_domains SD_BALANCE_FORK and sched_balance_self
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind]