Re: [PATCH] cpusets+hotplug+preepmt broken

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Paul Jackson wrote:
> 
> I share you preference for not nesting these semaphores.
> 
> The other choice I am aware of would be for the hotplug code to be less
> cpuset-friendly.  In the move_task_off_dead_cpu() code, at the point it
> says "No more Mr. Nice Guy", instead of looking for the nearest
> enclosing cpuset that has something online, which is what the
> cpuset_cpus_allowed() does, instead we could just take any damn cpu that
> was online.
> 
> Something along the lines of the following fix:
> 
> --- pj/kernel.old/sched.c	2005-05-11 13:00:17.000000000 -0700
> +++ pj/kernel.new/sched.c	2005-05-11 13:02:24.000000000 -0700
> @@ -4229,7 +4229,7 @@ static void move_task_off_dead_cpu(int d
>  
>  	/* No more Mr. Nice Guy. */
>  	if (dest_cpu == NR_CPUS) {
> -		tsk->cpus_allowed = cpuset_cpus_allowed(tsk);
> +		tsk->cpus_allowed = cpu_online_map;
>  		dest_cpu = any_online_cpu(tsk->cpus_allowed);

Well, CPU_MASK_ALL rather than cpu_online_map, I would think.  That is
what the behavior was before the cpuset merge, anyway.  It might be
the best short term solution, more below...

> So what we'd really like to do would be to first fallback to all the
> cpus allowed in the specified tasks cpuset (no walking the cpuset
> hierarchy), and see if any of those cpus are still online to receive
> this orphan task.  Unless someone has botched the system configuration,
> and taken offline all the cpus in a cpuset, this should yield up a cpu
> that is still both allowed and online.  If that fails, then to heck with
> honoring cpuset placement - just take the first online cpu we can find.
> 
> This is doable without holding cpuset_sem.  We can look at a current
> tasks cpuset without cpuset_sem, just with the task lock.

Yes, but your patch doesn't lock the task itself (unless I'm
misreading patches again).  However, have a look at the comments above
task_lock in sched.h:

/*
 * Protects ->fs, ->files, ->mm, ->ptrace, ->group_info, ->comm, keyring
 * subscriptions and synchronises with wait4().  Also used in procfs.
 *
 * Nests both inside and outside of read_lock(&tasklist_lock).
 * It must not be nested with write_lock_irq(&tasklist_lock),
 * neither inside nor outside.
 */
static inline void task_lock(struct task_struct *p)
{
        spin_lock(&p->alloc_lock);
}


Unfortunately, move_task_off_dead_cpu is called from
migrate_live_tasks while the latter has a write_lock_irq on
tasklist_lock.  So we can't use task_lock in this context, assuming
the comments are valid.  Right?


Nathan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux