On Wed, May 11, 2005 at 01:42:35PM -0700, Paul Jackson wrote: > So what we'd really like to do would be to first fallback to all the > cpus allowed in the specified tasks cpuset (no walking the cpuset > hierarchy), and see if any of those cpus are still online to receive > this orphan task. Unless someone has botched the system configuration, > and taken offline all the cpus in a cpuset, this should yield up a cpu > that is still both allowed and online. If that fails, then to heck with > honoring cpuset placement - just take the first online cpu we can find. > I tried your suggestion, but hit another oops. This has more to do with hotplug+preempt looks like Code: fc 89 ec 5d e9 8b 0e 00 00 c7 04 24 5c 49 41 c0 e8 7f 39 00 00 e8 da 87 fe ff eb c1 0f 0b 36 11 32 96 41 c0 eb 92 83 f8 <6>note: cpuoff.sh[25439] exited with preempt_count 1 scheduling while atomic: cpuoff.sh/0x10000001/25439 [<c04010d2>] schedule+0x4b2/0x4c0 [<c0152da0>] unmap_page_range+0xc0/0xe0 [<c0401988>] cond_resched+0x28/0x40 [<c0152f56>] unmap_vmas+0x196/0x290 [<c0157c93>] exit_mmap+0xa3/0x1a0 [<c011ce53>] mmput+0x43/0x100 [<c01222f0>] do_exit+0xf0/0x3b0 [<c01047da>] die+0x18a/0x190 [<c0104bb0>] do_invalid_op+0x0/0xd0 [<c0104c5e>] do_invalid_op+0xae/0xd0 [<c011bbe7>] migrate_dead+0xa7/0xc0 [<c0400f14>] schedule+0x2f4/0x4c0 [<c040112b>] preempt_schedule+0x4b/0x70 [<c040112b>] preempt_schedule+0x4b/0x70 [<c0103fd7>] error_code+0x4f/0x54 [<c040007b>] svc_proc_unregister+0x1b/0x20 [<c011007b>] __acpi_map_table+0x7b/0xc0 [<c011bbe7>] migrate_dead+0xa7/0xc0 [<c011fef9>] profile_handoff_task+0x39/0x50 [<c011bc62>] migrate_dead_tasks+0x62/0x90 [<c011bda6>] migration_call+0x116/0x2c0 [<c0142102>] __stop_machine_run+0x92/0xb0 [<c012ddad>] notifier_call_chain+0x2d/0x50 [<c013a3cb>] cpu_down+0x16b/0x2a0 [<c027db0b>] store_online+0x5b/0x80 [<c027ab25>] sysdev_store+0x35/0x40 [<c019f14e>] flush_write_buffer+0x3e/0x50 [<c019f1b8>] sysfs_write_file+0x58/0x80 [<c0163a36>] vfs_write+0xc6/0x180 [<c0163bc1>] sys_write+0x51/0x80 [<c01034e5>] syscall_call+0x7/0xb On 2.6.12-rc4-mm1 it is even worse, it panics Booting processor 2/1 eip 3000 Calibrating delay using timer specific routine.. 1800.08 BogoMIPS (lpj=900043) CPU2: Intel Pentium III (Cascades) stepping 04 Unable to handle kernel NULL pointer dereference<7>CPU0 attaching sched-domain: at virtual address 00000001 printing eip: 00000001 *pde = 000001e3 *pte = 00000001 Oops: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 2 EIP: 0060:[<00000001>] Not tainted VLI EFLAGS: 00010006 (2.6.12-rc4-mm1) EIP is at 0x1 eax: c18c0000 ebx: c04758a0 ecx: 00000001 edx: e8c13eb4 esi: ffffffff edi: c185c030 ebp: c18c1f80 esp: c18c1f2c ds: 007b es: 007b ss: 0068 Process swapper (pid: 0, threadinfo=c18c0000 task=c185c030) Stack: c0114e0d e8c13eb4 c18c0000 c0103ecc c18c0000 00000008 00000002 ffffffff c185c030 c18c1f80 00000001 0000007b 0000007b fffffffb c04048a0 00000060 00000206 c040511c 00000003 00000000 00000000 c18c0000 c01033ee 00000003 Call Trace: [<c0114e0d>] smp_call_function_interrupt+0x3d/0x60 [<c0103ecc>] call_function_interrupt+0x1c/0x24 [<c04048a0>] schedule+0x0/0x7c0 [<c040511c>] preempt_schedule_irq+0x4c/0x80 [<c01033ee>] need_resched+0x1f/0x21 [<c011516f>] start_secondary+0x10f/0x1a0 Code: Bad EIP value. <0>Kernel panic - not syncing: Fatal exception in interrupt Booting processor 3/2 eip 3000 Initializing CPU#3 I really havent had a chance to investigate whats going on, should be able to do that tomorrow. Here is the patch I tried, my .config and scripts -Dinakar
diff -Naurp linux-2.6.12-rc2-mm3.orig/include/linux/cpuset.h linux-2.6.12-rc2-mm3-0/include/linux/cpuset.h --- linux-2.6.12-rc2-mm3.orig/include/linux/cpuset.h 2005-05-11 19:44:42.000000000 +0530 +++ linux-2.6.12-rc2-mm3-0/include/linux/cpuset.h 2005-05-12 16:54:15.000000000 +0530 @@ -19,6 +19,7 @@ extern void cpuset_init_smp(void); extern void cpuset_fork(struct task_struct *p); extern void cpuset_exit(struct task_struct *p); extern cpumask_t cpuset_cpus_allowed(const struct task_struct *p); +extern const cpumask_t cpuset_task_cpus_allowed(const struct task_struct *p); void cpuset_init_current_mems_allowed(void); void cpuset_update_current_mems_allowed(void); void cpuset_restrict_to_mems_allowed(unsigned long *nodes); @@ -38,6 +39,10 @@ static inline cpumask_t cpuset_cpus_allo { return cpu_possible_map; } +static inline cpumask_t cpuset_task_cpus_allowed(struct task_struct *p) +{ + return cpu_possible_map; +} static inline void cpuset_init_current_mems_allowed(void) {} static inline void cpuset_update_current_mems_allowed(void) {} diff -Naurp linux-2.6.12-rc2-mm3.orig/kernel/cpuset.c linux-2.6.12-rc2-mm3-0/kernel/cpuset.c --- linux-2.6.12-rc2-mm3.orig/kernel/cpuset.c 2005-05-11 19:44:42.000000000 +0530 +++ linux-2.6.12-rc2-mm3-0/kernel/cpuset.c 2005-05-12 17:19:53.000000000 +0530 @@ -1445,6 +1445,31 @@ cpumask_t cpuset_cpus_allowed(const stru return mask; } +/** + * cpuset_task_cpus_allowed - return cpus_allowed mask from a tasks cpuset. + * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed. + * + * Description: Returns the cpumask_t cpus_allowed of the cpuset + * attached to the specified @tsk. Unlike cpuset_cpus_allowed(), + * is not guaranteed to return a non-empty subset of cpu_online_map. + * Does not walk up the cpuset hierarchy, and does not attempt to + * acquire the cpuset_sem. If called on a task about to exit, + * where tsk->cpuset is already NULL, return cpu_online_map. + * + * Call with task locked. + **/ + +const cpumask_t cpuset_task_cpus_allowed(const struct task_struct *tsk) +{ + struct cpuset *cs = tsk->cpuset; + + if (!cs) + return cpu_online_map; + if (cpus_intersects(cs->cpus_allowed, cpu_online_map)) + return cs->cpus_allowed; + return cpu_online_map; +} + void cpuset_init_current_mems_allowed(void) { current->mems_allowed = NODE_MASK_ALL; diff -Naurp linux-2.6.12-rc2-mm3.orig/kernel/sched.c linux-2.6.12-rc2-mm3-0/kernel/sched.c --- linux-2.6.12-rc2-mm3.orig/kernel/sched.c 2005-05-11 19:44:42.000000000 +0530 +++ linux-2.6.12-rc2-mm3-0/kernel/sched.c 2005-05-12 16:50:04.000000000 +0530 @@ -4301,7 +4301,7 @@ static void move_task_off_dead_cpu(int d /* No more Mr. Nice Guy. */ if (dest_cpu == NR_CPUS) { - tsk->cpus_allowed = cpuset_cpus_allowed(tsk); + tsk->cpus_allowed = cpuset_task_cpus_allowed(tsk); dest_cpu = any_online_cpu(tsk->cpus_allowed); /*
Attachment:
test-scripts.tar.gz
Description: application/tar-gz
- Follow-Ups:
- Re: [Lse-tech] Re: [PATCH] cpusets+hotplug+preepmt broken
- From: Dinakar Guniguntala <[email protected]>
- Re: [Lse-tech] Re: [PATCH] cpusets+hotplug+preepmt broken
- References:
- [PATCH] cpusets+hotplug+preepmt broken
- From: Dinakar Guniguntala <[email protected]>
- Re: [PATCH] cpusets+hotplug+preepmt broken
- From: Nathan Lynch <[email protected]>
- Re: [PATCH] cpusets+hotplug+preepmt broken
- From: Paul Jackson <[email protected]>
- [PATCH] cpusets+hotplug+preepmt broken
- Prev by Date: Need kernel patch to compile with Intel compiler
- Next by Date: Re: [PATCH] adjust i386 watchdog tick calculation
- Previous by thread: Re: [PATCH] cpusets+hotplug+preepmt broken
- Next by thread: Re: [Lse-tech] Re: [PATCH] cpusets+hotplug+preepmt broken
- Index(es):