Re: [RFC] Fix SMP brokenness for PF_FREEZE and make freezing usable for other purposes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Christoph I was wrong a bit. Due to use of completion you have no one race I described before. If the task is leaving refrigarator with TIF_FREEZE it will just visit refrigarator() once more, but won't sleep there since completion is done. BTW, I see no place where you initialize the completion.

Kirill

-static inline int freezing(struct task_struct *p)
-{
-    return p->flags & PF_FREEZE;
-}
+#if defined(CONFIG_PM) || defined(CONFIG_MIGRATE)


<<<< why not to make a single option CONFIG_REFRIGERATOR? It looks to be a
more robust way, since there are multiple users of it.



Yes. That may be better. We can do that once the migration code is finished and when we know what kind of CONFIG_XXX the migration code really needs.


+#ifdef CONFIG_PM


<<<< is it intentionaly? or you just lost CONFIG_MIGRATE?

It is intentional. freeze_processes and thaw_processes are only needed for suspend. One only needs to freeze a couple of processes for process migration.

But PM and your migrate code can be not the only users of it.

<<<< I still think this refrigerator is racy with freeze_processes():
<<<< scenarios:
<<<< scenario 1
<<<<
task1 -> freeze_processes():            taskXXX ->refrigerator()
 checked (task->flags & PF_FROZEN) == 0    cur->flags |= PF_FROZEN
                        clear TIF_FREEZE
                        <sleep on thaw>
 set TIF_FREEZING
                        clear PF_FROZEN

<<<< so the task awakes with TIF_FREEZE flag set!!!



Hmm... If we wait to clear both flags until after the completion notification then we do not have the race right? But then we need to move the signal recalc since it tests for TIF_FREEZE too.

It is almost ok, but it is still not fine :)

look what happens if you call freeze/unfreeze in a loop:

refrigerator:
awakes

freezer:
check PF_FROZEN, it is still set, skips task and thinks it is finished freezing.

refrigerator:
clears PF_FROZEN and TIF_FREEZE and returns.

I think you can fix this by moving PF_FROZEN check and set in both places under siglock.

Kirill


<<<< scenario 2
<<<< look at error path in freeze_processes (on timeout), it is broken as
well. You need to wakeup tasks there...



Ok. How about this additional patch? This still requires that process freezing does not immediately occurr again after the completion handler. All of this is iffy due to not having a real lock protecting all these values and we may still need to add some barriers.

Index: linux-2.6.12/kernel/power/process.c
===================================================================
--- linux-2.6.12.orig/kernel/power/process.c 2005-06-28 06:34:52.000000000 +0000 +++ linux-2.6.12/kernel/power/process.c 2005-06-28 06:40:28.000000000 +0000
@@ -47,12 +47,13 @@ int freeze_processes(void)
             unsigned long flags;
             if (!freezeable(p))
                 continue;
-            if ((p->flags & PF_FROZEN) ||
-                (p->state == TASK_TRACED) ||
+            if ((p->state == TASK_TRACED) ||
                 (p->state == TASK_STOPPED))
                 continue;
set_thread_flag(TIF_FREEZE);
+            if (p->flags & PF_FROZEN)
+                continue;
             spin_lock_irqsave(&p->sighand->siglock, flags);
             signal_wake_up(p, 0);
             spin_unlock_irqrestore(&p->sighand->siglock, flags);
@@ -63,6 +64,8 @@ int freeze_processes(void)
         if (time_after(jiffies, start_time + TIMEOUT)) {
             printk( "\n" );
printk(KERN_ERR " stopping tasks failed (%d tasks remaining)\n", todo );
+            complete_all(&thaw);
+            up(&freezer_sem);
             return todo;
         }
     } while(todo);
Index: linux-2.6.12/kernel/sched.c
===================================================================
--- linux-2.6.12.orig/kernel/sched.c 2005-06-28 06:34:52.000000000 +0000
+++ linux-2.6.12/kernel/sched.c    2005-06-28 06:37:36.000000000 +0000
@@ -5210,13 +5210,13 @@ DECLARE_COMPLETION(thaw);
 void refrigerator(void)
 {
     current->flags |= PF_FROZEN;
+    wait_for_completion(&thaw);
     clear_thread_flag(TIF_FREEZE);
+    current->flags &= ~PF_FROZEN;
     /* A fake signal 0 may have been sent. Recalculate sigpending */
     spin_lock_irq(&current->sighand->siglock);
     recalc_sigpending();
     spin_unlock_irq(&current->sighand->siglock);
-    wait_for_completion(&thaw);
-    current->flags &= ~PF_FROZEN;
 }
 EXPORT_SYMBOL(refrigerator);
 #endif





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux