Hi. On Thu, 2007-04-26 at 21:28 +0200, Rafael J. Wysocki wrote: > On Thursday, 26 April 2007 18:10, Pekka Enberg wrote: > > > > On 4/26/2007, "Rafael J. Wysocki" <[email protected]> wrote: > > > In principle, we could add suspend2 as an alternative (in analogy with the I/O > > > schedulers, for example), but I think for this purpose it should be reviewed > > > properly. > > > > Yeah, this makes sense. > > > > On 4/26/2007, "Rafael J. Wysocki" <[email protected]> wrote: > > > There also is a real problem with how it uses the LRU pages. It _seems_ to > > > work, but at least to me it seems to be potentially dangerous. > > > > I am new to suspend2 so can you please explain what exactly is dangerous > > about it? > > After freezing tasks, it first saves the contents of the LRU pages, freezes > devices and then uses the LRU pages for storing the suspend image (if more > memory is needed, it's allocated, but that's irrelevant here). Now, we have no > warranty that the LRU pages are not updated after we've saved their contents > (first potential problem here). > > After the image has been created, we have to unfreeze devices and save the > image. Now, we have no warranty that no one will be writing to the LRU pages > that we have used to store the image, for whatever reasons known to him, so the > image can potentially get corrupted while it's being saved. > > In principle, device drivers can do this and there are some kernel threads that > also can do this (we don't freeze them, because they're needed for the image > saving). > > The design is conceptually really really complicated and it makes strong > assumptions about the behavior of different subsystems. While these > assumptions _may_ be satisfied right now, we'd have to ensure the satisfaction > of them in the future if suspend2 were merged. That's a good description of the issue, although I think _may_ and _seems_ are stating things a bit more pessimistically than is necessary. You see, we need to remember that the pages which are saved separately are LRU pages. Because userspace is frozen, their contents are going to be static. The only possibilities for modifying them come from timer routines, improperly frozen filesystems and device drivers. We have code to check that the LRU isn't changing, and I've only seen one report of modifications to about 20 LRU pages. I haven't had the time yet to chase down the cause, but hope to do so soon. The general scheme has been working for four or five years - if there was a fundamental issue, we would have found it by now. The scheme isn't complicated. The algo for figuring out whether to save the page in an atomic copy just says: Iterate through all LRU pages. For each page, ask: Is this used by the thread suspending, or by userui? No? Save separately. Yes? Save in the atomic copy.... oh, and save everything else that needs to be saved in the atomic copy. Regards, Nigel
Attachment:
signature.asc
Description: This is a digitally signed message part
- Follow-Ups:
- Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
- From: "Rafael J. Wysocki" <[email protected]>
- Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
- References:
- Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
- From: "Pekka Enberg" <[email protected]>
- Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
- From: "Rafael J. Wysocki" <[email protected]>
- Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
- Prev by Date: Re: 2.6.21-rc7-mm1: BUG_ON in kthread_bind during _cpu_down
- Next by Date: Re: [PATCH] ARM: Remove useless config option GENERIC_BUST_SPINLOCK.
- Previous by thread: Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
- Next by thread: Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
- Index(es):