On Mon, Jun 04, 2007 at 03:22:20PM +1000, Nigel Cunningham wrote:
> Hi.
>
> I can see that the idea of writing a kernel image from using another
> kernel sounds nice and clean initially, but the more we get into the
> details (yes, I am listening, even though I said nothing before now),
> the more it's sounding like the cure is worse than the disease.
I think if we look into the details a bit more, we may find that it is in
fact not worse after all. It would be nice if this approach could also be
implemented in only a few hours of work, but unfortunately I doubt that is the
case, even though I imagine it may be somewhat simpler to implement than the
current swsusp and suspend2 implementations.
Just to give some perspective on the implementation, I believe the following
functions/procedures provided by the kernel to userspace (implemented as system
calls, sysfs files, ioctls, etc.) would be sufficient for this hibernation
approach:
(Note that I wrote this description after writing my responses to the other
points you make, and so it may make more sense for those to be read first.)
1. "start hibernation"
Parameters:
- the "save image" kernel to use (perhaps either as the binary data or as a
path to a file);
- extra kernel command-line parameters to the "save image" kernel;
- an initrd for the "save image" kernel (if needed).
This function would result in the original kernel loading the "save image"
kernel into memory, stopping all devices, and jumping to the new kernel.
2. "resume from hibernation"
Parameters:
Somehow the block of memory containing the hibernate image would need to be
provided; it could be specified as a pointer to memory in the process
invoking this function, or alternatively something like /dev/snapshot could
be used.
This function would stop devices, shuffle the pages around in memory, and
jump back to the original kernel.
3. "abort hibernation"
Parameters:
The address at which to jump back into the original kernel would need to be
specified; the new kernel would know this address because it would be
provided as a kernel command-line parameter.
This function would act similarly to "resume from hibernation", except that
the pages are already in memory exactly where they need to be, so all that
needs to be done is to stop all devices, and jump back to the original
kernel.
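Operations 1 and 3 both depend on the original kernel handing information to
the "save image" kernel through its command line. As a rough sketch only, the
extra arguments might be assembled as below; the `hibernate_return=` parameter
and the `build_save_cmdline()` helper are hypothetical names, not existing
kernel interfaces:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the command line for the "save image" kernel
 * from the user's extra arguments plus the address at which "abort
 * hibernation" should jump back into the original kernel. The parameter
 * name "hibernate_return=" is an illustrative assumption.
 *
 * Returns the length of the resulting string, or -1 on error (the user
 * already supplied the parameter, or the result did not fit in buf).
 */
static int build_save_cmdline(char *buf, size_t len, const char *extra_args,
                              unsigned long return_addr)
{
    const char *extra = extra_args ? extra_args : "";
    const char *sep = *extra ? " " : "";
    int n;

    assert(buf != NULL && len > 0);

    /* The return address must come from the kernel, not from the user. */
    if (strstr(extra, "hibernate_return=") != NULL)
        return -1;

    n = snprintf(buf, len, "%s%shibernate_return=0x%lx", extra, sep,
                 return_addr);
    /* snprintf reports the length it needed; treat truncation as failure. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

For example, with the extra arguments "quiet" and a return address of
0x1000000, the result would be "quiet hibernate_return=0x1000000".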
If it is desired to do slightly more in the kernel, the "save image" kernel
could process the kernel command-line arguments to determine the pages that
need to be written, and provide a view of them (e.g. as /dev/snapshot), rather
than having the userspace under the "save image" kernel do that work and then
perhaps access the pages using /dev/mem.
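A /dev/snapshot-style view essentially has to present the scattered pages to
be saved as one contiguous stream, so that the writer can simply read the
image sequentially. A minimal sketch of the offset arithmetic, assuming a flat
array of page frame numbers (PFNs) and simulating physical memory with an
ordinary buffer; `struct snapshot_view`, `snapshot_read()` and `HIB_PAGE_SIZE`
are hypothetical, and this is not how the existing /dev/snapshot driver works:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HIB_PAGE_SIZE 4096UL /* assumed page size for this sketch */

/*
 * Hypothetical model of a /dev/snapshot-like view: the pages to be saved are
 * listed by PFN, and byte offset k of the stream lies in page
 * pfns[k / HIB_PAGE_SIZE]. Physical memory is simulated by phys_mem.
 */
struct snapshot_view {
    const unsigned char *phys_mem; /* simulated physical memory */
    const unsigned long *pfns;     /* pages to save, in image order */
    size_t nr_pages;
};

/* Copy up to count bytes of the stream starting at offset into buf;
 * returns the number of bytes copied (0 at the end of the image). */
static size_t snapshot_read(const struct snapshot_view *v, size_t offset,
                            unsigned char *buf, size_t count)
{
    size_t done = 0;
    size_t total = v->nr_pages * HIB_PAGE_SIZE;

    assert(v != NULL && buf != NULL);
    while (done < count && offset < total) {
        size_t idx = offset / HIB_PAGE_SIZE;     /* which listed page */
        size_t in_page = offset % HIB_PAGE_SIZE; /* offset within it */
        size_t chunk = HIB_PAGE_SIZE - in_page;  /* bytes left in page */

        if (chunk > count - done)
            chunk = count - done;
        memcpy(buf + done,
               v->phys_mem + v->pfns[idx] * HIB_PAGE_SIZE + in_page, chunk);
        done += chunk;
        offset += chunk;
    }
    return done;
}
```

Because reads that cross a page boundary are split per page, the userspace
writer never needs to know where the pages actually live in physical memory.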
> To get rid of process freezing, we're talking about:
Note that the advantage of this approach is not just getting rid of process
freezing and its associated problems. There is also the advantage of allowing
much greater flexibility in how the image is written, and avoiding disturbing
things like the network stack.
> * making hibernation depend on depriving the user of 32 or 64M of
> otherwise perfectly usable memory (thereby making hibernation on
> machines with less memory impossible)
It is not clear that this much memory would really need to be reserved. I'll
admit I don't fully understand the requirements for using kexec to load a
kernel. In particular, I don't know how much memory would really be required
to load a kernel to write an image, and to what extent that memory needs to be
contiguous. Even if a significant amount of contiguous physical memory needs
to be reserved at boot, the original kernel could perhaps still use this
memory for the page cache, since it could be freed up for hibernation
(possibly by moving those cached pages elsewhere in memory).
In the best case, though, no significant amount of contiguous memory would be
required, in which case the necessary memory would only need to be freed at
hibernation time and could be used normally the rest of the time.
(As a side note, with machines typically having 1GB+ of memory these days, even
wasting 64MB of memory is becoming increasingly unimportant, although I agree
it is not a good idea. I actually run an x86 system with 1GB of memory and no
HIGHMEM support, and as a result waste over 100MB of physical memory, which
would handily be free for the new kernel. Changing the VM split broke certain
programs that I didn't feel like fixing.)
> * requiring them to set up kexec or kdump (I don't understand the
> difference, sorry) or some new variation
This new hibernation approach would indeed internally use some or all of the
kexec code, but I don't think this detail would significantly impact the setup
procedure. The only real impact would be that the user would need to somehow
specify how to access the "save image" kernel and the additional kernel
command-line arguments to include. If an initrd is to be used instead of an
initramfs, then that would have to be specified as well. I don't think this
setup requirement is significantly more taxing than having to specify the
path to the user interface program, for instance.
> * adding interfaces to tell kexec/dump/whatever what pages need to be
> saved and reloaded
Any hibernation mechanism needs to know which pages to save. This approach is
no different. The "interface" could likely be one of the following:
1. Just before jumping to the new kernel, with interrupts disabled and devices
already stopped, the original kernel prepares a list of pages to write
somewhere in memory. The old kernel passes the address of this list as a
kernel command-line argument to the new kernel. The initramfs or initrd
userspace (or the kernel itself, although there would be no advantage in doing
this in the kernel) gets this address from the kernel command-line and then
reads that list to determine which pages to write. Preparing the list would
presumably take only a small amount of code, and both suspend2 and the
in-kernel swsusp presumably already need to do something like this.
2. The old kernel prepares no new data structures, and simply passes to the
new kernel, as kernel command-line arguments, a few pointers to the existing
data structures that describe the pages in use. The code running under
the new kernel responsible for writing the hibernation image simply accesses
these data structures using the pointers from the kernel command-line to
determine which pages to write.
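Both variants come down to the writer finding an address on its command line
and then walking some description of the pages. A sketch of those two pieces,
assuming a hypothetical parameter name such as `hibernate_pagelist=` and, for
option 2, a deliberately simplified flat bitmap (one bit per page frame)
rather than the kernel's actual data structures; actually reading the
structure at that physical address would still require something like
/dev/mem:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Find "name=<value>" in a kernel command line (e.g. the contents of
 * /proc/cmdline) and parse <value> as an address. Parameter names such as
 * "hibernate_pagelist" are illustrative assumptions.
 * Returns 0 on success, -1 if the parameter is missing or malformed.
 */
static int cmdline_get_addr(const char *cmdline, const char *name,
                            unsigned long *addr)
{
    size_t name_len = strlen(name);
    const char *p = cmdline;

    assert(cmdline != NULL && name != NULL && addr != NULL);
    while (*p) {
        while (*p == ' ') /* skip separating spaces */
            p++;
        if (strncmp(p, name, name_len) == 0 && p[name_len] == '=') {
            const char *val = p + name_len + 1;
            char *end;

            errno = 0;
            *addr = strtoul(val, &end, 0); /* base 0 accepts "0x..." */
            if (errno != 0 || end == val ||
                (*end != '\0' && *end != ' ' && *end != '\n'))
                return -1;
            return 0;
        }
        while (*p && *p != ' ') /* advance to the next token */
            p++;
    }
    return -1;
}

/*
 * Option 2, simplified: walk a flat bitmap in which bit n set means
 * "page frame n must be saved", storing PFNs in out[]. Returns the number
 * of PFNs stored (at most max).
 */
static size_t collect_saveable_pfns(const unsigned long *bitmap,
                                    size_t nr_bits, unsigned long *out,
                                    size_t max)
{
    size_t n = 0;

    for (size_t pfn = 0; pfn < nr_bits && n < max; pfn++)
        if (bitmap[pfn / BITS_PER_WORD] & (1UL << (pfn % BITS_PER_WORD)))
            out[n++] = pfn;
    return n;
}
```

Given a command line such as "ro quiet hibernate_pagelist=0x3f000000",
cmdline_get_addr() would store 0x3f000000 in *addr.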
> * adding convolutions in which at resume time we boot one kernel, switch
> to another kernel to do the loading and then switch back again to the
> resumed kernel (assuming I understand what you're suggesting).
This shouldn't actually be necessary. It should be possible to do the resume
in exactly the same way the in-kernel swsusp resumes currently (except that
userspace could be used to actually load the image into memory and then tell
the kernel to do the necessary manipulations: stop devices, shuffle the
pages around so they are in the right positions, and then jump to the resumed
kernel).
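The "shuffle the pages around" step is the one subtle part: pages that
userspace loaded into arbitrary free frames must be copied to the frames the
original kernel expects, without overwriting a loaded page before it has been
copied. A minimal model of that constraint, with physical memory simulated by
a buffer; `restore_pages()` and `HIB_PAGE_SIZE` are hypothetical, and the real
step would presumably run with interrupts disabled just before the jump back:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HIB_PAGE_SIZE 4096UL /* assumed page size for this sketch */

/*
 * Image page i was loaded into frame src_pfn[i] and must end up in frame
 * dst_pfn[i]. Copying in order is only safe if no source frame is
 * overwritten before it is read, i.e. src_pfn[i] != dst_pfn[j] for all
 * j < i. This sketch checks that and refuses otherwise; roughly speaking,
 * swsusp avoids the problem by loading the image into "safe" frames.
 * Returns 0 on success, -1 if the copy order would clobber a source page.
 */
static int restore_pages(unsigned char *phys_mem, const unsigned long *src_pfn,
                         const unsigned long *dst_pfn, size_t nr_pages)
{
    assert(phys_mem != NULL);
    for (size_t i = 0; i < nr_pages; i++)
        for (size_t j = 0; j < i; j++)
            if (src_pfn[i] == dst_pfn[j])
                return -1; /* frame would already be overwritten */

    for (size_t i = 0; i < nr_pages; i++)
        if (src_pfn[i] != dst_pfn[i]) /* distinct frames never overlap */
            memcpy(phys_mem + dst_pfn[i] * HIB_PAGE_SIZE,
                   phys_mem + src_pfn[i] * HIB_PAGE_SIZE, HIB_PAGE_SIZE);
    return 0;
}
```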
>
> It all sounds terribly complicated and confusing to me, and that's
> before I even begin to think about how this second kernel could possibly
> write the image to an encrypted device or LVM or such like that the
> first kernel knows about and might use now.
In some ways I find it much simpler than the current approaches. The "save
image" kernel has to reinitialize the device-mapper devices needed to write the
image in exactly the same way that the resume kernel needs to reinitialize
them. In fact, it could probably use the very same initramfs/initrd code to do
it. The fact that it imposes this symmetry is arguably an advantage.
> Can't we just get the freezer right and be done with it?
The question is: can the freezer ever be right? As far as I can see, no level
of correctness of the freezer is going to allow you to save the hibernation
image to something on a fuse filesystem, because essentially any code that is
run while writing the image needs to live in a special box that is totally
isolated from the rest of the system in order to avoid problems; thus, it seems
like it makes sense to implement this box by simply using a separate kernel,
rather than adding hacks.
--
Jeremy Maitin-Shepard