I did some checkpoint/restart work on Linux about 5 years ago (you may
still be able to google "CRAK"), so I'm jumping in with my 2 cents.
> Personally, I think that these assumptions are incorrect for a
> checkpoint/restart facility. I think that:
>
> (1) It is really only possible to checkpoint/restart a
> cooperative process.
It's hard, but not impossible, at least theoretically.
> For this to work with uncooperative processes you have to
> figure out (for example) how to save and restore the file
> system state. (e.g. how do you
> get the file position set correctly for an open file in the
> restored program instance?)
This is actually one of the simplest problems in checkpoint/restart.
You'd need kernel support to save the state, and restart could be done
entirely in user space to restore the file descriptors.
> And this doesn't even consider what to do with open network
> connections.
Right, this is really hard. I played with it 5 years ago and I had semi
success on restoring network connections (with my limited understanding
on Linux networking and some really ugly hacks). I could restart a
killed remote Emacs X session with about 50% success rate.
> Similarly, what does one do about the content of System V
> shared memory regions or the contents of System V semaphores? I'm
sure
> there are many more such problems we can come up with a careful study
of the
> Linux/Unix API.
>
> (Note that "cooperation" in this context can also mean
> "willing to run inside of a container of some kind that supports
checkpoint/restart".)
>
> So you can probably only checkpoint the process at certain
> points in its lifetime, points which the application should be willing
to
> identify in some way. And I would argue that at such points in
time, you
> can require that the current register state doesn't include the
results of a
> system call such as getpid(), couldn't you?
Again, it IS very hard, but I don't think it's impossible to have
transparent checkpoint/restart. I mean, it cant be more difficult than
writing Linux from scratch, can it? :-)
> So, I guess my question is wrt the task_pid API is the
> following: Given that there are a lot of other problems to solve
before transparent
> checkpointing of uncooperative processes is possible, why should this
> partial solution be accepted into the main line kernel and "set in
stone" so to speak?
I agree with this. Before we see a mature checkpoint/restart solution
already implemented, there is no point in doing the vpid thing.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]