RE: [RFC] [PATCH 00/13] Introduce task_pid api

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I did some checkpoint/restart work on Linux about 5 years ago (you may
still be able to google "CRAK"), so I'm jumping in with my 2 cents.

> Personally, I think that these assumptions are incorrect for a 
> checkpoint/restart facility.   I think that:
> 
> (1)  It is really only possible to checkpoint/restart a 
> cooperative process.

It's hard, but not impossible, at least theoretically.

> For this to work with uncooperative processes you have to 
> figure out (for example) how to save and restore the file 
> system state.  (e.g. how do you 
> get the file position set correctly for an open file in the 
> restored program instance?)

This is actually one of the simplest problems in checkpoint/restart.

You'd need kernel support to save the state, and restart could be done
entirely in user space to restore the file descriptors.

> And this doesn't even consider what to do with open network 
> connections.

Right, this is really hard. I played with it 5 years ago and I had semi
success on restoring network connections (with my limited understanding
on Linux networking and some really ugly hacks). I could restart a
killed remote Emacs X session with about 50% success rate.

> Similarly, what does one do about the content of System V 
> shared memory regions or the contents of System V semaphores?   I'm
sure 
> there are many more such problems we can come up with a careful study
of the 
> Linux/Unix API.
>
> (Note that "cooperation" in this context can also mean 
> "willing to run inside of a container of some kind that supports
checkpoint/restart".)
> 
> So you can probably only checkpoint the process at certain 
> points in its lifetime, points which the application should be willing
to 
> identify in some way.    And I would argue that at such points in
time, you 
> can require that the current register state doesn't include the
results of a 
> system call such as getpid(), couldn't you?

Again, it IS very hard, but I don't think it's impossible to have
transparent checkpoint/restart. I mean, it cant be more difficult than
writing Linux from scratch, can it? :-)

> So, I guess my question is wrt the task_pid API is the 
> following:   Given that there are a lot of other problems to solve
before transparent 
> checkpointing of uncooperative processes is possible, why should this 
> partial solution be accepted into the main line kernel and "set in
stone" so to speak?

I agree with this. Before we see a mature checkpoint/restart solution
already implemented, there is no point in doing the vpid thing.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux