Re: [RFC] [PATCH 00/13] Introduce task_pid api

On Tuesday 15 November 2005 13:41, Serge E. Hallyn wrote:
> Quoting Ray Bryant ([email protected]):
> > On Monday 14 November 2005 15:23, Serge E. Hallyn wrote:
> > > --
> > >
> > > I'm part of a project implementing checkpoint/restart processes.
> > > After a process or group of processes is checkpointed, killed, and
> > > restarted, the changing of pids could confuse them.  There are many
> > > other such issues, but we wanted to start with pids.
> >
> > I've read through the rest of this thread, but it seems to me that the
> > real problems are in the basic assumptions you are making that are
> > driving the rest of this effort and perhaps we should be examining those
> > assumptions instead of your patch.
>
> Ok.
>
> > For example, from what I've read (particularly Hubertus's post that the
> > pid could be in a register), I'm inferring that what you want to do is to
> > be able to checkpoint/restart an arbitrary process at an arbitrary time
> > and without any special support for checkpoint/restart in that process.
>
> Yes.
>
> > Also (c. f. Dave Hansen's post on the number of Xen virtual machines
> > required),  you appear to think that the number of processes on the
> > system for which checkpoint/restart should be enabled is large (more or
> > less the same as the number of processes on the system).
>
> Right.
>
> > Am I reading this correctly?
>
> As far as I can see, yes.
>
> -serge

Personally, I think that these assumptions are incorrect for a 
checkpoint/restart facility.   I think that:

(1)  It is really only possible to checkpoint/restart a cooperative process.
For this to work with uncooperative processes you have to figure out (for
example) how to save and restore the file system state (e.g., how do you
get the file position set correctly for an open file in the restored program
instance?).  And this doesn't even consider what to do with open network
connections.
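
To make the file position question concrete, here is a minimal sketch of
what a cooperative process could do for one regular file.  The struct and
the two helper names are hypothetical (they are not part of the task_pid
patches or of any existing checkpoint API); only open(), lseek() and close()
are real calls.  Note that it already fails for pipes and sockets, where
lseek() returns ESPIPE.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-file record a cooperative process would save. */
struct saved_file {
    char  path[256];  /* path used to reopen the file at restart */
    int   flags;      /* open(2) flags to reuse at restart */
    off_t offset;     /* file position at checkpoint time */
};

/* Record the current position of fd.  Returns 0 on success, -1 on error. */
int save_file_state(int fd, const char *path, int flags,
                    struct saved_file *out)
{
    off_t pos = lseek(fd, 0, SEEK_CUR);  /* query position without moving */
    int n;

    if (pos == (off_t)-1)
        return -1;                       /* e.g. ESPIPE for a pipe/socket */

    n = snprintf(out->path, sizeof(out->path), "%s", path);
    if (n < 0 || (size_t)n >= sizeof(out->path))
        return -1;                       /* path does not fit in the record */

    out->flags = flags;
    out->offset = pos;
    return 0;
}

/* Reopen the file and seek to the saved position.  Returns fd or -1. */
int restore_file_state(const struct saved_file *in)
{
    /* Never create or truncate the file when restoring. */
    int fd = open(in->path, in->flags & ~(O_CREAT | O_TRUNC | O_EXCL));

    if (fd < 0)
        return -1;
    if (lseek(fd, in->offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Even this simple case assumes the file still exists under the same path and
has not been modified between checkpoint and restart, which is exactly the
kind of cooperation I am talking about.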

Similarly, what does one do about the contents of System V shared memory
regions or the values of System V semaphores?  I'm sure there are many
more such problems we can come up with through a careful study of the
Linux/Unix API.

(Note that "cooperation" in this context can also mean "willing to run inside 
of a container of some kind that supports checkpoint/restart".)

So you can probably only checkpoint the process at certain points in its 
lifetime, points which the application should be willing to identify in some 
way.    And I would argue that at such points in time, you can require that 
the current register state doesn't include the results of a system call such 
as getpid(), couldn't you?
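
As a rough sketch of what I mean by an application-identified checkpoint
point: the process keeps any cached pid in one known place and refreshes it
when it comes back from such a point after a restart.  The function names
and the "restarted" flag below are hypothetical; they stand in for whatever
a real checkpoint facility would report, and the checkpoint itself is
outside the sketch.  Only getpid() is a real call.

```c
#include <sys/types.h>
#include <unistd.h>

/* The single place where this process caches the result of getpid(). */
static pid_t cached_pid;

/* Call once at startup, before any checkpoint can happen. */
void init_cached_ids(void)
{
    cached_pid = getpid();
}

/* Accessor used by the rest of the application instead of a raw copy. */
pid_t current_pid(void)
{
    return cached_pid;
}

/*
 * Call on return from an application-chosen checkpoint point.
 * 'restarted' is a hypothetical flag: nonzero means the process has just
 * been restored from a checkpoint, so its pid may have changed and the
 * cached copy must be re-read from the kernel.
 */
void on_checkpoint_return(int restarted)
{
    if (restarted)
        cached_pid = getpid();
}
```

With this convention no stale pid survives a restart in registers or on the
stack, because the checkpoint only happens at a point where the application
holds no live copy other than cached_pid.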

(2)  Checkpoint/Restart really only makes sense for a long-running,
resource-intensive job (e.g., a job that is doing a lot of work and for
which recovery is therefore hard -- perhaps as hard as re-running the
entire job).  By their very nature, there are probably only a few such
jobs running on the system.  If there are lots of such jobs on the system,
then re-running each one can't be that hard, can it?

So, I guess my question wrt the task_pid API is the following:  Given that
there are a lot of other problems to solve before transparent checkpointing
of uncooperative processes is possible, why should this partial solution be
accepted into the mainline kernel and "set in stone", so to speak?

Don't get me wrong, I would love for there to be a commonly accepted 
checkpoint/restart API.    But I don't think that this can be done 
transparently at the kernel level and without some cooperation from the 
target task.
-- 
Ray Bryant
AMD Performance Labs                   Austin, Tx
512-602-0038 (o)                 512-507-7807 (c)
