Re: RFC [patch 13/34] PID Virtualization Define new task_pid api


 



Eric W. Biederman wrote:
Hubertus Franke <[email protected]> writes:


...

Actions: The vpid_to_pid will disappear, and the check for whether we are
in the same container needs to be pushed down into the task lookup. The
question that remains is whether the context of the task lookup will
always remain the caller.


You don't need a same container check.  If something is in another container
it becomes invisible to you.


Eric, agreed... that was implied by me (but poorly worded). What I meant
(let's try this again) is that the context defines/provides the namespace
in which the lookup is performed; hence, as you state, things in different
containers (namespaces) are naturally invisible to you.
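
A minimal sketch of that idea, not the patch's code: the caller's own
namespace selects which tasks a pid lookup can see, so no separate "same
container" check is needed. The names sk_task, sk_tasks and sk_find_task
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record: ns is the container (namespace) id. */
struct sk_task {
    int ns;
    int pid;
    const char *comm;
};

/* Illustrative task table; pid 42 exists in two different namespaces. */
static const struct sk_task sk_tasks[] = {
    { 0, 1,  "init" },
    { 1, 1,  "container-init" },
    { 1, 42, "worker" },
    { 2, 42, "other" },
};

/*
 * Look up pid in the namespace of the caller. Tasks in any other
 * namespace are never compared, so they are simply invisible.
 */
const struct sk_task *sk_find_task(const struct sk_task *caller, int pid)
{
    size_t i;

    assert(caller != NULL);
    for (i = 0; i < sizeof(sk_tasks) / sizeof(sk_tasks[0]); i++) {
        if (sk_tasks[i].ns == caller->ns && sk_tasks[i].pid == pid)
            return &sk_tasks[i];
    }
    return NULL;
}
```

With this table, a caller in namespace 1 that asks for pid 42 gets
"worker", a caller in namespace 2 gets "other", and a caller in
namespace 0 gets nothing at all.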


Doing so has an implication, namely that we are moving over to "system
containers". The current implementation requires the vpid/pid only for the
boundary condition at the top of the container (to rewrite pid=1) and its
parent, and for the fact that we wanted a global look through container=0.
If said boundary were eliminated and we simply made a container a child of
the initproc (pid=1), this would be unnecessary.

Altogether this would provide private namespaces (as just suggested by
Eric).

The feeling is that large parts of the patch could be reduced by this.


I concur.  Except I think the initial impact could still be large.
It may be worth breaking all users of pids just so we audit them.

But that will certainly result in no long term cost, or runtime overhead.


What we need is a new system call (similar to vserver), or maybe we can
continue the /proc approach for now...

sys_exec_container(const char *container_name, pid_t pid, unsigned int flags,
                   const char *argv, const char *envp);

exec_container creates a new container (if indicated in flags) and a new
task in it that reports to the parent initproc. If a non-zero pid is
specified we use that pid, otherwise the system will allocate it. Finally
it creates a new session id, chroots, and execs the specified program.
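
For orientation only, here is a userspace approximation of the last three
steps (new session, chroot, exec), built from existing calls. It is not
the proposed syscall: it creates no container, cannot choose the pid, and
the child reports to the caller rather than to initproc. The function
name exec_in_new_session is hypothetical, and waiting for the child is a
choice made only to keep the sketch self-contained.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, give it a fresh session, optionally chroot it, then exec
 * argv[0]. Returns the child's exit status, or -1 on failure. A NULL root
 * skips the chroot step (chroot normally requires privilege).
 */
int exec_in_new_session(const char *root, char *const argv[],
                        char *const envp[])
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* The child is not a group leader, so setsid() can succeed. */
        if (setsid() < 0)
            _exit(127);
        if (root != NULL && (chroot(root) < 0 || chdir("/") < 0))
            _exit(127);
        execve(argv[0], argv, envp);
        _exit(127); /* reached only if execve() failed */
    }

    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because fork and exec are separate steps here, the sketch does not give
the atomicity that a single syscall would.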

What we lose with this is the session and the tty, which Cedric described
as an application container...

The sys_exec_container(...) seems to be similar to what Eric just called
clone_namespace().


Similar. But I was actually talking about just adding another flag to
sys_clone, the syscall underlying fork().  Basically it is just another
resource to share or not share.

Eric


That's a good idea... right now we simply do this through a flag left by
the call to the /proc/container fs (awkward at best, but it didn't break
the API). I have a concern about doing it during fork, namely the sharing
of resources. We obviously are looking at some constraints here with
respect to sharing. We need to ensure that this isn't a thread etc. that
will share resources across "containers" (which then later aren't
migratable due to that sharing). So doing the fork_exec() atomically would
avoid that problem.
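
One way to picture that constraint, as a sketch under stated assumptions:
if the container were requested through a clone-style flag, the flag
combination could be rejected whenever it also asks to share resources
with the parent. The SK_* bits below are hypothetical stand-ins chosen
for this example, not the kernel's CLONE_* values, and
sk_container_flags_ok is an invented helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, stand-ins for clone-style sharing flags. */
#define SK_SHARE_VM      0x01u /* share the address space */
#define SK_SHARE_FILES   0x02u /* share the file descriptor table */
#define SK_SHARE_FS      0x04u /* share root and cwd */
#define SK_SHARE_THREAD  0x08u /* join the parent's thread group */
#define SK_NEW_CONTAINER 0x10u /* place the child in a new container */

/*
 * A new container may not share resources with its parent, since that
 * sharing would cross the container boundary and block later migration.
 * Requests without SK_NEW_CONTAINER are left unrestricted.
 */
bool sk_container_flags_ok(unsigned int flags)
{
    const unsigned int shared = SK_SHARE_VM | SK_SHARE_FILES |
                                SK_SHARE_FS | SK_SHARE_THREAD;

    if (!(flags & SK_NEW_CONTAINER))
        return true;
    return (flags & shared) == 0;
}
```

A thread-like request (SK_NEW_CONTAINER | SK_SHARE_VM | SK_SHARE_THREAD)
is refused, while a plain SK_NEW_CONTAINER request passes.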


