Hubertus Franke <[email protected]> writes:
> Eric W. Biederman wrote:
> Let me just summarize the discussion that has taken place so far
> and the consequences I/we seem to be drawing out of it.
>
> We discussed the various approaches that are floating around now, enough
> has been said about each, so I leave it at that ...
> (a) GLIBC intercept LD_PRELOAD
> (b) Binary Rewrite of glibc
> (c) syscall table intercept (see ZAP)
> (d) vpid approach (see "IBM" patches posted)
> (e) <pid,container> approach (see below, suggested by Alan?.. )
That seems to have been an observation of the current patchset.
>
> There are several issues that came up in the email exchange ( Arjen, Alan Cox,
> .. ).
> [ Please feel free to tell me if I/we captured or misinterpregin these wrong ]
...
> Actions: The vpid_to_pid will disappear and the check for whether we are in the
> same
> container needs to be pushed down into the task lookup. question remains to
> figure out
> whether the context of the task lookup (will always remain the caller ?).
You don't need a same container check. If something is in another container
it becomes invisible to you.
> Doing so has an implication, namely that we are moving over to "system
> containers".
> The current implementation requires the vpid/pid only for the boundary condition
> at the
> top of the container (to rewrite pid=1) and its parent and the fact that we
> wanted
> a global look through container=0.
> If said boundary would be eliminated and we simply make a container a child of
> the
> initproc (pid=1), this would be unnecessary.
>
> all together this would provide private namespaces (as just suggested by Eric).
>
> The feeling would be that large parts of patch could be reduce by this.
I concur. Except I think the initial impact could still be large.
It may be worth breaking all users of pids just so we audit them.
But that will certainly result in no long term cost, or runtime overhead.
> What we need is a new system calls (similar to vserver) or maybe we can continue
> the /proc approach for now...
>
> sys_exec_container(const *char container_name, pid_t pid, unsigned int flags,
> const *char argv, const *char envp);
>
> exec_container creates a new container (if indicated in flags) and a new task in
> it that reports to parent initproc.
> if a non-zero pid is specified we use that pid, otherwise the system will
> allocate it. Finally
> it create new session id ; chroot and exec's the specified program.
>
> What we loose with this is the session and the tty, which Cedric described as
> application
> container...
>
> The sys_exec_container(...) seems to be similar to what Eric just called
> clone_namespace()
Similar. But I was actually talking about just adding another flag to
sys_clone the syscall underlying fork(). Basically it is just another
resource not share or not-share.
Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]