Linus Torvalds wrote:
(I'm coming in late, it's not been a high priority for me)
On Fri, 20 Jan 2006, Hubertus Franke wrote:
2nd:
==== Issue: we don't need pid virtualization; instead simply use a
<container,pid> pair.
This requires a bit more thought. Essentially that's what I was doing,
but I mangled them into the same pid and used masking to add/remove the
container for internal use. As pointed out by Alan(?), we can indeed
reuse the same pid internally many times as long as we can tell the
instances apart during the pid-to-task_struct lookup. This is easily done
because the caller provides the context, and hence the container, for the lookup.
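To make the two schemes concrete, here is a minimal userspace sketch. The 24/8 bit split and the names mangle_pid(), user_pid(), pid_container() and struct cpid are hypothetical; the thread does not say how the patch actually laid out the bits. The first half folds a container id into the high bits of an internal pid and masks it off again; the second keeps <container,pid> as a pair, so the same numeric pid can be reused in every container and is told apart by comparing both fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for the "mangled pid" scheme: low 24 bits hold the
 * pid the process sees, high 8 bits hold a container id.  The real bit
 * split used by the patch is not given in the thread; this is assumed. */
#define PID_BITS      24u
#define PID_MASK      ((1u << PID_BITS) - 1u)
#define CONTAINER_MAX 0xFFu

/* Fold a container id into an internal pid. */
static uint32_t mangle_pid(uint32_t container, uint32_t pid)
{
	assert(container <= CONTAINER_MAX && pid <= PID_MASK);
	return (container << PID_BITS) | pid;
}

/* Mask off the container bits to get the pid shown to user space. */
static uint32_t user_pid(uint32_t internal)
{
	return internal & PID_MASK;
}

/* Recover the container id from an internal pid. */
static uint32_t pid_container(uint32_t internal)
{
	return internal >> PID_BITS;
}

/* The <container,pid> alternative: keep the two fields separate and
 * compare both, so one numeric pid can live in every container. */
struct cpid {
	uint32_t container;
	uint32_t pid;
};

static int cpid_equal(struct cpid a, struct cpid b)
{
	return a.container == b.container && a.pid == b.pid;
}
```

With fixed-width packing, both the number of containers and the per-container pid range are capped by the chosen split, which the plain pair avoids.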
This is my preferred approach BY FAR.
Doing a <container,pid> approach is very natural, and avoids almost all
issues. At most, you might want to have a new system call (most naturally
one that is limited to the "init container", the one that we
boot up with) that can specify both container and pid explicitly, and see
all processes and access all processes. But all "normal" system calls
would only ever operate within their container.
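A sketch of that lookup split, assuming a flat array of tasks in place of pid_hash[] (a linear scan only keeps it short). find_task() and find_task_explicit() are hypothetical names: the first models the normal system calls, which only ever see their own container; the second models the optional extra call, usable only from the init container (assumed here to have id 0), that names both container and pid.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace model; all names are illustrative only. */
struct container { int id; };

struct task {
	struct container *container;  /* container this task lives in */
	int pid;                      /* pid as seen inside that container */
	const char *comm;
};

#define INIT_CONTAINER_ID 0       /* assumed id of the boot-time container */

/* Normal lookup: the caller's own container is the implicit scope, so a
 * task elsewhere with the same numeric pid is simply invisible. */
static struct task *find_task(struct task *tasks, size_t n,
                              const struct task *caller, int pid)
{
	assert(caller != NULL && caller->container != NULL);
	for (size_t i = 0; i < n; i++)
		if (tasks[i].container == caller->container && tasks[i].pid == pid)
			return &tasks[i];
	return NULL;
}

/* Hypothetical privileged lookup: only a caller in the init container may
 * name both the container and the pid explicitly. */
static struct task *find_task_explicit(struct task *tasks, size_t n,
                                       const struct task *caller,
                                       const struct container *c, int pid)
{
	if (caller->container->id != INIT_CONTAINER_ID)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (tasks[i].container == c && tasks[i].pid == pid)
			return &tasks[i];
	return NULL;
}
```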
That's what the current patch set does.
One "global container" sees and accesses everything, and the rest are
limited to their respective containers.
The fact is, we want "containers" anyway for any virtualization thing, ie
vserver already adds them. And if we have containers, then it's very easy
("easyish") to split up the current static "pid_hash[]", "pidmap_array[]"
and "pidmap_lock", and make them per-container, and have a pointer to the
container for each "struct task_struct".
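A rough userspace model of making the pid allocator per-container. The field names echo pidmap_array[], pidmap_lock and last_pid, but the layout is a simplification assumed for illustration: one bitmap over a 64-entry pid space, and a C11 atomic_flag standing in for the kernel spinlock. pid_hash[] would move into the container the same way and is left out. struct task carries the container pointer suggested above for struct task_struct.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <string.h>

#define PID_MAX_SKETCH 64   /* tiny pid space, for illustration only */
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define PIDMAP_WORDS   ((PID_MAX_SKETCH + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* State that used to be global now lives in each container. */
struct container {
	atomic_flag pidmap_lock;              /* spinlock stand-in */
	unsigned long pidmap[PIDMAP_WORDS];   /* bit n set => pid n in use */
	int last_pid;
};

struct task {
	struct container *container;          /* every task points at its container */
	int pid;
};

static void container_init(struct container *c)
{
	memset(c->pidmap, 0, sizeof(c->pidmap));
	c->last_pid = 0;
	atomic_flag_clear(&c->pidmap_lock);
}

/* Hand out the next free pid after last_pid, in this container only.
 * Returns -1 when the container's pid space is exhausted. */
static int alloc_pid(struct container *c)
{
	int pid = -1;

	while (atomic_flag_test_and_set(&c->pidmap_lock))
		;  /* spin until the lock is free */
	for (int i = 1; i <= PID_MAX_SKETCH; i++) {
		int cand = (c->last_pid + i) % PID_MAX_SKETCH;
		size_t word = (size_t)cand / BITS_PER_WORD;
		unsigned long bit = 1UL << ((size_t)cand % BITS_PER_WORD);

		if (cand == 0 || (c->pidmap[word] & bit))
			continue;  /* pid 0 is reserved; skip pids in use */
		c->pidmap[word] |= bit;
		c->last_pid = cand;
		pid = cand;
		break;
	}
	atomic_flag_clear(&c->pidmap_lock);
	assert(pid == -1 || (pid > 0 && pid < PID_MAX_SKETCH));
	return pid;
}
```

Since each container owns its own bitmap and lock, the same numeric pid can be issued in two containers without any cross-container coordination.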
We are very close to that; the pidmap_array is already organized that way.
This was done so as not to make the container an object that penetrates
everywhere in the code. Now that the discussion is fleshing out, actually
accessing those entities through the container of the context task would
be the next logical restructuring of the code.
After that, there wouldn't even be a lot else to do. The normal system
calls would just use their own container, and the (few) places that save
away pids for later (ie things that use "kill_proc_info_as_uid()" and
"struct fown_struct" and friends) would have to also squirrel away the
container, but then you should be pretty much done.
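The same point in miniature: a stored pid only means something together with the container it was taken from. struct saved_owner, set_owner() and send_sig_to_owner() are hypothetical stand-ins for struct fown_struct and the signal path, not the kernel's actual interfaces; the sketch only shows the container being recorded when the owner is set and used again when the pid is resolved.

```c
#include <assert.h>
#include <stddef.h>

struct container { int id; };

struct task {
	struct container *container;
	int pid;
	int pending_signal;   /* toy stand-in for signal delivery */
};

/* Hypothetical analogue of a saved owner: a bare pid is ambiguous once
 * pids repeat across containers, so the container is stored with it. */
struct saved_owner {
	struct container *container;
	int pid;
};

static void set_owner(struct saved_owner *o, const struct task *setter, int pid)
{
	assert(setter->container != NULL);
	o->container = setter->container;  /* squirrel away the container too */
	o->pid = pid;
}

/* Later, possibly from an unrelated context: resolve the pid in the saved
 * container, not in whatever container happens to be current. */
static int send_sig_to_owner(const struct saved_owner *o,
                             struct task *tasks, size_t n, int sig)
{
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].container == o->container && tasks[i].pid == o->pid) {
			tasks[i].pending_signal = sig;
			return 0;
		}
	}
	return -1;  /* owner no longer exists */
}
```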
Agreed.
Of course, you'll have to do the system calls to _create_ the containers
in the first place, but that's at a higher level and involves much more
than just the pid-space (ie a container would normally have more than just
the uid mappings, it would have any network knowledge too etc - hostname,
perhaps list of network devices associated with that context etc etc)
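A hypothetical shape for such a container object, showing the pid space as just one member among several. All names and sizes are assumptions (the 65-byte hostname allows a 64-character host name plus NUL; 16 bytes matches Linux's IFNAMSIZ), and creation is a plain constructor rather than any actual system call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HOSTNAME_LEN    65  /* 64 chars + NUL (assumed size) */
#define NETDEV_NAME_LEN 16  /* IFNAMSIZ on Linux */

struct netdev_ref {
	char name[NETDEV_NAME_LEN];
	struct netdev_ref *next;
};

/* Hypothetical container: pid state is only one part of it. */
struct container {
	int id;
	int last_pid;                 /* stands in for the per-container pid space */
	char hostname[HOSTNAME_LEN];  /* per-container host name */
	struct netdev_ref *netdevs;   /* devices associated with this context */
};

static struct container *container_create(int id, const char *hostname)
{
	struct container *c;

	assert(hostname != NULL);
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->id = id;
	/* calloc zeroed the buffer, so copying at most LEN-1 bytes keeps a NUL */
	strncpy(c->hostname, hostname, HOSTNAME_LEN - 1);
	return c;
}

static int container_add_netdev(struct container *c, const char *name)
{
	struct netdev_ref *d = calloc(1, sizeof(*d));

	if (!d)
		return -1;
	strncpy(d->name, name, NETDEV_NAME_LEN - 1);
	d->next = c->netdevs;         /* push onto the container's device list */
	c->netdevs = d;
	return 0;
}

static void container_destroy(struct container *c)
{
	while (c->netdevs) {
		struct netdev_ref *d = c->netdevs;
		c->netdevs = d->next;
		free(d);
	}
	free(c);
}
```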
Right now we do it simply through a poor man's /proc/container fs approach
that should be reasonably straightforward to convert to a syscall.
Linus
Finally, I presume you followed the discussion on the conversion from task->pid
to access functions, which stirred some criticism. That part would disappear.
-- Hubertus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/