Kirill Korotaev <[email protected]> writes:
>>> I hope you understand that such things do not make anything secure.
>>> The administrator of the node will always have access to /proc/kcore,
>>> devices, KERNEL CODE(!), etc. There is no security from this point of
>>> view.
>> Only if they have CAP_SYS_RAWIO. I admit it takes a lot more
>> to get there than just that. But having a mechanism that has the
>> potential to be secured, and that is much simpler to understand and to
>> set up for minimal privileges than any of the other unix add-ons I have
>> seen, is very interesting.
> OK. I suppose it can be done as an option. If required, access from the
> host system can be allowed; if a "secure" environment is requested, it is
> fully isolated.
>
>>>> 3) Nesting of containers (so they are general purpose and not special
>>>> hacks).
>>>
>>> Why are you interested in nesting? Any applications for this?
>>> Until everything is virtualized in a nesting way (including the TCP/IP
>>> stack, routing, etc.) I don't see much use for it.
>> For everything except the PID namespace I am just interested in having
>> multiple separate namespaces. For the PID namespace, to keep the
>> traditional unix model, you need a parent process, so it is actually
>> nesting.
> Yes, but nesting can be one level deep, as in OpenVZ, where a VPS is a
> namespace nested inside the host system, or it can be a fully isolated,
> separate traditional namespace.
>
> By real nesting I mean hierarchical containers, where containers inside
> containers are allowed. This is hard to implement. For example, for real
> containers you will need isolated TCP/IP stacks with complex routing
> rules, etc.
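The PID side of nesting falls out of ordinary fork/clone semantics. As a
rough, untested sketch of what I mean (the CLONE_NEWPID flag name is an
assumption for illustration, not something that exists today):

/* Untested sketch: a child created in a fresh PID namespace sees
 * itself as pid 1, while the parent still sees an ordinary pid.
 * That parent/child relationship is what makes the PID case nesting.
 * The CLONE_NEWPID flag name is an assumption for illustration. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

static int child(void *arg)
{
	/* Inside the new namespace: getpid() reports 1. */
	printf("inside:  pid = %d\n", getpid());
	return 0;
}

int main(void)
{
	char *stack = malloc(STACK_SIZE);
	/* The stack grows down on most architectures, so pass the top. */
	pid_t pid = clone(child, stack + STACK_SIZE,
			  CLONE_NEWPID | SIGCHLD, NULL);
	if (pid < 0) {
		perror("clone");
		return 1;
	}
	/* Outside: the same task has an ordinary pid. */
	printf("outside: pid = %d\n", pid);
	waitpid(pid, NULL, 0);
	return 0;
}

The parent acts as the reaper and anchor for the whole namespace, which is
exactly the traditional unix model carried over.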
TCP/IP is a pain because it has so many global static variables, but
otherwise it is easier than PIDs. You just need what looks like two
instances of the network stack, and, since you usually don't have enough
real hardware, a two-headed network device that acts like two NICs plugged
into a crossover cable.
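To be concrete, here is a rough, untested sketch of a process unsharing
the stack so it sees a second, empty instance (the CLONE_NEWNET flag name
is an assumption for illustration):

/* Untested sketch: after unsharing, this process sees a private
 * loopback and nothing else -- a second, fully separate instance of
 * the network stack.  CLONE_NEWNET is an assumed flag name. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	if (unshare(CLONE_NEWNET) < 0) {
		perror("unshare(CLONE_NEWNET)");
		return 1;
	}
	/* Listing interfaces here should show only a bare loopback,
	 * demonstrating that the stack's global state was duplicated. */
	execlp("ip", "ip", "link", (char *)NULL);
	perror("execlp");
	return 1;
}

Plug one head of the two-headed device into each stack and you can route
between them exactly as if they were two machines on a crossover cable.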
>> I am interested because it is easy, and because if it is possible then
>> the range of applications you can apply containers to is much larger.
>> At the far end of that spectrum is migrating a server running on real
>> hardware and bringing it up as a guest on a newer, much more powerful
>> machine, with the appearance that it had only been unreachable for a
>> few seconds.
> You can use fully isolated containers like OpenVZ VPSs for this. They are
> naturally suited to it, because they provide not only PID isolation but
> also IPC, sockets, etc.
Exactly.
> How can you migrate an application which consists of two processes doing
> IPC via signals? They are not tied together inside the kernel in any way,
> and there is no way to automatically detect that both should be migrated
> together. VPSs are what give you that kind of boundary around what should
> be considered a whole.
I always look at migration in terms of a container/VPS.
However, the entire system can be considered one such container, so at the
extreme you can migrate everything to another machine. It's not a hard
requirement, but it would be nice if the mechanism wasn't so special that
it prevented that.
>>>
>>> Huh, it sounds too easy. Just imagine that the VPS owner has deleted
>>> ps, top, kill, bash, and other tools. You won't be able to enter.
>
>> Entering is different from execing a process on the inside.
>> Implementation-wise it is changing the context pointer on your task.
> If I understand you correctly, it is a fully insecure way of doing things.
> After changing context without applying all the restrictions which should
> be imposed by the VPS, your process will be ptrace'able and so on.
Not exactly insecure, but something you need to be careful with.
It's an idea I don't like personally, but I like it more than ad hoc
mechanisms for modifying what a guest gets to look at.
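To show what I mean by changing the context pointer, here is a rough,
untested sketch. The setns() call and the /proc/<pid>/ns/pid file it
relies on are assumed interfaces for illustration, not anything merged:

/* Untested sketch of "entering": point our task at the namespace some
 * task inside the container already uses.  setns() and
 * /proc/<pid>/ns/pid are assumed interfaces for illustration. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	char path[64];
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid-inside-container>\n",
			argv[0]);
		return 1;
	}
	snprintf(path, sizeof(path), "/proc/%s/ns/pid", argv[1]);
	fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (setns(fd, CLONE_NEWPID) < 0) {
		perror("setns");
		return 1;
	}
	/* The entering task keeps its own pid; it is the children that
	 * actually land inside, so fork before exec. */
	if (fork() == 0) {
		execl("/bin/sh", "sh", (char *)NULL);
		_exit(127);
	}
	wait(NULL);
	return 0;
}

Note this execs a shell resolved through the host's mounts, which also
answers the "VPS owner deleted ps, top and bash" objection above: nothing
from inside the container is needed to enter it.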
>>> Another example: when the VPS owner is near its resource limits, you
>>> won't be able to do anything after entering the VPS.
>> For debugging, this is a good reason for being inside. What if the
>> problem is that you are out of resources?
> Debugging, yes; in production, no.
I was talking about sysadmin-style debugging.
> That is why OpenVZ allows the host system to access VPS resources: for
> debugging in production.
How this should be accomplished is something I will freely admit is up in
the air. But I don't want an assumption that the host system will always
be able to access guest resources.
> Thanks a lot for the valuable discussion and your time!
Welcome.
Eric