Herbert Poetzl <[email protected]> writes:
> On Mon, Jun 26, 2006 at 10:40:59AM -0600, Eric W. Biederman wrote:
>> Daniel Lezcano <[email protected]> writes:
>>
>> >> Then you lose the ability for each namespace to have its own
>> >> routing entries. Which implies that you'll have difficulties with
>> >> devices that should exist and be visible in one namespace only
>> >> (like tunnels), as they require IP addresses and route.
>> >
>> > I mean instead of having the route tables private to the namespace, the
> routes
>> > have the information to which namespace they are associated.
>>
>> Is this an implementation difference or is this a user visible
>> difference? As an implementation difference this is sensible, as it is
>> pretty insane to allocate hash tables at run time.
>>
>> As a user visible difference that affects semantics of the operations
>> this is not something we want to do.
>
> well, I guess there are even more options here, for
> example I'd like to propose the following idea, which
> might be a viable solution for the policy/isolation
> problem, with the actual overhead on the setup part
> not the hot pathes for packet and connection handling
>
> we could use the multiple routing tables to provide
> a single routing table for each guest, which could
> be used inside the guest to add arbitrary routes, but
> would allow to keep the 'main' policy on the host, by
> selecting the proper table based on IPs and guest tags
>
> similar we could allow to have a separate iptables
> chain for each guest (or several chains), which are
> once again directed by the host system (applying the
> required prolicy) which can be managed and configured
> via normal iptable interfaces (both on the guest and
> host) but actually provide at least to layers
I have real concerns about the complexity of the route you
have described.
> note: this does not work for hierarchical network
> contexts, but I do not see that the yet proposed
> implementations would do, so I do not think that
> is of concern here ...
Well we are hierarchical in the sense that a parent
can have a different network namespace then a child.
So recursive containers work fine. So this is like
the uts namespace or the ipc namespace rather than
like the pid namespace.
I really do not believe we have a hotpath issue, if this
is implemented properly. Benchmarks of course need to be taken,
to prove this.
There are only two places a sane implementation should show issues.
- When the access to a pointer goes through a pointer to find
that global variable.
- When doing a lookup in a hash table we need to look at an additional
field to verify a hash match. Because having a completely separate
hash table is likely too expensive.
If that can be shown to really slow down packets on the hot path
I am willing to consider other possibilities. Until then I think
we are on path to the simplest and most powerful version of building
a network namespace usable by containers.
The routing between network namespaces does have the potential to
be more expensive than just a packet trivially coming off the wire
into a socket. However that is fundamentally from a lack of hardware.
If the rest works smarter filters in the drivers should enable to
remove the cost.
Basically it is just a matter of:
if (dest_mac == my_mac1) it is for device 1.
If (dest_mac == my_mac2) it is for device 2.
etc.
At a small count of macs it is trivial to understand it will go
fast for a larger count of macs it only works with a good data
structure. We don't hit any extra cache lines of the packet,
and the above test can be collapsed with other routing lookup tests.
Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]