Kyle Moffett wrote:
Problem 1: "updating cached acls of descendent objects": How do you
find out what a 'descendent object' is? Answer: You can't without
recursing through the entire in-memory dentry tree. Such recursion is
lock-intensive and has poor performance. Furthermore, you have to do
the entire recursion as an atomic operation; other cross-directory
renames or ACL changes would invalidate your results halfway through and
cause race conditions.
Yes, it would take some cpu time, and yes, it would have to use a lock
to protect the acl which would also lock out moves. Is that such a high
cost? Changing acls and moving whole directory trees around is not THAT
common of an operation... if it takes a wee bit more cpu time, I doubt
anyone will complain.
Oh, and by the way, the kernel has no real way to go from a dentry to a
(process, fd) pair. That data simply is not maintained because it is
unnecessary and inefficent to do so. Without that data you *can't*
What would you need (process,fd) for? You just need to walk the tree of
dentries and update them.
determine what is "dependent". Furthermore, even if you could it still
wouldn't work because you can't even tell which path the file was
originally opened via. Say you run:
mount --bind /mnt/cdrom /cdrom
umount /mnt/cdrom
Now any process which had a cwd or open directory handle in "/cdrom" is
STILL USING THE ACLs from when it was mounted as "/mnt/cdrom". If you
have the same volume bind-mounted in two places you can't easily
distinguish between them. Caching permission data at the vfsmount won't
even help you because you can move around vfsmounts as long as they are
in subdirectories:
mkdir -p /a/b/foo
mount -t tmpfs tmpfs /a/b/foo
mv /a/b /quux
umount /quux/foo
At this point you would also have to look at vfsmounts during your
recursive traversal and update their cached ACLs too.
Each bind mount point would need its own dentry so it could have its own
acl and parent pointer.
Problem 2: "Some other directory that you have a handle to": When you
are given this relative path and this cwd ACL, how do you determine the
total ACL of the parent directory:
path: ../foo/bar
cached cwd total-ACL:
root rwx (inheritable)
bob rwx (inheritable)
somegroup rwx (inheritable)
jane rwx
".." partial-ACL
root +rwx (inheritable)
somegroup +rx (inheritable)
Answer: you can't. For example, if "/" had the permission 'root +rwx
(inheritable)', and nothing else had subtractive permissions, then the
"root +rwx (inheritable)" in the parent dir would be a no-op, but you
can't tell that without storing a complete parent directory history.
What? The total acl of the parent directory is in its dentry.
Now assume that I "mkdir /foo && set-some-inheritable-acl-on /foo && mv
/home /foo/home". Say I'm running all sorts of X apps and GIT and a
number of other programs and have some conservative 5k FDs open on
/home. This is actually something I've done before (without the ACLs),
albeit accidentally. With your proposal, the kernel would first have to
identify all of the thousands of FDs with cached ACL data across a very
large cache-hot /home directory. For each FD, it would have to store an
updated copy of the partial-ACL states down its entire path. Oh, and
you can't do any other ACL or rename operations in the entire subtree
while this is going on, because that would lead to the first update
reporting incorrect results and racing with the second. You are also
extremely slow, deadlock-prone, and memory hungry, since you have to
take an enormous pile of dentry locks while doing the recursion. Nobody
can even open files with relative paths while this is going on because
the cached ACLs are in an intermediate and inconsistent state: they're
updated but the directory isn't in its new position yet.
Again, such a move would take some cpu time but the locks would not
block file opens, just other moves and permission changes. I don't
think this burden is terribly high for what is a relatively rare
operation.
This is completely opposite the way that permissions currently operate
in Linux. When I am chrooted, I don't care about the permissions of
*anything* outside of the chroot, because it simply doesn't exist.
Yes, it is different... if it wasn't we wouldn't be talking about it.
Furthermore you still don't answer the "computing ACL of parent
directory requires lots of space" problem.
What problem?
No, you aren't getting it: YOUR CWD DOES NOT GO AWAY WHEN YOU MOVE IT
OR UMOUNT -L IT. NEITHER DO OPEN DIRECTORY HANDLES. Sorry for yelling
It effectively does if you chmod 000 it. Same idea. You yank
permissions to cwd, then you can't access cwd anymore.
but this is the crux of the point I am trying to make. Any permissions
system which cannot handle a *completely* discontiguous filesystem space
cannot work on Linux; end of story. The primary reason behind that is
all sorts of filesystem operations are internally discontiguous because
it makes them much more efficient. By attempting to "force" the VFS to
pretend like everything is contiguous you are going to break horribly in
a thousand different corner cases that simply don't exist at the moment.
Not sure what you mean here.
No, your cwd still exists and is full of files. You can still navigate
around in it (same with any open directory handle). You can still open
files, chdir, move files, etc. There isn't even a way for the process
in NS1 to tell the processes in NS2 that its directories were
rearranged, so even a simple "NS1# mv /mnt/bar/a/somedir
/mnt/bar/b/somedir" is not going to work.
No, like above, your example is equivalent to either an rm -fr or a
chmod 000 of cwd. That means you either no longer can use cwd, or it is
now empty.
Whether it is with chmod or by moving changing the inherited acls, if
you loose access to cwd, then you can no longer access cwd.
But you said above "Yes, if /foo/quux is not already cached in memory,
then you would have to walk the tree to build it's ACL". Now assume
that instead of "/foo/quux", you are one directory deep in the
now-unmounted CDROM and you try to open "../baz/quux". In order to get
at the ACL of the parent directory it has to have an absolute path
somewhere, but at that point it doesn't.
It isn't unmounted yet, it is just disconnected from the tree, so it
would still be using the same acl it had prior to the disconnect.
What it "has to do" is it is part of the Linux ABI and as such you can't
just break it because it's "inconvenient" for inheritable ACLs. You
also can't make a previously O(1) operation take lots of time, as that's
also considered "major breakage".
It isn't inconvenient at all for inheritable acls, nor would chroot or
chdir be any slower.
"Pedantic corner case"? You could do the same thing even *WITHOUT* all
the processes holding open FDs, you would still have to iterate over the
entire in-cache portion of the subtree in order to verify that there are
no open FDs on it. Yet again you would also run into the problem that
we don't have *ANY* dentry-to-filehandle mapping in the kernel.
Again, don't care about filehandles. The only thing you have to do is
walk the dentries and update them. No different than a chmod -R or
setfacl -R, except of course, you only have to walk the dentries already
in memory, not update anything on disk.
Converting a previously O(1) operation into an O(number-of-subdirs)
operation is also known as "a major regression which we don't do a
release till we get it fixed". For boxes where O(number-of-subdirs)
numbers in the millions that would make it slow to a painful crawl.
We aren't talking about the scheduler or something here. We are talking
about an optional feature that doesn't slow things down if you don't
bother using it. If you aren't trying to move a super massive directory
tree which inherits permissions ( you can flag objects to NOT inherit
from their parent ) and has millions of cached dentries, then there is
no slowdown.
By the way, I'm done with this discussion since you don't seem to be
paying attention at all. Don't bother replying unless you've actually
written testable code you want people on the list to look at. I'll eat
my own words if you actually come up with an algorithm which works
efficiently without introducing regressions.
Testy testy... and why is it that the standard cop out on this list is
"code up or shut up"? If you don't want to discuss the idea, then don't...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]