Re: Thinking outside the box on file systems

Mmm, slow-as-dirt hotel wireless.  What fun...

On Aug 15, 2007, at 18:14:44, Phillip Susi wrote:

Kyle Moffett wrote:
I am well aware of that, I'm simply saying that sucks. Doing a recursive chmod or setfacl on a large directory tree is slow as all hell.
Doing it in the kernel won't make it any faster.
Right... I'm talking about getting rid of it entirely.

Let me repeat myself here: Algorithmically you fundamentally CANNOT implement inheritance-based ACLs without one of the following (although if you have some other algorithm in mind, I'm listening):

(A) Some kind of recursive operation *every* time you change an inheritable permission
(B) A unified "starting point" from which you begin *every* access-control lookup (or one "starting point" per useful semantic grouping, like a namespace).

The "(A)" is presently done in userspace and that's what you want to avoid. As to (B), I will attempt to prove below that you cannot implement "(B)" without breaking existing assumptions and restricting a very nice VFS model.
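To make the two options concrete, here is a toy model in C. It is only a sketch: the `struct node` layout, the helper names, and the rule that inheritance is a bitwise OR of ACL bits are all assumptions for illustration, not anything taken from the kernel. `propagate()` stands in for option (A) and `lookup_from_start()` for option (B).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: each node has its own ACL bits plus a cached
 * "effective" value that includes everything inherited from its parents.
 * Inheritance is modelled as a plain bitwise OR (an assumption). */
struct node {
    struct node *parent;        /* NULL for the starting point */
    struct node *first_child;
    struct node *next_sibling;
    unsigned own_acl;           /* bits set directly on this node */
    unsigned effective;         /* cache, used only by option (A) */
};

/* Link child underneath parent. */
static void add_child(struct node *parent, struct node *child)
{
    child->parent = parent;
    child->next_sibling = parent->first_child;
    parent->first_child = child;
}

/* Option (A): after a change at n, recompute the cache for n and for
 * every node below it. Returns the number of nodes touched. */
static size_t propagate(struct node *n)
{
    size_t touched = 1;

    n->effective = n->own_acl | (n->parent ? n->parent->effective : 0);
    for (struct node *c = n->first_child; c; c = c->next_sibling)
        touched += propagate(c);
    return touched;
}

/* Option (B): no cache at all; walk up to the starting point on every
 * single lookup. */
static unsigned lookup_from_start(const struct node *n)
{
    unsigned acl = 0;

    for (; n; n = n->parent)
        acl |= n->own_acl;
    return acl;
}
```

With (A), the cached value on a child stays stale until the recursive pass has run over the whole subtree. With (B), every lookup pays for the walk back to the starting point, and that walk only works if a well-defined chain of parents exists.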

Not necessarily. When I do "vim some-file-in-current-directory", for example, the kernel does *NOT* look up the path of my current directory. It does (in pseudocode):
  if (starts_with_slash(filename)) {
      entry = task->root;   /* absolute path: start at the root */
  } else {
      entry = task->cwd;    /* relative path: start at the cwd */
  }
  while (have_components_left(filename))
      entry = lookup_next_component(entry, filename);
  return entry;
Right.... and task->cwd would have the effective acl in memory, ready to be combined with any acl set on the file.


What ACL would "task->cwd" use?

Options:
(1.a) Use the one calculated during the original chdir() call.
(1.b) Navigate "up" task->cwd building an ACL backwards.
(1.c) $CAN_YOU_THINK_OF_SOMETHING_ELSE_HERE

Unsolvable problems with each option:

(1.a.I)

You just broke all sorts of chrooted daemons. When I start bind in its chroot jail, it does the following:

  chdir("/private/bind9");
  chroot(".");
  setgid(...);
  setuid(...);

The "/private" directory is readable only by root, since root is the only one who will be navigating you into these chroots for any reason. You only switch UID/GID after the chroot() call, at which point you are inside of a sub-context and your cwd is fully accessible. If you stick an inheritable ACL on "/private", then the "cwd" ACL will not allow access by anybody but root and my bind won't be able to read any config files.
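Spelled out with error checks, the startup sequence looks roughly like the sketch below. `enter_jail()` is a hypothetical helper name, and a real daemon would typically do more (for example dropping supplementary groups); that is left out here to mirror the four calls above. The order matters: the group is changed before the user, because once setuid() has dropped root the process may no longer be allowed to change its GID.

```c
#define _DEFAULT_SOURCE         /* for chroot() in <unistd.h> on glibc */
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: enter a chroot jail at dir, then drop privileges.
 * Returns 0 on success and -1 (with errno set by the failing call) if
 * any step fails. Needs root for everything after the chdir(). */
static int enter_jail(const char *dir, uid_t uid, gid_t gid)
{
    if (chdir(dir) != 0)        /* still root: may pass through /private */
        return -1;
    if (chroot(".") != 0)       /* cwd becomes the new root */
        return -1;
    if (setgid(gid) != 0)       /* drop the group first ... */
        return -1;
    if (setuid(uid) != 0)       /* ... then the user */
        return -1;
    return 0;
}
```

Note that nothing after the chroot(".") ever resolves "/private" again, which is exactly why the directory can stay root-only today.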

You also break relative paths and directory-moving. Say a process does chdir("/foo/bar"). Now the ACL data in "cwd" is appropriate for /foo/bar. If you later chdir("../quux"), how do you unapply the changes made when you switched into that directory? For inheritable ACLs, you can't "unapply" such an ACL state change unless you save state for all the parent directories, except... What happens when you are in "/foo/bar" and another process does "mv /foo/bar /foobar/quux"? Suddenly any "cwd" ACL data you have is completely invalid and you have to rebuild your ACLs from scratch. Moreover, if the directory you are in was moved to a portion of the filesystem not accessible from your current namespace then how do you deal with it?


For example:
NS1 has the / root dir of /dev/sdb1 mounted on /mnt
NS2 has the /bar subdir of /dev/sdb1 mounted on /mnt

Your process is in NS2 and does chdir("/mnt/quux"). A user in NS1 does: "mv /mnt/bar/quux /mnt/quux". Now your "cwd" is in a directory on a filesystem you have mounted, but it does not correspond *AT ALL* to any path available from your namespace.
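The simpler single-namespace case (a directory renamed out from under a process's cwd) can be watched from userspace. The sketch below does chdir() into a directory, renames that directory, and calls getcwd() before and after. The helper name and the temporary directory layout under /tmp are made up for illustration. On Linux the second getcwd() reports the new name, because the cwd is a reference to the directory itself and not a stored path string.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo helper: chdir() into <tmp>/bar, rename it to
 * <tmp>/quux, and record getcwd() before and after the rename.
 * Returns 0 on success, -1 if any step fails. */
static int cwd_across_rename(char *before, char *after, size_t len)
{
    char tmpl[] = "/tmp/cwd-demo-XXXXXX";
    char old_path[PATH_MAX], new_path[PATH_MAX];
    char *base = mkdtemp(tmpl);

    if (!base)
        return -1;
    snprintf(old_path, sizeof old_path, "%s/bar", base);
    snprintf(new_path, sizeof new_path, "%s/quux", base);

    if (mkdir(old_path, 0700) != 0 || chdir(old_path) != 0)
        return -1;
    if (!getcwd(before, len))
        return -1;

    /* Stands in for the "mv" done by another process. */
    if (rename(old_path, new_path) != 0)
        return -1;
    if (!getcwd(after, len))
        return -1;

    /* Clean up: leave the directory before removing it. */
    if (chdir("/") != 0)
        return -1;
    rmdir(new_path);
    rmdir(base);
    return 0;
}
```

The process never called chdir() a second time, yet the path that describes its cwd changed. Any ACL state computed from the old path at chdir() time would now describe a location that no longer exists.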


Another example:

Your process has done dirfd=open("/media/cdrom/somestuff") when the admin does "umount -l /media/cdrom". You still have the CD-ROM open and accessible but IT HAS NO PATH. It isn't even mounted in *any* namespace, it's just kind of dangling waiting for its last users to go away. You can still do fchdir(dirfd), openat(dirfd, "foo/bar", ...), open("./foo"), etc.
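Since "umount -l" needs root, the sketch below uses a rename as an unprivileged stand-in: the path that was used to open the directory stops existing, but the descriptor keeps working. That substitution, the helper name, and the /tmp layout are all assumptions of this sketch, and it does not reproduce the "no path at all" state of a lazily unmounted filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo helper: open a directory, make its original path
 * disappear via rename(), then keep using the descriptor.
 * Returns 1 if openat() and fchdir() still work, 0 if not, and -1 on
 * a setup failure. */
static int dirfd_survives_rename(void)
{
    char tmpl[] = "/tmp/dirfd-demo-XXXXXX";
    char old_path[PATH_MAX], new_path[PATH_MAX], file_path[PATH_MAX + 8];
    struct stat st;
    char *base = mkdtemp(tmpl);

    if (!base)
        return -1;
    snprintf(old_path, sizeof old_path, "%s/somestuff", base);
    snprintf(new_path, sizeof new_path, "%s/elsewhere", base);
    snprintf(file_path, sizeof file_path, "%s/foo", new_path);

    if (mkdir(old_path, 0700) != 0)
        return -1;
    int dfd = open(old_path, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    if (rename(old_path, new_path) != 0)    /* the opened path is gone */
        return -1;

    /* The descriptor never stored a path, so both calls still succeed. */
    int fd = openat(dfd, "foo", O_CREAT | O_WRONLY, 0600);
    int ok = fd >= 0 && fchdir(dfd) == 0 && stat(file_path, &st) == 0;

    /* Clean up. */
    if (fd >= 0)
        close(fd);
    unlink(file_path);
    close(dfd);
    if (chdir("/") != 0)
        return -1;
    rmdir(new_path);
    rmdir(base);
    return ok;
}
```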

In Linux the ONLY distinction between "relative" and "absolute" paths is that the "absolute" path begins with a magic slash which implies that you start at the hidden "root" fd the kernel manages.


More detail on problems with the "building the ACL from scratch" part:

That's not even paying attention to functions like "fchdir" or their interactions with "chroot" and namespaces. I can probably have an open directory handle to a volume in a completely different namespace, a volume which isn't even *MOUNTED* in my current fs namespace. Using that file-handle I believe I can "fchdir", "openat", etc, in a completely different namespace. I can do the same thing with a chroot, except there I can even "escape":
  /* Switch into chroot.  Doesn't drop root privs */
  chdir("/some/dir/somewhere");
  chroot(".");
  /* Malicious code later on */
  chdir("/");
  chroot("another_dir");
  chdir("../../../../../../../../..");
  chroot(".");
  /* Now I'm back in the real root filesystem */
I don't see what this has to do with this discussion, and I also can't believe that is correct... the chdir( "../../../../.." ) should fail because there is no such directory.

No, this is correct because in the root directory "/", the ".." entry is just another link to the root directory. So the absolute path "/../../../../../.." is just a fancy name for the root directory. The above jail-escape-as-root exploit is possible because it is impossible to determine whether a directory is or is not a subentry of another directory without an exhaustive search. So when your "cwd" points to a path outside of the chroot, the one special case in the code for the "root" directory does not ever match and you can "chdir" all the way up to the real root. You can even do an fstat() after every iteration to figure out whether you're there or not!
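Both halves of that can be checked without root and without calling chroot() at all. In the sketch below (helper names are made up for illustration), `same_inode()` confirms that "/" and "/.." are the same directory, and `climb_to_root()` is the "stat after every iteration" loop: it does chdir("..") until "." and ".." refer to the same inode. It uses stat() on the names "." and ".." where the text mentions fstat(), which is a simplification of this sketch.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both paths name the same inode on the same device,
 * 0 if they differ, and -1 if either stat() call fails. */
static int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* chdir("..") until "." and ".." are the same directory, which is the
 * point where going up no longer gets you anywhere.
 * Returns the number of steps taken, or -1 on error. */
static int climb_to_root(void)
{
    int steps = 0;

    for (;;) {
        int same = same_inode(".", "..");

        if (same < 0)
            return -1;
        if (same)
            return steps;       /* ".." links back to "." here */
        if (chdir("..") != 0)
            return -1;
        steps++;
    }
}
```

Run from an ordinary process this just ends at the process's own root. The escape described above works because the same climb, started from a cwd that lies outside the new chroot, never meets the directory the kernel special-cases and so only stops at the real root.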

And yes, this has been exploited before, although not often as chroot()-ed uid=0 daemons aren't all that common.

So, pray tell, when this code runs and you do the "chroot" call, what ACL do you think should get stuck on "cwd"? It doesn't reference anything available relative to the chroot.

The locking penalty is because the path-lookup is *not* implied. The above chroot example shows that in detail. If you have to do the lookup in *reverse* on every open operation then you have to either:
(A) Store lots of security context with every open directory (cwd included). When a directory you have open is moved, you still have full access to everything inside it since your handle's data hasn't changed.
Yes, the effective acl of the open directory is kept in memory, but in the directory itself, not the handle to it, thus when the directory is moved, its acl is recomputed for the new location and updated immediately. It is like using fcntl to set a file to non-blocking... it is the file you set, not the handle to it, so it affects other processes that have inherited or duplicated the file.

With this you just got into the big-ugly-nasty-recursive-behavior again. Say I untar 20 kernel source trees and then have my program open all 1000 available FDs to various directories in the kernel source tree. Now I run 20 copies of this program, one for each tree, still well within my ulimits even on a conservative box. Now run "mv dir_full_of_kernel_sources some/new/dir". The only thing you can do to find all of the FDs is to iterate down the entire subdirectory tree looking for open files and updating their contexts one-by-one. Except you have 20,000 directory FDs to update. Ouch.

To sum up, when doing access control the only values you can safely and efficiently get at are:

(A)  The dentry/inode
(B)  The superblock
(C)  *Maybe* the vfsmount if those patches get accepted

Any access control model which tries to poke other values is just going to have a shitload of corner cases where it just falls over.


Cheers,
Kyle Moffett
