fuse, get_user_pages, flush_anon_page, aliasing caches and all that again

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I've recently been asked to look at why fuse doesn't work on ARM.
In doing so and thinking about what fuse is doing, I've come to
the conclusion that the way fuse accesses the _current_ processes
memory space is less than ideal.

The problem is as follows:

- current process calls writev()
- fuse_dev_write is invoked, and this ends up calling
   get_user_pages(current, current->mm, address, 1, 0, ...)
- data is then read from the kernel mapping of this page (briefly) via:
   void *mapaddr = kmap_atomic(page, ..);
   memcpy(dest, mapaddr + offset, size);
   kunmap_atomic(mapaddr);

With a VIVT cache, there is an aliasing issue.  Data may be in the
cache lines corresponding with the userspace mapping, thereby making
data read from the kernel mapping incoherent.

On the face of it, the solution seems simple - we could assume that the
kernel mapping does not have any cache lines allocated with it, so we
can just write back the userspace mapping to make that data visible.

However, the action of that memcpy() will allocate some cache lines
in the kernel mapping of this page, which means when we next come to
read it (maybe via the same code) we could end up reading stale data.

Therefore, we also need to flush - more specifically, discard - the
kernel mapping cache lines.


Okay, now let's look at the read case:

- current process calls readv()
- fuse_dev_read is invoked, and eventually calls:
   get_user_pages(current, current->mm, address, 1, 1, ...)
- data is then written to the kernel mapping of this page (briefly) via:
   void *mapaddr = kmap_atomic(page, ..);
   memcpy(mapaddr + offset, source, size);
   kunmap_atomic(mapaddr);

A similar problem exists here.  The userspace mapping may have some
cache lines assocated with it, and as we found out above, so may the
kernel mapping.  We can apply the same fix to it.

However, and this is the problem, we need cache maintainence _after_
that memcpy() has completed - with a write allocate VIVT cache, the
memcpy() itself will allocate cache lines in the kernel mapping of
the page which will need to be written back for the user process to
see that data.

Moreover, the userspace mapping would need its cache lines evicted
(maybe by get_user_pages()) for the written back kernel cache lines
to be visible.  The more I think about this, the more I'm convinced
that this part could only be done by get_user_pages().


Now, throw in SMP or preempt with a multi-threaded userspace program
touching the page in question, and the problem just gets much much
worse.  In such a scenario, we can not guarantee, no matter how much
cache maintainence is applied to the kernel, that this API comes
anywhere near to being safe.


To summarise, in order for get_user_pages() to vaguely work with this
method of accessing the current processes address space on ARM, we'd
need to make the following changes:

1. get_user_pages() needs to unconditionally write back and invalidate
   the user space mapping.
2. get_user_pages() needs to invalidate the kernel space mapping.
3. all users of get_user_pages(current, current->mm, ...) need auditing,
   and additional cache maintainence added in their write paths to
   write back and invalidate the kernel mapping.
4. all fuse userspace programs need to run single-threaded.

Note: flush_anon_page() was added to work around 1 and 2 on other
architectures because fuse tripped over this issue there.

So, given all this additional complexity _and_ that it would only be
safe on non-preempt UP, the question becomes: is using get_user_pages()
to access the current processes memory space legal?  Given the above,
I would say not.

I suggest that in order for fuse to work reliably on ARM, it is modified
to behave more like a reasonable device driver, and use the functions
defined in asm/uaccess.h when it wants to access the current processes
VM space.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux