Re: copy_from_user/copy_to_user question

On Fri, 2005-12-02 at 21:33 -0600, Robert Hancock wrote:
> Steven Rostedt wrote:
> > Nope, the kernel is always locked into memory.  If you take a page fault
> > from the kernel world, you will crash and burn. The kernel is never
> > "swapped out".  So if you are in kernel mode, going into do_page_fault
> > in arch/i386/mm/fault.c there is no path to swap a page in.  Even the
> > vmalloc_fault only handles a page not in the page global directory of
> > the current task.  But if this page is not mapped somewhere in memory
> > (not swapped out), you will get a kernel oops.
> > 
> > Kernel memory may never be swapped out. What happens if an interrupt
> > tries to use such memory?  How does it handle sleeping?
> > 
> > Just change copy_to_user into memcpy, and see how long your system
> > stays up and running.  Do it on a machine that you don't need to worry
> > about rogue applications.  It won't last very long.
> 
> Yes, kernel memory is never swapped out. But my point is merely that as 
> far as I know there is no special handling in the copy_to/from_user 
> functions to handle the case where the userspace memory is swapped out, 
> and therefore this would not be an issue for accessing the memory 
> directly. Obviously this is not something that one should actually do, 
> since access faults are not trapped and on some architectures or 
> configurations it won't work at all.

Quite the contrary: it does handle the case of swapped-out user memory.

Let's take a look at it, shall we?

copy_to/from_user boils down to __copy_user in arch/i386/lib/usercopy.c:

#define __copy_user(to,from,size)					\
do {									\
	int __d0, __d1, __d2;						\
	__asm__ __volatile__(						\
		"	cmp  $7,%0\n"					\
		"	jbe  1f\n"					\
		"	movl %1,%0\n"					\
		"	negl %0\n"					\
		"	andl $7,%0\n"					\
		"	subl %0,%3\n"					\
		"4:	rep; movsb\n"					\
		"	movl %3,%0\n"					\
		"	shrl $2,%0\n"					\
		"	andl $3,%3\n"					\
		"	.align 2,0x90\n"				\
		"0:	rep; movsl\n"					\
		"	movl %3,%0\n"					\
		"1:	rep; movsb\n"					\
		"2:\n"							\
		".section .fixup,\"ax\"\n"				\
		"5:	addl %3,%0\n"					\
		"	jmp 2b\n"					\
		"3:	lea 0(%3,%0,4),%0\n"				\
		"	jmp 2b\n"					\
		".previous\n"						\
		".section __ex_table,\"a\"\n"				\
		"	.align 4\n"					\
		"	.long 4b,5b\n"					\
		"	.long 0b,3b\n"					\
		"	.long 1b,2b\n"					\
		".previous"						\
		: "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2)	\
		: "3"(size), "0"(size), "1"(to), "2"(from)		\
		: "memory");						\
} while (0)


Take a look at the __ex_table section.  It adds the following address pairs:

label 4, label 5, 
label 0, label 3, 
label 1, label 2.
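
(A quick aside: the very same trick, boiled down to a single faulting
instruction, is what the one-shot accessors like __get_user use.  Here is a
minimal sketch modeled on the i386 byte case; the function name is made up
and the real kernel hides this behind macros, but the shape is the same:
one instruction that can fault, one fixup, one __ex_table entry pairing
them.)

#include <linux/errno.h>
#include <linux/compiler.h>

/*
 * Sketch only: one access that can fault (label 1), one fixup that
 * recovers from it (label 3), and one __ex_table entry tying the two
 * together.  On a fault, the fault handler redirects eip from 1: to 3:.
 */
static inline int get_user_byte_sketch(unsigned char *dst,
				       const unsigned char __user *src)
{
	int err = 0;
	unsigned char val = 0;

	__asm__ __volatile__(
		"1:	movb %2,%b1\n"
		"2:\n"
		".section .fixup,\"ax\"\n"
		"3:	movl %3,%0\n"		/* err = -EFAULT		*/
		"	xorb %b1,%b1\n"		/* and hand back a zero byte	*/
		"	jmp 2b\n"
		".previous\n"
		".section __ex_table,\"a\"\n"
		"	.align 4\n"
		"	.long 1b,3b\n"		/* fault at 1: -> fixup at 3:	*/
		".previous"
		: "=r" (err), "=q" (val)
		: "m" (*src), "i" (-EFAULT), "0" (err));

	*dst = val;
	return err;
}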

Now let's see what happens if we take a page fault at any of the labels 4, 0, or 1
(where the copies actually take place).

In do_page_fault in arch/i386/mm/fault.c:

	if (!down_read_trylock(&mm->mmap_sem)) {
		if ((error_code & 4) == 0 &&
		    !search_exception_tables(regs->eip))
			goto bad_area_nosemaphore;
		down_read(&mm->mmap_sem);
	}

(error_code & 4) == 0 is true when the fault happened in kernel mode (which
it did).  And here is the difference between memcpy and copy_to/from_user:
search_exception_tables!  If that lookup were to fail (as it would for a
plain memcpy), we would just die here (jumping to bad_area_nosemaphore).
But let's now look at search_exception_tables.

search_exception_tables is defined in kernel/extable.c, and the entry type
for the table is defined in include/asm-i386/uaccess.h:

struct exception_table_entry
{
	unsigned long insn, fixup;
};

Remember those labels?  Well, the linker collects them into one table, which
the kernel keeps sorted by insn, so here we now have:

insn = label 4, fixup = label 5,
insn = label 0, fixup = label 3,
insn = label 1, fixup = label 2.
  (but sorted with all other exceptions)

search_exception_tables does a binary search of all these sorted
exception entries, looking for the address where the exception took place,
which here would be label 4, 0, or 1; any of them will be found in
this table.
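
For reference, the search itself is search_extable() in lib/extable.c, and
search_exception_tables() in kernel/extable.c runs it over the built-in table
and then over any module tables.  From memory the core looks roughly like
this (a sketch, not a verbatim quote):

const struct exception_table_entry *
search_extable(const struct exception_table_entry *first,
	       const struct exception_table_entry *last,
	       unsigned long value)
{
	/* plain binary search over entries sorted by insn */
	while (first <= last) {
		const struct exception_table_entry *mid;

		mid = (last - first) / 2 + first;
		if (mid->insn < value)
			first = mid + 1;
		else if (mid->insn > value)
			last = mid - 1;
		else
			return mid;	/* the faulting eip has a fixup */
	}
	return NULL;			/* no entry: a genuine kernel bug */
}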

do_page_fault then continues happily along, swaps the page back in (just
as it would for a fault in user land), and returns to the faulting
instruction so the copy can carry on where it left off.
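
That swap-back-in step is just the normal good_area path of do_page_fault;
it lands in handle_mm_fault the same way a user-space fault does, roughly
like this (again from memory, so treat it as a sketch):

	switch (handle_mm_fault(mm, vma, address, write)) {
	case VM_FAULT_MINOR:
		tsk->min_flt++;
		break;
	case VM_FAULT_MAJOR:
		tsk->maj_flt++;
		break;
	case VM_FAULT_SIGBUS:
		goto do_sigbus;
	case VM_FAULT_OOM:
		goto out_of_memory;
	default:
		BUG();
	}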

So what is the fixup for in that table?  Well, let's look at do_page_fault
in the case where the access really is bad and we go to bad_area.  We don't
want to oops the kernel for some misbehavior of an application.

In user mode we would just send a SIGSEGV, but we are in kernel mode, so we
drop down to fixup_exception, called from do_page_fault.
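
In do_page_fault that is the no_context path, roughly (quoting from memory,
so treat it as a sketch):

no_context:
	/* Are we prepared to handle this kernel fault?  */
	if (fixup_exception(regs))
		return;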

fixup_exception in arch/i386/mm/extable.c has the following code:

	fixup = search_exception_tables(regs->eip);
	if (fixup) {
		regs->eip = fixup->fixup;
		return 1;
	}

So it once again searches the exception table for the faulting address,
which we find.  Say this happened at label 4: we set our new eip to
label 5, so when we return from the exception we jump to label 5, which is
really in the .fixup section (not actually right after the code at label 4).

So let's take a look at this code again:

		"	cmp  $7,%0\n"					\
		"	jbe  1f\n"					\
		"	movl %1,%0\n"					\
		"	negl %0\n"					\
		"	andl $7,%0\n"					\
		"	subl %0,%3\n"					\
		"4:	rep; movsb\n"					\

We take an exception here (no page), and this time it really is a bad
memory location!  So we go to do_page_fault, which eventually calls
fixup_exception, which sets the eip to label 5 (so we now jump there).


		"	movl %3,%0\n"					\
		"	shrl $2,%0\n"					\
		"	andl $3,%3\n"					\
		"	.align 2,0x90\n"				\
		"0:	rep; movsl\n"					\
		"	movl %3,%0\n"					\
		"1:	rep; movsb\n"					\
		"2:\n"							\
		".section .fixup,\"ax\"\n"				\

We jump to label 5, which adds in the count of what we did not copy, and
then jumps to label 2 above.  Label 5 really lives in the .fixup section,
so these jumps are hopping around in the code far more than it looks here.

Label 2 just exits, so __copy_user now returns the number of bytes that
were not copied!

		"5:	addl %3,%0\n"					\
		"	jmp 2b\n"					\
		"3:	lea 0(%3,%0,4),%0\n"				\
		"	jmp 2b\n"					\
		".previous\n"						\
		".section __ex_table,\"a\"\n"				\
		"	.align 4\n"					\
		"	.long 4b,5b\n"					\
		"	.long 0b,3b\n"					\
		"	.long 1b,2b\n"					\

Pretty neat eh?
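
One last note: that leftover count is exactly what callers are supposed to
check.  A minimal, hypothetical read() method (foo_read and foo_buf are
made-up names, not from any real driver) would use it like this:

#include <linux/fs.h>
#include <linux/errno.h>
#include <asm/uaccess.h>

static char foo_buf[64];	/* some driver data to hand out */

static ssize_t foo_read(struct file *file, char __user *buf,
			size_t count, loff_t *ppos)
{
	if (count > sizeof(foo_buf))
		count = sizeof(foo_buf);

	/*
	 * copy_to_user() returns the number of bytes NOT copied;
	 * anything nonzero means part of the user buffer was bad.
	 */
	if (copy_to_user(buf, foo_buf, count))
		return -EFAULT;

	*ppos += count;
	return count;
}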

-- Steve


