Re: [PATCHv4 5/6] Allow setting O_NONBLOCK flag for new sockets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Mon, 26 Nov 2007, H. Peter Anvin wrote:
> 
> I'm presuming you're not talking about some sort of syslets/fibrils/threadlets
> here (executing an interpreted thread of execution in kernel space).  That's a
> whole separate ball of wax.

Indeed. 

I'm hoping that just dies. It's too complex. But the "do this single 
system call asynchronously" isn't, and has lots of historical 
implementations, ranging from VMS to the braindead POSIX "aio" setup.

I do think that more complex threadlets could be useful in theory, I just 
doubt they'd be used in practice..

> > So the choice is basically one of:
> > 
> >  - come up with a totally new interface to system calls, and effectively
> > duplicating the whole system call table.
> > 
> >    I'd hate to do this. We already have duplicated system call tables due
> > to compat stuff, it's painful.
> 
> This would be the right thing to do if we were to redesign the system call
> interface from the ground up, which it doesn't exactly sound like we are
> intending.

Yeah. I'm also not sure it's the right thing even if we did redesign from 
scratch.

The current system call interface may look less than regular, but it has 
some very solid foundation: it's fast. Passing arguments in registers is 
by definition a lot faster *and*safer* than passing them any other way. 
There are no subtle security issues with people playing games with the 
argument base pointer (ie usually the stack pointer) and trying to fool 
the kernel into accessing kernel memory etc.

Immediately when you do anything but registers, it is much *much* more 
costly. The "get_user()" and "copy_from_user()" stuff is not exactly slow, 
but it's quite noticeable overhead for simple system calls. It gets worse 
if this all is described by some indirect table setup.

In the system call path, right now, for some system calls, the biggest two 
overheads are

 - the CPU system call overhead itself. We can't do much about this, but 
   the CPU designers do seem to be slowly getting it fixed (ie it's slower 
   than it should need to be, but it's a hell of a lot faster than a P4 
   used to be ;)

 - the cost of just the single indirect - and unpredictable - call.

(The second cost is actually often totally hidden in the trivial system 
call benchmarks people run: if the benchmark just does "getppid()" a 
million times in a tight loop, the indirect call on the system call number 
seems really quite fast, but outside of benchmarks it is generally totally 
unpredictable indeed, and a real cost for real-life system call usage).

Everything else in the system call path is generally as fast as we can 
make it. Doing more indirection and conditionals would be really quite 
nasty.

Of course, for *most* of system calls, the work the kernel actually does 
ends up being so big that it doesn't much matter, but I was literally 
chasing down why a page fault had slowed down by ~70 cycles two weeks ago. 
And it doesn't take more than a couple of unpredictable jumps to do things 
like that!

> The 6-word limit is a red herring.  There is at least two ways to deal with it
> (and this doesn't mean wiping the legacy stuff we already have):
> 
> - Let each architecture pick a calling convention and redefine the
> architecture-independent bits to take an arbitrary number of arguments.  This
> is a one-time panarchitectural change.

Not applicable on x86-32.

The six-word limit is effectively a hardware limit there. Once it goes 
past that limit, one of the words needs to be a pointer to extended 
information that is fundamentally slower to access. Happily, only very 
rare system calls do that (and none of them are of the simple variety 
where we see a few cycles easily).

On other architectures, we could more easily just use more registers. But 
x86-32 is still a big part (bulk) of what matters for most people.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux