Re: More documentation: system call how-to

Hi Ulrich,


On Wed, 1 Aug 2007, Ulrich Drepper wrote:

> How about adding the attached text to the Documentation directory?
> Over the years I have had to correct one or another system call design
> problem.  Other problems could no longer be corrected and we have to
> live with them.  Maybe spelling out the rules explicitly will help a bit.

Most definitely. Going through the list below, I can think of several
more small things that people tend to forget when actually
/implementing/ a system call (as opposed to the abstract design
decisions such as the arguments and their sizes).


> I've added a few rules I could think of right now.  What should be
> added as well is a rule for 64-bit parameters on 32-bit platforms.  I
> leave this to the s390 people who have the biggest restrictions when
> it comes to this.

Yes, that must definitely be spelt out clearly, probably with examples
of how to do it right.
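
As a rough sketch of what such an example might look like: one common
approach on 32-bit platforms is to pass a 64-bit value as two 32-bit
halves and reassemble it on the receiving side. The helper names below
are made up for illustration, and the sketch deliberately leaves out the
register-pair alignment rules that some architectures impose, which the
architecture maintainers would have to fill in.

```c
#include <stdint.h>

/* Hypothetical helper: split a 64-bit value into two 32-bit halves,
 * as a 32-bit caller would before loading them into registers. */
static void demo_split64(uint64_t value, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t)(value >> 32);
    *lo = (uint32_t)(value & 0xffffffffu);
}

/* Hypothetical helper: rebuild the 64-bit value on the receiving side. */
static uint64_t demo_join64(uint32_t hi, uint32_t lo)
{
    /* Widen before shifting so the high half is not lost. */
    return ((uint64_t)hi << 32) | lo;
}
```

The cast before the shift matters: shifting a 32-bit value left by 32
is undefined in C, so the high half has to be widened first.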

Another must when designing a syscall is to think through its security
implications and to spell out the expected behaviour in all cases.
Security can mean different things for different syscalls, but just
getting the word in here would keep people from making basic mistakes,
such as introducing "xxx_set_xxx"-style syscalls that modify
kernel/global structures without the author having considered how and
why that is wrong.

Other than that, as I said above, what we probably also need is a
"system call implementation checklist" that lists the basic things
(copying buffers from/to userspace, various security checks, and others
I can't recall right now) and how to get them right.


> Signed-off-by: Ulrich Drepper <[email protected]>
> 
> Rules for designing new system calls
> ------------------------------------
> 
> 1. Do not use multiplexing system calls.
> 
>    A practical argument is that it invariably reduces the number of
>    available parameters to the system call which will haunt people who
>    have to care about architectures with a limited set of registers
>    reserved for this purpose.
> 
>    Another aspect is that it is most likely slower.  The caller in
>    most cases knows exactly which sub-function of the system call is
>    needed.  If the decision about the sub-function is dynamic the
>    computation of the code could just as well be a computation of a
>    system call number.  The difference lies in the kernel where the
>    multiplexing always has to happen, even if the required
>    sub-function is known to the caller ahead of time.
> 
>    Adding new system calls is much cheaper: it is a word in a table.
>    This is much less code and data than the switch statement or
>    if-cascade needed to implement the multiplexer.
> 
>    Bad examples: sys_socketcall on x86, sys_futex, and several more
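
A rough userspace sketch of the difference, in case it helps the text.
All names here are made up and the handlers are stand-ins, not real
system calls. The point is only that the table lookup keeps every
argument available, while the multiplexer spends one argument on the
sub-function code and then has to switch on it every time.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handlers standing in for real system calls. */
static long demo_add(long a, long b, long c) { return a + b + c; }
static long demo_neg(long a, long b, long c) { (void)b; (void)c; return -a; }

typedef long (*demo_call_t)(long, long, long);

/* Direct dispatch: one table slot per call, all three arguments usable. */
static const demo_call_t demo_table[] = { demo_add, demo_neg };

static long demo_dispatch(size_t nr, long a, long b, long c)
{
    if (nr >= sizeof(demo_table) / sizeof(demo_table[0]))
        return -ENOSYS;              /* no such call number */
    return demo_table[nr](a, b, c);
}

/* Multiplexer: the first argument is consumed by the sub-function code,
 * so only two arguments are left for the actual operation. */
static long demo_multiplex(long op, long a, long b)
{
    switch (op) {
    case 0:  return demo_add(a, b, 0);
    case 1:  return demo_neg(a, b, 0);
    default: return -ENOSYS;         /* invalid sub-function code */
    }
}
```

With the same three-slot budget, demo_dispatch(0, 1, 2, 3) can use all
three values, whereas demo_multiplex(0, 1, 2) has already lost one.
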
> 
> 
> 2. Use of ENOSYS:
> 
>    The runtime has to be able to distinguish non-existing system calls
>    due to old kernel versions from error conditions in an implemented
>    system call.  This means the ENOSYS error should never be used in
>    an error condition once a system call is implemented.
> 
>    Example: In sys_fallocate, if the file system does not implement the
>    fallocate operation, return EOPNOTSUPP and not ENOSYS.
> 
>    There is one exception to the rule: if rule #1 is violated and a
>    multiplexer system call is used, invalid sub-function codes should
>    be signaled using ENOSYS.
> 
>    Example: sys_futex
  ^^^

Probably makes sense to prefix "sad" or "unfortunate" here.
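
To make the reason for this rule concrete, here is a minimal sketch of
how a runtime might use ENOSYS, assuming it falls back to an older
mechanism when a call is missing. The function names and stubs are
hypothetical and only simulate kernel return values. If an implemented
call returned ENOSYS for an ordinary error, logic like this would
wrongly disable the call for the rest of the process lifetime.

```c
#include <errno.h>

typedef long (*demo_call_fn)(void);

/* Set to 0 once the kernel has reported the call as missing. */
static int demo_call_available = 1;

/* Stubs simulating raw kernel results (negative errno on failure). */
static long demo_call_missing(void)     { return -ENOSYS; }
static long demo_call_unsupported(void) { return -EOPNOTSUPP; }
static long demo_fallback(void)         { return 0; }

/* Try the new call; switch permanently to the fallback only on ENOSYS. */
static long demo_try_call(demo_call_fn new_call, demo_call_fn fallback)
{
    if (demo_call_available) {
        long ret = new_call();
        if (ret != -ENOSYS)
            return ret;              /* success, or a real error to report */
        demo_call_available = 0;     /* old kernel: stop trying the new call */
    }
    return fallback();
}
```

An EOPNOTSUPP result is passed straight back to the caller and leaves
the new call enabled, which is the behaviour the fallocate example in
the rule asks for.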

> 3. Choose parameters for growth
> 
>    Today it no longer makes sense to implement any system call which,
>    even on 32-bit machines, restricts values indicating file sizes or
>    offsets to 32 bits.  64-bit values should be used throughout.
> 
>    Example: sys_fadvise64, which should have been defined from day 1
>    like sys_fadvise64_64.

Again, this is a "bad" example.

>    Similarly, timeout granularity of seconds is not suitable anymore.
>    Most interfaces use nanosecond resolution, and an often used way
>    to specify such times and intervals is the timespec structure.
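
For illustration, a small sketch of filling in a timespec from a
millisecond timeout, which a seconds-only interface could not express.
The helper name is made up, and the sketch assumes a non-negative
input.

```c
#include <time.h>

/* Hypothetical helper: convert a non-negative millisecond timeout into
 * a struct timespec (whole seconds plus remaining nanoseconds). */
static struct timespec demo_ms_to_timespec(long long ms)
{
    struct timespec ts;

    ts.tv_sec  = (time_t)(ms / 1000);
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    return ts;
}
```

A 1500 ms timeout becomes one second plus 500000000 nanoseconds, so the
sub-second part survives instead of being rounded away.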


Satyam