On Thu, 22 Sep 2005, Vadim Lobanov wrote:
1. Size
The size parameter to sys_epoll_create() is simply sanity-checked (has
to be greater than zero) and then ignored. I presume the current
implementation works perfectly without this parameter, so I am rather
curious why it is even passed in. Historical reasons? Future code
improvements? On the same note, I'd like to suggest that '0' also should
be an allowed value, for the case when the application really does not
know what the size estimate should be.
This born from the initial epoll implementation, that was using the size
parameter to size the hash. Since now epoll uses rbtrees, the parameter is
no more used. Changing the API to remove it, and break userspace, was/is
not a good idea.
2. Timeout
It seems that the timeout parameter in sys_epoll_wait() is not handled
quite correctly. According to the manpages, a value of '-1' means
infinite timeout, but the effect of other negative values is left
undefined. In fact, if you run a userland program that calls
epoll_wait() with a timeout value of '-2', the kernel prints an error
into /var/log/messages from within schedule_timeout(), due to its
argument being negative. It seems there are two ways to correct this
behavior:
- Check the passed timeout for being less than '-1', and return an
error. A new errno value needs to be introduced into the epoll_wait()
API.
- Redefine the epoll_wait() API to accept any negative value as an
infinite timeout, and change the code appropriately.
Yes, it is nicer if epoll behaves like poll. (*)
3. Wakeup
As determined by testing with userland code, the sys_tgkill() and
sys_tkill() functions currently will NOT wake up a sleeping
epoll_wait(). Effectively, this means that epoll_wait() is NOT a pthread
cancellation point. There are two potential issues with this:
- epoll_wait() meets the unofficial(?) definition of a "system call that
may block".
- epoll_wait() behaves differently from poll() and friends.
The epoll_wait() wait loop is the standard one that even poll() uses (prep
wait, make interruptible, test signals, sched timeo). So if poll() is woke
up, so should epoll_wait(). A minimal code snippet that proves poll()
behing woke up, and epoll_wait() not, would help.
4. Code Duplication
As sys_tgkill() and sys_tkill() are currently written, a large portion
of the two functions is duplicated. It might make sense to pull that
equivalent code out into a separate function.
This should be moved in "[RFC] something else" ;)
Comments please? In particular, the pthread cancellation issue is
worrysome. In the case that any of the above points turn into actual
code TODOs, I'll be more than happy to cook up and submit the patches.
[*] No need, since it's a one liner.
- Davide
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|