Re: [PATCH] splice support #2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Linus Torvalds <[email protected]> wrote:

> In particular, what happens when you try to connect two streaming 
> devices, but the destination stops accepting data? You cannot put the 
> received data "back" into the streaming source any way - so if you 
> actually want to be able to handle error recovery, you _have_ to get 
> access to the source buffers.

i'd rather implement this error case as an exception mechanism, instead 
of a forced intermediary buffer mechanism.

We should extend the userspace API so that it is prepared to receive 
'excess data' via a separate 'flush excess data to' file descriptor:

	sys_splice(fd_in, fd_out, fd_flush, size,
                   max_flush_size, *bytes_flushed)

Note1: fd_flush can be a pipe too! This would avoid copies in the 
       exception case - if the exception case is expected to be common.

Note2: max_flush_size serves as hint and as a natural 'buffering limit' 
       for the kernel-internal loops. I believe it's more natural than 
       the implicit 'pipe buffering limit' we currently have.
       max_flush_size == 0 would say to the kernel: 'use whatever 
       buffering is natural or necessary'. E.g. if fd_flush is a pipe, 
       it would automatically set the buffering size to the flush-pipe's
       internal buffering limit.

Note3: we could even eliminate the "*bytes_flushed" parameter from 
       the syscall: as fd_flush's seek offset gives userspace an idea 
       about how much data was written to it.

Note4: if the user messes up fd_flush so that the kernel's "excess data" 
       transfer into fd_flush failes then that's 'tough luck' and flush 
       data may be lost. Users can use pipes [if the exception case is 
       common and they want to optimize that codepath] or can pre-write 
       their files if they need a 100% guarantee.
       In fact, the kernel doesnt even have to _look up_ fd_flush in the 
       common case. It's the application's responsibility to make sure 
       the exception case will work. This means that the _only_ overhead 
       from this exception mechanism are the 2-3 extra parameters to
       sys_splice(). That's _much_ faster.

Just look at the beauty of this generalization. fd_flush can be 
_anything_. It could be a pipe. It could be a temporary file in /tmp. It 
could be a file over the network. fd_flush could be mmap()-ed to 
user-space! Or it could even be -1 if the user is not interested in the 
error case for the streaming data. (For example a good portion of video 
and audio playback applications are not interested in the fd_out error 
case at all: such data can easily lose 'value' if it gets delayed by 
more than a few milliseconds and the right answer is to skip the frame 
or display an error message, ignoring the lost data.)

But for heaven's sake: do not slow down the 99.9999999999% fastpath by 
forcing a pipe inbetween on the ABI level! I really have nothing against 
making sys_splice() generic and i agree that a very good first step to 
achieve that is to include pipes in the implementation, but i dont think 
pipes are (or should be) all that critical and fundamental to the splice 
data-streaming concept itself, as you are suggesting.

> Also, for signal handling, you need to have some way to keep the pipe 
> around for several iterations on the sender side, while still 
> returning to user space to do the signal handler.

i believe the signal case is naturally handled by the fd_flush approach 
too - in fact it can also acts as a nice tester for the exception 
handling mechanism.

If the application in question expects to get many signals then it can 
use a pipe as fd_flush. (But signal-heavy apps are quite rare: most 
performance-critical apps avoid them for the fastpath like the plague, 
on modern CPUs it's more expensive to receive and handle a single signal 
than to create and tear down a completely new thread (!))

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux