Re: [PATCH] splice support #2

On Thu, 2006-03-30 at 13:16 -0800, Linus Torvalds wrote:
[...]
> splice() really can handle any fd->fd move.
> 
> The reason you want to have a pipe in the middle is that you have to 
> handle partial moves _some_ way. And the pipe being the buffer really does 
> allow that, and also handles the case of "what happens when we received 
> more data than we could write".
> 
> So the way to do copies is
> 
> 	int pipefd[2];
> 	unsigned long copied = 0;
> 
> 	if (pipe(pipefd) < 0)
> 		error
> 
> 	for (;;) {
> 		int nr = splice(in, pipefd[1], MAX_INT, 0);
> 		if (nr <= 0)
> 			break;
> 		do {
> 			int ret = splice(pipefd[0], out, nr, 0);
> 			if (ret <= 0) {
> 				error: couldn't write 'nr' bytes
> 				break;
> 			}
> 
> 			nr -= ret;
> 		} while (nr);
> 	}
> 	close(pipefd[0]);
> 	close(pipefd[1]);
> 
> which may _seem_ very complicated and the extra pipe seems "useless", but 
> it's (a) actually pretty efficient and (b) allows error recovery.
> 
> That (b) is the really important part. I can pretty much guarantee that 
> without the "useless" pipe, you simply couldn't do it.
> 
> In particular, what happens when you try to connect two streaming devices, 
> but the destination stops accepting data? You cannot put the received data 
> "back" into the streaming source any way - so if you actually want to be 
> able to handle error recovery, you _have_ to get access to the source 
> buffers.
> 
> Also, for signal handling, you need to have some way to keep the pipe 
> around for several iterations on the sender side, while still returning to 
> user space to do the signal handler. 
> 
> And guess what? That's exactly what you get with that "useless" pipe. For 
> error handling, you can decide to throw the extra data that the recipient 
> didn't want away (just close the pipe), and maybe that's going to be what 
> most people do. But you could also decide to just do a "read()" on the 
> pipefd, to just read the data into user space for debugging and/or error 
> recovery..
> 
> Similarly, for signals, the pipe _is_ that buffer that you need that is 
> consistent across multiple invocations.
> 
> So that pipe is not at all unnecessary, and in fact, it's critical. It may 

IOW the "pipe" above is in fact just a "handle" of a kernel-internal
buffer (of one page) and the two fd's can be used to tell sys-calls to
read from or write into the buffer. But it is much faster because it
avoids real copying of the data into a user-space buffer and out of the
user-space buffer since the kernel can manipulate the underlying data
directly as a typical implementation of the above code does (i.e. use
read(2) and write(2) instead of the two splice(2) calls).
Uuuuuh, real zero-copy seems possible.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

References:
- [PATCH] splice support #2
  - From: Jens Axboe <axboe@suse.de>
- Re: [PATCH] splice support #2
  - From: Ingo Molnar <mingo@elte.hu>
- Re: [PATCH] splice support #2
  - From: Jens Axboe <axboe@suse.de>
- Re: [PATCH] splice support #2
  - From: Linus Torvalds <torvalds@osdl.org>
- Re: [PATCH] splice support #2
  - From: Jeff Garzik <jeff@garzik.org>
- Re: [PATCH] splice support #2
  - From: Linus Torvalds <torvalds@osdl.org>

Prev by Date: Re: NFS client (10x) performance regression 2.6.14.7 -> 2.6.15
Next by Date: Re: [PATCH] splice support #2
Previous by thread: Re: [PATCH] splice support #2
Next by thread: Re: [PATCH] splice support #2
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]