Re: [patch 05/11] syslets: core code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



2) On the client facing side (port 53), I'd very much hope for a way to
	do 'recvv' on datagram sockets, so I can retrieve a whole bunch of
	UDP datagrams with only one kernel transition.

I want to highlight this point that Bert is making.

Whenever we talk about AIO and kernel threads some folks are rightly concerned that we're talking about a thread *per IO* and fear that memory consumption will be fatal.

Take the case of userspace which implements what we'd think of as page cache writeback. (*coughs, points at email address*). It wants to issue thousands of IOs to disjoint regions of a file. "Thousands of kernel threads, oh crap!"

But it only issues each IO with a separate syscall (or io_submit() op) because it doesn't have an interface that lets it specify IOs that vector user memory addresses *and file position*.

If we had a seemingly obvious interface that let it kick off batched IOs to different parts of the file, the looming disaster of a thread per IO vanishes in that case.

struct off_vec {
	off_t pos;
	size_t len;
};

long sys_sgwrite(int fd, struct iovec *memvec, size_t mv_count,
	struct off_vec *ovec, size_t ov_count);

It doesn't take long to imagine other uses for this that are less exotic.

Take e2fsck and its iterating through indirect blocks or directory data blocks. It has a list of disjoint file regions (blocks) it wants to read, but it does them serially to keep the code from getting even more confusing. blktrace a clean e2fsck -f some time.. it's leaving *HALF* of the disk read bandwith on the table by performing serial block-sized reads. If it could specify batches of them the code would still be simple but it could tell the kernel and IO scheduler *exactly* what it wants, without having to mess around with sys_readahead() or AIO or any of that junk :).

Anyway, that's just something that's been on my mind. If there are obvious clean opportunities to get more done with single syscalls, it might not be such a bad thing.

- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux