Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Ulrich Drepper <[email protected]> wrote:

> Ingo Molnar wrote:
> > in terms of AIO, the best queueing model is i think what the kernel uses 
> > internally: freely ordered, with barrier support.
> 
> Speaking of AIO, how do you imagine lio_listio is implemented?  If 
> there is no asynchronous syscall it would mean creating a threadlet 
> for each request but this means either waiting or creating 
> several/many threads.

my current thinking is that special-purpose (non-programmable, static) 
APIs like aio_*() and lio_*(), where every last cycle of performance 
matters, should be implemented using syslets - even if it is quite 
tricky to write syslets (which they no doubt are - just compare the size 
of syslet-test.c to threadlet-test.c). So i'd move syslets into the same 
category as raw syscalls: pieces of the raw infrastructure between the 
kernel and glibc, not an exposed API to apps. [and even if we keep them 
in that category they still need quite a bit of API work, to clean up 
the 32/64-bit issues, etc.]

The size of the async thread pool can be kept in check either from 
user-space (by starting to queue up requests after a certain point of 
saturation without submitting them) or from kernel-space which involves 
waiting (the latter was present in v2 but i temporarily removed it from 
v3). "You have to wait" is the eventual final answer in every 
well-behaved queueing system anyway.

How things work out with a large number of outstanding threads in real 
apps is still an open question (until someone tries it) but i'm 
cautiously optimistic: in my own (FIO based) measurements syslets beat 
the native KAIO interfaces both in the cached and in the non-cached [== 
many threads] case. I did not expect the latter at all: the non-cached 
syslet codepath is not optimized at all yet, so i expected it to have 
(much) higher CPU overhead than KAIO.

This means that KAIO is in worse shape than i thought - there's just way 
too much context KAIO has to build up to submit parallel IO contexts. 
Many years of optimizations went into KAIO already, so it's probably at 
its outer edge of performance capabilities. Furthermore, what KAIO has 
to compete against in the syslet case are the synchronous syscalls 
turned async, and more than a decade of optimizations went into all the 
synchronous syscalls. Plus the 'threading overhead of syslets' really 
boils down to 'scheduling overhead' in the end - and we can do over a 
million context-switches a second, per CPU. What killed user-space 
thread-based AIO performance many moons ago wasnt really the threading 
concept itself or scheduling overhead, it was the (then) fragile 
threading implementation of Linux, combined with the resulting 
signal-based AIO code. Catching and handling a single signal is more 
expensive than a context-switch - and signals have legacies attached to 
them that make them hard to scale within the kernel. Plus with syslets 
the 'threading overhead' is optional, it only happens when it has to.

Plus there's the fundamental killer that KAIO is a /lot/ harder to 
implement (and to maintain) on the kernel side: it has to be implemented 
for every IO discipline, and even for the IO disciplines it supports at 
the moment, it is not truly asynchronous for things like metadata 
blocking or VFS blocking. To handle things like metadata blocking it has 
to resort to non-statemachine techniques like retries - which are bad 
for performance.

Syslets/threadlets on the other hand, once the core is implemented, have 
near zero ongoing maintainance cost (compared to KAIO pushed into every 
IO subsystem) and cover all IO disciplines and API variants immediately, 
and they are as perfectly asynchronous as it gets.

So all in one, i used to think that AIO state-machines have a long-term 
place within the kernel, but with syslets i think i've proven myself 
embarrasingly wrong =B-)

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux