Re: [patch] aio: add per task aio wait event condition

On Dec 29, 2006, at 6:31 PM, Chen, Kenneth W wrote:

The AIO wake-up notification from aio_complete is really inefficient
in the current AIO implementation when a process is waiting in
io_getevents().

Yeah, it's a real deficiency.  Thanks for taking a stab at it.

This patch adds a wait condition to the wait queue and only wakes up
the process when that condition is met.  The condition is added on a
per-task basis to handle multi-threaded apps that share a single ioctx.

But only one of the waiting tasks is tested: the one at the head of the list. It looks like this change could starve an io_getevents() with a low min_nr in the presence of another io_getevents() with a larger min_nr.

Before:
0 0 0 3972608 7056 31312 0 0 14100 0 7885 13747 0 2 98 0
After:
0 0 0 3972608 7056 31312 0 0 13800 0 7885 42 0 2 98 0

Nice.  What min_nr was used in this test?

+struct aio_wait_queue {
+	int		nr_wait;	/* wake-up condition */

It appears that this is never assigned a negative value? Can we make that explicit in the type so that we reviewers don't have to worry about wrapping and signed comparisons?

-	DECLARE_WAITQUEUE(wait, tsk);
+	struct aio_wait_queue wait;

+	aio_init_wait(&wait);

This just changed from using default_wake_function() to autoremove_wake_function(). Very sneaky! wait_for_all_aios() should be adding the wait queue entry before going to sleep each time (better still, just use wait_event()).

Was this on purpose? I'm all for it as a way to reduce wakeups from a stream of completions to a single waiter.

+	nr_evt = ring->tail - ring->head;
+	if (nr_evt < 0)
+		nr_evt += info->nr;

 int = unsigned - unsigned;
 if (int < 0)

My head already hurts. Can we clean this up so one doesn't have to live and breathe type conversion rules to tell if this code is correct?
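
One way to keep the whole calculation unsigned is sketched below. This is a user-space illustration only: ring_nr_events() and its parameter names are made up for the example, and it assumes head and tail are always valid slot indices below the ring size.

```c
#include <assert.h>

/*
 * Number of events currently queued in a ring of `nr` slots.
 * Hypothetical helper, not part of the patch.  head, tail and nr are
 * all unsigned, so no signed/unsigned conversion is involved.
 */
static unsigned ring_nr_events(unsigned head, unsigned tail, unsigned nr)
{
	/* Assumption: both indices are kept within [0, nr). */
	assert(head < nr && tail < nr);

	if (tail >= head)
		return tail - head;

	/* tail has wrapped around the end of the ring */
	return nr - head + tail;
}
```

With an 8-slot ring, head 6 and tail 1 give 8 - 6 + 1 = 3 queued events, and the subtraction never goes through a negative intermediate.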

+	if (waitqueue_active(&ctx->wait)) {
+		struct aio_wait_queue *wait;
+		wait = container_of(ctx->wait.task_list.next,
+				    struct aio_wait_queue, wait.task_list);
+		if (nr_evt >= wait->nr_wait)
+			wake_up(&ctx->wait);
+	}

First is the fear of starvation as mentioned previously.

issue 2 ops
first io_getevents sleeps with a min_nr of 2
second io_getevents sleeps with min_nr of 3
2 ops complete but only test the second sleeper's min_nr of 3
first sleeper twiddles thumbs

This makes me think this elegant task_list approach is doomed. I think this is what stopped Ben and me from being interested in this last time we talked about it :).
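
The scenario above can be modelled in user space to see the difference between testing only the head of the list and testing every waiter. The struct and function names here are hypothetical and nothing below is kernel code. The array is ordered as the list would be, with the most recently added sleeper first.

```c
#include <stddef.h>

/* Hypothetical model of one sleeping io_getevents() caller. */
struct model_sleeper {
	unsigned min_nr;	/* events needed before it wants a wake-up */
};

/* Test only the first entry, as the quoted hunk appears to do. */
static int head_only_should_wake(const struct model_sleeper *s, size_t n,
				 unsigned nr_evt)
{
	return n > 0 && nr_evt >= s[0].min_nr;
}

/* Alternative policy: wake if any sleeper's threshold is met. */
static int any_should_wake(const struct model_sleeper *s, size_t n,
			   unsigned nr_evt)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (nr_evt >= s[i].min_nr)
			return 1;
	return 0;
}
```

With the sleepers ordered { min_nr 3, min_nr 2 } and two completed events, head_only_should_wake() returns 0 while any_should_wake() returns 1, which is the starved first sleeper from the list above. Scanning every waiter costs more per completion, so this only shows the behaviour and does not recommend a fix.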

Also, is that container_of() and dereference safe in the presence of racing wake-ups? It looks like we could dereference a freed wait, get a bogus nr_wait, and decide not to wake.
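
To make the concern concrete, here is a small user-space model of the list walk. The list_head and container_of definitions are simplified stand-ins for the kernel's, and model_waiter, first_waiter() and the list helpers are invented for this sketch. It shows that once the last waiter has been removed, list.next points back at the head itself, so container_of() on it no longer yields a waiter.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's list_head and container_of(). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical waiter: a wake-up threshold plus its list linkage. */
struct model_waiter {
	unsigned nr_wait;
	struct list_head entry;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

/* Insert node right after head, i.e. at the front of the list. */
static void list_add_head(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del_entry(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/*
 * Return the first waiter, or NULL if the list is empty.  The emptiness
 * check and the container_of() must see the same list state; in this
 * single-threaded model that is trivially true.
 */
static struct model_waiter *first_waiter(struct list_head *head)
{
	if (head->next == head)
		return NULL;
	return container_of(head->next, struct model_waiter, entry);
}
```

In the kernel the waiter can be removed by a racing wake-up between the waitqueue_active() check and the dereference, so the check and the dereference would presumably both need to happen under the lock that protects the wait queue. That part is not modelled here.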

Andrew, I fear we should remove this from -mm until it's fixed up.

- z