Re: [take22 0/4] kevent: Generic event handling mechanism.

Evgeniy Polyakov wrote:

> If there would exist sockets support, then I could patch it to work with
> kevents.


OK, I am posting here my latest version of epoll_bench.

It works with pipes (default),
AF_UNIX socketpair() (option -u),
or AF_INET sockets on the loopback device (option -i).
Only one machine is involved (so no real ethernet traffic, and a limit on the maximum number of AF_INET sockets, since I use only one listener).

Option -f asks to bypass epoll.

On a dual Opteron 246 machine (2 GHz CPUs, 1 MB of cache per CPU, running a somewhat busy 2.6.18 kernel).

Performance for 2000 concurrent streams:

259643 evts/sec for pipes
170188 evts/sec for AF_UNIX sockets (-u)
58771 evts/sec for AF_INET sockets (-i)
69475 evts/sec for AF_INET and no epoll gathering at all. (-i -f)

I believe the difference between AF_INET sockets and the other stream types comes from synchronous vs. asynchronous wakeups: I added counters for context switches per second and for the number of events handled per epoll_wait() call, and we can see that in the AF_INET case the consumer is woken up more often. That means lower latency, but less bandwidth, alas.
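
As a rough check from the averages below: pipes deliver about 260000 evts/sec with roughly 11000 ctxt/sec, i.e. around 24 events per consumer wakeup, while AF_INET delivers about 59000 evts/sec with roughly 15000 ctxt/sec, i.e. only about 4 events per wakeup.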

Detailed results:
pipe
# ./epoll_bench -n 2000
2000 handles setup
255320 evts/sec 362.074 samples per call
254054 evts/sec 10473 ctxt/sec 381.569 samples per call
249868 evts/sec 9155 ctxt/sec 407.461 samples per call
181010 evts/sec 22656 ctxt/sec 420.36 samples per call
233368 evts/sec 8565 ctxt/sec 348.773 samples per call
284682 evts/sec 11114 ctxt/sec 299.987 samples per call
292485 evts/sec 10235 ctxt/sec 279.042 samples per call
279194 evts/sec 10760 ctxt/sec 267.694 samples per call
267917 evts/sec 12035 ctxt/sec 264.106 samples per call
291450 evts/sec 11024 ctxt/sec 247.028 samples per call
266837 evts/sec 11732 ctxt/sec 241.915 samples per call
272762 evts/sec 11492 ctxt/sec 247.629 samples per call
253756 evts/sec 11011 ctxt/sec 253.395 samples per call
251250 evts/sec 9912 ctxt/sec 259.88 samples per call
260706 evts/sec 10754 ctxt/sec 265.079 samples per call
Avg: 259643 evts/sec

AF_UNIX
# ./epoll_bench -n 2000 -u
2000 handles setup
264827 evts/sec 6.01538 samples per call
259241 evts/sec 15682 ctxt/sec 5.70332 samples per call
262266 evts/sec 17072 ctxt/sec 5.64829 samples per call
262730 evts/sec 16744 ctxt/sec 5.43087 samples per call
253212 evts/sec 17343 ctxt/sec 5.14736 samples per call
255219 evts/sec 17579 ctxt/sec 5.0197 samples per call
166655 evts/sec 13090 ctxt/sec 5.27575 samples per call
111348 evts/sec 10127 ctxt/sec 5.61362 samples per call
104812 evts/sec 9476 ctxt/sec 5.93361 samples per call
95897 evts/sec 8876 ctxt/sec 6.22481 samples per call
97096 evts/sec 9372 ctxt/sec 6.51874 samples per call
113808 evts/sec 11142 ctxt/sec 6.86422 samples per call
102509 evts/sec 10035 ctxt/sec 7.17618 samples per call
100318 evts/sec 9731 ctxt/sec 7.47926 samples per call
102893 evts/sec 9458 ctxt/sec 7.78841 samples per call
Avg: 170188 evts/sec

AF_INET
# ./epoll_bench -n 2000 -i
2000 handles setup
69210 evts/sec 2.97224 samples per call
59436 evts/sec 12876 ctxt/sec 5.48675 samples per call
60722 evts/sec 12093 ctxt/sec 8.03185 samples per call
60583 evts/sec 14582 ctxt/sec 10.5644 samples per call
58192 evts/sec 12066 ctxt/sec 12.999 samples per call
54291 evts/sec 10613 ctxt/sec 15.2398 samples per call
47978 evts/sec 10942 ctxt/sec 17.2222 samples per call
59009 evts/sec 13692 ctxt/sec 19.6426 samples per call
58248 evts/sec 15099 ctxt/sec 22.0306 samples per call
58708 evts/sec 15118 ctxt/sec 24.4497 samples per call
58613 evts/sec 14608 ctxt/sec 26.816 samples per call
58490 evts/sec 13593 ctxt/sec 29.1708 samples per call
59108 evts/sec 15078 ctxt/sec 31.5557 samples per call
59636 evts/sec 15053 ctxt/sec 33.9292 samples per call
59355 evts/sec 15531 ctxt/sec 36.2914 samples per call
Avg: 58771 evts/sec

The last test suggests that epoll overhead is quite small, since bypassing it entirely improves AF_INET performance only modestly.

AF_INET + no epoll
# ./epoll_bench -n 2000 -i -f
2000 handles setup
79939 evts/sec
78468 evts/sec 9989 ctxt/sec
73153 evts/sec 10207 ctxt/sec
73668 evts/sec 10163 ctxt/sec
73667 evts/sec 20084 ctxt/sec
74106 evts/sec 10068 ctxt/sec
73442 evts/sec 10119 ctxt/sec
74220 evts/sec 10122 ctxt/sec
74367 evts/sec 10097 ctxt/sec
64402 evts/sec 47873 ctxt/sec
53555 evts/sec 58733 ctxt/sec
46000 evts/sec 48984 ctxt/sec
67052 evts/sec 21006 ctxt/sec
68460 evts/sec 12344 ctxt/sec
67629 evts/sec 10655 ctxt/sec
Avg: 69475 evts/sec
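
Back-of-the-envelope, from the averages (and ignoring that -f also switches the feeding pattern from random to cyclic): with epoll, 58771 evts/sec is about 17.0 us per event; without it, 69475 evts/sec is about 14.4 us per event, so the epoll path costs roughly 2.6 us per event here.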

Here are the oprofile results for the AF_INET (with epoll) test:

CPU: AMD64 processors, speed 1992.3 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 50000
samples  %        symbol name
1127210   9.1969  tcp_sendmsg
692516    5.6502  fget_light
598653    4.8844  lock_sock
575396    4.6946  __tcp_push_pending_frames
364699    2.9756  tcp_ack
364352    2.9727  tcp_v4_rcv
356383    2.9077  ipt_do_table
324388    2.6467  do_sync_write
257869    2.1039  wait_on_retry_sync_kiocb
255977    2.0885  inet_sk_rebuild_header
255171    2.0819  tcp_recvmsg
249554    2.0361  copy_user_generic_c
232551    1.8974  tcp_transmit_skb
215471    1.7580  release_sock
208563    1.7017  tcp_window_allows
194983    1.5909  kfree
186842    1.5244  system_call
180074    1.4692  kmem_cache_free
160799    1.3120  ep_poll_callback
159235    1.2992  update_send_head
134291    1.0957  sys_epoll_wait
133670    1.0906  ip_queue_xmit
132829    1.0837  ret_from_sys_call
129348    1.0553  __mod_timer
129258    1.0546  sys_write
117884    0.9618  tcp_rcv_established
115181    0.9398  tcp_poll
102805    0.8388  memcpy
99017     0.8079  skb_clone
91125     0.7435  vfs_write
87087     0.7105  __kfree_skb
75387     0.6151  tcp_mss_to_mtu
72483     0.5914  init_or_fini
72207     0.5891  do_sync_read
72054     0.5879  tcp_ioctl
70555     0.5757  local_bh_enable_ip
70001     0.5711  tg3_start_xmit_dma_bug
69914     0.5704  ip_local_deliver
69002     0.5630  tcp_v4_do_rcv
68681     0.5604  dev_queue_xmit
68411     0.5582  do_ip_getsockopt
68235     0.5567  skb_copy_datagram_iovec
66489     0.5425  local_bh_enable

Oprofile results for the pipe case (where epoll is not lost in the noise):

CPU: AMD64 processors, speed 1992.3 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 50000
samples  %        symbol name
1346203  12.2441  ep_poll_callback
1220770  11.1033  pipe_writev
1020377   9.2806  sys_epoll_wait
991913    9.0218  pipe_readv
779611    7.0908  fget_light
638929    5.8113  __wake_up
625332    5.6876  current_fs_time
486427    4.4242  __mark_inode_dirty
385763    3.5086  __write_lock_failed
217402    1.9773  system_call
175292    1.5943  sys_write
153698    1.3979  __wake_up_common
153242    1.3938  bad_pipe_w
143597    1.3061  generic_pipe_buf_map
140814    1.2807  pipe_poll
130028    1.1826  ret_from_sys_call
122930    1.1181  do_pipe
122359    1.1129  copy_user_generic_c
107443    0.9772  file_update_time
106037    0.9644  sysret_check
101256    0.9210  sys_read
99176     0.9020  iov_fault_in_pages_read
96823     0.8806  generic_pipe_buf_unmap
96675     0.8793  vfs_write
64635     0.5879  rw_verify_area
62997     0.5730  pipe_ioctl
60983     0.5547  tg3_start_xmit_dma_bug
59624     0.5423  get_task_comm
49573     0.4509  tg3_poll
46041     0.4188  schedule
44321     0.4031  vfs_read
35962     0.3271  eventpoll_release_file
30267     0.2753  tg3_write_flush_reg32
29395     0.2674  ipt_do_table
27683     0.2518  page_to_pfn
27492     0.2500  touch_atime
24921     0.2267  memcpy


Eric
/*
 * How to stress epoll
 *
 * This program uses many pipes|sockets and two threads.
 * First we open as many pipes|sockets as we can (see ulimit -n).
 * Then we create a worker thread.
 * The worker thread sends bytes to random streams.
 * The main thread uses epoll to collect ready events and clears them by reading the streams.
 * Each second, the number of collected events is printed on stderr.
 * After the test duration (15 seconds by default, see -t), the program prints an average value and stops.
 *
 * Usage : epoll_bench [-f] [-{u|i}] [-n X] [-t duration] [-l limit] [-e maxepoll]
 *   -f : No epoll loop, just feed streams in a cyclic manner
 *   -u : Use AF_UNIX sockets (instead of pipes)
 *   -i : Use AF_INET sockets
 *   -t : Test duration in seconds (default 15)
 *   -l : If epoll_wait() returns fewer than this many events, yield to the producer (default 1000)
 *   -e : Maximum number of events per epoll_wait() call (default 1024)
 */
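/* Build (assumed): gcc -O2 -o epoll_bench epoll_bench.c -lpthread */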
#define _GNU_SOURCE 1	/* needed on glibc for pthread_yield() */
#include <pthread.h>
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <signal.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>
#include <sys/ioctl.h>

int nbhandles = 1024;
int time_test = 15;
unsigned long nbhandled;
unsigned long epw_samples;
unsigned long epw_samples_cnt;

struct pipefd {
	int fd[2];
} *tab;

int epoll_fd;
int fflag;
int afunix;
int afinet;

static int alloc_streams(void)
{
	int i;
	int listen_sock;
	struct sockaddr_in me, to;
	socklen_t namelen;
	int on = 1;
	int off = 0;

	if (!fflag) {
		epoll_fd = epoll_create(nbhandles);
		if (epoll_fd == -1) {
			perror("epoll_create");
			return -1;
		}
	}
	tab = malloc(sizeof(struct pipefd) * nbhandles);
	if (tab == NULL) {
		perror("malloc");
		return -1;
	}
	if (afinet) {
		listen_sock = socket(AF_INET, SOCK_STREAM, 0);
		if (listen_sock == -1) {
			perror("socket");
			return -1;
		}
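		/*
		 * No explicit bind(): listen() auto-binds the socket to an
		 * ephemeral port, and getsockname() below retrieves the
		 * address we must connect to.
		 */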
		if (listen(listen_sock, 256) == -1) {
			perror("listen");
			return -1;
		}
		namelen = sizeof(me);
		getsockname(listen_sock, (struct sockaddr *)&me, &namelen);
	}
	for (i = 0 ; i < nbhandles ; i++) {
		if (afinet) {
			tab[i].fd[0] = socket(AF_INET, SOCK_STREAM, 0);
			if (tab[i].fd[0] == -1)
				break;
			to = me;
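			/* nonblocking connect: on loopback this normally returns
			 * EINPROGRESS; the kernel completes the handshake and we
			 * accept() the peer side just below */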
			ioctl(tab[i].fd[0], FIONBIO, &on);
			if (connect(tab[i].fd[0], (struct sockaddr *)&to, sizeof(to)) == -1 && errno != EINPROGRESS)
				break;
				break;
			tab[i].fd[1] = accept(listen_sock, (struct sockaddr *)&to, &namelen);
			if (tab[i].fd[1] == -1)
				break;
			ioctl(tab[i].fd[0], FIONBIO, &off);
		}
		else if (afunix) {
			if (socketpair(AF_UNIX, SOCK_STREAM, 0, tab[i].fd) == -1)
				break;
		} else {
			if (pipe(tab[i].fd) == -1)
				break;
		}
		if (!fflag) {
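			/* register the read side, edge-triggered; the stream index
			 * travels in data.u64 so the reader can find the fd back */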
			struct epoll_event ev;
			ev.events = EPOLLIN | EPOLLET;
			ev.data.u64 = (uint64_t)i;
			epoll_ctl(epoll_fd, EPOLL_CTL_ADD, tab[i].fd[0], &ev);
		}
	}
	nbhandles = i;
	printf("%d handles setup\n", nbhandles);
	return 0;
}

static int sample_proc_stat(unsigned long *ctxt)
{
	int fd = open("/proc/stat", O_RDONLY);
	char buffer[4096+1], *p;
	int lu;
	*ctxt = 0;
	if (fd == -1) {
		perror("/proc/stat");
		return -1;
	}
	lu = read(fd, buffer, sizeof(buffer));
	close(fd);
	if (lu < 10)
		return -1;
	buffer[lu] = 0;
	p = strstr(buffer, "ctxt");
	if (p)
		*ctxt = strtoul(p + 4, NULL, 10);
	return 0;
}


static void timer_func(int sig)
{
	char buffer[128];
	size_t len;
	static unsigned long old;
	static unsigned long oldctxt = 0;
	unsigned long ctxt;
	unsigned long delta = nbhandled - old;
	static int alarm_events = 0;

	old = nbhandled;
	len = sprintf(buffer, "%lu evts/sec", delta);
	sample_proc_stat(&ctxt);
	delta = ctxt - oldctxt;
	if (delta && oldctxt)
		len += sprintf(buffer + len, " %lu ctxt/sec", delta);
	oldctxt = ctxt;
	if (epw_samples)
		len += sprintf(buffer + len, " %g samples per call", (double)epw_samples_cnt/(double)epw_samples);
	buffer[len++] = '\n';
	write(2, buffer, len);
	if (++alarm_events >= time_test) {
		delta = nbhandled/alarm_events;
		len = sprintf(buffer, "Avg: %lu evts/sec\n", delta);
		write(2, buffer, len);
		exit(0);
	}
}

static void timer_setup()
{
	struct itimerval it;
	struct sigaction sg;

	memset(&sg, 0, sizeof(sg));
	sg.sa_handler = timer_func;
	sigaction(SIGALRM, &sg, 0);
	it.it_interval.tv_sec = 1;
	it.it_interval.tv_usec = 0;
	it.it_value.tv_sec = 1;
	it.it_value.tv_usec = 0;
	if (setitimer(ITIMER_REAL, &it, 0))
		perror("setitimer");
}

static void * worker_thread_func(void *arg)
{
	int fd = -1;
	char c = 1;
	int cnt = 0;
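	/* deprioritize the producer so it cannot starve the epoll consumer */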
	nice(10);
	for (;;) {
		if (fflag)
			fd = (fd + 1) % nbhandles;
		else
			fd = rand() % nbhandles;
		write(tab[fd].fd[1], &c, 1);
		if (++cnt >= nbhandles) {
			cnt = 0;
			pthread_yield();	/* relax :) */
		}
	}
}

void usage(int code)
{
	fprintf(stderr, "Usage : epoll_bench [-n num] [-{u|i}] [-f] [-t duration] [-l limit] [-e maxepoll]\n");
	exit(code);
}

int main(int argc, char *argv[])
{
	char buff[1024];
	pthread_t tid;
	int c, fd;
	int limit = 1000;
	int max_epoll = 1024;

	while ((c = getopt(argc, argv, "fuin:l:e:t:")) != -1) {
		if (c == 'n') nbhandles = atoi(optarg);
		else if (c == 'f') fflag++;
		else if (c == 'l') limit = atoi(optarg);
		else if (c == 'e') max_epoll = atoi(optarg);
		else if (c == 't') time_test = atoi(optarg);
		else if (c == 'u') afunix++;
		else if (c == 'i') afinet++;
		else usage(1);
	}
	if (alloc_streams() == -1)
		exit(1);
	pthread_create(&tid, NULL, worker_thread_func, NULL);
	timer_setup();

	if (fflag) {
		for (fd = 0; ; fd = (fd + 1) % nbhandles) {
			if (read(tab[fd].fd[0], buff, 1024) > 0)
				nbhandled++;
		}
	} else {
		struct epoll_event *events;

		events = malloc(sizeof(struct epoll_event) * max_epoll);
		if (events == NULL) {
			perror("malloc");
			exit(1);
		}
		for (;;) {
			int nb = epoll_wait(epoll_fd, events, max_epoll, -1);
			int i;
			epw_samples++;
			epw_samples_cnt += nb;
			for (i = 0 ; i < nb ; i++) {
				fd = tab[events[i].data.u64].fd[0];
				if (read(fd, buff, 1024) > 0)
					nbhandled++;
			}
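			/* small batch: yield so the (nice'd) producer can queue
			 * more events before we call epoll_wait() again */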
			if (nb < limit)
				pthread_yield();
		}
	}
}
