[rfc: patch 0/6] scalable fd management

This is RFC-only at the moment, but if things look good,
I would like to see this patchset get some testing in -mm.

This patchset probably doesn't set the record for longest
gestation period, but I am still glad that we can use it
to solve a problem that a lot of people care about *now*.
Maneesh and I developed this patch back in 2001, using
somewhat dodgy locking and copious memory barriers to
demonstrate that file descriptor look-up (then fget())
could be made lock-free using RCU. The main advantage was
for threads, but there were other problems confronting
threads users then, so I decided not to push for it. Since
threads performance is now important to a lot of people, it
is time to revisit the issue. Whether we like Java or not,
it is a reality, and so are threaded apps. Andrew Tridgell
has a test (http://samba.org/ftp/unpacked/junkcode/thread_perf.c)
which shows that on a 4-CPU P4 box, a "readwrite" syscall
test ran twice as fast using processes as using threads.

An earlier version of this patchset was published and
discussed a few months ago :
(http://marc.theaimsgroup.com/?t=109144217400003&r=1&w=2)
The consensus there was that it makes sense to make
fget()/fget_light() lock-free in order to avoid the cache line
bouncing on ->files_lock, particularly when the fd table is
shared. The problem with that version of the patchset was that
it piggybacked on kref and complicated a simple kref API meant
for use under strict safety rules. It required invasive changes
wherever f_count was being used. It also used an RCU model from
2001 with explicit memory barriers, which we no longer need.

Recently, I rewrote the patchset with the following ideas :

1. Instead of using explicit memory barriers to make the fd table
   array and fdset updates appear atomic to lock-free readers,
   I split the fd table in files_struct and put it in a separate
   structure (struct fdtable). Whenever fd array/set expansion
   happens, I allocate a new fdtable, copy the contents and
   atomically update the pointer. This allows me to use
   the recent rcu_assign_pointer() and rcu_dereference() macros.
   However, this required significant changes to the file
   management code in the VFS. With this new locking model, all
   the known issues of the past have been taken care of. (A
   sketch of this scheme follows the list.)

2. Greg and I agreed not to loosen the kref APIs. Instead I wrote
   a set of rcuref APIs that work on regular atomic_t counters.
   This is *not* a separate refcounting API set; it is meant for
   use with existing refcounters that need to work with RCU. With
   this, f_count users, however wrong they are, are spared. There
   is a separate patchset to clean some of them up, but that does
   not affect this patchset. (A sketch of the lock-free get also
   follows the list.)

3. I added documentation for both the rcuref APIs and for the new
   locking model I used for the file descriptor table and file
   reference counting.
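
To make (1) concrete, here is a minimal sketch of the scheme,
written in current kernel idiom rather than the 2.6.12-era APIs.
This is an illustration, not the patch itself; names like
fd_to_file() and install_fdtable() are mine :

#include <linux/atomic.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/workqueue.h>

struct fdtable {
	unsigned int max_fds;
	struct file **fd;		/* the fd array itself */
	int fd_is_vmalloced;		/* large arrays come from vmalloc() */
	struct rcu_head rcu;		/* deferred freeing of replaced tables */
	struct work_struct free_work;	/* hand-off to keventd, see Testing */
};

struct files_struct {
	atomic_t count;
	spinlock_t file_lock;		/* still serializes all updaters */
	struct fdtable *fdt;		/* RCU-protected */
};

static void free_fdtable_rcu(struct rcu_head *rcu);	/* sketched under Testing */

/*
 * Lock-free look-up: the core of an RCU-based fget().  A real
 * fget() must also take a reference on the file before dropping
 * rcu_read_lock() -- see the rcuref sketch below.
 */
struct file *fd_to_file(struct files_struct *files, unsigned int fd)
{
	struct fdtable *fdt;
	struct file *file = NULL;

	rcu_read_lock();
	fdt = rcu_dereference(files->fdt);
	if (fd < fdt->max_fds)
		file = rcu_dereference(fdt->fd[fd]);
	rcu_read_unlock();
	return file;
}

/*
 * Expansion: copy the old table into a preallocated bigger one and
 * publish it with a single pointer update.  Lock-free readers see
 * either the old table or the new one, never a half-updated mix.
 */
void install_fdtable(struct files_struct *files, struct fdtable *new)
{
	struct fdtable *old;

	spin_lock(&files->file_lock);
	old = files->fdt;
	memcpy(new->fd, old->fd, old->max_fds * sizeof(struct file *));
	rcu_assign_pointer(files->fdt, new);
	spin_unlock(&files->file_lock);
	call_rcu(&old->rcu, free_fdtable_rcu);	/* free after a grace period */
}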

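Similarly, here is a sketch of the lock-free "get" that the
rcuref APIs in (2) provide on an ordinary atomic_t. The function
name is mine, and I have used atomic_cmpxchg() for brevity; an
arch without cmpxchg has to do something else entirely, which is
why I ask for sparc32 testing below :

/*
 * Succeeds only if the count is still nonzero; a zero count means
 * the object is already being torn down and only an RCU grace
 * period stands between it and the free.  Callers must hold
 * rcu_read_lock() so the counter's memory cannot disappear while
 * we retry.
 */
static inline int rcuref_get_rcu(atomic_t *count)
{
	int old;

	do {
		old = atomic_read(count);
		if (old == 0)
			return 0;	/* lost the race: being freed */
	} while (atomic_cmpxchg(count, old, old + 1) != old);

	return 1;
}

In the fd_to_file() sketch above, a failed get on f_count would
simply make the look-up return NULL, as if the descriptor had
already been closed.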

Testing :
-------
I have been beating up this patchset with multiple instances of
LTP and a special test I wrote to exercise the vmalloced fdtable
path that uses keventd for freeing (sketched below). It has
survived 24+ hour tests, as well as a 72-hour run with the chat
benchmark and fd_vmalloc tests. All this was on a 4(8)-way P4
xeon system.

No slab leak or vmalloc leak.
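
For the curious, keventd is involved because call_rcu() callbacks
run in softirq context, where vfree() must not be called. A
sketch of the free path, again in my own words and against the
current workqueue API rather than the 2.6.12-era one :

#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/workqueue.h>

static void fdtable_free_work(struct work_struct *work)
{
	struct fdtable *fdt = container_of(work, struct fdtable, free_work);

	vfree(fdt->fd);		/* safe: keventd runs in process context */
	kfree(fdt);
}

static void free_fdtable_rcu(struct rcu_head *rcu)
{
	struct fdtable *fdt = container_of(rcu, struct fdtable, rcu);

	if (fdt->fd_is_vmalloced) {
		/* cannot vfree() in softirq context; punt to keventd */
		INIT_WORK(&fdt->free_work, fdtable_free_work);
		schedule_work(&fdt->free_work);
	} else {
		kfree(fdt->fd);
		kfree(fdt);
	}
}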

I would appreciate it if someone could test this on an arch
without cmpxchg (sparc32??). I intend to run some more tests
with preemption enabled, and also on ppc64, myself.

Performance results :
-------------------

tiobench on a 4(8)-way (HT) P4 system on ramdisk :

                                         (lockfree)
Test          2.6.10-vanilla   Stdev     2.6.10-fd    Stdev
------------------------------------------------------------
Seqread           1400.8       11.52      1465.4      34.27
Randread          1594          8.86      2397.2      29.21
Seqwrite           242.72       3.47       238.46      6.53
Randwrite          445.74       9.15       446.4       9.75

With Tridge's thread_perf test on a 4(8)-way (HT) P4 xeon system :

2.6.12-rc5-vanilla :

Running test 'readwrite' with 8 tasks
Threads     0.34 +/- 0.01 seconds
Processes   0.16 +/- 0.00 seconds

2.6.12-rc5-fd :

Running test 'readwrite' with 8 tasks
Threads     0.17 +/- 0.02 seconds
Processes   0.17 +/- 0.02 seconds

So, the lock-free file table patchset gets rid of the overhead
of doing I/O with threads, bringing them to parity with
processes on this test.

Thanks
Dipankar
