Re: New filesystem for Linux

On Sat, Nov 04, 2006 at 07:27:48PM +0100, Eric Dumazet wrote:
> Gautham R Shenoy wrote:
> >On Thu, Nov 02, 2006 at 10:52:47PM +0100, Mikulas Patocka wrote:
> >>Hi
> >
> >Hi Mikulas
> >>As my PhD thesis, I am designing and writing a filesystem, and it is now 
> >>in a state where it can be released. You can download it from 
> >>http://artax.karlin.mff.cuni.cz/~mikulas/spadfs/
> >>
> >>It has some new features, such as keeping inode information directly in 
> >>the directory (until you create a hardlink) so that ls -la doesn't seek 
> >>much, a new method to keep data consistent in case of crashes (instead of 
> >>journaling), and free space organized in lists of free runs, which is 
> >>converted to a bitmap only in case of extreme fragmentation.
> >>
> >>It is not very widely tested, so if you want, test it.
> >>
> >>I have these questions:
> >>
> >>* There is a rw semaphore that is locked for read for nearly all 
> >>operations and locked for write only rarely. However, locking for read 
> >>causes cache line ping-pong on SMP systems. Do you have an idea how to 
> >>make it better?
> >>
> >>It could be improved by having a semaphore for each CPU, locking only 
> >>the local CPU's semaphore for read and all of the semaphores for write. 
> >>Or is there a better method?
> >
> >I am currently experimenting with a light-weight reader-writer semaphore 
> >with the objective of doing away with what you call reader-side cache 
> >line "ping pong". It achieves this by using a per-cpu refcount.
> >
> >A drawback of this approach, as Eric Dumazet mentioned elsewhere in this
> >thread, would be that each instance of the rw_semaphore would require
> >(NR_CPUS * sizeof(int)) bytes worth of memory in order to keep track of
> >the per-cpu refcount, which can prove to be pretty costly if this
> >rw_semaphore is for something like inode->i_alloc_sem.
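Here is a rough estimate of that cost. The figures are hypothetical: NR_CPUS = 1024, a 4-byte int, and one million in-core inodes, each embedding such a semaphore.

```latex
\begin{aligned}
\text{bytes per semaphore} &= \mathtt{NR\_CPUS} \times \mathtt{sizeof(int)} = 1024 \times 4 = 4096 \\
\text{bytes for } 10^6 \text{ inodes} &= 10^6 \times 4096 = 4.096 \times 10^{9} \approx 3.8\ \text{GiB}
\end{aligned}
```

Padding each counter to its own cache line would multiply these figures by 16, assuming 64-byte lines. Without that padding, counters of different CPUs share a line and still ping-pong.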
> 
> We might use a hybrid approach: use a per-cpu counter if NR_CPUS <= 8
> 
> #define refcount_addr(zone, cpu) (zone)[(cpu)]
> 
> For larger setups, have a fixed limit of 8 counters, and use a modulo
> 
> #define refcount_addr(zone, cpu) (zone)[(cpu) & 7]
> 
> In order not to use too much memory, we could use some kind of vmalloc() 
> space, with one PAGE per cpu, so that addr(cpu) = base + cpu * PAGE_SIZE
> (vmalloc space allows NUMA-aware allocation where possible).
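The hybrid scheme above can be sketched in user space as follows. This is an illustrative sketch, not code from the thread. SKETCH_PAGE_SIZE, NR_SLOTS, struct counter_area and the helper names are hypothetical. aligned_alloc() stands in for vmalloc(), and refcount_addr() is written as a function rather than a macro. Each object stores only the address of its slot-0 counter, and the counter for a given CPU sits a whole number of pages further on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed values, for illustration only. */
#define SKETCH_PAGE_SIZE 4096
#define NR_SLOTS 8 /* the fixed limit of 8 counters */

/* NR_SLOTS contiguous pages; page i holds the slot-i counter of every object. */
struct counter_area {
	unsigned char *base; /* start of page 0 */
	size_t next_off;     /* next free offset within each page */
};

/* With 8 slots, cpu & 7 is the identity for CPUs 0..7 (the NR_CPUS <= 8 case)
 * and folds larger CPU numbers onto the same 8 slots. Each slot lives one page
 * after the previous one, so only the slot-0 address needs to be stored. */
static atomic_int *refcount_addr(atomic_int *base, int cpu)
{
	size_t slot = (size_t)(cpu & (NR_SLOTS - 1));

	return (atomic_int *)((unsigned char *)base + slot * SKETCH_PAGE_SIZE);
}

static int area_init(struct counter_area *a)
{
	a->base = aligned_alloc(SKETCH_PAGE_SIZE, (size_t)NR_SLOTS * SKETCH_PAGE_SIZE);
	a->next_off = 0;
	return a->base ? 0 : -1;
}

/* Give an object its counters; the object keeps only the returned pointer. */
static atomic_int *area_alloc(struct counter_area *a)
{
	atomic_int *base;

	if (a->next_off + sizeof(atomic_int) > SKETCH_PAGE_SIZE)
		return NULL; /* page full; a real allocator would grow the area */
	base = (atomic_int *)(a->base + a->next_off);
	a->next_off += sizeof(atomic_int);
	for (int cpu = 0; cpu < NR_SLOTS; cpu++)
		atomic_init(refcount_addr(base, cpu), 0);
	return base;
}

/* Slots are shared once there are more than 8 CPUs, hence atomic updates. */
static void read_enter(atomic_int *base, int cpu)
{
	atomic_fetch_add(refcount_addr(base, cpu), 1);
}

static void read_exit(atomic_int *base, int cpu)
{
	atomic_fetch_sub(refcount_addr(base, cpu), 1);
}

/* The sum a writer would have to wait on before proceeding. */
static int total_readers(atomic_int *base)
{
	int sum = 0;

	for (int cpu = 0; cpu < NR_SLOTS; cpu++)
		sum += atomic_load(refcount_addr(base, cpu));
	return sum;
}
```

For example, CPUs 3 and 11 resolve to the same counter, three pages past the object's stored pointer. That sharing is why the updates must be atomic.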

The fact that the counters are shared (with the modulo, CPUs n and n+8
land on the same counter) forces the use of atomic instructions.

If the situation is highly read-intensive, another memory-saving
approach would be to share the "lock" among multiple inodes, for
example, hashing the inode address.  That way there would be NR_CPUS
counters per hash bucket, but (hopefully) far fewer hash buckets
than inodes.
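The hash-bucket suggestion above can be sketched the same way. This is hypothetical user-space C, not an existing kernel API. The bucket count, slot count, 64-byte padding, the multiplicative (Fibonacci-style) pointer hash and the function names are all assumptions, and only the reader side is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_SLOTS 4                     /* hypothetical stand-in for NR_CPUS */
#define BUCKET_BITS 6                  /* assumed: 64 buckets */
#define NR_BUCKETS (1u << BUCKET_BITS)

/* One reader counter per (bucket, CPU), padded to an assumed 64-byte line. */
struct bucket_slot {
	_Alignas(64) atomic_long readers;
};

/* Fixed cost shared by all inodes: NR_BUCKETS * NR_SLOTS counters. */
static struct bucket_slot lock_table[NR_BUCKETS][NR_SLOTS];

/* Multiplicative hash of the inode address; the choice of hash is an
 * assumption, and any reasonable pointer hash would do. */
static unsigned bucket_of(const void *inode)
{
	uint64_t h = (uint64_t)(uintptr_t)inode * UINT64_C(0x9E3779B97F4A7C15);

	return (unsigned)(h >> (64 - BUCKET_BITS));
}

/* Atomics are used because user-space threads are not pinned to a CPU and
 * because cpu % NR_SLOTS may fold several CPUs onto one slot. */
static void shared_read_lock(const void *inode, int cpu)
{
	atomic_fetch_add(&lock_table[bucket_of(inode)][cpu % NR_SLOTS].readers, 1);
}

static void shared_read_unlock(const void *inode, int cpu)
{
	atomic_fetch_sub(&lock_table[bucket_of(inode)][cpu % NR_SLOTS].readers, 1);
}

/* A writer (not shown) would wait for this sum to reach zero, and would
 * thereby also hold off readers of every other inode in the same bucket. */
static long bucket_readers(const void *inode)
{
	unsigned b = bucket_of(inode);
	long sum = 0;

	for (int cpu = 0; cpu < NR_SLOTS; cpu++)
		sum += atomic_load(&lock_table[b][cpu].readers);
	return sum;
}
```

With these assumed values, memory use is fixed at 64 * 4 cache lines (16 KiB) no matter how many inodes exist. The price is that a writer on one inode also stalls readers of unrelated inodes that hash to the same bucket.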

						Thanx, Paul

> So instead of storing a table of 8 pointers in each object, we store 
> only the address of the cpu0 counter.
> 
> 
> >
> >So the question I am interested in is, how many *live* instances of this
> >rw_semaphore are you expecting to have at any given time?
> >If this number is a constant (and/or not very big!), the light-weight
> >reader writer semaphore might be useful.
> >
> >Regards
> >Gautham.
> 

References:
- New filesystem for Linux
  - From: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
- Re: New filesystem for Linux
  - From: Gautham R Shenoy <ego@in.ibm.com>
- Re: New filesystem for Linux
  - From: Eric Dumazet <dada1@cosmosbay.com>