Re: [RFC] fsblock


Nick Piggin wrote:
> - No deadlocks (hopefully). The buffer layer is technically deadlocky by
>   design, because it can require memory allocations at page writeout-time.
>   It also has one path that cannot tolerate memory allocation failures.
>   No such problems for fsblock, which keeps fsblock metadata around for as
>   long as a page is dirty (this still has problems vs get_user_pages, but
>   that's going to require an audit of all get_user_pages sites. Phew).
>
> - In line with the above item, filesystem block allocation is performed
>   before a page is dirtied. In the buffer layer, mmap writes can dirty a
>   page with no backing blocks which is a problem if the filesystem is
>   ENOSPC (patches exist for buffer.c for this).

This raises an eyebrow... How ENOSPC is handled before an mmap write is more of an ABI behavior, so I don't see how this can be fixed with internal changes alone, without also changing behavior currently exported to userland (and thus affecting code that relies on that behavior).
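
To make the userland side of this concrete, here is a minimal sketch (not taken from the fsblock patches) of the sequence an application goes through when it writes into a hole via mmap. The function names and the temporary path are made up for illustration. On a filesystem with free space it simply succeeds. On a full filesystem, where the failure becomes visible is the behavior under discussion: it may only show up as an error from the flush or fsync, or, on a kernel that allocates blocks at fault time, as a SIGBUS on the store itself. Which of these happens depends on the kernel and the filesystem.

```python
import mmap
import os
import tempfile


def write_through_mmap(path, size=4096, payload=b"hello"):
    """Store into a freshly extended file through a shared mapping.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Extend the file without writing data. On filesystems that
        # support sparse files this leaves a hole with no blocks allocated.
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mm:
            # The store dirties the page. If blocks are allocated at
            # fault time, a full filesystem would fail here.
            mm[0:len(payload)] = payload
            # msync(): if allocation is deferred to writeback, a full
            # filesystem would report the error here or at fsync.
            mm.flush()
        os.fsync(fd)
        return os.pread(fd, len(payload), 0)
    finally:
        os.close(fd)


def demo():
    # Runs the sequence in a throwaway directory and returns what was read back.
    with tempfile.TemporaryDirectory() as d:
        return write_through_mmap(os.path.join(d, "sparse.bin"))
```

An application that only checks the return value of msync or fsync, and one that installs a SIGBUS handler, are relying on different halves of this behavior, which is why moving the allocation is visible outside the kernel.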


> - An inode's metadata must be tracked per-inode in order for fsync to
>   work correctly. buffer contains helpers to do this for basic
>   filesystems, but any block can be only the metadata for a single inode.
>   This is not really correct for things like inode descriptor blocks.
>   fsblock can track multiple inodes per block. (This is non trivial,
>   and it may be overkill so it could be reverted to a simpler scheme
>   like buffer).

Hrm; no specific comment, but this seems like an area that needs to be fleshed out more by converting some of the more advanced filesystems.
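
As a way of picturing the bookkeeping being described, here is a toy model in which one metadata block may belong to several inodes. It is only a sketch under my own assumptions: the class and method names are invented, and it says nothing about how fsblock actually stores these associations.

```python
class MetadataTracker:
    """Toy many-to-many map between inodes and metadata blocks.

    Hypothetical illustration, not fsblock's data structure.
    """

    def __init__(self):
        self.blocks_of = {}   # inode number -> set of block numbers
        self.dirty = set()    # block numbers waiting for writeout

    def associate(self, inode, block):
        # A block may be associated with any number of inodes.
        self.blocks_of.setdefault(inode, set()).add(block)

    def mark_dirty(self, block):
        self.dirty.add(block)

    def fsync(self, inode):
        # Write out every dirty metadata block this inode depends on.
        to_write = sorted(self.blocks_of.get(inode, set()) & self.dirty)
        self.dirty.difference_update(to_write)
        return to_write


def demo():
    t = MetadataTracker()
    t.associate(1, 7)    # block 7 holds descriptors for inodes 1 and 2
    t.associate(2, 7)
    t.associate(1, 20)   # block 20 is private to inode 1
    t.mark_dirty(7)
    t.mark_dirty(20)
    first = t.fsync(2)   # flushes the shared block only
    second = t.fsync(1)  # block 7 is already clean, so only 20 remains
    return first, second
```

With a scheme where a block can be the metadata of only one inode, block 7 in this example would have to be attributed to either inode 1 or inode 2, and an fsync of the other inode would miss it.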


> - Large block support. I can mount and run an 8K block size minix3 fs on
>   my 4K page system and it didn't require anything special in the fs. We
>   can go up to about 32MB blocks now, and gigabyte+ blocks would only
>   require one more bit in the fsblock flags. fsblock_superpage blocks
>   are > PAGE_CACHE_SIZE, midpage ==, and subpage <.

Definitely useful, especially if I rewrite my ibu filesystem for 2.6.x, like I've been planning.
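
The three size classes named above can be written down as a small sketch. The function names are invented for illustration, and the 4096-byte default stands in for PAGE_CACHE_SIZE on the 4K page system mentioned; nothing here reflects how the fsblock flags encode the size.

```python
def classify_block(block_size, page_size=4096):
    """Return the size class of a block relative to the page size."""
    if block_size <= 0 or page_size <= 0:
        raise ValueError("sizes must be positive")
    if block_size > page_size:
        return "superpage"
    if block_size == page_size:
        return "midpage"
    return "subpage"


def pages_spanned(block_size, page_size=4096):
    """Number of pages one block touches (ceiling division).

    A subpage block shares a single page with other blocks, so it spans 1.
    """
    if block_size <= 0 or page_size <= 0:
        raise ValueError("sizes must be positive")
    return -(-block_size // page_size)
```

For the 8K minix3 case on 4K pages, classify_block(8192) gives "superpage" and pages_spanned(8192) gives 2.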


> So. Comments? Is this something we want? If yes, then how would we
> transition from buffer.c to fsblock.c?

Your work is definitely interesting, but I think it will be even more interesting once ext2 (with directories in the pagecache) and ext3 (journalling) are converted.

My gut feeling is that there are several problem areas in the new code that you haven't hit yet.

Also, once things are converted, the question of transitioning from buffer.c will undoubtedly answer itself. That's the way several of us handle transitions: finish all the work, then look with fresh eyes and conceive a path from the current code to your enhanced code.

	Jeff

