Re: [PATCH] Add "-o bh" option to ext3

On Tue, 2006-03-14 at 10:47 +0100, Pavel Machek wrote:
> Hi!
> 
> > > > It's not really needed for now, but as we try to make "nobh"
> > > > the default option, it would be nice to have a "-o bh" fallback
> > > > option - if things go wrong.
> > > 
> > > Docs patch is missing...
> > > 
> > > ...and no, it is not even clear to me what bh vs. nobh does...
> > 
> > Hope this helps.
> 
> Not really, I still am not sure what it does. Is it like "nobh is more
> effective code, and should have exactly zero impact to the user, but
> as it is new, we make it optional"?

I wish it were that easy to say :)

Historically (2.4 and earlier), buffer_head was the primary structure for
doing IO. We also used it as the interface between the VFS, helper
functions, and filesystem-specific code to pass physical disk block#
information. We also used buffer_heads to link buffers/pages/data to JBD
transactions to provide ordering guarantees (for the various journal modes).

Now (2.6), we no longer use buffer_head as the primary IO descriptor,
but we still use it for the other purposes. In general, buffer_heads are
evil: they eat up low memory, there are lots of them floating around,
and they mean a longer code path, a bigger memory footprint, TLB/SLB
misses, fragmentation, etc.

The "nobh" option tries to avoid attaching buffer_heads to pages to cache
disk block mapping information. Wherever a mapping is needed, it passes a
temporary (on-stack) buffer_head to the lower-level filesystem-specific
code and uses the disk block# mapping info from it to create bios.
(BTW, since buffer_heads are also used for transaction ordering, we can't
support the "nobh" option for all journaling modes.)
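
To make the on-stack idea concrete, here is a rough user-space sketch, not
kernel code. Everything in it is hypothetical and for illustration only:
struct toy_bh stands in for struct buffer_head, toy_get_block() stands in
for a filesystem's ->get_block(), and the "file starts at disk block 1000"
layout is made up. It only shows a temporary descriptor being filled in,
read, and thrown away for each block of a page.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical stand-in for struct buffer_head: just enough fields to
 * carry one logical-to-physical block mapping. */
struct toy_bh {
    unsigned long blocknr;  /* physical disk block number */
    int mapped;             /* nonzero once the callback filled blocknr */
};

/* Hypothetical get_block-style callback type. */
typedef int (*toy_get_block_t)(unsigned long logical, struct toy_bh *bh);

/* Made-up layout: the file's blocks are contiguous from disk block 1000. */
static int toy_get_block(unsigned long logical, struct toy_bh *bh)
{
    bh->blocknr = 1000 + logical;
    bh->mapped = 1;
    return 0;
}

/* Look up the disk block of every filesystem block in one page.
 * The descriptor lives on the stack and is discarded after each lookup,
 * so nothing stays attached to the page.
 * Returns the number of blocks mapped, or -1 on failure. */
static int toy_map_page(unsigned long page_index, unsigned int block_size,
                        toy_get_block_t get_block, unsigned long *out)
{
    assert(block_size > 0 && block_size <= TOY_PAGE_SIZE);
    unsigned int per_page = TOY_PAGE_SIZE / block_size;
    unsigned long first = page_index * per_page;

    for (unsigned int i = 0; i < per_page; i++) {
        struct toy_bh bh = { 0, 0 };  /* temporary, on stack */

        if (get_block(first + i, &bh) != 0 || !bh.mapped)
            return -1;
        out[i] = bh.blocknr;          /* would feed bio construction */
    }
    return (int)per_page;
}
```

With a 1k block size, toy_map_page(2, 1024, toy_get_block, out) makes four
callback invocations and fills out[] with blocks 1008 through 1011.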

Now, "zero impact to the user"? I don't know for sure. Since buffer_heads
nicely cache the disk block mapping information, we save on calls to the
filesystem's ->get_block() when we need it. With the "nobh" option, we
need to make that call every time. Especially on filesystems with
blocksize < pagesize (1k, 2k), we may need multiple calls to
->get_block() to get all the disk block#s for a single page (4k).
These calls could, *in theory*, end up doing a disk read. Are all the
benefits of not having buffer_heads worth this overhead? I don't know
for sure; that's why this is an "option" for now :(
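
As a back-of-the-envelope illustration of that trade-off, the arithmetic
can be written down as below. This is a deliberately simplified model with
hypothetical function names: it assumes a cached mapping is looked up
exactly once and then reused, and that an uncached one is looked up on
every pass over the page. Real behaviour depends on much more than this.

```c
#include <assert.h>

/* Lookups needed to map one page once: one per filesystem block. */
static unsigned int calls_per_page(unsigned int page_size,
                                   unsigned int block_size)
{
    assert(block_size > 0 && block_size <= page_size);
    return page_size / block_size;
}

/* Total lookups for `passes` accesses to the same page.
 * cached != 0: mapping kept around (attached buffer_heads), looked up once.
 * cached == 0: mapping rebuilt on every pass (the "nobh" case). */
static unsigned long total_lookups(unsigned int page_size,
                                   unsigned int block_size,
                                   unsigned long passes, int cached)
{
    unsigned long per = calls_per_page(page_size, block_size);

    if (passes == 0)
        return 0;
    return cached ? per : per * passes;
}
```

Under these assumptions, ten passes over a 4k page with 1k blocks cost 4
lookups with a cached mapping and 40 without; with 4k blocks the uncached
cost drops to 10.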

Clear as mud ? :)

Thanks,
Badari
