Re: [RFC] kernel facilities for cache prefetching


On Wed, 2006-05-03 at 12:11 +0800, Wu Fengguang wrote:
> On Tue, May 02, 2006 at 08:55:06AM -0700, Linus Torvalds wrote:
> > Doing prefetching on a physical block basis is simply not a valid 
> > approach, for several reasons:
> 
> Sorry!
> I made a misleading introduction. I'll try to explain it in more detail.
> 
> DATA ACQUISITION
> 
> /proc/filecache provides an interface to query the cached pages of any
> file. This information is expressed in tuples of <idx, len>, which
> more specifically means <mapping-offset, pages>.
> 
> Normally one should use 'echo' to set up two parameters before doing
> 'cat':
>         @file
>                 the filename;
>                 use 'ALL' to get a list of all cached files
>         @mask
>                 only show the pages with non-zero (page-flags & @mask);
>                 for simplicity, use '0' to show all present pages (take 0 as ~0)
> 
> Normally, one should first get the file list using param 'file ALL',
> and then iterate through all the files and pages of interest with
> params 'file filename' and 'mask pagemask'.
> 
> The param 'mask' acts as a filter for different users: it allows
> sysadmins to see where their memory goes, and the prefetcher to ignore
> pages from false readahead.
> 
> One can use 'mask hex(PG_active|PG_referenced|PG_mapped)' in its hex form
> to show only accessed pages (here PG_mapped is a fake flag), and use
> 'mask hex(PG_dirty)' to show only dirtied pages.
> 
> One can use 
>         $ echo "file /sbin/init" > /proc/filecache
>         $ echo "mask 0" > /proc/filecache
>         $ cat /proc/filecache
> to get an idea which pages of /sbin/init are currently cached.
> 
> In the proposal, I used the following example, which proved to be
> rather misleading:
>         $ echo "file /dev/hda1" > /proc/filecache
>         $ cat /proc/filecache
> The intention of that example was to show that filesystem dir/inode
> buffer status -- which is the key data for user-land pre-caching --
> can also be retrieved through this interface.
> 
> So the proposed solution is to
>         - prefetch normal files on the virtual mapping level
>         - prefetch fs dir/inode buffers on a physical block basis
> 
> I/O SUBMISSION
> How can we avoid unnecessary seeks when prefetching on the virtual
> mapping level?  The answer is to leave this job to the i/o elevators.
> What we should do is present the elevators with most of the readahead
> requests before too many requests are submitted to the disk drivers.
> The proposed scheme is to:
>         1) (first things first)
>            issue all readahead requests for filesystem buffers
>         2) (in background, often blocked)
>            issue all readahead requests for normal files
>         -) make sure the above requests are of really _low_ priority
>         3) regular system boot continues
>         4) promote the priority of any request that is now demanded by
>            legacy programs
> 
> In the scheme, most work is done by user-land tools. The required
> kernel support is minimal and general-purpose:
>         - a /proc/filecache interface
>         - the ability to promote I/O priority on demanded pages
> 
> With this approach, we avoid the complicated OSX bootcache solution,
> which is a physical-blocks-based, special-handlings-in-kernel solution
> that is exactly what Linus is against.

Wu,

A while ago, I hacked up a similar /proc interface:
	echo "<filesystem-name>" > /proc/pagecache-usage

It showed the pagecache usage of every file in that filesystem
(filename, #num pages). My main objective was to shoot down the
pagecache for all the files in a given filesystem. I ended up using it
to do posix_fadvise(POSIX_FADV_DONTNEED) on those files. (Initially, I
tried to do this without the interface, by doing fadvise() on all files
in the filesystem - but that ended up bloating the inode cache and
dcache.)
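
For reference, the per-file step looks roughly like the sketch below.
This is only an illustration, not the code I actually ran:
drop_file_cache() is a made-up helper name, and the fdatasync() call is
an assumption added because DONTNEED may leave dirty pages in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Ask the kernel to drop the cached pages of one file.
 * Returns 0 on success, -1 on failure.
 * drop_file_cache() is a hypothetical helper name, not an existing API.
 */
int drop_file_cache(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror(path);
		return -1;
	}

	/* DONTNEED may leave dirty pages alone, so flush first (best effort). */
	(void)fdatasync(fd);

	/* offset 0, len 0 means "from the start to the end of the file". */
	int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	if (err != 0)
		fprintf(stderr, "%s: posix_fadvise: %s\n", path, strerror(err));

	close(fd);
	return err == 0 ? 0 : -1;
}
```

Note that this has to open every file it is called on, so running it
over a whole filesystem pulls every inode and dentry into memory. With
a per-file listing from /proc, it only needs to be called on files that
actually have pages cached.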

Yeah, having this kind of information would be useful. But I am not sure
how much of this can benefit regular workloads - unless one is willing
to tweak things heavily. Bottom line is, you need a better strategy
for how you would use this information.

Thanks,
Badari

