On Wed, Jan 10, 2007 at 02:20:49PM +1100, Nigel Cunningham wrote:
> Hi.
>
> On Wed, 2007-01-10 at 09:57 +0800, Fengguang Wu wrote:
> > On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
> > >
> > >
> > > On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > > > >
> > > > > The fastest and probably most important thing to add is some readahead
> > > > > smarts to directories --- both to the htree and non-htree cases. If
> > > >
> > > > Here's is a quick hack to practice the directory readahead idea.
> > > > Comments are welcome, it's a freshman's work :)
> > >
> > > Well, I'd probably have done it differently, but more important is whether
> > > this actually makes a difference performance-wise. Have you benchmarked it
> > > at all?
> >
> > Yes, a trivial test shows a marginal improvement, on a minimal debian system:
> >
> > # find / | wc -l
> > 13641
> >
> > # time find / > /dev/null
> >
> > real 0m10.000s
> > user 0m0.210s
> > sys 0m4.370s
> >
> > # time find / > /dev/null
> >
> > real 0m9.890s
> > user 0m0.160s
> > sys 0m3.270s
> >
> > > Doing an
> > >
> > > echo 3 > /proc/sys/vm/drop_caches
> > >
> > > is your friend for testing things like this, to force cold-cache
> > > behaviour..
> >
> > Thanks, I'll work out numbers on large/concurrent dir accesses soon.
>
> I gave it a try, and I'm afraid the results weren't pretty.
>
> I did:
>
> time find /usr/src | wc -l
>
> on current git with (3 times) and without (5 times) the patch, and got
>
> with:
> real 54.306, 54.327, 53.742s
> usr 0.324, 0.284, 0.234s
> sys 2.432, 2.484, 2.592s
>
> without:
> real 24.413, 24.616, 24.080s
> usr 0.208, 0.316, 0.312s
> sys: 2.496, 2.440, 2.540s
>
> Subsequent runs without dropping caches did give a significant
> improvement in both cases (1.821/.188/1.632 is one result I wrote with
> the patch applied).
Thanks, Nigel.
But I'm very sorry that the calculation in the patch was wrong.
Would you give this new patch a run?
It produced pretty numbers here:
#!/bin/zsh
ROOT=/mnt/mnt
TIMEFMT="%E clock %S kernel %U user %w+%c cs %J"
echo 3 > /proc/sys/vm/drop_caches
# 49: enable dir readahead
# 50: disable
echo ${1:-50} > /proc/sys/vm/readahead_ratio
# time find $ROOT/a > /dev/null
time find /etch > /dev/null
# time find $ROOT/a > /dev/null&
# time grep -r asdf $ROOT/b > /dev/null&
# time cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null&
exit 0
# collected results on a SATA disk:
# ./test-parallel-dir-reada.sh 49
4.18s clock 0.08s kernel 0.04s user 418+0 cs find $ROOT/a > /dev/null
4.09s clock 0.10s kernel 0.02s user 410+1 cs find $ROOT/a > /dev/null
# ./test-parallel-dir-reada.sh 50
12.18s clock 0.15s kernel 0.07s user 1520+4 cs find $ROOT/a > /dev/null
11.99s clock 0.13s kernel 0.04s user 1558+6 cs find $ROOT/a > /dev/null
# ./test-parallel-dir-reada.sh 49
4.01s clock 0.06s kernel 0.01s user 1567+2 cs find /etch > /dev/null
4.08s clock 0.07s kernel 0.00s user 1568+0 cs find /etch > /dev/null
# ./test-parallel-dir-reada.sh 50
4.10s clock 0.09s kernel 0.01s user 1578+1 cs find /etch > /dev/null
4.19s clock 0.08s kernel 0.03s user 1578+0 cs find /etch > /dev/null
# ./test-parallel-dir-reada.sh 49
7.73s clock 0.11s kernel 0.06s user 438+2 cs find $ROOT/a > /dev/null
18.92s clock 0.43s kernel 0.02s user 1246+13 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
32.91s clock 4.20s kernel 1.55s user 103564+51 cs grep -r asdf $ROOT/b > /dev/null
8.47s clock 0.10s kernel 0.02s user 442+4 cs find $ROOT/a > /dev/null
19.24s clock 0.53s kernel 0.03s user 1250+23 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
29.93s clock 4.18s kernel 1.61s user 100425+47 cs grep -r asdf $ROOT/b > /dev/null
# ./test-parallel-dir-reada.sh 50
17.87s clock 0.57s kernel 0.02s user 1244+21 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
21.30s clock 0.08s kernel 0.05s user 1517+5 cs find $ROOT/a > /dev/null
49.68s clock 3.94s kernel 1.67s user 101520+57 cs grep -r asdf $ROOT/b > /dev/null
15.66s clock 0.51s kernel 0.00s user 1248+25 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
22.15s clock 0.15s kernel 0.04s user 1520+5 cs find $ROOT/a > /dev/null
46.14s clock 4.08s kernel 1.68s user 101517+63 cs grep -r asdf $ROOT/b > /dev/null
Thanks,
Wu
---
Subject: ext3 readdir readahead
Do readahead for ext3_readdir().
Reasons to be aggressive:
- readdir() users are likely to traverse the whole directory,
so readahead miss is not a concern.
- most dirs are small, so slow start is not good
- the htree indexing introduces some randomness,
which can be helped by the aggressiveness.
So we do 128K sized readaheads, at twice the speed of reads.
The following actual readahead pages are collected for a dir with
110000 entries:
32 31 30 31 28 29 29 28 27 25 29 22 25 30 24 15 19
That means a readahead hit ratio of
454/541 = 84%
The performance is marginally better for a minimal debian system:
command: find /
baseline: 4.10s 4.19s
patched: 4.01s 4.08s
And considerably better for 100 directories, each with 1000 8K files:
command: find /throwaways
baseline: 12.18s 11.99s
patched: 4.18s 4.09s
And also noticable better for parallel operations:
baseline patched
find /throwaways & 21.30s 22.15s 7.73s 8.47s
grep -r asdf /throwaways2 & 49.68s 46.14s 32.91s 29.93s
cp /KNOPPIX_CD.iso /dev/null & 17.87s 15.66s 18.92s 19.24s
Signed-off-by: Fengguang Wu <[email protected]>
---
fs/ext3/dir.c | 33 +++++++++++++++++++++++++++++++++
fs/ext3/inode.c | 2 +-
include/linux/ext3_fs.h | 2 ++
3 files changed, 36 insertions(+), 1 deletion(-)
--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,28 @@ int ext3_check_dir_entry (const char * f
return error_msg == NULL ? 1 : 0;
}
+#define DIR_READAHEAD_BYTES (128*1024)
+#define DIR_READAHEAD_PGMASK ((DIR_READAHEAD_BYTES >> PAGE_CACHE_SHIFT) - 1)
+
+static void ext3_dir_readahead(struct file * filp)
+{
+ struct inode *inode = filp->f_path.dentry->d_inode;
+ struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+ int bbits = inode->i_blkbits;
+ unsigned long blk, end;
+
+ blk = filp->f_ra.prev_page << (PAGE_CACHE_SHIFT - bbits);
+ end = min(inode->i_blocks >> (bbits - 9),
+ blk + (DIR_READAHEAD_BYTES >> bbits));
+
+ for (; blk < end; blk++) {
+ pgoff_t phy;
+ phy = generic_block_bmap(inode->i_mapping, blk, ext3_get_block)
+ >> (PAGE_CACHE_SHIFT - bbits);
+ do_page_cache_readahead(mapping, filp, phy, 1);
+ }
+}
+
static int ext3_readdir(struct file * filp,
void * dirent, filldir_t filldir)
{
@@ -108,6 +130,17 @@ static int ext3_readdir(struct file * fi
sb = inode->i_sb;
+ /*
+ * Reading-ahead at 2x the page fault rate, in hope of reducing
+ * readahead misses caused by the partially random htree order.
+ */
+ filp->f_ra.prev_page += 2;
+ filp->f_ra.prev_page &= ~1;
+
+ if (!(filp->f_ra.prev_page & DIR_READAHEAD_PGMASK) &&
+ filp->f_ra.prev_page < (inode->i_blocks >> (PAGE_CACHE_SHIFT-9)))
+ ext3_dir_readahead(filp);
+
#ifdef CONFIG_EXT3_INDEX
if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
EXT3_FEATURE_COMPAT_DIR_INDEX) &&
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:
#define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
handle_t *handle = journal_current_handle();
--- linux.orig/include/linux/ext3_fs.h
+++ linux/include/linux/ext3_fs.h
@@ -814,6 +814,8 @@ struct buffer_head * ext3_bread (handle_
int ext3_get_blocks_handle(handle_t *handle, struct inode *inode,
sector_t iblock, unsigned long maxblocks, struct buffer_head *bh_result,
int create, int extend_disksize);
+extern int ext3_get_block(struct inode *inode, sector_t iblock,
+ struct buffer_head *bh_result, int create);
extern void ext3_read_inode (struct inode *);
extern int ext3_write_inode (struct inode *, int);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]