bert hubert <[email protected]> wrote:
>
> Hi Andrew,
>
> I'm currently at OLS and presented http://ds9a.nl/diskstat yesterday, which
> also references your ancient 'fboot' program.
>
> I've also done experiments along those lines, and will be doing more of them
> soon.
>
> You mention it was a waste of time, do you recall if that meant:
>
> 1) that the total time for prefetching + actual boot was only 10% shorter,
> but that the actual booting did not (significantly) touch the disk?
>
> 2) that on actual boot there would still be a lot of i/o
>
> ?
eep, this was early 2001, on 2.4.whatever.
I recall trying various preloading schemes - try loading the metadata
first, then the pagecache, pagecache first then metadata, one and not the
other, etc.
Yeah, 10-15% benefit was obtainable but on a little old laptop the amount
of discontiguous I/O was still quite tremendous.
I also recall hacking the initscripts so they were _all_ launched
asynchronously. A few things broke because of dependency problems of
course, but that improved things quite noticeably. I think quite a few of
the scripts and daemons and things have explicit sleeps, and parallelising
all of those helped.
>
> Regarding 1), in my own experiments I failed to convince the kernel to
> actually cache the pages I touched, I wonder if some sequential-read
> detection kicked in, the one that prevents entire cd's being cached.
It depends how you touch the page. Remember that there is no unified
pagecache in 2.6. The pagecache for /dev/hda1 is separate from the
pagecache for /etc/passwd. If one tries to preload /etc/passwd by reading
from /dev/hda1 then that won't be effective. So any userspace preloading
scheme would have to open both /dev/hda1 and /etc/passwd and it would then
read from both fds in some intermingled manner based on disk block address.
Although I'd suggest that it'd be easier to just get the kernel to do it:
set the disk queue size to something enormous (4096 requests?), open 100
files, launch posix_fadvise() against them all (or against sections of
them) then close the files again. Rely upon that large disk queue to
perform all the sorting. Maybe.
> For my next attempt I'll try to actually lock the pages into memory.
It shouldn't be needed. If at the end of preload there's still a decent
amount of free memory, you know that the kernel hasn't gone and thrown
anything away yet. Any machine with 256MB or more of RAM should be able to
fit all the boot-time stuff into RAM fairly comfortably.
> Also, regarding the directory entries, are they accessed via the buffer
> cache?
yes. For ext3 you can preload both inodes and directory entries via
read()s from /dev/hda1. For ext2, directory entries each have their own
pagecache and should be preloaded via read(open(/name/of/directory)).
> Iow, would reading blocks which can't be mapped to files directly via
> /dev/hda be useful?
If the blocks are directories or inodes then you _must_ preload them via
/dev/hda1's pagecache. (/dev/hda1's pagecache is the storage for
/dev/hda1's buffercache - they're the same thing).
So a scheme which would work for 2.6.x would be:
a) Boot the machine
b) Walk /dev/hda1's pagecache, record which pages are present.
c) For all files which are in dcache, walk their pagecache, work out
which pages are present.
(nb: it might be possible to do most of the above from userspace: mmap
the file and use mincore() to find out if the page is in pagecache).
The above data is enough for performing a crude preload:
a) Boot the machine
b) Boost the disk queue size, set the VFS readahead to zero, open
/dev/hda1 and all the regular files, hose reads at the disk via
fadvise(). Restore VFS readahead and queue size, continue with boot.
More sophisticated preload would involve bmap()ping the various regular
files so the reads can be issued in LBA-sorted order. But this could be of
marginal additional benefit.
And I suspect that the whole thing will be of marginal benefit. Although
things might be better now that files are laid out with the Orlov allocator
(make sure that the distro was installed with a 2.6 kernel! The file
layout will be quite different if the installer used a 2.4 ext3).
Of course the next step is to rewrite files so that they are more
favourably laid out on disk. Tricky. Or dump all pagecache to some temp
file in a nice linear slurp and preload that, copying it all to the
appropriate per-inode pagecaches and taking care of files which have been
modified. Trickier ;)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|