Fedora Users — Re: How to find a needle in a haystack?

On Wed, 2010-05-19 at 15:40 -0430, Patrick O'Callaghan wrote:
> On Wed, 2010-05-19 at 14:07 -0400, aragonx@xxxxxxxxxx wrote:
> > The data in the files is of the unstructured binary type.  When I do a
> > search, I have _most_ of the file name.  Enough to uniquely identify
> > it.
> 
> So you don't need to look into the file to get a match? Sounds like the
> best procedure would just be to keep an index of all the filenames and
> update it when files are added/removed (assuming you have control over
> both of these processes). A simple database should be able to handle
> this easily, which is pretty much what you suggested yourself. In fact
> it looks so simple that a Berkeley DB file would do it, without needing
> all the fancy DB machinery or MySQL or Postgres. See for example "man
> DB_File".

Is there any reason not to use the existing updatedb/locate combo?
Fedora's updatedb seems to be based on mlocate, which as far as I know
uses the mtime of each directory to tell whether it has changed since
the last scan (a directory's mtime changes when files are added to or
deleted from it). This should speed up runs unless a lot of directories
change between runs.
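
To illustrate the idea (this is only a rough Python sketch of the mtime
check, not mlocate's actual code; the function names snapshot and
changed_dirs are made up for the example):

```python
import os

def snapshot(root):
    """Map every directory under root to its current mtime in nanoseconds."""
    return {d: os.stat(d).st_mtime_ns for d, _subdirs, _files in os.walk(root)}

def changed_dirs(root, previous):
    """Return directories that are new or whose mtime differs from `previous`.

    Only these directories would need to have their entries re-read;
    everything else can be reused from the earlier scan.
    """
    current = snapshot(root)
    return sorted(d for d, mtime in current.items() if previous.get(d) != mtime)
```

Keep the result of snapshot() from one run, pass it to changed_dirs()
on the next, and only the directories it returns need a fresh listing.
Directories that were removed in between are not reported by this
sketch.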

You can disable the default updatedb configuration and run it manually
(or from cron jobs), specifying one file system for each job. Let the
jobs run in parallel, each writing to its own database. Then globally
set the LOCATE_PATH environment variable so that locate searches all of
the databases. Look at the man pages for updatedb, updatedb.conf,
locate and mlocate.db. The last one is very optional.
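
Something along these lines might do for the per-file-system jobs. It
is a sketch, not a tested setup: the mount points and the database
directory are placeholders, and the helper names are invented here. The
-U (--database-root) and -o (--output) options are the ones documented
in the mlocate updatedb man page, and LOCATE_PATH takes a
colon-separated list of databases.

```python
import os
import subprocess

def updatedb_command(mount_point, db_dir):
    """Build an updatedb argv that scans one file system into its own database."""
    # Derive a file name from the mount point, e.g. /srv/data1 -> srv_data1.db
    name = mount_point.strip("/").replace("/", "_") or "root"
    db = os.path.join(db_dir, name + ".db")
    return db, ["updatedb", "-U", mount_point, "-o", db]

def locate_path(databases):
    """Join database paths into a value for the LOCATE_PATH variable."""
    return ":".join(databases)

def run_parallel(commands):
    """Start every command at once and return the exit codes in order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Call updatedb_command() once per file system, hand the argv lists to
run_parallel(), and export the locate_path() result as LOCATE_PATH
(in /etc/profile.d or similar) so that every user's locate sees the
extra databases.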

-- 
birger
