On Mon, 2 May 2005, Matt Mackall wrote:
> >
> > - you can share the objects freely between different trees, never
> > worrying about one tree corrupting another tree's objects by mistake.
>
> Not sure if this is terribly useful. It just makes it harder to pull
> the subset you're interested in.
You don't have to share things in a single subdirectory. Symlinks and
hardlinks work fine, as do actual filesystem tricks ;)
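
Concretely, nothing stops a second tree from just pointing at the first
tree's object directory. Untested sketch (the paths are made up for
illustration; a plain symlink or a directory full of hardlinks both work,
since objects are immutable and named by their contents):

import os

# Untested sketch: make a second tree reuse an existing object database
# by symlinking its .git/objects directory.  Paths are made up for
# illustration; objects are immutable and named by their sha1, so
# neither tree can stomp on the other's objects.
shared = "/src/linux-2.6/.git/objects"
mine = "/src/my-tree/.git/objects"

if not os.path.exists(mine):
    os.symlink(shared, mine)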
> > - you can drop old objects.
>
> You can't drop old objects without dropping all the changesets that
> refer to them or otherwise being prepared to deal with the broken
> links.
Absolutely. This needs support from fsck to allow us to say "commit xxxx
is no longer in the tree, because we pruned it".
Alternatively (and that's the much less intrusive option), you keep all the
commit objects, but drop the tree and blob objects. Again, all you need
for this to work is to feed a list of commits to fsck and tell it
"we've pruned those from the tree", which tells fsck not to start looking
for the contents of those commits.
So for example, you can trivially have something that automates this: take
each commit that is older than <x> days, add it to the "prune list",
run fsck, and delete all objects that now show up as unreachable
(since fsck won't be looking at what those commits reference).
I could write this up in ten minutes. It's really simple.
And it's simple _exactly_ because we don't do deltas.
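
Something like the following untested sketch is all the automation would
take. The "--prune-list" option to git-fsck-cache is the hypothetical part:
it is exactly the fsck support described above and doesn't exist yet. Only
git-cat-file is assumed, and .git/HEAD is assumed to hold a plain sha1.

import os
import subprocess
import time

# Untested sketch of the pruning automation described above.  Only
# git-cat-file is assumed to exist; "--prune-list" is the hypothetical
# fsck support that would tell fsck not to walk the trees and blobs
# of the listed commits.
CUTOFF_DAYS = 30                       # the "<x> days" knob
OBJDIR = ".git/objects"

def all_objects():
    """Yield every object name in the object database."""
    for sub in os.listdir(OBJDIR):
        subdir = os.path.join(OBJDIR, sub)
        if len(sub) == 2 and os.path.isdir(subdir):
            for rest in os.listdir(subdir):
                yield sub + rest

def object_type(sha1):
    return subprocess.run(["git-cat-file", "-t", sha1],
                          capture_output=True, text=True,
                          check=True).stdout.strip()

def committer_time(sha1):
    """Committer timestamp (seconds since the epoch) of a commit object."""
    body = subprocess.run(["git-cat-file", "commit", sha1],
                          capture_output=True, text=True,
                          check=True).stdout
    for line in body.splitlines():
        if line.startswith("committer "):
            return int(line.split()[-2])   # "committer Name <mail> time tz"
    raise ValueError("malformed commit " + sha1)

cutoff = time.time() - CUTOFF_DAYS * 86400
prune = [sha1 for sha1 in all_objects()
         if object_type(sha1) == "commit" and committer_time(sha1) < cutoff]

head = open(".git/HEAD").read().split()[-1]    # assumes HEAD holds a sha1

# Hypothetical fsck invocation: list objects that become unreachable once
# the pruned commits' contents no longer count as reachable.
out = subprocess.run(["git-fsck-cache", "--unreachable",
                      "--prune-list", ",".join(prune), head],
                     capture_output=True, text=True, check=True).stdout

for word in out.split():
    if len(word) == 40:                        # keep only the sha1 names
        os.unlink(os.path.join(OBJDIR, word[:2], word[2:]))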
> > delta models very fundamentally don't support this.
>
> The latter can be done in a pretty straightforward manner in mercurial
> with one pass over the data. But I have a goal to make keeping the
> whole history cheap enough that no one balks at it.
With deltas, you have two choices:
 - change all the sha1 names (i.e., a pruned tree would no longer be
   compatible with a non-pruned one)
 - make the delta part not show up as part of the sha1 name (which means
   that it's unprotected).
Which one would you have?
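
To make the trade-off concrete, here's a toy illustration (not git code:
the "delta" is just a stand-in string, and the naming convention assumed
is the git-style sha1 over "<type> <size>\0<payload>"):

import hashlib

# Toy illustration, not git code: git-style object names are the sha1
# of the object's own full content ("<type> <size>\0<payload>"), so a
# surviving object's name never depends on how (or whether) any other
# object is stored.
def name_of(payload, kind=b"blob"):
    header = kind + b" " + str(len(payload)).encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()

v1 = b"int main(void) { return 0; }\n"
v2 = b"int main(void) { return 1; }\n"

# Full-content naming: v2's name stays the same whether v1 is kept,
# pruned, or stored as a delta against something else.
print(name_of(v2))

# Delta choice 1: name the delta itself.  Now v2's name depends on v1,
# so pruning or re-deltifying v1 changes every downstream name.
fake_delta = b"@@ return 0 -> return 1 @@"      # stand-in, not a real delta
print(hashlib.sha1(name_of(v1).encode() + fake_delta).hexdigest())

# Delta choice 2: keep the full-content name but store only the delta.
# The bytes actually on disk are then not covered by the name at all,
# so corruption of the delta is invisible to sha1 verification.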
> What is a tree re-linker? Finds duplicate files and hard-links them?
> Ok, that makes some sense. But it's a win on one machine and a lose
> everywhere else.
Where would it be a loss? Especially since with git, it's cheap (you don't
need to compare content to find objects to link - you can just compare
filename listings).
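
A re-linker really is just that filename comparison. Untested sketch
(paths made up for illustration):

import os

# Because object filenames *are* the sha1 of their contents, two object
# databases can be merged by hardlinking any file whose name already
# exists on the other side - no content comparison needed.
SRC = "/src/linux-2.6/.git/objects"     # tree to borrow objects from
DST = "/src/my-tree/.git/objects"       # tree to relink into

for sub in os.listdir(SRC):
    src_dir = os.path.join(SRC, sub)
    if len(sub) != 2 or not os.path.isdir(src_dir):
        continue
    dst_dir = os.path.join(DST, sub)
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        dst = os.path.join(dst_dir, name)
        if not os.path.exists(dst):             # same name == same object
            os.link(os.path.join(src_dir, name), dst)

That's the whole trick: identical content gives identical filenames, so
the filesystem does the deduplication for you.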
> I've added an "hg verify" command to Mercurial. It doesn't attempt to
> fix anything up yet, but it can catch a couple things that git
> probably can't (like file revisions that aren't owned by any
> changeset), namely because there's more metadata around to look at.
git-fsck-cache catches exactly those kinds of things. And since it checks
pretty much every _single_ assumption in git (which is not many, since
git doesn't make many assumptions), I guarantee you that you can't
find any more than it does (the filename ordering is the big missing
piece: I _still_ don't verify that trees are ordered. I've been mentioning
it since the beginning, but I'm lazy).
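
The core check really is that small. Untested sketch, loose objects only,
assuming the format where an object's name is the sha1 of its inflated
"<type> <size>\0<body>" data - everything else fsck does is just following
references:

import hashlib
import os
import zlib

# Minimal sketch of the core object check: every loose object must
# inflate to data whose sha1 equals its filename.  Reachability checking
# on top of this is just walking the references between objects.
OBJDIR = ".git/objects"

for sub in os.listdir(OBJDIR):
    subdir = os.path.join(OBJDIR, sub)
    if len(sub) != 2 or not os.path.isdir(subdir):
        continue
    for rest in os.listdir(subdir):
        with open(os.path.join(subdir, rest), "rb") as f:
            data = zlib.decompress(f.read())
        if hashlib.sha1(data).hexdigest() != sub + rest:
            print("corrupt object:", sub + rest)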
In other words, your verifier can't verify anything more. It's entirely
possible that more things can go _wrong_, since you have more indexes, so
your verifier will have more to check, but that's not an advantage, that's
a downside.
Linus