Re: Mercurial 0.4b vs git patchbomb benchmark

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Mon, 2 May 2005, Matt Mackall wrote:
> 
> Umm.. I am _not_ calculating the SHA of the delta itself. That'd be
> silly.

It's not silly.

Meta-data consistency is supremely important. If people can corrupt their 
metadata in strange an unobservable ways, that's almost as bad as 
corrupting the data itself. In fact, to some degree it's worse, since you 
make people trust the thing, but you don't actually guarantee it.

So how _do_ you guarantee consistency of a tree and the history that led 
up to it? 

And by that I don't mean any of the individual blobs - I realize that it's 
perfectly valid to just check out every single version, and have the sha1 
of that. But how do you guarantee that the sha's you check are the sha's 
that you saved in the first place, and somebody didn't replace something 
in the middle?

In other words, you need to hash the metadata too. Otherwise how do you
consistency-check the _collection_ of files?

It's absolutely not enough to just protect single-file content. That 
doesn't help one whit. It's not what a SCM is all about. You have to 
protect the state of _multiple_ files, ie the metadata has to be 
verifiable too.

If that meta-data is the index, then the index needs to be protected by a
SHA1. In git, that's why we don't just sha1 every blob, but every tree and
every commit. That's the thing that gets consistency _beyond_ a single
file.

> As various people have pointed out, you can hack delta transmission
> and file revision indexing on top of git. But to do that, you'll need
> to build the same indices that Mercurial has. And you'll need to check
> their integrity.

No, absolutely not.

Building indeces on top of git would be stupid. You can _cache_ deltas,
but there's a big difference between a index that actually describes how
random blobs go together, and a cache of a delta between two
well-specified end-points. And in particular, there is no "consistency" to
a delta. You don't need it.

Why? Because either the delta is correct, or it isn't. If it's correct,
the end result will be the right sha1. If it's not, the end result will be
something else. So when you do a "pull" from another repository, you can
trivially check whether the delta's you got were valid: did applying them
result in the same sha1 that the other repository had?

So git really validates the _only_ thing that matters: it validates the 
state of the data. It doesn't validate anything else, but if validates 
that one thing very completely indeed.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux