[email protected] (Paul Jackson) wrote on 23.04.05 in <[email protected]>:
> > It's an unavoidable
> > result of using less bits than the original data has.
>
> Even _not_ using a hash will have collisions - copy different globs of
> data around enough, and sooner or later, two globs that started out
> different will end up the same, due to errors in our computers. Even
> ECC on all the buses, channels, and memory will just reduce this chance.
Umm, the whole point of using a digest for the name is to catch these
things as they happen. So if you'd use the whole original bit sequence as
a name, you'd need to have exactly the same bit errors in the data, in the
name, and in the reference to the object, to miss nopticing the problem
early. And it *still* isn't a collision - the data behind name X is
exactly X, always, or it's easily recognizable as broken.
Whereas a hash collision means that both X and Y should be behind name Z.
Both are *correct* behind name Z.
Entirely different situations.
> There is no mathematical perfection obtainable here. Deal with it.
Actually, there is, and your non-hashed name system achieves it.
> If something is likely to happen less than once in a billion years,
> then for all practical purposes, it won't happen.
If that was a truely random thing, then you might have been right. But it
isn't. All possible blobs to a given digest are NOT equally probably (or
of a probability only depending on their size).
We really, really don't know how likely a collision is for the data we
want to store there - just for truely random data.
MfG Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]