On Sat, 9 Apr 2005, Paul Jackson wrote:
>
> > in order to avoid having to worry about special characters
> > they are NUL-terminated)
>
> Would this be a possible alternative - newline terminated (convert any
> newlines embedded in filenames to the 3 chars '%0A', and leave it as an
> exercise to the reader to de-convert them.)

Sure, you could obviously do escaping (you need to remember to escape '%'
too when you do that ;).

However, whenever you do escaping, that means that you're already going to
have to use a tool to unpack the dang thing. So you didn't actually win
anything. I pretty much guarantee that my existing format is easier to
unpack than your escaped format.
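
To give a feel for the extra work, here is a minimal sketch of the '%0A' scheme proposed above, assuming only newline and '%' need escaping. The names escape_name(), unescape_name() and hexval() are hypothetical and are not part of any existing tool.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	return -1;
}

/*
 * Hypothetical helper: escape '\n' as "%0A" and '%' as "%25".
 * Returns the escaped length, or -1 if 'out' is too small.
 */
static int escape_name(const char *in, char *out, size_t outsz)
{
	size_t o = 0;

	if (outsz == 0)
		return -1;
	for (; *in; in++) {
		if (*in == '\n' || *in == '%') {
			if (o + 3 >= outsz)
				return -1;
			o += (size_t)sprintf(out + o, "%%%02X",
					     (unsigned)(unsigned char)*in);
		} else {
			if (o + 1 >= outsz)
				return -1;
			out[o++] = *in;
		}
	}
	out[o] = '\0';
	return (int)o;
}

/*
 * Hypothetical helper: reverse escape_name().
 * Returns the decoded length, or -1 on a malformed escape or short buffer.
 */
static int unescape_name(const char *in, char *out, size_t outsz)
{
	size_t o = 0;

	while (*in) {
		int c = (unsigned char)*in++;

		if (c == '%') {
			int hi = hexval((unsigned char)in[0]);
			int lo = hi < 0 ? -1 : hexval((unsigned char)in[1]);

			if (lo < 0)
				return -1;
			c = hi * 16 + lo;
			in += 2;
		}
		if (o + 1 >= outsz)
			return -1;
		out[o++] = (char)c;
	}
	out[o] = '\0';
	return (int)o;
}
```

Even in this small form, the reader needs a decoder with its own error cases (truncated or non-hex escapes) before a single name can be used.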

ASCII isn't magical.

This is "fsck_tree()", which walks the unpacked tree representation and
checks that it looks sane and marks the sha1's it finds as being
needed (so that you can do reachability analysis in a second pass). It's
not exactly complicated:
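
	static int fsck_tree(unsigned char *sha1, void *data, unsigned long size)
	{
		/*
		 * Each entry is NUL-terminated text (which must contain a
		 * space), followed by a 20-byte binary sha1.
		 */
		while (size) {
			/* length of the text part, including its NUL */
			int len = 1+strlen(data);
			unsigned char *file_sha1 = data + len;
			char *path = strchr(data, ' ');

			/* truncated entry, or no space separator */
			if (size < len + 20 || !path)
				return -1;

			/* step over this entry */
			data += len + 20;
			size -= len + 20;
			mark_needs_sha1(sha1, "blob", file_sha1);
		}
		return 0;
	}
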
and there's one HUGE advantage to _not_ having escaping: sorting and
comparing.

If you escape things, you now have to decide how you sort filenames. Do
you sort them by the escaped representation, or by the "raw"
representation? Do you always have to escape or unescape the name in order
to sort it?
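
A small sketch of how the two orderings can disagree, assuming the '%0A' escaping proposed above and plain byte-wise comparison. The names here are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Two filenames: one with an embedded newline, one with a space. */
static const char raw_a[] = "a\nb";
static const char raw_b[] = "a b";

/* The same two names under the hypothetical '%0A' escaping. */
static const char esc_a[] = "a%0Ab";
static const char esc_b[] = "a b";

/* Reduce a strcmp() result to -1, 0 or 1. */
static int sign(int x)
{
	return (x > 0) - (x < 0);
}

/* Order of the two names when compared as raw bytes. */
static int raw_order(void)
{
	return sign(strcmp(raw_a, raw_b));
}

/* Order of the same two names when compared in escaped form. */
static int escaped_order(void)
{
	return sign(strcmp(esc_a, esc_b));
}

/* Print both results side by side. */
static void show_orders(void)
{
	printf("raw: %d, escaped: %d\n", raw_order(), escaped_order());
}
```

Raw, the newline (0x0A) sorts before the space (0x20), so the first name comes first. Escaped, the '%' (0x25) sorts after the space, so the order flips.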

So I like ASCII as much as the next guy, but it's not a religion. If there
isn't any point to it, there isn't any point to it.

The biggest irritation I have with the "tree" format I chose is actually
not the name (which is trivial), it's the <sha1> part. Almost everything
else keeps the <sha1> in the ASCII hexadecimal representation, and I
should have done that here too. Why? Not because it's a <sha1> - hey, the
binary representation is certainly denser and equivalent - but because an
ASCII representation there would have allowed me to much more easily
change the key format if I ever wanted to. Now it's very SHA1-specific.

Which I guess is fine - I don't really see any reason to change, and if I
do change, I could always just re-generate the whole tree. But I think it
would have been cleaner to have _that_ part in ASCII.
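
A minimal sketch of the key-format point, assuming keys are stored as NUL-terminated hex text. key_to_hex(), hex_to_key() and hex_digit() are hypothetical names, not functions from the tool discussed here. The decoder takes the key length from the text itself instead of from a hard-coded 20.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int hex_digit(int c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: write 2*len lowercase hex digits plus a NUL.
 * 'out' must have room for 2*len + 1 bytes.
 */
static void key_to_hex(const unsigned char *key, size_t len, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < len; i++) {
		out[2 * i] = digits[key[i] >> 4];
		out[2 * i + 1] = digits[key[i] & 0x0f];
	}
	out[2 * len] = '\0';
}

/*
 * Hypothetical helper: decode hex text into at most 'max' key bytes.
 * Returns the number of bytes decoded, or -1 on malformed or
 * over-long input. The key length is whatever the text says it is.
 */
static int hex_to_key(const char *hex, unsigned char *key, size_t max)
{
	size_t n = 0;

	while (hex[0]) {
		int hi = hex_digit((unsigned char)hex[0]);
		int lo = hi < 0 ? -1 : hex_digit((unsigned char)hex[1]);

		if (lo < 0 || n >= max)
			return -1;
		key[n++] = (unsigned char)(hi * 16 + lo);
		hex += 2;
	}
	return (int)n;
}
```

With this kind of encoding, a longer or shorter key would only change the length of the text, at the cost of two bytes on disk per key byte.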
Linus