Re: Kernel SCM saga..

On Sat, 9 Apr 2005, Paul Jackson wrote:
>
> > in order to avoid having to worry about special characters
> > they are NUL-terminated)
> 
> Would this be a possible alternative - newline terminated (convert any
> newlines embedded in filenames to the 3 chars '%0A', and leave it as an
> exercise to the reader to de-convert them.)

Sure, you could obviously do escaping (you need to remember to escape '%' 
too when you do that ;).

However, whenever you do escaping, that means that you're already going to 
have to use a tool to unpack the dang thing. So you didn't actually win 
anything. I pretty much guarantee that my existing format is easier to 
unpack than your escaped format.
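
For comparison, here is a minimal sketch of what the escaped alternative would need. It assumes only the two escapes mentioned above ('\n' as "%0A" and '%' as "%25"); the names escape_name() and unescape_name() are hypothetical and appear nowhere in the actual tools.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write the escaped form of `name` into `out`.
 * '\n' becomes "%0A" and '%' becomes "%25"; everything else is copied.
 * `out` must have room for 3 * strlen(name) + 1 bytes.
 * Returns the length of the escaped string.
 */
size_t escape_name(const char *name, char *out)
{
	size_t n = 0;

	assert(name && out);
	for (; *name; name++) {
		if (*name == '\n') {
			memcpy(out + n, "%0A", 3);
			n += 3;
		} else if (*name == '%') {
			memcpy(out + n, "%25", 3);
			n += 3;
		} else {
			out[n++] = *name;
		}
	}
	out[n] = '\0';
	return n;
}

/*
 * Hypothetical inverse: decode `in` into `out` (at least strlen(in) + 1
 * bytes). Returns 0 on success, -1 if a '%' starts an unknown escape.
 */
int unescape_name(const char *in, char *out)
{
	assert(in && out);
	while (*in) {
		if (*in != '%') {
			*out++ = *in++;
			continue;
		}
		if (!strncmp(in, "%0A", 3))
			*out++ = '\n';
		else if (!strncmp(in, "%25", 3))
			*out++ = '%';
		else
			return -1;
		in += 3;
	}
	*out = '\0';
	return 0;
}
```

Every reader of the escaped format needs the second function and its error path, whereas the NUL-terminated form can be handed directly to strlen().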

ASCII isn't magical.

This is "fsck_tree()", which walks the unpacked tree representation and 
checks that it looks sane and marks the sha1's it finds as being 
needed (so that you can do reachability analysis in a second pass). It's 
not exactly complicated:

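	/*
	 * Each entry is a NUL-terminated string that must contain a space,
	 * followed by a 20-byte binary sha1.
	 */
	static int fsck_tree(unsigned char *sha1, void *data, unsigned long size)
	{
	        while (size) {
	                int len = 1+strlen(data);	/* string plus its NUL */
	                unsigned char *file_sha1 = data + len;
	                char *path = strchr(data, ' ');
	                if (size < len + 20 || !path)
	                        return -1;	/* truncated or malformed entry */
	                data += len + 20;
	                size -= len + 20;
	                mark_needs_sha1(sha1, "blob", file_sha1);
	        }
	        return 0;
	}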

and there's one HUGE advantage to _not_ having escaping: sorting and
comparing.

If you escape things, you now have to decide how you sort filenames. Do
you sort them by the escaped representation, or by the "raw"
representation? Do you always have to escape or unescape the name in order
to sort it?
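
The two orders really can disagree. A small sketch, assuming plain strcmp() byte ordering and the "%0A" escape from earlier in the thread; same_order() is a hypothetical helper written for this illustration only:

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if comparing the raw names and comparing their escaped
 * forms put the pair in the same order, 0 if the two orders disagree.
 * The escaped strings are passed in by the caller; the two names are
 * assumed to be different.
 */
int same_order(const char *raw_a, const char *raw_b,
               const char *esc_a, const char *esc_b)
{
	int raw, esc;

	assert(raw_a && raw_b && esc_a && esc_b);
	raw = strcmp(raw_a, raw_b);
	esc = strcmp(esc_a, esc_b);
	return (raw < 0) == (esc < 0);
}
```

Take the names "a\nb" and "a b". Raw, the newline (0x0A) sorts before the space (0x20). Escaped, the first name becomes "a%0Ab", and '%' (0x25) sorts after the space, so same_order("a\nb", "a b", "a%0Ab", "a b") returns 0.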

So I like ASCII as much as the next guy, but it's not a religion. If there 
isn't any point to it, there isn't any point to it.

The biggest irritation I have with the "tree" format I chose is actually
not the name (which is trivial), it's the <sha1> part. Almost everything
else keeps the <sha1> in the ASCII hexadecimal representation, and I
should have done that here too. Why? Not because it's a <sha1> - hey, the 
binary representation is certainly denser and equivalent - but because an 
ASCII representation there would have allowed me to much more easily 
change the key format if I ever wanted to. Now it's very SHA1-specific.

Which I guess is fine - I don't really see any reason to change, and if I 
do change, I could always just re-generate the whole tree. But I think it 
would have been cleaner to have _that_ part in ASCII.
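
A rough sketch of why the hex form is easier to change later, under the assumption that a hex key would be stored as a delimited string; key_to_hex() and hex_key_bytes() are hypothetical names, not functions from the actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write 2 * len lowercase hex digits plus a terminating NUL into `out`. */
void key_to_hex(const unsigned char *key, size_t len, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	assert(key && out);
	for (i = 0; i < len; i++) {
		out[2 * i] = digits[key[i] >> 4];
		out[2 * i + 1] = digits[key[i] & 0x0f];
	}
	out[2 * len] = '\0';
}

/*
 * With a delimited hex key the reader recovers the key size from the
 * text itself, instead of hard-coding 20 bytes as the binary form does.
 */
size_t hex_key_bytes(const char *hex)
{
	assert(hex);
	return strlen(hex) / 2;
}
```

A 20-byte sha1 becomes 40 hex characters, and a longer key would simply produce a longer string, with no constant to update in the reader.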

			Linus