Re: Kernel SCM saga..

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* David S. Miller <[email protected]> wrote:

> On Fri, 8 Apr 2005 22:45:18 -0700 (PDT)
> Linus Torvalds <[email protected]> wrote:
> 
> > Also, I don't want people editing repostitory files by hand. Sure, the 
> > sha1 catches it, but still... I'd rather force the low-level ops to use 
> > the proper helper routines. Which is why it's a raw zlib compressed blob, 
> > not a gzipped file.
> 
> I understand the arguments for compression, but I hate it for one
> simple reason: recovery is more difficult when you corrupt some
> file in your repository.
> 
> It's happened to me more than once and I did lose data.
> 
> Without compression, I might be able to recover if something
> causes a block of zeros to be written to the middle of some
> repository file.  With compression, you pretty much just lose.

that depends on how you compress. You are perfectly right that with 
default zlib compression, where you start the compression stream and 
stop it at the end of the file, recovery in case of damage is very hard 
for the portion that comes _after_ the damaged section. You'd have to 
reconstruct the compression state which is akin to breaking a key.

But with zlib you can 'flush' the compression state every couple of 
blocks and basically get the same recovery properties, at some very 
minimal extra space cost (because when you flush out compression state 
you get some extra padding bytes).

Flushing has another advantage as well: a small delta (even if it 
increases/decreases the file size!) in the middle of a larger file will 
still be compressed to the same output both before and after the change 
area (modulo flush block size), which rsync can pick up just fine. (IIRC 
that is one of the reasons why Debian, when compressing .deb's, does 
zlib-flushes every couple of blocks, so that rsync/apt-get can pick up 
partial .deb's as well.)

the zlib option is i think Z_PARTIAL_FLUSH, i'm using it in Tux to do 
chunks of compression. The flushing cost ismax 12 bytes or so, so if 
it's done every 4K we maximize the cost to 0.2%.

so flushing is both rsync-friendly and recovery-friendly.

(recovery isnt as simple as with plaintext, as you have to find the next 
'block' and the block length will be inevitably variable. But it should 
be pretty predictable, and tools might even exist.)

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux