Re: [rfc] git: combo-blobs

* Paul Jackson <[email protected]> wrote:

> Hmmm ... I have this strong sense that I am about 2 hours away from
> smacking my forehead and groaning "Duh - so that's what Ingo meant!"
> 
> However, one must play out one's destiny.
> 
> Could you provide an example scenario, which results in the creation 
> of a combo-blob?
> 
> The best I can come up with is the following.
> 
> Let's say Nick changes one line in the middle of kernel/sched.c (yeah 
> - I know - unlikely scenario - he usually changes more than that - 
> nevermind that detail.)
> 
> In the days Before Combo Blobs (BCB), git would have been told that 
> kernel/sched.c was to be picked up, and would have wrapped it up in a 
> zlib'd blob, sha1summed it, seen it was a new sum, and added that blob 
> to its objects (or something like this -- I'm still a little fuzzy on 
> these git details.)
> 
> But Nick just downloaded the latest git 1.5.11.1 which has added 
> support for combo blobs, so now, guessing here, instead of wrapping up 
> the new sched.c, git instead unwraps the old one, diffs it against the new, 
> notices a couple of long sequences that are unchanged, wraps up both 
> of those sequences as a couple of relatively large blobs, and wraps up 
> the new lines that Nick just coded in the middle as a small blob, and 
> puts all three in the object store, along with another small 
> combo-blob, tying them all together.

actually, git would just include the previous blob by reference, instead 
of re-storing the unchanged sequences as new blobs.

let's say we had the previous version of sched.c in a blob, ID 
cc4ee6107d19f89898a8c89d45810f01710f2ff4. We have the new edit (which is 
small, let's say 20 bytes) in blob e010fab710092b19be6e26de1721e249dff2d141.
We'd create the combo-blob representing the new version of sched.c as 
follows:

	include cc4ee6107d19f89898a8c89d45810f01710f2ff4 0 54010
	include e010fab710092b19be6e26de1721e249dff2d141 0 20
	include cc4ee6107d19f89898a8c89d45810f01710f2ff4 54030 73061

so we'd include (by reference) most of the previous version, plus a 
small blob for the new bytes. Since sched.c compresses down to 36K, we'd 
save ~32K of bandwidth, and somewhere on the order of 20K of storage.
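A minimal sketch of how the include list above could be produced, in Python. It assumes the edit is one contiguous region, found by stripping the longest common prefix and suffix (a simplification, not a real diff), and it assumes each include line reads `<blob-id> <offset> <length>`; the message does not say whether the last number is a length or an end offset. `make_combo` and `format_combo` are hypothetical names; `blob_id` follows git's actual blob naming (SHA-1 over a `blob <size>` header, a NUL byte, then the content).

```python
import hashlib


def blob_id(data: bytes) -> str:
    # git names a blob by SHA-1 over "blob <size>\0" followed by the content
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def make_combo(old: bytes, new: bytes):
    """Describe `new` as ranges of `old` plus one small delta blob.

    Hypothetical helper: assumes a single contiguous edit and
    (blob_id, offset, length) include triples.
    """
    limit = min(len(old), len(new))
    # longest common prefix
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    # longest common suffix that does not overlap the prefix in either file
    s = 0
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    delta = new[p:len(new) - s]
    old_id = blob_id(old)
    includes = []
    if p:
        includes.append((old_id, 0, p))
    if delta:
        includes.append((blob_id(delta), 0, len(delta)))
    if s:
        includes.append((old_id, len(old) - s, s))
    return includes, delta


def format_combo(includes) -> str:
    # one "include <id> <offset> <length>" line per range
    return "".join("\tinclude %s %d %d\n" % inc for inc in includes)
```

For example, with old content `hello world` and new content `hello there world`, this yields three includes covering `hello `, the 6-byte delta `there `, and `world`: the same shape as the sched.c example, just smaller.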

to reconstruct the new sched.c from the combo-blob later on, we do have 
to unpack the previous sched.c (and if that is itself a combo-blob that 
is not cached, we'd have to unpack all its parents until we arrive at 
some full blob).
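The recursive unpacking described above might look like this sketch, assuming an in-memory `store` that maps object IDs either to full blob bytes or to a parsed include list, and the same hypothetical `(id, offset, length)` reading of each include line. `parse_combo`, `resolve` and the `cache` dictionary are illustrative names, not git code.

```python
from typing import Dict, List, Tuple, Union

Include = Tuple[str, int, int]      # (object id, offset, length): assumed semantics
Obj = Union[bytes, List[Include]]   # a full blob, or a combo-blob's include list


def parse_combo(text: str) -> List[Include]:
    # hypothetical parser for "include <id> <offset> <length>" lines
    out = []
    for line in text.splitlines():
        words = line.split()
        if words and words[0] == "include":
            out.append((words[1], int(words[2]), int(words[3])))
    return out


def resolve(store: Dict[str, Obj], oid: str, cache: Dict[str, bytes]) -> bytes:
    """Return the full content of `oid`, expanding combo-blobs recursively."""
    if oid in cache:
        return cache[oid]
    obj = store[oid]
    if isinstance(obj, bytes):
        data = obj                  # full blob: the recursion stops here
    else:
        # each referenced object may itself be a combo-blob, so keep
        # unpacking parents until a full blob is reached
        data = b"".join(resolve(store, ref, cache)[off:off + n]
                        for ref, off, n in obj)
    cache[oid] = data
    return data
```

Caching every resolved object means a chain of combo-blobs is unpacked only once; a real implementation would presumably also cap the chain length so reads stay cheap.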

> So far, not too bad.  Haven't gained anything, and required the 
> unpacking of a zlib blob we didn't require before, and the running and 
> analyzing of a diff we didn't require before, but the end result is 
> only moderately worse - four object blobs instead of one, but of total 
> size not much larger (well, total size typically 3 disk blocks worse, 
> due to a slight increase in fragmentation from using 4 blocks to store 
> what used to be in one.)

we'd have only 2 new objects, not four: the small 'delta' blob and the 
combo-blob itself.

(if the number of objects is an issue then we could include the new data 
in the combo-blob itself too, but that's getting too complex, I think.)
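The single-object variant from the aside could be sketched like this, assuming hypothetical `("data", bytes)` entries that carry the new bytes inline next to `("include", id, offset, length)` entries. For brevity, `store` is assumed to map IDs to already-resolved contents; `expand_inline` is an illustrative name.

```python
from typing import Dict, List, Tuple, Union

# hypothetical entry kinds: ("include", id, offset, length) or ("data", raw bytes)
Entry = Union[Tuple[str, str, int, int], Tuple[str, bytes]]


def expand_inline(entries: List[Entry], store: Dict[str, bytes]) -> bytes:
    """Rebuild content from a combo-blob that carries new bytes inline.

    `store` maps ids to already-resolved full contents (assumption).
    """
    parts = []
    for entry in entries:
        if entry[0] == "include":
            _, ref, off, n = entry
            parts.append(store[ref][off:off + n])
        elif entry[0] == "data":
            parts.append(entry[1])
        else:
            raise ValueError("unknown entry kind: %r" % (entry[0],))
    return b"".join(parts)
```

This saves the separate delta object, at the cost of mixing raw bytes into what was otherwise a plain list of references.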

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
