Re: Re: more git updates..

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Tue, 12 Apr 2005, David Eger wrote:
> 
> The reason I am questioning this point is the GIT README file.
> 
> Linus makes explicit that a "blob" is just the "file contents," and that
> really, a "blob" is not just the SHA1 of the "blob":
> 
> > In particular, the "current directory cache" certainly does not need to
> > be consistent with the current directory contents, but it has two very
> > important attributes:
> > 
> > (a) it can re-generate the full state it caches (not just the directory
> >     structure: through the "blob" object it can regenerate the data too)
> 
> And he defines "TREE" with the same name: blob

Yes. A tree is defined by the blobs it references (and the subtrees) but 
it doesn't _contain_ them. It just contains a pointer to them.

> Therefore, "TREE" must be the *full* data, and since we have the following
> definition for CHANGESET:

No. A tree is not the full data. A tree contains enough information to 
_recreate_ the full data, but the tree itself just tells you _how_ to do 
that. It doesn't contain very much of the data itself at all.

> That each changeset remembers *everything* for *each point in the tree*.

But only BY REFERENCE. A "commit" is usually very small. For example, the
top-of-tree commit-file for my currest kernel test is literally 401
_bytes_ in size. Because it just references a tree (20 bytes of
_reference_).

> Linus, if you actually mean to differentiate between the full data
> and a SHA1 of the data

There is no differentiation. The sha1 _is_ the data as far as git is 
concerned. 

It's only confusing if you think they are different. 

> Also, the details of just what data constitutes a 'changeset' would be
> lovely... i.e. a precise spec of what Pat is describing below...

	torvalds@ppc970:~/test-tools/linux-2.6.12-rc2> cat-file commit `cat .git/HEAD `
	tree cf9fd295d3048cd84c65d5e1a5a6b606bf4fddc6
	parent c7a1a189dd0fe2c6ecd0aa33f2bd2f414c7892a0
	author NeilBrown <[email protected]> Tue Apr 12 08:27:08 2005
	committer Linus Torvalds <[email protected]> Tue Apr 12 08:27:08 2005

	[PATCH] md: remove a number of misleading calls to MD_BUG

	The conditions that cause these calls to MD_BUG are not kernel bugs, just
	oddities in what userspace is asking for.

	Also convert analyze_sbs to return void, and the value it returned was
	always 0.

	Signed-off-by: Neil Brown <[email protected]>
	Signed-off-by: Andrew Morton <[email protected]>
	Signed-off-by: Linus Torvalds <[email protected]>

That's it. In all it's glory. Compressed and tagged it's 401 bytes. 

The tree it references is 677 bytes in size. That in turn references a 
number of subtrees, but almost all of the sub-trees are shared with 
_other_ tree commits, so their size is spread out over all the commits.

The full archive of the 2.6.12-rc2 kernel that I used for testing (only
_one_ version) is 102MB in size. That's about half of what the kernel is
uncompressed.

The full .git archive for 199 versions of the kernel (the 2.6.12-rc2 one
and a test-run of 198 patches from Andrew) is 111MB. In other words,
adding 198 "full" new kernels only grew the archive by 9MB (that's all
"actual disk usage" btw - the files themselves are smaller, but since they
all end up taking up a full disk block..)

Basically, the whole point of git is that objects are equated with their 
sha1 name, and that you can thus "include" an object by just referring to 
its name. The two are equivalent. 

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux