The latest version of Mercurial is available at:
http://selenic.com/mercurial/
Utilities to convert git repos and interoperate with git are beginning
to appear on the mercurial mailing list, including a port of gitk.
As a practical demonstration, I've imported Ingo's BKCVS patchset into
Mercurial. The result is a 297M archive with 28237 changesets going back
to 2.4.0. Some history is lost because of the BK->CVS flattening. You
can browse it here:
http://userweb.kernel.org/~mpm/linux-hg/index.cgi
Be sure to check out the annotate feature. Unfortunately there are no
branches in this repo because of the BK->CVS flattening, but you can
look at the main Mercurial repo to see examples of pulls.
The full tarball of the Mercurial kernel repo (144MB) can be grabbed here:
http://www.kernel.org/pub/linux/kernel/people/mpm/linux-hg.tar.gz
If you want to browse this repo on your own machine (very fast and
convenient for laptops!), simply install Mercurial, download the
tarball, run 'hg serve' in the repo directory and point your web
browser at http://localhost:8000.
The web interface also serves as a highly efficient merge server:
    $ time hg -v merge http://remotehost:8000/
    searching for changes
    adding changesets
    adding manifests
    adding files
    118549846 bytes of data transfered
    modified 23306 files, added 28238 changesets and 188476 new revisions
    real 4m51.371s
    user 1m25.852s
    sys 0m8.303s
That's pulling the whole kernel history over fast DSL with only 113M
of traffic. Compare that to the 2.6.11 tar.bz2 at 35M. Smaller merges
are of course proportionally faster. (Pulls from userweb.kernel.org
are disabled because the machine has limited bandwidth.)
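For readers who want to check the arithmetic, here is a small Python sketch that redoes it from the transcript numbers. The variable names are made up for illustration, and it assumes "M" means 2**20 bytes, which is what makes 118549846 bytes come out at about 113M.

```python
# Figures copied from the 'hg -v merge' transcript above.
bytes_transferred = 118549846
elapsed_s = 4 * 60 + 51.371          # "real 4m51.371s"

# Assumption: "M" in the text means MiB (2**20 bytes).
mib = bytes_transferred / 2**20
rate_kib_s = bytes_transferred / elapsed_s / 1024

# The 2.6.11 tar.bz2 is quoted as 35M in the text.
ratio_to_tarball = mib / 35

print(f"{mib:.1f} MiB in {elapsed_s:.0f} s, "
      f"about {rate_kib_s:.0f} KiB/s, "
      f"{ratio_to_tarball:.1f}x the size of the tarball")
```

So the full history costs a little over three times the traffic of a single compressed release tarball.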
Verifying the archive:
    $ time hg verify
    checking changesets
    checking manifests
    crosschecking files in changesets and manifests
    checking files
    23305 files, 28238 changesets, 188464 total revisions
    real 2m48.986s
    user 1m30.055s
    sys 0m7.158s
Checking the integrity of the equivalent git archive looks like it
will take an hour or more of seek-intensive I/O (though the person
who was timing it for me gave up).
This highlights one of git's most serious problems: storing the
repository by hash. This tends to pessimize layout over time. Initial
check-ins will be nicely ordered by write order, but as changes are
made, the set of files in the tip will get spread further and further
apart on the disk and in more and more random order. Copying the
archive via rsync, cp -a, or the like will tend to exacerbate things
by reordering _everything_ in hash (aka worst possible) order. This is
pretty fundamental to the git design and will cause its scalability to
fall apart as the number of revisions mounts.
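The scattering effect can be seen in a toy model. The Python sketch below is not git's storage code: it hashes the path name as a stand-in for the object hash, uses a made-up tree of 20 directories with 50 files each, and simply counts how many neighbours in each ordering come from the same directory.

```python
import hashlib

def hash_order(paths):
    """Order paths by SHA-1 of the name (toy stand-in for hash-named objects)."""
    return sorted(paths, key=lambda p: hashlib.sha1(p.encode()).hexdigest())

def same_dir_neighbours(ordered):
    """Count adjacent pairs in the ordering that share a directory."""
    dirs = [p.rsplit("/", 1)[0] for p in ordered]
    return sum(1 for a, b in zip(dirs, dirs[1:]) if a == b)

# Hypothetical source tree: 20 directories with 50 files each.
paths = [f"dir{d}/file{f}.c" for d in range(20) for f in range(50)]

tree_locality = same_dir_neighbours(sorted(paths))      # directory order
hash_locality = same_dir_neighbours(hash_order(paths))  # hash order

print("same-directory neighbours, tree order:", tree_locality)
print("same-directory neighbours, hash order:", hash_locality)
```

In directory order every file sits next to a sibling except at directory boundaries. In hash order only a small fraction do, which is the kind of layout a hash-ordered copy ends up with on disk.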
Mercurial was originally using a similar scheme, and when I ran into
this problem, I spent a day playing with variations on sorting by
inode, prefetching, etc. to get the performance back. None of it came
close to the performance of simply having everything laid out well on
disk in the first place.
My eventual solution was a simple 5-line change to switch back to a
tree-structured repo layout like CVS. This lets the filesystem block
allocator assist by putting files in the same directory near each
other on disk. Also, copying repos tends to optimize things rather
than making things worse. Mercurial also inherently stores all file
revisions together so operations like tree diffs or file annotate can
be done with a minimum of seeking.
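To make "stores all file revisions together" concrete, here is a minimal Python sketch of a per-file append-only log. It is only an illustration of the idea, not Mercurial's revlog format: the class and method names are hypothetical, and it stores full texts in memory rather than compressed deltas on disk.

```python
class FileLog:
    """Toy append-only log holding every revision of ONE tracked file."""

    def __init__(self):
        self.data = bytearray()  # all revisions, stored back to back
        self.index = []          # one (offset, length) entry per revision

    def add(self, text: bytes) -> int:
        """Append a revision and return its revision number."""
        self.index.append((len(self.data), len(text)))
        self.data += text
        return len(self.index) - 1

    def read(self, rev: int) -> bytes:
        """Fetch one revision with a single index lookup and one slice."""
        offset, length = self.index[rev]
        return bytes(self.data[offset:offset + length])

    def tip(self) -> bytes:
        """Latest revision: the last index entry, no searching."""
        return self.read(len(self.index) - 1)


# One log per tracked file, keyed by filename (a tree-like layout).
repo = {"kernel/sched.c": FileLog()}
log = repo["kernel/sched.c"]
log.add(b"v1 of sched.c\n")
log.add(b"v2 of sched.c\n")
print(log.tip())
```

Because every revision of a file lives in one contiguous log, walking its history for an annotate touches one file instead of hopping between objects scattered across the disk.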
Here's a quick comparison:
                     Mercurial      git                        BK (*)
storage              revlog delta   compressed revisions       SCCS weave
storage naming       by filename    by revision hash           by filename
merge                file DAGs      changeset DAG              file DAGs?
consistency          SHA1           SHA1                       CRC
signable?            yes            yes                        no
retrieve file tip    O(1)           O(1)                       O(revs)
add rev              O(1)           O(1)                       O(revs)
find prev file rev   O(1)           O(changesets)              O(revs)
annotate file        O(revs)        O(changesets)              O(revs)
find file changeset  O(1)           O(changesets)              ?
file tracking        stat-based     stat-based                 bk edit
checkout             O(files)       O(files)                   O(revs)?
commit               O(changes)     O(changes)                 ?
                     6 patches/s    6 patches/s                slow
diff working dir     O(changes)     O(changes)                 ?
                     < 1s           < 1s                       ?
tree diff revs       O(changes)     O(changes)                 ?
                     < 1s           < 1s                       ?
hardlink clone       O(files)       O(revisions)               O(files)
find remote csets    O(log new)     rsync: O(revisions)        ?
                                    git-http: O(changesets)
pull remote csets    O(patch)       O(modified files)          O(patch)
repo growth          O(patch)       O(revisions)               O(patch)
kernel history       297M           3.5G?                      250M?
lines of code        3700           6500+cogito+gitweb+..      ??
* I've never used BK so this is just guesses
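The "find prev file rev" row is the easiest one to illustrate. The Python sketch below contrasts the two lookups under heavy simplifications: history is linear, a changeset is just a dict from path to a version id, and all function and variable names are invented for this example.

```python
def prev_rev_per_file_log(parents, rev):
    """Per-file log: each file revision records its parent, so one lookup."""
    return parents[rev]

def prev_rev_by_changeset_walk(changesets, start, path):
    """Changeset-only history: walk back until the file's id differs.

    Returns (changeset index holding the previous version, steps taken).
    """
    current = changesets[start][path]
    steps = 0
    for i in range(start - 1, -1, -1):
        steps += 1
        if changesets[i].get(path) != current:
            return i, steps
    return None, steps

# Hypothetical history: 100 changesets; file "a" changed only at changeset 90,
# while file "b" changes in every changeset.
changesets = [{"a": "a-v0" if i < 90 else "a-v1", "b": f"b-v{i}"}
              for i in range(100)]

# Per-file parent pointers for "a": revision 1 descends from revision 0.
parents_of_a = {1: 0}

found, steps = prev_rev_by_changeset_walk(changesets, 99, "a")
print("per-file log lookup:", prev_rev_per_file_log(parents_of_a, 1))
print("changeset walk: found at", found, "after", steps, "steps")
```

The walk costs one step per intervening changeset even when those changesets never touched the file, which is where the O(changesets) entries come from.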
--
Mathematics is the supreme nostalgia of our time.