How to convert a mercurial repo to git

When I started working on a few things, and needed to use some version control, I used mercurial. This wasn't really something I thought about all that much: I had used subversion & mercurial before (back on Windows) without problems, whereas with git the few times I had to use it (e.g. to try things quickly, or clone a repo) things weren't as smooth (to me, that is).

Admittedly I never bothered looking into git then, and so mercurial was an obvious "choice" when I needed to use a VCS. Just like bitbucket was obvious for the same reason (it's probably the best hosting solution with mercurial support).

But the Linux ecosystem loves git (the two are quite close historically), and lately I decided to actually look into git. I'm still not done with all the reading, but already I like it a lot.

It can do a lot, so far I understand & like how things work, and I therefore decided to move. Which means, I'll have to convert my mercurial repos into git ones; starting with kalu.

The right tool for the job

There are many possible solutions to do the conversion. I tried a few :

  • First I used the hg-git plugin. It gives hg the ability to pull from and push to a Git server. Simple enough, however I wasn't entirely satisfied with the results. (I don't remember exactly why, but it had to do with tags being lightweight, maybe also missing branches...)

  • I then tried hg2git, but the results weren't satisfying. Tags were still lightweight, which I was hoping to find a way around, and my branches had all (but master) been renamed "branch-xx"

  • Finally, I looked at fast-export, which gave me the most satisfying results.

The conversion itself is easy enough to do:

git clone git:// .
rm -rf .git .gitignore
git init
./ -r /path/to/hg/repo
git clean -f # remove fast-export files

Note: On Arch Linux, python 3 is the default, so you need to edit to add a 2: PYTHON=${PYTHON:-python2}

However, things still weren't perfect. Tags were still lightweight, and my history was a bit of a mess. Though I believe this is my fault, due to the way things were done in hg.

I'm not sure if I just created this mess, or if it is due to the fact that branches work quite differently in git than they do in mercurial, but the end result is the same.

I ended up with two parallel branches in the history : master, where daily commits were done, and stable, where things were merged and tags done. This seemed alright at the time, but not anymore.

I now wanted a "cleaner" (more linear) history in git, and one where all tags could be seen from the branch master. The branch stable should in fact just sit on the same line of history, only further back, until things are deemed stable and it gets fast-forwarded (and a tag is added).

So, even though I don't know any python, I started looking at the source code, to see if I could help "improve" things a bit.

Let's hack fast-export a bit

Luckily enough, things looked pretty simple & straightforward, and I was able to make a few adjustments.

As I said, I don't know python so expect things to be ugly. And those are probably quite specific to my needs, and might not be applicable as-is to any other repo.

Create annotated tags

Tags were not annotated, even though in mercurial every time a tag is created, we have a commit (that updates the .hgtags file). fast-export was smart enough to ignore this file, since it's completely useless in git, but still kept those (empty) commits.

I decided to change things, to ignore those commits (again, they were empty/useless in git, even more so without the .hgtags file around) and instead use their info (date, author...) to create the annotated tags.

Should be noted that I didn't make sure this would always work, and it's probably not ideal for a very large repo. But for the small one that I was working on, it worked fine.

Every time a commit is added, we check to see if it's tagged. If so, the next commit will be "ignored", and instead the annotated tag will be created. As mentioned, the commit author becomes the tagger, we re-use the date as well, and I forced the tag message to "Add tag for version <tag>" since all my tags are version numbers, and the original commit message was about the same, only referencing a mercurial changeset which, here, meant nothing.

I also needed to adjust the marks used, so that the tag references not the commit we didn't import but the one before it (i.e. the one tagged).
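To give an idea of what this means in practice, here is a minimal sketch (in Python 3) of the kind of "tag" command that git fast-import reads to create an annotated tag. This is not the actual change made to fast-export: the function name annotated_tag_block and all of its parameters are made up for illustration, and the timezone is assumed to already be in git's +hhmm form.

```python
def annotated_tag_block(tag, from_mark, tagger_name, tagger_email,
                        timestamp, tz="+0000"):
    """Build a git fast-import 'tag' command as bytes.

    Hypothetical helper, for illustration only:
      tag        -- tag name, e.g. "1.0.0"
      from_mark  -- mark number of the commit being tagged (not the
                    skipped ".hgtags" commit)
      tagger_*   -- taken from the author of the skipped commit
      timestamp  -- seconds since the epoch, from the skipped commit
      tz         -- offset in git's +hhmm form (assumed pre-converted)
    """
    message = ("Add tag for version %s\n" % tag).encode("utf-8")
    header = "tag %s\nfrom :%d\ntagger %s <%s> %d %s\ndata %d\n" % (
        tag, from_mark, tagger_name, tagger_email, timestamp, tz,
        len(message),  # fast-import wants the byte count of the message
    )
    # The newline after the message data is optional in the format.
    return header.encode("utf-8") + message + b"\n"


if __name__ == "__main__":
    import sys
    sys.stdout.buffer.write(annotated_tag_block(
        "1.0.0", 42, "Jane Doe", "jane@example.com", 1325376000, "+0100"))
```

The "from :42" line is where the mark adjustment matters: it has to name the mark of the tagged commit, since the commit that only touched .hgtags never gets a mark of its own.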

Avoid the unnecessary history branch

One problem remained: at one point I had created a new branch (stable), where tags for stable versions would be done. As described earlier, this resulted in a "messy" history, with two parallel branches of commits in the history : one with the actual work/commits being done, and one where things were merged, and tagged.

This looked bad, but because I had consistently done it the same way, it was easy enough to fix it. It always went like so :

  • commit 1 : [master] some work
  • commit 2 : [stable, tagged] merge latest
  • commit 3 : [stable] add tag (to commit 2)
  • commit 4 : [master] more work

So all I had to do was, when dealing with a commit being the result of a merge, check if it was tagged. If so, in addition to "ignoring" commit 3 (and using its info for the annotated tag), simply have the tag applied to commit 1 instead of commit 2.
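The decision logic described above can be sketched as follows, in Python 3 and on a deliberately simplified model. Nothing here is taken from fast-export itself: the dictionaries, their keys and the function plan_tags are assumptions made for the example, and it relies on the same pattern as above (a tagged commit is always directly followed by the commit that added the tag).

```python
def plan_tags(commits):
    """Decide which commits to skip and where each tag should point.

    `commits` is a list of dicts in import order, each with the
    (hypothetical) keys: rev, branch, parents, tags, author, date.
    Returns (tags, skipped): a dict of tag name -> info, and the set
    of revisions that are not imported.
    """
    by_rev = {c["rev"]: c for c in commits}
    tags, skipped = {}, set()
    pending = None  # (tag names, target rev) waiting for the "add tag" commit

    for c in commits:
        if pending is not None:
            # This is the commit that only updated .hgtags: skip it and
            # reuse its author and date for the annotated tag.
            names, target = pending
            skipped.add(c["rev"])
            for name in names:
                tags[name] = {"target": target,
                              "tagger": c["author"],
                              "date": c["date"]}
            pending = None
            continue
        if c["tags"]:
            target = c["rev"]
            if len(c["parents"]) == 2:
                # Tagged merge: tag the parent coming from the other
                # branch (commit 1) instead of the merge (commit 2).
                others = [p for p in c["parents"]
                          if by_rev[p]["branch"] != c["branch"]]
                if others:
                    target = others[0]
            pending = (c["tags"], target)
    return tags, skipped


history = [
    {"rev": 0, "branch": "master", "parents": [], "tags": [],
     "author": "me", "date": 100},
    {"rev": 1, "branch": "stable", "parents": [0], "tags": [],
     "author": "me", "date": 110},
    {"rev": 2, "branch": "master", "parents": [0], "tags": [],
     "author": "me", "date": 120},   # commit 1: some work
    {"rev": 3, "branch": "stable", "parents": [1, 2], "tags": ["1.0.0"],
     "author": "me", "date": 130},   # commit 2: merge latest
    {"rev": 4, "branch": "stable", "parents": [3], "tags": [],
     "author": "me", "date": 140},   # commit 3: add tag
    {"rev": 5, "branch": "master", "parents": [2], "tags": [],
     "author": "me", "date": 150},   # commit 4: more work
]

if __name__ == "__main__":
    print(plan_tags(history))
```

With this sample history, the tag 1.0.0 ends up on revision 2 (the work on master), revision 4 is skipped, and the merge in revision 3 is still imported but nothing on master points to it.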

This worked well, and while all the "commit 2-s" were still being added to git, they would all go away since their branch wasn't used in any way. I only had to manually delete the branch stable to make it all proper, and then re-create it pointing to the right commit (last tagged one).

And with that, I ended up with a "clean", linear history in git. Success!

kalu is now using GIT

kalu is now using a git repo (with a nice linear history), hosted on github. And for those interested/curious, my fork of fast-export is also available on github, in branch annotated-tags.
