Mastering Git submodules

jessaustin · on Feb 20, 2015

Nice to see this caveat right at the beginning:

> On the other hand, if the technological context allows for packaging and formal dependency management, you should absolutely go this route instead: it lets you better split your codebase, avoid a number of side effects and pitfalls that litter the submodule space, and lets you benefit from versioning schemes such as semantic versioning (semver) for your dependencies.

nawitus · on Feb 20, 2015

The drawback is that things like feature branches are more difficult to use. For example, with npm you need to resort to various scripts or handle them manually yourself.

jessaustin · on Feb 20, 2015

You're right, but in general I don't think any npm module ought to depend on a feature branch of another npm module. If a feature is worth keeping around, it's worth publishing (even if only to your private npm repo). If 19 different projects want 19 different behaviors from the module.foobar() function, then split the function up or pass it flag parameters or whatever, but don't keep 19 different versions of the code floating around. Pain is inevitable in that scenario. Of course while you're developing a new feature you might want to use it from another module on your dev machine, but "npm link" is what you want to use for that.

nawitus · on Feb 20, 2015

> Of course while you're developing a new feature you might want to use it from another module on your dev machine, but "npm link" is what you want to use for that.

Yeah, but that's the manual way. It works okay when you're only developing a small number of modules (say 1 to maybe 3), but at some point doing it manually becomes pretty annoying. And you need scripts and commits to package.json files if you have a build server which makes builds of feature branches.

jawngee · on Feb 20, 2015

I swear I must be the only person in the world that has no problems using submodules and I use them constantly. It puzzles me that people have such problems with it, which I guess illustrates how unreliable anecdotal evidence really is.

I've had detached head issues, but nothing that didn't take a minute or two to solve.
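For anyone who hasn't run into this, here is a rough sketch of how a submodule ends up on a detached HEAD and one way to get back onto a branch. The repo names (lib, app, collaborator) and all paths are made up for the demo, and protocol.file.allow=always is only there because recent Git versions refuse to clone submodules from a local path by default.

```shell
#!/bin/sh
# Sketch only: every repo name and path here is a throwaway invented for the demo.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny "library" repo and an "app" repo that embeds it as a submodule.
git init -q lib
echo hello > lib/README
git -C lib add README
git -C lib commit -q -m "lib: first commit"
branch=$(git -C lib symbolic-ref --short HEAD)

git init -q app
# protocol.file.allow=always is needed only because the demo clones from a local path.
git -C app -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git -C app commit -q -m "app: add vendor/lib"

# A collaborator's clone: 'submodule update' checks out the recorded commit, not a branch.
git clone -q "$work/app" collaborator
git -C collaborator -c protocol.file.allow=always submodule --quiet update --init

# Reports whether HEAD in the given directory points at a branch or at a bare commit.
head_state() {
    if git -C "$1" symbolic-ref -q HEAD >/dev/null; then
        echo attached
    else
        echo detached
    fi
}

state_before=$(head_state collaborator/vendor/lib)
# Switching to a branch inside the submodule re-attaches HEAD.
git -C collaborator/vendor/lib checkout -q "$branch"
state_after=$(head_state collaborator/vendor/lib)
echo "before: $state_before, after: $state_after"
```

Commits made while detached are not lost right away, but they belong to no branch, which is where the "abandoned commits" complaints tend to come from.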

It might be that I'm using SourceTree exclusively, so maybe SourceTree is hiding away the painful bits or something.

dukerutledge · on Feb 20, 2015

If you are using submodules for pinning external resources to a specific hash then you probably won't have problems. If you are using them to manage multiple internal resources with varying levels of coupling...you will.
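A minimal sketch of what "pinning to a specific hash" means in practice. The lib and app repos and the vendor/lib path are invented for illustration, and protocol.file.allow=always is only needed because the demo clones from a local path.

```shell
#!/bin/sh
# Sketch only: repo names and paths are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The external resource, at its first version.
git init -q lib
echo v1 > lib/README
git -C lib add README
git -C lib commit -q -m "lib: v1"
lib_v1=$(git -C lib rev-parse HEAD)

# The superproject records lib at exactly that commit.
git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git -C app commit -q -m "app: pin vendor/lib"

# Upstream moves on...
echo v2 > lib/README
git -C lib commit -q -am "lib: v2"
lib_v2=$(git -C lib rev-parse HEAD)

# ...but the superproject still holds the old commit as a gitlink entry (mode 160000).
git -C app ls-tree HEAD vendor/lib
pinned=$(git -C app rev-parse HEAD:vendor/lib)
echo "pinned: $pinned"
```

Moving the pin is always an explicit commit in the superproject, which is why this use case tends to stay quiet: nothing changes unless someone deliberately changes it.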

bronson · on Feb 20, 2015

I'd guess you aren't working on a team, and don't check out commits before your submodules were added?

If you don't roll forward and back in time (bisect), and you don't have merge conflicts, and you can easily resolve abandoned commits on submodules (i.e. you don't have to wait for Bangalore to wake up and push their changes) then, yes, you probably won't have many issues.

nailer · on Feb 20, 2015

Same here on both counts - have been using ST as my main git interface since 2011 (I still use command line for git init, reflog, and checking stuff remotely on VMs, and git bfg if someone committed a credential), and using submodules across a wide variety of projects. I've never had a problem, and only avoid subtrees to be nice to my colleagues that have had problems.

e40 · on Feb 20, 2015

> which I guess illustrates how unreliable anecdotal evidence really is.

Or, maybe your use case skirts around the problems?

edvinbesic · on Feb 20, 2015

I think you're both saying the same thing.

chocolateboy · on Feb 20, 2015

For a less painful [1] solution, see git-subrepo. [2]

[1] https://github.com/ingydotnet/git-subrepo/blob/master/Intro....

[2] https://github.com/ingydotnet/git-subrepo

perlgeek · on Feb 20, 2015

I use git-subrepo too, but it has its own set of warts. It generates pretty verbose (and ugly) commit messages made of JSON. It doesn't let you do anything (like a pull or clone) when your working directory is dirty.

But on the whole, it is often less painful than submodules. I haven't tried subtree yet, will do that next :-)

chocolateboy · on Feb 20, 2015

> It generates pretty verbose (and ugly) commit messages made of json.

This was fixed recently. [1]

> I haven't tried subtree yet, will do that next :-)

The git-subrepo intro has a pretty good overview of the issues with git-subtree. [2]

[1] https://github.com/ingydotnet/git-subrepo/issues/40

[2] https://github.com/ingydotnet/git-subrepo/blob/master/Intro....

perlgeek · on Feb 20, 2015

OK, the commit messages are less ugly now, but they still give the wrong kind of information.

Four lines of the commit message are about git-subrepo, which I don't care about. I'm not a git-subrepo developer, and don't care what version of the tool was used to generate the commit.

What I do care a lot about is why a commit was made, and those auto-generated commit messages don't tell me that at all. They can't, because the tools don't read minds. That's why git usually prompts me for a commit message. I like that. It's one of the reasons I'm using version control. It would be nice if git-subrepo played along.

chocolateboy · on Feb 20, 2015

The metadata in the commit message isn't used in any way, so you can always:

  git commit --amend

I personally find it a useful default, though, i.e. there's not usually much more I want to say than "added a subrepo: [metadata...]", and, for something like this, I'd rather err on the side of too much information than too little.

TheHippo · on Feb 20, 2015

I find git subtrees[1] way better than submodules.

[1]: http://blogs.atlassian.com/2013/05/alternatives-to-git-submo...

P.S.: Since when is GoDoc a dependency manager for Go?

ern · on Feb 20, 2015

Agreed.

It should be noted that there are two confusingly similar concepts in Git: the subtree merge strategy and the git subtree command (which is, in fact, a git command like git submodule, a point which seems ambiguous in the OP).

Navarr · on Feb 20, 2015

> For instance, themes and plugins for Wordpress, Magento, etc. are often de facto installed by their mere presence at conventional locations inside the project tree, and this is the only way to “install” them.

> In such a situation, going with submodules (or subtrees) probably is the right solution

As a Magento developer, I'm afraid submodules wouldn't work for Magento plugins, as they unfortunately have to be installed to multiple top-level project directories. It is a nightmare (and is better served by the composer efforts for dependency management).

tokenizerrr · on Feb 20, 2015

The last time I worked on Magento, and it has been a while, we would install the dependency in a single directory, and then you'd run a tool over it to "install" it into the proper locations. Sadly I cannot remember the name of this tool, but from memory it seems you could combine that with submodules?

We used this so we could track an entire extension's source code in Git, and move them around easily.

edit: I was thinking of modman

icebraining · on Feb 20, 2015

Can't you use symlinks?

kolev · on Feb 20, 2015

I use them daily and they're painful! Thankfully, there's Peru [0]. Unfortunately, it only works on Python 3.3+.

[0] https://github.com/buildinspace/peru

rspeer · on Feb 20, 2015

What's unfortunate there? It's not like Python is hard to install.

kolev · on Feb 20, 2015

For stuff you use daily - not an issue, but having to install Python 3.3+, and then Peru, and then being able to do something with a third-party project would be a bit too much for some.

gitaarik · on Feb 20, 2015

I wrote a very brief how-to on Git Submodules [1] for my colleagues a while ago, because a lot of people seemed to have trouble with them. It explains the real basics and how to use them painlessly in most situations. So it's much shorter than the OP's article and that might help if you don't want to force everyone in your team to read a big article like this.

[1]: https://gist.github.com/gitaarik/8735255

bronson · on Feb 20, 2015

This guide seems woefully incomplete and optimistic. No coverage of submodule merge conflicts? No describing how to check out a revision before the submodule was added or after it was removed, or how to bisect a project with submodules? No warnings about accidentally abandoning commits on submodules? (maybe: don't edit files in a submodule and for pity's sake never commit to one without immediately pushing?) Should also probably cover what your whole team must do when a submodule moves or gets renamed upstream.

Also, you might want to mention how git status is affected by submodules (& diff, log, etc).

Finally, I'm not sure you should make --init --recursive the default. If you don't realize a project contains submodules, you're going to make mistakes.
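To make the trade-off concrete, here is a hedged sketch of what a plain clone looks like before and after the explicit init step. The lib, app and fresh names are invented for the demo, and protocol.file.allow=always is only there because the submodule URL is a local path.

```shell
#!/bin/sh
# Sketch only: repo names and paths are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A library repo and a superproject that embeds it.
git init -q lib
echo hello > lib/README
git -C lib add README
git -C lib commit -q -m "lib: first commit"

git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git -C app commit -q -m "app: add vendor/lib"

# A plain clone leaves vendor/lib as an empty directory.
git clone -q "$work/app" fresh
# A leading '-' in 'submodule status' marks a submodule that is not initialized yet.
before=$(git -C fresh submodule status)
echo "before: $before"

# The explicit step: register, clone and check out every submodule, nested ones included.
git -C fresh -c protocol.file.allow=always submodule --quiet update --init --recursive
after=$(git -C fresh submodule status)
echo "after: $after"
```

Running the status command first is a cheap way to notice that a project has submodules at all, before anything gets fetched on your behalf.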

donatj · on Feb 20, 2015

We used them for a few months and it was always painful. Always. We ended up writing our own composer installer plugin that let us put things in the various places we need them and it's been so much better. Someone needs to make a dead simple non-language-specific git-tag package manager where I just specify github.com/blah:1.0.* -> folder/blah and it keeps it up to date. It's what we're using composer for but it's a stretch.
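As a rough illustration of how little such a tool would need at its core, here is a sketch that parses that one-line spec and picks the newest matching tag. The spec syntax is taken from the comment above; the tag list is invented, the pick_latest helper is hypothetical, and the sort assumes purely numeric three-part tags. Nothing here talks to a real remote.

```shell
#!/bin/sh
# Sketch only: no network access, the candidate tags are made up.
set -eu

spec='github.com/blah:1.0.* -> folder/blah'

# Split "repo:pattern -> destination" using plain parameter expansion.
source_part=${spec%%" -> "*}   # text before " -> "
dest=${spec##*" -> "}          # text after " -> "
repo=${source_part%:*}         # text before the last colon
pattern=${source_part##*:}     # text after the last colon

# Hypothetical helper: print the highest tag that matches a glob pattern.
# $1 is the pattern, the remaining arguments are candidate tags.
pick_latest() {
    glob=$1
    shift
    for tag in "$@"; do
        case "$tag" in
            $glob) printf '%s\n' "$tag" ;;
        esac
    done | sort -t. -k1,1n -k2,2n -k3,3n | tail -n 1
}

# In a real tool the candidates would come from the remote's tag list.
chosen=$(pick_latest "$pattern" 0.9.0 1.0.2 1.0.10 1.1.0 1.0.3)
echo "would fetch $repo at tag $chosen into $dest"
```

The numeric sort matters: a plain text sort would rank 1.0.3 above 1.0.10.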

tinco · on Feb 20, 2015

At work we have a git repository called the 'projectname/super-project'. Developers clone this repo, and all it contains is a pair of shell scripts and a bunch of git modules, one module for every project. It's a SaaS cluster, so we have projects like 'http-frontend', 'http-backend', 'gateway', 'indexer', 'marketing-website' and some forks of open source projects we customized for our use case.

After a fresh developer has cloned the super project, they just run the `./setup` shell script and it will inform them of any dependencies their system doesn't meet and then start the docker provisioning process. All subprojects have docker containers associated with them, and we have shell scripts that launch all docker containers and link them together.

A new developer can cold provision the entire cluster on their laptop in a matter of minutes, pretty neat :)

ericclemmons · on Feb 20, 2015

Nice to see this post! We're dismantling a monolith and chose this exact same solution. I'm glad it's working out in the wild.

Mathiasdm · on Feb 20, 2015

Interesting overview!

Some of these caveats are kinda surprising to me, coming from a Mercurial background:

* Every time you add a submodule, change its remote’s URL, or change the referenced commit for it, you demand a manual update by every collaborator. Forgetting this explicit update can result in silent regressions of the submodule’s referenced commit. -- This is something handled automatically in Mercurial. Is there any reason why the same behaviour is not used in Git?

* Commands such as status and diff display precious little info about submodules by default. -- This should be possible to implement, no? It's also something that's available in Mercurial (using the --subrepos flag), and it's a huge boost to usability.
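On the second point, Git can show more than it does by default. The sketch below compares the default submodule diff with the log format and then makes the richer output the default for one repository. The lib and app repos are invented for the demo, and protocol.file.allow=always is only needed because the submodule is cloned from a local path.

```shell
#!/bin/sh
# Sketch only: repo names and paths are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A library repo and a superproject that embeds it.
git init -q lib
echo hello > lib/README
git -C lib add README
git -C lib commit -q -m "lib: first commit"

git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git -C app commit -q -m "app: add vendor/lib"

# Commit inside the submodule checkout so the superproject sees a changed pin.
echo more >> app/vendor/lib/README
git -C app/vendor/lib commit -q -am "lib: second commit"

# Default format: only the old and new "Subproject commit" hashes.
short_diff=$(git -C app diff --submodule=short)
# Log format: lists the submodule commits between the two hashes.
log_diff=$(git -C app diff --submodule=log)
printf '%s\n' "$log_diff"

# Make the richer output the default for this repository.
git -C app config diff.submodule log
git -C app config status.submoduleSummary true
git -C app status
```

Both settings are per-repository here; adding --global to the config commands would apply them everywhere.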

perlgeek · on Feb 20, 2015

> Commands such as status and diff display precious little info about submodules by default. -- This should be possible to implement, no?

Not just them. git archive and git grep for example totally ignore submodules. The whole thing is bolted on, with minimal integration into the core commands. So, usually something to avoid.

tddian · on Feb 26, 2015

By the way, the subtrees article I alluded to in the original article has since been written, and I do favor subtrees over submodules, every time: https://news.ycombinator.com/item?id=9080096

crdoconnor · on Feb 20, 2015

I was thinking of using this on a project where a part of the code base needs to be kept secret from some members of the team (stupid proprietary requirement).

It sounds painful, though.