
Git submodules are fine and can be really useful, but they are really hard. I've run into problems like:

1. Git clone not cloning submodules. You need `git submodule update --init` or `git clone --recursive`, I think

2. Git submodules being out-of-sync because I forgot to pull them specifically. I'm pretty sure `git submodule update` doesn't always fix this, but maybe only in case 3)

3. Git diff returns something even after I commit, because the submodule has a change. I have to go into the submodule and either commit/push that as well or revert it. Basically, every operation I do in the main repo I also need to do in the submodule if I modified files in both

4. Fixing merge conflicts and using git in one repo is already hard enough. The team I was working on kept having issues with using the wrong submodule commit, not having the same commit / push requirements on submodules, etc.

All of these can be fixed by tools and smart techniques like putting `git submodule update` in the makefile. Git submodules aren't "bad" and honestly they're an essential feature of git. But they are a struggle, and lots of people use monorepos instead (which have their own problems...).
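The usual incantations for problems 1-3, for reference (the URL is just a placeholder):

```shell
# 1. Clone with submodules in one step:
git clone --recursive https://example.com/repo.git

#    ...or repair a clone that forgot them:
git submodule update --init --recursive

# 2. After pulling the parent, bring submodules back in sync:
git pull && git submodule update --init --recursive

# 3. Before committing the parent, check for submodule changes;
#    `git status` reports things like "modified: sub (new commits)".
git status
```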



Switching branches in a repository with submodules is a huge pain, especially if (like the Ansible repo) some branches have the subdirectory in the same repo like normal, and some branches have the same subdirectory in a submodule.


There are git options for managing these difficulties like:

git config --global submodule.recurse true

https://git-scm.com/book/en/v2/Git-Tools-Submodules search for "git config"
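For what it's worth, per those docs `submodule.recurse` makes most commands that take `--recurse-submodules` (pull, checkout, fetch - though notably not clone) act on submodules automatically:

```shell
# Turn it on once:
git config --global submodule.recurse true

# Verify the setting:
git config --global submodule.recurse   # prints "true"

# From now on, e.g. `git pull` and `git checkout` also update
# submodule working trees; `git clone` still needs --recursive.
```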


> There are git options for managing these difficulties

This is git in a nutshell. Most defaults are very bad, and so using git from the command line is an exercise in learning which flags to set to achieve a sane workflow.


Thanks! That one option removes like 2/3 of the pain of using submodules. Doing a pull or checkout would just be regular commands.

Though it’d be nice if `git commit` supported it too and just did a `git submodule foreach git commit …`.
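`git submodule foreach` does exist today, so you can get close with a one-liner (a naive sketch - it reuses one commit message everywhere):

```shell
# Commit in every submodule that has modified tracked files,
# then record the updated pointers in the parent repo:
git submodule foreach 'git diff-index --quiet HEAD || git commit -am "wip"'
git commit -am "Update submodule pointers"
```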


Interesting!

I've never thought of doing commits in submodules. We use them a lot at work but only in a "import a specific revision of this other repo into our repo" sense, if we needed to make changes we would do them in the repo that the submodule points to and then just update the ref that the submodule points to.

What's your use case for making and committing changes inside the submodule tree?


I naively thought that was the whole point of submodules. Otherwise why not use a package manager?

The use case being that you can work on and update some independent system/repo while getting real-time feedback on how the changes interact with the system as a whole.


That's how I've come to use git submodules. It's been helpful when working with various embedded projects. They often don't get updated for months or years (ideally), and then you only want to pull in specific changes.


In a previous project we had a main repo that held the deploy/orchestration scripts, docs, etc., and the actual components as submodules (frontend, backend, periodic and on-demand jobs and the job queue, a blob-serving thing that did ACL in front of Ceph - because for some reason S3 was too mainstream :D).

Doing a quick fix that changed an API meant touching the backend, the frontend, and the main repo.

It's quite similar to today's GitOps flow, but with submodules :)


It's easy to overlook that the submodule is a complete git repository in its own right.

When working on both, it's really annoying to have to commit to a dependency's repo, push, and pull down in the dependent repo. If you do the commit in the submodule, it's just immediately available...and you can still have whatever remote to push to. So it just cuts down on the number of checkouts you have to have.


That's all true, but work trees are cheap, and the workflow you describe means that your submodules are tracking a branch rather than pinned to a revision, right?

For our purpose that's definitely worse; the submodule is supposed to be a pointer to a specific tree, with that tree being the same for all developers. If we want to change the tree that is pointed to, we should commit and push a change to the submodule ref.
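Concretely, the pin-bump workflow might look something like this (a sketch; the path and ref names are made up):

```shell
# Move the submodule to the commit you want to pin:
cd libs/dependency           # hypothetical submodule path
git fetch origin
git checkout origin/main     # or a specific tag / commit hash
cd ../..

# The parent now sees a changed pointer; record and share it:
git add libs/dependency
git commit -m "Bump libs/dependency"
git push
```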


Yeah if you commit a change in a submodule, the parent repo gets marked as dirty as the ref changes and you need to commit.


Right. But why doesn’t it Just Work by default?


From a design point of view, is there a good reason why this isn't the default?


Agreed with all of this!

In git parlance, the submodule porcelain is hard to use (but the plumbing is good)


That's the entirety of git. Extremely fast, but a bummer of a UI. It's the status quo and isn't changing. Despite using git for 10+ years, I frequently have to look up commands and then end up scratching my head as to why the CLI UI is like that.


I suppose the biggest problem is that the concept of SCM/VCS is just not simple enough to make both easy and useful/advanced at the same time.

You can have a 'pull, merge, push'-only system, but at that point we're re-inventing subversion. So making it more advanced would mean we also need to have the knowledge and skills to do other activities correctly and that means the tooling can't make as many choices for you because there simply isn't a default way that works all the time.

Most efforts at git-alternatives run into the same problems: either they're just as advanced and have the same benefits and downsides, or they end up less advanced, but then they're not equally useful and you can't really make them work right.


Mercurial covers generally the same concepts as git and is thus also not trivial to learn for someone uninitiated; yet its interface was night and day compared to git's, from their very early days. It proves that one can design a decent interface if one actually cares about usability and friendliness. As I remember it, git won the rivalry squarely because GitHub became popular (though I assume there were some reasons why GH chose git over hg).


Back in the day there were actually 2 or 3 different cloud SCM hosting providers that chose Mercurial. As I recall, a couple of them, BitBucket and Kiln, also had better Web UIs than GitHub did. Versus just GitHub offering Git, and it was kind of a duffer in my opinion.

GitHub has come a long way. But I would guess the main reason it managed to become dominant is not because it had a better product. (At the time, it didn't.) It's because Git benefited from the celebrity of its author, Linus Torvalds.


It's crazy how people forget the past. Back in the day, git was "rewrite, rebases modify past to make beautiful commit" vs hg "rewriting past is bad, beautiful commits are lies about history". Turns out people don't care about truthful history.

(Nowadays mercurial can do rebase/amend just fine.. But it is too late)


I think that the truth is probably somewhere in between.

I do like to squash and rebase before moving changes upstream. But, to me, that isn't really history quite yet. Or at least, it's not history that's worth recording. All those micro-commits from the work in progress are, in some ways, more akin to my editor's undo history. Which is also something I don't save.

It's also clear, in hindsight, that Mercurial's original position on this subject failed to anticipate AWS credentials accidentally being committed to source control.

But I have also seen (and done) some amount of history rewriting in long-lived branches that I don't think would have been necessary if Git had had some of Mercurial's ergonomics. Workflows for merging two different repositories while retaining the commit history from each, for example.


FWIW, Mercurial has had a "censor" command for blowing away the contents of those revisions with AWS keys since 2015.

Although once stuff is pushed to the public repo you're probably going to want to change those keys regardless. And if it's on the local one, there's plenty of options for removing the commit.


Rewriting history is actively good because it means you can view it as a series of logical patches. I have worked on large projects using earlier versions of hg and they were absolutely full of merge commits just labeled "Merge" - some of them were safe, some of them had random changes in them, and some of them had automatic merged changes that actually caused problems.

It was also much slower than git. But I knew someone working on Google Code at the time who liked it better because it was "clean" and in Python.


It was much much faster than git at HTTP at the time though. That's why Google Code selected it. Also it was faster at imports which was why Mozilla selected it for their transition.

At some other things it was slower; that's changed over time, of course.

Also, in terms of clean history, mercurial has best of both worlds with phases and hidden by default commits to keep track of such cleanup.


> Also, in terms of clean history, mercurial has best of both worlds with phases and hidden by default commits to keep track of such cleanup.

Yeah, it has more features now but at the time it didn’t. There was something called patch queue you could use for early stage work but that was all.


The fact that mercurial is substantially slower than git was probably also a big factor.


in other words, the comment you’re replying to rewrote the past of rewriting the past.


Around 2013, having only knowledge of svn at the time, I tried both git and mercurial to see which I liked more and found git to be a lot more intuitive than mercurial.

It's been long enough I don't remember the details about why I didn't like mercurial, but fame power had nothing to do with it, nor did website integration - I was only using it locally. How it worked just didn't fit with how I thought about version control.


That's not what I remember. Back in the days when people started implementing dvcs git was just much faster than everyone else (that was in fact the reason why Linus wrote it). Once the kernel was using it, its mindshare just grew much faster because of the publicity this implied. In other words it was largely a case of "if the kernel devs are using it it must be good". When GitHub and all the other hosting services started many still had mercurial or other dvcs (launchpad was bzr for example), but by that time the ship had sailed already I would argue.


> I suppose the biggest problem is that the concept of SCM/VCS is just not simple enough to make both easy and useful/advanced at the same time.

Git has, and always has had, a singularly bad UI among dvcs.


> but at that point we're re-inventing subversion

So? Maybe Subversion is all most developers need?


Solo devs maybe. Merging and branching are so much worse in svn that it’s not good enough for “most developers”, aka those working on professional projects with a team of developers. Sure we used to make do with svn, but I have no desire to go back.


Odd, I found branching and merging so much easier in SVN.

Git doesn't even technically have branches, just pointers to commits, which can easily get mixed up, end up detached, and fail out in ways that just never happened in SVN.

And rebasing a branch with several commits can be a nightmare, since you have to re-merge almost (but not exactly) the same code over and over again, at whatever state it was in at some point in the past when a previous commit happened - unless you abort and squash first. In SVN, you just merged the whole branch once, when both sides were in their final, current state.

Of course, git's a lot more powerful, but with that comes complexity. SVN branching and merging was a snap comparatively.


SVN will frequently insist that merge conflicts exist where there shouldn't be any if trees have been modified by deleting or moving directories. This is so pervasive that organizations will avoid doing merges because of the manual fixups you have to do on long lived branches. There's metadata now for tracking branch history but older versions couldn't figure out that two branches in a merge had a common ancestor. A puzzling thing to omit from a VCS.



> Git doesn't even technically have branches, just pointers to commits which can easily get mixed up, go headless and fail out in ways that just never happened in SVN.

I mean, if that's a reason to say git doesn't technically have branches, then neither does svn. It has subdirectories that you can make copies of at any level, and copy commits between them at any level, such that you can make an amazing repo-within-a-repo mess not possible in git.


Honestly I’d take CVS over subversion. Merging was just bad on subversion and I found the IDE integrations to be confusing, obtuse and buggy.

Sure, CVS was limited but it was reliable and straightforward.


> Sure, CVS was limited but it was reliable and straightforward.

It was so reliable that people complained about SVN using a DB (Berkeley DB) as backend, as manual fixing of CVS files was a "normal" part of operation and people didn't believe that might not be needed ...


...And better than SCCS! ;)


> Solo devs maybe.

Like I’m gonna bother to set up a Subversion remote for every little repo that I create for myself.


The most beautiful part of svn, and the one I miss the most in git, is that there is no need to set up tons of separate repos: every subdirectory can act as what would be a separate repo in git.

This means you usually only have one svn repo, and you set it up the way you like. As an example, you may set things up so you can checkout:

server:/proj/small/hello - to get a single project

server:/proj/small - to get all small projects

server:/proj - to get all projects

If you already have one of those checked out, adding a new project is as simple as "mkdir bar", "svn add bar", "svn commit". So much easier than making a new GitHub repo. And multi-level hierarchical project nesting is still impossible in git.


> So much easier than making a new GitHub repo.

Were you paying attention to what I just wrote!? `git init`. What do I need a remote on the Internet for?


You don't need the internet; you can create the main repo somewhere else on your PC (preferably on a second disk, to have a very minimal backup) and use a file URL to access it: "file:///F:/MyRepositoriesAreHere/MyProject-repo/ProjectName/trunk"


That’s better but still an extra step.


Don't you ever share your projects between machines? I at least have laptop, a desktop, and occasional raspberry pi. An ability to have personal projects on both is very handy. It also acts as a nice backup.

But yes, if all your work is on one machine, and you have backups of it, there is not much point in svn.


That’s a file-sync problem, not a VCS problem. And backup is a separate problem.


Git makes sense for the Linux kernel and merging patches at scale. 99% of software development would be fine with subversion


Git completely replaced Subversion so quickly because the benefits were apparent even at a small scale. Subversion was centralized and slow whereas Git branches were cheap and fast. It turns out the distributed model is just a lot better. Even my college project teams benefited from the superior experience of Git.


Interestingly, there was a distributed SCM build on top of SVN, called SVK (https://wiki.c2.com/?SvkVersionControl).

Being distributed, it solved the main gripes with SVN; it also added a better merging algorithm (https://foswiki.org/pub/Development/SVK/svk-visual-guide.pdf), solving another big gripe.

I was actually satisfied with it, and surprised that it never got attention, particularly because there were no requirements in order to use it with existing SVN repositories. I'm actually baffled, because SVN is still active, so SVK would still be useful nowadays.


May as well use Git SVN integration for this.


Except that everybody switched to GitHub, not git. And effectively recreated Subversion with caching.


"Subversion with caching" is not subversion.

I used subversion for a long time and was resistant to moving ardour.org to git. 24 hours after we switched (we never use 3rd-party git hosting as our canonical repo), I was already convinced it was not merely the right choice, but an excellent one.


It's also Subversion where you can commit your changes and write the message before pushing, instead of at the same time, so you get the chance to review it. That's enough to make it better.


No, I worked with gitlab, bitbucket, custom git server installs in the last 3 years alone.


Anything can be made to work. But I don’t see why I would want to handicap myself with a truly centralized SCM system.

Sure, we use one canonical repository. All our “pull requests” are really merge requests, mostly to the main branch. So that’s pretty centralized, right? So why use a distributed VCS? Well, why use a local editor or IDE for code that is ultimately going to end up in the cloud somewhere? Sure, you might want to out of preference, but why should you be forced to? The fact is that wherever the code will end up is beside the point when it comes to how to develop it.

The truly important thing about distributed VCS is that it forces almost all of the operations on the repository to be usable locally. And why should it not be? What’s “git log”, “git blame”, or “git merge” got to do with whether there is one canonical repo or a hierarchy of upstreams?

I think that this idea that non-distributed VCS is somehow the default—as in the obvious, simple thing to implement—is just backwards. Of course the default assumption for any VCS operation—unless it has a name like “send-email”—should be that it operates on your own local copy.

Sure, we use a centralized repo structure. And the only call-central-command operation I use is “git push”. All the other fiddling and querying—and all the things that make version-control-as-history useful—is local.


Have you used Subversion? If so, do you remember how slow it was? How cutting a branch and then merging it was seen as something for a senior dev to handle?


I did use subversion in college. Never professionally, so creating our own branches wasn't a pain.

I think there is a reason why this XKCD was made: https://xkcd.com/1597/


“Delete your work and clone again” is ridiculous. Just goes to show that you don’t know what you’re talking about (or the xkcd guy for that matter).


I'm talking about my needs, ditto with the XKCD guy. Basically I just need to be able to create a PR from my local changes. Not much more.


Subversion requires a server, so it’s not suitable for the small local one-off repo


No. It doesn’t. `svn checkout file:///path/to/repo` works absolutely fine.


But it does need a server to collaborate, doesn't it?


Define a server. NFS will be enough for more than one person to use a repo over the file protocol.

It will work fine over a Windows share, Samba, NFS, and so on. It doesn’t need svn or http protocol to operate.


I could define a server but I'm not a dictionary so I won't.

SVN doesn't work peer-to-peer or over email. And that's fine. It's just not ready to go with only local tools.

Of course, you can use GitHub with Subversion just fine, but that wasn't the point. The point was that Subversion alone is never enough if you want to collaborate.


> I could define a server but I'm not a dictionary so I won't.

That’s a bummer. It would help us point the discussion in the right direction.

> SVN doesn't work peer-to-peer or over email.

Why not? What is so different about people having their own subversion repositories over file protocol vs people having their git repositories?

Why would you not be able to send a subversion patch in an email?

As someone who uses git for 10 years, I understand it may not be as ergonomic as with git. But why not?

> The point was that Subversion alone is never enough if you want to collaborate.

Why not? Is git enough?


In which case they can use Subversion. Or Dropbox, since it essentially offers the same features. I don't think there is anything bad about it, just that it solves different problems.

Older systems like CVS are also still in use, but it appears that none of the old systems really lasted more broadly, and they aren't suited to the needs of today.


I loved this XKCD: https://xkcd.com/1597/ - use these commands; if you get an error, save your changes, delete the project, and download a fresh copy.


Pull, merge, rebase, push with local commits is what almost everyone cares about.


I think you're really, really undervaluing branches. The fact that patches are mostly shared as branches changes everything.

It's what enables three-way merges, and makes rebasing much more manageable.

The ability to traverse and jump to some other point in history is really missing here too.


You absolutely don't need the way Git does branching to do three-way merges. If you have local commits as first-class citizens, the need for local branch names disappears completely.

Local commits imply traversing history, but even that is terrible with Git. You can't just do obvious things like "git previous" or "git next" or "git checkout <commit hash>".


I'm not sure why you wouldn't want local branch names once commits are first class. And you don't have to do it the way git does, but aside from pijul I think everything does it about the same way git does.

Also, git checkout <commit hash> works?


Local branch names for short-lived branches are a crutch, there's nothing they convey that commit messages don't express better.

What happens if you amend a commit after checking it out?


> Local branch names for short-lived branches are a crutch

Hm. We must use git completely differently then. I can't really imagine what I'd do without them.

> there's nothing they convey that commit messages don't express better.

Presumably you'd still want to maintain a list of HEADs, so you just want to always refer to them by hash instead of a branch name? That's fine, I guess -- not sure what it buys you.

> What happens if you amend a commit after checking it out?

Then it becomes a new commit? Not sure what you're getting at.


It's slowly changing. They finally added `git restore`, for example.


> It's status quo and no changing

Not entirely true: checkout was split into switch and restore, which is something I guess.


`git submodule update --init --recursive` is the magic phrase.

And, yes: submodules are really useful, as well as a PITA.


And you can also --recurse-submodules when cloning


> Git submodules are fine and can be really useful, but they are really hard

If an important software tool is hard to use to the point that most people avoid it, then it's not fine. It's broken.


I agree with all of this. Submodules aren't easy but they perform a useful job. It's hard to see how they could be made significantly easier. Where else in software is dependency management easy and convenient?


"Make 'git checkout' of the top level repo also set the submodules to the contents they should have for that top level commit hash" is probably the main change I'd want. The current setup means that checking out a branch or doing a git bisect gives you an inconsistent source tree, which seems like a really unhelpful behaviour.
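For what it's worth, newer git has an opt-in that is close to this (worth double-checking the behavior on your git version):

```shell
# Checkout that also resets submodules to the recorded commits:
git checkout --recurse-submodules some-branch

# Or enable it for all commands that support --recurse-submodules:
git config submodule.recurse true
```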


What tools fix this?


If you're on windows, it just takes a few clicks with tortoisegit. I never have to remember these command line git commands.


The irony of needing a tool to make the first tool work well


Using something like Nix to specify the dependencies instead.


That sounds like a problem that exists between the chair and the keyboard.


In 25 years of software dev, and most of that in developing user interfaces to things, I have never found a commonly repeated error that could be attributed to user error. It's always badly designed software that ignores the user's mental model that's developed based on using the software.

In this case, "git clone" clones a repo 99% of the time, because 99% of repos are shallow and simple. For "git clone" to only clone the top level when you have submodules, instead of prompting to ask if you want a deep clone, or doing a deep clone by default because that's the expected behavior, is pretty poor design IMO.



