
Git submodules are fine and can be really useful, but they are really hard. I've run into problems like:

1. Git clone not cloning submodules. You need `git submodule update --init` or `git clone --recursive`, I think

2. Git submodules being out-of-sync because I forgot to pull them specifically. I'm pretty sure `git submodule update` doesn't always fix this, but maybe only in case 3)

3. Git diff returns something even after I commit, because the submodule has a change. I have to go into the submodule and either commit/push that as well or revert it. Basically, every operation I do in the main repo I also need to do in the submodule if I modified files in both

4. Fixing merge conflicts and using git in one repo is already hard enough. The team I was working on kept having issues with using the wrong submodule commit, not having the same commit / push requirements on submodules, etc.

All of these can be fixed by tools and smart techniques like putting `git submodule update` in the makefile. Git submodules aren't "bad" and honestly they're an essential feature of git. But they are a struggle, and lots of people use monorepos instead (which have their own problems...).
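The usual incantations for problems 1-3, for reference (the URL is just a placeholder):

```shell
# 1. Clone with submodules in one step:
git clone --recursive https://example.com/repo.git

#    ...or repair a clone that forgot them:
git submodule update --init --recursive

# 2. After pulling the parent, bring submodules back in sync:
git pull && git submodule update --init --recursive

# 3. Before committing the parent, check for submodule changes;
#    `git status` reports things like "modified: sub (new commits)".
git status
```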



Switching branches in a repository with submodules is a huge pain, especially if (like the Ansible repo) some branches have the subdirectory in the same repo like normal, and some branches have the same subdirectory in a submodule.


There are git options for managing these difficulties like:

git config --global submodule.recurse true

https://git-scm.com/book/en/v2/Git-Tools-Submodules search for "git config"
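For what it's worth, per those docs `submodule.recurse` makes most commands that take `--recurse-submodules` (pull, checkout, fetch - though notably not clone) act on submodules automatically:

```shell
# Turn it on once:
git config --global submodule.recurse true

# Verify the setting:
git config --global submodule.recurse   # prints "true"

# From now on, e.g. `git pull` and `git checkout` also update
# submodule working trees; `git clone` still needs --recursive.
```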


> There are git options for managing these difficulties

This is git in a nutshell. Most defaults are very bad, and so using git from the command line is an exercise in learning which flags to set to achieve a sane workflow.


Thanks! That one option removes like 2/3 of the pain of using submodules. Doing a pull or checkout would just be regular commands.

Though it’d be nice if `git commit` supported it too and just did a `git submodule foreach git commit …`.
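`git submodule foreach` does exist today, so you can get close with a one-liner (a naive sketch - it reuses one commit message everywhere):

```shell
# Commit in every submodule that has modified tracked files,
# then record the updated pointers in the parent repo:
git submodule foreach 'git diff-index --quiet HEAD || git commit -am "wip"'
git commit -am "Update submodule pointers"
```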


Interesting!

I've never thought of doing commits in submodules. We use them a lot at work but only in a "import a specific revision of this other repo into our repo" sense, if we needed to make changes we would do them in the repo that the submodule points to and then just update the ref that the submodule points to.

What's your use case for making and committing changes inside the submodule tree?


I naively thought that was the whole point of submodules. Otherwise why not use a package manager?

The use case being that you can work on and update some independent system/repo while getting real-time feedback on how the changes interact with the system as a whole.


That's how I've come to use git submodules. It's been helpful when working with various embedded projects. They often don't get updated for months or years (ideally), and then you only want to pull in specific changes.


In a previous project we had a main repo that held the deploy/orchestration scripts, docs, etc., and the actual components as submodules (frontend, backend, periodic and on-demand jobs and the job queue, a blob-serving thing that did ACL in front of Ceph - because for some reason S3 was too mainstream :D).

Doing a quick fix that changed an API meant touching the backend, the frontend, and the main repo.

It's quite similar to today's GitOps flow, but with submodules :)


It's easy to overlook that the submodule is a complete git repository in its own right.

When working on both, it's really annoying to have to commit to a dependency's repo, push, and pull down in the dependent repo. If you do the commit in the submodule, it's just immediately available...and you can still have whatever remote to push to. So it just cuts down on the number of checkouts you have to have.


That's all true, but work trees are cheap, and the workflow you describe means that your submodules are tracking a branch rather than pinned to a revision, right?

For our purpose that's definitely worse; the submodule is supposed to be a pointer to a specific tree, with that tree being the same for all developers. If we want to change the tree that is pointed to, we should commit and push a change to the submodule ref.
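Concretely, the pin-bump workflow might look something like this (a sketch; the path and ref names are made up):

```shell
# Move the submodule to the commit you want to pin:
cd libs/dependency           # hypothetical submodule path
git fetch origin
git checkout origin/main     # or a specific tag / commit hash
cd ../..

# The parent now sees a changed pointer; record and share it:
git add libs/dependency
git commit -m "Bump libs/dependency"
git push
```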


Yeah if you commit a change in a submodule, the parent repo gets marked as dirty as the ref changes and you need to commit.


Right. But why doesn’t it Just Work by default?


From a design point of view, is there a good reason why this isn't the default?


Agreed with all of this!

In git parlance, the submodule porcelain is hard to use (but the plumbing is good)


That's the entirety of git. Extremely fast, but a bummer of a UI. It's the status quo and isn't changing. Despite using git for 10+ years, I frequently have to look up commands and then end up scratching my head as to why the CLI UI is like that.


I suppose the biggest problem is that the concept of SCM/VCS is just not simple enough to make both easy and useful/advanced at the same time.

You can have a 'pull, merge, push'-only system, but at that point we're re-inventing subversion. So making it more advanced would mean we also need to have the knowledge and skills to do other activities correctly and that means the tooling can't make as many choices for you because there simply isn't a default way that works all the time.

Most efforts at git-alternatives run into the same problems: either they're just as advanced and have the same benefits and downsides, or they end up less advanced, but then they're not equally useful and you can't really make them work right.


Mercurial covers generally the same concepts as git and is thus also not trivial to learn for someone uninitiated; yet its interface was night and day compared to git's, from their very early days. It proves that one can design a decent interface if one actually cares about usability and friendliness. As I remember it, git won the rivalry squarely because GitHub became popular (though I assume there were some reasons why GH chose git over hg).


Back in the day there were actually 2 or 3 different cloud SCM hosting providers that chose Mercurial. As I recall, a couple of them, BitBucket and Kiln, also had better Web UIs than GitHub did. Versus just GitHub offering Git, and it was kind of a duffer in my opinion.

GitHub has come a long way. But I would guess the main reason it managed to become dominant is not because it had a better product. (At the time, it didn't.) It's because Git benefited from the celebrity of its author, Linus Torvalds.


It's crazy how people forget the past. Back in the day, git was "rewrite, rebases modify past to make beautiful commit" vs hg "rewriting past is bad, beautiful commits are lies about history". Turns out people don't care about truthful history.

(Nowadays mercurial can do rebase/amend just fine.. But it is too late)


I think that the truth is probably somewhere in between.

I do like to squash and rebase before moving changes upstream. But, to me, that isn't really history quite yet. Or at least, it's not history that's worth recording. All those micro-commits from the work in progress are, in some ways, more akin to my editor's undo history. Which is also something I don't save.

It's also clear, in hindsight, that Mercurial's original position on this subject failed to anticipate AWS credentials accidentally being committed to source control.

But I have also seen (and done) some amount of history rewriting in long-lived branches that I don't think would have been necessary if Git had had some of Mercurial's ergonomics. Workflows for merging two different repositories while retaining the commit history from each, for example.


FWIW, Mercurial has had a "censor" command for blowing away the contents of those revisions with AWS keys since 2015.

Although once stuff is pushed to the public repo you're probably going to want to change those keys regardless. And if it's on the local one, there's plenty of options for removing the commit.


Rewriting history is actively good because it means you can view it as a series of logical patches. I have worked on large projects using earlier versions of hg and they were absolutely full of merge commits just labeled "Merge" - some of them were safe, some of them had random changes in them, and some of them had automatic merged changes that actually caused problems.

It was also much slower than git. But I knew someone working on Google Code at the time who liked it better because it was "clean" and in Python.


It was much much faster than git at HTTP at the time though. That's why Google Code selected it. Also it was faster at imports which was why Mozilla selected it for their transition.

At some other things it was slower; that's changed over time, of course.

Also, in terms of clean history, mercurial has best of both worlds with phases and hidden by default commits to keep track of such cleanup.


> Also, in terms of clean history, mercurial has best of both worlds with phases and hidden by default commits to keep track of such cleanup.

Yeah, it has more features now but at the time it didn’t. There was something called patch queue you could use for early stage work but that was all.


The fact that mercurial is substantially slower than git was probably also a big factor.


in other words, the comment you’re replying to rewrote the past of rewriting the past.


Around 2013, having only knowledge of svn at the time, I tried both git and mercurial to see which I liked more and found git to be a lot more intuitive than mercurial.

It's been long enough I don't remember the details about why I didn't like mercurial, but fame power had nothing to do with it, nor did website integration - I was only using it locally. How it worked just didn't fit with how I thought about version control.


That's not what I remember. Back in the days when people started implementing dvcs git was just much faster than everyone else (that was in fact the reason why Linus wrote it). Once the kernel was using it, its mindshare just grew much faster because of the publicity this implied. In other words it was largely a case of "if the kernel devs are using it it must be good". When GitHub and all the other hosting services started many still had mercurial or other dvcs (launchpad was bzr for example), but by that time the ship had sailed already I would argue.


> I suppose the biggest problem is that the concept of SCM/VCS is just not simple enough to make both easy and useful/advanced at the same time.

Git has, and always has had, a singularly bad UI among dvcs.


> but at that point we're re-inventing subversion

So? Maybe Subversion is all most developers need?


Solo devs maybe. Merging and branching are so much worse in svn that it’s not good enough for “most developers”, aka those working on professional projects with a team of developers. Sure we used to make do with svn, but I have no desire to go back.


Odd, I found branching and merging so much easier in SVN.

Git doesn't even technically have branches, just pointers to commits, which can easily get mixed up, end up detached, and fail out in ways that just never happened in SVN.

And rebasing a branch with several commits can be a nightmare, since you have to re-merge almost (but not exactly) the same code over and over again, at whatever state it was in at some point in the past when a previous commit happened - unless you abort and squash first. In SVN, you just merged the whole branch once, when both sides were in their final, current state.

Of course, git's a lot more powerful, but with that comes complexity. SVN branching and merging was a snap comparatively.


SVN will frequently insist that merge conflicts exist where there shouldn't be any if trees have been modified by deleting or moving directories. This is so pervasive that organizations will avoid doing merges because of the manual fixups you have to do on long lived branches. There's metadata now for tracking branch history but older versions couldn't figure out that two branches in a merge had a common ancestor. A puzzling thing to omit from a VCS.



> Git doesn't even technically have branches, just pointers to commits which can easily get mixed up, go headless and fail out in ways that just never happened in SVN.

I mean, if that's a reason to say git doesn't technically have branches, then neither does svn. It has subdirectories that you can make copies of at any level, and copy commits between them at any level, such that you can make an amazing repo-within-a-repo mess not possible in git.


Honestly I’d take CVS over subversion. Merging was just bad on subversion and I found the IDE integrations to be confusing, obtuse and buggy.

Sure, CVS was limited but it was reliable and straightforward.


> Sure, CVS was limited but it was reliable and straightforward.

It was so reliable that people complained about SVN using a DB (Berkeley DB) as backend, as manual fixing of CVS files was a "normal" part of operation and people didn't believe that might not be needed ...


...And better than SCCS! ;)


> Solo devs maybe.

Like I’m gonna bother to set up a Subversion remote for every little repo that I create for myself.


The most beautiful part of svn, and the one I miss the most in git, is that there is no need to set up tons of separate repos: every subdirectory can act as what would be a separate repo in git.

This means you usually only have one svn repo, and you set it up the way you like. As an example, you may set things up so you can checkout:

server:/proj/small/hello - to get a single project

server:/proj/small - to get all small projects

server:/proj - to get all projects

If you already have one of those checked out, adding a new project is as simple as "mkdir bar", "svn add bar", "svn commit". So much easier than making a new GitHub repo. And multi-level hierarchical project nesting is still impossible in git.


> So much easier than making a new GitHub repo.

Were you paying attention to what I just wrote!? `git init`. What do I need a remote on the Internet for?


You don't need the internet; you can create the main repo somewhere else on your PC (preferably on a second disk, to have a very minimal backup) and use a file URL to access it: "file:///F:/MyRepositoriesAreHere/MyProject-repo/ProjectName/trunk"


That’s better but still an extra step.


Don't you ever share your projects between machines? I at least have laptop, a desktop, and occasional raspberry pi. An ability to have personal projects on both is very handy. It also acts as a nice backup.

But yes, if all your work is on one machine, and you have backups of it, there is not much point in svn.


That’s a file-sync problem, not a VCS problem. And backup is a separate problem.


Git makes sense for the Linux kernel and merging patches at scale. 99% of software development would be fine with subversion


Git completely replaced Subversion so quickly because the benefits were apparent even at a small scale. Subversion was centralized and slow whereas Git branches were cheap and fast. It turns out the distributed model is just a lot better. Even my college project teams benefited from the superior experience of Git.


Interestingly, there was a distributed SCM build on top of SVN, called SVK (https://wiki.c2.com/?SvkVersionControl).

Being distributed, it solved the main gripes with SVN; it also added a better merging algorithm (https://foswiki.org/pub/Development/SVK/svk-visual-guide.pdf), solving another big gripe.

I was actually satisfied with it, and surprised that it never got attention, particularly because there were no requirements in order to use it with existing SVN repositories. I'm actually baffled, because SVN is still active, so SVK would still be useful nowadays.


May as well use Git SVN integration for this.


Except that everybody switched to GitHub, not git. And effectively recreated Subversion with caching.


"Subversion with caching" is not subversion.

I used subversion for a long time and was resistant to moving ardour.org to git. 24 hours after we switched (we never use 3rd-party git hosting as our canonical repo), I was already convinced it was not merely the right choice, but an excellent one.


It's also Subversion where you can commit your changes and write the message before pushing, instead of at the same time, so you get the chance to review it. That's enough to make it better.


No, I worked with gitlab, bitbucket, custom git server installs in the last 3 years alone.


Anything can be made to work. But I don’t see why I would want to handicap myself with a truly centralized SCM system.

Sure, we use one canonical repository. All our “pull requests” are really merge requests, mostly to the main branch. So that’s pretty centralized, right? So why use a distributed VCS? Well, why use a local editor or IDE for code that is ultimately going to end up in the cloud somewhere? Sure, you might want to out of preference, but why should you be forced to? The fact is that wherever the code will end up is beside the point when it comes to how to develop it.

The truly important thing about distributed VCS is that it forces almost all of the operations on the repository to be usable locally. And why should it not be? What’s “git log”, “git blame”, or “git merge” got to do with whether there is one canonical repo or a hierarchy of upstreams?

I think that this idea that non-distributed VCS is somehow the default—as in the obvious, simple thing to implement—is just backwards. Of course the default assumption for any VCS operation—unless it has a name like “send-email”—should be that it operates on your own local copy.

Sure, we use a centralized repo structure. And the only call-central-command operation I use is “git push”. All the other fiddling and querying—and all the things that make version-control-as-history useful—is local.


Have you used Subversion? If so, do you remember how slow it was? How cutting a branch and then merging it was seen as something for a senior dev to handle?


I did use subversion in college. Never professionally, so creating our own branches wasn't a pain.

I think there is a reason why this XKCD was made: https://xkcd.com/1597/


“Delete your work and clone again” is ridiculous. Just goes to show that you don’t know what you’re talking about (or the xkcd guy for that matter).


I'm talking about my needs, ditto with the XKCD guy. Basically I just need to be able to create a PR from my local changes. Not much more.


Subversion requires a server, so it’s not suitable for the small local one-off repo


No. It doesn’t. `svn checkout file:///path/to/repo` works absolutely fine.


But it does need a server to collaborate, doesn't it?


Define a server. NFS will be enough for more than one person to use a repo over the file protocol.

It will work fine over a Windows share, Samba, NFS, and so on. It doesn’t need svn or http protocol to operate.


I could define a server but I'm not a dictionary so I won't.

SVN doesn't work peer-to-peer or over email. And that's fine. It's just not ready to go with only local tools.

Of course, you can use GitHub with Subversion just fine, but that wasn't the point. The point was that Subversion alone is never enough if you want to collaborate.


> I could define a server but I'm not a dictionary so I won't.

That’s a bummer. It would help us point the discussion in the right direction.

> SVN doesn't work peer-to-peer or over email.

Why not? What is so different about people having their own subversion repositories over file protocol vs people having their git repositories?

Why would you not be able to send a subversion patch in an email?

As someone who uses git for 10 years, I understand it may not be as ergonomic as with git. But why not?

> The point was that Subversion alone is never enough if you want to collaborate.

Why not? Is git enough?


In which case they can use Subversion. Or Dropbox, since it essentially offers the same features. I don't think there is anything bad about it, just that it solves different problems.

Older systems like CVS are also still in use, but it appears that none of the old systems really lasted more broadly, and they aren't suited to the needs of today.


I loved this XKCD: https://xkcd.com/1597/ - use these commands; if you get an error, save your changes, delete the project, and download a fresh copy.


Pull, merge, rebase, push with local commits is what almost everyone cares about.


I think you're really, really undervaluing branches. The fact that patches are mostly shared as branches changes everything.

It's what enables three-way merges, and makes rebasing much more manageable.

The ability to traverse and jump to some other point in history is really missing here too.


You absolutely don't need the way Git does branching to do three-way merges. If you have local commits as first-class citizens, the need for local branch names disappears completely.

Local commits imply traversing history, but even that is terrible with Git. You can't just do obvious things like "git previous" or "git next" or "git checkout <commit hash>".


I'm not sure why you wouldn't want local branch names once commits are first class. And you don't have to do it the way git does, but aside from pijul I think everything does it about the same way git does.

Also, git checkout <commit hash> works?


Local branch names for short-lived branches are a crutch, there's nothing they convey that commit messages don't express better.

What happens if you amend a commit after checking it out?


> Local branch names for short-lived branches are a crutch

Hm. We must use git completely differently then. I can't really imagine what I'd do without them.

> there's nothing they convey that commit messages don't express better.

Presumably you'd still want to maintain a list of HEADs, so you just want to always refer to them by hash instead of a branch name? That's fine, I guess -- not sure what it buys you.

> What happens if you amend a commit after checking it out?

Then it becomes a new commit? Not sure what you're getting at.


It's slowly changing. They finally added `git restore`, for example.


> It's status quo and no changing

Not entirely true: checkout was split into switch and restore, which is something I guess.


`git submodule update --init --recursive` is the magic phrase.

And, yes: submodules are really useful, as well as a PITA.


And you can also --recurse-submodules when cloning


> Git submodules are fine and can be really useful, but they are really hard

If an important software tool is hard to use to the point that most people avoid it, then it's not fine. It's broken.


I agree with all of this. Submodules aren't easy but they perform a useful job. It's hard to see how they could be made significantly easier. Where else in software is dependency management easy and convenient?


"Make 'git checkout' of the top level repo also set the submodules to the contents they should have for that top level commit hash" is probably the main change I'd want. The current setup means that checking out a branch or doing a git bisect gives you an inconsistent source tree, which seems like a really unhelpful behaviour.
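For what it's worth, newer git has an opt-in that is close to this (worth double-checking the behavior on your git version):

```shell
# Checkout that also resets submodules to the recorded commits:
git checkout --recurse-submodules some-branch

# Or enable it for all commands that support --recurse-submodules:
git config submodule.recurse true
```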


What tools fix this?


If you're on windows, it just takes a few clicks with tortoisegit. I never have to remember these command line git commands.


The irony of needing a tool to make the first tool work well


Using something like Nix to specify the dependencies instead.


That sounds like a problem that exists between the chair and the keyboard.


In 25 years of software dev, and most of that in developing user interfaces to things, I have never found a commonly repeated error that could be attributed to user error. It's always badly designed software that ignores the user's mental model that's developed based on using the software.

In this case, "git clone" clones a repo 99% of the time, because 99% of repos are shallow and simple. For "git clone" to only clone the top level when you have submodules, instead of prompting to ask if you want a deep clone, or doing a deep clone by default because that's the expected behavior, is pretty poor design IMO.



