IMO garbage collection is the epitome of sunk cost fallacy. Thirty years of good research thrown at a bad idea. The reality is we as developers choose not to give languages enough context to accurately infer the lifetime of objects. Instead of doing so we develop borderline self-aware programs to guess when we're done with objects. It wastes time, it wastes space, it wastes energy. If we'd spent that time developing smarter languages and compilers (Rust is a start, but not an end) we'd be better off as developers and as people. Garbage collection is just plain bad. I for one am glad we're finally ready to consider moving on.
Think about it, instead of finding a way of expressing when we're done with instances, we have a giant for loop that iterates over all of memory over and over and over to guess when we're done with things. What a mess! If your co-worker proposed this as a solution you'd probably slap them. This article proposes hardware accelerating that for loop. It's like a horse-drawn carriage accelerated by rockets. It's the fastest horse.
> It wastes time, it wastes space, it wastes energy.
But all of these are much cheaper than developer labour and reputation damage caused by leaky/crashy software. The economics make sense.
Anecdotally, I spent the first ~6 years of my career working with C++, and when I started using languages that did have GC, it made my job simpler and easier. I'm more productive and less stressed due to garbage collection. It's one less (significant) cognitive category for my brain to process.
> But all of these are much cheaper than developer labour and reputation damage caused by leaky/crashy software.
And that's why so much effort has gone into making it fast and low latency, but it was a false dichotomy: We can have memory safety without garbage collection.
- Automatic reference counting: Most people know about Objective-C's efforts in this space, but it's admittedly less automatic than programmers would like so perhaps it doesn't get enough attention. And yet it should, since q/kdb+ uses reference counting exclusively, and it holds top performance numbers on a large number of data and messaging problems.
- Linear lisp[1] asked functions to fully consume their arguments, making (cons x y) basically a no-op. Again: no garbage collection, and no fragmentation (at least as long as all objects are cons cells), and yet no matter how promising this path looked, garbage collection got much more attention.
- Rust's borrow checker/tracker makes ownership explicit; somewhat of a middle-ground between the two...
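To make that middle ground concrete, here's a minimal sketch (my own example, not from any of the projects mentioned): lifetimes are checked statically, values are freed deterministically at end of scope, and no collector runs at runtime.

```rust
// Ownership/borrowing sketch: no GC, no refcounts. The compiler proves
// `longest`'s result does not outlive either argument.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let s = String::from("hello"); // `s` owns its heap allocation
    let t = String::from("hi");
    let l = longest(&s, &t);       // borrows only; no copies, no counting
    assert_eq!(l, "hello");
    // `s` and `t` are freed here, deterministically, when they go out of
    // scope; the borrow checker proved `l` is not used after that point.
}
```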
There are other scattered efforts in this space, and I don't know all of them, but for more on "everything we know about languages is wrong", also consider that Perl5 uses plain old reference counting, executes the AST directly, and still outperforms Python for data and IO[2]!
I think the thing to take away is that memory management has to be automatically managed for developer sanity, not that garbage collection is the way to do it.
This comment is just bad and misinformed all over.
(1) Automatic Reference Counting doesn't work; its equivalent in interpreted languages is, well, reference counting, which can be optimized quite a lot (though it has some issues with multithreading) but cannot collect cycles.
(2) therefore, if you want reference counting, you have to either also have GC (for cycles), or program carefully to avoid creating cycles (which is then only marginally better than C++)
(3) your comment on Python vs Perl5 is just nonsense, Python uses reference counting as well (along with the occasional GC to collect cycles)
(4) linear / uniqueness types (not exactly the same, but both can be used to ensure safe memory management) impose significant mental overhead on programmers as they prohibit many common patterns
(5) Rust features a lot of "cheating" - pointers that allow mutation from multiple threads, reference-counted pointers, etc. - so obviously just ownership&borrowing isn't good enough
conclusion: you can't have your cake and eat it too (at least according to current cutting edge research and implementations) - you either have GC or you have to be very careful/restricted when writing code
> (1) Automatic Reference Counting doesn't work; its equivalent in interpreted languages is, well, reference counting, which can be optimized quite a lot (though has some issues with multithreading), but cannot collect cycles.
This is what weakrefs (or better data structures) are for. The Linux kernel uses reference counting incredibly effectively for almost every structure. I think that pretty much discounts any argument that reference counting cannot work.
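A minimal Rust sketch of that pattern (invented types; roughly what the kernel does by convention in C): the back-pointer is a weak reference, so it doesn't keep its target alive and no cycle leaks.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Parent strongly owns the child; the child's back-pointer is a Weak,
// so there is no strong refcount cycle to leak.
struct Parent { child: RefCell<Option<Rc<Child>>> }
struct Child  { parent: Weak<Parent> }

fn main() {
    let parent = Rc::new(Parent { child: RefCell::new(None) });
    let child = Rc::new(Child { parent: Rc::downgrade(&parent) });
    *parent.child.borrow_mut() = Some(Rc::clone(&child));

    assert!(child.parent.upgrade().is_some()); // back-pointer works...
    drop(parent);                              // ...but doesn't keep parent alive
    assert!(child.parent.upgrade().is_none()); // parent was actually freed
}
```

The "careful programming" cost the sibling comment mentions is exactly the choice of which edge becomes the Weak.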
> (5) Rust features a lot of "cheating" - pointers that allow mutation from multiple threads, reference-counted pointers, etc. - so obviously just ownership&borrowing isn't good enough
"unsafe" isn't cheating and isn't even related to whether garbage collection is a good idea or not. Yes, Rust's ownership and borrowing model doesn't cover all programs and hence "unsafe" is required -- but "unsafe" doesn't use a garbage collector so the existence of "unsafe" doesn't change that Rust works without one.
Weakrefs require “careful programming” i.e. knowing which ref must be a weakref.
By “cheating” I don’t mean “unsafe” but all kinds of pointers that aren’t just unique or borrowed such as different reference-counted pointers, the existence and use of which demonstrates that ownership+borrowing isn’t a complete memory-management technique. You need reference counting, therefore you need GC or careful programming.
Hopefully we discover some technology that makes these needs go away, but not so far AFAIK.
I think we're both in agreement, I was phrasing things in a manner which I thought would better reference how GP thinks of unsafe. Obviously the API itself is safe, but the "cheating" GP was referring to was the fact that Rust requires "unsafe" under the hood in order to implement some safe operations (because the memory model doesn't cover all cases). You and I agree that this isn't a problem.
Reference counting is a garbage collection algorithm. It just isn't a very good one compared to semi-space collection, because its runtime is proportional to the number of dead objects, not live objects.
Reference counting is barely classifiable as garbage collection. The runtime is O(1): you free the object immediately when it dies; there's no stage where you start looping over dead objects.
>Rust features a lot of "cheating" - pointers that allow mutation from multiple threads
You're confounding GC and synchronization of multi-threaded access here. What rust does with things like Arc other languages will need some other synchronization primitive to enable. Their GC or manual memory management does not guarantee the multi-threaded semantics. That's one of the advantages of the rust memory model, some patterns become multi-threaded without extra primitives (read-only borrows) and all of the primitives are compile-time checked for correctness.
>reference-counted pointers, etc. - so obviously just ownership&borrowing isn't good enough
In practice ownership and borrowing gets you a long way. I've only had to use a reference-counted primitive (Arc) when that was exactly the correct semantics: I had a cache, and both the caller and the original owner could drop the value first based on external input. No other static decision could be made, and the ergonomics of writing that are very good. Rust already seems quite a bit better than having to be "very careful/restricted when writing code".
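A sketch of that cache scenario (names invented): the cache and the caller each hold a reference, and whichever drops last frees the value; that's precisely the semantics Arc's refcount provides.

```rust
use std::collections::HashMap;
use std::sync::Arc;

fn main() {
    let mut cache: HashMap<&'static str, Arc<Vec<u8>>> = HashMap::new();
    cache.insert("key", Arc::new(vec![1, 2, 3])); // cache holds one reference

    let caller = Arc::clone(cache.get("key").unwrap()); // caller holds another
    cache.remove("key");                    // external input evicts the entry
    assert_eq!(*caller, vec![1u8, 2, 3]);   // caller's reference is still valid
    assert_eq!(Arc::strong_count(&caller), 1);
    // the allocation is freed here, when the last Arc (`caller`) drops
}
```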
You start your response with "No" but don't actually argue against anything I've said.
> Rust is conflating memory safety and multithreading safety.
Rust doesn't conflate anything. It just gets some multi-threading safety for free as a result of its memory model. And then gets quite a few other multi-threading safety characteristics that no other mainstream language has from the type system. That several features in the language work together well is just a sign of good design.
This is where you're mixing the two unrelated concepts. Garbage collection does nothing for multi-threading safety. While the object is live and two threads hold a reference to it you need to synchronize access somehow. GC can't do anything about that.
> AFAIK many wait-free data structures are pretty much impossible without a GC.
This is unrelated to the previous points I made but isn't true either:
By "conflating" I mean that Rust's memory management technology provides just one kind of multi-threading safety (i.e. "multiple readers/single-writer" model), but to implement other kinds of multi-threading-safe algorithms/data structures, you need to transcend beyond ownership+borrowing.
I think "epoch-based memory reclamation" (which Linux kernel also uses IIRC) is still a (primitive / limited / contained) form of GC. For those that aren't familiar with it, a quote from the article @pedrocr posted:
> The basic idea is to stash away nodes that have been unlinked from the data structure (the first source of reachability) until they can be safely deleted. Before we can delete a stashed node, we need to know that all threads that were accessing the data structure at the time have finished the operation they were performing.
So basically, the runtime needs to keep track of allocated objects until it's certain there are no more references to them... sounds like GC (the difference between this and a real GC is that (1) you know there's no cross-references between the nodes, so you don't need to mark&sweep but just need to keep track of the epoch, and (2) the nature of lock-free data structures is such that threads will move to the next epoch fairly soon (unless they infinite-loop)).
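A toy, coarse-grained sketch of that mechanism (all names invented; real implementations such as crossbeam-epoch use per-thread state and much weaker atomic orderings): threads "pin" the current epoch while accessing the structure, unlinked nodes are stashed in the bucket for the epoch they were retired in, and the bucket two epochs back is freed once no thread is still pinned in the previous epoch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Invariant: a pinned thread is always in the current or previous epoch,
// so the bucket two epochs back is unreachable and safe to drain.
struct EpochGc {
    global: AtomicUsize,
    pinned: [AtomicUsize; 3],           // threads pinned in epoch e % 3
    garbage: [Mutex<Vec<Box<u64>>>; 3], // nodes retired in epoch e % 3
}

impl EpochGc {
    fn new() -> Self {
        EpochGc {
            global: AtomicUsize::new(0),
            pinned: [AtomicUsize::new(0), AtomicUsize::new(0), AtomicUsize::new(0)],
            garbage: [Mutex::new(Vec::new()), Mutex::new(Vec::new()), Mutex::new(Vec::new())],
        }
    }
    fn pin(&self) -> usize {
        let e = self.global.load(Ordering::SeqCst);
        self.pinned[e % 3].fetch_add(1, Ordering::SeqCst);
        e
    }
    fn unpin(&self, e: usize) {
        self.pinned[e % 3].fetch_sub(1, Ordering::SeqCst);
    }
    fn retire(&self, e: usize, node: Box<u64>) {
        self.garbage[e % 3].lock().unwrap().push(node); // stash, don't free yet
    }
    // Advance the global epoch if no thread is still pinned one epoch back;
    // the bucket from two epochs back can then actually be freed.
    fn try_advance(&self) -> bool {
        let e = self.global.load(Ordering::SeqCst);
        if self.pinned[(e + 2) % 3].load(Ordering::SeqCst) != 0 {
            return false; // someone is still in the previous epoch
        }
        self.global.store(e + 1, Ordering::SeqCst);
        self.garbage[(e + 2) % 3].lock().unwrap().clear(); // safe to drop
        true
    }
}

fn main() {
    let gc = EpochGc::new();
    let e = gc.pin();
    gc.retire(e, Box::new(42)); // "unlink" a node while pinned
    gc.unpin(e);
    gc.try_advance();           // epoch 0 -> 1
    gc.try_advance();           // epoch 1 -> 2: epoch-0 garbage is freed
    assert!(gc.garbage[0].lock().unwrap().is_empty());
}
```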
> By "conflating" I mean that Rust's memory management technology provides just one kind of multi-threading safety (i.e. "multiple readers/single-writer" model), but to implement other kinds of multi-threading-safe algorithms/data structures, you need to transcend beyond ownership+borrowing.
That's not "conflating" anything. You're just saying that "to do other things you need other stuff". Sure, to implement multi-threading you also need locks and semaphores and atomic variables sometimes. Rust provides all of those. There's nothing specific to GC vs non-GC there.
> So basically, the runtime needs to keep track of allocated objects until it's certain there are no more references to them... sounds like GC
Reference counting is not GC by most definitions. And if you want to define it like that Rust already has it.
> (the difference between this and a real GC is that (1) you know there's no cross-references between the nodes, so you don't need to mark&sweep but just need to keep track of the epoch, and (2) the nature of lock-free data structures is such that threads will move to the next epoch fairly soon (unless they infinite-loop)).
So the difference between this and GC is that it's nothing like GC... If reference counting, RCU, etc are all GC then there's nothing to discuss as it turns out languages like Rust have all the GC you need.
I agree, the model of lock-free data structures is pretty much perfect for reference counting (no cycles, shallow reference tree... just needs to be thread-safe). I wonder then why epoch-based approaches are used, which (sound like they) are more difficult / tricky to implement... I honestly have no idea. The only thing that comes to mind is that they seem to require another atomic primitive (atomic increment, otherwise lock-free data structures only require compare-and-swap), which might not be available on all platforms...
> AFAIK many wait-free data structures are pretty much impossible without a GC.
That makes no sense. You can implement a garbage collector in a manually managed program (or even just the specific parts that gain you the attributes required to accomplish something), so even if a GC made certain data structures easier, there's nothing to prevent their use in a manually managed language. The argument you're making is sort of like saying that certain things can only be accomplished in C or Java, and not in assembly.
More specifically to your actual assertion, anything a GC does for lock free data structures could be handled by a special routine or manually in a language where you handle memory manually.
(1/2): It most certainly does work: q/kdb is an interpreted language and doesn't have cycles. Being an array language, nesting is uncommon, so chasing pointers isn't an issue either.
(3): I didn't say otherwise: Python does also use reference counting. Python, however, has a bytecode VM and a GC, and perl5 does not and executes the AST directly. And yes, perl5 is faster for many problems[†]. I think it's quite common for people to believe that GC and bytecode are a win, and this is a simple and accessible example where that is not true.
(4): More significant than what? q/kdb is implemented in this style, and it's a huge advantage (reducing bugs) to have your library handle allocation/deallocation rather than having the caller do it, so I tend to do it as well. What exactly are you comparing it to? The mental load of figuring out what's causing GC pauses is quite high, since it requires considering the entire program, but programming in a linear style is very easy, and it obviates the need for a lot of other memory management techniques.
(5): Good enough for what? Why is mixing static analysis with runtime analysis "cheating"? What would be "fair" in your mind?
It is weird; I do not feel restricted when I write non-GC code. Btw. you still need to pay attention in GCd languages not to leak references that cannot be GCd; Java has this sort of problem. Rust makes it easy and simple to deal with memory above the malloc level while not having GC. Erlang, on the other hand, has a VM that makes GC much faster than in other languages. I can live with that because if you do it right it does not impact end user latency. For me, both Rust and Erlang give the have-your-cake-and-eat-it feeling.
"I can live with that because if you do it right it does not impact end user latency."
The vast majority of GC use does not impact end-user latency. Thinking otherwise is either the availability heuristic at work, or you work in AAA videogames or another ultra-high-real-time-performance industry and don't realize you're not at the center of programming work; you're on an extreme. (A perfectly valid and sizeable extreme.)
This may also be why you don't feel restricted; if you happen to work in any of the many places where a tree of references is adequate, you'll find Rust-style management to be a treat. If you work pervasively with graph structures in the core of your program, though, the experience can change. Pervasive graph structures are generally just a question of how you want your automated garbage collection to work, because I'm pretty sure getting the manual management correct otherwise is effectively impossible without an algorithm that will look a lot like a GC.
That said, despite my disagreement, I also disagree with the people who, as of this writing, have downvoted your comment into grey. I feel like I've been seeing a burst of "downvote as disagreement" in the past couple of weeks.
"Downvote as disagreement" is such a pervasive pattern that I find myself doing it automatically even when I know that's not what downvote is "supposed" to be for. I suspect it's because I'm more likely to view posts that I disagree with as being poorly thought-out.
This even extends to platforms where downvoting is not a thing - witness 4chan's struggles with "sage is not a downvote" ("sage" being a function which allowed you to reply to a thread without bumping it to the top and extending its lifespan - again, intended to mark a thread as low quality) until they finally gave up and made saging invisible.
Given how thoroughly inescapable downvoting-to-disagree seems, I wonder if it might not be better to just declare defeat, allow downvotes and upvotes to mean nothing beyond a signal of protest or agreement, and add a separate pair of "this comment is low quality" / "this comment is high quality" buttons.
I don't think it's easy to point to one thing Erlang is doing and say that is why it's faster, partly because I think the way Erlang programmers are encouraged to write their programs also has an effect; the language itself may have a greater effect on GC performance than the GC implementation does.
As an example: In Erlang, we have function calls like F(X), but we also have message sends like F!X. Message sends get one type of memory treatment, while function calls get another. The lifetime of objects you send via messaging is going to be different, yet Java only has function (method) calls, so there's no way to hint this to Java. Maybe this information can be recovered with static/dynamic analysis, but in Erlang we have syntax to tell the VM what's going to happen.
If you want to learn more about Erlang's internals, I can recommend Joe's original paper on the subject:
Thanks for the links. That being said, new developments in GCs on the JVM (like Shenandoah and ZGC) are yielding very impressive results: things like < 1ms pause times for heaps in the TB range. No other free GC for any other language approaches this as far as I'm aware, including golang's, which people claim is fast but gets completely destroyed even for heaps in the GB range (I did some benchmarks).
>It is weird, I do not feel restricted when I write non-GC code.
Some people don't feel C has any safety issues either -- perhaps they think all others are idiots and their code is mistake proof because they do it right.
> (4) linear / uniqueness types (not exactly the same, but both can be used to ensure safe memory management) impose significant mental overhead on programmers as they prohibit many common patterns
Such as?
I've found that linear style is generally just good coding style. When I get caught up on something I can't express linearly, it generally means I was trying to plough through with a dodgy approach; stopping and thinking through would have resulted in better code even in a GCed language.
You came in guns blazing and then covered yourself in embarrassment already in the first two points you made.
The thing you said doesn't work is in use on millions of devices, working just fine. Or maybe you'd like to be more precise than "work".
Avoiding cycles and thinking about ownership is not programming carefully, it's merely programming. Really, manual memory management is not rocket science folks, just error prone. And RAII is reliable to the point that anyone that is capable of writing software can wield it.
C++ and Objective-C inherited C's memory model and thus don't have a good memory management option. Rust is explicitly a systems language, meaning stronger memory models are out of scope, and it inherited most of C++'s idioms.
Other than those, what popular languages are specifically choosing reference counting as a primary GC method?
Swift requires reference counting for compatibility with Objective-C code.
Apple could have gone the .NET way (RCW) to interface with COM reference counting, but that naturally would complicate the bindings to Objective-C runtime and existing libraries.
> but it was a false dichotomy: We can have memory safety without garbage collection.
but then say
> Automatic reference counting
(ARC henceforth). Naive ARC is AFAIK very expensive, as it causes a count update at each pointer 'take' even if it's read-only (chasing pointers), trashing caches. Possibly even worse if multithreading is used, as a memory barrier may have to be issued. Also it does not collect cycles. Also I've read it causes memory fragmentation (take that with a pinch of salt, that's from my memory!).
ARC has some advantages but it has some real costs.
Also and even more, if ARC is what you think it is, how is it not an implementation of GC?
> Perl5 uses plain old reference counting, executes the AST directly, and still outperforms python for data and IO[2]!
Ok... 1) outperforming python is not a high bar. 2) IO is usually handed off to the underlying OS so talking about IO performance is probably misleading. 3) Python now has a fallback non-ARC GC to pick up cycles. 4) GC in python is IIRC an implementation detail, not language-mandated. Finally, if it outperforms it, please provide a reference.
I'll check out linear lisp if I can find the time.
> Naive ARC is AFAIK very expensive as it causes a lot of updates at each pointer...
Yes, and I think that's why people shy away from it. Language design can help a lot though: Array languages use reference counting almost exclusively because nesting objects is uncommon. That means no pointers to chase the vast majority of the time.
> if ARC is what you think it is, how is it not an implementation of GC?
Garbage Collection involves crawling live objects from roots to determine what's still alive.
If you're not doing that, you might still be doing automatic memory management, but you're not doing GC.
There are tricks (generations, barriers, treadmills, colouring, etc.) to do less work, or less work at a time, but it remains predicated on the assumption that you have more garbage than live objects, so you do less work by only crawling the live objects. If this is true then GC should be a win, but it's not true an awful lot of the time, which is why manual memory management always beats GC for performance (latency and throughput).
[†]: notwithstanding your earlier point about pointer chasing...
> I'll check out linear lisp if I can find the time.
I can recommend everything Baker has written on this subject: it's all gold.
> Garbage Collection involves crawling live objects from roots to determine what's still alive.
> If you're not doing that, you might still be doing automatic memory management, but you're not doing GC.
Whether GC and automatic memory management are [significantly] different terms depends on who you ask, I don't think there is a consensus on that. In many cases however people tend to say GC as shorthand for tracing GC.
I think in many cases we see ARC is "good enough", and in a JIT scenario you could remove many RC changes by effectively collapsing/merging scopes. OTOH it is quite clear that for the same workload tracing inevitably performs better.
A simple trick gets you around this: Don't use your system malloc() but instead map a tempfile (or a memfd) with MAP_SHARED and use the STM primitives to update your references. If you combine this trick with __attribute__((address_space(256))) your "pointers" are also the offsets in the tempfile.
Or don't: Maybe you want two heaps. fork() gives you the option!
Lots of things that people do don't work after `fork()`, though, to the point where `fork()` has become largely irrelevant in new code. For example, threads are not generally compatible. You can also easily run into issues with open files, file locks, etc.
Of course, your point might still be relevant in "legacy" environments with poor threading support anyway. I would argue that those are gradually going away, as even interpreted languages without exposed threading support are increasingly adopting threads internally to accommodate performance needs.
This is not legacy. Any application in Ruby, Python, Perl, ... is affected. Majority of the backends of current web have the issue of duplicated heap in all the workers. It's made slightly better by Python introducing gc.freeze in 3.7 (https://docs.python.org/3/library/gc.html#gc.freeze) and Ruby doing something similar soon, but not great.
Garbage collection (to my knowledge) was invented by McCarthy back in the late 1950s[1] in the following three paragraphs of his paper "Recursive Functions of Symbolic Expressions and Their Computation by Machine":
> Nothing happens until the program runs out of free storage. When a free register is wanted, and there is none left on the free-storage list, a reclamation cycle starts.

> First, the program finds all registers accessible from the base registers and makes their signs negative. This is accomplished by starting from each of the base registers and changing the sign of every register that can be reached from it by a car-cdr chain. If the program encounters a register in this process which already has a negative sign, it assumes that this register has already been reached.

> After all of the accessible registers have had their signs changed, the program goes through the area of memory reserved for the storage of list structures and puts all the registers whose signs were not changed in the previous step back on the free-storage list, and makes the signs of the accessible registers positive again.
That's mark/sweep!
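That description translates almost line-for-line into code. A toy version (my own sketch, not McCarthy's), with arena indices standing in for registers and a mark flag for the sign bit:

```rust
// Toy mark/sweep over a fixed arena of cons cells: mark everything
// reachable from the roots, then sweep the arena and put every unmarked
// cell back on the free-storage list.
#[derive(Clone, Copy, PartialEq)]
enum Cell {
    Free,
    Atom(i64),
    Cons(usize, usize), // car, cdr as arena indices
}

struct Heap {
    cells: Vec<Cell>,
    marked: Vec<bool>, // stands in for the negative-sign trick
    free: Vec<usize>,  // the free-storage list
}

impl Heap {
    fn new(n: usize) -> Self {
        Heap { cells: vec![Cell::Free; n], marked: vec![false; n], free: (0..n).collect() }
    }
    fn alloc(&mut self, c: Cell, roots: &[usize]) -> usize {
        if self.free.is_empty() {
            self.collect(roots); // "nothing happens until we run out"
        }
        let i = self.free.pop().expect("out of memory");
        self.cells[i] = c;
        i
    }
    fn mark(&mut self, i: usize) {
        if self.marked[i] {
            return; // already reached, like the already-negative-sign check
        }
        self.marked[i] = true;
        if let Cell::Cons(car, cdr) = self.cells[i] {
            self.mark(car); // follow the car-cdr chains
            self.mark(cdr);
        }
    }
    fn collect(&mut self, roots: &[usize]) {
        for &r in roots { self.mark(r); }
        for i in 0..self.cells.len() {
            if !self.marked[i] && self.cells[i] != Cell::Free {
                self.cells[i] = Cell::Free; // sweep: back on the free list
                self.free.push(i);
            }
            self.marked[i] = false; // make the signs positive again
        }
    }
}

fn main() {
    let mut h = Heap::new(4);
    let a = h.alloc(Cell::Atom(1), &[]);
    let b = h.alloc(Cell::Atom(2), &[]);
    let pair = h.alloc(Cell::Cons(a, b), &[]);
    h.alloc(Cell::Atom(99), &[]); // garbage: nothing points at it
    h.collect(&[pair]);
    assert_eq!(h.free.len(), 1);  // only the unreachable atom was reclaimed
}
```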
He adds as a footnote "We already called this process 'garbage collection', but I guess I chickened out of using it in the paper--or else the Research Laboratory of Electronics grammar ladies wouldn't let me." but doesn't explain where the term came from. I suppose it's possible he didn't invent it, but I've never seen anything on that and I think most of the existing strategies used for memory allocation at the time were very explicit -- FORTRAN didn't even have ALLOCATE yet!
>it causes a lot of updates at each pointer 'take' even if it's a read only
This isn't required in the read-only case. If you're only borrowing the value, for example, taking a const& in C++ and not holding onto it, then you can cast to the underlying type and use the reference. In Rust you can also just borrow:
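For instance, a minimal sketch (my own example; only std): reading through a borrow never touches the reference count; only cloning or dropping the Arc does.

```rust
use std::sync::Arc;

fn sum(v: &[i64]) -> i64 { // takes a plain borrow, not an Arc
    v.iter().sum()
}

fn main() {
    let data = Arc::new(vec![1i64, 2, 3]);
    let total = sum(&data);  // auto-derefs to &[i64]; no atomic op here
    assert_eq!(total, 6);
    assert_eq!(Arc::strong_count(&data), 1); // the count never moved
}
```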
ARC is usually atomic reference counting. Indeed that will trash caches because it must invalidate the cache line where the reference count lives in other cores whenever it's updated. RC without atomicity won't invalidate cache lines on other cpus.
FWIW, GCC has an extension to shared_ptr that doesn't use atomic reference counting:
Any heap allocation not bound to an arena and not part of a compacting garbage collector is likely to cause memory fragmentation. Stack allocations, for example, are implicitly compacted.
>Also and even more, if ARC is what you think it is, how is it not an implementation of GC?
This is a "can submarines swim" type of question and is completely uninteresting (imo).
>Finally, if [perl5] outperforms [python], please provide a reference.
which is why I carefully used the word Naive, however I had little time to expand on it so I can't blame anyone for overlooking that.
> If you're only borrowing the value...
Then you give examples of manually offloading memory management to the programmer. It should be done, and I believe the book I reference covers it (can't find it now, sorry), but definitely not by the programmer else it's not automatic GC.
> ARC is usually atomic reference counting.
That won't make any necessary inter-core or inter-socket coherency traffic disappear.
> Any heap allocations [...] is likely to cause memory fragmentation.
My memory was that refcounting was particularly bad but feel free to disregard that as I can't justify it. I may well be wrong.
> This is a "can submarines swim" type of question and is completely uninteresting (imo).
In regards to memory fragmentation, you're probably thinking of GC with compaction. Not all GCs are like that, but that is a plus for GCs if you use the heap a lot and need cache friendliness.
Compare the "Unified Theory of Garbage Collection" (https://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf). Reference counting is a variant of garbage collection. But yes, you can take ideas from there to make your garbage collection better.
Linear/uniqueness typing is interesting. Clean is another language that uses something like it, and you can implement these ideas in Haskell as well.
Though in practice you want affine typing, not linear typing. (Basically, you want to be able to drop arguments without explicitly having to consume them.) The important bit about linear typing / affine typing is that you use your arguments at most once, not that you have to use them at least once.
That's how you can model mutation in a pure FP way, too. Not just GC.
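A small sketch of the affine-vs-linear distinction as Rust enforces it (invented names): a value can be used at most once, but unlike a strictly linear system, you're allowed to just drop it.

```rust
// Affine, not linear: moving a value consumes it (at most one use), but
// values may also be dropped implicitly without ever being consumed.
struct Token(String);

fn consume(t: Token) -> String {
    t.0 // takes ownership; the caller can't use `t` again
}

fn main() {
    let a = Token("used".into());
    let b = Token("never consumed".into());
    let s = consume(a);
    // consume(a); // compile error: use of moved value `a` (at most once)
    assert_eq!(s, "used");
    drop(b); // affine: fine to just drop `b`; a strictly linear type
             // system would reject leaving it unconsumed
}
```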
I think you mean "Automatic Garbage Collection", rather than just "Garbage Collection". The latter is exactly what you do in C when you use `free()`; the former is an automated system that does `free()` on your behalf at runtime. Indeed, the fact that you consider Rust's borrow checker to be part of the non-gc methods of handling memory makes it clear that you do recognise this distinction, even though it is merely implicit.
For the record, Automatic Reference Counting can't collect self-referential objects, and it is also considered a method of automatic garbage collection. I haven't heard of any research on ARC being done statically; if you have some, that would indeed be interesting.
Linear Lisp indeed looks extremely promising, I don't remember finding any code examples when I looked it up, it seems to be something that's languished in the theoretical department. HBaker has a series of really interesting papers, by the way, it's worth just going through his website :) There are a lot of nuggets of gold there.
> I think you mean "Automatic Garbage Collection", rather than just "Garbage Collection". The latter is exactly what you do in C when you use `free()`, the former is an automated system that does `free()` on your behalf at runtime.
To me, the difference between garbage collection and other memory management systems is that gc scans memory looking for memory that can be reclaimed. I don’t consider free() to be garbage collection. free() is the final step you do in any memory management scheme, it’s not a specific technique of memory management.
With moving/compacting collectors, you never even call free(), and this is a huge advantage over a mark/sweep:
First mark touches all the live objects, then sweep touches all the objects that aren't live. You're touching every object every cycle. Yuck.
A moving collector touches all the live objects, but now they have new memory addresses. We never need to touch the old garbage at all! Most advanced GC techniques are around trying to revisit live objects less frequently, but they all have this basic theme.
Other memory management techniques explore the other side: Just touching the garbage. That's what happens when you malloc+free manually, or do (automatic) reference counting.
So the value-question restated is this: Is there more ham than spam?
If we have more garbage, then the ideal GC strategy wins. If we have more living objects (and little or no garbage), then literally anything else wins.
We do seem to have more garbage in atom languages like Java, Erlang, Python, and so on, so it makes sense that GC gets much more love than the other side of the fence, but I wonder often if we're missing a trick.
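To make the "never touch the garbage" point concrete, here's a toy Cheney-style copying collector (my own sketch; indices instead of pointers): live objects are copied breadth-first into to-space, dead objects are never visited, and from-space is recycled wholesale.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Obj {
    Int(i64),
    Pair(usize, usize), // indices into the current space
    Forward(usize),     // forwarding address left behind in from-space
}

// Copy one object into to-space if it hasn't moved yet; either way,
// return its address in to-space.
fn copy(from: &mut [Obj], to: &mut Vec<Obj>, i: usize) -> usize {
    match from[i] {
        Obj::Forward(j) => j, // already moved
        obj => {
            let j = to.len();
            to.push(obj);              // shallow copy into to-space
            from[i] = Obj::Forward(j); // leave a forwarding pointer
            j
        }
    }
}

fn collect(from: &mut [Obj], roots: &mut [usize]) -> Vec<Obj> {
    let mut to = Vec::new();
    for r in roots.iter_mut() {
        *r = copy(from, &mut to, *r);
    }
    // Cheney scan: walk to-space, copying children on first sight and
    // rewriting their indices.
    let mut scan = 0;
    while scan < to.len() {
        if let Obj::Pair(a, b) = to[scan] {
            let na = copy(from, &mut to, a);
            let nb = copy(from, &mut to, b);
            to[scan] = Obj::Pair(na, nb);
        }
        scan += 1;
    }
    to // from-space is now all reusable, without ever touching the dead
}

fn main() {
    let mut from = vec![Obj::Int(1), Obj::Int(2), Obj::Pair(0, 1), Obj::Int(99)];
    let mut roots = [2];                 // only the pair is live
    let to = collect(&mut from, &mut roots);
    assert_eq!(to.len(), 3);             // Int(99) was never copied or visited
    assert_eq!(to[roots[0]], Obj::Pair(1, 2));
}
```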
Your definition of garbage collector excludes reference counting, which does not require scanning memory. Reference counting is a form of garbage collection. The type of garbage collection you are referring to is the sub-type of "tracing" garbage collection.
However, I agree, "free()" is not garbage collection, but rather a form of memory management, of which garbage collection is a subset, and manual (free) is another.
It depends who you ask. In my point of view, "reference counting" is a strategy that has nothing inherently to do with memory management and can even be manual instead of automatic. GC as the name implies should involve a collector of sorts.
Plain old refcounting cannot resolve circular references, which are inevitably leaked. To avoid leaks, most refcounting garbage collectors have a heuristic to occasionally run traces to clear out the cruft.
I think the most widely accepted definition for "garbage collection" is something which frees memory automatically, as opposed to manually freeing memory (or other memory management schemes). In fact, the second paragraph of Wikipedia's page on garbage collection[1] says that "Garbage collection is essentially the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system".
Reference counting is garbage collection. It's part of the mix of potential strategies for automatic collection of memory. As is static analysis of data flow, fwiw.
In my taxonomy, we have "automatic memory management" when we have some kind of `malloc()`-type device and no need to `free()` it.
Garbage collection is about visiting live objects to figure out what's alive and ignoring the dead stuff. Every theory about why GC can work (or be useful) is predicated on the idea that there is more garbage than live objects so you're touching less stuff.
If all you do is mark/sweep, you're visiting the garbage as well, so GC research has led neatly into moving collectors (which don't ever touch the garbage). That's a win, and that's why GC is worth study.
There are other kinds of "automatic memory management" though: reference counting, arena allocators (at a stretch), and object memory (I've only seen this on mainframes), and they're all worth some attention.
> Garbage collection is about visiting live objects to figure out what's alive and ignoring the dead stuff
That's a description of tracing GC, not a definition of GC.
GC is whatever technology that enables what you've called automatic memory management: a kind of illusion that there is infinite memory, from the programmer's perspective.
That might involve some mix of reference counting (Python is probably the biggest example), tracing (what you seem to think is all of GC), data flow analysis (e.g. escape analysis in some Java GC / JIT integration), etc.
> GC is whatever technology that enables what you've called automatic memory management: a kind of illusion that there is infinite memory, from the programmer's perspective.
That's definitely not what McCarthy meant by it. I understand that's probably a common view nonetheless, but I can use sbrk() and the fact that my computer has gobs of RAM/swap, and I'm not going to call that garbage collection. I don't think this definition is useful.
If you re-read what I wrote now understanding what I mean when I say "garbage collection", perhaps we can talk about strategies for providing that infinite view of memory instead.
Actually, I think process termination is pretty useful as garbage collection. It's normally much more reliable for releasing resources, for one thing. And there's little sense in cleaning up RAM when you're just going to quit.
The kernel still deallocates those pages as an explicit [manual] result of a free-like call -- one that just so happens to be named exit().
To seriously tell people that every C program running on unix has automatic "garbage collection" built-in seems like it would create very confusing conversations.
GC has always meant tracing GC (ever since McCarthy invented it), while reference counting was considered another form of automatic memory management (like using arenas, doing escape analysis, etc...). Only recently in hacker news have they become all merged into the same word, for some reason.
> Only recently in hacker news have they become all merged
This is demonstrably not true. The Garbage Collection book by Jones is the key text in this area, and it covers reference counting under that term. It was published in 1996, before Hacker News existed.
There is also Bacon's Unified Theory of Garbage Collection, 2004, which said the two are on a spectrum that meets in the middle.
> all high-performance collectors (for example, deferred reference counting and generational collection) are in fact hybrids of tracing and reference counting
> Rust's borrow checker/tracker makes ownership explicit
So no compaction, no pointer-bump allocation? You have to reimplement these manually, i.e. reinvent GC while still paying the refcounting/region-based memory-management overhead?
In the ~15 years I spent building software in C++ I don't recall a single time that I wished for garbage collection. By using RAII techniques, it was always possible (nay, easy!) to write code that cleaned up after itself automatically. I always found it easier to reason about and debug programs because I knew when something was supposed to be freed, and in which thread. The only time that use of simple reference counted smart pointers did not work was when you had cyclic references: but cyclic object graphs caused other problems in any case - so I would always refactor my design to avoid creating them in the first place.
In contrast, in the ~10 years I spent working in Java, I frequently ran into problems with programs which needed an excessive amount of memory to run. I spent countless frustrating hours debugging and "tuning" the JVM to try to reduce the excessive memory footprint (never entirely successfully). And of course, as soon as your JVM needed to interact with native code (e.g. a quant library written in C++), you were even worse off, because there was no language support whatsoever for ensuring when, or even if, some native memory would be freed. It made it even harder to debug these leaks - because the "leak" was always very complicated to reproduce.
Garbage collection is an oversold hack - I concede there are probably some situations where it is useful - but it has never lived up to the claims people made about it, especially with respect to it supposedly increasing developer productivity.
While RAII is a very good technique, it is not the solution to all issues here. I have read that some concurrent data structure implementations would not be possible without a GC. Furthermore, when a project reaches a certain level of complexity, it ends up implementing a GC anyway (see Unreal Engine).
I agree that RAII does not solve everything - in particular, issues with memory fragmentation. However, I prefer the direction being taken by Rust and suggested by the top level poster: rather than having to run a separate thread which tries to infer which resources are no longer needed, with potentially large runtime costs, instead augment the language semantics with a clear ownership model, and thereby give the compiler enough information to generate code that runs in a stable, predictable memory footprint.
That reminds me of the VLIW argument: we will make languages really expressive and compilers really smart, and then we can get rid of fancy CPU hardware that does instruction scheduling dynamically.
It doesn’t always work. A dynamic tracing GC (or reference counting or using arenas) can be more efficient in ways that static approaches probably won’t be able to effectively emulate. Rust is a great experiment for seeing how far we can go in this area, so we will see.
Solving memory fragmentation requires moving objects around. Usually you get that via a moving (or compacting) garbage collector. But there's no reason RAII style resource allocation couldn't move things around.
And there are plenty of garbage collection strategies that don't move objects around.
If you have a use-after-free it means that you made a wrong assumption in your code about the lifetime of the object. A GC would hide the issue and potentially lessen the gravity of the bug but it doesn't resolve the core issue which is that you probably have a fundamental logical problem in your code.
I'm a bit of a "GC hater" so obviously I'm heavily biased, but to me it just means that GCs make it easier to write sloppy, poorly architected code where you have a mess of a data-dependency graph and nobody knows who owns what or when things need to be deleted, so you let the GC (hopefully) figure it out. In my experience this leads to all sorts of issues independent of memory management, such as obsolete data lingering in certain parts of the program because it's not expunged properly.
IMO, GC or not, clear ownership semantics are necessary to properly architect anything more complicated than a script. And if you have ownership semantics, then why do you need a GC anyway?
The nicest point about GC is that, for the vast majority of data, you no longer need to have a concept of ownership at all.
Ownership is not a fundamental architectural property - it is "only" a powerful technique for tracking data with a specific lifetime (of course, even in GC languages you still need proper ownership semantics for some pieces of data).
> If you have a use-after-free it means that you made a wrong assumption in your code about the lifetime of the object. A GC would hide the issue and potentially lessen the gravity of the bug but it doesn't resolve the core issue which is that you probably have a fundamental logical problem in your code.
The basic assumption with a GC'ed language is that the lifetime ends once the object isn't referenced anymore. You rarely have to deal with it at all, so those "logic errors" happen much less frequently.
If we look at the major real-world languages without GC (C/C++), it is your job to figure out the lifetime. You will fuck this up unless you structure your program in a very disciplined way, and that's frankly too much to ask of the vast majority of programmers.
What if your domain problem (say, compilers, symbolic algorithms) doesn't have a clear notion of ownership because everything is a shared DAG that may survive for arbitrarily long? Not every problem has clear ownership imho (and tools in C or C++ in these domains resort to their own refcount/custom GC implementation anyway).
imo you solve these problems by segregating the ownership logic and relationship logic of the data structure.
In other words the DAG would never be traversed to allocate or free memory, each vertex would be held by a vector or other container and the edges represented by an adjacency list/matrix.
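As a rough sketch of that idea in Rust (a hypothetical Graph type, not from any library): the container owns every vertex, and edges are plain indices rather than owning pointers, so shared structure never needs refcounting or a GC.

```rust
// The ownership logic (the Vecs) is segregated from the relationship
// logic (the indices in the adjacency list).
struct Graph {
    nodes: Vec<String>,     // the container owns all vertex data
    edges: Vec<Vec<usize>>, // adjacency list of indices, not pointers
}

impl Graph {
    fn new() -> Self {
        Graph { nodes: Vec::new(), edges: Vec::new() }
    }

    fn add_node(&mut self, label: &str) -> usize {
        self.nodes.push(label.to_string());
        self.edges.push(Vec::new());
        self.nodes.len() - 1 // the index acts as a handle
    }

    fn add_edge(&mut self, from: usize, to: usize) {
        self.edges[from].push(to);
    }
}

fn main() {
    let mut g = Graph::new();
    let a = g.add_node("a");
    let b = g.add_node("b");
    let c = g.add_node("c");
    // Shared structure is fine: both a and b point at c, but c is
    // owned exactly once, by the Vec. Dropping g frees everything.
    g.add_edge(a, c);
    g.add_edge(b, c);
    assert_eq!(g.edges[a], vec![c]);
}
```

The trade-off is that stale indices replace dangling pointers as the failure mode, which is why generational indices or dedicated arena crates are often used in practice.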
> A GC would hide the issue and potentially lessen the gravity of the bug but it doesn't resolve the core issue which is that you probably have a fundamental logical problem in your code.
No, it solves the problem because resource lifetime is no longer a concern. A field advances as we decrease the number of irrelevant details we need to attend to.
Turns out, most of the time, memory is one such detail. You can argue that we need better approaches for when memory is a genuine concern, but that doesn't discount the value of GC as the right default.
Once upon a time, this statement may have been true, but it isn't any more. ASAN and Valgrind are widely available, and runtime test suites are much more prevalent than they used to be.
> But all of these are much cheaper than developer labour and reputation damage caused by leaky/crashy software. The economics make sense.
Of course, with traditional languages, that's the trade-off we're being asked to make. That's my point! We need to develop languages that accurately encapsulate lifetimes statically so that we can express that to the compiler. If we do, the compiler can just make instances disappear statically when we're done with them -- not dynamically! Like Rust but without the need for Rc and Arc boxes. Rust gets us 80% of the way there.
The truth is, with most of the Rust I write, I don't have to worry about allocation and deallocation of objects; it just happens. It's almost ubiquitous. We need to extend that to 100%.
This only works if your language is strict. With a lazy language like Haskell, you can't statically infer (at least naively with respect to scoping rules) when something lives and dies.
Performing this statically is hard, and GHC tries to do it (Demand analysis).
But without a GC, these sorts of languages would be a no-go, as would logic-based languages like Prolog which need to GC their internal data structures.
While we now have the expertise for eliminating the GC from single threaded strict programs (multi threading in Rust is still quite complex, and you do see Rc<Box<..>> more often than not in these settings, which is essentially refcounting/GC), this _does not scale_ to all models of languages.
However, if he says "Rust gets us 80% of the way there", as you point out, then he is obliged, IMO, to point out explicitly the 20% where it fails, or we can't even begin to discuss it.
>> If we do, the compiler can just make instances disappear statically when we're done with them -- not dynamically!
> Not possible generally. It would be easy to create a situation where some kind of refcount is a necessary final fallback.
Either refcount or a GC. But those situations might be rare in practice with a good system.
If you think about it, generational garbage collection implements lots of those heuristics. (But it's closer to a JIT optimizer, not a static optimizer.)
How do you see multi-threaded anything working without Arc? Some owners (threads) are created and destroyed dynamically. Are you thinking Arcs get added wherever necessary?
Oh to be clear I don't have an answer for this off-hand, I'm not that smart. However, if we'd given the 30 years of people developing faster for-loops the task of figuring out how to manage memory statically, I'd wager we'd not be having this conversation right now. It is by no means an easy problem, however.
I think there are essentially two ways of making the compiler statically aware of lifetimes: you can annotate the program with lifetimes, as in Rust, or you can have the compiler infer all of the lifetime information somehow.
Adding annotations requires more programmer effort and reduces the level of abstraction of your programming language. Rust takes this approach, but tries to keep the level of annotation somewhat reasonable, and provides the "unsafe" escape hatch for when you cannot convince the compiler that what you're doing makes sense. (Although I do believe that it would be possible for the Rust compiler to infer most/all lifetimes).
For instance, in a functional programming language, you are typically creating closures all over the place, like one of the other posters mentioned. It seems quite unlikely that there is any kind of reasonable lifetime annotation system that would capture all of the complex lifetime patterns that you would see in such a language.
One can say "Oh, but surely someone smarter than me must be able to come up with something that works for those situations", but I don't think that is a very constructive line of thought, since you could say that about pretty much everything. Rust is an example of a language where lifetimes work reasonably well, but Rust already makes some tradeoffs that reduce its level of abstraction to some degree. It doesn't provide the same level of abstraction that, say, Haskell does.
Program analysis would allow you to keep your level of abstraction and ideally require few or no annotations. However, since any non-trivial program property is undecidable, program analysis is fundamentally incomplete, and it is again unlikely that it would be able to deal with all lifetime patterns that you see in practice.
Therefore, it seems to me that there is no free lunch here. Either you sacrifice abstraction by requiring annotations or you don't require any annotations but then you have an incomplete program analysis.
(Also, you seem to think 30 years of good research has been wasted on the "obviously bad idea" of garbage collection, but somehow none of these researchers were good enough to realize this and come up with the silver bullet that makes GC unnecessary? After all, people were already thinking about GC-less languages long before Rust was a thing)
> possible for the Rust compiler to infer most/all lifetimes
I'd be interested to learn why you think this is possible. As you point out yourself, by Rice's theorem, precise lifetimes are not statically decidable. Rust's lifetimes are essentially based on nesting (think well-balanced brackets) and tracked by enriching the standard Damas-Hindley-Milner approach with a notion of affineness, which is a weakening of Girard's linearity. Circular data structures immediately violate affineness. What mechanism do you propose to get around this conundrum?
"All" might not be possible, but every Rust release tends to see improvements in lifetime elision. I just went through a few Rust apps I wrote 6 months ago and was able to get rid of nearly every explicit lifetime annotation thanks to improvements.
So there's definitely room for improvement, but also good progress being made.
I think that for function definitions in Rust you would in principle be able to leave out lifetime annotations on arguments/return values in more cases than is currently allowed by lifetime elision. I believe they don't allow this since it can become very confusing if the lifetimes are not written out when there are multiple references arguments. I'm not entirely sure about this, though.
I know this is possible because Rust already does this. Rust's types in closures can be fully inferred. Try replacing all Rust functions with closures. It goes surprisingly far.
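A small illustration of that claim, using nothing beyond the standard language: a top-level fn must spell out every type, while an equivalent closure's types are inferred from use.

```rust
// Top-level functions require explicit parameter and return types.
fn add_fn(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // The closure equivalent needs no annotations: a, b, and the
    // return type are all inferred from the call site below.
    let add = |a, b| a + b;
    assert_eq!(add(2, 3), 5);
    assert_eq!(add_fn(2, 3), add(2, 3));

    // Inference even flows through closures that take closures
    // (the type of x is deduced from the Fn bound on f).
    let apply_twice = |f: &dyn Fn(i32) -> i32, x| f(f(x));
    assert_eq!(apply_twice(&|x| x + 1, 0), 2);
}
```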
It's a little bit contradictory that you simultaneously believe that the people who worked on GC are much smarter than you, but at the same time they were wrong to choose to work on GC rather than your research program ;-) Not all problems can be solved by throwing smart people at it. There has been research along the lines you suggest. Look at the MLkit region inference research, or escape analysis research, for example. It doesn't work very well in practice. It can correctly infer like 95% of memory allocation/deallocation, but that remaining 5% leaks memory which means you need a GC anyway. It's still valuable research though. Go, for example, made the bet that they can use a low-pause low-throughput GC by reducing the pressure on the GC by doing escape analysis. Maybe that would work even better with the far more powerful region inference of the MLkit.
> Like Rust but without the need for Rc and Arc boxes. Rust gets us 80% of the way there.
I think you're qualifying Rc/Arc as kludges to get around the type system. They may end up being used like that sometimes but when well used they are actually encoding extra semantics, just like &Type and &mut Type.
Take a cache, for example. Arc allows you to get and use a cached value across your program, and depending on outside input the value can disappear first either in the cache (by eviction) or in the consumer. So Arc<V> is already expressing the exact semantics to the compiler; it just so happens that the actual lifetime is defined at runtime by outside input, so it must be managed dynamically.
I don't see a general way around that without copying but maybe a lot more can be inferred automatically so you get similar ergonomics to a GC language by introducing Rc/Arc/etc automatically at compile time. I don't think that's a good fit for a system programming language like Rust though where you don't pay extra costs without asking for them explicitly. But maybe there's space for a higher level language that does all that magic and doesn't use a GC. Someone trademark Rustscript.
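A minimal sketch of that cache scenario, using only standard-library types (the names are illustrative): the cache and each consumer hold an Arc, and the allocation is freed only when the last clone is dropped, a point decided at runtime rather than compile time.

```rust
use std::collections::HashMap;
use std::sync::Arc;

fn main() {
    let mut cache: HashMap<&str, Arc<Vec<u8>>> = HashMap::new();
    cache.insert("blob", Arc::new(vec![0u8; 1024]));

    // A consumer grabs its own handle to the cached value.
    let handle = Arc::clone(cache.get("blob").unwrap());

    // The cache evicts the entry, but the data survives because the
    // consumer still holds a reference.
    cache.remove("blob");
    assert_eq!(Arc::strong_count(&handle), 1);
    assert_eq!(handle.len(), 1024);
    // When `handle` is dropped, the allocation is finally freed.
}
```

Which of the two drops runs last depends on eviction policy and user input, so no static lifetime can capture it; the refcount is doing exactly the dynamic bookkeeping the situation requires.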
It isn't clear that what you're asking for is even theoretically possible, and it's far from obvious that the end result would be comparable to GC in terms of productivity.
>In Rust, does the standard way of implementing a graph structure still use a vector of nodes and indices for edges?
You'd probably want to do this whenever you implement a graph in a performant language, it's faster and safer (depending on the problem). Implementing edges as vectors of pointers is an exercise in memory leaks and data races, imo.
You can see an implementation with adjacency lists here:
Anecdotally, I spent more time in my first half-year of Java tuning the GC and debugging GC issues than I spent chasing memory leaks in my previous 5 years of C++. The worst part is that you quickly learn not to introduce too many memory leaks, but as far as GC tuning goes, the skill set feels more like voodoo than science and is constantly obsoleted by GC code changes (or wholesale replacement).
I can see how this could be true in some cases and especially for throw-away code and when perf doesn't matter, but for anything at scale and/or with perf requirements, GC is terrible and wastes tons of developer time fighting it (debugging, tuning semi-opaque configs) and working around it (object pools, off-heap buffer managers, controlling object lifetime to avoid certain characteristics, controlling object size, etc.)
I think C++ could get most GC usability benefits if there were a better/simpler syntax for shared_ptr and unique_ptr. I have written code with extensive shared_ptr usage and it was quite pleasant if you ignore the ugly syntax. On the other hand, even after years of C#, I still miss deterministic destructors.
One downside of reference counting is the long delete delay you get when you remove the last reference to a large data structure. That's pretty easy to fix, but then you lose deterministic destructors.
My first 8-9 years were full of very competitive, bleeding-edge C++, and I can tell for sure that your productivity increase was not caused by garbage collection but by using a more orthogonal language. I feel it in Delphi the same way I feel it in Go and Python.
> But all of these are much cheaper than developer labour and reputation damage caused by leaky/crashy software.
Exactly! And that is the biggest advantage of not relying on a GC: it is easier to detect awkward usage, and it requires more thought upfront (but not any more than what is actually needed).
C++ is not bad. I think it sits in the middle between C and GC languages, and I actually like what C++ offers. Languages with a GC are admittedly easier to debug: I remember once I had my own memory management code, and suddenly a write beyond a boundary was not segfaulting any more; it took me more time to figure out the problem.
The problem is not that you don't know when/where the lifetime will end — that can usually be characterized by a terse "English" description. The problem is that this lifetime is dynamic in nature.
The end of the lifetime of an object may coincide with some user input, for instance.
At this point, either you go back to manual management, with the potential for errors (and for what it's worth, I think manual management is fine in many cases) ... or .... or what, exactly?
Even Rust doesn't solve that problem at all. If your objects can survive outside of a "linear" execution, which is the case in interactive or multi-threaded programs, you're falling back on reference counting — garbage collection.
No one has ever devised a scheme that lets you specify very dynamic lifetimes AND statically checks that no leak can occur.
I'd stake a lot on that being impossible in general, and the best we can hope for being an AI that searches for a way to insert the manual memory management calls (mostly "free" and a minimal set of reference counts), accompanied by a proof that the change will not cause any leaks or invalid accesses. We're a long way off that yet.
I don't see your point. Of course sometimes the lifetime of an object is not tied to code scope but actually to something dynamic. Let's say for instance when you close a tab in your browser you expect the resources to be freed (ignoring caching to simplify the argument).
Clearly somewhere in your code you have to explicitly handle tab closing and break the references to allow the GC to do its job. Why not free the resources here while you're at it?
When you say "manual memory management" if you're thinking C-style malloc-free then you have a point, it's very easy to forget a free() somewhere. But any language with destructors can handle these situations without much more overhead than GC-based approaches. Just remove your object from whatever data structure was holding it and let the language automagically run the destructor for you. I find RAII a lot easier to model and think about than "you drop this reference and one day maybe your object is destroyed, but don't really rely on that".
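In Rust terms, a sketch of that pattern might look like this (the Tab type is hypothetical): the destructor (Drop) runs deterministically the moment the object is removed from its container, with no collector involved.

```rust
struct Tab {
    url: String,
}

impl Drop for Tab {
    fn drop(&mut self) {
        // Resources (sockets, buffers, ...) would be released here, at
        // a predictable point -- not "one day, maybe".
        println!("closing {}", self.url);
    }
}

fn main() {
    let mut tabs = vec![
        Tab { url: "example.com".into() },
        Tab { url: "news.ycombinator.com".into() },
    ];
    // Closing a tab = removing it from the container. The returned Tab
    // is unused, so it is dropped -- and its destructor runs -- at the
    // end of this statement.
    tabs.remove(0);
    assert_eq!(tabs.len(), 1);
}
```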
>No one has ever devised a scheme that lets you specify very dynamic lifetimes AND statically checks that no leak can occur.
GC doesn't solve that either. If you're not careful and let a reference to old data dangle somewhere in your code you can effectively "leak" data just fine. When there's no clear ownership and nobody knows who's supposed to clean what it's very easy to end up with data dangling everywhere through complex ownership graphs. GC may mitigate some use-after-free problems (generally by masking the problem) but that is something that can be solved without GC either.
I'm not saying that applications with (justifiably) extremely complicated ownership semantics that also require very high performance don't potentially benefit from using a GC, or that a GC couldn't outperform manual memory management by, for instance, batching the object deletions. What I'm saying is that such applications are few and far between; basically 100% of all the code I've ever written in my ~15 years as a professional software developer didn't have such requirements.
GCs should be an opt-in niche tool used to solve specific problems, not a kludge to let developers write sloppy code and get away with it.
> Why not free the resources here while you're at it?
Because you do not have an exclusive reference to all of them; we could be freeing something that is still in use somewhere.
So why not reference-count? Because reference counting is slow, and must be meticulously maintained (here, languages help with constructs like smart pointers), and doesn't handle cycles in the object structure.
> I find RAII a lot easier to model and think about than "you drop this reference and one day maybe your object is destroyed, but don't really rely on that".
The RAII model doesn't do away with this "maybe" in any shape or form. We drop this reference, and the smart pointer decrements the refcount; maybe it has hit zero so the object is destroyed, maybe not. Maybe the zero refcount triggers a whole cascade of thousands of other objects hitting their zero refcount, which takes many cycles to execute, or maybe it doesn't.
RAII has nothing to do with reference counting, it stands for resource acquisition is initialization. In C++ that means the lifetime of a resource is tied to the creation and destruction of an object. Often when people talk about RAII and C++ they are also referring to destructors being called when an object goes out of scope, so if an object that contains an object containing some allocated resource goes out of scope, the resource is released as the destructor path is called.
If an object with a smart pointer goes out of scope, then of course there's the overhead of the smart pointer, but the previous comment wasn't referring to that situation.
RAII is commonly used for representing smart pointers that manage references to objects that themselves have otherwise pervasive lifetimes.
RAII not managing any shared resources is trivial and uninteresting. The lifecycle of an object that is confined to a single lexical scope can be handled manually and verified by visual inspection (though of course it is more reliable to automate that with RAII or macros, not to mention less verbose).
> Let's say for instance when you close a tab in your browser you expect the resources to be freed
That's basically region-based memory management. Which is a great idea, but unfortunately only applicable in narrow scenarios (another example besides browser tabs is user requests in a HTTP server).
The real issue, however, is what happens if you can't keep all the current task's data in RAM. E.g. browser tabs still have GC for JavaScript... Going a bit meta, you can easily think of a hierarchy of such "regions" (e.g. computer > process > browser tab); each is self-contained in the sense that when you turn it off/terminate it/close it, all its resources are reclaimed. But the real issue is when they need more resources than you have... Then you're back to needing GC.
I agree with your second point; GC doesn't solve the problem of reference leaks... Whereas regions mostly do! So maybe more language support for regions (finer-grained than "whole process" - think Erlang) would be useful (each region can still of course optionally have its own GC).
> Clearly somewhere in your code you have to explicitly handle tab closing and break the references to allow the GC to do its job. Why not free the resources here while you're at it?
You are missing the point. The main point behind GC is removing the complexity of writing the software. Writing your own destructors, thinking about when to free your memory or writing lifetime annotations, needing to design your app in specific way to satisfy the borrow checker etc in languages like Ada/C/C++/Rust all are additional complexity and mental overhead that is removed by GC.
Your point about "just do this X and you don't need to use GC" is a reason people use GC languages, because they don't want to think about or write this "X".
My point is that you need a "destructor" even with a GC. Continuing with my example you need to pop your tab from the data structure containing your tabs and you have to make sure that all references are dropped so that the GC will do its work.
When I write Rust code I basically never have to explicitly free anything outside of FFI code dealing with C pointer or the like. The most straightforward way to allocate anything in Rust is to reify a Box which frees its contents when it's dropped. The data is freed when I remove it from containers or when a Box gets dropped. The closest I get to garbage collection is using a reference-counted pointer here and there (and only after careful consideration).
I really don't see what complexity the GC removes in this situation, besides letting you write code without having to care about ownership which, as I stated earlier, I consider to be an antipattern.
>"just do this X and you don't need to use GC" is a reason people use GC languages, because they don't want to think about or write this "X"
A lazy programmer is a good programmer but it's also important to recognize when something can't be pushed under the rug. I don't want to think about or write documentation and unit tests either, yet here we are. If I had to maintain an application and I ask the coder "when exactly are these objects deleted?" and they answer "dunno lol, I let the GC figure it out" I'd be extremely worried.
> it's also important to recognize when something can't be pushed under the rug.
If you haven't heard of it, you might be interested in the Waterbed Theory of Complexity[1]. The idea behind it is that it's not that you can't push down complexity in an area, but often that complexity just pops up in a different area. For example, you can do away with the vast majority of allocating and freeing memory, but the complexity of doing that pops up again when you start having to tune your GC or use very special patterns to avoid/minimize GC freezes.
That's not to say there's no benefit to that sometimes. If you can push the complexity away from the majority of use cases to a few special use cases, that might be a net win. It doesn't always feel like that, though, if you're the person stuck trying to coax the last few percent of performance out of a runtime that only gives you so many levers and dials to work with.
I think some of the same concerns that make a dynamic/scripting language a poor choice compared to a compiled and static one are the same things that might make a GC language a poor choice compared to a manually memory-managed one, and it's just a matter of scale that separates them. Is performance important? A scripting language might be a poor choice. Is performance really important? Then a GC language might be a poor choice.

On the other hand, if I'm writing a program to run correctly once or twice ever, and it might actually take more time to write than it will ever run for (e.g. a script to parse and/or transform data on disk to fix a problem), then is anything really gained from manual memory management? Writing that in Perl or Bash is a valid option there, and that's about as far from memory management as you can get. For portions of my career, I've spent half or more of my time writing programs like that to help with sysadmin tasks.
The Waterbed theory is stupid, because a given requirement can be handled in one place in the system, or it can proliferate into every module, giving rise to more complexity to achieve the same functionality.
So that is to say, when we push down a "bump" of complexity in one place, seventeen bumps at least as large can crop up elsewhere. Moreover, that seventeen is just today; we might have paved the way for more such bumps now being necessary going forward as functionality is added, whereas if we kept the original bump, it would stay the same. Water is not a good model for complexity because it is incompressible for all practical purposes; if we push down a liter of water here, the places where it pops up add up to just a liter.
For example, virtual memory is considerably simpler than individual modules of a program implementing a scheme for swapping code and data to disk and back.
> when we push down a "bump" of complexity in one place, seventeen bumps at least as large can crop up elsewhere.
I think you've taken the metaphor about a waterbed and design advice a bit too literally. :)
When you push complexity down in one area and it pops up in another, there's no guarantee that it's of the same overall magnitude, as it would be in a real waterbed. Sometimes it's larger, and sometimes it's smaller.
The same idea applies to whether we should all write assembly or use a higher-level language to accomplish our computing needs. We push down the complexity of knowing exactly what the processor is doing at any one moment in exchange for an easier-to-manage overall view of what we want to accomplish. The net effect is a lessening of complexity for most people, but for those who want or need to know exactly what is going on, there's more complexity. Still, I think that overall reduction vastly outweighs the time and effort a few put into figuring out why something isn't as expected.
On the other end of the spectrum, using a dynamically typed scripting language might save people a lot of time initially, or when prototyping, or on smaller programs, but as those programs become larger, the lack of constraints (which provide a small but immediate complexity hurdle) can build up and overwhelm a large project that might otherwise have had an easier time if some more constraints had been required along the way.
No one tool or paradigm is perfect for every task, and we should not expect one to be. We don't usually commute in dump trucks, and we don't usually haul lots of debris in compact cars. I see no reason why it should be any different for programming languages. Sometimes all you need is a 10 line bash script, and given that information, I doubt there's a huge level of complexity that's going to rear its head later because I didn't write it in some lower level language.
The waterbed theory in fact explicitly states that the amount of complexity is preserved no matter how it is distributed. The Wikipedia definition of it explicitly talks about how the volume of water remains constant due to its incompressibility.
If that isn't taken literally, it informs of nothing we don't already suspect. It tells us that for each functional requirement, there is at least one piece of code in the system somewhere. If we remove these lines of code from one place, or places, something has to be added elsewhere (possibly way more lines, or way fewer) so that the requirement remains implemented.
> If the waterbed metaphor isn't taken literally it informs of nothing we don't already suspect.
It informs you of nothing you didn't already suspect. I think it's useful to remind people of something that's fairly obvious and self-explanatory if you think about it, but not everyone does, and having an easy and slightly odd metaphor to point to helps the point get across. The fact that you think it's obvious is actually one of its major benefits. You can immediately start exploring the consequences of it instead of spending time proving it. Honestly, of all the criticisms to level at a metaphor, "too obvious" isn't one I would consider worth bringing up. Obvious metaphors are used to good effect all the time.
Edit to address your prefixed paragraph:
> The waterbed theory in fact explicitly states that the amount of complexity is preserved no matter how it is distributed.
If you read carefully, it says that the minimum level of complexity may not be reduced. Who is to say the minimum level of complexity has already been reached? Or that pushing down a level above the minimum in some area won't cause only the minimum level to rise elsewhere (or vice-versa)?
I like the waterbed theory, thanks, I hadn't heard of it before.
You're absolutely right, in the GC context you:
(1) Have to manually reference count / manage external resources like sockets and files because you can't count on the destructor to ever run.
(2) Lose determinism and then have to spend huge amounts of effort tweaking GC parameters to fix your issue, but to your point, it becomes a Jenga tower where you touch one part and the whole thing falls down again.
(3) Can easily still leak huge portions of your object graph by, for instance, throwing them into long-lived dictionaries.
In what memory management paradigm can you free the object in a long-lived dictionary without removing it from that dictionary, and do you advocate this paradigm?
And it is replaced by debugging the GC when your program behaves unexpectedly because, whoopsie, that's the one time the GC decided to run. Good luck debugging or even reproducing that.
>that's the one time the GC decided to run. Good luck debugging or even reproducing that.
GC usually runs on allocations, so it's quite predictable. The amount of time spent on collection is pretty much as indeterminate as the time spent on malloc/free.
When will a correctly-implemented GC ever alter your program's functional correctness, aside from timing-related bugs that aren't fixed by manual memory management either?
Just run your program normally and use eBPF/bpftrace with something like a flame graph to track your system and its GC. This isn't an easy problem by any stretch of the imagination; however, it is a solved problem.
Having spent years writing code in both GC’ed and non-GC’ed languages, it’s pretty clear that programmers spend about the same amount of time worrying about memory management with either paradigm.
The main differences are that people using GC systems spend their time doing magical incantations to manage the GC wizard, and they’re a lot more smug about the goodness of their system.
>programmers spend about the same amount of time worrying about memory management with either paradigm
I have no idea how you've come to that conclusion. For the vast majority of memory allocation in any GC'd language I can think of, you spend zero time thinking about memory management 98% of the time.
whereas in a manual language, you must consider it every time you allocate memory. That's not a bad thing, it's often dead simple in the vast majority of contexts, but still.
I've also very rarely seen code in professional codebases that's meant to poke or prod the GC into running.
>For the vast majority of memory allocation in any GC'd language I can think of, you spend zero time thinking about memory management 98% of the time.
>whereas in a manual language, you must consider it every time you allocate memory.
This is like bizarro world to me. I genuinely can't comprehend that.
Here's what memory management looks like to me, using concrete examples in languages featuring RAII like C++ or Rust:
- I have a function that needs to compute a hash so I need to allocate a hash context for it. If it's small I'll put it on the stack, otherwise it'll go on the heap. In either case when the function returns the destructor is automatically called.
- I have a function that takes the name of a file as a parameter and returns its contents in a buffer. Clearly the buffer needs to be allocated on the heap, so I allocate a large enough vector or string, read into it, and return that. The caller will then either use it and drop it immediately or store it somewhere to be dropped later, either on its own or as part of the structure it belongs to. In either case the destructor will be called automatically and the memory will be freed.
That's easily 99.9% of what memory management looks like in the programs that I write. I have absolutely no idea what using a GC would change at any point here. Note that while this is technically "manual" memory management, I never have to explicitly free anything, and in the vast majority of cases I don't even have to bother implementing the destructor myself (I basically only need to implement them if I need to release external resources like, for instance, raw OpenGL handles, file descriptors and the like; the GC wouldn't help here either).
It's genuinely a non-issue as far as I'm concerned. I literally never think "uh, I have no idea when this object won't be used anymore, I wish I had a GC to figure it out". I can't even imagine when such a scenario would crop up.
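The two cases described above can be sketched concretely in Rust (names and the toy hash are illustrative, not any real API):

```rust
use std::fs;
use std::io;

// Case 1: a scratch allocation that lives for the duration of one call.
// It is freed when `buf` goes out of scope, deterministically.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut buf: Vec<u8> = Vec::with_capacity(64); // heap "hash context"
    buf.extend_from_slice(data);
    // Toy hash standing in for a real digest.
    buf.iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
} // <- buf's destructor runs here, no GC involved

// Case 2: a function returning a heap buffer. Ownership moves to the
// caller, and the destructor runs wherever the caller drops it.
fn read_file(name: &str) -> io::Result<Vec<u8>> {
    fs::read(name) // Vec<u8> is returned by move; freed by its last owner
}
```

In both cases the point of deallocation falls out of the ownership structure of the code, which is why the author above never has to write an explicit free.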
- I have a function that needs to compute a hash so I need to allocate a hash context for it. At some point after the function has returned the destructor is automatically called.
- I have a function that takes the name of a file as a parameter and returns its contents in a buffer. I allocate a large enough vector or string, read into it, and return that. The caller will then either use it or store it somewhere. In either case the destructor will be called automatically and the memory will be freed.
>The caller will then either use it and drop it immediately or store it somewhere to be dropped later, either on its own or as part of the structure it belongs to.
You're glossing over a lot of complexity there. That's the entire point of GC, you don't have to think about when it's dropped.
Also, I wasn't arguing for the blanket necessity of GC in all contexts and for all problems.
Can you point out a specific example then? I'm genuinely not trying to play dumb; I very honestly do not see the complexity that should be obvious to me, given that I spend most of my time writing code in non-GC languages.
Maybe it's just Stockholm syndrome and I'm so used to working that way that I don't even see the problem anymore, but there's now a long string of replies talking abstractly about the overhead of programming in a non-GC environment and I simply don't relate at all. It's a complete non-issue as far as I'm concerned.
>store it somewhere to be dropped later, either on its own or as part of the structure it belongs to.
Wherever you're storing the reference, you have to eventually free the memory. That propagates memory-management code throughout your codebase, and is error-prone.
I totally agree that if you're allocating on the stack it's a non-issue. If you're returning a reference and always freeing in the caller, that's pretty easy too, but you have to have that logic in every single caller. If you're storing a reference on the heap somewhere, you suddenly have dynamic lifetimes with indeterminate reference counts, and it gets quite hard to 1) correctly manage memory and 2) verify that you're correctly managing memory.
The alternative is that in all three instances in a GC language, I don't have to care about memory management. Barring unusual edge cases, I can't forget to deallocate something and end up with a memory leak.
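The "dynamic lifetimes with indeterminate reference counts" case above is usually handled with reference counting; a sketch using Rust's Rc (names are illustrative):

```rust
use std::collections::HashMap;
use std::rc::Rc;

// A buffer stored in a long-lived map while a caller also holds it.
// Rc tracks the owner count; the allocation is freed only when the
// last handle is dropped, wherever that happens.
fn shared_buffer_demo() -> usize {
    let buf: Rc<Vec<u8>> = Rc::new(vec![1, 2, 3]);

    let mut cache: HashMap<&'static str, Rc<Vec<u8>>> = HashMap::new();
    cache.insert("file.bin", Rc::clone(&buf)); // count = 2

    drop(buf); // the caller is done; count = 1, memory NOT freed yet
    let count = Rc::strong_count(cache.get("file.bin").unwrap());
    cache.remove("file.bin"); // last owner gone; memory freed here
    count
}
```

This is the middle ground the two commenters are arguing about: the lifetime is dynamic, but it is still tracked explicitly by counts rather than discovered by a collector.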
”If you're storing a reference on the heap somewhere, you suddenly have dynamic lifetimes with indeterminate reference counts, and it gets quite hard to 1) correctly manage memory and 2) verify that you're correctly managing memory.”
It is axiomatic that GC programs leak memory. I’ve never encountered one that didn’t. The difference is that it’s harder to see, and most programmers who have grown up with GC systems don’t know how to fix it.
”The alternative is that in all three instances in a GC language, I don't have to care about memory management.”
Again, no. The alternative is that you aren’t paying attention, and don’t see the leaks until they become fatal. Just like cockroaches, you have them, somewhere.
”I have no idea how you've come to that conclusion.”
30 years of experience.
”For the vast majority of memory allocation in any GC'd language I can think of, you spend zero time thinking about memory management 98% of the time.”
No, you think about 98% of the code 0% of the time. The remaining 2% ends up causing so many problems that your amortized time spent on memory management is the same.
> GCs should be an opt-in niche tool used to solve specific problems.
That is true. Sciter (https://sciter.com) contains an implementation of the DOM (tree), CSS, and script.
The DOM is a regular tree structure: each element of the tree has exactly one parent, and there are no cycles in the tree in principle. It does not need GC at all, and GC is not used for DOM tree management.
CSS is a collection of collections of name/value pairs. No loops. It does not need GC at all.
And only script uses GC, for a simple reason: in UI, the ownership graph of script objects frequently contains loops, yet the lifespan of objects is unknown at compile time because it depends on user interactions.
Even a purely native GUI application benefits from GC when that GC is used in the places where it is needed. GC is just one way (of many) of doing memory management.
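The cyclic-ownership situation described above is exactly where plain reference counting fails; a sketch in Rust, where a strong parent/child cycle would leak forever and the usual workaround is a weak back-pointer (or a GC):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// A parent <-> child relationship, as in a UI object graph. If both
// directions were strong Rc references, the pair would keep each
// other alive forever. The Weak back-pointer breaks the cycle.
struct Node {
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn cycle_demo() -> (usize, usize) {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)), // weak: no count
        children: RefCell::new(vec![]),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));
    // parent: 1 strong ref (ours); child: 2 (ours + parent's vec).
    (Rc::strong_count(&parent), Rc::strong_count(&child))
}
```

A tracing GC handles the strong-cycle case automatically, which is why script layers tend to use one while tree-shaped structures like the DOM don't need it.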
Ideally it would be a multi-paradigm programming language and environment that allows the use of explicit memory management (and so RAII, memory pools, etc.) as well as a GC:
int *ptr = …;
gcable int* gptr = …;
But unfortunately such dualism is hard to achieve.
This is very hard to achieve because GC + RAII do not play nice together. It's hard to reason about whether an RAII resource has been cleaned up, since managed objects have non-deterministic destruction. This non-determinism is unsuitable for common RAII patterns like scoped mutex locks. Now, the obvious question is "Can this be avoided by not using destructors with managed objects?" Yes, but not reasonably. Any unmanaged objects that are owned anywhere in the managed object graph now also have non-deterministic destruction. Thus, the only way to reasonably follow this constraint is to have the compiler disallow/warn when an object with a non-trivial destructor (explicit or implicit) is used with GC. If that wasn't enough of a pain, any runtime-polymorphic types may also need to be destructed, and these can't be known at compile time if dynamic linking is used. In essence, these concepts are incompatible to a point where any language supporting both will likely make extensive trade-offs.
I find this really sad because many developers may never see how elegant RAII can be if they are never exposed to non-memory-managed languages. Sure, it can be approximated using `with` constructs, but those don't allow resource allocations to cross scope boundaries.
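The scoped-lock case is a good illustration of why RAII needs the deterministic destruction that GC finalizers can't promise; a sketch with Rust's standard Mutex guard:

```rust
use std::sync::Mutex;

// A scoped lock must be released exactly at end of scope. A finalizer
// that runs "eventually" would hold the mutex for an unbounded time.
fn increment(counter: &Mutex<i32>) {
    let mut guard = counter.lock().unwrap(); // acquire
    *guard += 1;
} // guard dropped HERE -> mutex released immediately, not "eventually"
```

If `guard` were a managed object whose destructor ran whenever a collector got around to it, every other thread would block until then, which is the incompatibility the comment above describes.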
> When you say "manual memory management" if you're thinking C-style malloc-free then you have a point, it's very easy to forget a free() somewhere. But any language with destructors can handle these situations without much more overhead than GC-based approaches.
What happens when that object is shared and still referenced from somewhere else? Now you have a use after free error waiting to happen.
The typical argument for GC is that it is less error-prone than even RAII. And I agree with that point. It's a trade-off on which you have to decide on. I think the tiny perf penalty is often worth it.
You can still retain stale data under a GC, but real leaks (unreachable allocated memory) are precluded — which is usually considered (not unreasonably) to be the more common and serious problem.
I think GC is a sane default because it's less error prone, and god knows the average developer will make enough errors as it is. I'd trust myself with manual memory management, but not a random group of developers I don't know. What about you?
You can of course introduce a space leak in an honest GC environment. Make a linked list, and keep adding items to it while keeping its head referenced. Of course you do it by mistake: you kept two references, the main and the auxiliary, and the main reference is duly gone.
>I find RAII a lot easier to model and think about than "you drop this reference and one day maybe your object is destroyed, but don't really rely on that"
Heh, I just realized this is what we depend on to remove commits from Github. "Uh, just delete any branches that contain it, and, uh, it'll get removed ... like, eventually." (of course you have to assume any secret data is compromised)
If the lifetime can be expressed with a state machine then it needn't be garbage collected so long as the language is sufficiently expressive.
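A minimal sketch of that idea in Rust (names illustrative): encode the lifetime as states, so the resource exists only in the states where it is valid, and a transition frees it deterministically.

```rust
// The resource (`buf`) is owned only by the Open state, so its
// lifetime is exactly the span of that state -- no collector needed.
enum Connection {
    Idle,
    Open { buf: Vec<u8> }, // buffer lives only while Open
    Closed,
}

impl Connection {
    fn close(&mut self) {
        // Overwriting the state drops the old one: if we were Open,
        // `buf` is freed right here, at the transition.
        *self = Connection::Closed;
    }
}
```

Languages with richer type systems can additionally make invalid transitions a compile error, which is the "sufficiently expressive" part of the claim.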
Reference counting isn't garbage collection, as most people consider it. There is no need to sweep or otherwise traverse memory to discover objects that are not referenced.
Claiming that ref counting is gc is a bit like claiming malloc/free is gc.
> Tracing and reference counting are uniformly viewed as being fundamentally different approaches to garbage collection that possess very distinct performance properties.
This last one is great if you want to understand GC tradeoffs btw, highly recommend it.
---
And as the first paper implies, refcounting is often slower (because it trashes caches). The issue is having to propagate the diminution of refcounts amongst a graph of references. This also creates de facto "pauses", much like tracing GC (although arguably more predictable!).
And this "propagation of diminished refcount" is very much a memory traversal — (probably) smaller in scope than (non-incremental) tracing, but also much less local.
"If the lifetime can be expressed with a state machine then it needn't be garbage collected so long as the language is sufficiently expressive."
One programmer here told me he does that in Ada and/or SPARK Ada using enumerated types encoding states. Now there's two of you. Most stuff I see like this are linear/affine/dependent types or separation logic. I rarely see people talking about simple solutions like this. Do you have any links that go in-depth on it with examples? It would be good to have something to pass along to people interested in this.
Any lifetime that depends on an I/O loop or user input cannot be determined within the loop. Period. Even if you have an ASIC designed for garbage collection. It cannot even be dynamic.
Basically, in these cases you have to extend the lifetime beyond the point of indeterminism to the point of determinism, or in other words, move the lifetime outside of the loop.
Worst case scenario you move the lifetime to right before termination of the program.
I think that garbage collection has proven to be the best idea in programming languages in the past decades, and probably the only one that has made an impact at all. It's getting so good these days that few are willing to consider "moving on." In terms of throughput, it beats manual memory management (in fact, in principle, the cost of GC can be made arbitrarily low; a famous Andrew Appel paper shows how it can be cheaper than a stack), and in terms of latency we now have GCs with worst-case pauses of ~1ms on terabytes of heap, with an acceptable impact on throughput.
> Think about it, instead of finding a way of expressing when we're done with instances, we have a giant for loop that iterates over all of memory over and over and over to guess when we're done with things. What a mess!
It would be a mess if that's how modern GCs worked, but they don't. The problem with "expressing when we're done with instances" is that it's just wasteful, especially when concurrency is involved. This is one area of computing where the computer can offload a human task, and do so quite efficiently.
Good throughput with GC seems to come with significant extra memory use:
"We compare explicit memory management to both copying and non-copying garbage collectors across a range of benchmarks using the oracular memory manager, and present real (non-simulated) runs that lend further validity to our results. These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational collector with a non-copying mature space matches the performance of reachability-based explicit memory management. With only three times as much memory, the collector runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%."
> Good throughput with GC seems to come with significant extra memory use
Of course. RAM is the price we pay for GC. Good thing it's so cheap on servers.
BTW, just note that the paper doesn't compare GC (never mind that the algorithms have much improved since then) to real explicit memory management, but to an oracle (i.e. explicit memory management by an all-knowing God). Paying nothing other than for 3x RAM to get performance that's 17% worse than what God could achieve is a bargain, and why GC is considered one of the greatest success stories of modern computing.
FWIW, one mitigating factor is that GC isn't the only approach that wastes memory. You can also end up with memory waste in manual memory management. Due to heap fragmentation, for example, or retaining objects for longer than necessary due to programmer caution or error. I don't (personally) dare estimate their relative sizes, because I'm guessing a huge, perhaps dominant, share of differences in memory usage across languages is driven by features other than GC. Dynamic typing, for example.
I didn't say it's free, only that it's relatively cheap. That RAM can simply be used for caching is incorrect, though. There are very-non-neglible costs to maintaining caches in distributed systems.
My point is that you can't increase the cache size indefinitely to get performance improvements, because you also have to handle cache invalidation. There is an optimal working set.
Using RAM for "caching" sounds extremely wasteful. There could be applications that want to use it for something more useful than a 1% performance improvement.
Caching is way more impactful than a 1% speed up. It is hard to know exactly, because it is impossible to disable on most modern operating systems, but I wouldn’t be surprised if it were an order of magnitude improvement in some situations.
By the way, caching doesn’t prevent RAM for being used by applications. If an application wants more memory, the os can always just evict some of the cache.
Thanks, but I do not plan to extend this project to other languages. I put it together a while ago as an illustration that conventional wisdom regarding GC cost is not what many people think.
If I were to expand it, I'd look at other allocation patterns rather than more languages; I've already got a fairly good cross-section of garbage collectors and malloc() implementations, so I don't really need any more.
Fair enough. Well, too late, I just hacked together a Haskell version based on the OCaml version. It's naive code, but 16 times faster (0.235s vs 3.818s).
So I expect some GHC-specific optimizations are kicking in. E.g. GHC is probably smart enough never to construct the whole tree at once. Whether that counts as the kind of static analysis of object lifetimes someone else in this thread talked about or not, I don't know.
Update: I think it's just common subexpression elimination kicking in.
> I think that garbage collection has proven to be the best idea in programming languages in the past decades, and probably the only one that has made an impact at all.
Depends on how you set your thresholds. Paying more attention to side-effects / purity / const has been pretty useful, too. And so have mainstream languages adopting first-class functions. And making it easy to compose data structures. (Compare e.g. how Python allows you to return essentially two values via a tuple, and how cumbersome the same task is in Java; so people are tempted to use side-effects instead.)
> Paying more attention to side-effects / purity / const has been pretty useful, too...
I don't know. Studies haven't been able to find any big effects, and the industry isn't seeing them, either. You may like those things, but unlike GCs, it doesn't seem like they've made a real impact.
I tend to take the position that arguments about the merits of GCs are proxies for a different issue at a higher level of abstraction that is rarely tackled directly. One of the correctly argued advantages of GC is that it automatically handles some difficult edge cases like ambiguous lifetimes and cyclical references. In my opinion, there is an interesting discussion to be had around why these edge cases need to be solved in real software systems.
At least in my experience, the cases that GCs solve uniquely well are a code/architecture smell. Every time I've written some complex code that creates the edge cases GCs solve well, it has been a red flag for poor design. I've never run into one of these cases where there was not a different, better design that made these edge cases go away in a sensible manner. In a sense, GCs primarily help developers by allowing them to ignore issues of poor design (which may not be their fault) in order to get things done. While there are obvious merits to the "just get things done" view, the reality is that the sloppy design that leads to these kinds of situations tend to be strongly correlated with other software problems that a GC won't paper over.
Programming languages strive to enable and encourage good design with their feature set. There is an interesting philosophical debate about language features that exist almost solely to mitigate the effects of poor design, which IMHO is where GCs fit. Does availability lead to enablement in practice?
This hits on something I've been wondering about lately:
For those of us who might want some of the performance characteristics of Rust, but not to the extent that we're willing to take on all of the strictures it imposes, what's the feasibility of building a compiler that can avoid GC for objects whose lifetime can be determined statically, while still retaining GC for the rest?
My (wild) guess is that it wouldn't actually gain you any real practical advantage over a good generational GC, but still, I'm curious if anyone has looked into something like that.
Knowing when you’ll be finished with an object solves the issue of knowing which objects to free but not the issue of heap fragmentation.
The main reason that eg C programs don’t suffer so much from heap fragmentation is that writing a C program which uses ephemeral objects in the way a GC’d language might is so incredibly painful that one mostly avoids the convenient programming techniques which lead to fragmentation.
The usual solution in a language like C (or a safe language like Rust) for data where it is hard to know when it should be freed is reference counting, yet reference counting is typically slower than an actual GC (though this needs a quite advanced GC), especially on a multithreaded system where reference counting needs atomics. This leads some C programs to use GC when allocating and tracking these objects, because it is cheaper than the overhead of malloc and reference counting.
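The multithreaded case can be sketched with Rust's Arc, where every clone and drop is an atomic read-modify-write on the count, which is exactly the per-operation cost being compared against a tracing GC:

```rust
use std::sync::Arc;
use std::thread;

// Share one allocation across several threads. Each `Arc::clone` is
// an atomic increment; each drop (at the end of the spawned closure)
// is an atomic decrement. The data is freed by whichever owner drops
// the count to zero.
fn shared_across_threads() -> usize {
    let data = Arc::new(vec![1u8, 2, 3]);
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let d = Arc::clone(&data); // atomic increment
            thread::spawn(move || d.len()) // atomic decrement on drop
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Those atomic operations and the cache-coherency traffic they generate are the overhead that can make naive refcounting lose to a good GC under contention.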
Garbage collection is godsend when it comes to concurrency algorithms.
Determining liveness is hard on its own, and ABA issues are totally eliminated by GC-enabled setups.
As for energy costs, I'd bet generational/copying collectors are cheaper than malloc/free. They are way cheaper than ref counting, which requires deeper pipelines and extra branch-predictor load; ref counting with concurrency requires atomics, cache-coherency traffic, flushing of the write queue, and memory barriers.
I wonder what that might be... It can happen with primitives only, and the usual solution is: increment only, and use prepare/publish with two distinct counters (for circular buffers and the like).
I tried to have a quick look:
10.6 Memory Reclamation and the ABA Problem
"There are several reasons we might want to do this. Languages such as C or C++ do not provide garbage collection. Even if garbage collection is available, it is often more efficient for a class to do its own memory management, particularly if it creates and releases many small objects. Finally, if the garbage collection process is not lock-free, we might want to supply our own lock-free memory reclamation."
There is =absolutely= no reason to attempt to recycle nodes (for queues/stacks) in Java, esp. 'small objects'. It'd perform worse than pretty much any GC. The suggested solution of "AtomicStampedReference" is a weak one, as it does not enjoy JVM intrinsics. It's an example of course [and AtomicStampedReference has its uses], but the better option is just to keep creating new objects and let them trivially die in the young gen.
Again, ABA does not exist in Java unless you attempt to reuse objects (pooling direct memory would be the closest case; direct memory [direct ByteBuffer] is one of the areas where GC sucks hard).
Personally I have not read the book, so I cannot comment on its contents. I have quite extensive experience coding lock free data structures.
IMO manual memory management is the epitome of sunk cost fallacy. Sixty years of good programmer effort thrown at a bad idea that has produced an endless stream of security bugs and performance problems that have only hidden the cost of memory management behind non-obvious barriers and done severe violence to systems languages with an absurd obsession with custom allocators and baked ownership into otherwise straightforward APIs and made all programming harder.
> we have a giant for loop that iterates over all of memory over and over and over
I don't think you understand how garbage collectors work.
How is that not how garbage collectors work? You start at a gc root and chase pointers all through memory. Then do the same thing again and again. (I'm assuming we're talking about a standard, generational mark-and-sweep gc.)
>> we have a giant for loop that iterates over all of memory over and over and over
> How is that not how garbage collectors work?... I'm assuming we're talking about a standard, generational mark-and-sweep gc
GCs do not scan "all memory", but a small fraction of it. In the case of generational GC the scanned fraction is (usually) limited to a single generation. Even without a generational approach, scanning the heap itself is frequently avoided in favor of scanning a separate data structure with a highly compressed representation of the object set.
GCs do not generally iterate over memory just because they can. They either reclaim space for new allocations, move things around to reduce fragmentation or fire periodically in response to increased allocation rate. If your program does not make allocations, it may never incur a GC at all.
The grandparent comment makes it sound like garbage collection is a simple effort, conducted solely by distinct GC code ("giant for loop"). This is often not the case: for example, JVM may generate additional memory barriers in any code, that uses references (exact nature and purpose of memory barriers depends on GC being used [1]). Augmenting the code with those barriers allows GC to operate more efficiently and quickly: achieve smaller pauses, scan less memory, collect memory for some threads without disturbing others.
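The "trace only what's live" point can be shown with a toy mark phase (an illustrative sketch in Rust, not a real collector): starting from the roots, only reachable objects are ever visited; unreachable ones, including cycles, are never touched until the sweep reclaims them.

```rust
// Minimal model of a heap: each object is just a list of indices of
// the objects it references.
struct Heap {
    objects: Vec<Vec<usize>>,
}

// Mark phase: chase pointers from the roots through live objects only.
// Dead objects are never visited, so the collector does not "iterate
// over all of memory".
fn mark(heap: &Heap, roots: &[usize]) -> Vec<bool> {
    let mut live = vec![false; heap.objects.len()];
    let mut stack: Vec<usize> = roots.to_vec();
    while let Some(i) = stack.pop() {
        if !live[i] {
            live[i] = true;
            stack.extend(&heap.objects[i]); // follow outgoing references
        }
    }
    live // anything still false is garbage, reclaimable in the sweep
}
```

For example, with objects 0 -> 1 -> 2 live from root 0 and an unreachable cycle 3 <-> 4, the mark visits only three objects and the cycle is correctly identified as garbage, something refcounting alone cannot do.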
While I can't agree with the GP at all, I also can not agree with you. There are more options for memory management than fully manual and fully runtime.
You say a lot of things without justification: "IMO garbage collection is the epitome of sunk cost fallacy. Thirty years of good research thrown at a bad idea." "Garbage collection is just plain bad." And more.
> The reality is we as developers choose not to give languages enough context to accurately infer the lifetime of objects.
Possibly because it's hard and until compilers + maths developed enough, it wasn't possible. Maybe. I dunno, but there's probably a reason Rust didn't appear in the 1970s. They're probably harder to use than conventional langs, judging by what I read about Rust's borrow checker (dunno, Rust's on my todo list so no actual experience).
> we have a giant for loop that iterates over all of memory over and over and over to guess when we're done with things
This is from memory so I may be wrong, but a conservative GC will iterate over the whole of memory, though if you're using one of those you've probably got other problems, because yes, conservative GCs do 'guess'. (Edit: and when one guesses wrongly it can fail to free, causing leaks. Some optimisations that temporarily make pointer values 'disappear' can also cause premature frees of the objects pointed to; I believe an example of the latter is swapping two pointers with branch-free XOR.)
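To make the XOR point concrete, here's a hedged sketch in Rust, using plain integers standing in for pointer bits: between the first and last XOR, neither variable holds a valid pointer value, which is exactly the window in which a conservative, stack-scanning collector could fail to recognize a reference.

```rust
// Classic branch-free XOR swap over pointer-sized integers. Midway
// through, `*a` holds a^b, which is not a valid address: a conservative
// GC scanning the stack at that instant would not recognize either
// original pointer, and could prematurely free the objects they named.
fn xor_swap(a: &mut usize, b: &mut usize) {
    if (a as *mut usize) != (b as *mut usize) {
        *a ^= *b; // *a is now a^b: a "disappeared" pointer
        *b ^= *a; // *b is now the original *a
        *a ^= *b; // *a is now the original *b
    }
}

fn main() {
    let (mut x, mut y) = (0x1000_usize, 0x2000_usize);
    xor_swap(&mut x, &mut y);
    assert_eq!((x, y), (0x2000, 0x1000));
    println!("swapped");
}
```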
Most modern systems will use something more sophisticated and known-correct such as tracing and will trace just the live stuff. Mark-compact for sure only touches live data which makes it efficient for functional langs (for some definition of 'efficient', I have questions).
I'd recommend reading a book ('The Garbage Collection Handbook: The Art of Automatic Memory Management' 2nd edition, Richard Jones et al) before abusing GC. I actually agree with some of what you're saying but you need to say why.
> It wastes time, it wastes space, it wastes energy.
I quite possibly agree! (see a recent previous post where I call lack of optimisation in scala 'immoral') But if you just give your opinion without some solid backup I can't endorse you. Please read the GC book, it really is very good indeed. The author's a nice guy too, met him years ago.
Historical aside: the first paper on GC is [1], written in 1963 by Minsky, who's now more famous for AI research. So we've had GC research for > 50 years.
Thanks. That's amazing. I always thought GC must have been in Lisp from day one, but the Minsky article used to be the oldest one I was aware of. Has anybody dug into the original LISP sources (do they still exist?) to see when a working GC first arrived?
I'm pretty sure that is the first implementation of Lisp. Earlier "versions" of Lisp were hand-compiled, because it was considered too difficult to produce a compiler. Then Steve Russell realized that McCarthy's "eval" function could be implemented in assembly language (which apparently hadn't occurred to McCarthy, he considered it merely theoretical) and produced the first Lisp interpreter. Note that the manual is from March 1960 and McCarthy's first paper on Lisp was published in April 1960, so nobody outside his circle at MIT would have known about Lisp at the time.
The manual also states that it's for "a version of Lisp being prepared for the IBM 709", implying the manual was written while the interpreter was still being developed, and all references I've found state that the IBM 709 version was the first practical implementation.
According to Prof. Herbert Stoyan, probably the oldest GC algorithm description is in 'J. McCarthy, M.L. Minsky: Artificial Intelligence. Quarterly Progress Report No. 53, Res. Lab. of Electronics, MIT, Cambridge, April 1959.' Prof. Stoyan also mentioned that the 'first garbage collector was implemented by Dan Edwards during June/July of 1959'.
In my opinion, shared memory mutability is the actual bad idea that's going to eventually die. It's not just error prone, it doesn't scale; try mutating the same cache line from two threads on an Intel desktop chip and see how that goes. Let alone across mesh buses, chiplets, dual sockets or (God forbid) the network.
Garbage collection on immutable data is vastly easier; it's intrinsically non blocking, concurrent and parallel. You can build a garbage collected immutable data store over arbitrary numbers of computers.
Au contraire, as memory, and hence workloads, get bigger, the cost of copying increases, and copying rather than sharing becomes ever more expensive. I just spoke with a physicist who told me that his current experiment produces 200 Gigabytes of data per second! Copying that amount of data around for immutability is inconceivable. Indeed the single biggest factor for performance for such workloads is minimising the amount of copying.
Erlang-style shared-nothing is brilliant if you can afford it, but it's affordable only for small data.
Erlang-style immutable reference-counted shared memory is not at all the state of the art. Persistent data structures are common and effective. I have implemented several myself, to store and update trees that don't fit in main memory. If you're updating one byte of your physicist's (say) 1-petabyte dataset, it requires about ten memory allocations (log[branch factor] of the size of the dataset).
If you're updating the whole thing linearly, do consider that updating something in place or creating an updated copy is identical in terms of performance; a storage device doesn't care whether the bits streaming back are overwriting the same locations or new ones. Though of course you end up with both copies at the end.
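The log-factor claim can be sketched with a toy persistent binary tree in Rust (hypothetical types, not from any library): updating one leaf allocates only the nodes along the root-to-leaf path, while every untouched subtree is shared between the old and new versions.

```rust
use std::rc::Rc;

// Toy persistent binary tree: `update` returns a new root after replacing
// one leaf, cloning only the nodes on the root-to-leaf path. A depth-d
// path means O(d) = O(log n) allocations; untouched subtrees are shared.
enum Tree {
    Leaf(i64),
    Node(Rc<Tree>, Rc<Tree>),
}

fn update(t: &Rc<Tree>, depth: u32, index: u64, value: i64) -> Rc<Tree> {
    match (&**t, depth) {
        (Tree::Leaf(_), 0) => Rc::new(Tree::Leaf(value)),
        (Tree::Node(l, r), d) if d > 0 => {
            // Pick a child by the bit of `index` at this level.
            if (index >> (d - 1)) & 1 == 0 {
                Rc::new(Tree::Node(update(l, d - 1, index, value), Rc::clone(r)))
            } else {
                Rc::new(Tree::Node(Rc::clone(l), update(r, d - 1, index, value)))
            }
        }
        _ => unreachable!("depth does not match tree shape"),
    }
}

fn main() {
    let leaf = |v| Rc::new(Tree::Leaf(v));
    // Depth-2 tree holding leaves [0, 1, 2, 3].
    let t = Rc::new(Tree::Node(
        Rc::new(Tree::Node(leaf(0), leaf(1))),
        Rc::new(Tree::Node(leaf(2), leaf(3))),
    ));
    let t2 = update(&t, 2, 3, 99); // new version; `t` is untouched
    // Only the right spine was copied: the left subtree is shared.
    if let (Tree::Node(l1, _), Tree::Node(l2, _)) = (&*t, &*t2) {
        assert!(Rc::ptr_eq(l1, l2));
    } else {
        unreachable!();
    }
    println!("left subtree shared between versions");
}
```

Real persistent maps (Clojure's, for instance) use a branch factor of 32 rather than 2, which is where the "about ten allocations" figure for a petabyte comes from.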
Are there high-performance matrix multiplication libraries, e.g. for fluid-dynamics simulations, that are implemented using persistent data structures?
Are there any where multiple threads are mutating overlapping sections of memory? How exactly do you think a GPU matrix multiply violates my initial assertion?
Usually when working with immutable data you don't need to copy it around, you can just pass it around via reference, and the data structure will support cheap "updates". You can see this in F# and Clojure where immutable maps are supported by tree structures, and operations like add/remove return a new reference without touching the original while being reasonably performant.
Am I wrong in thinking that a system comprised entirely of immutable memory doesn't even need a garbage collector? If you built such a system in Rust, would not all memory just be deallocated once no longer utilized? The Rust principle of (many readers XOR one writer) leads to fully static memory management. This is my point. We need to go back and re-think our language design principles keeping memory management in mind, instead of pretending it's a solved problem.
Unless your memory is infinite, something still has to reclaim it. Rust also automatically deallocates memory that is no longer used, so it's some kind of GC, but one where the point of deallocation is determined either statically (based on the nesting of lifetimes) or dynamically (using reference counting). The problem with static dealloc is that the compiler cannot always work out when memory becomes free and must make conservative approximations (see Rice's theorem [1]); the problem with ref-counting is circular data structures (and also the cost in space, time, and cross-thread synchronisation of maintaining the counts).
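The ref-counting cycle problem is easy to demonstrate in Rust itself (a minimal sketch): two `Rc` nodes that point at each other keep each other's count above zero forever, so neither is ever freed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Two Rc nodes pointing at each other: each keeps the other's strong
// count at 1 even after every external handle is gone, so neither is
// ever freed. Rc::downgrade / Weak is the standard way out.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b)); // closes the cycle
    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(Rc::strong_count(&b), 2);
    // When `a` and `b` leave scope, both counts fall to 1, not 0: a leak.
    println!("cycle built, both counts are 2");
}
```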
In high-performance systems, software architectures have actually been moving toward non-shared mutability. Shared immutability tends to waste a performance-limiting resource (memory bandwidth).
I'm a bit confused by your phrasing. Non-shared mutability makes perfect sense (and is not the thing I believe needs to go), but at some point you have to share the results of a computation.
If not shared immutability, how is this done in the systems you're describing? I assume message passing, queues or similar?
Sorry, I wasn't clear. There are two common software architecture schools for scaling practical concurrency.
The first is the copy-on-mutate style commonly used in functional programming paradigms. This is what I assumed was meant by "shared immutability".
As you surmised, in the second model individual threads own all mutable state, and operations on that state are requested via message passing (usually SPSC queues within individual servers, which are scalable and extremely efficient). This makes all the code effectively single-threaded. The key technical element for these designs is that there often needs to be a mechanism to sporadically transfer ownership of state between threads to balance load, also arbitrated over the SPSC queues. This has a number of straightforward solutions.
The second model has come into favor mostly because it is very memory-friendly, due to excellent natural cache locality and requiring minimal copying of state.
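A minimal sketch of that second model in Rust, using a std `mpsc` channel to stand in for the SPSC queues (the `Msg` type is hypothetical, not from any real system): one thread owns the mutable state, and all operations on it arrive as messages, so the state-touching code is effectively single-threaded.

```rust
use std::sync::mpsc;
use std::thread;

// One thread owns the mutable counter; everyone else sends messages.
enum Msg {
    Add(u64),
    Get(mpsc::Sender<u64>), // reply channel for reads
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let owner = thread::spawn(move || {
        let mut counter: u64 = 0; // mutable state, owned by one thread
        for msg in rx {
            match msg {
                Msg::Add(n) => counter += n,
                Msg::Get(reply) => reply.send(counter).unwrap(),
            }
        }
    });

    for i in 1..=10 {
        tx.send(Msg::Add(i)).unwrap();
    }
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Get(reply_tx)).unwrap();
    assert_eq!(reply_rx.recv().unwrap(), 55); // 1 + 2 + ... + 10

    drop(tx); // closing the channel lets the owner thread exit
    owner.join().unwrap();
    println!("owner-thread counter reached 55");
}
```

No locks appear anywhere: the channel is the only synchronisation point, which is what gives this style its cache-friendliness.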
My rant was only really aimed at shared mutable memory; passing a piece of it around with one owner at a time is totally fine. One writer, multiple readers has its uses too.
I am a fan of persistent data structures and I think that as we keep slapping on cores they're going to be more appealing for general purpose computing, but they're certainly not one size fits all, and the performance tax is substantial.
Utterly unfounded and myopic.
Yes, you're right to stress new languages, but being liberated from memory management in FP languages like Haskell and Scala is amazing, and I will never even consider going back to manual management unless I absolutely need the last ounce of performance.
> The reality is we as developers choose not to give languages enough context to accurately infer the lifetime of objects.
The reality is we as developers choose not to give languages enough context to just do what I mean.
> instead of finding a way of expressing when we're done with instances
So, when are you done with an instance? How to enumerate all possible situations and devise a comprehensible encoding for them? Without requiring developers to keep a lookup table of lifetime declarations or other non-local reasoning in mind? Concurrency. Laziness. Runtime code replacement.
You're handwaving over fundamental research problems here. You don't know this is even possible in a way that doesn't completely hose the comprehensibility of a language.
It wastes time, space and energy, but it also lowers the bar for programming (you don't have to know or care much about memory) and removes the potential for errors. It's like programming on training wheels. Sure, there are costs upfront, but it also saves costs down the line.
Also, garbage collection doesn't iterate over all memory. There are lists, structures and variables that the runtime/GC manages.
And we already have languages that don't use GC and give us ways of "releasing memory", C being one of the most famous. But C also has its share of problems.
The more control you give to the programmer, the lower the footprint but the higher the learning curve and the cost of bugs. The more control you give to the runtime, the higher the bloat and the lower the learning curve, but the lower the cost of bugs. The GC exists because it removes a class of bugs that human programmers seem prone to write, and it theoretically allows programmers to work on more business/productive aspects of software development.
Software development is such a vast space that I think there will be room for GC and non-GC. And as you said, sunk costs ( on both sides ) almost dictate they both be around for a very long time.
Rust is not a start, it's the next iteration of old ideas (look at Ada, Cyclone, etc.). All of those languages were niche and will stay niche for a reason.
> Garbage collection is just plain bad. I for one am glad we're finally ready to consider moving on.
We are not moving anywhere. You still need lifetime annotations in Rust, and you have to design your application in a specific way to satisfy the borrow checker. You can't just write it the same way you would in a GC language. You trade productivity for complexity to avoid GC. In the current market, simplicity and productivity will always win over complexity. Developers' time is money, a lot more money than what's spent on energy and hardware. That's why companies throw money at hardware and use GC languages: it's a cheaper, simpler and more productive way to create software.
So unless you solved the halting problem GC is here to stay.
Productivity has many facets and is hard to measure. A lot of people feel very productive in rust because it catches a lot of problems early on. GC allows you to be sloppy in reasoning about the lifetime of an object, which is often an important design point that affects the overall system.
Also, you typically have to think about lifetimes when writing libraries, not so much when writing applications that use the libraries.
> If we'd spent that time developing smarter languages and compilers (Rust is a start, but not an end) we'd be better off as developers and as people.
What about Swift? Automatic Reference Counting avoids the typical problems of mark-and-sweep garbage collection. It has its own trade-offs of course, but the benefits include more consistent latency, efficient memory usage and cache friendliness. Like Rust, it uses a smarter compiler (LLVM).
Really interesting stuff happening on the server side of Swift right now. It's early days but I'd keep a close eye on it. Long-tail latency is a big, big server problem.
> we have a giant for loop that iterates over all of memory over and over and over to guess when we're done with things. What a mess!
Yeah, that's like having dedicated busboys clean up customers' messes and clear tables when they are done in a restaurant. Why can't the customers get up and remove dishes as courses are served? Or the waiter could do it. Why does the poor busboy have to ask "are you done with this?" Now this article is proposing busboys on powered skateboards. What a mess!
Maybe, but you forgot something: memory fragmentation. If this HW solution enables "bump the pointer" allocation and also allows "real time" deallocation (I don't know, I cannot access the paper), the result could be better latency than "normal" manual memory management, and better resource usage than the "allocate everything at the beginning" approach that real-time software uses.
Well, you can use weak pointers appropriately in a safe setting. It's not fun though. Alternately, you can use a layer of indirection, such as throwing all your graph nodes in a big vector and having them use indices to refer to one another.
So I would perhaps revise that statement to "easy cyclic data structures, memory safety, memory management without tracing, pick two".
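The indices-instead-of-pointers layer of indirection mentioned above can be sketched in Rust (hypothetical `Graph` type): nodes live in one `Vec` and name each other by index, so cycles are just numbers, the borrow checker stays happy, and dropping the `Vec` frees the whole graph at once.

```rust
// Graph nodes stored in one Vec, wired up by index instead of by
// reference. Cycles are just numbers, so nothing leaks: dropping the
// Graph frees every node at once.
struct Node {
    value: &'static str,
    edges: Vec<usize>, // indices into Graph::nodes
}

struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    fn add(&mut self, value: &'static str) -> usize {
        self.nodes.push(Node { value, edges: Vec::new() });
        self.nodes.len() - 1
    }

    fn connect(&mut self, from: usize, to: usize) {
        self.nodes[from].edges.push(to);
    }
}

fn main() {
    let mut g = Graph { nodes: Vec::new() };
    let a = g.add("a");
    let b = g.add("b");
    g.connect(a, b);
    g.connect(b, a); // a cycle, with no Rc and no leak
    let neighbor = g.nodes[g.nodes[a].edges[0]].value;
    assert_eq!(neighbor, "b");
    println!("a's neighbor is {}", neighbor);
}
```

The trade-off named in the comment is visible here: an index can dangle logically (point at a removed or recycled slot) even though it can never cause memory unsafety.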
Naive question here, as many in this discussion are way more experienced with the inner workings of languages:
It seems like doing away with GC is somewhat akin to the Halting Problem, in that it's impossible in the general case but may be practical in the common case. Obviously there are languages without GC, but they often end up reimplementing the patterns of GC for particular problems, so I really mean doing away with GC-like patterns and concerns (memory management).
With the Halting Problem, the specific feature that triggers it is unbounded loops. The solution is that most languages have two classes of loops. Ones that are bounded and ones that aren't. We still need unbounded loops for a language to be Turing-Complete, but frown upon their use unless it's absolutely necessary.
To the question: it seems to me that the equivalent feature that triggers the need for GC is passing/scoping items by reference rather than value, because it violates the simpler memory lifetime model of local variables in the stack frame. Could it be that the solution would be a language that makes it easier to explicitly reason about, detect, and limit objects that are passed around by reference?
I'm thinking the initial reaction will be no way, it's way too convenient to pass by reference, all languages pass larger objects by reference to avoid unnecessary copies, etc., but am wondering if it's the next "GOTO considered harmful".
Perhaps there's a way to rethink the way program flow and object persistence interact. Not in the general case, but as a way to make it more explicit which things might leak, in the way that most languages now allow you to mentally discount "foreach" loops as not subject to the halting problem and only worry about the rare "while" or recursive call.
This may be just me groping back to the idea of functional purity, as pure functions don't mutate external state, but perhaps there are other, less restrictive ideas that would also help, such as a notion of larger flow control regions that guarantee they won't return to some distant part of the program and thus will not reference an object used there again.
Overall, I'm just thinking there are no silver bullets, but that perhaps the cases where the problems occur can be corralled into a corner, as probably the majority of a lot of programs have no issues with memory management and would benefit from making that explicit, so it's easier to focus on the bits that do.
Edit: I'm guessing a lot of what I'm groping towards is exactly what Rust and its kin deal with, but should probably learn one of those kinds of languages before I armchair about this stuff.. :)
You seriously should just start the discussion by actually checking out how garbage collection actually works instead of touting random garbage about how it is just a predictions game.
Garbage collection has multiple variations, multiple optimizations, multiple tradeoffs. Different variations are powering the most popular and the most used languages all around the world (see also: C#, Javascript, Java).
To think that that is just "research thrown at a bad problem" is sheer insanity.
I propose we split this discussion up, because dealing with leaks is a completely different problem than dealing with garbage collection. You can have memory leaks in manually managed languages too; it has nothing to do with GC. But let's not throw the whole field out just because you do not want to deal with automatic memory management tradeoffs.
Azul Systems asked Intel to do this once... but instead created their own processors with interesting memory-barrier properties for a while, which greatly sped up JVMs beyond what was possible (at the time) on x86-32/ppc/sparc. Eventually they gave up and became a purely software company, but their "Java Mainframe" product was many times faster than the Intels of the age executing the same code, despite much slower CPUs. Died a quick death despite the cool factor.
I had a mentor who worked at Intel labs when this was happening.
The reason this died was because someone invented a gc algorithm which consistently outperformed this, leading Intel to drop their hardware gc plans.
Seems to be one of the main risks for any specialized circuits, if I understand you correctly. You always have to guess "will this really be relevant long enough to invest the money to bake it into hardware?" .. and if you guess wrong you just wasted a part of your silicon budget for something no one will use.
When I thought about this (using my programmer brain, which knows nothing about hardware), I came up with the idea of a 'store pointer' instruction, which took two addresses and an offset, and stored the first address into the field of the object pointed to by the second address. And also, if the two addresses referred to different memory regions, it recorded the pair of addresses into some kind of buffer on the processor. When that buffer got full, the processor would trap to some preconfigured location.
That could be used as a basis for a write barrier.
The devil would be in the detail of how the regions were defined.
And maybe the trapping would mean this wasn't even all that fast.
Indeed, OpenJ9's Pause-less GC is implemented using IBM z14 guarded load instructions and is able to outperform software by 10x in pause times and 3.4x in throughput for a given SLA [1].
I think everyone is about to make custom hardware.
A recent custom chip project I have been a part of for basically a decade is nearing production.
Made on a fairly large, old process. Despite this, many custom features have been added. For many tasks, the performance will be competitive with much more complex, resource-intensive devices. It is built in a way that allows for efficient, multi-core computing, concurrent or parallel, sans an OS.
People will write those, but many will just grab the pieces they want, put them on cores, then write their target app on top. The prior version, Propeller 1, made doing that a lot easier than one might think.
What struck me was the combination of very well planned silicon, coupled with software, can really perform. It reminds me of the custom chips we saw in early computing. Amiga, SGI, many others made adapter cards to get things done.
All of that stuff nailed the tasks cold and would always perform. As CPUs got quicker, more could be moved to software, of course.
But now that is hard again.
Software plus purpose built silicon is going to deliver peak performance.
Always has.
In that project, it took years, but a great many use cases were considered, FPGA simulations ran, code written, and then augmented with hardware, special instructions and sub systems intended to maximize performance while retaining a lot of flexibility.
The way I see it, general purpose computers may just end up back where they were before.
In 8 bit times, an Apple 2 was an all software machine. People bought and made cards to do specific things very well. Other computers had custom chips that focused on games, etc...
In the later era, Amiga, SGI made great hardware that was focused on specific things, while the PC was more like the Apple 2.
We are currently leaving a long era where general purpose computing made sense for most cases, and software got refined, things got faster, and we saw many good cycles.
Soon, more efficient CPU designs, often with well tuned instructions for given tasks may be directing a lot of purpose built silicon. Phones and tablets vs laptops give us a tiny look at one part of how it might go.
Many use cases can benefit from real time or just faster performance per watt. Software plus custom silicon will nail that, and it is getting easier to do.
RISC V has plans for specialized instructions baked in. It may well be that "one to bind them all", nudging out the more expensive ARM, for example. And maybe not. ARM is lean and mean and mature. Who knows?
All I know is the drive to do more per watt, as well as the drive to improve peak sequential compute, will continue, because there are still too many use cases where doing that makes sense, where either the nature of the problem space or the accumulated software warrants the effort.
We are already seeing the GPU idea expanded on.
Gonna be interesting and more difficult times ahead.
Azul essentially provided tagged architecture AND I think forwarding pointers. The former gave you precise GC for free, the latter allowed concurrent GC to move data around without pauses.
ETA: this doesn't seem to be quite the paper that the story refers to, but undoubtedly describes the same work in enough detail for people to get the gist of it. Darn paywalls.
They're comparing to an in-order CPU. Given that most CPUs are out-of-order (at least of the non-embedded variety, and GC is less used in such applications anyway), it would be better and more intellectually honest to actually compare to a typical CPU that performs GC. They kind of address this in the paper but only in a short aside: "Note that previous research [1] showed that out-of-order CPUs, while moderately faster, are not the best trade-off point for GC (a result we confirmed in preliminary simulations)." So they don't quantify what any of this means.
I think it's an interesting idea, but it doesn't bode well when they seemingly choose the wrong target for comparison and hand-wave away the difference as insignificant.
The comparison at least in the abstract is energy efficiency. It's quite likely that a small in order CPU is very good at chasing dependent pointers around the heap for its power consumption.
Imagine a linked list. Each pointer access is likely to miss to main memory, and no concurrency is possible. Both the highest and lowest end cores will sit around making a single request every 80ns.
They claim that the comparison was to the best alternative and I'd probably take them at their word barring any specific evidence.
I think an interesting comparison here is GPU cores, where a core will get blocked on a memory access and it will switch out its state for another. It looks like this is the approach here, which is a bit more aggressive than ordinary out-of-order approach. It's less ordered.
> globally this represents a large amount of computing resources.
Much of which would just sit idle otherwise, on client machines. Of course, the energy savings still apply.
> He also points out that many garbage collection mechanisms can result in unpredictable pauses, where the computer system stops for a brief moment to clean up its memory.
This is more of a hard barrier that's being solved.
All in all pretty cool idea, but I think the impact would be different from what's discussed here. Truly high-performance computing is already written in non-GC languages. This hardware would give medium-intensity GC programs (read: web servers on JVM, .NET, Node, Ruby) a boost, and could also allow some higher-but-not-peak intensity software to be written with GC where it might not be today (games come to mind), although that could actually encourage more energy usage than what it would save.
Looking at the underlying paper, I'm actually skeptical that there's any practical performance/energy boosts from this accelerator. The best case they're giving is something like a ~3-5× boost on GC (which is generally ~10% of total time, much of it hopefully idle time anyways if you're tuning GC well), with a 15% power reduction, assuming your collector is a stop-the-world, non-concurrent collector (and you rewrite your object layout). There's no measurement of the energy cost of the idling core, nor of the cost of extra memory and cache traffic trundling objects to and from the accelerator rather than sticking on the core. Those costs are likely to be even worse if you try to adapt this into incremental and concurrent collectors.
Especially when you ask the question "is it better to spend the area/power budget on a highly-specialized accelerator, or to give it over to another full core that can do other things when it's not collecting garbage," it's hard to motivate this kind of accelerator.
We wrote latency-sensitive and high-performance code in GCed languages back in the 1980s. Avoiding pauses or having predictable latencies (in fact, more predictable than with the usual manual memory management!) is more a matter of "we don't teach people how to program" than an issue with GC.
As for energy savings, many garbage collectors have amortized energy use lower than malloc/free. Even pretty simple ones (some of the simplest I've seen beat it so hard it's no competition, but they are specific to their application).
> As for energy savings, many garbage collectors have amortized energy use lower than malloc/free
For those wondering: that's because malloc/free 'visits' every dead object, while garbage collection visits the live ones (another way to say that: free collects garbage the moment it is created; GC collects non-garbage, periodically), and there is typically way more garbage than non-garbage.
I haven't seen energy usage comparisons that include the extra energy use due to GC's higher memory usage, though. That is possibly because it is hard to quantify: on current technology, you pay for your RAM's energy usage the moment you power it on, whether you need it or not.
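The asymmetry is visible in even a toy tracing collector (a hedged sketch over an index-based heap, nothing like a production GC): the mark phase walks only objects reachable from the roots; everything it never visits is garbage, reclaimed wholesale, including cycles that reference counting would leak.

```rust
// Index-based toy heap: objects refer to each other by index. `mark`
// visits only what is reachable from the roots; `sweep` then reclaims
// (here: just counts) everything the mark phase never touched.
struct Obj {
    refs: Vec<usize>,
    marked: bool,
}

fn mark(heap: &mut [Obj], roots: &[usize]) {
    let mut stack: Vec<usize> = roots.to_vec();
    while let Some(i) = stack.pop() {
        if !heap[i].marked {
            heap[i].marked = true;
            stack.extend(&heap[i].refs); // trace outgoing references
        }
    }
}

fn sweep(heap: &[Obj]) -> usize {
    heap.iter().filter(|o| !o.marked).count()
}

fn main() {
    // 0 -> 1 -> 2 reachable; 3 <-> 4 form an unreachable cycle.
    let mut heap = vec![
        Obj { refs: vec![1], marked: false },
        Obj { refs: vec![2], marked: false },
        Obj { refs: vec![], marked: false },
        Obj { refs: vec![4], marked: false },
        Obj { refs: vec![3], marked: false },
    ];
    mark(&mut heap, &[0]);
    assert_eq!(sweep(&heap), 2); // the cycle is reclaimed without ref counts
    println!("2 garbage objects found");
}
```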
His point is it's not rocket science. Preallocate and pool what you might need, don't call new in your tight loop. Change the GC algorithm to something that never runs unexpectedly. If you're still allocating such that you need GC eventually, manually GC at an appropriate time like a load screen.
> Preallocate and pool what you might need, don't call new in your tight loop.
This piece of advice is also valid in languages without GC. In C++ it's well known one should avoid allocating memory in tight loops, and it's strongly advised to call e.g. `reserve` on vectors to avoid allocating.
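The same advice in Rust terms (a small sketch): `Vec::with_capacity` plays the role of C++'s `reserve`, so the hot loop below performs no allocations at all.

```rust
fn main() {
    // Reserve capacity up front, like C++'s vector::reserve, so the hot
    // loop below never touches the allocator.
    let n = 1_000_000;
    let mut v: Vec<u64> = Vec::with_capacity(n);
    let cap_before = v.capacity();
    for i in 0..n as u64 {
        v.push(i); // no reallocation: len never exceeds the reserved capacity
    }
    assert_eq!(v.capacity(), cap_before); // the buffer never grew
    assert_eq!(v.len(), n);
    println!("no reallocations in the loop");
}
```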
The point is that you lose not only in tight loops. (Allocating in a tight loop with a good GC is actually a well-known good case [1]; it's deallocation that is the PITA.)
free() as well as malloc() and related have significantly complex algorithms behind them, unless one can simplify base memory management in certain ways - and those ways tend to also allow region-enabled GC to outperform them anyway.
[1] Allocating memory in many garbage-collected systems can be down to one instruction, and when using a per-thread heap it can be an atomic add without CAS.
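A hedged sketch of [1] in Rust (a toy region, with made-up sizes): allocation is a single `fetch_add` on a bump pointer; with a genuinely per-thread region, even the atomic would be unnecessary.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy bump allocator: handing out an object is one fetch_add on the free
// pointer. With a per-thread region the atomic would be unnecessary.
struct BumpRegion {
    buf: Vec<u8>,
    next: AtomicUsize,
}

impl BumpRegion {
    fn new(size: usize) -> Self {
        BumpRegion { buf: vec![0; size], next: AtomicUsize::new(0) }
    }

    // Returns the offset of a fresh `size`-byte object, or None when full.
    // (Toy simplification: a failed alloc still advances `next`; real
    // allocators roll back or switch to a fresh region.)
    fn alloc(&self, size: usize) -> Option<usize> {
        let start = self.next.fetch_add(size, Ordering::Relaxed);
        if start + size <= self.buf.len() { Some(start) } else { None }
    }
}

fn main() {
    let region = BumpRegion::new(64);
    assert_eq!(region.alloc(16), Some(0));
    assert_eq!(region.alloc(16), Some(16));
    assert_eq!(region.alloc(40), None); // only 32 bytes left
    println!("bump allocation works");
}
```

Freeing, by contrast, is done for the whole region at once (or by evacuating survivors), which is the asymmetry the parent comment is pointing at.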
A good "keyword" to search is "non-consing code", from lisp lore meaning it doesn't construct (allocate) new space.
As sibling comment says, it often involved object caches and static allocations.
Interestingly enough, similar rules are necessary with malloc()/free(), where dynamic allocation is often forbidden for latency-critical code (or safety-critical code).
> also allow some higher-but-not-peak intensity software to be written with GC where it might not be today (games come to mind), although that could actually encourage more energy usage than what it would save.
Just look at Minecraft for that (it's written in Java). A recent version created up to 200 MB of garbage per second!
Definitely a problem with modded Minecraft. One weird mitigation is that, due to some design decisions, it's pragmatically single-core, because there's a primary loop that does 99% of the work, freeing up plenty of cores to do side issues and garbage collection.
It reminds me of the days I was reading Knuth's quote "97% of the time premature optimisation blabla" every time someone was trying to make something faster.
CPUs are not getting faster, yet it seems using tools that makes things run faster are somehow taboo.
Wirth's law:
Wirth's law is an adage on computer performance which states that software is getting slower more rapidly than hardware becomes faster.
Why is java taught in university, and why is this language considered like some kind of standard? Most OSes are written in C, yet most of silicon valley frowns upon writing C because of arrays. Even C++ is getting a bad reputation.
I think most companies resist using lower-level languages because their ultimate purpose is to build products that provide value and sell them. They are generally far less concerned with the technical details and conciseness of the implementations. "Good enough" is a very squishy term, but for most companies, for better or for worse, that bar is pretty low. There are plenty of industries that focus on lower-level langs and use them pretty well, but it's not the norm for big corporations, who value rapid turnaround over all other factors.
You're right, but you cannot accuse the entire language of being insecure, ultimately it's the responsibility of the developer.
Also you can't always say we don't use C only for security reasons, as other languages also have their security issues. There are many modern ways to avoid those flaws. Like someone answered, it boils down to a matter of development cost.
I'm also quite skeptical when people always raise the objection of security when writing software. Security is not so simple, and so far it's its own specialty; pretending that it's worth making things slower, and that the security gain is actually there, is not really completely accurate.
Security is almost a post 9/11 paranoia knee jerk reaction.
Objective-C ARC (automatic reference counting) solved the problem neatly for my iOS apps.
Is there some overhead? Maybe, but it's neatly spread out through the entire application life time, so there is rarely[1] a UI-freezing stutter associated with GC. To reduce the overhead I turned off thread-safety and simply never access the same objects from more than one thread (object has to be "handed off" first if it comes to that).
One wart on the body of ARC is KVO, which I avoid like a plague for many other reasons anyway.
The other wart is strong reference loops. This can be solved by the app developer by designing architecture around the "ownership" concept (owners use strong references to their ownees, all other links are weak references). This is a good idea in itself as it increases clarity of the program. I do make an occasional slip, which is where I need to rely on Instruments, and I do wish I had better tools than that, something more automatic that would catch me in the act. Maybe a crawler that looks for loops in strong references during the development process but is quiet in release builds. Or at least give me a pattern to follow that makes it easy to catch my errors. For example, we could assign a sequential number to each allocated object, and only higher-ranked object could strongly refer to lower-ranked object. This won't work for everyone but I wouldn't mind fitting my app to this mold if that gave me immediate error when I slip.
[1] if you release a few million objects all at once it may stutter for a second. Could be handed off to a parallel thread maybe.
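The ownership discipline described above is language-agnostic; here's a minimal sketch of it in Python using `weakref` (class names are illustrative): the owner holds the only strong reference, and the back-pointer is weak, so no retain cycle can form.

```python
import weakref

class Child:
    def __init__(self, owner):
        # The back-reference is weak, so it does not keep the owner alive.
        self._owner = weakref.ref(owner)

    @property
    def owner(self):
        return self._owner()  # becomes None once the owner is gone

class Owner:
    def __init__(self):
        self.child = Child(self)  # strong reference: owner -> ownee

o = Owner()
c = o.child
assert c.owner is o
del o                    # only the weak back-reference remains...
assert c.owner is None   # ...so the owner was reclaimed immediately
```

With a strong back-reference instead, `del o` would leave a cycle that pure reference counting can never reclaim, which is exactly the slip the commenter wants tooling to catch.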
True! And ARC in Rust suffers from the same problem (note that while ARC in Rust and ARC in Swift are the same thing, the "A" happens to stand for different words in each case).
Hi! I think you may have repeated what I said in my comment, that the letter "a" stands for different things. Although they do the same thing: they are both atomic (in the sense of concurrency) and automatic (in the sense that you do not have to explicitly retain or release objects).
It sounds automatic to me. Maybe we disagree about what "automatic" is.
It also sounds like you are hunting for ways in which Swift and Rust are different. This is completely unnecessary! It turns out that I already know that they are different languages, and that the features do not map exactly 1:1 to each other. I hope next time you give me a little credit.
> I do make an occasional slip, which is where I need to rely on Instruments, and I do wish I had better tools than that, something more automatic that would catch me in the act.
Xcode's debug toolbar has a "debug memory graph" button that will visualize the object graph for your entire program. Reference cycles are automatically detected and listed in the issue navigator.
I’m probably wrong, but didn’t the Symbolics LISP machines have some kind of hardware support for GC?
I think for platforms like Android this makes a lot of sense. Should help quite a bit with battery consumption and responsiveness. Also makes sense for server loads in Java or Go.
A tagged architecture (https://en.wikipedia.org/wiki/Tagged_architecture), yes. Tags make life a bit easier for the CPU when it accesses objects and decides what to do with them, but the CPU is still doing all the work, walking the RAM to do the GC. (I vaguely recall stories about Lisp machine users who would turn off GC while working, and then let it run when they left work and returned the next day.) The idea here seems to be to have an entire separate chip, a specialized CPU, which walks RAM independently of the 'main' CPUs and whose only task is freeing up memory.
> I vaguely recall stories about Lisp machine users who would turn off GC while working
That was more in the early days when the GC was more primitive.
The later GC was actually several different ones. The so-called 'Ephemeral GC' had a huge impact: the machine tracks changed RAM (it uses virtual memory in general), and the EGC focuses on those areas with objects that were only very short-lived. That means, given a sufficient amount of memory, the machine stayed fast and responsive for a long time.
The main problem was the low amount of RAM (20MB were common after the mid 80s), because RAM was very expensive, and the large amount of virtual memory (possibly hundreds of MB on slow disks).
You have a lot of unused bits in a 64-bit pointer, and thanks to virtual memory you can arrange for an untagged address and its corresponding tagged address to reference the same physical memory. This gives you free bits that you can use to track the liveness/evacuation of a graph of objects.
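As a toy illustration of the "free bits" idea (not any particular VM's actual scheme): on typical hardware only 48 of the 64 address bits are significant, leaving the high bits available for GC metadata.

```python
# Toy pointer-tagging scheme: 48-bit addresses, 16 spare high bits.
ADDR_BITS = 48
ADDR_MASK = (1 << ADDR_BITS) - 1

def tag(addr, meta):
    """Pack GC metadata (e.g. liveness/evacuation state) into the high bits."""
    return (meta << ADDR_BITS) | (addr & ADDR_MASK)

def untag(ptr):
    """Recover the raw address and the metadata bits."""
    return ptr & ADDR_MASK, ptr >> ADDR_BITS

addr, meta = untag(tag(0x7F00DEADBEEF, 0b11))
assert (addr, meta) == (0x7F00DEADBEEF, 0b11)
```

The virtual-memory trick mentioned above goes further: by mapping both the tagged and untagged address ranges to the same physical pages, the mutator can dereference a tagged pointer without ever masking the bits off.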
Lisp Machines didn't use tags for tracking liveness/evacuation of objects, though. They used them for safety, which automatically gave them precise, as opposed to conservative, GC which always knew whether it was dealing with a pointer. They also had special, CPU-handled type of forwarding pointers, which when accessed "normally" would transparently redirect you to forwarded location.
The forwarding pointer you are describing is equivalent to a ZGC colored pointer with the evacuation bit set that a GC barrier (a load barrier) will rewrite to the evacuation address.
And yes, ZGC doesn't use colored pointers to track whether a value is an integer or a pointer, because Java, unlike Lisp, is statically typed, so the VM derives that information from the bytecode.
Most Lisp implementations use typed memory via tags. Lisp doesn't have pointers, but references and usually knows if something is an integer or a reference - because it's encoded in the tags.
Symbolics (and other systems like it) had typed memory with hw type checking (so you had safe memory by default), and had one important bit for efficient GC (one that is also utilized by Azul and Shenandoah) - forwarding pointers, in their case implemented "transparently" to actual code.
With a forwarding pointer, you can copy data and update a reference in a way that is transparent to concurrently running code: when the code accesses the data again, the CPU (or a software implementation) will notice that the data has been moved and follow the forwarding pointer to the new location.
The rest of the support for GC was centered around structure of virtual memory, without anything very specific to GC in hw - we're talking about things like "memory is divided into X areas, inside each area the lower side is ephemeral... " etc. There was also some register use to keep track of GC status. I think some versions had MMU or page fault handler update the GC register data.
My day job is writing C for an embedded real-time system. No manual memory management necessary... because we're forced to declare all struct and array sizes at compile time! Not a malloc or free in sight. Obviously, it's pretty limiting as far as algorithms go, beyond "read data off bus, store in fixed array, perform numeric calculation, write back to bus." But I've got to say, it's pretty freeing to write C in such a limited environment.
From an environmental perspective, I wonder how much energy is consumed (and emissions generated) for garbage collection and interpreters. These things exist to make programming easier but are then duplicated across thousands of servers.
If everyone used some compiled language that was just a little simpler, a little safer, had just a little better memory management/tooling, or like here, had better hardware support, how much would that reduce global emissions caused by data centers?
> These things exist to make programming easier but are then duplicated across thousands of servers.
I think about this kind of thing, but more in regard to Electron et al.
Garbage collection, by contrast, seems like a more worthwhile trade-off, because there's a great argument to be made that garbage collection isn't just easier for programmers but safer for users, in terms of security, since it mostly removes a major class of vulnerabilities.
Here's a benchmark that includes an energy comparison: https://thenewstack.io/which-programming-languages-use-the-l... . It's interesting that speed does not directly correlate with energy consumption, and that the energy consumption of functional languages was much higher than that of imperative ones.
Not sure why you’re getting downvoted when you’re the only response with actual data whereas the other responses are defending clearly less efficient practices with gobbledegook about O(1) memory algorithms and whatnot.
I think most of it is coming from overly defensive users of GC languages. Sometimes the world you want is not the world that exists.
Garbage collection is mostly an issue of latency, not overall CPU usage. malloc and free are not O(1) any more than garbage collection is.
[edit]
I forgot memory usage as a third resource. The search space of GC algorithms can be considered on a triangle of throughput, latency, and RAM usage. You can make one of these better by making one or both of the other worse.
The programming language benchmarks game is not great for finding idiomatic code, since it tends to be highly hand-tuned code.
Dynamic allocation is slow, so both the C and non-C code will avoid it in the benchmarks game (the problems are usually sufficiently fixed-size that it will boil down to a single malloc() at the beginning, which is idiomatic in a lot of embedded C, but not for non-embedded C).
Highly dynamic languages are slower not because the GC is slower, but because the GC encourages (or fails to discourage) a lot of allocations. If you were to write Haskell- or OCaml-like code in C, you would put a lot of stress on the malloc/free implementation. Those have gotten a lot better in the past 20 years or so (web browsers in particular tend to stress malloc/free heavily as well), but it's still more a question of "which malloc best fits my workload" than of a "one true malloc", a factor seen with garbage collectors as well.
You can argue over implementation details, it's what programmers do best, but at least it's data and you haven't presented any to support your claims.
Shouldn't it be relatively simple to implement malloc/free as a wrapper around a garbage collector and compare like with like? Maybe we'd need some more annotations to cover copying, referencing, etc.
For that matter if GC was faster I'm sure we'd have seen this in a number of large C/C++/rust programs already
>but because the GC encourages (fails to discourage) a lot of allocations.
Yeah, but allocations in OCaml and Haskell are super cheap, nearly as cheap as on stack. Typical OCaml small allocation is:
* add the number of words you wanna allocate to the heap pointer (%r15 on amd64)
* check if it surpassed the limit
+ if it's fine — just write your data in -8(%r15)
+ if there is no space left — run GC
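The bump-allocation steps above can be sketched in Python (a toy model with illustrative names; real OCaml keeps the heap pointer in the `%r15` register and evacuates live objects during the minor collection rather than simply resetting):

```python
class Nursery:
    """Toy model of minor-heap bump allocation."""

    def __init__(self, size_words):
        self.top = 0               # plays the role of the heap pointer
        self.limit = size_words    # end of the nursery

    def alloc(self, n_words):
        # Step 1: bump the heap pointer by the requested size.
        new_top = self.top + n_words
        # Step 2: check whether it surpassed the limit.
        if new_top > self.limit:
            self.minor_gc()        # no space left: run the minor GC
            new_top = self.top + n_words
        # Step 3: hand back the freshly reserved block's offset.
        block = self.top
        self.top = new_top
        return block

    def minor_gc(self):
        # A real minor GC would copy survivors to the major heap;
        # the toy just empties the nursery.
        self.top = 0

n = Nursery(4)
assert n.alloc(2) == 0   # first block at offset 0
assert n.alloc(2) == 2   # second block right behind it
assert n.alloc(2) == 0   # nursery full: GC ran, allocation restarted at 0
```

The happy path is two or three instructions, which is why allocation in these runtimes can genuinely compete with stack allocation.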
By energy use, Garbage Collection is generally better than typical manual approach, except for when "GC" is actually a naive refcount.
Exact results depend on what metrics you are targeting: a stop-the-world pause can result in a very efficient (in total time and power use) system, but one with problematic latencies. A metronome-style approach will have lower efficiency but more predictable timings than reasonably optimized malloc()/free().
I didn't want to start a language war but yeah, basically. Swift and/or ObjC + ARC seem to be attacking the same problems on the embedded side out of necessity. It's too bad Apple is so apathetic about Linux servers.
Even just replacing Python with Go might save a massive amount of kWh consumed.
I'm more interested in optimizing the client side, at least where the web is concerned. I keep wondering when Google is going to start generating energy ratings for the top few thousand sites on the web. It would shame the low performers into producing front ends that are no longer glacial. I was loading wunderground.com's 10 day forecast for the umpteenth time lately and reflecting on how horribly slow and inefficient it is... Surely website energy ratings must be in the works somewhere, right?
It's only conceptually similar to the "bump allocation in the nursery" part of garbage collection, IMO, as it skips the whole tracing part that makes RC, GC, and even RAII substantially less efficient than stack allocation. When people talk about GC having great performance compared to manual memory management they are very often only considering objects that die before the first major collection.
There is a much more pressing case for migrating JS, Python and Ruby codebases to Java, .Net, Haskell, or Go than for abandoning automatic memory management and migrating those to Rust or C.
And even that more pressing case isn't pressing enough most of the time.
Anyway, your ideals of simpler, safer, better tooling, and better hardware support are clearly in conflict with each other.
I'm skeptical; GC is closely tied to programming language run-times. How is some accelerator going to know which pointers in an object are references to other GC objects and which are non-GC-domain pointers (like handles to foreign objects and whatnot)? How does the accelerator handle weak references and finalization?
People aren't going to massively rewrite their language run-times to target a boutique GC accelerator.
I remember Ruby had some approach of reusing previously allocated yet out-of-scope objects. I can very well imagine taking this concept above and beyond, and having virtually separated stacks for each type...
For small short-lived scripts and applications, do we even need to free any memory these days? For example, you write a script which takes several seconds to execute, moves files, computes stuff with strings, etc. Should we really invest time and effort in the script interpreter to free the memory, when instead we can just exit normally and let the OS handle the cleanup?
I would imagine this kind of paradigm could be much faster to run because of less runtime work being performed. The allocator used could also be a simple linear allocator which just returns the next free address and increments the pointer. If using multiple threads there could be one per thread.
"exec() is the world's most efficient garbage collector." [1]
This idea is both worthy, and old.
[1] DJB, paraphrased, when responding to a criticism of the design of qmail, which rapidly forked very small processes which did a simple task and exit. Unable to find a link to the mailing list.
> For small short lived scripts and applications, do we even need to free any memory these days? ..we can just exit normally and let the OS handle the clean up.
That makes me think: why couldn't larger programs be composed of many such small, short-lived scripts/processes that give up all allocated memory upon exit?
I suppose there could be accumulated overhead for starting many such processes, and also the issue of allowing shared memory spaces that are explicitly not automatically freed. I'm way out of my depth in this line of thinking though, so, just speculating.
It does seem like just doing it in hardware may be a linear gain but isn't a fundamentally better algorithm. There's a proof that you do need to pause your program eventually, if you want to be sure you get all the garbage.
Hardware is fundamentally parallel, CPUs are fundamentally serial; it is possible for a hardware solution to have a super-linear speedup in time.
As a simple example, what is the time complexity of zeroing out n bytes of memory. With a CPU, this is O(n). However, with proper hardware support, this can be done in O(1) time.
For a simple garbage-collecting example (no idea how their chip does it), consider a simple mark-and-sweep algorithm. Assume the chip has an internal object graph. At the beginning of GC, only root nodes are tagged. At each step, the neighbor of every tagged node is tagged. With a CPU, this step takes at least Ω(n) time, where n is the number of tagged nodes. However, if this is done entirely by hardware, then (depending on how the hardware is designed) each node can independently look at its neighbors and complete a single step in just O(1) time.
Moving stuff to hardware gives you a set of primitives that is asymptotically different than the primitives you have on a general purpose CPU.
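The mark phase being described is a frontier expansion; here's a minimal sequential sketch in Python (the argument above is that hardware could advance the entire inner `for` loop over the frontier in a single step):

```python
def mark(roots, neighbors):
    """Mark all objects reachable from the roots.

    `neighbors` maps each object id to the ids it points to.
    Sequentially this costs O(edges); the hardware claim is that
    every frontier node could be expanded simultaneously.
    """
    marked = set(roots)
    frontier = list(roots)
    while frontier:
        nxt = []
        for node in frontier:                 # hardware: all nodes at once
            for m in neighbors.get(node, ()):
                if m not in marked:
                    marked.add(m)
                    nxt.append(m)
        frontier = nxt
    return marked

def sweep(all_objects, marked):
    """Everything never marked is garbage."""
    return [o for o in all_objects if o not in marked]

graph = {"root": ["a"], "a": ["b"], "x": ["y"]}   # x, y unreachable
live = mark(["root"], graph)
assert live == {"root", "a", "b"}
assert sweep(["root", "a", "b", "x", "y"], live) == ["x", "y"]
```

Note the number of sequential rounds in the hardware model is bounded by the graph's diameter, not its size, which is where the claimed super-linear speedup would come from.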
But hardware that has to interface with the main memory can't be fundamentally parallel, because of memory bandwidth limitations. If you want to make this part of the memory, then my objection does not apply. But if it's an external chip to the memory, you still are fundamentally serial.
How much does it need to interface with main memory? If the external chip has its own memory then it can maintain its own view of the object graph. The software can update it when references are made/deleted and query it when an allocation needs to be made.
There is still the overhead of communicating when references are made/deleted, but you need that anyway, as something that only looks at memory doesn't know what is a pointer, or the size of objects.
The communication overhead is then linear with respect to the number of updates to the object graph, not the size of the object graph.
You could even go so far as to not put the chip on the memory bus at all.
I'm presuming some kind of a heap. That heap has something like a list of free blocks. Who keeps that list - the main memory, or this other chip? If the main memory does, then when you garbage collect blocks you have to change the main memory. If the chip keeps the list, then yes, the chip can do it all internally. But...
What about multitasking? If there are N programs running, and each has their own heap, the chip has to be able to keep track of each of them. Or it has to be able to keep track of one monster heap that uses almost all of the available memory, even if it's occupied by a bunch of small allocations. All this without using any (main) memory itself. This chip would have to have a large amount of onboard memory itself to pull that off. The very worst scenario would be to be a long way into a run of a program that thrashed the heap, and then the GC chip ran out of slots. You can't (easily) go back and re-do the heap to have the heap control in the main memory, and you can't continue with the chip managing the heap.
Can you find this proof? I haven't had any luck. And it's a pretty surprising result to me, assuming "all the garbage" means garbage that was created before the collection cycle started; it's a trivial and useless proof that garbage created 1 nanosecond ago still exists.
So my idea for GC is to offload it to a separate machine through a communications channel. The main CPU sends messages to the co-processor whenever it allocates memory, or whenever it mutates (whenever it writes a pointer to allocated memory or to the root set- there could be special versions of the move instructions which send these messages as a side-effect). There is a hardware queue for these messages and the main processor stalls if it's full (if it's getting ahead of the co-processor).
The co-processor then maintains a reference graph any way it likes in its own memory. It determines when memory can be freed using any of the classic algorithms, and sends messages back to the main processor to indicate which memory regions can be freed.
This has some nice characteristics: the co-processor does not necessarily disturb the cache of the main processor (it can have its own memory). Garbage collection is transparent as long as the co-processor can process mutations at a faster rate than the main processor can produce them. The queue handles cases where the mutation rate is temporarily faster than this.
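A software model of that message protocol might look like the following sketch (the message formats and class names are invented for illustration; a real design would batch messages through the hardware queue):

```python
class GCCoprocessor:
    """Toy model of a GC co-processor that mirrors the mutator's
    object graph in its own memory, driven purely by messages."""

    def __init__(self):
        self.edges = {}   # object id -> set of object ids it points to

    def process(self, msg):
        kind = msg[0]
        if kind == "alloc":
            # ("alloc", obj): a new object with no outgoing references yet
            self.edges.setdefault(msg[1], set())
        elif kind == "write":
            # ("write", src, old, new): src's pointer slot changed
            _, src, old, new = msg
            self.edges[src].discard(old)
            if new is not None:
                self.edges[src].add(new)

    def collect(self, roots):
        """Trace from the roots; report unreachable ids back to the CPU."""
        live, stack = set(roots), list(roots)
        while stack:
            for ref in self.edges.get(stack.pop(), ()):
                if ref not in live:
                    live.add(ref)
                    stack.append(ref)
        return set(self.edges) - live   # "these regions can be freed"

co = GCCoprocessor()
for obj in (1, 2, 3):
    co.process(("alloc", obj))
co.process(("write", 1, None, 2))       # object 1 now points at object 2
assert co.collect(roots=[1]) == {3}     # object 3 is unreachable
```

Because the co-processor only sees the event stream, its view lags the mutator slightly; that lag is exactly the "bigger high-water mark" trade-off mentioned in the replies below.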
Not really though you may want to. The main CPU could compact memory if it felt fragmentation was high enough to warrant it. By its nature the traffic is asynchronous so the CPU won't do a perfect compaction (it won't have processed all the inbound messages to mark garbage as free) but it would probably be close enough.
The big win here is making it all async. No need to stop the world, issue write barriers to application activity, or otherwise synchronize for GC. You'd have a bigger highwater mark for used memory but otherwise the main CPU can pickup the "the memory from x to y is now free" messages on any thread and whenever it is convenient.
Compacting without tracing can't really be done, and not all moving algorithms are collect-and-compact (though collect-and-compact seems to have won over other moving algorithms in the SMP era).
You can usually precisely control garbage collection by turning it off or forcing it to run. The cognitive load to handle memory manually is not insignificant. If you control memory manually, you eventually end up designing some kind of mechanism like ref counting or something else to handle memory cleanup automatically. And there's significant reasoning that ref counting might not be the most desired solution for all use cases. Best is a combination of the ability to handle memory manually, with some more automated garbage collection when there's a need to write stuff that doesn't necessarily have to be the absolute fastest. Kitchen sink languages like C++ tend to have both and don't force the developer in either direction. Best would be to #define out 'new' and make manual memory handling explicit.
The irony of the thing is that in manual memory management languages you end up doing your own garbage collectors and in garbage collector languages you end up doing your own manual management. Unfortunately if you look in a language to solve such complex problems you are heading straight to severe disappointment land. Same shit different package. I still prefer dynamic languages by a long margin because of their ability to do decent metaprogramming and reflection which is essential for managing any form of data. Pick your poison and enjoy the hype while it lasts.
>> consumes a lot of computational power—up to 10 percent or more of the total time a CPU spends on an application.
I stopped reading there. 10% is nothing. For such a useful feature as automatic garbage collection, for the vast majority of applications, I'd gladly give away 50% of the CPU.
In terms of ensuring code correctness and robustness, if I had to choose static typing or automatic garbage collection, I'd pick garbage collection every time. It adds a lot of value in terms of development efficiency and code simplicity.
> for the vast majority of applications, I'd gladly give away 50% of the CPU.
Don't you think you're over-generalizing a bit much from your own circumstances? If these issues don't matter to you then you basically don't belong in this conversation. They matter a lot to people who write software that runs on large numbers of servers, with both CPU and memory utilization pushed to the limits. That's a lot of us. If you have a million machines, even a 0.1% improvement on either axis is huge. That's racks upon racks worth of equipment that doesn't need to be installed, powered, or maintained. If I could save 10% of 10% ("nothing" according to you) across such a large fleet, I'd be a hero and I'd be rewarded accordingly.
Don't forget game development, especially users of the Unity game engine, where their "aggressive" GC has often been the bane of projects such as Kerbal Space Program. I have been struggling lately with my own Unity GC issues, where it's running the GC every frame, subsequently dropping my FPS from the 90 I need for VR down to 50 every few frames. Even the new experimental GC they have implemented seems to have no effect.
Would probably help if they had constant-time GC available, where you bind mutator ("application code") and collector (GC) to specific quanta of time to run, thus providing reliable timings.
10% is a lot. That's at least a few big cloud data center's worth of servers at this point. I'd gladly give the operators a reason to shut these down and save electricity.
Worse, it's a tradeoff and not a constant overhead. As many Java based server products show, things are fine if you do things such that you avoid garbage collection. GC is only problematic if you are doing bad things like constantly creating lots of objects, or worse, keeping them around for too long.
E.g. Elasticsearch uses a lot of memory mapped files these days instead of heap memory (which they used more heavily in the past). That has done wonders for stop the world garbage collect cycles, which used to be a source of cluster inconsistencies because nodes tend to drop out of the cluster when they stop responding due to garbage collection issues. These days that's much less of an issue.
On servers, idling CPUs is the norm. We run our java servers on t2s in amazon. CPU throttling is not a problem for us. Our servers run out of IO before they run out of CPU. Garbage collects are not an issue. The little there is seems to have little or no impact on response times.
I'd say the innovation in this space comes from new approaches like e.g. Rust with its borrowing mechanisms or functional programming where due to everything being stateless, there's no need to garbage collect that much. It would be interesting to see other languages with borrowing mechanisms; it doesn't sound like something that can easily be retrofitted to existing ones.
The competition for "least true statement on HN" is always fierce, but you've just submitted a contender. All the people who have lots of servers work really hard to coalesce workloads and drive up utilization. I know because that's what I do, and it's what everyone around me does. It's why the cloud you use is cost-effective. The machines that are mostly idle are clients, not servers.
When do you start sending alerts? 10%, 50%, or 90% CPU? For us we like our cheap T2s. We have multiple for redundancy; not for performance reasons. All the places I've been tended to over provision servers to be able to deal with spikes in usage. 90% is typically bad news.
Whatever your alert threshold is, that still defines a fixed amount of work per machine and thus how many machines you need to satisfy a given workload. If you need 100 machines to stay below your alert threshold, and you make a 1% improvement in efficiency, you now need 99 machines to stay below that same threshold. Whatever it is. For just about any load distribution that's not already problematic (e.g. if all of the load gets concentrated on one machine you're screwed no matter what).
I work all day every day on dozens of clusters much bigger than that, and so do a lot of other people. For us, spending half a year to get a 1% improvement is an opportunity big enough to get excited about.
> but the automated process that CPUs are tasked with consumes a lot of computational power—up to 10 percent or more of the total time a CPU spends on an application.
Is that even a problem when most CPUs are idle 90% of the time even when doing typical daily tasks?
It is a problem if you have a server process running that you want to be extremely responsive (latency <100ms) and that then suddenly decides to do garbage collection for a minute or two before answering incoming requests.
Hmm so there's a coprocessor that does the GC... doesn't it need to lock the memory away from the main CPU while it does that? And doesn't this lead back to unpredictable pauses and slowdowns?
I never understood the need for Garbage Collectors.
In my opinion, the difficulties of memory management are extremely overrated. I write code in C/C++ for almost 20 years and I never encountered a difficult bug that would have been avoided with a Garbage Collector.
If a coder really has a hard time with manual memory management it means he can't really code, this is a beginner problem...
I only work in GCed languages, so I don't know how manual memory management works, except segfaults in some courses at university. Guess I should quit my job, as you say I apparently can't really code :/ Thanks for letting me and others here know!
In my experience GC makes programmers sloppy in their resource usage. Just allocate a bunch of memory annnd... whatever, the GC will take care of that. But there are a lot of other resources besides memory that aren't automatically cleaned up like that. So what happens is people forget to close network sockets, forget to unsubscribe event handlers, forget to set certain references to null to actually allow the GC to do its work, etcetera... The existence and over-reliance on GCs has led to a mindset where many programmers are just not aware that what you create must also be destroyed at some point.
Let me rephrase this.
In the CS / Programming space, we have quite a lot of difficult problems to solve, in my opinion, memory management is really easy compared for example to multithreading.
In fact, the so-called garbage collectors do not really solve the problem of memory management; there is still a lot of potential for leaks and poor memory management when GC is used.
And it adds a lot of complexity to the language runtime, and it gets in the way, I've seen a lot of discussions about how to avoid triggering the GC, in the end, the solutions are more complex than good old manual memory management.
I am not saying that memory management is trivial if it was it could efficiently be handled by the compiler or the runtime.
There is no magic bullet solution for easy memory management, one has to choose the right policy considering the context, which is most of the time out of reach for the compiler.
The context is mostly the expected lifetime of the memory allocation.
Maybe I'm naive, but with multi-core CPUs, and parallel GCs, isn't it somehow the same? One core is mostly only used for GC, while the others do other things?
Edit: I guess they mention their chip itself can do it at a high level of parallelism, so that's probably one more advantage. But CPUs with additional slower cores and a lot more cores are in the works as well.
No, because it is in general intractable to figure out object lifetime at compile time. Rust has just solved the problem for more cases than, say, C++ does, or perhaps just with more rigor.
That should be an ideal application. Usually the trouble with 3rd-party garbage collection is that it has to discern pointer vs. any other machine word. That's more of a problem with C; this is why the Boehm GC library calls itself "conservative". A runtime like V8 can follow a spec when it allocates memory so everything is properly marked.
I don't care about being absolutely fast when writing code. The convienience of not having to care about memory management is far more important to me. That's why I like GCs.
Seems like we could just as easily stop using garbage collection. ... or even go back to reference counting / smart pointers and just live with the “limitation” that we can’t have circular references.
Neither reference counting nor "manual" memory management is faster than GC by itself. Both tend to have pretty bad slowdowns that tend to get optimized by batching operations, similar to a "GC pass", because there are different metrics of performance, and an occasional pause appears to be a good compromise for most.
Personally I'm partial to deterministic schemes akin to IBM Metronome, where GC runs in bounded static time.
Why scare quote "limitation"? It literally is a limitation, since it prevents me from doing something in code that can be useful at times. It certainly has tradeoffs.
Sure, if you’re using pre-1990’s memory management. Best practices have evolved far beyond the explicit use of malloc & free and unsafe string handling.
Most popular dynamic languages don’t use garbage collection under the hood anyway. In Python and many other runtimes, it’s primarily reference counted.
Python largely only gets away with ref counting because it doesn't support concurrent access. The GIL means it's basically single-threaded ref counting, which is much cheaper than an actual atomic ref count (like std::shared_ptr)
Scattering atomic ref counts everywhere gets extremely expensive very quickly.
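CPython's GIL-protected counts are directly observable via `sys.getrefcount` (note that the function's own argument temporarily adds one to the number it reports):

```python
import sys

x = []
base = sys.getrefcount(x)   # includes the temporary ref held by the call itself

y = x                       # a plain increment: no atomic instruction needed
assert sys.getrefcount(x) == base + 1

del y                       # a plain decrement; at zero, freed immediately
assert sys.getrefcount(x) == base
```

Under the GIL these are ordinary non-atomic increments; a `std::shared_ptr`-style atomic count pays a cache-coherence penalty on every copy, which is the cost being described here.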
I'd be willing to wager you use plenty of software, codecs, games, and more that would not function nearly half as well without carefully managed memory.
Carefully managing memory often means using a language feature that handles the bulk for you. Neither manual nor GC. And you can be careful about how you use memory even in the presence of a GC, getting the same kind of optimization without any safety risks. It's not a tradeoff between manual management and "who cares?".
I have over a decade of professional experience entirely in C and C++. I believe that manual memory management is the wrong choice for the vast majority of applications. It is error prone and often slower than garbage collection.
Not necessarily common, because garbage-collected languages tend not to get used for high-performance tasks, but a huge amount of modern optimization lies in improving contiguous memory access and avoiding pointer chasing to use the CPU cache more effectively; in Java, allocating arrays of primitive types is about the only way to do that. Similarly, in C# they added spans to achieve this with better ergonomics, but both basically come down to avoiding the garbage collector for better performance.
Another big one is trying to allocate as much as possible on the stack, either through the language, as in C/C++/C#, or by using local variables and hoping the compiler will do the right thing for you, as in Java. A lot of the work that has gone into Java VMs over the decades has been trying to improve this. Even Java sacrifices its OO purity to make integers and floats value types on the stack; that was necessary to get non-laughable performance.
> This...isn’t true.
It's not even remotely controversial, saying garbage collection is slower is like saying water is wet.
Depending on the workload different memory allocation strategies perform better or worse. GC systems usually even have the fastest allocation. The only problem is that you see deallocation as one big block that is completely disconnected from allocation.
In throughput-optimized GC setups (not latency-optimized ones) that's the case. Though free(), despite seeming "constant-time", can have unpredictable pause times as well when the allocator hits a bad case (coalescing free blocks, returning pages to the OS, lock contention).
I argue that some limitations, like clear resource ownership, are good... but there are still workarounds that aren’t too arduous. Weak pointers and the like are decent enough abstractions to handle the rare case... or just going manual. It’s not a limitation in practice, no.
It hasn’t been a problem for me in over a decade in a multitude of languages. On Java codebases? I’ve witnessed some frightening levels of sloppiness that are only solvable with a profiler.
Reference counting is already slower than a tracing GC in most practical applications. Not to mention that disallowing circular references makes code that needs them either slower or forces a workaround, which adds its own overhead and complexity.
The benefit of not using GC is improved understanding of the code and a better architecture. These things will also lead to better performance.
GC was a mistake. The main reason it is still used outside scripting languages is the notion that non-GC languages need to be low level, which in practice is kind of true only because we haven't had any real competition in that area.
Not having a GC in no way improves understanding of the architecture or gives better performance. The hordes of C/C++ programmers who don't understand memory management but swear by malloc()/free() are a good example of that. And it's not as if C/C++ are the only languages without GC; they are just the popular ones right now.
The typical programmer is taught nothing about memory management, until maybe they pick up bits and pieces (often by hearsay and cargo-culting). It doesn't matter whether we're talking manual or automatic: without such understanding one cannot write performant code or design a good architecture.
Also, a good working definition of a low-level language: one that makes you care about things irrelevant to your task...
Being explicit usually means that you have to drop whole classes of optimizations, or manually perform the analysis yourself (probably resulting in errors).
It's not that controversial of a statement. The choice has largely been dictated by the fact that non-GC meant C or C++, and it is not hard to find reasons to avoid them.
Now we also have Rust, but Rust is new and also has a low-level aura, making comparisons to high-level languages difficult.
That Go has a GC is a shame, such a missed opportunity.
It's not at all zero CPU overhead, not even close. retain & release are thread-safe, meaning atomic ref count. Very comparable in cost to std::shared_ptr<T> or Rust's Arc<T>. Both of which also automatically insert the calls to inc & dec ref counts.
It's cool that you don't need to bother with specifying the type as being std::shared_ptr<T> or Arc<T>, but it's not particularly novel, either. It's "just" syntax sugar (or lack of syntax sugar I guess?)
ARC optimizations do let it retain/release less often. For example, instead of a callee adding the return value to an autorelease pool then having the caller immediately retain the returned value, the autorelease+retain pair can be elided entirely.
> For example, instead of a callee adding the return value to an autorelease pool then having the caller immediately retain the returned value, the autorelease+retain pair can be elided entirely.
Are you just referring to standard RVO? If you return a std::shared_ptr in C++ today you won't get the equivalent of retain+release, either, RVO avoids that.
> For example, if the user puts logging statements in retain, they should not be surprised if those statements are executed more or less often depending on optimization settings
> In general, ARC does not perform retain or release operations when simply using a retainable object pointer as an operand within an expression
> However, C and C++ already call this undefined behavior because the evaluations are unsequenced, and ARC simply exploits that here to avoid needing to retain arguments across a large number of calls.
> ARC performs no extra mandatory work on the caller side, although it may elect to do something to shorten the lifetime of the returned value.
It’s not zero — those -retain and -release calls still exist when necessary and have a small penalty — but it does minimize them well, and is lower overhead and vastly more predictable than GC.
It's a little better than that, because the overhead of ObjC method dispatch is avoided. There are no calls to -retain or -release (if the receiver hasn't overridden those methods), it's just a C function call to objc_release et al. No objc_msgSend involved.
In Java, I created thread-local resource pools, including string pools, which eliminate garbage collection in sensitive routines. Of course, it's much faster in Java to perform pooled string comparison with ==. Likewise, I always use the indexed version of a for loop to avoid the iterator that would otherwise be allocated.
GC in Java is great for non-priority code, which is most of the application.
Seems worth it if the software developed is a fraction of all software written. Sure your VM/runtime will have to be refactored, but not the millions of programs that run on it.