This much complexity, and the tribal knowledge required to bypass it, so early in the life of a programming language, is a really bad sign.
It means the complexity of the problem space is not solved by the language, and the responsibility of solving it is being passed on to the users.
You might react to this by saying that slow build times are not a big deal. That's a mistake -- iteration is key to productivity and other languages do it better.
This is more of a step-by-step deep-dive into the ways to profile, diagnose and work through the timing and performance problems. Anyone who doesn't know about many of these tools will probably find them useful (or at least useful to know that they exist) in the future.
Besides, for iteration people are probably in debug, not release mode, which the article mentions is initially 19s vs 2m+. As far as I can see the main conclusion seems to be that there was a compiler regression causing the issues, which is fixed in nightly.
I don't think many users would need to go through these steps to diagnose the problem beyond "wait for a newer compiler release".
In general I don't think being slightly slower in a systems language in cases where you've explicitly asked for full optimisations is that much of a problem when there are simple ways to mitigate the feedback loop (e.g. debug compilations). Would I like these things to be faster? Sure. But with the size of the Rust projects I use, it's not noticeably slower than the large established C++ projects I have to deal with (that aren't tuned for compilation speed).
> This is more of a step-by-step deep-dive into the ways to profile, diagnose and work through the timing and performance problems.
I'd argue that when it comes to doing work, I only want to deep-dive into my code. Not third party libraries that "vow" to have been tested + performant, let alone the language/build tools themselves.
I'd be interested to hear what system/field/language you use with compilers that come with guarantees that they will never, ever introduce performance regressions for compile times under any circumstances (presumably with a hefty SLA to match), and where the addition of e.g. an extra innocuous header by some other part of your team will never, ever introduce unexpected nonlinear compile times that require investigations of what the compiler is spending time on.
I suspect this only applies to an extremely small portion of real world code and projects (direct hand optimised asm, perhaps something like forth?)
As far as I am aware the only "guarantees" (interpreting "vow" to mean this) with rust are that edition-bound code will continue to compile correctly, modulo bugs which may be introduced/need fixing.
Of course, it feels like rust could do better, the reputation it has hasn't come from nowhere, but, personally, it's a far cry from e.g. the projects I work on, where we are mandated (by inter-company politics) to use a custom SCons atrocity on C++ that takes two minutes to run a no-op build.
I thought this level of entitlement was restricted to open source users, not devs. AFAIK Rust never promised anything re: the compile-time performance of its standard library, and I know of no 3rd party library that does. You link in a 3rd party library, then that's your code now for better or for worse.
I agree, and even as someone who holds Rust in high regard, I’m becoming increasingly frustrated by this. There doesn’t seem to be one clear root of this problem.
Some of it is the language not being designed with ease of compilation in mind, unlike let’s say, to pick an extreme example, Go. Some of it is the tooling and default settings, especially around incremental compilation, linkers etc. People are working on this, but it takes time.
But unfortunately, a lot of the problem is with the ecosystem, as hinted at in the article. There seems to be no limit to the amount of code bloat and compile time complexity that people are willing to accept to win some microbenchmarks. This includes some very popular crates with lots of dependents, like Tokio.
In the C++ world, the popular “boost” library has a similar problem, and I know shops that avoid it, or large parts of it, simply because the compile times would be unbearable. I really hope that parts of the Rust community start to make similar decisions.
Tokio maintainer here. I've actually been spending a bunch of time recently looking into how we can reduce Tokio's compile times. I haven't really been able to find any big wins yet, but one thing has been pretty clear from the benchmarks: the number of dependencies of Tokio is not the problem. They all compile pretty fast and can all compile in parallel. Tokio itself takes a lot longer than the dependencies.
(Except for tokio-macros which takes a long time because it depends on syn and quote. Consider disabling it if you don't need it.)
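As a concrete sketch of that suggestion (assuming a recent Tokio 1.x, where the `#[tokio::main]`/`#[tokio::test]` macros live behind the "macros" feature), the Cargo manifest would look something like:

```toml
# Cargo.toml -- don't enable the "macros" feature, so tokio-macros
# (and its syn/quote dependencies) never get built. Enable only the
# features you actually use.
[dependencies]
tokio = { version = "1", default-features = false, features = ["rt-multi-thread", "net", "time"] }
```

Without the macros feature you can't use `#[tokio::main]`, so the runtime has to be constructed explicitly via `tokio::runtime::Builder` instead.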
So I've read "syn" and "slow rust compiles" in pretty much every discussion of "slow rust compiles" and finally googled it and... syn is a Rust parser? For use in macros? Because procedural macros operate on tokens and not an AST? Hm. I'm sure there are reasons for how it ended up like this, but it does smell kinda funny.
If I recall the reason is "rustc's representation of the AST is an implementation detail and unstable, thus macro code may not rely on it (and it will not be exposed)". Hence "syn" came along and provided a stable API and format. "quote" is the inverse function, turning an AST back into tokens.
This is quite interesting. Is `syn` just an interface to the Rust compiler's Rust parser or is it a completely separate implementation of the parser that works at compile-time?
Hopefully it's not the latter but then I wonder where the slowness comes from.
Yeah, but then why not make it the only one? One reason is that proc-macros can "see" the syntax outside the macro invocation itself (e.g. a derive attached to an item), whereas a declarative macro only sees what it encloses. This means you must write macro_rules! { struct ... } -- wrapping the struct inside the invocation -- instead of declaring the struct first and then forwarding it.
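To make that difference concrete, here is a minimal sketch (names are made up): with `macro_rules!`, the struct definition has to live inside the invocation for the macro to see and transform it, whereas a derive proc-macro would attach to a normally declared struct.

```rust
// A declarative macro only sees the tokens it encloses, so the
// struct must be defined *inside* the macro invocation.
macro_rules! with_type_name {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        struct $name {
            $($field: $ty),*
        }
        impl $name {
            // Generated helper: report the struct's own name.
            fn type_name() -> &'static str {
                stringify!($name)
            }
        }
    };
}

// The whole declaration is wrapped in the invocation:
with_type_name!(struct Point { x: i32, y: i32 });

fn main() {
    let p = Point { x: 1, y: 2 };
    assert_eq!(Point::type_name(), "Point");
    assert_eq!(p.x + p.y, 3);
}
```

A derive macro (via syn/quote) would instead let you write `#[derive(TypeName)] struct Point { ... }` with the struct declared normally.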
First of all, Tokio is my favorite async framework, so thank you and the Tokio team! And second, I appreciate that cargo/rustc compiler errors are clear, helpful, and guide me toward cleaner and thread-safe coding where most other languages happily let me create unsafe hidden race bugs all day long with no warning … oh but they compile so much faster! (I don’t care) The slower compile time really doesn’t bother me when I’m getting a more stable end result… but I’m glad the maintainers are working on improving compile time anyway.
To me the extra compile time makes sense when rust is showing me many unsafe coding issues that languages seem to ignore even with their strict warnings enabled.
… and I don’t get all of the complaining about rust. I’ve written code in a lot of languages and rust has its quirks too, as they all do, but I really like it!
> But unfortunately, a lot of the problem is with the ecosystem, as hinted at in the article. There seems to be no limit to the amount of code bloat and compile time complexity that people are willing to accept to win some microbenchmarks. This includes some very popular crates with lots of dependents, like Tokio.
But isn't this the right trade off for something like Tokio which is at the base of applications which expect to be fast? Or else those applications wouldn't bother with async runtimes and what-have-you. There's also the fact that during the dev cycle, you'll compile tokio & friends once and be done with it. And for CI pipelines, there's caching.
I'm however not a professional developer, so I'm curious if I'm missing something.
Yes, a slow compile time in a dependency is less painful since you don't do full recompiles that often, but things can spiral out of control if you have say 50 dependencies and many of them are real slow and you are on your old dual core laptop and find it takes hours, and so on.
Hold up, this is some severe goalpost moving. I realize these comments are coming from different people, but this thread started at:
“tribal knowledge required to bypass it, so early in the life of a programming language, is a really bad sign.”
and within a few replies we ended at (paraphrasing):
“Granted compilation time is incremental, libraries are only compiled once, debug mode is much faster, but it’s really going to be an issue with a huge project on absurdly obsolete hardware.”
As the author of a project that relies on 50+ projects with some massive dependencies (wgpu, Tokio, rayon), yes it can take a good long while to compile from scratch, but it’s not “hours”, and after first compile that’s all paid for going forward.
Honestly this all seems like grasping at one of Rust’s perceived weak points, but really there are so many better criticisms, I don’t know why compile time gets so much attention.
Including me! There’s a lot of daylight between “an old dual core laptop” and a last gen thread ripper. For example, I’m running on a quad core i7 from 2015.
I don’t think I follow the point you’re trying make. Are you trying to say Rust compile times are a problem for you on your 13 year old laptop in the context of writing GUI code on battery? If so I think that’s such a specific scenario that it’s hard to draw any generally applicable conclusions.
I also own devices like laptop graphics workstations.
The point I am making is: is Rust only for those with enough income to buy hardware powerful enough for a usable workflow, or is it for anyone regardless of what hardware they can afford?
As for coding on the go, classical desktops are a dying breed.
Your laptop from 2009 predates even Rust's earliest stable version by 6 years. Including computers that are older than the language itself is completely unreasonable.
That is because the languages are different and choose different tradeoffs. The tradeoffs might make a language inapplicable in your domain.
Rust has always been marketed as a "systems programming language". C++ is Rust's closest competitor in this domain space and it suffers from terribly slow compiling as well. I would say that due to headers, massive portability baggage and a lack of a standard build system, compiling C++ is even worse. And that is a language that has been around for 30+ years.
I agree that build times are important (i.e. PR build takes us 30 minutes which is insane), but sometimes you don't really have a choice. When you do have a choice, then by all means reap the benefits of fast compilation that languages such as Go or C can offer.
I work on a large rust project professionally, and we have tried to make build times a priority for CI. I’m just noting a couple of things that helped us a lot, in case they’re useful.
- sccache is pretty much essential
- beefiness: GitLab’s default runners for example are way too underpowered and made our builds take like 30 minutes (as opposed to 5 on a beefy machine in CircleCI). We’re working on switching to self-hosted runners for GL
- replacing diesel with sqlx saved us a fair bit of time
- caching the cargo home can help. It wasn’t a massive gain relative to sccache, though. Similar story with caching the target directory
- lld linking was another relatively small win, but a win nonetheless
- splitting things up into smaller libraries and binaries (all in one workspace) helped us a fair bit, since building/testing some targets can now totally skip some libraries, but I imagine YMMV
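Two of those points (sccache and lld) boil down to a small bit of config. A sketch, assuming a Linux x86_64 runner with sccache and lld installed:

```toml
# .cargo/config.toml (checked into the repo or provisioned by CI)

[build]
# Route every rustc invocation through sccache so compiled
# artifacts are shared across CI jobs and machines.
rustc-wrapper = "sccache"

[target.x86_64-unknown-linux-gnu]
# Ask the linker driver to use lld instead of the default linker.
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```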
Persisting it across jobs, using whatever your CI framework lets you do. We cache this stuff using the workspace cargo.lock checksum as the primary cache key
C++ projects tend to support massively parallel builds, despite what people tend to complain about with header files. The issue in this article was partly that Rust only compiles crates in parallel, and while people tend to have lots and lots of C++ translation units, they don't have lots and lots of Rust crates. Languages seem to be moving in the wrong direction here :/.
That's a bit misleading though. The problem with header files is that they tend to get (textually) included in lots of compilation units, and hence rebuilt a lot of times. I did an experiment on a codebase at work where turning on unity builds -- so effectively everything is a single c++ file -- resulted in a 2x speedup in real time, and a 12x reduction in 'user' time.
Of course, such approaches lead to slower hot rebuilds. There's trade-offs worthy of a book. But the point is that C++ is only massively parallel by adding a lot of extra work.
This problem could be overcome by reducing the complexity of header files, but that's generally difficult if template-types are used. In general, the solution is to write code that's closer to C than C++ in the header files. In particular using opaque pointers to hide data.
We're actually seeing a very similar problem in the linked article. The chief problem is monomorphization (~templates) and the solution is to use boxing (~data hiding via opaque pointers).
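The parallel can be made concrete in Rust. This is an illustrative sketch (not code from the article): the generic function is duplicated per concrete type at compile time, while the boxed version compiles exactly once.

```rust
use std::fmt::Display;

// Monomorphized: rustc emits a separate copy of this function for
// every concrete T it is called with, much like a C++ template.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {value}")
}

// Type-erased: one copy is compiled, and the call goes through a
// vtable -- the moral equivalent of hiding data behind an opaque
// pointer in a C header.
fn describe_boxed(value: Box<dyn Display>) -> String {
    format!("value = {value}")
}

fn main() {
    // Two instantiations of describe_generic get compiled here...
    assert_eq!(describe_generic(42), "value = 42");
    assert_eq!(describe_generic("hi"), "value = hi");
    // ...but only one describe_boxed, however many types pass through it.
    assert_eq!(describe_boxed(Box::new(3.5_f64)), "value = 3.5");
}
```

The trade is the usual one: the boxed version gives up inlining and pays for dynamic dispatch at runtime, in exchange for less code for the compiler and linker to chew through.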
That matches my experience as well. It takes a nontrivial amount of discipline, but it’s possible to structure a C++ project for fast builds. In theory, having a module system that isn’t based on text substitution should give Rust an advantage, but on the minus side, the crate structure gives you fewer knobs to optimize.
Avoid over-templating, don't inline expensive functions, put expensive #includes inside .cpp files instead of headers, use internal linkage whenever possible, etc. Doable in theory, painful to get there in practice if you've already dug yourself into a hole, and you can't get out of said hole without convincing the rest of your team/organization to change their coding practices accordingly.
- plugin architecture for the software; plugins are dynamic libraries when developing (everything is linked statically for releases, which of course take much longer to build with LTO, etc.)
- building on Linux (it's seriously slower on Windows, when using the exact same compiler on the exact same SSD)
It’s a large topic, but the most important goal is to minimize the amount of code that’s transitively included in headers.
That can mean avoiding templates, or (if you’re willing to do the work) using explicit instantiations in one source file only.
Use the PImpl pattern so that users of a class don’t need to know the declarations of its private members.
And whenever reasonable, design your method signatures so that objects are passed by pointer/reference — this means you can work with forward declarations instead of always having to include the full class declaration.
Do all of that only where reasonable. Small headers usually don’t matter much, and avoiding dependencies on headers that users are likely to include anyway (like <string>) doesn’t win anything either.
I rarely touch C(++) but every time I have to compile something a bit more complicated than the simplest of projects it is noticeably slow. Other languages I use, scala and go and rust, have higher initial cost but seem to suffer less when the sources grow.
A recent example, linkerd proxy compiled in a few minutes, while it took me hours to compile envoy proxy.
Then there are the stories in how ridiculously long it takes to build Chrome or the V8 engine. I’m really not sold that real world c(++) projects are fast to build.
I have a vague memory that at a previous job I was building envoy proxy in a build automation pipeline and it regularly OOMed. While troubleshooting I noticed that most of the time and memory was spent compiling one or two generated C++ files; they contained many many many type parameters. When you have to generate templates something is wrong.
To be more precise, rust is a competitor to C++ and not the other way around.
Also, compiling C++ is _not_ worse than compiling rust projects but again the other way around.
Although compilation times are a hog in both of these languages, it is well known that this is actually one of the biggest rust pain points.
In general, long compilation times are mostly attributable to the complexity of the things (algorithms) that compilers are trying to do for you in order to produce machine code, and not to a lack of a standardized build system (a bad idea) nor to the existence of headers.
In particular, the biggest "offender" in the C++ compilation model is almost always the use of template metaprogramming, whereas in Rust I'd envision it's the compile-time guarantees that must be offered through mechanisms such as the borrow checker and the like.
I disagree that it's a pain point in C++ or in Rust. You're generally just doing a lot more stuff and shifting the pain to incremental debug builds (so that the slow path doesn't need recompilation) is a perfectly viable solution.
Template metaprogramming has never been the pain point in C++ for me because nobody wanted to use templates (after a certain point); huge deps are.
Anecdotally, the slow builds I've experienced in Rust were cases of overzealous Serialize implementations (aka, people adding Serialize to everything, just in case) or deps on large external projects (often in C++).
Coming from C++, where you either have someone who knows how to debug build times or you're on your own, I think the comparison with Rust is unfair. cargo-timings is pretty nice and gives you a hint on what to look for.
This article actually shows debugging build times is not that hard.
I did that for a personal project (which compiled part of OpenCV) and I'm certainly no ninja.
I remember attempting similar tasks in C++ and getting completely lost.
I live and breathe C++ full time for more than a decade now. Compiling C++ _has_ always been a pain point and a big efficiency killer. Especially if you come from other languages (I don't, but I enjoy learning other stuff too). It doesn't matter if you're directly writing your own template code or not, because you're always gonna pull it in indirectly, even in the most simplistic and unrealistic case where your sole dependency is the C++ runtime library.
OTOH debugging C++ build times is no more than recompiling your code with -ftime-trace and inspecting it with ninjatracing, which btw is a great tool.
However, getting the data as you see is not the real challenge here but in how to use this data to optimize your build pipeline while keeping the functionality and not introducing more technical burden. That is very difficult and for many projects questionable if the benefits will outweigh the total cost invested in it.
> In general, long compilation times are mostly attributed by the complexity of things (algorithms) that compilers are trying to do for you in order to produce machine code and not because there's a lack of standardized build system (a bad idea) nor because of existence of headers.
that is not true - most of the projects I worked on suffered a lot from header parsing - sometimes more than 50%, in 100k-1M line codebases
many C++ developers tend to think that code generation is the time-consuming part, but that is less true - you will see if you start benchmarking the compilation with today's tools available in Clang and VS2019+
I tend to disagree with this. I've optimized build times on a number of occasions on different projects, each on the order of multiple millions of LoC, and none of them benefited greatly from employing precompiled headers, which essentially should help alleviate exactly this gap.
Running the build with -ftime-trace has almost always shown in my experiments that the bottlenecks are either in an exploding number of template instantiations or, more often than not, in linking time.
Compiler authors can often make compilation faster, but they have to work at it and think carefully through the problem. It has to be a focus not an afterthought.
Years ago the GNAT Ada compiler authors intentionally avoided implementing precompiled headers. They instead focused on very fast lexical analysis (using a small handwritten lexer optimized for lower case letters because that was the usual case). By careful optimization of a small part of the compiler they made the recompilation of headers faster than loading a precompiled header!! More info: https://dwheeler.com/essays/make-it-simple-dewar.html
Maybe the Rust team needs to add timers to their test suite. This kind of big regression in compilation time should have been caught by the test suite.
I assure you it's not an afterthought. We track performance changes every week, going as far as reverting highly desired features to avoid regressions. The only reason we accept (small) regressions is for correctness fixes. This regression was only partly exercised by our perf test suite. That's how we grow it: see bad behaviour in the wild, add to the suite.
The sources of Rust compilation slowness usually boil down to exercising a part of the language that has O(n²) behaviour, like match arm evaluation, trait bound evaluation, or huge expansion due to monomorphization or macro expansion, and the resulting huge linking times from that, or the bottleneck that proc-macros introduce (they get compiled before they can get executed, only relevant on fresh compiles).
The regression was (partly) fixed by adding caching: https://github.com/rust-lang/rust/pull/90423/files and if I recall correctly was introduced by a feature change that fixed a bunch of incorrectly rejected deeply nested trait bounds.
In general, Ada has a pretty good compilation model. It has a formalized version of C++'s physical design with spec/body separation. It's also defined grammatically to require the context clause with dependencies first, before everything else in a compilation unit (subunits or library units). `separate` also allows you to separately compile those things which might change a lot.
The rust compiler doesn’t spend a lot of time doing things like type checking and borrow checking, except in pathological edge cases (like in the article), and these edge cases are quickly fixed (eg this one was a regression that’s already fixed).
Most time tends to be spent in LLVM. That will also go down over time as the quality of emitted LLVM IR improves due to optimizations in rustc itself.
In my experience, this has been the other way around. C++ gives you many tools to optimise compilation, and when one takes advantage of them (precompiled headers/shared libs/no boost, etc), the edit-compile-test cycle is fairly fast.
Rust is consistently slow to compile, and there's not much idea of how to go about optimising compile speed.
Both languages are snails compared to Go, of-course.
Rust 1.58 has a bug that introduced a compile time regression, and the knowledge required to bypass it is to just switch to the preview of the next release, which is trivial to set up.
So I really don’t know what you are talking about in your comment. What’s hard about this?
Often when building compilers you trade the speed of compilation against the speed of execution[1], and Rust has (traditionally) chosen the speed of execution over making the compiler fast.
This won't necessarily always be so, there's e.g. a new compiler backend called cranelift in the works, that makes compilations 30%ish faster, and in return the code isn't as optimized - which is good for a faster inner cycle.
I'm not disagreeing, but I could cold compile a million lines of Delphi code in a few minutes on a crappy laptop more than a decade ago. Delphi managed to be both fast to compile and a fast language; of course, it's not as sophisticated or complex a language as C++ or Rust. But still...
The problem with Rust is that it was made with a C++ mindset. It is VERY hard to make a compiler fast if it follows what C/C++ do (Go is the only evidence against!).
It's death by a thousand cuts. The syntax is the FIRST and biggest one. How you choose it will impact the whole pipeline. Then the rest...
Ada isn't entirely safe. It does avoid C-string issues using length-based strings, bounds-checking array access and other features like access types instead of pointers, but it is still possible to hold onto an access to dynamically allocated memory after deletion (though you can somewhat mitigate this with smart pointers).
But the things that make Rust safe are not that much slower to compile. Macros, generics, all that syntax, all that language complexity (ie: not just the semantics but the way it is surfaced) have a bigger impact.
The biggest issue cited in this article is a regression in the type checker, which is an open bug that apparently is just lower priority than whatever else the Rust team is working on instead of ensuring compilation time is fast.
Only once that was worked around did the article really start having to pay attention to the code generation and linking stages, and the issues there seem to mostly be about lack of parallelism caused by the "crates" model combined with link time optimizations and incremental compiles being off by default. You don't need to make a fundamentally faster code generator that makes worse output to deal with this: the performance losses you get from parallel, incremental, and separate compilation techniques are so minimal that only the most serious C++ projects go out of their way to combine translation units and activate LTO.
> It means the complexity of the problem space is not solved by the language, and the responsibility of solving it is being passed on to the users.
What languages do it best?
JavaScript is a dystopia of stacked bundlers. Python dependencies and deployment is a toxic superfund wasteland.
You’re not wrong that it’s a problem. But as far as I can tell build+deployment is an unsolved problem for non-hobby projects in most, if not all, languages.
All that remains is for it to gain a build tool like mix or cargo. Futzing around with esy works but takes more time than I'm willing to regularly invest in it.
I really go back and forth on this. By separating the package manager and build tool, they leave the door open for either to be swapped out (which is what esy does, incidentally). The flexibility comes at a price of complexity, but it usually works well after climbing the learning curve.
I don't disagree and you're correct, the problem is that I'm usually investing 1-2h segments in playing with OCaml once every few weeks. Having 80% of that time swallowed by a build tool leaves a sour taste. :|
I get what you're saying but it applies to people who already work full time with OCaml. For guys like myself who want to periodically dip their toes and make a slow and steady progress... it makes us feel we're not welcome. (Obviously we're not owed a made bed but I'm hopeful that you get my point and you wouldn't interpret it as entitlement.)
A potential improvement: a very opinionated guide to esy, perhaps? I'd love to author such a guide one day (and would be delighted if somebody beats me to it).
I don't take it as entitlement but more as a case of expectations not matching with reality. With 1 to 2 hrs every few weeks, it's very difficult to ramp in a meaningful way with most languages. Unless they're toy languages or unless people have spent large amounts of time/money for those use cases.
Perhaps you would be best off finding a fully set-up Docker image with a working installation of OCaml/opam, and connecting VSCode to that to act as your dev environment. That's how some educators are teaching it nowadays.
Yep, agreed again. I'll invest the proper amount of hours to not only nail an automated project bootstrapping with esy which I'll then proceed to script with bash/zsh. I'll also exercise a few paths like adding an external dependency, splitting the project to a library and binary, being able to use a REPL (`utop` I think?) in the project, etc.
After that's done I'll also document the entire thing. At that point that might make a good candidate for a blog post to enrich the OCaml ecosystem, too.
Only after that point can I play casually with it, that's true. There's always an initial ramp-up price that has to be paid and in my case that price was higher than I wanted. I'm still willing to go through it all because I absolutely love what OCaml is in every way that I've been exposed to so far (with the exception of build tooling as already said).
While I am going through this I can indeed make use of Docker images and see what they do that I failed to do before. Could be very illuminating.
Build times matter a lot. I've spent a lot of my career working to reduce build and test cycles to improve productivity. Rust is the one environment where no matter how much time we threw at the problem it never got better.
The two biggest time savings were accomplished by using a version without the regression bug, and using incremental builds in debug mode for local dev. Bugs can be introduced in any language, so the only gotcha to be fixed is that maybe incremental compile should be the default.
In some verticals the only way to properly exercise and test the resulting binary is in release mode. Think things like DBs, game engines, etc. It's why good debug information in release is needed.
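Cargo does let you tailor the release profile for that workflow. A sketch (the right trade-offs depend on the project):

```toml
# Cargo.toml -- keep debug info and enable incremental compilation
# for release builds, at some cost in artifact size and build caching.
[profile.release]
debug = true        # full debug info in the optimized binary
incremental = true  # off by default for release; speeds up rebuilds
```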
I'm very happy with my overall experience using Rust, but I would never say "slow build times are not a big deal". I can work around them just fine by myself because it's the kind of thing I'm used to doing, but as more of my team (especially more junior members) start using it for projects, the slowness becomes a big deal. And I don't see that as a problem with my team at all — it's very much a problem with Rust. Especially if you're doing the kind of work where you're switching fairly quickly between different projects (bug blitz, cross-cutting platform work, etc.), you simply can't afford to be wasting time trying to work around bad-by-default compiler performance.
That said, I don't think that the level of slowness we're talking about here is actually an intrinsic property of Rust. Specifically, I think there are a handful of strategies that would really improve things:
- For teams, dead-simple out-of-the-box sccache integration, and officially supported Terraform (etc.) projects for setting it up in your corporate cloud environment. Then your "cold" builds for a project you've never touched before could be a lot quicker.
- A set of inbuilt use-case-driven named profiles for Cargo beyond just "debug" and "release", which you could set as the default for your "debug" or "release" builds. I'm thinking things like "prioritise-runtime-performance-and-incremental-build-time" (maybe with a better name) that sets the right kinds of flags to make that work, given that all those things are achievable if you tweak flags manually.
- Using a faster linker by default, e.g. Mold. Although I don't know whether there are portability or other issues here. Maybe it could just be used where possible?
- Binary repositories. I think people are too quick to dismiss these on the grounds that different features enabled on a crate makes it produce a very different output. You could cache the three most common configurations or something.
TL;DR: I think just not a lot of time has gone into tackling this at a high level, because there have been other priorities, and there are whole lot of directions that could be explored that might make a big difference.
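On the named-profiles point: Cargo's existing custom profiles already get part of the way there. A sketch, with a made-up profile name:

```toml
# Cargo.toml -- a custom profile (the name is hypothetical) that
# trades a little runtime speed for much faster incremental rebuilds.
[profile.fast-iter]
inherits = "dev"
opt-level = 1            # light optimization, still quick to compile

[profile.fast-iter.package."*"]
opt-level = 3            # fully optimize dependencies; they rarely rebuild
```

Built with `cargo build --profile fast-iter`; dependencies compile once at full optimization while your own code stays fast to rebuild.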
> Using a faster linker by default, e.g. Mold. Although I don't know whether there are portability or other issues here.
mold supports precisely one platform: x86_64 Linux. It also doesn't support linker scripts beyond what is necessary to link libc, and given the creator honestly suggested the suckless practice of editing and recompiling the source code as a possible replacement for them I have my doubts that it'll have a satisfactory replacement for, say, the things I'm currently doing. (Admittedly those things are in ARMv4T rather than x86_64, but still.)
mold supports not only x86-64 but also i386 and ARM64. And besides libc, it can link almost all user-land Linux programs already (I tested that by compiling all Gentoo packages with mold).
If you have to use a linker script for kernel development or embedded programming, mold doesn't work for you, though.
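For anyone on supported hardware who wants to try it, the usual opt-in is a small bit of config (assuming clang and mold are installed, on x86-64 Linux):

```toml
# .cargo/config.toml -- route linking through clang, and tell clang
# to invoke mold as the actual linker.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```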
Rust seems to have inherited two favorite C++ development-process features: long compilation times, and incomprehensible error messages that aren't really related to the actual issue (e.g. just like C++ barfs out a bunch of template errors, the borrow checker really loves to barf out very obscure error messages when there's an issue with lifetimes).
If you encounter incomprehensible error messages, I would encourage you to file a ticket at http://github.com/rust-lang/rust/issues. We consider them bugs and are diligent about dealing with them.
Thank you for the great report. It indeed seems to have fallen through the cracks, with no one touching it (it's missing the T-compiler tag, which means I didn't triage it) since it was first categorized.
I wouldn't call that lack of output representative of the experience of using rustc though, if I'm allowed to let my pride in my work flare up a bit.
Sorry, I did not mean to say that this was representative of the overall experience. On the contrary, I'm usually quite satisfied with the compiler messages. Thanks for your work!
Thank you! The "bad" thing about having high standards is that when they aren't met it is quite jarring, more so than if everything were homogeneously bad or meh, and that ends up being more frustrating.
While compilation time is a well-known Rust pain point, Rust error messages are really one of the best parts of the language.
Complicated error messages are clearly not a common occurrence unless you rely heavily on meta-programming tricks, and that is not common in Rust, unlike C++.
And the intention is to improve those too. It's hard, and I would prefer crate writers show more restraint than they do, but we have to meet users where they are.
Saying it inherited them just proves you probably haven't used either.
While compilation times are longer than say Go, they definitely never felt too long, compared to say Java.
But your comment around errors takes the cake. Like maybe, if you use some combination of macros expansion and traits it could get confused.
But Rust errors are on par with Elm's. They show the line, they show what went wrong, and they show how to fix it. This is a far cry from C++, where using even a bit of templates results in an unintelligible mess.
> they definitely never felt too long, compared to say Java.
Woah... now I am the one who has to ask: what the hell are you on about??
I work on Rust and Java projects, if you exclude running tests, Java compiles very very fast compared to pretty much any language... Rust is a lot slower to compile even on the much smaller projects I've worked on... in this post, a very small project (I think it's like 16,000 LoC, it's mentioned in the beginning) was taking over a minute even on hot builds... To get a build to take this long in Java you would need to have several million LoC! Even if you don't use something like Gradle to cache compilation units and just compile everything from scratch with javac.
You mean, by the time you split up the project into a bunch of tiny crates and mention:
> I don't love how the dependency graph looks
After adding this complexity to the project, yes, some of the crates are building in ~3s, if they are the only ones that needed to be rebuilt.
Meanwhile, in a Java project there is none of this artificial crate splitting. Sure, Java projects have other nonsense, but they don't force you to split up projects into tiny pieces to get decent compile times.
It's a tradeoff: if you make the file the language's compilation unit, then you can't have "circular imports" that allow end users to divide their code in an ergonomic manner with fewer "arbitrary" restrictions. We can argue all year about whether Rust settled on the right side of that trade-off.
I'm just surprised that no one has (to my knowledge, anyway) tried this even experimentally. Wouldn't the potential build speed improvement be worth at least trying out the one-file-per-crate strategy?
Maven is very slow. Gradle and Bazel avoid doing wasteful things like Maven does so generally are a lot faster... if you want to know the speed of compiling Java you should just run `javac`, assuming Maven speed is Java compilation speed is far from correct even if a lot of projects still use Maven to build Java projects.
Use Maven Daemon (https://github.com/apache/maven-mvnd). With Maven Daemon, traditional Maven builds get a spectacular boost that puts them ahead of Gradle. (Haven't tried Bazel though.)
Build times have regressed considerably over the past few releases (at least for me). Not that I'm complaining about the tradeoffs; the new features are generally worth it. What's most annoying is that it's often hard to tell which features are expensive until you've already wasted a bunch of time introducing 'em.
But I've got to say I'm super jealous of the level of introspection the rust compiler allows here. I'd definitely be willing to change a few C# coding patterns in code bases I maintain to benefit build times, but as is, that's a really painful trial and error process. And sometimes you get lucky finding the root cause, but if you don't... well, it's not a great experience.
Rust's errors are definitely the best of any language I've used. That's one reason it's nice to write Rust after a day fighting with SQL Server for example which seems to still think in 2021 that "Syntax error" is a reasonable diagnostic message.
As to "How to fix it" however if you have lifetime confusion the error reported by the borrow checker probably isn't going to suggest refactoring your code with a design that makes sense even though that's likely the correct solution. It would be cute if the Rust compiler said "I think you want a Boxed collection of strings here to use the algorithm you're attempting, change how this is represented in all the related structs too, here's how:" but that's unrealistic and, more importantly, no other language is doing a better job here.
While Rust makes the errors themselves look easy, the right suggestion is context-sensitive anyway. In my AoC solutions, unwrap() was the correct answer anywhere the problem is "I have an Option or a Result and I wish that I didn't". Because it's AoC, if I'm wrong to be so cavalier I will eat a panic immediately, and I don't plan to maintain this stuff. In a real project I probably want to explicitly handle None and Error in some places, and pass on Errors in other places; even where I don't, I should write an expect() in case I'm maintaining this years from now.
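The unwrap()/expect() distinction in practice, as a trivial sketch (function and message are made up for illustration):

```rust
// Throwaway code (AoC-style) would just call s.parse().unwrap(), which
// panics with a generic message if the assumption is ever wrong.
// Maintained code leaves a breadcrumb for whoever hits the panic later:
fn parse_port(s: &str) -> u16 {
    s.parse().expect("port must be a number, e.g. PORT=8080")
}

fn main() {
    assert_eq!(parse_port("8080"), 8080);
}
```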
More likely in lifetime confusion scenarios the equivalent C++ compiles but just has mysterious crashes or bugs and the equivalent Java has runtime Null reference errors, Concurrency errors, a truly enormous memory leak, data races, other mysterious mutation bugs, or all of the above. I'll take a compile time error from the borrow checker thanks.
> It would be cute if the Rust compiler said "I think you want a Boxed collection of strings here to use the algorithm you're attempting, change how this is represented in all the related structs too, here's how:" but that's unrealistic
I think we could provide such messages in some cases. :)
For example, if you have an impl Trait return type and have multiple returned values of different types, we suggest using Box<dyn Trait> when valid. It's just a lot of work to get those in place. You're effectively implementing a language that does those things automatically, plus all the work to print out the actual diagnostic.
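An illustrative case of that suggestion (a sketch of the scenario, not the diagnostic machinery itself): with `-> impl Iterator<Item = u32>`, the two arms below would have different concrete types (`Range` vs `Filter<Range, ...>`) and rustc rejects it, suggesting a boxed trait object instead.

```rust
// Boxing erases the two distinct concrete types behind one trait object,
// so both branches can share a single return type.
fn evens_or_all(evens_only: bool) -> Box<dyn Iterator<Item = u32>> {
    if evens_only {
        Box::new((0..6).filter(|n| n % 2 == 0))
    } else {
        Box::new(0..6)
    }
}

fn main() {
    assert_eq!(evens_or_all(true).collect::<Vec<_>>(), vec![0, 2, 4]);
    assert_eq!(evens_or_all(false).count(), 6);
}
```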
I teach C++ and Rust to college students, and their number one praise of Rust over C++ is how much more helpful the error messages are. It’s really night and day, so I can’t even fathom how you would equate the two.
It's easy to fathom once you go beyond the school examples and start working on bigger codebases. The error messages are verbose and simple until you hit cases where the borrow checker complains about an error in a completely different part of the codebase than where the actual issue is. This is where it behaves similarly to C++ (and, honestly, some other languages), and there all the verbosity in the world doesn't help.
Have you really used Rust on big codebases? One of the big advantages of Rust is precisely that it fails far earlier than C++. Borrow checker errors are local errors that prevent bad behavior from propagating and causing problems in unexpected places.
Rust has some of the best compiler error messages out there so not sure what you're talking about there. And C++'s compiler error message issue stopped being an issue since like GCC 4.
I had a hunch that Warp would be the answer when I started reading.
I hit the same issue and worked around it the same way (boxing). My build times also went from 1-2 minutes down to a couple seconds.
Generally speaking, most of my Rust work has been reasonably fast on modern hardware. Waiting 3-5 seconds for incremental builds isn't a big deal. I've only seen the excessive compile times with cold starts or in weird edge cases like the Warp issue.
This article is fairly strange. The author keeps changing settings to make the release profile more like a debug profile. Wouldn't I want LTO for releases?
Also this is a really long article with a lot of cool tricks and interesting information. Like the author's binary crate, they might consider splitting this article apart. Or at least adding a TOC!
As I read it, the flag changes are to find the source of the slowdown. I.e. if disabling LTO for a moment would improve the compilation speed drastically, then I probably still want it enabled but may now want to see if there are ways to "please" the LTO, to massage the input to be somehow more LTO-friendly and allow it to work faster - or find some other ways to optimize LTO (like how they experiment with alternative linkers in other parts of the article). It helps to understand whether the biggest gains can be had from focusing on optimizing this particular area. Also it helps quickly and roughly establish what's the "best case" improvement they can probably achieve, as presumably the debug-build set of options is bent on compilation speed.
Still, in the end the biggest performance gain seems to have been found faaar away from LTO in this case.
(Note: I am a long-time systems software developer but I don't use Rust, so while this made perfect sense to me, maybe it shouldn't have due to some Rust thing I don't know about.) I'm guessing the issue is that there is really a stage between "debug" and "release" where you need the code to run fast enough to be worth testing and using, but you don't need to eke out the final 0.5% of performance by cranking all the settings. When working on such projects you only drop down to a "debug build" if you end up entirely screwed trying to figure out why something is broken because the optimizer has messed up the debugger, and you're OK with the lower performance since you've already isolated the issue.
The long meandering nature is kind of the authors style. He typically picks a problem then goes deep into it exploring whatever tangents may crop up while doing so.
I've always wondered why useful flags like `-Z timings` are not available on stable. I guess they're worried about the output changing in a future stable release, but I don't think anyone would expect the reports to look exactly the same and list exactly the same compilation phases, etc, for all eternity. I think that for diagnostic features, people would be just fine with a loose stability guarantee that allows the output to be improved in the future.
I've been working on stabilizing some options like this, with exactly that approach: the functionality is stable but the exact details aren't. For instance, instrument-coverage will work like that, generating coverage data that requires the current version of LLVM coverage tools to work with.
timings doesn't seem especially hard to stabilize; I'll take a look at it.
But you can't just say that the flag is stable while the output may change.
Third party tools might rely on the output. This is okay for things read by humans (say the error messages) but fails for things that tools are parsing, like data needed for code coverage tools. Stabilization of instrument-coverage, as proposed, is a bad idea in my opinion. I'm sad that the concerns have not been addressed: https://github.com/rust-lang/rust/pull/90132#issuecomment-94...
I would expect the stability guarantee to cover outputting _something_ human-readable, and that's it. If you want to parse the output, I would scope that under a separate feature such as `-Z timings-json`. This is similar to error messages, where the colorized output and text can change at any time, but tools can pass a flag to get stable JSON output.
It will be hugely appreciated, thank you. It's honestly annoying periodically checking the flags docs and wondering "these things don't seem to change much, when will we have them in stable?".
As another poster down-thread said, human output can change at any time, and this shouldn't be viewed as a stability problem. And if the machine output is still up in the air, then maybe it pays off to break it out into a separate flag?
It sounds like these tools should be using some machine format. Perhaps json or something as opposed to human readable strings. Then again, also the layout of the json might change but at least that can potentially be done in a backwards compatible manner.
I came from Java, and compared to that, even our slowest Rust build time is faster than anything I ever experienced in Java. I remember we had a Java Spring Boot application that had a build time of 8 minutes on my 2015 MacBook. I am so happy I don't have to deal with that anymore.
Kindly check the LOC and number of dependencies of your Java app and then compare with this Rust app. Java compiles damn fast; it's all the massive chains of dependencies that people keep pulling in with it that are usually the issue. That, and lots of annotation processing.
Avoid these things and Java is extremely fast to compile
PS: Also if you are using Maven - leverage Maven Daemon.
PPS: Languages like Go are extremely fast to compile not just because the language is simpler but also because the standard library is so good and deep that deps are not needed.
The long compile times are actually one of Rust's problems I could live with.
Having to depend on so many third-party libs is a much bigger issue for long-term maintainability and security. Also, the time spent choosing the libs is not negligible. At work we're currently developing a mid-size project (CLI and HTTP API) with Go, and the standard lib is simply amazing.
Long compile times and having lots of small crates in your dependency tree are, surprisingly, related. As the author mentions in the article, splitting libraries into separate crates makes the compilation of each individual crate much faster: the crates you depend on are compiled in parallel and never need to be recompiled after the first build (so hot builds become much faster), while the crate currently being compiled becomes naturally much smaller, hence faster to compile, once you pull chunks of it into other crates.
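Concretely, such a split usually ends up as a Cargo workspace; the crate names here are hypothetical:

```toml
# Root Cargo.toml: each member is a separate compilation unit,
# built in parallel and cached independently.
[workspace]
members = ["app", "routes", "models"]

# app/Cargo.toml then pulls in the siblings by path:
#   [dependencies]
#   routes = { path = "../routes" }
#   models = { path = "../models" }
```

Only the crates whose sources changed (plus their dependents) get rebuilt, which is where the hot-build wins come from.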
Whenever I read articles like this one, it leaves me wondering in what parallel reality do I live in.
Or I could put it differently and say that authors making blogposts complaining about full rebuild cycle taking staggering 2 minutes live in a pink bubble.
Everyday reality in a somewhat complex industry product is 10x that if you're lucky. More often than not it's a few hours before you're able to build the whole thing.
I think it helps to have some of the not-exactly-stated background: their build did not originally take 2mn, it suddenly shot up to that from around 20s seemingly overnight, so they investigated what happened, and as that's Lime, they also produced a nice article out of it.
That's my assumption anyway because that's about what I went through, minus the blog post: had a project I was working on, at one point I noticed the clean builds were getting really long, we're talking >10mn. Initially I figured it might be Serde (as that project is really serde-heavy) and the compile time had crept up on me unnoticed, so I moved all the serialized and deserialized types in a sub-crate in case that did anything... and it did not.
Then I busted out the -Ztimings, getting about the same result Lime did (an ungodly amount of time spent producing the binary for no clear reason), tried fiddling with some compile-time options (went nowhere near as deep there; I looked at -Z self-profile fairly quickly IIRC), and finally found the evaluate_obligation stuff. From there I didn't bust out the profiler to see what was what; instead, though I don't quite remember which chain got me there, I ended up at issue 91598 (https://github.com/rust-lang/rust/issues/91598).
Added a follow to the issue and various PRs proposed for it, and locked the project to 1.56.
Same here, I don't get what the complaint is. A friend at a major bank says they used to take 7 hours to build their C++ trading system before someone cleaned it up.
2m to build a release and a few seconds to build incremental seems fine to me. Most of the time when I'm coding I just want the compiler to tell me if the syntax is wrong, and it is pretty snappy with rust.
Let's say you had a build that took 7 hours but if you took all the steps the author did, you would be able to build it in 30 minutes locally for debugging. Would that be interesting? That's about the amount of relative improvement the author saw.
The size of the input varies; finding solutions to unnecessarily long compiles in Rust is useful across many projects.
Rust's slow compiles are such a turn off for me. Like why does it take tens of seconds to recompile when I am just changing a single number in a file? Does it really need to waste so much of my time to change a single byte in the output binary?
I once had a similar project to OP that took 30 seconds to compile a one-line change (albeit on a 10-year-old CPU). I split it into about 7 crates, and got compiles down to 3 seconds. Since then I'm always vigilant about keeping crates nice and small. I think as long as you're keeping an eye on your crate sizes, compile times won't get away from you.
This is different from C++ where each individual source file can be compiled in parallel, so it's something I had to re-learn.
I did it by hand. It's a mostly mechanical transformation, so in theory it could be automated, but I've never seen an automated refactor for any language that can e.g. take a giant file and decide how to split it up into 5 smaller files. But that's something I'd love to see exist
> That's a pretty bad incentive though and you end up with very large dependency trees similar to JS. And we know how that went.
That's... not the same thing at all.
Rust lets you have multiple crates in a single workspace, which allows parallelising the build (because crates are the concurrency unit of building Rust).
That's got nothing whatsoever to do with pulling random crates, which is a separate issue.
I don't think there is a moral justification for circular dependency scopes to always be individual files, especially given that import statements count as circular. The same issue in other languages where circular scopes are individual files leads to some horrible workarounds:
A 'moral' justification is not needed. We are talking about potentially massive build speed improvement here. Sounds like it could just be a build setting.
> you end up with very large dependency trees similar to JS. And we know how that went.
The problem in the JS ecosystem is not the number of dependencies, but the number of entities you have to trust. There is no real difference between having one big crate in a repo managed by one entity or fifteen small crates in a repo managed by one entity.
You are right. There is no technical reason it should be as slow as it is. It's just that not enough resource was spent to make it fast. I mean, if this were a higher priority, Rust issue #26600 would have been fixed years ago.
> Currently, GlobalISel is within 1.5 the speed of FastISel according to https://llvm.org/docs/GlobalISel/index.html and they have some ambitions for getting it within 1.1 or 1.2 in time, so it seems likely that GlobalISel will close this issue before FastISel grows the relevant support.
Again, if this were a priority, it would have happened. But compile time is low priority. Its priority is lower than a quite niche feature like cross-language LTO, which happened in 2019.
I think speed is separate from caching, albeit overlapping. If the compiler can see exactly what changes your code will have in the final binary, then it can do very very little work. However, it would also suffice for the compiler to just be so fast that it can do a complete compile from zero to finished in some acceptable time (tcc and I think Go favor this). Those are different goals, or at least different ways of achieving the desired outcome.
> Like why does it take tens of seconds to recompile when I am just changing a single number in a file?
Impossible to tell without knowing more. Steps to reproduce the issue would help. Just consider how TFA went through many different things to investigate.
> Does it really need to waste so much of my time to change a single byte in the output binary?
Does it actually only change a single byte in the binary? Changing a single value can cascade much further than that.
It was a rhetorical question based off my experience with rust. I was running into this issue with building a site using warp along with some other dependencies. Those projects have been deleted off my system. I am not interested in chasing down performance issues with the compiler. Remaking the site using C++ did not have me run into slow compile times. I just want to be able to quickly iterate on stuff I work on.
>Does it actually only change a single byte in the binary?
It would be possible. I could do it myself using binary ninja / ghidra. I just want to be able to iterate quickly and try out multiple different values. I understand it's not that simple of a problem, but I just want it to work else I'll gravitate to making projects in some other language.
I feel the same pain, especially since when using C++ I tend to just use binary libraries for all the code I don't own, so even cold builds are quite fast, even for C++ projects.
Just by having cargo support binary libraries would be an improvement for cold builds, but I understand it isn't a priority.
Are you using incremental compilation? I've personally never had problems with incremental compiles. Sure, production builds are slow, but that's rarely a problem.
Contrary to most, I actually like the long compile times.
I also like programming in Rust without a linter.
It really makes me think about what it is I'm doing. I do not have the luxury of typing something down, compiling it, and running it after every change to see: "does it work now?"
To be fair, that's only in my personal projects. I understand this is a major issue in the "ship it quickly -- everything else be damned" environment of commercial programming, where one must rely on all these aides to get something reasonably done in a reasonable amount of time.
It was a major headache when I first learned the language, because the syntax is so rigid and exacting; but it did force me to really understand the language, instead of just being able to throw shit at the wall and consult technical docs everytime I wanted to do something.
No offense, but I want to hate this opinion, yet I kind of agree with it on some level. There was a time where I could write several pages of compilable code on paper, using the standard library and 3rd party dependencies, without looking at docs or anything. I could do that after a relatively short time after using a language.
Now, I can still do the same, but that's after spending years using a particular language or ecosystem.
I wouldn't want to go back to writing code on paper without docs, but it definitely made you think differently when it came to both writing code and learning. Kind of like how when you had a question about something before the internet, you had to stew with it until you could answer it yourself, or until you met someone who could answer your question, or you did your own research. Now, I can take out my phone and answer most questions within 20 seconds.
This could be interesting to look into. It looks like there already exists a project that can generate BUILD files for external crates.
My 2 main gripes with Bazel were that it was a pain having to rewrite the build system for all of my dependencies, and that its claim of being reproducible is weak, in the sense that there aren't even warnings when you use resources from the base system (e.g. compilers can include files from the base system which may not be the same, or exist at all, on another machine). This was a problem because I wanted to use a header from the dependency I built with Bazel, not something already on my system.
Are you enabling sandboxing? Use `--spawn_strategy=sandboxed`, or better yet `--spawn_strategy=worker,sandboxed --worker_sandboxing`. That should disallow using files from the base system.
This does disable multiplex workers, which can make it more memory intensive. We're working on that.
Yes, I was using their sandbox. They intentionally make their sandbox weak so you can use things like gcc from the system without having to bootstrap them.
I don't know exactly what I tried since this was maybe a year and a half ago. I tried asking on their slack, but I think I was told that it was not possible. I don't have the project around anymore to try out your suggestion.
This is controlled by their default toolchain that includes /usr/include and such. You can define your own toolchain with different include directories.
Yep, agree on all fronts, the last point being especially painful when working on software you distribute to end users whose systems aren't in your control.
> Very interesting that the author managed to hit a pathological case in rustc.
FWIW lots of people hit this one, because entire pretty-base libraries got impacted by the regression: it hit everything which makes significant use of deeply nested "decorator" types, like iterator, future, and stream combinators.
Lime hit it because Warp is a big, big user of large nested decorator types (its entire routing and response system is based on that), pretty much every user of Warp with more than a handful of routes in their project hit this issue.
Instead of using sccache and turning off incremental compilation, it could help to change the approach and retain the build folder from the main-branch CI builds for all other builds. Both sccache and incremental compilation overlap here, with incremental compilation being the integrated solution, while sccache seems to be more of a workaround.
Warp uses a lot of generics, which can produce extremely deep and long types. In turn, these cause a particular compile stage to take a long time (> 40 seconds).
Boxing disguises/splits the chain by taking a pointer to a trait object, which doesn't require the same depth of checks.
(This is especially important if the algorithm is superlinear, which is implied but not stated in the article.)
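A dependency-free sketch of the same mechanism, using iterator combinators in place of Warp's filters: each combinator call nests the concrete type one level deeper, and boxing cuts the chain off at a trait object.

```rust
fn build_pipeline(xs: Vec<u32>) -> Box<dyn Iterator<Item = u32>> {
    // Without the Box, the return type would be spelled out as something
    // like Filter<Map<Skip<vec::IntoIter<u32>>, {closure}>, {closure}>,
    // growing one level per combinator -- the same shape of type growth
    // that Warp's route/filter chains produce.
    Box::new(
        xs.into_iter()
            .skip(1)
            .map(|n| n * 2)
            .filter(|n| n % 3 != 0),
    )
}

fn main() {
    assert_eq!(build_pipeline(vec![1, 2, 3, 4]).collect::<Vec<_>>(), vec![4, 8]);
}
```

Everything behind the `Box::new` is erased to `dyn Iterator`, so downstream code (and the compiler's trait checks) only ever sees the trait object, not the nested tower.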
The problem is not the length of the types, but rather the number of "obligations" introduced by each type. If you have Foo<Bar<Baz>>, and each one of those types has a single trait bound, the resulting type now has >3 obligations to evaluate (> because you also have Sized and WF checks for them as well). The deeper the type nesting, the more potential obligations you end up with, which in some cases can have exponential evaluation cost. We have caching in place to only evaluate a given obligation once, but it can't always kick in. I'm sure there are still things we can improve there, though.
If you have T and box it, a new type Box<T> is created. The compiler not only has to do typeck to make sure that you're using Box<T> in both sides of an assignment, it also has to evaluate the bound on Box (in this case it's only that it is Well-Formed, an internal concept, because Box accepts unsized types, struct Box<A: ?Sized>), but also evaluate all the bounds on T. If T: Foo, and the type you used for T is Bar<X: Qux>, then by the time rustc actually has to confirm the type Box<Bar<X>> is correct, it has to check that Bar<X>: Foo and X: Qux, but you never wrote those close to the Box, these flow from other definitions. Multiply for every type with an explicit or implicit where clause (T: Foo, where <A as B>::C: D, associated types, etc.).
Another thing: you can also coerce a type to Box<dyn Trait>, which has the side effect of erasing any obligations not directly mentioned by Trait, which is why one of the solutions to the perf regression (or bad compilation times in general) is to use trait objects more often.
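A tiny illustration of that coercion (the types here are made up): the concrete return type drags Bar's bounds along wherever it flows, while the trait object only carries what `dyn Debug` itself mentions.

```rust
use std::fmt::Debug;

#[derive(Debug)]
struct Bar<X: Debug>(X);

// Concrete type: rustc must keep proving Bar<u8>: Debug, u8: Debug,
// plus the implicit Sized / well-formedness obligations, wherever
// this type is used.
fn concrete() -> Box<Bar<u8>> {
    Box::new(Bar(7))
}

// Coerced to a trait object: the bounds that flowed from Bar's
// definition are checked once at the coercion site and then erased;
// downstream code only sees `dyn Debug`.
fn erased() -> Box<dyn Debug> {
    Box::new(Bar(7u8))
}

fn main() {
    assert_eq!(format!("{:?}", concrete()), "Bar(7)");
    assert_eq!(format!("{:?}", erased()), "Bar(7)");
}
```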
It's not about boxing itself, but about using dynamic dispatch that is enabled by boxing.
1. Type erasure — compiler works with a single abstract interface rather than big complex concrete types.
2. It replaces monomorphisation (a fancy word for a big copy-and-paste of the code for every type it's used with) with dynamic dispatch, which has a single implementation for all users.
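The two points above, side by side, as a minimal sketch:

```rust
use std::fmt::Display;

// Monomorphized: the compiler emits a separate copy of this function
// for every concrete T it is instantiated with.
fn show_static<T: Display>(x: T) -> String {
    format!("{x}")
}

// Dynamic dispatch: one compiled body serves every caller; the concrete
// type is erased behind the &dyn pointer.
fn show_dyn(x: &dyn Display) -> String {
    format!("{x}")
}

fn main() {
    assert_eq!(show_static(42), "42");   // instantiates show_static::<i32>
    assert_eq!(show_static("hi"), "hi"); // instantiates show_static::<&str>
    assert_eq!(show_dyn(&42), "42");     // same single body both times
    assert_eq!(show_dyn(&"hi"), "hi");
}
```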
A rust project with 420 deps but only 7k LOC suddenly took 2 minutes to compile with cache. Turns out it's mostly a compiler regression that is already fixed. The blog shows all kinds of profiling and analysis of the Rust compiler.
Very interesting and detailed article how to profile the Rust toolchain, but TBH, after the reveal that the project uses several hundred dependencies my first thought was: "well, there's your problem". One thing or another will always be broken in such a complex scenario.
> In another language, say, C or C++, we might write a Makefile by hand. What? No. Nobody does that anymore.
Oh, these Rustaceans, or whatever they call themselves. Look at the gaming industry: if you're playing any AAA/AA game it's most likely C++. There's some market share for C# (where Unity is used), but the whole gaming industry is still C++, and it's bigger than the movie and music industries combined.
So yes, people "do that", and it's not going to change anytime soon.
Ok, I didn't read all that, but that is 100W of power.
My solution to be able to work on a 5W Raspberry Pi 4 is to use C (arrays to avoid cache misses), compile with GCC to a .so, and hot-deploy that into my engine.
That way I have a zero-dependency engine (only stb_ttf and kuba_zip) that compiles from scratch in 30 seconds, while the game .so takes ~1 second.
I don't even have to use a build system or many cores to compile.