It's hard to see features through the programming language theory jargon, but solid theoretical foundations have worked well for Rust so far.
Jargon terms like "sum types" or "affine types" may seem complicated, but when you see that a "sum type" is really just an enum with data fields, it makes a lot of sense, and it prevents plenty of state-related bugs.
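For anyone who hasn't seen that spelled out, here's a minimal sketch (the names are made up for illustration):

```rust
// A "sum type": each enum variant carries exactly the data that is
// valid for that state, so impossible combinations can't be built.
enum Connection {
    Disconnected,
    Connecting { attempt: u32 },
    Connected { session_id: u64 },
}

fn describe(c: &Connection) -> String {
    // The compiler forces every state to be handled here.
    match c {
        Connection::Disconnected => "offline".to_string(),
        Connection::Connecting { attempt } => format!("retry #{attempt}"),
        Connection::Connected { session_id } => format!("session {session_id}"),
    }
}

fn main() {
    println!("{}", describe(&Connection::Connecting { attempt: 3 })); // prints "retry #3"
}
```

There's no way to have a `session_id` while disconnected, which is the whole point.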
The proposed "effects" mean that when you're writing an iterator or a stream and need to handle an error or await somewhere in the chain, you won't suddenly face the puzzle of replacing every function in the chain and your call stack with its async or fallible equivalent.
"Linear types" mean that Rust will have more control over the destruction and lifetime of objects beyond the sync call stack, so tokio::spawn() (the "Rust async sucks" function) won't have to complain endlessly about lifetimes whenever you use a local variable.
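For readers who haven't hit it: the complaint being alluded to is the `'static` bound on spawned tasks. A minimal sketch using `std::thread::spawn`, which carries the same bound as `tokio::spawn`:

```rust
use std::thread;

fn main() {
    let name = String::from("local");

    // This version is rejected: the closure borrows `name`, but the
    // spawned task may outlive the current stack frame.
    // let handle = thread::spawn(|| format!("hello, {name}"));

    // The accepted version moves ownership into the task instead.
    let handle = thread::spawn(move || format!("hello, {name}"));
    println!("{}", handle.join().unwrap()); // prints "hello, local"
}
```

Today you work around it with `move`, cloning, or `Arc`; the hope is that richer lifetime/linearity features would make more borrowing patterns expressible.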
I can't vouch for the specifics of the proposed features (they have tricky-to-design details), but this isn't simply Rust getting more complex; it's Rust trying to solve and simplify more problems with robust, generalizable language features rather than ad-hoc special cases. When it works, it makes the language more uniform overall and gives a lot of bang for the buck in terms of complexity vs. problems solved.
What worked for Rust is having enums, sane-ish error handling, sane integer types, the borrow checker, and good tooling. The rest is just not that useful compared to how much garbage it creates.
You didn't mention parametric polymorphism, which is incredibly useful and important to the language. I'm guessing you intentionally excluded async, but to describe it as "not that useful" would just be wrong, there is a large class of programs that can be expressed very simply using async rust but would be very complicated to express in sync rust (assuming equivalent performance).
No, stackful coroutines require a runtime. They're not going to work on embedded, which is where async Rust shines the strongest.
If you don't care about embedded, that is fine. But almost all systems in the world are embedded. "Normal" computers are the odd ones out. Every "normal" computer has several embedded systems in it (one or more of SSD controller, NIC, WiFi controller, cellular modem, embedded controller, etc.). And then cars, appliances, cameras, routers, toys, etc. have many more.
It is a use case that matters. To have secure and reliable embedded systems is important to humanity's future. We need to turn the trend of major security vulnerabilities and buggy software in general around. Rust is part of that story.
A stackful coroutine is brittle and doesn't compose as cleanly as a stackless one. As a default language primitive, the latter is almost always the robust choice. Most devs should be using stackless coroutines for async unless they can articulate a technical justification for introducing the issues that stackful coroutines bring with them.
I've implemented several stackful and stackless async engines from scratch. When I started out I had a naive bias toward stackful, but over time I have come to appreciate that stackless is the correct model, even if it seems more complicated to use.
That said, I don't know why everyone uses runtimes like tokio for async. If performance is your objective then not designing and writing your own scheduler misses the point.
I understand what that is but I just don’t care. I am guessing the vast majority of people using rust also don’t care. Justifying the decision to create this mess by saying it is for embedded makes no sense to me.
Also don’t understand why you would use Rust for embedded instead of C.
Embedded systems vastly outnumber classical computers. Every classical computer has several embedded systems in it. As do appliances, cars, etc. So yes, they are an incredibly important use case for securing our modern infrastructure.
"Tight control over memory use" sounds wrong considering every single allocation in rust is done through the global allocator. And pretty much everything in rust async is put into an Arc.
I don't understand what kind of use case they were optimizing for when they designed this system. Don't think they were optimizing only for embedded or similar applications where they don't use a runtime at all.
Using stackful coroutines, having a trait in std for runtimes, and passing that trait into async functions would be much better, in my opinion, than having the compiler transform entire functions and then layering more and more complexity on top to solve the complexities that this decision created.
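To make the trait idea concrete, here's a hedged sketch; no such trait exists in std, and all names below are invented. The toy impl busy-polls to completion, which is fine for a demo but wrong for real I/O:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical: the kind of runtime trait the comment wishes std had,
// so async code could be written against an interface, not tokio.
trait Runtime {
    fn block_on<F: Future>(&self, fut: F) -> F::Output;
}

// A waker that does nothing; a real runtime would reschedule the task.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy single-threaded runtime that polls in a loop until done.
struct BusyLoop;
impl Runtime for BusyLoop {
    fn block_on<F: Future>(&self, fut: F) -> F::Output {
        let mut fut = pin!(fut);
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        loop {
            if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
                return v;
            }
        }
    }
}

async fn work() -> i32 {
    40 + 2
}

fn main() {
    // Generic code only sees the trait, not a concrete runtime.
    let rt = BusyLoop;
    println!("{}", rt.block_on(work())); // prints "42"
}
```

The hard parts the real proposals wrestle with (spawning, timers, I/O reactors) are exactly what this sketch leaves out.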
> "Tight control over memory use" sounds wrong considering every single allocation in rust is done through the global allocator.
In the case of Rust's async design, the answer is that that simply isn't a problem when your design was intentionally chosen to not require allocation in the first place.
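To illustrate: an `async fn` compiles to an inert state machine that can live entirely on the stack and be polled by hand. A minimal sketch (uses `Waker::noop`, which needs a recent Rust, 1.85+):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, Waker};

async fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // The future is a plain value: known size, no Box, no Arc.
    let fut = add(2, 3);
    println!("state machine size: {} bytes", std::mem::size_of_val(&fut));

    // Pin it on the stack and poll it by hand; nothing here allocates.
    let mut fut = pin!(fut);
    let mut cx = Context::from_waker(Waker::noop());
    if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
        println!("result: {v}"); // prints "result: 5"
    }
}
```

Embedded runtimes exploit exactly this: futures sized at compile time, placed in statics or on the stack, never touching an allocator.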
> And pretty much everything in rust async is put into an Arc.
IIRC that's more a tokio thing than a Rust async thing in general. Parts of the ecosystem that use a different runtime (e.g., IIRC embassy in embedded) don't face the same requirements.
I think it would be nice if there were less reliance on specific executors in general, though.
> Don't think they were optimizing only for embedded or similar applications where they don't use a runtime at all.
I would say less that the Rust devs were optimizing for such a use case and more that they didn't want to preclude such a use case.
> having a trait in std for runtimes and passing that trait around into async functions
Yes, the lack of some way to abstract over/otherwise avoid locking oneself into specific runtimes is a known pain point that seems to be progressing at a frustratingly slow rate.
I could have sworn that was supposed to be one of the improvements to be worked on after the initial MVP landed in the 2018 edition, but I can't seem to find a supporting blog post, so I'm not sure whether I'm confusing this with the myriad other sharp edges Rust's async design has.
> > And pretty much everything in rust async is put into an Arc.
> IIRC that's more a tokio thing than a Rust async thing in general. Parts of the ecosystem that use a different runtime (e.g., IIRC embassy in embedded) don't face the same requirements.
Well, if you're implementing an async rust executor, the current async system gives you exactly 2 choices:
1) Implement the `Wake` trait, which requires `Arc` [1], or
2) Create your own `RawWaker` and `RawWakerVTable` instances, which are gobsmackingly unsafe, including `void*` pointers and DIY vtables [2]
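Choice (1) looks roughly like this; the `Arc` requirement is baked into the trait's `self: Arc<Self>` receiver (the names here are made up):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

// Implementing std's `Wake` trait: note the `self: Arc<Self>`
// receiver, which is where the Arc requirement comes from.
struct FlagWaker {
    woken: AtomicBool,
}

impl Wake for FlagWaker {
    fn wake(self: Arc<Self>) {
        self.woken.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let flag = Arc::new(FlagWaker { woken: AtomicBool::new(false) });
    // The only safe constructor goes through Arc<impl Wake>; avoiding
    // the Arc means dropping down to RawWaker and its unsafe vtable.
    let waker = Waker::from(flag.clone());
    waker.wake();
    println!("woken: {}", flag.woken.load(Ordering::SeqCst)); // prints "woken: true"
}
```

So any allocation-free executor is pushed straight into option (2).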
Sure, but those are arguably more like implementation details as far as end users are concerned, aren't they? At least off the top of my head I'd imagine tokio would require Send + Sync for tasks due to its work-stealing architecture regardless of whether it uses Wake or RawWaker/RawWakerVTable internally.
I find it interesting that there's relatively recent discussion about adding LocalWaker back in [0] after it was removed [1]. Wonder what changed.
When a new technology emerges we typically see some people who embrace it and "figure it out".
Electronic synthesisers went from "it's a piano, but expensive and sounds worse" to every weird preset creating a whole new genre of electronic music.
So it seems plausible that, as with Claude Code, our complaints about unmaintainable code come from trying to use it like a piano, and the rave kids will find a better use for it.
Their default solution is to keep digging. It has a compounding effect of generating more and more code.
If they implement something with a not-so-great approach, they'll keep adding workarounds or redundant code every time they run into limitations later.
If you tell them the code is slow, they'll try to add optimized fast paths (more code), specialized routines (more code), custom data structures (even more code). And then add fractally more code to patch up all the problems that code has created.
If you complain it's buggy, you can have 10 bespoke tests for every bug. Plus a new mocking framework created every time the last one turns out to be unfit for purpose.
If you ask to unify the duplication, it'll say "No problem, here's a brand new metamock abstract adapter framework that has a superset of all feature sets, plus two new metamock drivers for the older and the newer code! Let me know if you want me to write tests for the new adapters."
LLM code is higher quality than any codes I have seen in my 20 years in F500.
So yeah, you need to "guide" it and ensure that it won't bypass all the security guidance, for example. But at least you are in control, although the cognitive load is much higher than just blindly trusting what is delivered.
But I can see the carnage with offshoring + LLM, or "most employees", including so-called software engineers, + LLM.
Huh, that explains a lot about the F500, and their buzzword slogans like "culture of excellence".
LLM code is still mostly absurdly bad, unless you tell it in painstaking detail what to do and what to avoid, and never ask it to do a bigger job at a time than a single function or very small class.
Edit: I'll admit though that the detailed explanation is often still much less work than typing everything yourself. But it is a showstopper for autonomous "agentic coding".
> unless you tell it in painstaking detail what to do and what to avoid, and never ask it to do a bigger job at a time than a single function or very small class.
This is hyperbolic, but the general sentiment is accurate enough, at least for now. I've noticed a bimodal distribution of quality when using these tools. The people who approach the LLM from the lens of a combo architect & PM, do all the leg work, set up the guard rails, define the acceptance criteria, these are the people who get great results. The people who walk up and say "sudo make me a sandwich" do not.
Also the latter group complains that they don't see the point of the first group. Why would they put in all the work when they could just code? But what they don't see is that *someone* was always doing that work, it just wasn't them in the past. We're moving to a world where the mechanical part of grinding the code is not worth much, people who defined their existence as avoiding all the legwork will be left in the cold.
Maybe a bit, but unfortunately sometimes not so much. I recently had an LLM write a couple of transforms on a tree in Python. The node class just had "kind" and "children" defined, nothing else. The LLM added new attributes to use in the new node kinds (Python lets you just do "foo.bar = baz" to add one). Apparently it saw a lot of code doing that during training.
I corrected the code by hand and modified the Node class to raise an error when new attributes are added, with an emphatic source code comment to not add new attributes.
A couple of sessions later it did it again, even adding its own comment about circumventing the restriction! X-|
Anyways, I think I mostly agree with your assessment. I might be dating myself here, but I'm not even sure what happened that made "coding" grunt work. It used to be every "coder" was an "architect" as well, and did their own legwork as needed. Maybe labor shortages changed that.
> It used to be every "coder" was an "architect" as well, and did their own legwork as needed.
I disagree. I remember in the days before "software engineer" became the rage that the standard job titles had a clear delineation between the people who thought the big thoughts with titles like "analyst" and the people who did the grunt work of coding who were "programmers". You'd also see roles in between like "programmer/analyst"
Might be a big company thing then, but I'm not wholly convinced. There's a big gap between designing the outline of a big system and coding instructions that can be followed without having to make your own decisions. The question of how much of that gap is filled by the "design" vs "coding" levels is a spectrum.
I think I see what you're saying and if so we're talking past each other a bit and I agree with what you're saying as well.
The point I was raising is by the time an IC developer sees something, there's already been a process of curation that happens that frames the possible solutions & constrains branch points. This is different from saying that an IC makes 0 implementation decisions. The C-suite has set a direction. A product manager has defined the shape of the solution. A tech lead, architect, or whatever may have further limited scope. And any of these could just already be in effect at a global scale or on the specific problem at hand. Then the IC picks up the work and proceeds to make the last mile decisions. And it's turtles all the way up. At almost all levels on the career ladder, there are people above and/or upstream of you who are pre-curating your potential decision tree.
As an analogy, I once had a fresh tech lead under me where they didn't understand this. Their team became a mess. They'd introduce raw tickets straight from the PM to their team without having thought about them at all, and things ground to a halt due to decision paralysis. From their perspective that's how it was always done when they were an IC in that group. The team tackled the tickets together to work out how to accomplish their goals. It took a lot of effort to convince them that what they *didn't see* was their prior tech lead narrowing down the search space a bit, and then framing the problem in a way that made it easier for the team to move forward.
I'm on board with that framing of the process, and I see how my original formulation was too rough.
I was reacting to "We're moving to a world where the mechanical part of grinding the code is not worth much". I have the impression that in the past just mechanically grinding the code was less of a thing than it apparently is today. Guidance, sure, but not as much as seems to be common (often necessarily so) today. But I'm sure that varies with a lot of factors, not just the calendar year.
Exactly. I was channeling the stereotypical dev that says they "just want to write code". To your point they're not literally *only* writing code, but this was the sort of person/mentality I was calling out.
What it says to me is they've actively avoided what appears to be becoming the most important skills in the new world. They're likely to find themselves on the short end of the stick.
Modern human programming has devolved to nothing more than modeling problems and systems using lines of code, procedures, sub-routines and modules, utilizing a “hack it till it works”(tm) methodology.
> utilizing a “hack it till it works”(tm) methodology.
Your post describes my coding perfectly. I don't have CS training of any type, never been formally involved in software development (recently started dabbling in OSS) and never used an LLM/agent for help (do use a local SLM for autocomplete and suggestions only).
Yet I can "code." I suspect a (pre-2023ish) software developer would likely tell me "go learn to code" if i asked for review. I don't know the formal syntax people expect to see and it has organization more typical of raging dumpster fires. Doesn't mean it's not code.
I'm with you, it's constantly doing stupid shit and ignoring instructions, and I've always been responsible for determining architecture and doing the "legwork." Unless the task is so small and well defined that it's less typing to tell the LLM (and clean up its output), I may as well just do it myself.
> The people who walk up and say "sudo make me a sandwich" do not.
My personal beef is the human devs get "make me a sandwich", and the LLM superfans now suddenly know how to specify requirements. That's fine but don't look down your nose at people for not getting the same info.
This is happening now at my company where leadership won't explain what they want, won't answer questions, but now type all day into Claude and ChatGPT. Like you could have Slacked me the same info last year knuckleheads...
Absolutely. Merely being a member of the business class does not magically mean one has the ability to specify business requirements much less product specifications. These are *not* the people I'm talking about now having superpowers.
I am picturing people who blend high level engineering and product skills, ideally with business sense.
> Merely being a member of the business class does not magically mean one has the ability to specify business requirements much less product specifications
Is this not why COBOL failed? Common Business-Oriented Language sure does look much more like natural language than a lot of other code, but it could never solve the abstraction needed to do the complex things.
I don't think LLMs will ever get rid of coders. Business people can no more tell an LLM what to build than they can a team of programmers. I've long argued that the contention between the "business monkeys" and "coding monkeys" is a good one. That the former focuses on making money and the latter focuses on making a better product. The contention is good because they need each other (though I do not think the dependence is symmetric).
Maybe one day AI will get there, but I don't see how it does without achieving AGI. To break down intent. To differentiate what was asked from what was intended. To understand the depth and all the context surrounding the many little parts. To understand the needs of the culture. The needs of the users. The needs of the business. This is all quite complex and it's why the number of employees typically grows quite rapidly.
How do we move forward without asking how we got here? Why we got here? How optimizing for decades (or much longer) led us to these patterns. Under what conditions are these patterns (near-)optimal? I've yet to see a good answer to how LLMs actually address this. If typing were the bottleneck, I think we would have optimized in very different ways.
My intuition tells me that LLMs combined with SWEs with really amazing fundamentals will kill the code monkeys.
And frankly? That’s the best outcome. Code monkeys (in my view that’s an individual who writes out code just to complete a jira ticket) are a liability. Not only that but each additional person you have in an org means more noise creation.
If this forces the code monkeys to level up to compete… again a good thing.
The code base should not be elongated nor complicated. I’m not even a SWE by trade, rather a CEO, and this is my preferred outcome.
I agree with your first paragraph but not the second one. In many cases it's easier for me to directly write the code that satisfies the unwritten acceptance criteria I have in my head than to write those criteria down in English, have an LLM turn them into code, and then have to carefully review that code to see if I forgot some detail that changes everything.
> easier for me to directly write the code that satisfies the unwritten acceptance criteria I have in my head than to write those criteria down in English
Yes, and for team or company code, "there's the problem".
Those acceptance criteria are guardrails for the change that comes after, and getting those out of your head into English is more important over the long haul than your undocumented short-term solution to the criteria.
Virtually all teams — because virtually all PgMs, PjMs, TLs, and Devs — miscalculate this.
Easier for you, not better for team or firm.
• • •
FWIW, perpetuation of this problem isn't really a fault of culture or skill or education. It's largely thanks to "leadership" having no idea how to correctly incentivize what the outcome should holistically be, as they don't know enough to know what long-haul good looks like.
FWIW, you can make that easier for them by having the LLM derive your acceptance criteria into English (based not only on code but on your entire conversation+iteration history) and write that up, which you can read and correct, after the countless little iterations you made since your head-spec wasn't as concrete as you imagined before you started iterating.
Even if you refuse to do spec driven development, LLMs can do development-driven spec. You can review that, you must correct it, and then ... Change can come after more easily — thanks to that context.
> Those acceptance criteria are guardrails for the change that comes after, and getting those out of your head into English is more important over the long haul than your undocumented short-term solution to the criteria.
I have a lot of context about the system/codebase inside my head. 99.9% of it is not relevant to the specific task I need to do this week. The 0.1% that is relevant to this task is not relevant to other tasks that I or my teammates will need to do next week.
You're suggesting that I write down this particular 0.1% in some markdown file so that LLM can write the code for me, instead of writing the code myself (which would have been faster). Chances are, nobody is going to touch that particular piece of code again for a long time. By the time they do, whatever I have written down is likely out of date, so the long term benefit of writing everything down disappears.
> after the countless little iterations you made since your head-spec wasn't as concrete as you imagined before you started iterating.
That's exactly the point. If I need to iterate on the spec anyway, why would I use an intermediary (LLM) instead of just writing the code myself?
> getting those out of your head into English is more important over the long haul than your undocumented short-term solution to the criteria.
I think there may be miscommunication going on, or I may be misreading the conversation. What I do not know is what valicord means by "satisfies the unwritten acceptance criteria".
In one interpretation, I think they make a ton of sense. We invented formal languages to solve precisely this problem. The precision and pedantic nature of formal languages (like math and code[0]) is to solve ambiguity. If this is the meaning, then yes, code is far more concise and clear[1] than a natural language. That's why we invented formal languages after all. So they may be having trouble converting it to English because they are unsatisfied with the (lack of) precision and verbosity. That when they are more concise that people are interpreting it incorrectly, which is only natural. Natural languages' advantage is their flexibility, but that's their greatest disadvantage too. Everything is overloaded.
But on the other hand, if they are saying that they are unable to communicate the basics (it seems you have read in this way) then I agree with you. Being able to communicate your work is extremely important. I am unsure if it is more important than ever, but it is certainly a critical skill. But then we still have the ambiguous question of "to who?" The type of writing one does significantly differs depending on the audience.
Only valicord can tell us[edit], but I think we're just experiencing the ambiguity that makes natural languages so great and so terrible. I think maybe more important than getting the words out of ones head is to recognize the ambiguity in our language. As programmers this should be apparent, as we often communicate in extremely precise languages. But why I'd say it is more important than ever is because the audience is more diverse than ever. I'd wager a large number of arguments on the internet occur simply due to how we interpret one another's words. The obvious interpretation for one is different for another.
[0] Obviously there's a spectrum with code. C is certainly more formal than Python and thus less ambiguous.
[1] Clear != easy to understand. Or at least not easy to understand by everyone. This is a skill that needs training.
[edit] Reading their response, I think it is the first interpretation.
This is the point I'm raising. I agree with you, but what I'm saying is I think the skillset you describe is the next on the chopping block.
The acquaintances of mine who are absolutely *killing* it with these tools are very experienced, technically minded, product managers. They have an intimate knowledge of how to develop business requirements and how to convert them into high level technical specifications. They have enough technical knowledge to understand when someone is bullshitting them, and what the search space for the problem should be. Historically these people would lead teams of engineers to develop for them, and now they're sitting down and having LLMs crank out what they want in an afternoon. They no longer need engineers at all.
My contention is that people with that sort of skillset will have an advantage due to their experience with skills like finding product fit, identifying user needs, and defining business requirements.
Of course, the people I'm talking about were already killing it in the old paradigm too. I'll admit it's a bit of a unicorn skillset I'm describing.
It's almost as if architecture and code quality mattered just as before and that those who don't know proper engineering principles and problem decomposition will not succeed with these new tools.
If you're using words in English to "tell it in painstaking detail", you're doing it the hard way.
You can provide it with a tool to do that. Agents run tools in a loop, give it good tools. We have linters, code analysers, fuzzers and everything else.
Configure them correctly, tell the agent to use them (in painstaking detail) and it can't mess things up.
As far as I've ever heard, "le code" used in a codebase is uncountable, like "le café" you'd put in a cup, so we would still say "meilleur que tout le code que j'ai vu en 20 ans" and not "meilleur que tous les codes que j'ai vus en 20 ans".
There is a countable "code" (just like "un café" is either a place, or a cup of coffee, or a type of coffee), and "un code" would be the one used as a password or secret, as in "j'ai utilisé tous les codes de récupération et perdu mon accès Gmail" (I used all the recovery codes and lost Gmail access).
I got curious and had to fire up the ol LLM to find out what the story is about the words that aren't pluralized - TIL about countable and uncountable nouns. I wonder if the guy giving you trouble about your English speaks French.
I speak Russian and some English, but the question was about universal quantification: the author declares that LLMs generate code of better quality than "any codes" he's seen in his career.
I'm native French and nobody would consider code countable. "codes" makes no sense. We'd talk about "lines of code" as a countable in French just like in English.
Codes is a proper grammatical word in English, but we don’t use it in reference to general computer programming.
You can for example have two different organizations with different codes of conduct.
There is, though, nothing technically wrong with seeing each line of code as a complete individual code and then referring to multiple of them as "codes".
You'll find, at times, that people communicating in a language that's not their primary one will tend to deviate from what a native speaker might expect.
If that's obvious to you, then you're just being rude. If it's not obvious to you, then you'll also find this is a common deviation (pluralizing "code") among speakers from a particular primary-language region.
Edit; This got me thinking - what is the grammar/rule around what gets pluralized and what doesn't? How does one know that "code" can refer to a single line of code, a whole file of code, a project, or even the entirety of all code your eyes have ever seen without having to have an s tacked on to the end of it?
"Codes" as a way to refer to programs/libraries is actually common usage in academia and scientific programming, even by native English speakers. I believe, but am not sure, that it may just be relatively old jargon, before the use of "programs" became more common in the industry.
As for the grammar rule, it's the question of whether a word is countable or uncountable. In common industry usage, "code" is an uncountable noun, just like "flour" in cooking (you say 2 lines of code, 1 pound of flour).
It's actually pretty common for the same word to have both countable and uncountable versions, with different, though related, meanings. Typically the uncountable version is used with a measure of quantity, while the countable version denotes different kinds (flours - different types of flour; peoples - different groups of people).
> Typically the uncountable version is used with a measure of quantity, while the countable version denotes different kinds (flours - different types of flour; peoples - different groups of people).
This was very helpful, thank you! (I had just gotten off the phone with Claude learning about countable and uncountable nouns but those additional details you provided should prove quite valuable)
> what is the grammar/rule around what gets pluralized and what doesn't? How does one know that "code" can refer to a single line of code, a whole file of code, a project, or even the entirety of all code your eyes have ever seen without having to have an s tacked on to the end of it?
Well, the grammar is that English has two different classes of noun, and any given noun belongs to one class or the other. Standard terminology calls them "mass nouns" and "count nouns".
The distinction is so deeply embedded in the language that it requires agreement from surrounding words; you might compare many [which can only apply to count nouns] vs much [only to mass nouns], or observe that there are separate generic nouns for each class [thing is the generic count noun; stuff is the generic mass noun].
For "how does one know", the general concept is that count nouns refer to things that occur discretely, and mass nouns refer to things that are indivisible or continuous, most prototypically materials like water, mud, paper, or steel.
Where the class of a noun is not fixed by common use (for example, if you're making it up, or if it's very rare), a speaker will assign it to one class or the other based on how they internally conceive of whatever they're referring to.
FWIW, I've noticed that scientists (native English speakers at least) will say "codes" rather than "code". I don't know if this is universal or just specific domains (physics), nor if this is common or rare, but I've noticed it.
Uhuh. Let me present you Rudolph. For the next 15 minutes, he will paste pieces of top rated SO answers and top starred GH repos. Then he will suffer complete amnesia.
He might not understand your question or remember what he just did, but the code he pastes is higher quality than any codes you have seen in your 20 years in F500! For $20 a month, he's all yours; he just needs a 4-hour break every 5 hours. But he runs on money, like a gumball machine, so you can wake him with a donation.
Oh, you are responsible for giving him precise instructions, that he often ignores in favour of other instructions from uncle Sam. No, you can't see them.
If you a) know what you are doing and b) know what an llm is capable of doing, c) can manage multiple llm agents at a time, you can be unbelievably productive. Those skills I think are less common than people assume.
You need to be technical, have good communication skills, have big picture vision, be organized, etc. If you are a staff level engineer, you basically feel like you don’t need anyone else.
OTOH I have been seeing even fairly technical engineering managers struggle because they can't get the LLMs to execute; they don't know how to tell them what to do.
it's like that '11 rules for showrunning' doc where you need to operate at a level where you understand the product being made, and the people making it, and their capabilities, in order to make things come out well without touching them directly.
If you can do every job + parallelize + read fast, and you are only limited by the time it takes to type, Claude is remarkable. I'm not superhuman in those ways, but in the small domains where I am it has helped a lot; in other domains it has ramped me to 'working prototype' 10x faster than I could have alone, but the quality of output seems questionable and I'm not smart enough to improve it.
How is that supposed to work? Humans are notoriously poor at multi-tasking. If you spend all day context switching between agents you’re going to have a bad time.
For me, I'll do the engineering work of designing a system, then give it the specific designs and constraints. I'll let it plan out the implementation, then give it notes if it varies in ways I didn't expect. Once we agree on a solution, that's when I set it free. The frontier models usually do a pretty good job with this workflow at this point.
Really? Because this perfectly explains why it will never replace them: it needs an exact specification listing everything required for it to function as you expect.
> If you ask to unify the duplication, it'll say "No problem, here's a brand new metamock abstract adapter framework that has a superset of all feature sets, plus two new metamock drivers for the older and the newer code! Let me know if you want me to write tests for the new adapters."
Nevermind the fact that it only migrated 3 out of 5 duplicated sections, and hasn’t deleted any now-dead code.
It's not reality. I'm really not a fan of the way that people excuse the really terrible code LLMs write by claiming that people write code just as bad. Even if that were true, it is not true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later.
Yes and both are right. It’s a matter of which is working as expected and making fewer mistakes more often. And as someone using Claude Code heavily now, I would say we’re already at a point where AI wins.
> it is not true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later.
I had a coworker who did more or less exactly that. You left a comment in a ticket about something extra to be done, he answered "yes sure", and after a few days proceeded to close the ticket without doing the thing you asked. Depending on your workload at the moment, you might not notice until a few months later, when the missing thing would bite you back in bitter revenge.
You may have had one. It clearly made a pretty negative impression on you because you are still complaining about them years later. I find it pretty misanthropic when people ascribe this kind of antisocial behavior to all of their coworkers.
It's still relatively recent. Anyway I'm not saying everyone is like this, absolutely (not even an important chunk), but they do exist.
At the same time it's not true that current LLMs only write terrible code.
"Even if that were true, it is not true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later."
The point is, that's not the typical experience and people like that can be replaced. We don't willingly bring people like that on our teams, and we certainly don't aim to replace entire teams with clones of this terrible coworker prototype.
Not only have I never had a coworker as bad as these people describe, the point is, as you say: why would I want an LLM that works like these people's shitty coworkers?
My worst coworkers right now are the ones using Claude to write every word of code and don't test it. These are people who never produced such bad code on their own.
So the LLMs aren't just as bad as the bad coworkers, they're turning good coworkers into bad ones!
Couple of reasons, but mainly speed and availability.
I can give Claude a job anytime and it will do it immediately.
And yes, I will have to double-check anything important, but I am way better and faster at checking than at doing it myself.
So obviously I don't want a shitty LLM as coworker, but a competent one. But the progress they made is pretty astonishing and they are good enough now that I started really integrating them.
In the long run, good code makes everyone much happier than code that is bad because people are being "nice" and letting things slide in code review to avoid confrontation.
Maybe, but it lets them pump out much, much more code than they otherwise would have been able to. That's the "100x" in their AI productivity multipliers.
My sense is that the code generation is fast, but then you always need to spend several hours making sure the implementation is appropriate, correct, well tested, based on correct assumptions, and doesn't introduce technical debt.
You need to do this when coding manually as well, but the speed at which AI tools can output bad code means it's so much more important.
Well, when you write it manually you are doing the review and sanity checking in real time. For some tasks (not all, but definitely the difficult ones), the sanity checking is actually the whole task. The code was never the hard part, so I am much more interested in the evolution of AI's real-world problem-solving skills than in its skill at code problems.
I think programming is giving people a false impression of how intelligent the models are: programmers are meant to be smart, right, so being able to code means the AI must be super smart. But programmers also put a huge amount of their output online for free, unlike most disciplines, and it's all text-based. When it comes to problem solving I still see them regularly confused by simple stuff, having to reset context to try and straighten it out. It's not a general-purpose human replacement just yet.
Set the boundaries and guidelines before it starts working. Don't leave it space to do things you don't understand.
ie: enforce conventions, set specific and measurable/verifiable goals, define skeletons of the resulting solutions if you want/can.
To give an example: I do a lot of image-similarity stuff, and I wanted to test the Redis VectorSet stuff when it was still in beta, and the PHP extension for Redis (the fastest one, which is written in C and is a proper language extension, not a runtime lib) didn't support the new commands. I cloned the repo, fired up Claude Code, and pointed it to a local copy of the Redis VectorSet documentation I'd put in the directory root, telling it I wanted it to update the extension to provide support for the new commands I would want/need to handle VectorSets. This was, idk, maybe a year ago, so not even Opus. It nailed it. But I chickened out about pushing that into a production environment, so I then told it to just write me a PHP runtime client that mirrors the functionality of Predis (a pure-PHP implementation of a Redis client) but does so via shell commands executed by PHP (lmao, I know).
Define the boundaries, give it guard rails, use design patterns and examples (where possible) that can be used as reference.
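A "skeleton" in the sense above might be as little as an interface with its invariants pinned down before the model writes a line, plus a naive reference implementation to anchor behavior. This is an invented illustration in the spirit of the image-similarity example; the names (ImageIndex, NaiveIndex) and the cosine-similarity contract are assumptions, not anything from the thread:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImageIndexSkeleton {
    /** The skeleton you hand to the model: interface and invariants fixed up front. */
    public interface ImageIndex {
        void add(String key, float[] embedding);     // must be idempotent for equal inputs
        List<String> nearest(float[] query, int k);  // ordered by cosine similarity, never null
    }

    /** A deliberately naive reference implementation the model's version must match. */
    public static class NaiveIndex implements ImageIndex {
        private final Map<String, float[]> vectors = new LinkedHashMap<>();

        public void add(String key, float[] embedding) {
            vectors.put(key, embedding.clone());
        }

        public List<String> nearest(float[] query, int k) {
            // Brute-force scan, sorted by descending cosine similarity.
            return vectors.entrySet().stream()
                    .sorted((a, b) -> Double.compare(cosine(b.getValue(), query),
                                                     cosine(a.getValue(), query)))
                    .limit(k)
                    .map(Map.Entry::getKey)
                    .toList();
        }

        private static double cosine(float[] a, float[] b) {
            double dot = 0, na = 0, nb = 0;
            for (int i = 0; i < a.length; i++) {
                dot += a[i] * b[i];
                na += a[i] * a[i];
                nb += b[i] * b[i];
            }
            return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
        }
    }

    public static void main(String[] args) {
        ImageIndex idx = new NaiveIndex();
        idx.add("cat", new float[]{1, 0});
        idx.add("dog", new float[]{0, 1});
        System.out.println(idx.nearest(new float[]{0.9f, 0.1f}, 1)); // [cat]
    }
}
```

With the contract fixed like this, "optimize nearest()" becomes a measurable, bounded task rather than an invitation to invent a framework.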
They aren't holding it wrong, it's a fundamental limitation of not writing the code yourself. You can make it easier to understand later when you review it, but you still need to put in that effort.
Work in smaller parts then. You should have a mental model of what the code is doing. If the LLM is generating too much you’re being too broad. Break the problem down. Solve smaller problems.
You are correct, but developers are not yet ready to face it. The argument you'll always get is the flawed premise that it's less effort to write it yourself (while the same people work in teams that have others writing code for them every day of the week).
So in my experience with Opus 4.6 evaluating it in an existing code base has gone like this.
You say "Do this thing".
- It does the thing (takes 15 min). Looks incredibly fast. I couldn't code that fast. It's inhuman. So far all the fantastical claims hold up.
But still. You ask "Did you do the thing?"
- it says oops I forgot to do that sub-thing. (+5m)
- it fixes the sub-thing (+10m)
You say is the change well integrated with the system?
- It says not really, let me rehash this a bit. (+5m)
- It irons out the wrinkles (+10m)
You say does this follow best engineering practices, is it good code, something we can be proud of?
- It says not really, here are some improvements. (+5m)
- It implements the best practices (+15m)
You say to look carefully at the change set and see if it can spot any potential bugs or issues.
- It says oh, I've introduced a race condition at line 35 in file foo and a null-correctness bug at line 180 of file bar. Fixing. (+15m)
You ask if there's test coverage for these latest fixes?
- It says "i forgor" and adds them. (+15m)
Now the change set has shrunk a bit and is superficially looking good. Still, you must read the code line by line, and with an experienced eye will still find weird stuff happening in several of the functions, there's redundant operations, resources aren't always freed up. (60m)
You ask why it's implemented in such a roundabout way and how it intends for the resources to be freed up?
- It says "you're absolutely right" and rewrites the functions. (+15m)
You ask if there's test coverage for these latest fixes?
- It says "i forgor" and adds them. (+15m)
Now the 15 minutes of amazingly fast AI code gen has ballooned into taking most of the afternoon.
Telling Claude to be diligent, not write bugs, or to write high quality code flat out does not work. And even if such prompting can reduce the odds of omissions or lapses, you still always always always have to check the output. It can not find all the bugs and mistakes on its own. If there are bugs in its training data, you can assume there will be bugs in its output.
(You can make it run through much of this Socratic checklist on its own, but this doesn't really save wall clock time, and doesn't remove the need for manual checking.)
I've had very consistent success with plan mode, but when I haven't I've noticed many times it's been working with code/features/things that aren't well defined. ie: not using a well defined design pattern, maybe some variability in the application on how something could be done - these are the things I notice it really trips up on. Well defined interfaces, or even specifically telling it to identify and apply design principles where it seems logical.
When I've had repeated issues with a feature/task on existing code, it often really helps to first have the model analyze the code and recommend 'optimizations'. Whether or not you agree/accept, it'll give you some insight into the approach it _wants_ to take. Adjust from there.
Ok so here are the actual course corrections I had to make to push through a replacement implementation of a btree.
Note that almost all of the problems aren't with the implementation, it basically one shot that. Almost all the issues are with integrating the change with the wider system.
"The btree library is buggy, and inefficient (using mmap, a poor design idea). Can you extract it to an interface, and then implement a clean new version of the interface that does not use mmap? It should be a balanced btree. Don't copy the old design in anything other than the interface. Look at how SkipListReader and SkipListWriter use a BufferPool class and use that paradigm. The new code should be written from scratch and does not need to be binary compatible with the old implementation. It also needs extremely high test coverage, as this is notoriously finicky programming."
"Let's move the old implementation to a separate package called legacy and give them a name like LegacyBTree... "
"Let's add a factory method to the interfaces for creating an appropriate implementation, for the writer based on a system property (\"index.useLegacyBTree\"), and for the reader, based on whether the destination file has the magic word for the new implementation. The old one has no magic word."
"Are these changes good, high quality, good engineering practices, in line with known best practices and the style guide?"
"Yeah the existing code owns the lifetime of the LongArray, so I think we'd need larger changes there to do this cleanly. "
"What does WordLexicon do? If it's small, perhaps having multiple implementations is better"
"Yes that seems better. Do we use BTrees anywhere else still?"
"There should be an integration test that exercises the whole index construction code and performs lookups on the constructed index. Find and run that."
"That's the wrong test. It may just be in a class called IntegrationTest, and may not be in the index module."
"Look at the entire change set, all unstaged changes, are these changes good, high quality, good engineering practices, in line with known best practices and the style guide?"
"Remove the dead class. By the way, the size estimator for the new btree, does it return a size that is strictly greater than the largest possible size? "
"But yeah, the pool size is very small. It should be configurable as a system property. index.wordLexiconPoolSize maybe. Something like 1 GB is probably good."
"Can we change the code to make BufferPool optional? To have a version that uses buffered reads instead?"
"The new page source should probably return buffers to a (bounded) free list when they are closed, so we can limit allocation churn."
"Are these latest changes good, high quality, good engineering practices, in line with known best practices and the style guide?"
"Yes, all this is concurrent code so it needs to be safe."
"Scan the rest of the change set for concurrency issues too."
"Do we have test coverage for both of the btree reader modes (bufferpool, direct)?"
"Neat. Think carefully, are there any edge cases our testing might have missed? This is notoriously finicky programming, DBMSes often have hundreds if not thousands of tests for their btrees..."
"Any other edge cases? Are the binary search functions tested for all corner cases?"
"Can you run coverage for the tests to see if there are any notable missing branches?"
"Nice. Let's lower the default pool size to 64 MB by the way, so we don't blow up the Xmx when we run tests in a suite."
"I notice we're pretty inconsistent in calling the new B+-tree a B-Tree in various places. Can you clean that up?"
"Do you think we should rename these to reflect their actual implementation? Seems confusing the way it is right now."
"Can you amend the readme for the module to describe the new situation, that the legacy modules are on the way out, and information about the new design?"
"Add a note about the old implementation being not very performant, and known to have correctness issues."
"Fix the guice/zookeeper issues before proceeding. This is a broken window."
"It is pre-existing, let's ignore it for now. It seems like a much deeper issue, and might inflate this change scope."
"Let's disable the broken test, and add a comment explaining when and any information we have on what may or may not cause the issue."
"What do you think about making the caller (IndexFactory) decide which WordLexicon backing implementation to use, with maybe different factory methods in WordLexicon to facilitate?"
"I'm looking at PagedBTreeReader. We're sometimes constructing it with a factory method, and sometimes directly. Would it make sense to have a named factory method for the \"PagedBTreeReader(Path filePath, int poolSize)\" case as well, so it's clearer just what that does?"
"There's a class called LinuxSystemCalls. This lets us do preads on file descriptors directly, and (appropriately) set fadviseRandom(). Let's change the channel backed code to use that instead of FileChannels, and rename it to something more appropriate. This is a somewhat big change. Plan carefully."
"Let's not support the case when LinuxSystemCalls.isAvailable() is false, the rest of the index fails in that scenario as well. I think good names are \"direct\" (for buffer pool) and \"buffered\" (for os cached), to align with standard open() nomenclature."
"I'm not a huge fan of PreadPageSource. It's first of all named based on who uses it, not what it does. It's also very long lived, and leaking memory whenever the free list is full. Let's use Arena.ofAuto() to fix the latter, and come up with a better name. I also don't know if we'll ever do unaligned reads in this? Can we verify whether that's ever actually necessary?"
"How do we decide whether to open a direct or buffered word lexicon?"
"I think this should be a system property. \"index.wordLexicon.useBuffered\", along with \"index.wordLexicon.poolSizeBytes\" maybe?"
"Is the BufferPoolPageSource really consistent with the rest of the nomenclature?"
"Are there other inconsistencies in naming or nomenclature?"
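The implementation-selection step in those prompts (writer chosen by the "index.useLegacyBTree" system property, reader chosen by whether the file starts with the new format's magic word) can be sketched roughly like this. Everything here except that property name is invented for illustration; the class names and magic bytes are not from the actual codebase:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BTreeReaders {
    // Hypothetical magic word: the new on-disk format opens with it, the legacy one has none.
    static final byte[] MAGIC = {'B', 'T', 'R', '2'};

    static boolean hasMagic(byte[] header) {
        return header.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    /** Reader side: dispatch on the file header, since old files carry no marker. */
    static String readerFor(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = in.readNBytes(MAGIC.length);
            return hasMagic(header) ? "PagedBTreeReader" : "LegacyBTreeReader";
        }
    }

    /** Writer side: there is no file yet, so a system property picks the implementation. */
    static String writerFor() {
        return Boolean.getBoolean("index.useLegacyBTree")
                ? "LegacyBTreeWriter" : "PagedBTreeWriter";
    }

    public static void main(String[] args) throws IOException {
        Path newFile = Files.createTempFile("new", ".idx");
        Files.write(newFile, new byte[]{'B', 'T', 'R', '2', 0, 0});
        Path oldFile = Files.createTempFile("old", ".idx");
        Files.write(oldFile, new byte[]{0, 1, 2, 3});
        System.out.println(readerFor(newFile)); // PagedBTreeReader
        System.out.println(readerFor(oldFile)); // LegacyBTreeReader
    }
}
```

The asymmetry is the point of that prompt: writes are a deliberate choice, reads must cope with whatever format is already on disk.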
The same as asking one of your juniors to do something, except now it follows instructions a little bit better. Coding has never been about line generation, and now you can POC something in a few hours instead of a few days or weeks to see if an idea is dumb.
Yeah. Due diligence is exponentially more important with something like Claude because it is so fast. Get lazy for a few hours and you've easily added 20K LOC worth of technical debt to your code base, and short of reverting the commits and starting over, it'll not be easy to get it to fix the problems after the fact.
It's still pretty fast even considering all the coaxing needed, but holy crap will it rapidly deteriorate the quality of a code base if you just let it make changes as it pleases.
It very much feels like how the most vexing enemy of The Flash is like just some random ass banana peel on the road. Raw speed isn't always an asset.
The cost of reverting the commits and starting over is not so high though. I find it is really good for prototyping ideas that you might not have tried to do previously.
It's cheap only if this happens shortly after the bad design mistakes, and there aren't other changes on top of them. Bad design decisions ossify fairly quickly in larger projects with multiple contributors outputting large volumes of code. Claude Code's own "game engine" rendering pipeline[1] is a good example of an almost comically inappropriate design that's likely to be some work to undo now that it's set.
If a human dropped a PR on me that took "several hours" to go through (10k+ lines or non-trivial changes), I'd jump in my car and drive to the office just to specifically slap them on the back of the head ffs.
I'd highly recommend working top down, getting it to outline a sane architecture before it starts coding. Then if one of the modules starts getting fouled up, start with a clean sheet context (for that module) incorporating any cautions or lessons learned from the bad experience. LLMs are not yet good at working and reworking the same code, for the reasons you outline. But they are pretty good at a "Groundhog Day" approach of going through the implementation process over and over until they get it right.
+1 if you are vibe coding projects from scratch. if the architecture you specify doesn't make sense, the llm will start struggling, the only way out of their misery is mocking tests. the good thing is that a complete rewrite with proper architecture and lessons learned is now totally affordable.
Not trying to be snarky, with all due respect... this is a skill issue.
It's a tool. It's a wildly effective and capable tool. I don't know how or why I have such a wildly different experience than so many that describe their experiences in a similar manner... but... nearly every time I come to the same conclusion that the input determines the output.
> If they implement something with a not-so-great approach, they'll keep adding workarounds or redundant code every time they run into limitations later.
Yes, when the prompt/instructions are overly broad and there's no set of guardrails or guidelines that indicate how things should be done... this will happen. If you're not using planning mode, skill issue. You have to get all this stuff wrapped up and sorted before the implementation begins. If the implementation ends up being done in a "not-so-great" approach - that's on you.
> If you tell them the code is slow
Whew. Ok. You don't tell it the code is slow. Do you tell your coworker "Hey, your code is slow" and expect great results? You ask it to benchmark the code and then you ask it how it might be optimized. Then you discuss those options with it (this is where you do the part from the previous paragraph, where you direct the approach so it doesn't do "no-so-great approach") until you get to a point where you like the approach and the model has shown it understands what's going on.
Then you accept the plan and let the model start work. At this point you should have essentially directed the approach and ensured that it's not doing anything stupid. It will then just execute, it'll stay within the parameters/bounds of the plan you established (unless you take it off the rails with a bunch of open ended feedback like telling it that it's buggy instead of being specific about bugs and how you expect them to be resolved).
> you can have 10 bespoke tests for every bug. Plus a new mocking framework created every time the last one turns out to be unfit for purpose.
This is an area where I will agree that the models are wildly inept. Someone needs to study what it is about tests, testing environments, and mocking that just makes these things go off the rails. The solution to this is the same as the solution to it digging itself deeper or chasing its tail in circles: early in the prompt/conversation/message that sets the approach/intent/task, you state your expectations for the final result. Define the output early, then describe/provide context/etc. The earlier in the prompt/conversation the "requirements" are set, the more sticky they'll be.
And this is exactly the same for the tests. Either write your own tests and have the models build the feature from the test or have the model build the tests first as part of the planned output and then fill in the functionality from the pre-defined test. Be very specific about how your testing system/environment is setup and any time you run into an issue testing related have the model make a note about that and the solution in a TESTING.md document. In your AGENTS.md or CLAUDE.md or whatever indicate that if the model is working with tests it should refer to the TESTING.md document for notes about the testing setup.
Personally, I focus on the functionality, get things integrated and working to the point I'm ready to push it to a staging or production (yolo) environment and _then_ have the model analyze that working system/solution/feature/whatever and write tests. Generally my notes on the testing environment to the model are something along the lines of a paragraph describing the basic testing flow/process/framework in use and how I'd like things to work.
The more you stick to convention the better off you'll be. And use planning mode.
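The AGENTS.md-to-TESTING.md pointer described above might be as small as this (filenames come from the comment; the wording is an illustrative assumption):

```markdown
<!-- AGENTS.md (excerpt, hypothetical) -->
## Testing
- Before writing or modifying any tests, read TESTING.md for the test
  runner setup, fixtures, and known pitfalls.
- Whenever a test-environment problem is solved, append the symptom and
  the fix to TESTING.md so it is not rediscovered next session.
- Do not introduce a new mocking approach without checking TESTING.md
  for an existing one first.
```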
> Whew. Ok. You don't tell it the code is slow. Do you tell your coworker "Hey, your code is slow" and expect great results?
Yes? Why don't you?
They are capable people who just didn't notice something; if I notice some telemetry and tell them "hey, this is slow", they are expected to understand the reason(s).
So, you observed some telemetry - which would have been some sort of specific metric, right? Wouldn't you communicate that to them as well, not just "it's slow"?
"Hey, I saw that metric A was reporting 40% slower, are you aware already or have any ideas as to what might be causing that?"
Those two approaches are going to produce rather distinctly different results whether you're speaking to a human or typing to a GPU.
Yeah if my co-worker can't start figuring out why the code is slow, with a reasonable reference to what the code in question is, that is a knock against their skills. I would actually expect some ideas as to what the problem is just off the top of their heads, but that the coding agent can't do that isn't a hit against it specifically, this is now a good part of what needs to be done differently.
The suggestion to tell the agent to do performance analysis of the part of the code you think is problematic, and offer suggestions for improvements seems like the proper way to talk to a machine, whereas "hey your code is slow" feels like the proper way to talk to a human.
As someone who leads a team of engineers, telling someone their code is slow is not nice, helpful, or something a good team member should do. It's like telling them there's a bug and not explaining what the bug is. Code can be slow for infinite reasons; maybe the input you gave is never expected and it's plenty fast otherwise. Or the other dev is not senior enough to know where problems may be. It could be you, when I tell you your OOP code is super slow but you've only ever done OOP and have no idea how to lay data out in memory to avoid CPU cache misses or whatever.
So no that’s not the proper way to talk to humans.
And AI is only as good as the quality of what you’re asking. It’s a bit like a genie, it will give you what you asked , not what you actually wanted. Are you prepared for the ai to rewrite your Python code in C to speed it up? Can it just add fast libraries to replace the slow ones you had selected? Can it write advanced optimization techniques it learned about from phd thesis you would never even understand?
>As someone who leads a team of engineers, telling someone their code is slow is not nice, helpful or something a good team member should do
right, I'm sure there are all sorts of scenarios where that is the case and probably the phrasing would be something like that seems slow, or it seems to be taking longer than expected or some other phrasing that is actually synonymous with the code is slow. On the other hand there are also people that you can say the code is slow to, and they won't worry about it.
>So no that’s not the proper way to talk to humans
In my experience there are lots of proper ways to talk to humans, and part of the propriety is involved with what your relationship with them is. So it may be the proper way to talk to a subset of humans, which is generally the only kind of human one talks to: a subset. I certainly have friends that I have worked with for a long time who can say "what the fuck were you thinking here" or all sorts of things that would not be nice coming from other people but are in fact a signifier of our closeness, that we can talk in such a way. Evidently you have never led a team with people who enjoyed that relationship between them, which I think is a shame.
Finally, I'll note that when I hear a generalized description of a form of interaction I tend to give what used to be called "the benefit of a doubt" and assume that, because of the vagaries of human language and the necessity of keeping things not a big long harangue as every communication must otherwise become in order to make sure all bases of potential speech are covered, that the generalized description may in fact cover all potential forms of polite interaction in that kind of interaction, otherwise I should have to spend an inordinate amount of my time lecturing people I don't know on what moral probity in communication requires.
But hey, to each their own.
on edit: the "what the fuck were you thinking here" quote is also an example of a generalized form of communication that would be rude coming from other people but was absolutely fine given the source, and it is not an exact quote despite the use of quotation marks in the example.
A normal human conversation would specify which code/tasks/etc., how long it's currently taking, how much faster it needs to be, and why. And then potentially a much longer conversation about the tradeoffs involved in making in faster. E.g. a new index on the database that will make it gigabytes larger, a lookup table that will take up a ton more memory, etc. Does the feature itself need to be changed to be less capable in order to achieve the speed requirements?
If someone told me "hey your code is slow" and walked away, I'd just laugh, I think. It's not a serious or actionable statement.
Well, I would say something like "We seem to be having some performance issues the business has noticed in the XYZ stuff. Shall we sit down together and see if we can work out if we can improve things?"
There was a 20+ person team of well-paid, smart (mostly Java) programmers who dealt for months with the slow application they were building, which everyone knew was slow. I nagged them for weeks to set up indexes, even for small 100-row tables. Once they did, things started running orders of magnitude faster.
Your expectations for people (and LLMs) are way too high.
Great answer, and the reason some people have bad experiences is actually patently clear: they don’t work with the AI as a partner, but as a slave. But even for them, AI is getting better at automatically entering planning mode, asking for clarification (what exactly is slow, can you elaborate?), saying some idea is actually bad (I got that a few times), and so on… essentially, the AI is starting to force people to work as a partner and give it proper information, not just tell them “it’s broken, fix it” like they used to do on StackOverflow.
My comment was a summary of the situation, not literal prompts I use. I absolutely realize the work needs to be adequately described and agents must be steered in the right direction. The results also vary greatly depending on the task and the model, so devs see different rates of success.
On non-trivial tasks (like adding a new index type to a db engine, not oneshotting a landing page) I find that the time and effort required to guide an LLM and review its work can exceed the effort of implementing the code myself. Figuring out exactly what to do and how to do it is the hard part of the task. I don't find LLMs helpful in that phase - their assessments and plans are shallow and naive. They can create todo lists that seemingly check off every box, but miss the forest for the trees (and it's an extra work for me to spot these problems).
Sometimes the obvious algorithm isn't the right one, or it turns out that the requirements were wrong. When I implement it myself, I have all the details in my head, so I can discover dead-ends and immediately backtrack. But when LLM is doing the implementation, it takes much more time to spot problems in the mountains of code, and even more effort to tell when it's a genuinely a wrong approach or merely poor execution.
If I feed it what I know before solving the problem myself, I just won't know all the gotchas yet myself. I can research the problem and think about it really hard in detail to give bulletproof guidance, but that's just programming without the typing.
And that's when the models actually behave sensibly. A lot of the time they go off the rails and I feel like a babysitter instructing them "no, don't eat the crayons!", and it's my skill issue for not knowing I must have "NO eating crayons" in AGENTS.md.
If I was on the "replace all the meatsacks AGI ftw" team then I would have referred to it as an oracle, by your own logic, wouldn't I have?
It's a tool. It's good for some things, not for others. Use the right tool for the job and know the job well enough to know which tools apply to which tasks.
More than anything it's a learning tool. It's also wildly effective at writing code, too. But, man... the things that it makes available to the curious mind are rather unreal.
I used it to help me turn a cat exercise wheel (think huge hamster wheel) into a generator that produces enough power to charge a battery that powers an ESP32 powered "CYD" touchscreen LCD that also utilizes a hall effect sensor to monitor, log and display the RPMs and "speed" (given we know the wheel circumference) in real time as well as historically.
I didn't know anything about all this stuff before I started. I didn't AGI myself here. I used a learning tool.
But keep up with your schtick if that's what you want to do.
Oracles have their use too, but as long as you keep confusing "oracle" and "tool" you will get nowhere.
P.S. The real big deal is the democratization of oracles. Back in the day building an oracle was a megaproject accessible only to megacorps like Google. Today you can build one for nothing if you have a gaming GPU and use it for powering your kobold text adventure session.
>I used it to help me turn a cat exercise wheel (think huge hamster wheel) into a generator that produces enough power to charge a battery that powers an ESP32 powered "CYD" touchscreen LCD that also utilizes a hall effect sensor to monitor, log and display the RPMs and "speed" (given we know the wheel circumference) in real time as well as historically.
So what? That's honestly amateur hour. And the LLM derived all of it from things that have been done and posted about a thousand times before.
You could have achieved the same thing with a few google searches 15 years ago (obviously not with ESP32, but other microcontrollers).
Right - it's not a big deal and it LITERALLY is amateur hour. But I did it. I wouldn't have done it before; sure, I could have done a bunch of Google searches, but the time investment it would have taken to sift through all that information and distill it into actionable chunks would have far exceeded the benefit of doing so, in this case.
The whole point is that it is amateur hour and it's wildly effective as a learning tool.
The fact it derived everything from things that have been done... yea, that's also the point? What point are you trying to make here? I'm well aware it's not a great tool if you're trying to use it to create novel things... but I'm not a nuclear physicist. I'm a builder, fixer, tinkerer who happens to make a living writing code. I use it to teach me how to do things, I use it to analyze problems and recommend approaches that I can then delve into myself.
I'm not asking it to fold proteins. (I guess that's been done quite a bit too, so would be amateur as well)
>The whole point is that it is amateur hour and it's wildly effective as a learning tool.
You sound so proud of your accomplishment, and I question whether there's really anything to be proud of here. I doubt you really learned anything; a machine told you what to do and you did it. Like coloring by numbers, it doesn't make you an artist. You won't be able to build upon it without asking the machine to do more of the thinking for you. And I think that's kind of sad.
>I'm a builder, fixer, tinkerer who happens to make a living writing code
I have to doubt that. If you were all those things, you would have been able to complete that project with very little effort, and without a machine telling you what to do.
OP was writing about how great the LLM is, and that he couldn't do this stuff as easily before LLMs. And that simply isn't true.
Instead of breaking down the task himself into achievable steps, the LLM did that "thinking" for him. This will inevitably lead to atrophy of the brain. If you don't exercise your brain, and let the tin-can tell you what to do, you're going to get pretty dull. It's well known that keeping your brain active, solving problems, will keep your mental abilities strong. Using LLMs is the opposite of that.
lmao - I'm not at all proud of what you called an accomplishment. I literally said it _is_ amateur hour: it's hacked together, not safe, not stylish, not well engineered. But it does work. And despite your assumption about me not learning anything - I had _no idea_ how generators worked. The realization that spinning an electric motor would produce electricity blew my mind and got me asking Claude things related to that; then I wanted to interface a wheel against my wheel to spin a stepper motor to get a charge, and had the harebrained idea to just make the whole thing the generator instead. None of this was stuff I knew.
Despite this thing I made being rather useless in the grand scheme of things it was _wildly_ illuminating in terms of my understanding of electricity and the various objects around me and how they function. Which has spurred another rabbit hole that is having _real measurable effect_ for a host of feral cats to live a more comfortable life. (Not the wheel generator thing)
> a machine told you what to do and you did it, like coloring by numbers - it doesn't make you an artist.
I never claimed to be an artist ;) And, maybe it's different for you, but someone or something showing me how to do something is quite literally the best way for me to learn. /shrug
> I have to doubt that. If you were all those things, you would have been able to complete that project with very little effort, and without a machine telling you what to do.
> Do you tell your coworker "Hey, your code is slow" and expect great results? You ask it to benchmark the code and then you ask it how it might be optimized.
...Really? I think 'hey we have a lot of customers reporting the app is laggy when they do X, could you take a look' is a very reasonable thing to tell your coworker who implemented X.
> If they implement something with a not-so-great approach, they'll keep adding workarounds or redundant code every time they run into limitations later.
Are you using plan mode? I used to experience the poor-approach-and-dig-in issue, but with planning that seems to have gone away.
I have no idea what I'm doing differently because I haven't experienced this since Opus 4.5. Even with Sonnet 4.5, providing explicit instructions along the lines of "reuse code where sensible, then run static analysis tools at the end and delete unused code it flags" worked really well.
I always watch Opus work, and it is pretty good with "add code, re-read the module, realize some pre-existing code (either it wrote, or was already there) is no longer needed and delete it", even without my explicit prompts.
I'm guessing there's a very strong prior to "just keep generating more tokens" as opposed to deleting code that needs to be overcome. Maybe this is done already, but since every git project comes with its own history, you could take a notable open-source project (like LLVM) and then do RL training against each individual patch committed.
Perhaps the problem is that you RL on one patch at a time, failing to capture the overarching long-term theme: an architecture change being introduced gradually over many months, one that exists in the maintainer's mental model but not really explicitly in diffs.
Right, it would have to be a specialized tool that you use to analyze the codebase every now and then, or the parts that you think should be cleaned up.
Obviously there is a "just keep generating more tokens" bias in software management, since so many developer metrics over the years have done various lines-of-code-style analyses.
But just as management practice has over time come to recognize this as a bad bias for ranking devs, it should be clear it is a bad bias for LLMs to have.
I think this is in the training data since they use commit data from repos, but I imagine code deletions are rarer than they should be in the real data as well.
Deleting and cleaning up code is perhaps more an expression of seniority and personal preference. Maybe there should be the same kind of style transfer with code that you see with graphical generative AI: "rewrite this code path in the style of Donald Knuth".
Yes, this is exactly the experience I have had with LLMs as a non-programmer trying to make code. When it gets too deep into the weeds I have to ask it to get back a few steps.
Yes, that's my observation too. I have to be doubly careful the longer they run a task. They like to hack and patch stuff even when I tell them I'd rather they didn't.
The solution is knowing when to use an existing solution like SQLite and when to create your own. So the biggest problem with LLMs is that they don't push back or remind you about possible consequences (often enough). But if they did, I would find it even more awkward... and this is one of the reasons I prefer Claude Code over Codex.
I use the restore checkpoint/fork conversation feature in GitHub Copilot heavily because of this. Most of the time it's better to just rewind than to salvage something that's gone off track.
I feel like there are two types of LLM users: those that understand its limitations, and those that ask it to solve a millennium problem on the first try.
I do this all the time but then you end up with really over engineered code that has way more issues than before. Then you're back to prompting to fix a bunch of issues. If you didn't write the initial code sometimes it's difficult to know the best way to refactor it. The answer people will say is to prompt it to give you ideas. Well then you're back to it generating more and more code and every time it does a refactor it introduces more issues. These issues aren't obvious though. They're really hard to spot.
Generative AI changed the equation so much that our existing copyright laws are simply out of date.
Even copyright laws with provisions for machine learning were written when that meant tangential things like ranking algorithms or training of task-specific models that couldn't directly compete with all of their source material.
For code it also completely changes where the human-provided value is. Copyright protects specific expressions of an idea, but we can auto-generate the expressions now (and the LLM indirection messes up what "derived work" means). Protecting the ideas that guided the generation process is a much harder problem (we have patents for that and it's a mess).
It's also a strategic problem for GNU.
GNU's goal isn't licensing per se, but giving users freedom to control their software. Licensing was just a clever tool that repurposed copyright law to make the freedoms GNU wanted somewhat legally enforceable. Now that it's so easy to launder a codebase's license, it stops being an effective tool.
GNU's licensing strategy also depended on a scarcity of code (contribute to GCC, because writing a whole compiler from scratch is too hard). That hasn't worked well for a while due to permissive OSS already reducing scarcity, but gen AI is the final nail in the coffin.
It's not a problem. If you give a work to an AI and say "rewrite this", you created a derivative work. If you don't give a work to an AI and say "write a program that does (whatever the original code does)" then you didn't. During discovery the original author will get to see the rewriter's Claude logs and see which one it is. If the rewriter deleted their Claude logs during the lawsuit they go to jail. If the rewriter deleted their Claude logs before the lawsuit the court interprets which is more likely based on the evidence.
But the AI has the work to derive from already. I just went to Gemini and said "make me a picture of a cartoon plumber for a game design".
Based on your logic the image it made me of a tubby character with a red cap, blue dungarees, red top and a big bushy mustache is not a derivative work...
(interestingly asking it to make him some friends it gave me more 'original' ideas, but asking it to give him a brother and I can hear the big N's lawyers writing a letter already...)
Except Claude was for sure trained on the original work and when asked to produce a new product that does the same thing will just spit out a (near) copy
Ok, but what if in the future I could guarantee that my generative model was not trained on the work I want to replicate. Like say X library is the only library in town for some task, but it has a restrictive license. Can I use a model that was guaranteed not trained on X to generate a new library Z that competes with X with a more permissive license? What if someone looks and finds a lot of similarities?
I think there could be a market for "permissive/open models" in the future, where a company specifically makes LLM models trained only on a large corpus of public domain or permissively licensed text/code, and you can prove it by downloading the corpus yourself and reproducing the exact same model if desired. Proving that all MIT-licensed code is non-infringing is probably impossible, though; at that point copyright law is meaningless, because everyone would be in violation if you dig deep enough.
> Generative AI changed the equation so much that our existing copyright laws are simply out of date.
Copyright laws are predicated on the idea that valuable content is expensive and time consuming to create.
Ideas are not protected by copyright, expression of ideas is.
You can't legally copy a creative work, but you can describe the idea of the work to an AI and get a new expression of it in a fraction of the time it took for the original creator to express their idea.
The whole premise of copyright is that ideas aren't the hard part, the work of bringing that idea to fruition is, but that may no longer be true!
> “Changing the equation” by boldly breaking the law.
Is it? I think the law is truly undeveloped when it comes to language models and their output.
As a purely human example, suppose I once long ago read through the source code of GCC. Does this mean that every compiler I write henceforth must be GPL-licensed, even if the code looks nothing like GCC code?
There's obviously some sliding scale. If I happen to commit lines that exactly replicate GCC then the presumption will be that I copied the work, even if the copying was unconscious. On the other hand, if I've learned from GCC and code with that knowledge, then there's no copyright-attaching copy going on.
We could analogize this to LLMs: instructions to copy a work would certainly be a copy, but an ostensibly independent replication would be a copy only if the work product had significant similarities to the original beyond the minimum necessary for function.
However, this is intuitively uncomfortable. Mechanical translation of a training corpus to model weights doesn't really feel like "learning," and an LLM can't even pinky-promise to not copy. It might still be the most reasonable legal outcome nonetheless.
> GNU's goal isn't licensing per se, but giving users freedom to control their software.
I think that's maybe a misunderstanding. GNU wants everyone to be able to use their computers for the purposes they want, and software is the focus because software was the bottleneck. A world where software is free to create by anyone is a GNU utopia, not a problem.
Obviously the bigger problem for GNU isn't software, which was pretty nicely commoditized already by the FOSS-ate-the-world era of two decades ago; it's restricted hardware, something that AI doesn't (yet?) speak to.
Honestly, good. Copyright and IP law in general have been so twisted by corporations that only they benefit now, see Mickey Mouse laws by Disney for example, or patenting obvious things like Nintendo or even just patent trolling in general.
The biggest recording artist in the world right now had to re-record her early albums because she didn't own the copyright, imagine how many artists don't get that big and never have that opportunity.
That individual artists are still defending this system is baffling to me.
> The biggest recording artist in the world right now had to re-record her early albums because she didn't own the copyright, imagine how many artists don't get that big and never have that opportunity.
Not only that, but Taylor Swift only could do so because she wrote the songs herself, and therefore had the composition copyright to her songs.
Most artists that were put together by the label don't have such a luxury.
The complaint isn't that iPad is useless, but that it would be equally useful to nearly every happy iPad user if it had a few generations older CPU.
iPad works for lots of people, but the things that iPad is best for don't really need a powerful CPU.
There are a few "Pro" apps that you can run to prove it's possible to run them (never mind plugins, OS-level helper apps, extra hardware, background processing that doesn't randomly die, scripting more fine-grained than Shortcuts, or a competent file browser), but you can max out the CPU for a few minutes and go back to a MacBook for real work.
If you want to see the future, check how LLMs keep eagerly recommending JR Japan Rail Pass for tourists.
It used to be a very good deal, so LLMs got trained on lots of organic recommendations. Nowadays, however, the pass is much more expensive and rarely breaks even, but LLMs keep mentioning it as a must-have whenever travel in Japan is discussed.
In Rust, a Future can have exactly one listener awaiting it, which means it doesn't need dynamic allocation and looping over an arbitrary number of .then() callbacks. This allows merging a chain of `.await`ed futures into a single state machine. You could get away with awaiting even on every byte.
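A small sketch of that point: a chain of awaits compiles into one plain value with a statically known size, not a list of heap-allocated callbacks (illustrative code, trivial futures):

```rust
// Each async fn body becomes an enum-like state machine generated by the
// compiler; awaiting other futures nests their states into the same value.

async fn step(x: u32) -> u32 {
    x + 1
}

async fn chain(x: u32) -> u32 {
    // Three awaits in a row still produce ONE future type with one owner.
    let a = step(x).await;
    let b = step(a).await;
    step(b).await
}

fn main() {
    // No allocation happens here: the whole chain is just a stack value.
    let fut = chain(0);
    println!("future size: {} bytes", std::mem::size_of_val(&fut));
    drop(fut);
}
```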
> Conservatism consists of exactly one proposition, to wit: There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect
For this administration the law isn't something that binds them, but something they can use against others.
Jargon terms like "sum types" or "affine types" may seem complicated, but when you see it's actually "enums with data fields", it makes so much sense, and prevents plenty of state-related bugs.
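A minimal illustration of that point (made-up names): the enum's variants carry exactly the data that state needs, so invalid combinations are unrepresentable.

```rust
// A "sum type" in practice: a Rust enum with data fields per variant.

enum Connection {
    Disconnected,
    Connecting { attempt: u32 },
    Connected { session_id: u64 },
}

fn describe(c: &Connection) -> String {
    // The compiler forces every state to be handled; there is no way to
    // read a session_id while disconnected.
    match c {
        Connection::Disconnected => "offline".to_string(),
        Connection::Connecting { attempt } => format!("retry #{attempt}"),
        Connection::Connected { session_id } => format!("session {session_id}"),
    }
}

fn main() {
    println!("{}", describe(&Connection::Connecting { attempt: 3 }));
}
```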
Proposed "effects" mean that when you're writing an iterator or a stream and need to handle an error or await somewhere in the chain, you won't suddenly face the puzzle of replacing all of the functions in the entire chain and your call stack with their async or fallible equivalents.
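Here's a sketch of today's workaround for the fallible case (illustrative names): you can't just use `?` inside the closure, so the whole chain's types have to change.

```rust
// Making one step of an iterator chain fallible forces the chain to be
// restructured: items become Results, collected into a Result of a Vec
// that short-circuits on the first error.

fn parse_all(inputs: &[&str]) -> Result<Vec<i64>, std::num::ParseIntError> {
    inputs.iter().map(|s| s.parse::<i64>()).collect()
}

fn main() {
    assert_eq!(parse_all(&["1", "2"]).unwrap(), vec![1, 2]);
    assert!(parse_all(&["1", "x"]).is_err());
    println!("ok");
}
```

With an effect system, the same chain could stay written once and be generic over "may fail" (or "may await") instead of needing a parallel fallible version.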
"linear types" means that Rust will be able to have more control over destruction and lifetime of objects beyond sync call stack, so the tokio::spawn() (the "Rust async sucks" function) won't have to be complaining endlessly about lifetimes whenever you use a local variable.
I can't vouch for the specifics of the proposed features (they have tricky-to-design details), but it's not simply Rust getting more complex; rather, Rust is trying to solve and simplify more problems with robust and generalizable language features rather than ad-hoc special cases. When it works, it makes the language more uniform overall and gives a lot of bang for the buck in terms of complexity vs. problems solved.