More

in-silico · 2026-02-24T11:35:59 1771932959

TLDR: the author does not believe in computational functionalism

in-silico · 2026-02-24T06:30:22 1771914622

Either I'm missing something or this is way overstated.

Steerling appears to be just a discrete diffusion model where the final hidden states are passed through a sparse autoencoder (a common interpretability layer) before the LM head.

They also use a loss that aligns the SAE'S activations with labelled concepts? However, this is an example of "The Most Forbidden Technique" [1], and could make the model appear interpretable without the attributed concepts actually having causal effect on the model's decisions.

1: https://thezvi.substack.com/p/the-most-forbidden-technique

adebayoj · 2026-02-24T08:51:05 1771923065

You are missing a few things, but you got some things right.

1) The is not an SAE in the way you think. It is a combination of a supervised + unsupervised layer that is constrained. An SAE is typically completely unsupervised, and applied post hoc. Here, we supervise 33k of the concepts with concepts that we carefully curated. We then have an unsupervised component (similar to a topk SAE) that we constrain to be independent from the supervised concepts. We don't do any of this post hoc by the way; this is a key constraint. I"ll get back to this. We train that unsupervised layer along with the model during pre-training.

2) Are the concepts or features causally influential for the output? We directly use the combination of the concepts for the lm head, which is a linear transform (with activation), so we can tell you, in closed form, the effect of ANY concept on the output logit for any token (or group of tokens) generated. It is not just causally related, it is constrained to do so.

3) Other points: we also make it so that you can trace the model outputs to the training data. This is an underrated interpretability knob. You know where, and what data, caused your model to learn a particular feature.

This is already a long comment, but I want to close on why our approach sidesteps all the issues with SAEs. - If you train an SAE twice, on the same data + model, you'll get two different feature(s). - In fact, there is no reason, why the model should pick features that are causally influential for the output. - ALL of these problems stem from the fact that the SAE is trained AFTER you already trained your model. Training from scratch AND with supervision allows you to sidestep these issues, and even learn more disentangled representations.

Happy to more concretely justify the above. Great observations!

in-silico · 2026-02-23T18:41:03 1771872063

> Perhaps weed, like alcohol, needs a legal minimum age of 21.

Generally, it already does have a legal minimum age of 21.

in-silico · 2026-02-23T18:19:27 1771870767

So... they can't actually "generate near-verbatim copies of novels"?

If they end a single sentence differently than the original, then the next sentence will be different and so on until you get a very different novel. Sure they could course-correct back towards the original plot, but it's going to be a challenge to stay on target when every third sentence is incorrect.

in-silico · 2026-02-23T18:11:57 1771870317

It's a good way to frame base models that have only been pretrained.

However, modern frontier models have undergone rounds of fine-tuning, RLHF (reinforcement learning from human feedback), and RLVR (RL from verifiable rewards) that turn them into something else. The compressed internet is still in there, but it's wrapped in problem-solving and people-pleasing circuitry.

in-silico · 2026-02-22T00:34:35 1771720475

Sounds like a personal skill issue

in-silico · 2026-02-05T23:07:11 1770332831

Hyper-targeted ads to extract more value from users' pockets.

The researchers on this project sure are putting their effort towards making the world a better place...

in-silico · 2026-02-05T20:15:27 1770322527

Hyper-targeted ads to extract more value from users' pockets.

The researchers on this project sure are putting their effort towards making the world a better place...

in-silico · 2026-01-31T00:03:59 1769817839

Hacker News gets a lot less creepy/sad/interesting when you ignore the first-person pronouns and remember they're just biomolecular machines. It's a scaled up version of E. coli. Useful, sure, but there's no reason to ascribe emotions to it. It's just chemical chain reactions.

xyzsparetimexyz · 2026-01-31T16:31:06 1769877066

The only thing I know for sure is that I exist. Given that I exist, it makes sense to me that others of the same rough form as me also exist. My parents, friends, etc. Extrapolating further, it also makes sense to assume (pre-ai, bots) that most comments have a human consciousness behind them. Yes, humans are machines, but we're not just machines. So kindly sod off with that kind of comment.

DiogenesKynikos · 2026-02-03T06:27:10 1770100030

"Yes, LLMs are machines, but we're not just machines. So kindly sod off with that kind of comment."

illiac786 · 2026-01-31T17:33:28 1769880808

Makes zero sense. “Emotion” is a property of these “biomolecular machines”, by its definition.

in-silico · 2026-01-31T20:29:57 1769891397

But if you weren't one of them, would you be able to tell that they had emotions (and not just simulations of emotions) by looking at them from the outside?

illiac786 · 2026-01-31T20:51:42 1769892702

If I wasn’t one of them I wouldn’t care. It’s like caring about trees having branches. They just do. The trees probably care a great deal about their branches though, like I care a great deal about my emotions.

in-silico · 2026-01-31T21:10:14 1769893814

Well some people appreciate the world around them, and would care about it just as they care about trees having branches.

illiac786 · 2026-02-01T05:23:04 1769923384

Some people definitely, but you made a point that you don’t. People are “biomolecular machines” and they are “useful, sure”.

I wouldn’t call that “appreciating the world around oneself”.

Want that your whole point, that people aren’t better than machines?

in-silico · 2026-02-01T05:40:39 1769924439

Yes, my point was that people aren't better than machines, but just because I don't exceptionalize humanity doesn't mean I don't appreciate it for what it is (in fact I would argue that the lack of exceptionality makes us more profound).

throwaway555590 · 2026-02-02T06:20:14 1770013214

I wouldn't proclaim a lack of exceptionality until we get human level AI. There could still be some secrets left in these squishy brains we carry around.

in-silico · 2026-01-29T18:54:48 1769712888

The endgame has nothing to do with gaming.

The goal of world models like Genie is to be a way for AI and robots to "imagine" things. Then, they could practice tasks inside of the simulated world or reason about actions by simulating their outcome.