the practical value here is for regulated domains. in healthcare and finance you often cant deploy a model at all unless you can explain why it made a specific decision. token-level attribution that traces back to training data sources could satisfy audit requirements that currently block LLM adoption entirely.
curious how the performance compares to a standard llama 8b on benchmarks - interpretability usually comes with a quality tax.
Good point. Historically, people have assumed there is an interpretability-vs-quality tax. That is not true; at least not in this case.
Here are a few questions an interpretable model lets you answer without any quality degradation: 1) which part of the input context led to the output chunk the model generated? 2) which part of the training data led to that output chunk?
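To make question (1) concrete, here's a toy illustration of my own (not the authors' actual method): a linear scorer whose output logit is an exact sum of per-token contributions, so "which input tokens led to this output" has a precise answer by construction. The tokens and embedding values are made up for the example.

```python
# Hypothetical toy, not the paper's method: a linear scorer whose output
# logit decomposes exactly into per-token contributions.
EMB = {  # tiny hand-made token embeddings (illustrative values)
    "the": [0.1, -0.2], "patient": [0.4, 0.1],
    "has": [0.0, 0.3], "diabetes": [0.9, 0.8],
}
W = [1.0, 0.5]  # output direction

def attribute(tokens):
    """Return the logit and each token's exact contribution to it."""
    contribs = {t: sum(w * e for w, e in zip(W, EMB[t])) for t in tokens}
    return sum(contribs.values()), contribs

logit, contribs = attribute(["the", "patient", "has", "diabetes"])
top = max(contribs, key=contribs.get)  # "diabetes" contributes the most
```

In a real transformer the decomposition is not exact, which is why you need either gradient-based attribution or (as here) architectural constraints that keep the mapping traceable.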
In this case we go further and actually constrain the model to use human-understandable concepts in its representations. You might expect this to cost quality. However, as long as you also let the model discover its own concepts (provided they are not duplicates of the ones you supplied), you don't see large degradation.
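A minimal sketch of what I mean, assuming a simple linear concept layer (this is my illustration, not our actual training code): the first few bottleneck dimensions are tied to named human concepts, the rest are free for the model to learn, and a penalty discourages the free directions from duplicating the named ones.

```python
import numpy as np

# Sketch only: a "concept bottleneck" where dims 0..K-1 are tied to named
# human concepts and the remaining F dims are freely learned, with a penalty
# that discourages free dims from duplicating the named ones.
rng = np.random.default_rng(1)

named_concepts = ["sentiment", "negation", "tense"]  # hypothetical labels
K, F, D = len(named_concepts), 5, 16                 # named, free, input dims

W_named = rng.normal(size=(K, D))  # projections onto named concepts
W_free = rng.normal(size=(F, D))   # freely learned concept directions

def duplicate_penalty(W_named, W_free):
    """Sum of squared cosine similarities between every free/named pair;
    adding this to the loss pushes free concepts away from named ones."""
    n = W_named / np.linalg.norm(W_named, axis=1, keepdims=True)
    f = W_free / np.linalg.norm(W_free, axis=1, keepdims=True)
    return float(((f @ n.T) ** 2).sum())

x = rng.normal(size=D)
concept_acts = np.concatenate([W_named @ x, W_free @ x])  # bottleneck output
pen = duplicate_penalty(W_named, W_free)
```

The point of the free slots is exactly the quality argument above: the model isn't forced to express everything through the concepts you happened to name.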
I agree with the other commenters that this now gives us a huge boost in debugging the model.
the quality tax framing might actually undersell the value in regulated domains. if a hospital system can't deploy without explainability, a model that scores 95% and can trace its reasoning beats one that scores 97% and can't. the baseline isn't 'interpretable model vs better model' -- it's 'interpretable model vs no model at all.'
Thanks for pointing this out. Llama 3 8B was trained on ~15T tokens, and the Qwen models on 15-18T tokens as well. We trained on 1.35T tokens and are within striking distance of these models on benchmarks. We expect to, at the very minimum, match their performance once we scale our token budget.
One side effect that we are excited about is that interpretable model training might make for a data efficient training process.