Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I don't think local as it stands with browsers will take off simply from the lead time (of downloading the model), but a new web API for LLMs could change that. Some standard API to communicate with the user's preferred model, abstracting over local inference (like what Chrome does with Gemini Nano (?)) and remote inference (LM Studio or calling out to a provider). This way, every site that wants a language model just has to ask the browser for it, and they'd share weights on-disk across sites.


It sounds good, but I'm not sure that in practice sites will want to "let go" of control this way, knowing that some random model can be used. Usually sites with chatbots want a lot of control over the model behaviour, and spend a lot of time working on how it answers, be it through context control, guardrails or fine tuning and base model selection. Unless everyone standardizes on a single awesome model that everyone agrees is the best for everything, which I don't see happening any time soon, I think this idea is DOA.

Now I could imagine such an API allowing to request a model from huggingface for example, and caching it long term that way, yes just like LM Studio does. But doing this based on some external resource requesting it, vs you doing it purposefully, has major security implications, not to mention not really getting around the lead time problem you mention whenever a new model is requested.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: