Not true! Mistral is really really good, but I agree that there isn't a single decent open model from the USA.



Mistral is cool and I wish them success, but it consistently ranks extremely low on benchmarks while still being expensive. Chinese models like DeepSeek might rank almost as low as Mistral, but they are significantly cheaper. And Kimi is the best of both worlds, with incredible benchmark results while still being incredibly cheap.

I know things change rapidly, so I'm not counting them out quite yet, but I don't see them as a serious contender currently.


Sure, benchmarks are fake, and I use Mistral over equivalently sized models most of the time because it's better in real life. It runs plenty fast for me, and I don't pay for inference.

> it consistently ranks extremely low on benchmarks

As general-purpose chatbots, small Mistral models are better than comparably sized Chinese models, as they have better SimpleQA scores and general knowledge of Western culture.


It’s really hard to beat qwen coder, especially for role play where the instruction following is really useful. I don’t think their corpus is lacking in western knowledge, although I wonder if Chinese users get even better results from it?

> It’s really hard to beat qwen coder, for role play

I am not sure you actually tried that. Mistral models are the widely accepted go-to for roleplay and creative writing. None of the Qwens are good at prose, except for their latest big Qwen 3.5.

> I don’t think their corpus is lacking in western knowledge,

It absolutely is, especially when it comes to pop-culture knowledge.


Instruct and Coder just follow instructions so well, though. I guess I’ve just never been able to make Mistral work well.

Qwen3 30B A3B and that big 400B+ Coder were absolutely terrible at editing fiction. I would tell them what to change in the prose and they'd just regurgitate the text with no changes.

Did you try asking Gemini what model to use and how to configure/set it up? It has worked wonders for me, ironically (since I’m using a big model to set up smaller local models).

> Did you try asking Gemini what model to use and how to configure/set it up?

That would be suboptimal, as Gemini's knowledge cutoff is too old. I am long past the need for such advice anyway, as I've been using local models since mid-2024.


Gemini will search the web for most things (at least if you are using it via the web search interface); it isn’t limited to the knowledge it was trained on. Actually, I’m a bit mortified that not everyone knows this. If you ask Gemini (from the search interface) about a current event that happened yesterday, it will use search to pull in context and work with that. The same goes for a model that was released yesterday.

It’s only with very low-level model access that search isn’t used. Local models also need to be configured to use search, and I haven't had a use case to do that yet.

Gemini seems to call this “Grounding with Google Search”. If you have Gemini installed in your enterprise, it will also search internal data sources for context.
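
For reference, a rough sketch of what that looks like through the API with the google-genai Python SDK (the model name and SDK details here are my assumptions and may have drifted; treat it as a sketch, not a reference implementation):

    from google import genai
    from google.genai import types

    # Assumes GEMINI_API_KEY is set in the environment.
    client = genai.Client()

    # The google_search tool lets the model ground its answer in live
    # search results instead of relying only on baked-in training data.
    resp = client.models.generate_content(
        model="gemini-2.5-flash",  # assumption: substitute any current model
        contents="What notable local-model releases came out this week?",
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    print(resp.text)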


> Gemini will search the web for most things (at least if you are using it via the web search interface); it isn’t limited to the knowledge it was trained on.

If it decides to do so, and even then the baked-in knowledge would influence the result.

In any case I do not need Gemini or any other LLM to figure out the settings for my llama.cpp, thank you very much.
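
For anyone who does want a starting point, though, the settings in question mostly come down to a handful of knobs. A minimal llama-cpp-python sketch (the GGUF filename is a placeholder, and sensible n_ctx / n_gpu_layers values depend entirely on your RAM and VRAM):

    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="./mistral-small-instruct-q4_k_m.gguf",  # placeholder path
        n_ctx=8192,       # context window; raise it if you have the memory
        n_gpu_layers=-1,  # offload all layers to the GPU; reduce if you OOM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Tighten up this paragraph: ..."}],
        temperature=0.7,
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])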


It has always searched the web for me, and it can give me pretty good guidance about a model released in the last week. All models at the moment are trying to reduce dependence on internal knowledge, mostly through RAG. Anyway, this part of LLMs has gotten much better in the last 6 months.

If you are able to figure out the right settings for a model that was released last week, then great for you! But it sounds like you just don’t trust LLMs to use current knowledge, and have some misconception about how they satisfy recent-knowledge requests.


Why are you talking price when we are talking local AI?

That doesn't make any sense to me. Am I missing something?


15 missed calls from your local power company

Your electricity is free?

Apple silicon is crazy efficient, as well as being comparable to GPUs in performance for the Max and Ultra chips.

If you have the hardware to run expensive models, is the cost of electricity much of a factor? According to Google, the average price in the Silicon Valley area is $0.448 per kWh. An RTX 5090 costs about $4,000 and has a peak power consumption of 1000 W. Maxing out that GPU for a whole year would cost $3,925 at that rate, nearly as much as the hardware itself.
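
Spelled out, since the arithmetic is easy to gloss over (the rate and wattage are the figures quoted above, not measured specs):

    # Annual electricity cost of a GPU running flat out, using the
    # figures above: $0.448/kWh and 1000 W peak draw.
    rate_usd_per_kwh = 0.448
    gpu_watts = 1000
    hours_per_year = 24 * 365                             # 8760 h

    kwh_per_year = gpu_watts / 1000 * hours_per_year      # 8760 kWh
    cost = kwh_per_year * rate_usd_per_kwh                # 3924.48
    print(f"${cost:,.0f} per year at a 100% duty cycle")  # $3,924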

At that point it'd be cheaper to get an expensive subscription to a cloud AI platform. I understand the case for local LLMs, but it seems silly to worry about pricing for cloud-based offerings while not worrying about pricing for locally run models, especially since running locally can often be more expensive.

For almost the entire year, yes.

Arcee is working on that; see a blog post about their newest in-progress model here: https://www.arcee.ai/blog/trinity-large

It's still not fully post-trained and it's a non-reasoning model, but it's worth keeping an eye on if you don't want to use the Chinese models that are currently the best open-weight options.


To be fair, there are lots of worse models than OpenAI's GPT-OSS-120b. It's not a standout when positioned next to the latest releases from China, but prior to the current wave it was considered one of the stronger local models you could reasonably run.


