Not true! Mistral is really really good, but I agree that there isn't a single decent open model from the USA.



Mistral is cool and I wish them success, but it consistently ranks extremely low on benchmarks while still being expensive. Chinese models like DeepSeek might rank almost as low as Mistral, but they are significantly cheaper. And Kimi is the best of both worlds, with incredible benchmark results while still being incredibly cheap.

I know things change rapidly, so I'm not counting them out quite yet, but I don't see them as a serious contender currently.


Sure, benchmarks are fake, and I use Mistral over equivalently sized models most of the time because it's better in real life. It runs plenty fast for me, and I don't pay for inference.

> it consistently ranks extremely low on benchmarks

As general-purpose chatbots, small Mistral models are better than comparably sized Chinese models, as they have better SimpleQA scores and general knowledge of Western culture.


It’s really hard to beat qwen coder, especially for role play where the instruction following is really useful. I don’t think their corpus is lacking in western knowledge, although I wonder if Chinese users get even better results from it?

> It’s really hard to beat qwen coder, for role play

I am not sure you actually tried that. Mistral models are the widely accepted go-to for roleplay and creative writing. None of the Qwens are good at prose, except for their latest big Qwen 3.5.

> I don’t think their corpus is lacking in western knowledge,

It absolutely is, especially when it comes to pop-culture knowledge.


Instruct and Coder just follow instructions so well, though. I guess I’ve just never been able to make Mistral work well.

Qwen3 30B A3B and that big 400B+ Coder were absolutely terrible at editing fiction. I would tell them what to change in the prose and they'd just regurgitate the text with no changes.

Did you try asking Gemini what model to use and how to configure/set it up? It has worked wonders for me, ironically (since I’m using a big model to set up smaller local models).

> Did you try asking Gemini what model to use and how to configure/set it up?

That would be suboptimal, as Gemini's knowledge cutoff is too old. I am long past the need for such advice anyway, as I've been using local models since mid-2024.


Gemini will search the web for most things (at least if you are using it via the web search interface); it isn’t limited to the knowledge it was trained on. Actually, I’m a bit mortified that not everyone knows this. If you ask Gemini (from the search interface) about a current event that happened yesterday, it will use search to pull in context and work with that. The same goes for a model that was released yesterday.

It’s only with very low-level model access that search isn’t used. Local models also need to be configured to use search, and I haven't had a use case to do that yet.

Gemini seems to call this “Grounding with Google Search”. If you have Gemini installed in your enterprise, it will also search internal data sources for context.
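
For reference, a rough sketch of what that looks like through the API with the google-genai Python SDK (the model name and SDK details here are my assumptions and may have drifted; treat it as a sketch, not a reference implementation):

    from google import genai
    from google.genai import types

    # Assumes GEMINI_API_KEY is set in the environment.
    client = genai.Client()

    # The google_search tool lets the model ground its answer in live
    # search results instead of relying only on baked-in training data.
    resp = client.models.generate_content(
        model="gemini-2.5-flash",  # assumption: substitute any current model
        contents="What notable local-model releases came out this week?",
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    print(resp.text)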


> Gemini will search the web for most things (at least if you are using it via the web search interface); it isn’t limited to the knowledge it was trained on.

If it decides to do so, and even then the baked-in knowledge would influence the result.

In any case I do not need Gemini or any other LLM to figure out the settings for my llama.cpp, thank you very much.
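
For anyone who does want a starting point, though, the settings in question mostly come down to a handful of knobs. A minimal llama-cpp-python sketch (the GGUF filename is a placeholder, and sensible n_ctx / n_gpu_layers values depend entirely on your RAM and VRAM):

    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="./mistral-small-instruct-q4_k_m.gguf",  # placeholder path
        n_ctx=8192,       # context window; raise it if you have the memory
        n_gpu_layers=-1,  # offload all layers to the GPU; reduce if you OOM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Tighten up this paragraph: ..."}],
        temperature=0.7,
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])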


It has always searched the web for me, and it can give me pretty good guidance about a model released in the last week. All models at the moment are trying to reduce dependence on internal knowledge, mostly through RAG. Anyway, this part of LLMs has gotten much better in the last 6 months.

If you are able to figure out the right settings for a model that was released last week, then great for you! But it sounds like you just don’t trust LLMs to use current knowledge, and have some misconception about how they satisfy recent-knowledge requests.


Why are you talking price when we are talking local AI?

That doesn't make any sense to me. Am I missing something?


15 missed calls from your local power company

Your electricity is free?

Apple silicon is crazy efficient, as well as being comparable to GPUs in performance for the Max and Ultra chips.

If you have the hardware to run expensive models, is the cost of electricity much of a factor? According to Google, the average price in the Silicon Valley area is $0.448 per kWh. An RTX 5090 costs about $4,000 and has a peak power consumption of 1000 W. Maxing out that GPU for a whole year would cost $3,925 at that rate, nearly as much as the hardware itself.
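
Spelled out, since the arithmetic is easy to gloss over (the rate and wattage are the figures quoted above, not measured specs):

    # Annual electricity cost of a GPU running flat out, using the
    # figures above: $0.448/kWh and 1000 W peak draw.
    rate_usd_per_kwh = 0.448
    gpu_watts = 1000
    hours_per_year = 24 * 365                             # 8760 h

    kwh_per_year = gpu_watts / 1000 * hours_per_year      # 8760 kWh
    cost = kwh_per_year * rate_usd_per_kwh                # 3924.48
    print(f"${cost:,.0f} per year at a 100% duty cycle")  # $3,924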

At that point it'd be cheaper to get an expensive subscription to a cloud AI platform. I understand the case for local LLMs, but it seems silly to worry about pricing for cloud-based offerings while not worrying about pricing for locally run models, especially since running locally can often be more expensive.

For almost the entire year, yes.

Arcee is working on that; see a blog post about their newest in-progress model here: https://www.arcee.ai/blog/trinity-large

It's still not fully post-trained and it's a non-reasoning model, but it's worth keeping an eye on if you don't want to use the Chinese models that are currently the best open-weight options.


To be fair, there are lots of worse models than OpenAI's GPT-OSS-120b. It's not a standout when positioned next to the latest releases from China, but prior to the current wave it was considered one of the stronger local models you could reasonably run.


