
regularfry · 2026-03-03T17:10:06 1772557806

Production question, then, for those who know about these things: how far ahead would Apple have locked in their prices for buying RAM for this line, for the units that are part of the initial release?

regularfry · 2026-03-03T09:15:43 1772529343

This is a commercial use, isn't it? Might be clumsily worded, but it's out of the running for that reason alone.

regularfry · 2026-03-03T09:13:07 1772529187

> It still has a splash screen and takes quite a long time to load, like an application from the 90s.

Lots of it is single-threaded, which is an endless frustration on a machine with umpteen cores. Especially frustrating given that it means compute happens on the UI thread.

regularfry · 2026-03-02T15:59:57 1772467197

I think you're an order of magnitude out. Motorola shipped 36.6 million handsets in total across 2024. They seem to have had 33 handset models available in that period, and they were in profit, so the break-even point is presumably somewhere below the average of about 1.1M handsets per model.
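
For anyone checking the arithmetic, here's a quick sketch using only the two figures quoted above. It assumes sales are spread evenly across models, which they almost certainly aren't, so treat the result as a rough ceiling on the average rather than a per-model figure.

```python
# Figures quoted in the comment: total 2024 shipments and models on sale.
total_handsets = 36_600_000
handset_models = 33

# Assumption: units are spread evenly across models (a simplification).
avg_per_model = total_handsets / handset_models

print(f"Average units per model: {avg_per_model:,.0f}")
```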

oblio · 2026-03-02T19:31:06 1772479866

If I'm off for the second group I'm probably also off for the first one. I'd be surprised if a purely privacy focused phone sells more than 200k units per year.

regularfry · 2026-03-02T15:32:21 1772465541

The paper doesn't talk about thermal properties, which is a shame. Would be good to know if this is a thermoplastic.

regularfry · 2026-03-01T15:38:17 1772379497

The architecture is also important: there's a trade-off for MoE. There used to be a rough rule of thumb that a 35b-a3b model (35b total parameters, 3b active) would be equivalent in smarts to an 11b dense model, give or take, but that's not been accurate for a while.
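
The version of that rule of thumb I've usually seen is the geometric mean of total and active parameter counts. That's my assumption about which rule is meant here, and `dense_equivalent_b` is just an illustrative name. It lands in the same ballpark as the 11b figure, and as noted, it's an outdated heuristic rather than something to rely on.

```python
import math


def dense_equivalent_b(total_b: float, active_b: float) -> float:
    """Old folk heuristic (assumed here): geometric mean of total and
    active parameter counts, both in billions."""
    return math.sqrt(total_b * active_b)


# 35b total, 3b active, as in the example above.
estimate = dense_equivalent_b(35, 3)
print(f"Rough dense-equivalent size: {estimate:.1f}b")
```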

regularfry · 2026-03-01T14:23:36 1772375016

It's not only "non-Chinese" to think about here. There's nobody really touching Qwen in the single-GPU size class and there hasn't been for a couple of generations.

regularfry · 2026-03-01T14:15:42 1772374542

They've uploaded the fix. If those are still broken something bad has happened.

regularfry · 2026-03-01T14:13:31 1772374411

I've got the unsloth q4_K_XL 35b running in llama.cpp on an i9/64G/4090 machine doing double-digit tokens per second with a 90k+ token context window available. The model's completely in VRAM.

regularfry · 2026-03-01T14:07:18 1772374038

There's also work on ternary models that's quite interesting, because the arithmetic operations are super fast and they're extremely cache efficient. Well worth looking into if that's the sort of thing that interests you.
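
To show why the arithmetic is cheap, here's a minimal sketch of a matrix-vector product where every weight is -1, 0, or +1. It's a toy in plain Python with made-up values, not how any real kernel is written, but it shows the point: no multiplications are needed, only adds, subtracts, and skips. The cache story is similar, since a ternary weight carries about 1.58 bits of information, so five of them can be packed into one byte.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries in {-1, 0, 1}) by a vector
    using only additions and subtractions."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the input
            elif w == -1:
                acc -= xi      # -1 weight: subtract the input
            # 0 weight: contributes nothing, so skip it
        out.append(acc)
    return out


# Hypothetical toy values, purely for illustration.
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.5]
y = ternary_matvec(W, x)
print(y)
```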