yeah, actually I wanted to see if this was possible at all. I managed to get aro...

derstander · 2026-02-22T01:32:14 1771723934

*32MB of RAM (plus 4MB of video RAM and a little sound and IOP memory).

SmithRenaldo · 2026-02-24T00:07:41 1771891661

The $5/hr B200 rate is fine for training, but cloud latency usually breaks real-time signal processing. I’ve been hitting similar walls with MemeRadar; when you're processing high-volume spikes, the bottleneck is memory bandwidth, not raw TFLOPS. Quantizing to fit L3 cache is an option, but you lose the precision needed for spotting subtle rug-pull patterns. For 24/7 production workloads, local hardware TCO usually beats cloud rentals.

eleventyseven · 2026-02-22T05:46:12 1771739172

> I don't have 30k bucks to spare on a gpu :(

Do you have $2/hr to rent an RTX 6000 96GB or $5/hr for B200 180GB on the cloud?

superkuh · 2026-02-22T05:47:57 1771739277

I'd rather not give money to scalper barons if I can avoid it. Fab capacity is going to rental hardware rather than hardware for humans.

xaskasdf · 2026-02-22T15:00:00 1771772400

I thought about that, but idk if they'd allow me to modify the Linux kernel and NVIDIA CUDA kernel at all

jonassm · 2026-02-22T17:37:01 1771781821

In those systems you could probably leverage something like Nvidia SCADA or GDS directly.

xaskasdf · 2026-02-22T20:07:28 1771790848

Actually, since they have direct GDS, it should perform really well on professional GPUs

green-salt · 2026-02-22T17:23:11 1771780991

I think you can do a bunch of that on DigitalOcean's GPU droplets.

anoncow · 2026-02-22T04:14:29 1771733669

3000 tokens per sec on 32 MB of RAM?

fc417fc802 · 2026-02-22T04:56:21 1771736181

fast != practical

You can get lots of tokens per second on the CPU if the entire network fits in L1 cache. Unfortunately the sub-64 KiB model segment isn't looking so hot.

But actually ... 3000? Did GP misplace one or two zeros there?
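
For a rough sense of what "fits in L1" means, here is a back-of-the-envelope sketch. The 64 KiB budget is the figure from the comment above; the bit widths and the `max_params` helper are illustrative assumptions, and the estimate ignores activations, code, and everything else competing for cache.

```python
# How many weights fit in a given cache budget at a given precision?
# Assumption: the whole budget is available for weights (optimistic).

def max_params(cache_bytes: int, bits_per_weight: int) -> int:
    """Number of weights that fit in cache_bytes at bits_per_weight each."""
    return (cache_bytes * 8) // bits_per_weight

L1_BYTES = 64 * 1024  # 64 KiB, the figure mentioned above

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {max_params(L1_BYTES, bits):,} parameters")
```

Even at 4 bits per weight that is only about 131k parameters, which is why models in that size class are not very capable.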

xaskasdf · 2026-02-22T15:48:40 1771775320

I wondered the same, but the rendering seems right; the output was almost instant. I'll recheck the token counter. Anyway, as you say, fast isn't practical. I actually had to develop my own tiny model https://huggingface.co/xaskasdf/brandon-tiny-10m-instruct to fit something "usable", and it's basically a liar or disinformation machine haha
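
One way to sanity-check a tokens-per-second figure is to ask how much memory bandwidth it would imply. The sketch below assumes every weight is read from memory once per generated token (a common simplification for dense, batch-1 decoding). The 10M parameter count is inferred from the model name, and the bytes per weight are guesses, not measurements.

```python
# Rough check: bandwidth implied by a claimed decode speed.
# Assumptions (not from the thread): each weight is read once per token,
# no cache reuse, and weights are stored at bytes_per_weight each.

def required_bandwidth_gb_s(params: int, bytes_per_weight: float,
                            tokens_per_s: float) -> float:
    """Memory bandwidth in GB/s needed to stream all weights once per token."""
    return params * bytes_per_weight * tokens_per_s / 1e9

PARAMS = 10_000_000   # assumed from the "10m" in the model name
CLAIMED_TOK_S = 3000  # the figure questioned above

for bytes_per_weight in (2, 1, 0.5):
    bw = required_bandwidth_gb_s(PARAMS, bytes_per_weight, CLAIMED_TOK_S)
    print(f"{bytes_per_weight} B/weight -> {bw:.1f} GB/s")
```

If the implied bandwidth is well above what the machine's memory can deliver, the token counter is the more likely culprit.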