OpenAI spent approximately $1,503,077 to smash the SOTA on ARC-AGI with their new o3 model
semi-private evals (100 tasks):
75.7% @ $2,012 total for 100 tasks (~$20/task), with just 6 samples, 33M tokens processed, and ~1.3 min/task
The “low-efficiency” setting with 1024 samples scored 87.5% but required 172x more compute.
If we assume compute spent and cost are proportional, then OpenAI might have just spent ~$346,064 for the low-efficiency run on the semi-private eval.
On the public eval they might have spent ~$1,148,444 to achieve 91.5% with the low-efficiency setting. (high-efficiency mode: $6,677)
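The extrapolation is just linear scaling of the published high-efficiency costs by the 172x compute factor. A minimal sketch (dollar figures are from the thread; cost being proportional to compute is an assumption):

```python
# All dollar figures are from the thread; proportional cost scaling is an assumption.
COMPUTE_FACTOR = 172  # low-efficiency (1024 samples) vs high-efficiency (6 samples)

semi_private_high = 2_012   # $ for 100 semi-private tasks, high-efficiency
public_high = 6_677         # $ for the public eval, high-efficiency

semi_private_low = semi_private_high * COMPUTE_FACTOR
public_low = public_high * COMPUTE_FACTOR

print(semi_private_low)  # 346064  -> ~$346k
print(public_low)        # 1148444 -> ~$1.15M
```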
OpenAI just spent more money to run an eval on ARC than most people spend on a full training run.
It sounds like they essentially brute-forced the solutions?
Ask the LLM for an answer, then ask the LLM to verify that answer. Ask again, verify again. Add a bit of randomness. Ask, verify. Repeat 5B times (this is what the paper says).
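The loop described is essentially best-of-N sampling with self-verification. A hypothetical sketch; `ask_llm` and `verify_llm` are placeholder callables, since o3's actual mechanism is not public:

```python
def sample_and_verify(ask_llm, verify_llm, task, n_samples=1024, temperature=0.7):
    """Best-of-N sampling with LLM self-verification (hypothetical sketch).

    ask_llm and verify_llm are placeholder callables, not a real API.
    """
    candidates = []
    for _ in range(n_samples):
        # "Add a bit of randomness": sampling temperature varies the answers
        answer = ask_llm(task, temperature=temperature)
        # Ask the LLM to score/verify its own candidate answer
        score = verify_llm(task, answer)
        candidates.append((score, answer))
    # Keep the highest-scoring candidate
    return max(candidates, key=lambda c: c[0])[1]
```

With n_samples=1024 per task this corresponds to the "low-efficiency" setting described above; 6 samples to the high-efficiency one.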
Evolution itself is the ultimate brute-force algorithm—it’s just applied over millennia. Trial and error, coupled with selection and refinement, is the only way to generate novelty when there’s no clear blueprint.
By my estimates, for this single benchmark, this is comparable cost to training a ~70B model from scratch today. Literally from 0 to a GPT-3 scale model for the compute they ran on 100 ARC tasks.
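One way to sanity-check the "70B from scratch" comparison is the standard training-compute approximation C ≈ 6·N·D, with the Chinchilla-style heuristic of ~20 tokens per parameter (my assumptions, not figures from the thread):

```python
# C ~= 6 * N * D: standard rough estimate of training FLOPs.
# N and D are my assumptions for a "Chinchilla-optimal" 70B model.
N = 70e9        # parameters
D = 20 * N      # ~20 training tokens per parameter (Chinchilla heuristic)

train_flops = 6 * N * D
print(f"~{train_flops:.2e} FLOPs")  # ~5.88e+23
```

That lands squarely inside the 10^22-10^24 FLOP band estimated below for the o3 runs.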
I double-checked with some FLOP estimates (a P100 for 12 hours is the Kaggle compute limit; they claim ~100-1000x that for o3-low, and 172x more for o3-high), so roughly on the order of 10^22-10^23 FLOPs.
Another way: at the H100 market price of ~$2/GPU-hour, $350k buys ~175k GPU-hours, or ~10^24 FLOPs in total.
So the margin is huge, but 10^22-10^24 FLOPs is the band I think we can estimate.
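A sketch of the second estimate, under my assumptions of ~$2 per H100-hour and ~1e15 effective FLOP/s per GPU (neither figure is from the thread):

```python
# Assumptions (mine): ~$2 per H100-hour, ~1e15 effective FLOP/s per H100.
total_cost = 350_000        # $, approximate total spend
price_per_hour = 2          # $/H100-hour
flops_per_sec = 1e15        # ~1 PFLOP/s per GPU, optimistic utilization

gpu_hours = total_cost / price_per_hour           # 175,000 GPU-hours
total_flops = gpu_hours * 3600 * flops_per_sec
print(f"{gpu_hours:,.0f} GPU-hours, ~{total_flops:.1e} FLOPs")  # ~6.3e+23
```

So the dollar-based route gives ~6e23, i.e. approaching the 10^24 end of the band.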
These are the scale of numbers that show up in the Chinchilla paper, haha. Truly GPT-3-scale models.
Yes, that's correct, and there's a bit of "pixel math" as well, so take these numbers with a pinch of salt. Preliminary model-size estimates from the temporarily public HF repository put the full model at 8 TB, or roughly 80 H100s.
I didn't hear that, but it could be. It doesn't really matter, though, because there's so much more to consider in the cost: R&D, plus all the supporting functions of a model like censorship, data capture, and so on.