Isn’t that why they call it “Semi-Private”?

As I understand it, there’s also a fully private test set that o3 hasn’t been run on yet.
And o3 won’t be run on the private set unless it becomes a truly free and open-source model (presumably the same applies to ARC-AGI-2). That is the distinction between private and semi-private: on the private set, you must supply all the knowledge, weights, and logic needed to run without any external communication. Private-set results are the only true measure of performance on any benchmark, which is why they are reserved for a final evaluation. It is the only way to prevent shenanigans.