Isn’t that why they call it “Semi-Private”? There’s a fully private test set to...

daveguy · on Dec 22, 2024

And o3 will not be run on the private set unless it is a truly free and open source model (presumably the same holds for ARC-AGI-2). This is the distinction between private and semi-private: for the private set, you provide all the knowledge/weights/logic needed to operate without any external communication. Private benchmark results, reserved for a final evaluation, are the only true measure of performance on any benchmark. It is the only way to prevent shenanigans.