This is so insane that I can't help but be skeptical. I know the FM answer key is private, but they have to send the questions to OpenAI in order to score the models. And a significant jump on this benchmark sure would increase a company's valuation...
Happy to be wrong on this.
OpenAI and Epoch AI are both startups with every incentive to peddle this narrative when no one else can independently verify it.