
>Frontier Math (25% on high compute, previous 2%)

This is so insane that I can't help but be skeptical. I know the FM answer key is private, but they have to send the questions to OpenAI in order to score the models. And a significant jump on this benchmark sure would increase a company's valuation...

Happy to be wrong on this.



Nope, makes sense to me. Seems unreasonable to conclude the dataset is not compromised now.


the question is whether that 25% jump is also because of the compromised first test.


viewed from a skeptical lens of incentives:

openai and epochai are both startups with every incentive to peddle this narrative when no one else can independently verify it.



