
It's not saturated. 85% is average human performance, not "best human" performance. There is still room for the model to go up to 100% on this eval.

