Hacker News

Does anyone have a feeling for how latency (from asking a question/API call to getting an answer/API return) is progressing with new models? I see 1.3 minutes/task and 13.8 minutes/task mentioned in the page on evaluating o3. Efficiency gains that also reduce latency will be important, and some of them will come from faster computation, but as models stack more and more layers (layers of models, for example), overall latency may grow: faster compute inside each layer only helps somewhat if each layer also carries fixed costs. This could have large effects on usability.
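A toy sketch of that last point (all numbers are hypothetical, not measurements of any real system): if each chained model call pays a fixed per-layer cost (queuing, network, serialization) on top of compute, then speeding up compute alone shrinks only part of the total, in Amdahl's-law fashion.

```python
def total_latency(num_layers: int, compute_s: float, overhead_s: float,
                  speedup: float = 1.0) -> float:
    """End-to-end latency for a sequential pipeline of model calls.

    Each layer pays a fixed overhead (queuing, network, serialization)
    plus its compute time divided by a compute-only speedup factor.
    """
    return num_layers * (overhead_s + compute_s / speedup)

# Hypothetical pipeline: 5 chained calls, 10 s compute + 2 s overhead each.
baseline = total_latency(5, compute_s=10.0, overhead_s=2.0)             # 60.0 s
faster   = total_latency(5, compute_s=10.0, overhead_s=2.0, speedup=4)  # 22.5 s
# A 4x compute speedup cuts latency by well under 4x: the 2 s/layer
# fixed costs (10 s total here) are untouched and start to dominate.
```

As the per-layer overhead grows relative to compute, compute speedups buy less and less; reducing the number of sequential layers (or overlapping them) matters more for usability.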

