
Back when GPT-3 came out, I wanted to understand how it works, so I read the papers and wrote this post:

https://dugas.ch/artificial_curiosity/GPT_architecture.html

I hoped it would be simple enough for anyone who knows a bit of math / algebra to understand. But note that it doesn't go into the difference between GPT-3 and ChatGPT (which adds an RL training objective, among other things).


