https://dugas.ch/artificial_curiosity/GPT_architecture.html
I hoped it would be simple enough for anyone who knows a bit of math / algebra to understand. But note that it doesn't go into the difference between GPT-3 and ChatGPT (which adds an RL training objective, among other things).