
What about the human reinforcement part of it?


There's no traditional human reinforcement.

Models like GPT-3 get turned into models like ChatGPT through RLHF (reinforcement learning from human feedback): the base model is first fine-tuned on example prompts and responses written in the style we'd like it to answer in, typically

User: question

Bot: response

This is done by handcrafting data or adapting it from places like Stack Exchange; human rankings of the model's responses then provide the reward signal for the reinforcement-learning step.
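
To make that fine-tuning step concrete, here's a minimal sketch in Python, assuming GPT-2 via Hugging Face transformers stands in for the base model and two toy (question, response) pairs stand in for the curated data. It just formats each pair into the User/Bot template and trains with the ordinary next-token loss:

  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch

  # Toy stand-ins for the handcrafted (question, response) pairs.
  pairs = [
      ("What is RLHF?", "Reinforcement learning from human feedback, a fine-tuning step."),
      ("How do I reverse a list in Python?", "Use reversed() or the slice my_list[::-1]."),
  ]

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")
  optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

  model.train()
  for question, response in pairs:
      # Format each pair in the "User: ... / Bot: ..." template described above.
      text = f"User: {question}\nBot: {response}{tok.eos_token}"
      batch = tok(text, return_tensors="pt")
      # Ordinary causal-LM objective: predict the next token of the formatted example.
      loss = model(**batch, labels=batch["input_ids"]).loss
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()

The reward-model and RL (e.g. PPO) stages sit on top of this, but the User/Bot formatting shown here is what gives the model its conversational shape.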



