
> Co-pilot gets to watch people figure stuff out

There's a reason most jobs require hands-on experience, and can't be learnt just by reading a book about how to do it, or watching someone else work, or looking at something that someone else created.

It's one thing to have a bag full of tools, but another to know how to skillfully apply them, and when to apply them, etc, etc.

You may read a book (or, as an LLM, ingest a ton of training data) and think you understand it, or the lessons it teaches. But it's not until the rubber hits the road, when you try to do it yourself and it doesn't go to plan, that you realize there are all sorts of missing details and ambiguities, and that all the fine advice in that programming book or Stack Overflow discussion doesn't quite apply to your situation, or maybe appears to apply but for subtle reasons really doesn't.

Maybe if developers were forced to talk about every decision they made, all day every day, throughout all sorts of diverse projects, from requirements gathering and design through coding and debugging, and an AI had access to transcriptions of these streams of thought, then this would be enough for it to generalize the thought processes and apply them to a novel situation. But even in this best-case hypothetical scenario, I doubt it'd be enough. Certainly, just watching a developer's interactions with an IDE isn't going to come remotely close to giving an LLM an understanding of how to do the job of a developer, let alone to the level of detail that could hypothetically let it learn the job without ever having to try it itself.

I also think that many jobs, including developer and FSD (full self-driving), require AGI to backstop the job-specific skills; otherwise, what do you do when you find yourself in a situation that wasn't in the book you trained on? So it's not just a matter of how you acquire the skills to do a specific job (which I claim requires practice), but of what it will take for AI architectures to progress beyond LLMs and achieve the AGI that is also necessary.



> You may read a book (or, as an LLM, ingest a ton of training data) and think you understand it, or the lessons it teaches. But it's not until the rubber hits the road, when you try to do it yourself and it doesn't go to plan, that you realize there are all sorts of missing details and ambiguities, and that all the fine advice in that programming book or Stack Overflow discussion doesn't quite apply to your situation, or maybe appears to apply but for subtle reasons really doesn't.

Pre-training is comparable to reading the book. RLHF, plus storing all of the model's lifetime prompts and outputs, would be comparable to "learning on the job". There are also hacks like the Voyager Minecraft paper.


> storing all the lifetime prompts and outputs would be comparable to "learning on the job"

I'm not sure.

I guess we're talking about letting the LLM loose in a programming playground where it can be given requirements, then design, write, test, and debug programs, with all inputs and outputs recorded for later off-line pre-training/fine-tuning. For this to be usable as training data, it would presumably have to be serialized text: basically all LLM interactions with tools (incl. editor) and the program done via the console (line editor, not screen editor!).
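A minimal sketch of what such a serialized, console-style session record might look like. Everything here (the `Interaction` fields, the tool names, the JSON-lines framing) is invented for illustration, not a description of any real system:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Interaction:
    """One console-style exchange between the LLM and a tool."""
    step: int
    tool: str     # e.g. "ed" (line editor), "compiler", "test_runner"
    command: str  # what the LLM typed at the console
    output: str   # what the tool printed back

def serialize_session(interactions):
    """Flatten a session into plain text, one JSON record per line,
    so the whole interaction history can feed later pre-training."""
    return "\n".join(json.dumps(asdict(i)) for i in interactions)

session = [
    Interaction(1, "ed", "a\nprint(total)\n.", ""),
    Interaction(2, "test_runner", "pytest -q", "1 failed"),
]
print(serialize_session(session))
```

The line-oriented framing is the point: a screen editor's cursor movements don't serialize into a stream a next-token model can learn from, but a line editor's commands and outputs do.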

One major question is how the LLM would actually use this to good effect. Training data is normally used for next-word prediction, with the idea being that copying the most statistically common pattern is a good thing. But a lot of the interactions between a fledgling programmer and their notes and tools are going to be BAD ideas that are later corrected and learnt from, not actions you really want copied. Perhaps this could be combined with some sort of tree-of-thoughts approach to avoid taking actions that lead to bad outcomes, although that seems a lot easier said than done (e.g. how does one determine/evaluate a bad outcome without looking WAY ahead?).
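One workaround used in practice (in rejection-sampling / expert-iteration style pipelines, not something claimed by the comment above) is to filter whole episodes by their final outcome before using them as imitation data, so next-token training copies sequences that ended well. A toy sketch with invented field names:

```python
def select_training_episodes(episodes):
    """Hindsight filter: each episode is a dict with a list of
    (action, result) steps and a final outcome. Keep only episodes
    that ended well, so imitation training copies successful action
    sequences rather than the dead ends that went uncorrected."""
    return [ep["steps"] for ep in episodes if ep["outcome"] == "tests_pass"]

episodes = [
    {"steps": [("edit", "bugfix"), ("run", "ok")], "outcome": "tests_pass"},
    {"steps": [("edit", "typo"), ("run", "error")], "outcome": "tests_fail"},
]
kept = select_training_episodes(episodes)  # only the successful episode survives
```

Note this sidesteps, rather than answers, the look-ahead problem raised above: the filter only works after the fact, once an episode's outcome is known, whereas tree-of-thoughts would need to judge outcomes mid-trajectory.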



