Hacker News
visarga on March 15, 2024 | on: Quiet-STaR: Language Models Can Teach Themselves t...
You're missing an important detail here: the number of tokens. Yes, you have 50 "steps" of network depth, but you can generate extra tokens. Assuming you don't run out of tape, there is no reason for LLMs to be limited to simple operations.
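A toy sketch of the point (hypothetical names, not from the paper): even if a single forward pass can perform only one simple operation because of fixed depth, emitting each intermediate result as an extra token lets the model chain arbitrarily many serial steps, limited only by context length (the "tape").

```python
# Toy illustration (hypothetical): a "model" whose single forward pass
# can do only ONE arithmetic step (fixed depth), but which chains
# arbitrarily many steps by appending each result to its own "tape".

def forward_pass(tape):
    """One depth-limited step: read the last value, apply one operation."""
    return tape[-1] * 2  # a single simple operation per pass

def generate(tape, extra_tokens):
    """Each extra token emitted buys one more serial computation step."""
    for _ in range(extra_tokens):
        tape.append(forward_pass(tape))
    return tape[-1]

# Computing 2^10 takes 10 serial doublings -- more than one pass allows,
# but fine as long as we don't run out of tape (context length).
print(generate([1], extra_tokens=10))  # 1024
```

With zero extra tokens the model is stuck at whatever one pass can compute; with n extra tokens it gets n additional serial steps.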