Hacker News

“You” is completely unnecessary. What needs to be defined is the content of the language being modeled, not the model itself.

And if there is an attempt to define the model itself, then this definition should be correct, should not contradict anything and should be useful.

Otherwise it’s just dead code, waiting to create problems.



I definitely agree with this.

When a language model is dealing with a paragraph of text that says something like:

   You are standing in an open field west of a white house, with a boarded front door.
   There is a small mailbox here.
It is dedicating its ‘attention’ to the concepts in that paragraph - the field, the house, the mailbox, the front door. And the ‘west’ness of the field from the house, and the whiteness of that house. But also to the ‘you’, and that they are standing, which implies they are a person… and to the narrator who is talking to that ‘you’: a narrator speaking in English, in the second person present tense, in a style reminiscent of a text adventure…

All sorts of connotations from this text activate neurons with different weights, making it more or less likely that the model will consider a word like ‘xyzzy’ or ‘grue’ appropriate to output soon.

Bringing a ‘You’ into a prompt feels like a pattern developers are using without giving much thought to who they’re talking to.

But the LLM is associating all these attributes and dimensions with that ‘you’, inventing a whole person to take them on. Is that the best use of its scarce attention? Does it help the prompt produce the desired output? Does the LLM think it’s outputting text from an adventure game?

Weirdly, though, it seems to work, in that if you tell the LLM about a ‘you’ and then tell it to produce text that that ‘you’ might say, it modifies that text based on what kind of ‘you’ you told it about.

But that is a weird way to proceed. There must be others.
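That pattern, describing a ‘you’ and then asking for text that ‘you’ might say, can be sketched as a simple template. A minimal illustration (the personas and task are made up, not from any real system prompt):

```python
# Sketch: persona-conditioned prompting. The model is told about a 'you'
# and then asked to produce text as that 'you'.

def persona_prompt(persona: str, task: str) -> str:
    return (
        f"You are {persona}.\n"
        f"{task}\n"
        "Respond in character."
    )

p1 = persona_prompt("a terse systems programmer", "Explain what a mutex is.")
p2 = persona_prompt("a children's book author", "Explain what a mutex is.")

# Same task, different 'you': completions conditioned on these prompts
# will differ accordingly.
print(p1)
print(p2)
```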


> “You” is completely unnecessary.

It isn't, for at least two main reasons:

1) In LLMs, every token has some degree of influence on the output. Starting the prompt with "You" and writing it in second person pulls the model towards specific regions of latent space. This can have a good or bad effect on the output, depending on the model.

2) Instruct-type models are fine-tuned to respond to second-person prompts. "You"-prompts are what those models expect. If you're working with a model that isn't instruction-tuned, use whatever you want.
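To make point 2 concrete, here's a minimal sketch (the prompt strings are illustrative, not taken from any real system prompt) contrasting the second-person framing a chat fine-tune expects with a plain completion-style prompt for a base model:

```python
# Instruction-tuned chat models are trained on second-person
# system/user messages; base models just continue text.

# Second-person instruct framing, as expected by chat fine-tunes.
instruct_prompt = [
    {"role": "system", "content": "You are a concise technical editor."},
    {"role": "user", "content": "Shorten: 'The cat sat on the mat.'"},
]

# Completion framing for a base (non-instruction-tuned) model:
# no 'you' at all, just a text pattern the model can continue.
completion_prompt = (
    "Original: The cat sat on the mat.\n"
    "Shortened:"
)

assert instruct_prompt[0]["content"].startswith("You")
assert "you" not in completion_prompt.lower()
```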


Have you tried removing it and checking the results? Could it be cargo culting: people using "You" simply because it was present in the ChatGPT system prompt at the time it leaked?
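Checking is cheap: run the same instruction with and without the second-person framing and compare the outputs. A sketch of such an A/B harness, where `generate()` is a hypothetical stand-in for whatever model API you use (stubbed here so the harness itself runs):

```python
# Sketch of an A/B prompt experiment. `generate` is a stub; replace it
# with a real model/API call to run the actual comparison.

def generate(prompt: str) -> str:
    return f"<completion for: {prompt[:30]}...>"

base_task = "Summarize the following text in one sentence: ..."

variants = {
    "second_person": "You are a helpful assistant. " + base_task,
    "no_you": base_task,
}

results = {name: generate(p) for name, p in variants.items()}
for name, output in results.items():
    print(name, "->", output)
```

Scoring the two outputs against whatever metric matters for your task would settle the question for your model, rather than relying on folklore.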


> Otherwise it’s just dead code, waiting to create problems

It's very possible that the pretense improves results: most recorded interactions /are/ between two people, after all.


Examples: HN, Stack Overflow, Reddit...



