This is slick -- it's not explicitly documented anywhere, but I hope OpenAI has the necessary callbacks to terminate generation when the API stream is killed, rather than continuing in the background until some other termination condition is hit. I suppose one could check this by looking at API usage after killing a stream early.
Yeah, I built a CLI tool for talking to ChatGPT. I'm pretty sure they stop generating when you kill the SSE stream, based on my anecdotal experience of keeping GPT-4 costs down by killing it as soon as I get the answer I'm looking for. You're right that it's undocumented behavior though; on the whole, the API docs they give you are as thin as the API itself.
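To make the mechanism concrete, here's a toy sketch (plain Python, not the real OpenAI SDK or any actual network call) of why hanging up on a stream can stop generation at the producer's end: abandoning and closing a generator raises GeneratorExit at its suspended `yield`, giving the producer a chance to stop work -- loosely analogous to a server noticing the closed connection and cancelling the completion.

```python
def token_stream(tokens):
    """Pretend server: yields tokens one at a time, cleans up on disconnect."""
    generated = []
    try:
        for t in tokens:
            generated.append(t)
            yield t
    finally:
        # Runs when the consumer closes the stream early (GeneratorExit) --
        # the point where a real server could cancel further generation.
        token_stream.generated = generated

stream = token_stream(["The", "answer", "is", "42", ".", "But", "wait", "..."])
for tok in stream:
    if tok == "42":      # got the answer we were looking for
        stream.close()   # hang up; the producer stops immediately
        break

print(token_stream.generated)  # only the tokens produced before the hang-up
```

Whether OpenAI's servers actually do the equivalent cleanup on a dropped SSE connection is exactly the undocumented part; the snippet only shows the client-side pattern of bailing out as soon as you have what you need.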
I'm skeptical that killing the stream early would really save that much cost. In my experience the vast majority of tokens used are input tokens rather than completion tokens.
We're talking about the streaming API, which streams generated text token by token, not the normal one-shot API. I have no insider knowledge, but I'd agree with your intuition for the normal API.