I am a product of the impressions left by massive heaps of copyrighted content. One song on the radio is just a rhetorical device.
If OpenAI rented all humanity’s media from a library and used them to train an AI model then that seems 100% ethical to me.
Now, if you ask the model to recite the script of Breaking Bad and it does so perfectly, and I think that grants me copyright authority over it, then we're going to have problems. But that's not the model's or the tool's problem.
You’re lost in the weeds. I know that’s the point; it’s why the whole song-on-the-radio thought experiment got brought up. The question was: if an AI model trains on public radio waves and hears a copyrighted song, is that infringement? My position is no, because the radio station had a license to broadcast that song.
Similarly, if all the books used to train a model are available in a library, then as long as someone rents the books, they can be used to train a model.
The question was directed at you. I don’t know why you’re repeating it back to me like I didn't know what I was asking…
These LLMs are not trained on the radio, and certainly not exclusively.
Edit: Did you get access to the book that inspired you legally, as in at a library, during a class, or by buying it yourself? Was it fair use?
Because none of those look like getting heaps of copyrighted stuff for free and claiming you didn't.
If instead you stole the book, then yes, this is similar. I don't care what you do with it afterward; you can't steal it and assume you are respecting copyright.