
I'm not sure how the courts will determine whether this is fair use; it's a difficult concept to define rigorously. But if it is found to be fair, my argument is that its effect on work cannot be considered harmful to society in the long term.

"you didn't do most of the work, you just altered it, and then want to claim credit (and legal ownership) for it wholesale" is a good description of all scientific research that has ever occurred.



> "you didn't do most of the work, you just altered it, and then want to claim credit (and legal ownership) for it wholesale" is a good description of all scientific research that has ever occurred.

And this is why we shouldn't allow copyrights or patents for broad concepts, only for specific methodologies/implementations/designs (patents) and exact works (copyright). But in this case they are not starting with a broad scientific principle like E=mc²; they are starting with an exact work/book.

Learning the concepts taught in a textbook and then making your own textbook from the generally-derived knowledge is OK.

Taking the textbook and creating a list of notes, and then making a textbook from those notes, is copyright infringement and plagiarism.

ML models are not learning anything about the concepts behind the works they analyze. They are creating patterns of metadata about the content of the work itself, not the concepts therein, and then creating an approximation of the content from those patterns. This is directly akin to the metadata-notes approach.
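
To make that concrete, here's a deliberately toy sketch of what I mean (a word-level bigram model; the names are mine, and nothing about a production model is this simple): it records statistics about the text itself and then samples an approximation back out, without ever representing the concepts.

    # Toy illustration only: capture "patterns of metadata" about a text,
    # then regenerate an approximation from those patterns alone.
    import random
    from collections import defaultdict

    def train_bigrams(text):
        # Record which word tends to follow which -- statistics about
        # the text itself, not the ideas it teaches.
        follows = defaultdict(list)
        words = text.split()
        for a, b in zip(words, words[1:]):
            follows[a].append(b)
        return follows

    def sample(follows, start, length=20):
        # Produce an approximation of the source from the recorded patterns.
        out = [start]
        for _ in range(length):
            candidates = follows.get(out[-1])
            if not candidates:
                break
            out.append(random.choice(candidates))
        return " ".join(out)

Nothing in that process understands the textbook; it only mirrors the text's surface statistics, which is exactly the "notes about the content" case.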

A more direct analogy would be reverse-engineering of software.

Legal RE is entirely possible, but it requires strictly avoiding any analysis of the software content itself: only the inputs and outputs may be analyzed, never the software itself.
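
For illustration, a clean-room harness in that spirit might look like this hypothetical sketch (the binary name and test inputs are made up): one team only runs the program and records input/output pairs, and a separate team reimplements from that record alone, never having opened the binary.

    # Hypothetical black-box probe: run the program and record I/O pairs.
    # Nobody on this side of the wall reads or disassembles the binary.
    import subprocess

    def probe(binary_path, test_input):
        result = subprocess.run(
            [binary_path], input=test_input,
            capture_output=True, text=True, timeout=5,
        )
        return result.stdout

    # "./target" and the inputs are placeholders, not a real program.
    spec = {i: probe("./target", i) for i in ["foo", "bar", "42"]}
    # `spec` is handed to a second team that writes the replacement.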

If you examine the content of the software, even if you do not actually copy any code, you are (potentially) committing copyright infringement or even IP theft.

All ML models do the latter: they look at the book/song/art content itself and then attempt to produce something with similar effects.

Is the case a sure win against OpenAI? Not at all; courts are notoriously non-technical, and the plaintiffs' lawyers may also lack the technical knowledge to properly contextualize the actions of OpenAI in a way the judge(s) can understand.

But do I think that based on all evidence it should be a sure loss for OpenAI? Absolutely.



