
When I did Photography at college, a lot of the work was looking at other works of art. I spent a lot of time in Google Images, diving through books from the Art section and going to galleries. Lots of photocopying was involved!

I then did works in the style of what I’d researched. I trained myself on works I didn’t own, and then produced my own.

I kind of see the AI training as similar work, just done programmatically vs physically.

Certainly a very interesting topic.

I can’t get my head around how far we’ve come on this in the last 6-12 months. From pretty awful outputs to works winning photography awards. And prints of a dog called Queso that you’d previously have paid an illustrator a lot of money for.



I think it's more analogous to if you had tweaked one of those famous works directly in photoshop then turned it in. The model training likely results in near replicas of some of the training data encoded in the model. You might have a near replica of a famous photograph encoded in your head, but to make a similar photograph you would recreate it with your own tools and it would probably come out pretty different. The AI can just output the same pixels.

That's not to say there aren't other ways you might use the direct image (e.g. collage or sampling in music) but you'll likely be careful with how it's used, how much you tweak it, and with attribution. I think the weird problem we're butting up against is that AFAIK you can't figure out post-facto what the "influence" is from the model output aside from looking at the input (which does commonly use names of artists).

I work on an AI image generator, so I really do think the tech is useful and cool, but I also think it's disingenuous (or, more generously, misinformed) to compare it to an artist studying great works or taking inspiration from others. These are computers inputting and outputting bits. Another human analog would be memorizing a politician's speech and using chunks of it in your own speech. We'd easily call that plagiarism. But what if instead only every third word were exactly the same? Hard to say... it's both more and less plagiarism.

Just how much do you need to process a sampled work before you need to get permission from the original artist? It seems that in music, if the copyright holder can prove you sampled them, even if the sample is unrecognizable, you're going to be on the hook for some royalties.


"The model training likely results in near replicas of some of the training data encoded in the model."

I don't think that's true.

My understanding is that any image generated by Stable Diffusion has been influenced by every single parameter of the model - so literally EVERY image in the training data has an impact on the final image.

How much of an impact is the thing that's influenced by the prompt.

One way to think about it: the Stable Diffusion model can be as small as 1.9GB (Web Stable Diffusion). It's trained on 2.3 billion images. That works out as 6.6 bits of data per image in the training set.
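The arithmetic checks out; here's a back-of-the-envelope sketch using the two numbers from the comment above (1.9GB model, 2.3 billion training images):

```python
# Back-of-the-envelope check: how many bits of model weight exist
# per image in the training set? Figures are from the comment above.
model_bytes = 1.9e9    # ~1.9 GB (Web Stable Diffusion)
train_images = 2.3e9   # ~2.3 billion training images

bits_per_image = model_bytes * 8 / train_images
print(f"{bits_per_image:.1f} bits per training image")  # -> 6.6
```

6.6 bits isn't even enough to store a single pixel of a training image, which is the intuition behind the "it can't be memorizing everything" argument.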


Right. Apart from some (extremely famous) pieces of art that have been heavily repeated in the dataset, you’re not going to be able to come close to recreating something directly.


Don't you think some of the images could be encoded perfectly, or near enough, in that 1.9GB though? A funny example is Malevich's Red Square: highly compressible! [0] Line drawings, likewise, can often be compressed down to a few polynomial curves.
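To make the Red Square point concrete, here's a toy sketch (my own illustration, nothing to do with the model's actual encoding): a flat solid-colour image squeezes down to almost nothing under even generic compression.

```python
# Toy illustration: a 512x512 solid-red RGB image (a stand-in for
# something like Malevich's Red Square) under generic zlib compression.
import zlib

width = height = 512
raw = bytes([200, 30, 30]) * (width * height)  # one flat red pixel, repeated

compressed = zlib.compress(raw, level=9)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

The raw buffer is ~786KB; the compressed version is well under a kilobyte per hundred kilobytes of input. So an average budget of 6.6 bits per image doesn't rule out a handful of very simple or very heavily duplicated images being stored near-verbatim.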

> My understanding is that any image generated by Stable Diffusion has been influenced by every single parameter of the model - so literally EVERY image in the training data has an impact on the final image.

That's pretty interesting. Need to dig into the math more (lazy applications dev).

[0]: https://en.wikipedia.org/wiki/Red_Square_(painting)


Even if true for a small number of edge cases, i don't think that says anything meaningful about the model in general.




