"The model training likely results in near replicas of some of the training data encoded in the model."
I don't think that's true.
My understanding is that any image generated by Stable Diffusion has been influenced by every single parameter of the model - so literally EVERY image in the training data has an impact on the final image.
How much of an impact each one has is the thing that's influenced by the prompt.
One way to think about it: the Stable Diffusion model can be as small as 1.9GB (Web Stable Diffusion). It's trained on 2.3 billion images. That works out to about 6.6 bits of data per image in the training set.
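The arithmetic behind that figure is straightforward (a quick sketch, using the 1.9GB and 2.3 billion numbers from above):

```python
# Back-of-envelope check of the bits-per-image figure:
# 1.9 GB model, 2.3 billion training images (figures from the comment above).
model_bytes = 1.9e9    # model size in bytes
num_images = 2.3e9     # training set size

bits_per_image = model_bytes * 8 / num_images
print(f"{bits_per_image:.1f} bits per training image")  # ~6.6
```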
Right. Apart from some (extremely famous) pieces of art that have been heavily repeated in the dataset, you're not going to be able to come close to recreating something directly.
Don't you think one of the images could be perfectly, or near enough, encoded in that 1.9GB though? A funny example is Malevich's Red Square. Highly compressible! [0] Line drawings can also often be compressed down to a polynomial description.
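To illustrate the compressibility point (a rough sketch, not a claim about how diffusion models actually store anything): a solid-colour image like Red Square carries almost no information, which generic compression already shows.

```python
# Compress a 512x512 solid-red RGB bitmap with zlib to show how
# little information a flat-colour image actually contains.
# Sizes are illustrative only.
import zlib

width, height = 512, 512
red_pixel = bytes([255, 0, 0])
raw = red_pixel * (width * height)   # 786,432 bytes uncompressed

compressed = zlib.compress(raw, level=9)
print(len(raw), "->", len(compressed), "bytes")
```

The compressed size comes out to well under a kilobyte, a tiny fraction of the raw bitmap.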
> My understanding is that any image generated by Stable Diffusion has been influenced by every single parameter of the model - so literally EVERY image in the training data has an impact on the final image.
That's pretty interesting. Need to dig into the math more (lazy applications dev).