These tools are amazing for prototyping. I had an idea for a promotional poster, and seeing my idea just by writing it felt like magic. The generated image had too many artifacts to use, but gave me a guideline to follow when creating the real thing in Pixlr.
AI content generation (text, image, source code, video, music) will be a huge boon for prototyping where applied judiciously.
Google's product is vaporware and we shouldn't afford them any airtime until they release something usable. They're just trying to butt in and get press off of the backs of the teams actually working in the open, and that's super lame.
Release your model, Google, or stop bragging and talking over everyone else here. You're greedily sucking oxygen out of the conversation, and as a trillion-dollar monopoly you don't deserve anything for free off the backs of others. Not when you're not contributing. Stop being the rich kid talking over everyone else about how awesome your toys are.
Anyhow, the real story is Stable Diffusion. They're actively demonstrating the right way to run a project like this, as opposed to the entirely closed OpenAI DALL-E or the (again, vaporware) Google non-product.
Even MidJourney uses Stable Diffusion under the hood, using sophisticated prompt engineering to make their product distinct and powerful.
I feel there's a strong argument to be made that these organizations should be required to release these models publicly. These are built on the works of the public at large, and the public should get the full benefit of them.
Whatever effort Google has put into building the model is infinitesimally small compared to the work of the creators they're harvesting.
I don't expect this to happen easily, if at all, but I'm strongly in favor of it, and would even support legislation to that effect.
They're afraid of being sued because they're using all the images they've scraped from every website ever created. They're probably even using images that were never publicly available.
Well… Midjourney used Stable Diffusion (with an additional guidance model, I believe, not just prompt engineering) for their beta model, which they've already shut down again. It's back to their old, far inferior model.
The rumours are that it was too good at generating nudity for their comfort, and in particular that some users may have combined that with younger subjects.
I kind of get the sentiment about openness but I think it's way more nuanced than you are making out.
There are very good reasons for withholding SOTA models, primarily the infohazard angle and avoiding escalation of the capabilities race, which is basically the biggest risk we have right now.
Google / Deepmind have actually made some good decisions to try and slow down the race (such as waiting to publish).
I'm not saying they're doing a good enough job, but that doesn't mean their approach is entirely without merit.
Even ignoring the infohazard angle, publishing everything immediately would escalate the race. By sitting on their capabilities and waiting for others to publish (e.g. PaLM and Imagen vs. GPT-3 and DALL-E), they're at least only playing catch-up.
Can you talk more about the prompt augmentation Midjourney is doing behind the scenes? It's certainly true that you can put in a two-word phrase like "Time travelers" and get an amazing result back, which suggests your prompt is getting dropped into a prompt soup that also gives everything that Midjourney look by default.
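Purely speculating here, since Midjourney hasn't published its pipeline: the augmentation could be as simple as a template that appends house-style modifiers to whatever the user types before it reaches the underlying diffusion model. The specific modifiers below are invented for illustration.

```python
# Hypothetical sketch of server-side prompt augmentation.
# Midjourney's actual pipeline is unpublished; the template and
# modifier list here are made up to illustrate the idea.

DEFAULT_MODIFIERS = [
    "highly detailed",
    "dramatic lighting",
    "trending on artstation",
]

def augment_prompt(user_prompt: str, modifiers=DEFAULT_MODIFIERS) -> str:
    """Wrap a short user prompt in a house-style 'prompt soup'
    before handing it to the underlying image model."""
    return ", ".join([user_prompt.strip(), *modifiers])

print(augment_prompt("Time travelers"))
# Time travelers, highly detailed, dramatic lighting, trending on artstation
```

A fixed modifier list like this would explain why even tiny prompts come back with a consistent "look" regardless of subject.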
Yep, I feel that exact way about Nvidia Canvas [1]. It does not produce anything even close to usable as a final product, but it can produce an amazing start to a concept.
This was the first thing I tried with DALL-E. Took some photos of my house where I'm renovating, wiped out the construction debris and told it to fill it in with what I wanted.
It worked okay. One issue is that DALL-E wants to keep the "style" consistent, so any stray bit of debris greatly affected the interpretation, but I did in fact get one design idea out of it that changed how I think we'll do part of the renovation.
These things are in many ways just extremely enhanced search tools: "describe what you want to see."