Please generate an image with NO dogs

isyasad@lemmy.world · 5 days ago

Please generate an image with NO dogs

brucethemoose@lemmy.world · 5 days ago

Mistral likely does “prompt enhancement,” aka feeding your prompt to an LLM first and asking it to expand it with more words.

So internally, a Mistral text LLM is probably writing out “sure! Here’s a long prompt with no dog: …” and then only that rewritten prompt, which never mentions a dog, is fed to the image generator.

Other “LLMs” are truly multimodal and generate the image output directly, so the image-generating part still gets the word “dog” in its input.
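
Rough sketch of the difference, in case it helps. Everything here is made up for illustration: `enhance_prompt` is a toy stand-in for the text LLM (a real one would actually understand the negation), and `image_model_input` just returns whatever text the image generator would be conditioned on. This is not Mistral's actual pipeline, only a guess at the general shape.

```python
import re

def enhance_prompt(user_prompt: str) -> str:
    """Hypothetical stand-in for the text LLM that rewrites the request."""
    # Toy negation handling: find the thing named after "NO".
    match = re.search(r"\bNO\s+(\w+)", user_prompt, flags=re.IGNORECASE)
    stem = match.group(1).lower().rstrip("s") if match else None

    # Candidate scene details the "LLM" might write; drop any that mention
    # the unwanted subject.
    candidates = [
        "a quiet park at sunset",
        "a dog chasing a ball",
        "empty benches",
        "long shadows on the grass",
    ]
    kept = [c for c in candidates if stem is None or stem not in c]
    return ", ".join(kept)

def image_model_input(text: str) -> str:
    """Hypothetical stand-in: returns the text the image generator sees."""
    return text

def two_stage(user_prompt: str) -> str:
    # Prompt enhancement: the image generator only sees the rewritten prompt.
    return image_model_input(enhance_prompt(user_prompt))

def direct(user_prompt: str) -> str:
    # No rewrite step: the raw request, "dogs" included, is the conditioning.
    return image_model_input(user_prompt)

prompt = "Please generate an image with NO dogs"
print("two-stage sees:", two_stage(prompt))
print("direct sees:   ", direct(prompt))
```

In the two-stage version the word never reaches the image generator at all, while in the direct version it is sitting right there in the conditioning text.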