• brucethemoose@lemmy.world
    5 days ago

    Mistral likely does “prompt enhancement,” aka feeding your prompt to an LLM first and asking it to expand it with more words.

    So internally, a Mistral text LLM is probably writing out “sure! Here’s a long prompt with no dog: …” and then that part is fed to the image generator.

    Other “LLMs” are truly multimodal and generate image output, hence they still get the word “dog” in the input.
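
To make the difference concrete, here's a toy sketch of the two paths. Everything in it is made up for illustration: `rewrite_prompt` and `image_model_input` are hypothetical stand-ins, not Mistral's (or anyone's) actual API, and the "rewriter" is a crude word filter rather than a real LLM. It only shows which text ends up in front of the image generator.

```python
# Toy stand-ins only: neither function is a real Mistral API.

def rewrite_prompt(user_prompt: str) -> str:
    """Hypothetical text-LLM step: expand the prompt and drop negated subjects."""
    words = user_prompt.lower().split()
    # Treat the word right after "no" as something the user does not want.
    banned = {
        words[i + 1].strip(".,!")
        for i, word in enumerate(words[:-1])
        if word == "no"
    }
    kept = [w for w in words if w != "no" and w.strip(".,!") not in banned]
    # Pad with extra descriptive words, as a prompt enhancer would.
    return " ".join(kept).rstrip(",") + ", highly detailed, soft lighting"


def image_model_input(user_prompt: str, enhance: bool) -> str:
    """Return the text the (hypothetical) image generator actually sees."""
    return rewrite_prompt(user_prompt) if enhance else user_prompt


prompt = "an empty park, no dog"
print(image_model_input(prompt, enhance=True))   # rewritten text, "dog" is gone
print(image_model_input(prompt, enhance=False))  # raw text, "dog" still present
```

With the rewrite step, the image generator never sees the word at all. Without it, “dog” is still sitting in the input.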