The Lone Banana Problem

The example here is getting Midjourney to produce an image of a single banana, as opposed to a bunch of two or more. I’ve hit this class of problem many times in my own use of Midjourney, and as the article points out, subtle differences in the prompt can lead to success. A Hacker News commenter pointed out another case, “Three cats in a trenchcoat standing on each other’s shoulders, pretending to be a human, Vincent Adultman style,” which is also practically impossible to get out of Midjourney.

Another commenter notes that the language models front-ending tools like Midjourney are small and quite limited compared to LLMs like LLaMA, let alone GPT-4, and points to this paper, in which a larger model improves prompt understanding.